GSE78960 Processing Pipeline
RNA-Seq
3 steps
Publication
Genomic analysis of the molecular neuropathology of tuberous sclerosis using a human stem cell model. Genome Medicine (2016), PMID 27655340
Dataset
GSE78960: Modeling the Neuropathology of Tuberous Sclerosis with Human Stem Cells Reveals a Role for Inflammation and Angiogenic Growth Factors [Treatment]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Linker tags were removed from RNA sequencing and ribosome profiling reads by the FASTX Toolkit, v0.0.13 (http://hannonlab.cshl.edu/fastx_toolkit/)
$ Bash example
# Install FASTX Toolkit (if not already installed)
# conda install -c bioconda fastx_toolkit

# Define input and output file names
INPUT_FASTQ="input_reads.fastq"
OUTPUT_FASTQ="reads_linker_removed.fastq"

# Placeholder for the linker tag sequence; replace it with the actual linker sequence.
# Example: LINKER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
LINKER_SEQUENCE="YOUR_LINKER_TAG_SEQUENCE"

# Remove linker tags using fastx_clipper
# -a: adapter sequence to clip
# -i: input FASTQ file
# -o: output FASTQ file
# -Q 33: quality score format (Phred+33, common for Illumina)
fastx_clipper -a "${LINKER_SEQUENCE}" -i "${INPUT_FASTQ}" -o "${OUTPUT_FASTQ}" -Q 33
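After clipping, it can be useful to check how many reads remain and how long they are, particularly for ribosome profiling reads, which are short. The following is a minimal sketch, not part of the published pipeline. The function name `fastq_length_hist` is hypothetical, and the code assumes a FASTQ file with exactly four lines per record (no wrapped sequences).

```shell
# Hypothetical helper: print "<read_length> <count>" pairs, sorted by length,
# for a FASTQ file with four lines per record.
fastq_length_hist() {
    # Line 2 of every 4-line record holds the sequence.
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}
```

It could be called as `fastq_length_hist reads_linker_removed.fastq`; summing the second column gives the number of reads that survived clipping.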
2
All reads that mapped to rRNAs, tRNAs or mitochondrial rRNAs were removed, and the remaining reads were mapped to RefSeq (v38) by TopHat v2.0.13.
$ Bash example
# Create a directory for reference data
mkdir -p ref_data

# The GEO record lists the genome build as GRCh37.p13 (RefSeq assembly GCF_000001405.25).
# The exact "RefSeq (v38)" annotation used by the authors is not specified here;
# the files below are a stand-in. Verify the URLs before use.
# wget -P ref_data https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.fna.gz
# gunzip ref_data/GCF_000001405.25_GRCh37.p13_genomic.fna.gz
# mv ref_data/GCF_000001405.25_GRCh37.p13_genomic.fna ref_data/GRCh37.p13_genomic.fna

# Build Bowtie2 index for GRCh37.p13 (TopHat v2 uses Bowtie2 by default)
# bowtie2-build ref_data/GRCh37.p13_genomic.fna ref_data/grch37_refseq_index

# Download RefSeq GTF for GRCh37.p13
# wget -P ref_data https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gtf.gz
# gunzip ref_data/GCF_000001405.25_GRCh37.p13_genomic.gtf.gz
# mv ref_data/GCF_000001405.25_GRCh37.p13_genomic.gtf ref_data/grch37_refseq.gtf

# Installation of TopHat
# conda install -c bioconda tophat=2.0.13

# Define input and output
INPUT_READS="filtered_reads.fastq"  # Placeholder for reads left after removing rRNA/tRNA/mitochondrial rRNA hits
OUTPUT_DIR="tophat_output"
GENOME_INDEX_PREFIX="ref_data/grch37_refseq_index"
GTF_FILE="ref_data/grch37_refseq.gtf"

# Create output directory
mkdir -p "${OUTPUT_DIR}"

# Execute TopHat; alignments are written to ${OUTPUT_DIR}/accepted_hits.bam
tophat -o "${OUTPUT_DIR}" \
    -G "${GTF_FILE}" \
    "${GENOME_INDEX_PREFIX}" \
    "${INPUT_READS}"
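The step description also mentions removing reads that map to rRNAs, tRNAs or mitochondrial rRNAs before the TopHat run, but the source does not say which tool performed that filter. One possible approach, shown here as a sketch under stated assumptions, is to align the clipped reads to a Bowtie2 index of those sequences and keep only the reads that fail to align. The function name `deplete_contaminants` and the idea of a user-built contaminant index are assumptions, not the authors' documented method.

```shell
# Hypothetical helper: keep only reads that do NOT align to a contaminant index.
# $1 = Bowtie2 index prefix built from rRNA/tRNA/mitochondrial rRNA sequences (assumed to exist)
# $2 = input FASTQ with single-end, linker-clipped reads
# $3 = output FASTQ that receives the unaligned reads
deplete_contaminants() {
    # --un writes unpaired reads that failed to align; the alignments go to /dev/null.
    bowtie2 -x "$1" -U "$2" --un "$3" -S /dev/null
}
```

A call might look like `deplete_contaminants ref_data/contaminants reads_linker_removed.fastq filtered_reads.fastq`, where `ref_data/contaminants` is a placeholder for an index built with `bowtie2-build` from a FASTA file of the sequences to exclude.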
3
Finally all read counts that mapped uniquely to genes were extracted for expression analysis with the help of samtools, v1.1.
$ Bash example
# Install samtools (if not already installed)
# conda install -c bioconda samtools=1.1

# samtools idxstats reports, per reference sequence (chromosome/contig): the sequence name,
# its length, the number of mapped reads, and the number of unmapped reads.
# It does not produce gene-level counts. The description "mapped uniquely to genes" implies
# further filtering and gene assignment that the source does not detail. For gene-level counts,
# tools like featureCounts or htseq-count are typically run on a BAM file restricted to
# unique alignments (TopHat2 tags these with NH:i:1).
# Replace 'input.bam' with your coordinate-sorted alignment file (e.g. tophat_output/accepted_hits.bam).
# Replace 'output_read_counts.txt' with your desired output file name.

# idxstats requires a BAM index
samtools index input.bam
samtools idxstats input.bam > output_read_counts.txt
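The source states only that uniquely mapped read counts were extracted "with the help of samtools" and does not give the exact commands. As a rough sketch of one way to restrict alignments to unique hits, the filter below keeps SAM records whose `NH` tag equals 1, which is how TopHat2 marks reads with a single reported alignment. The function name `keep_unique_sam` is hypothetical, and gene assignment is not covered here.

```shell
# Hypothetical helper: read SAM on stdin, write header lines and records tagged NH:i:1.
keep_unique_sam() {
    awk -F '\t' '
        /^@/ { print; next }                 # pass header lines through unchanged
        {
            # Optional tags start at column 12 in SAM records.
            for (i = 12; i <= NF; i++)
                if ($i == "NH:i:1") { print; break }
        }'
}
```

It could be placed in a pipeline such as `samtools view -h tophat_output/accepted_hits.bam | keep_unique_sam | samtools view -b -o unique_hits.bam -`, with the file names being placeholders.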
Raw Source Text
Linker tags were removed from RNA sequencing and ribosome profiling reads by the FASTX Toolkit, v0.0.13 (http://hannonlab.cshl.edu/fastx_toolkit/).
All reads that mapped to rRNAs, tRNAs or mitochondrial rRNAs were removed, and the remaining reads were mapped to RefSeq (v38) by TopHat v2.0.13.
Finally all read counts that mapped uniquely to genes were extracted for expression analysis with the help of samtools, v1.1.
Genome_build: GRCh37.p13
Supplementary_files_format_and_content: .txt files report raw read counts that mapped uniquely to genes