GSE78961 Processing Pipeline

Publication

Genomic analysis of the molecular neuropathology of tuberous sclerosis using a human stem cell model.

Genome Medicine (2016), PMID 27655340

Dataset

Modeling the Neuropathology of Tuberous Sclerosis with Human Stem Cells Reveals a Role for Inflammation and Angiogenic Growth Factors

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Linker tags were removed from RNA sequencing and ribosome profiling reads by the FASTX Toolkit, v0.0.13 (http://hannonlab.cshl.edu/fastx_toolkit/)

FASTX Toolkit v0.0.13

$ Bash example

# Install FASTX Toolkit (if not already installed)
# conda install -c bioconda fastx_toolkit

# Placeholder for the linker sequence. The source does not state it; take it from the library preparation protocol.
# The value below is the common Illumina adapter prefix, used here only as an example.
LINKER_SEQUENCE="AGATCGGAAGAG"

# Placeholder for input and output file names.
INPUT_FILE="raw_reads.fastq"
OUTPUT_FILE="linker_trimmed_reads.fastq"

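# Remove linker tags using fastx_clipper
# -a: linker sequence to clip
# -i: input FASTQ file
# -o: output FASTQ file
# Note: FASTX Toolkit assumes Phred+64 quality scores by default; for Phred+33 FASTQ
# files (Illumina 1.8+), adding -Q33 to the command is commonly required.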
fastx_clipper -a "${LINKER_SEQUENCE}" -i "${INPUT_FILE}" -o "${OUTPUT_FILE}"
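
As a quick check on the clipping step above, the read-length distribution of the output can be tabulated. This sketch is not part of the published pipeline: `fastq_length_histogram` is a hypothetical helper name, and it assumes a standard FASTQ file with four lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): tabulate read lengths in a FASTQ file.
# Assumes exactly four lines per record, so the sequence is line 2 of each record.
fastq_length_histogram() {
    local fastq="$1"
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) printf "%d\t%d\n", len, n[len] }' "$fastq" | sort -n
}

# Usage (file name is a placeholder):
# fastq_length_histogram linker_trimmed_reads.fastq
```

The output has one line per read length (length, tab, number of reads). If most reads keep their original length, the linker sequence passed to `fastx_clipper` may not match the library.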

All reads that mapped to rRNAs, tRNAs or mitochondrial rRNAs were removed, and the remaining reads were mapped to RefSeq (v38) by TopHat v2.0.13.
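
The source does not say which tool removed the rRNA, tRNA and mitochondrial rRNA reads. One possible sketch, assuming Bowtie2 (the aligner that TopHat 2 runs underneath) and a user-built index of those sequences, keeps only the reads that fail to align to them. The function name, index prefix and file names are hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION: Bowtie2 is used for the filter; the source does not name a tool.
# ASSUMPTION: the index was built by the user from rRNA, tRNA and mitochondrial
# rRNA sequences (e.g. with bowtie2-build); it is not provided by the source.
filter_contaminants() {
    local contaminant_index="$1"   # Bowtie2 index prefix (hypothetical)
    local input_fastq="$2"         # linker-trimmed single-end reads
    local output_fastq="$3"        # reads that did NOT align to the index
    local threads="${4:-4}"        # number of threads, default 4

    # --un writes the unaligned single-end reads to the given file;
    # the alignments themselves are discarded via -S /dev/null.
    bowtie2 -p "$threads" -x "$contaminant_index" -U "$input_fastq" \
        --un "$output_fastq" -S /dev/null
}

# Usage (all paths are placeholders):
# filter_contaminants /path/to/contaminant_index linker_trimmed_reads.fastq filtered_reads.fastq 8
```

The resulting `filtered_reads.fastq` corresponds to the `INPUT_READS` placeholder in the TopHat example.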

TopHat v2.0.13

$ Bash example

# Install TopHat (if not already installed)
# conda install -c bioconda tophat=2.0.13

# Define variables
# Replace with actual paths to your reference files and input reads
GENOME_FASTA="/path/to/GRCh37.p13.fa" # Human genome; the GEO record lists Genome_build: GRCh37.p13
GTF_FILE="/path/to/RefSeq_v38.gtf" # RefSeq annotations ("v38" presumably names the annotation release, not the GRCh38 assembly)
BOWTIE2_INDEX_PREFIX="/path/to/GRCh37_bowtie2_index/GRCh37" # Prefix for Bowtie2 index files
INPUT_READS="filtered_reads.fastq" # Reads after removal of r/t/mtRNA mappings
OUTPUT_DIR="tophat_output"
NUM_THREADS=8 # Example: Number of threads to use

# Build Bowtie2 index (if not already built)
# This step is usually done once for a given genome
# bowtie2-build "$GENOME_FASTA" "$BOWTIE2_INDEX_PREFIX"

# Create output directory
mkdir -p "$OUTPUT_DIR"

# Map reads to RefSeq (v38) using TopHat
# -p: number of threads
# -G: GTF file for known transcripts (improves splice junction detection)
# $BOWTIE2_INDEX_PREFIX: Path to the Bowtie2 index
# $INPUT_READS: Input FASTQ file (assumed to be single-end; for paired-end, use -1 and -2)
tophat2 -p "$NUM_THREADS" -G "$GTF_FILE" -o "$OUTPUT_DIR" "$BOWTIE2_INDEX_PREFIX" "$INPUT_READS"

Finally all read counts that mapped uniquely to genes were extracted for expression analysis with the help of samtools, v1.1.

samtools v1.1

$ Bash example

# Install samtools v1.1
# conda install -c bioconda samtools=1.1

# Input alignment file (BAM); TopHat writes accepted_hits.bam into its output directory
INPUT_BAM="tophat_output/accepted_hits.bam"

# Output file for uniquely mapped reads (BAM format)
OUTPUT_BAM="uniquely_mapped_reads.bam"

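# Keep reads that mapped uniquely
# This command filters the input BAM file to retain only uniquely mapped reads; it does not itself assign reads to genes.
# -F 0x100: Exclude secondary alignments (extra records reported for multi-mapped reads)
# -q 20: Minimum mapping quality of 20. TopHat 2 is generally documented as giving MAPQ 50 to uniquely
#        mapped reads and 3 or lower to multi-mapped reads, so this threshold separates the two.
#        The authors' exact criterion is not stated in the source.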
# -b: Output in BAM format
# The resulting BAM file (uniquely_mapped_reads.bam) would then be used by downstream
# expression analysis tools (e.g., featureCounts, htseq-count) to generate gene-level counts.
samtools view -F 0x100 -q 20 -b "${INPUT_BAM}" > "${OUTPUT_BAM}"

# Index the filtered BAM file (optional, but good practice for downstream tools)
samtools index "${OUTPUT_BAM}"
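
TopHat tags each alignment with `NH:i:<n>`, the number of reported alignments for that read, so `NH:i:1` offers another way to identify unique mappers. The sketch below is an assumption, not the authors' stated method: it tallies such reads per reference sequence from SAM text. The function name is hypothetical. The tallies equal per-gene counts only if reads were aligned to gene or transcript sequences; otherwise an annotation-aware counter such as featureCounts or htseq-count is still needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): count reads tagged NH:i:1 per
# reference sequence. Reads SAM text on stdin, e.g. from `samtools view`.
count_unique_per_reference() {
    awk -F '\t' '
        /^@/      { next }                      # skip SAM header lines
        $3 == "*" { next }                      # skip unmapped records
        {
            # Optional tags start at column 12 of a SAM record.
            for (i = 12; i <= NF; i++)
                if ($i == "NH:i:1") { n[$3]++; break }
        }
        END { for (ref in n) printf "%s\t%d\n", ref, n[ref] }' | sort
}

# Usage (path is a placeholder):
# samtools view tophat_output/accepted_hits.bam | count_unique_per_reference > unique_counts.txt
```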

Tools Used

FASTX Toolkit, TopHat, samtools

Raw Source Text

Linker tags were removed from RNA sequencing and ribosome profiling reads by the FASTX Toolkit, v0.0.13 (http://hannonlab.cshl.edu/fastx_toolkit/)
All reads that mapped to rRNAs, tRNAs or mitochondrial rRNAs were removed, and the remaining reads were mapped to RefSeq (v38) by TopHat v2.0.13.
Finally all read counts that mapped uniquely to genes were extracted for expression analysis with the help of samtools, v1.1.
Genome_build: GRCh37.p13
Supplementary_files_format_and_content: .txt files report raw read counts that mapped uniquely to genes