GSE77703 Processing Pipeline

RNA-Seq code_examples 5 steps

Publication

Distinct and shared functions of ALS-associated proteins TDP-43, FUS and TAF15 revealed by multisystem analyses.

Nature communications (2016) — PMID 27378374

Dataset

Distinct and shared functions of ALS-associated TDP-43, FUS, and TAF15 revealed by comprehensive multi-system integrative analyses [RNA-Seq_mouse]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Sequencing reads from RNA-seq libraries were first trimmed of polyA tails, adapters, and low-quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.

cutadapt v4.1 (Inferred with models/gemini-2.5-flash)

$ Bash example

bash
# Install cutadapt (example using conda)
# conda install -c bioconda cutadapt

# Define input and output files (placeholders)
INPUT_READS="input.fastq.gz"
OUTPUT_READS="output.trimmed.fastq.gz"

# Run cutadapt to trim polyA tails, adapters, and low quality ends
cutadapt \
  --match-read-wildcards \
  --times 2 \
  -e 0 \
  -O 5 \
  --quality-cutoff 6 \
  -m 18 \
  -b TCGTATGCCGTCTTCTGCTTG \
  -b ATCTCGTATGCCGTCTTCTGCTTG \
  -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
  -b TGGAATTCTCGGGTGCCAAGG \
  -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
  -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
  -o "${OUTPUT_READS}" \
  "${INPUT_READS}"

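After trimming, a quick sanity check is to compare read counts before and after cutadapt and to confirm that no surviving read is shorter than the `-m 18` threshold. The sketch below is a minimal, hypothetical helper set (`read_fastq`, `count_fastq_reads` and `min_read_length` are illustrative names, not part of cutadapt); it assumes standard four-line FASTQ records and decompresses `.gz` files with `gzip -cd`.

```shell
# Hypothetical QC helpers for the cutadapt step (not part of cutadapt itself).
# Assumes plain four-line FASTQ records; .gz files are decompressed on the fly.

# Print a FASTQ file to stdout, decompressing it if the name ends in .gz
read_fastq() {
  case "$1" in
    *.gz) gzip -cd -- "$1" ;;
    *)    cat -- "$1" ;;
  esac
}

# Number of reads = number of lines / 4
count_fastq_reads() {
  read_fastq "$1" | awk 'END { print int(NR / 4) }'
}

# Shortest sequence (line 2 of each record); should be >= 18 after -m 18
min_read_length() {
  read_fastq "$1" | awk '
    NR % 4 == 2 { l = length($0); if (min == "" || l < min) min = l }
    END { print min + 0 }'
}

# Example usage with the placeholder names from the cutadapt command:
# echo "reads in:   $(count_fastq_reads input.fastq.gz)"
# echo "reads kept: $(count_fastq_reads output.trimmed.fastq.gz)"
# echo "min length: $(min_read_length output.trimmed.fastq.gz)"
```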

Reads were then mapped against a database of repetitive elements derived from RepBase18.05.

STAR v2.7.3a (tool and version inferred with models/gemini-2.5-flash; the source text specifies Bowtie v1.0.0 for the Repbase alignment, as described in the next step)

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star

# Define variables
# Replace with your actual input FASTQ file
INPUT_FASTQ="input_reads.fastq.gz"
# Replace with the path to your STAR index for RepBase18.05
# This index needs to be pre-built using STAR genomeGenerate command with RepBase18.05 sequences.
REPBASE_STAR_INDEX_DIR="/path/to/repbase_18_05_star_index"
# Prefix for output files
OUTPUT_PREFIX="sample_repbase_aligned"
# Number of threads to use
NUM_THREADS=8

# Align reads to the RepBase18.05 repetitive elements database
# --readFilesCommand zcat is required because the input FASTQ is gzipped
STAR \
  --runThreadN "${NUM_THREADS}" \
  --genomeDir "${REPBASE_STAR_INDEX_DIR}" \
  --readFilesIn "${INPUT_FASTQ}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outFilterMultimapNmax 100 \
  --outFilterMismatchNmax 10 \
  --outFilterMismatchNoverLmax 0.05 \
  --outFilterScoreMin 10 \
  --outFilterScoreMinOverLread 0.3 \
  --outFilterMatchNoverLread 0.3 \
  --outSAMattributes All \
  --outSAMtype BAM Unsorted \
  --outSAMunmapped Within \
  --outReadsUnmapped Fastx

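Because the repeat-mapping step acts as a filter, it helps to know what fraction of reads it removes. The hypothetical helper below (`sam_mapping_summary` is an illustrative name) tallies SAM records by the unmapped flag (0x4), skipping secondary (0x100) and supplementary (0x800) records so each read is counted once. It assumes unmapped reads are kept in the output, as `--outSAMunmapped Within` does for STAR and as Bowtie 1 does by default with `-S`; BAM files must first be converted to SAM text, for example with `samtools view`.

```shell
# Hypothetical helper: summarise how many reads aligned to the repeat index.
# Reads SAM text from a file argument, or from stdin when no argument is given.
# Flag bits are tested arithmetically so the script works in POSIX awk (no gawk and()).
sam_mapping_summary() {
  awk -F'\t' '
    /^@/ { next }                                  # skip header lines
    {
      flag = $2 + 0
      if (int(flag / 256) % 2 == 1)  next          # 0x100: secondary alignment
      if (int(flag / 2048) % 2 == 1) next          # 0x800: supplementary alignment
      total++
      if (int(flag / 4) % 2 == 0) mapped++         # 0x4 not set: read aligned
    }
    END {
      pct = (total > 0) ? 100 * mapped / total : 0
      printf "total=%d mapped=%d unmapped=%d pct_mapped=%.2f\n", total, mapped, total - mapped, pct
    }' "${1:--}"
}

# Example usage with the STAR BAM above (requires samtools):
# samtools view "${OUTPUT_PREFIX}Aligned.out.bam" | sam_mapping_summary
```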

Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).

Bowtie v1.0.0
$ Bash example
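```
# Install Bowtie 1 (if not already installed)
# conda install -c bioconda bowtie

# Build the Bowtie index once from Repbase sequences (FASTA name is a placeholder)
# bowtie-build repbase.fa repbase_index

# Align reads using Bowtie 1
# -S: SAM output; -q: FASTQ input; -p 16: 16 threads
# -e 100: max sum of quality values at mismatched read positions; -l 20: seed length
bowtie -S -q -p 16 -e 100 -l 20 repbase_index reads.fastq > output.sam
```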
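The next step aligns only the reads that did not map to Repbase, so the Bowtie output has to be split. Bowtie 1 can write unaligned reads directly with its `--un <filename>` option; alternatively, because `-S` output keeps unaligned reads as SAM records with flag 0x4, a small awk filter can rebuild a FASTQ from them. The sketch below takes the second route; `sam_unaligned_to_fastq` is a hypothetical name, and the code assumes single-end reads whose unaligned records carry the original SEQ and QUAL fields.

```shell
# Hypothetical helper: rebuild FASTQ from unaligned (flag 0x4) SAM records.
# Single-end only; secondary (0x100) and supplementary (0x800) records are skipped.
sam_unaligned_to_fastq() {
  awk -F'\t' '
    /^@/ { next }                                  # skip SAM header
    {
      flag = $2 + 0
      if (int(flag / 256) % 2 == 1 || int(flag / 2048) % 2 == 1) next
      if (int(flag / 4) % 2 == 1)                  # 0x4 set: read did not align
        printf "@%s\n%s\n+\n%s\n", $1, $10, $11    # QNAME, SEQ, QUAL
    }' "${1:--}"
}

# Example usage with the Bowtie command above:
# sam_unaligned_to_fastq output.sam > reads_not_in_repbase.fastq
# Equivalent built-in route (Bowtie 1 option):
# bowtie -S -q -p 16 -e 100 -l 20 --un reads_not_in_repbase.fastq repbase_index reads.fastq > output.sam
```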

Reads not mapped to Repbase sequences were aligned to the mm9 mouse genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1.

STAR v2.3.0e

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star=2.3.0e

# Placeholder for STAR genome index directory (mm9 mouse genome, UCSC assembly)
# Note: The description states "mm9 human genome", which is a contradiction. mm9 is a mouse genome.
GENOME_DIR="/path/to/mm9_star_index"

# Placeholder for input FASTQ file (reads not mapped to Repbase sequences)
# Uncompressed FASTQ is assumed; STAR does not decompress .gz input on its own.
INPUT_FASTQ="input_reads_filtered_from_repbase.fastq"

# Output directory for STAR alignment results
OUTPUT_DIR="star_alignment_output"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Align reads using STAR with the parameters reported in the source text
# --outSAMunmapped Within: write unmapped reads into the main SAM output (flag 0x4) instead of dropping them.
# --outFilterMultimapNmax 1: output alignments only for reads that map to at most 1 locus (unique mappers).
# --outFilterMultimapScoreRange 1: alignments scoring within 1 point of the best score count as additional loci,
#   so a read whose second-best alignment is that close is treated as a multimapper and filtered out.
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${INPUT_FASTQ}" \
     --outFileNamePrefix "${OUTPUT_DIR}/" \
     --outSAMunmapped Within \
     --outFilterMultimapNmax 1 \
     --outFilterMultimapScoreRange 1 \
     --runThreadN 8 # Adjust number of threads as needed

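With `--outFilterMultimapNmax 1`, every alignment STAR reports should belong to a uniquely mapped read. One way to spot-check this is to count aligned records whose `NH:i:` tag (number of loci) exceeds 1; the expected result is 0. The sketch assumes the SAM carries the NH tag, which STAR's default `--outSAMattributes Standard` includes; `count_multimapper_records` is a hypothetical name.

```shell
# Hypothetical helper: count aligned records whose NH (number of loci) tag is > 1.
# Expect 0 for output produced with --outFilterMultimapNmax 1.
count_multimapper_records() {
  awk -F'\t' '
    /^@/ { next }                                  # skip SAM header
    int($2 / 4) % 2 == 1 { next }                  # skip unmapped records (flag 0x4)
    {
      for (i = 12; i <= NF; i++)                   # optional tags start at column 12
        if ($i ~ /^NH:i:/ && substr($i, 6) + 0 > 1) { n++; break }
    }
    END { print n + 0 }' "${1:--}"
}

# Example usage with the output directory above (STAR 2.3.0e writes SAM):
# count_multimapper_records "${OUTPUT_DIR}/Aligned.out.sam"
```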

Read counts for each gene annotated in GENCODE vM1 were calculated with featureCounts.

featureCounts (Subread package; version not stated, inferred with models/gemini-2.5-flash)

$ Bash example

# Install featureCounts (part of the Subread package) if not already installed
# For example, using conda:
# conda install -c bioconda subread

# Define input and output files
INPUT_BAM="aligned_reads.bam" # Placeholder for your input BAM file
OUTPUT_COUNTS="gene_counts.txt"

# Define the Gencode vM1 annotation GTF file
# Gencode vM1 is an older mouse annotation release. 
# You might need to download it from the Gencode archives if not available locally.
# Example download (adjust URL for specific M1 release if needed):
# wget -O gencode.vM1.annotation.gtf.gz "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M1/gencode.vM1.annotation.gtf.gz"
# gunzip gencode.vM1.annotation.gtf.gz
GENCODE_GTF="gencode.vM1.annotation.gtf" # Placeholder for the path to your Gencode vM1 GTF file

# Run featureCounts to calculate read counts for each gene
# -a: Annotation file (GTF)
# -o: Output file for counts
# -F GTF: Specify GTF format for annotation
# -t exon: Count features of type 'exon' (standard for gene-level counts)
# -g gene_id: Aggregate counts by 'gene_id' attribute in the GTF
# -T 8: Use 8 threads (adjust as needed for your system)
# -s 0: Unstranded (0), 1 for forward, 2 for reverse. Adjust if your library is stranded.
# -p: add this flag if your reads are paired-end
featureCounts -a "${GENCODE_GTF}" -o "${OUTPUT_COUNTS}" -F GTF -t exon -g gene_id -T 8 -s 0 "${INPUT_BAM}"
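The GEO record describes the supplementary files as count files with read counts for each sample. featureCounts output starts with a `# Program:...` comment line, then a header whose first six columns are `Geneid Chr Start End Strand Length`, followed by one count column per input file (featureCounts accepts several BAM/SAM files in one call). The hypothetical helper below is one way, assumed rather than taken from the authors, to reduce that output to a gene-by-sample count table.

```shell
# Hypothetical helper: drop featureCounts' annotation columns (Chr..Length),
# keeping Geneid plus one count column per input BAM/SAM (columns 7 onward).
featurecounts_to_matrix() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                  # skip the "# Program:..." line
    {
      line = $1
      for (i = 7; i <= NF; i++) line = line OFS $i
      print line
    }' "${1:--}"
}

# Example usage with the featureCounts output above:
# featurecounts_to_matrix "${OUTPUT_COUNTS}" > gene_count_matrix.tsv
```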

Tools Used

STAR
Bowtie
cutadapt
featureCounts

Raw Source Text

Sequencing reads from RNA-seq libraries were first trimmed of polyA tails, adapters, and low quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff' 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.
Reads were then mapped against a database of repetitive elements derived from RepBase18.05. Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).
Reads not mapped to Repbase sequences were aligned to the mm9 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1.
counts of reads for each gene annotated in gencode vM1 were calculated from featureCounts
Genome_build: mm9
Supplementary_files_format_and_content: count file, contains counts of reads for each sample
