GSE77703 Processing Pipeline

RNA-Seq, 5 steps

Publication

Distinct and shared functions of ALS-associated proteins TDP-43, FUS and TAF15 revealed by multisystem analyses.

Nature Communications (2016), PMID 27378374

Dataset

GSE77703

Distinct and shared functions of ALS-associated TDP-43, FUS, and TAF15 revealed by comprehensive multi-system integrative analyses [RNA-Seq_mouse]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Sequencing reads from RNA-seq libraries were first trimmed of polyA tails, adapters, and low quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.

    cutadapt v4.1 (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install cutadapt (example using conda)
    # conda install -c bioconda cutadapt
    
    # Define input and output files (placeholders)
    INPUT_READS="input.fastq.gz"
    OUTPUT_READS="output.trimmed.fastq.gz"
    
    # Run cutadapt to trim polyA tails, adapters, and low quality ends
    cutadapt \
      --match-read-wildcards \
      --times 2 \
      -e 0 \
      -O 5 \
      --quality-cutoff 6 \
      -m 18 \
      -b TCGTATGCCGTCTTCTGCTTG \
      -b ATCTCGTATGCCGTCTTCTGCTTG \
      -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
      -b TGGAATTCTCGGGTGCCAAGG \
      -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
      -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
      -o "${OUTPUT_READS}" \
      "${INPUT_READS}"
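
One way to confirm that the `-m 18` minimum-length filter took effect is to count reads shorter than 18 nt in the trimmed file. The helper below is a minimal sketch, not part of cutadapt: the name `count_short_reads` is hypothetical, and it assumes a standard four-line-per-record FASTQ, gzipped or plain.

```shell
# Count reads shorter than a minimum length in a 4-line-per-record FASTQ.
# count_short_reads is a hypothetical helper, not a cutadapt feature.
# $1: FASTQ file (plain or .gz); $2: minimum length (defaults to 18).
count_short_reads() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;   # decompress gzipped input to stdout
    *)    cat "$1" ;;        # plain-text input
  esac | awk -v min="${2:-18}" '
    NR % 4 == 2 && length($0) < min { n++ }   # line 2 of each record is the sequence
    END { print n + 0 }                        # print 0 when nothing matched
  '
}
```

Calling `count_short_reads output.trimmed.fastq.gz 18` on the trimmed placeholder file should print 0 if the length filter behaved as expected.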
    
  2.

    Reads were then mapped against a database of repetitive elements derived from RepBase18.05.

    STAR (Inferred with models/gemini-2.5-flash) v2.7.3a (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install STAR (example using conda)
    # conda install -c bioconda star
    
    # Define variables
    # Replace with your actual input FASTQ file
    INPUT_FASTQ="input_reads.fastq.gz"
    # Replace with the path to your STAR index for RepBase18.05
    # This index needs to be pre-built using STAR genomeGenerate command with RepBase18.05 sequences.
    REPBASE_STAR_INDEX_DIR="/path/to/repbase_18_05_star_index"
    # Prefix for output files
    OUTPUT_PREFIX="sample_repbase_aligned"
    # Number of threads to use
    NUM_THREADS=8
    
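    # Align reads to the RepBase18.05 repetitive elements database.
    # Note: the source text reports Bowtie 1.0.0 for this alignment (see step 3);
    # this STAR command and all of its filter values are an inferred alternative.
    # --readFilesCommand zcat is needed because the placeholder input is gzipped.
    STAR \
      --runThreadN "${NUM_THREADS}" \
      --genomeDir "${REPBASE_STAR_INDEX_DIR}" \
      --readFilesIn "${INPUT_FASTQ}" \
      --readFilesCommand zcat \
      --outFileNamePrefix "${OUTPUT_PREFIX}_" \
      --outFilterMultimapNmax 100 \
      --outFilterMismatchNmax 10 \
      --outFilterMismatchNoverLmax 0.05 \
      --outFilterScoreMin 10 \
      --outFilterScoreMinOverLread 0.3 \
      --outFilterMatchNminOverLread 0.3 \
      --outSAMattributes All \
      --outSAMtype BAM Unsorted \
      --outSAMunmapped Within \
      --outReadsUnmapped Fastx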
    
  3.

    Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).

    Bowtie v1.0.0
    $ Bash example
    # Install Bowtie (if not already installed)
    # conda install -c bioconda bowtie
    
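    # Align reads against the Repbase index with the parameters reported in the source.
    # repbase_index, reads.fastq, and output.sam are placeholders; the index must be
    # built beforehand with bowtie-build from the Repbase sequences.
    # -S: SAM output; -q: FASTQ input; -p 16: 16 threads;
    # -e 100: maximum sum of mismatch qualities; -l 20: seed length.
    bowtie -S -q -p 16 -e 100 -l 20 repbase_index reads.fastq > output.sam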
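
The next step needs only the reads that did not align to Repbase. The source does not say how they were extracted, so the following is just one possible sketch: it assumes single-end reads and that the SAM output retains unaligned reads (flag bit 0x4). The helper name `sam_unmapped_to_fastq` is hypothetical.

```shell
# Convert unmapped SAM records (flag bit 0x4 set) back to FASTQ.
# sam_unmapped_to_fastq is a hypothetical helper; it assumes single-end reads.
# $1: SAM file. FASTQ records are written to stdout.
sam_unmapped_to_fastq() {
  awk -F '\t' '
    /^@/ { next }              # skip SAM header lines
    int($2 / 4) % 2 == 1 {     # flag bit 0x4 set: read is unmapped
      printf "@%s\n%s\n+\n%s\n", $1, $10, $11
    }
  ' "$1"
}
```

For example, `sam_unmapped_to_fastq output.sam > repbase_unmapped.fastq` would produce the input for the genome alignment; the output file name is a placeholder.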
  4.

    Reads not mapped to Repbase sequences were aligned to the mm9 mouse genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1.

    $ Bash example
    # Install STAR (example using conda)
    # conda install -c bioconda star=2.3.0e
    
    # Placeholder for STAR genome index directory (mm9 mouse genome, UCSC assembly)
    # Note: The raw source text says "mm9 human genome"; mm9 is a mouse assembly, and the dataset lists Genome_build: mm9.
    GENOME_DIR="/path/to/mm9_star_index"
    
    # Placeholder for input FASTQ file (reads not mapped to Repbase sequences)
    INPUT_FASTQ="input_reads_filtered_from_repbase.fastq.gz"
    
    # Output directory for STAR alignment results
    OUTPUT_DIR="star_alignment_output"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Align reads using STAR with the parameters reported in the source
    # --outSAMunmapped Within: Write unmapped reads into the main SAM output alongside the aligned reads.
    # --outFilterMultimapNmax 1: Only output alignments for reads that map to at most 1 locus (unique mappers).
    # --outFilterMultimapScoreRange 1: Alignments scoring within 1 point of the best score are counted as multimapping loci.
    # --readFilesCommand zcat: Not in the source; needed here only because the placeholder input is gzipped.
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${INPUT_FASTQ}" \
         --readFilesCommand zcat \
         --outFileNamePrefix "${OUTPUT_DIR}/" \
         --outSAMunmapped Within \
         --outFilterMultimapNmax 1 \
         --outFilterMultimapScoreRange 1 \
         --runThreadN 8 # Adjust number of threads as needed
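
STAR writes a summary file, `Log.final.out`, next to its alignments, and the uniquely mapped percentage reported there is a quick check on the effect of `--outFilterMultimapNmax 1`. The helper below is a minimal sketch: `star_unique_pct` is a hypothetical name, and the parsing assumes the usual `label | value` layout of that file, which should be checked against the output of the STAR version actually used.

```shell
# Print the "Uniquely mapped reads %" value from a STAR Log.final.out file.
# star_unique_pct is a hypothetical helper; it assumes "label |<tab>value" lines.
# $1: path to Log.final.out. The number is printed without the percent sign.
star_unique_pct() {
  awk -F '|' '
    /Uniquely mapped reads %/ {
      gsub(/[ \t%]/, "", $2)   # strip whitespace and the percent sign
      print $2
    }
  ' "$1"
}
```

With the placeholders above, the call would be `star_unique_pct star_alignment_output/Log.final.out`.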
  5.

    Counts of reads for each gene annotated in GENCODE vM1 were calculated with featureCounts.

    featureCounts (version not stated in the source)
    $ Bash example
    # Install featureCounts (part of the Subread package) if not already installed
    # For example, using conda:
    # conda install -c bioconda subread
    
    # Define input and output files
    INPUT_BAM="aligned_reads.bam" # Placeholder for your input BAM file
    OUTPUT_COUNTS="gene_counts.txt"
    
    # Define the Gencode vM1 annotation GTF file
    # Gencode vM1 is an older mouse annotation release. 
    # You might need to download it from the Gencode archives if not available locally.
    # Example download (adjust URL for specific M1 release if needed):
    # wget -O gencode.vM1.annotation.gtf.gz "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M1/gencode.vM1.annotation.gtf.gz"
    # gunzip gencode.vM1.annotation.gtf.gz
    GENCODE_GTF="gencode.vM1.annotation.gtf" # Placeholder for the path to your Gencode vM1 GTF file
    
    # Run featureCounts to calculate read counts for each gene
    # -a: Annotation file (GTF)
    # -o: Output file for counts
    # -F GTF: Specify GTF format for annotation
    # -t exon: Count features of type 'exon' (standard for gene-level counts)
    # -g gene_id: Aggregate counts by 'gene_id' attribute in the GTF
    # -T 8: Use 8 threads (adjust as needed for your system)
    # -s 0: Unstranded (0), 1 for forward, 2 for reverse. Adjust if your library is stranded.
    # -p: Not used below; add this flag to the command if your reads are paired-end.
    featureCounts -a "${GENCODE_GTF}" -o "${OUTPUT_COUNTS}" -F GTF -t exon -g gene_id -T 8 -s 0 "${INPUT_BAM}"
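
The supplementary files are described as count files with per-sample read counts, whereas featureCounts output also carries a comment line and annotation columns. The sketch below trims the output down to gene ID plus count columns. `counts_only` is a hypothetical helper, and it assumes the standard featureCounts layout: a leading `#` line, then Geneid, Chr, Start, End, Strand, Length, followed by one count column per BAM file.

```shell
# Reduce featureCounts output to gene ID plus per-sample count columns.
# counts_only is a hypothetical helper; it assumes the standard column layout.
# $1: featureCounts output file. The reduced table is written to stdout.
counts_only() {
  awk -F '\t' '
    /^#/ { next }                                   # skip the program/command comment line
    {
      printf "%s", $1                               # Geneid
      for (i = 7; i <= NF; i++) printf "\t%s", $i   # count columns start at field 7
      printf "\n"
    }
  ' "$1"
}
```

For example, `counts_only gene_counts.txt > gene_counts.simple.txt`, where the output name is a placeholder.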

Raw Source Text
Sequencing reads from RNA-seq libraries were first trimmed of polyA tails, adapters, and low quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff' 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.
Reads were then mapped against a database of repetitive elements derived from RepBase18.05. Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).
Reads not mapped to Repbase sequences were aligned to the mm9 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within –outFilterMultimapNmax 1 –outFilterMultimapScoreRange 1.
counts of reads for each gene annotated in gencode vM1 were calculated from featureCounts
Genome_build: mm9
Supplementary_files_format_and_content: count file, contains counts of reads for each sample