GSE77703 Processing Pipeline
Publication
Distinct and shared functions of ALS-associated proteins TDP-43, FUS and TAF15 revealed by multisystem analyses. Nature Communications (2016), PMID 27378374
Dataset
GSE77703: Distinct and shared functions of ALS-associated TDP-43, FUS, and TAF15 revealed by comprehensive multi-system integrative analyses [RNA-Seq_mouse]
Processing Steps
1
Sequencing reads from RNA-seq libraries were first trimmed of polyA tails, adapters, and low-quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.
$ Bash example
# Install cutadapt (example using conda)
# conda install -c bioconda cutadapt

# Define input and output files (placeholders)
INPUT_READS="input.fastq.gz"
OUTPUT_READS="output.trimmed.fastq.gz"

# Run cutadapt to trim polyA tails, adapters, and low-quality ends
cutadapt \
    --match-read-wildcards \
    --times 2 \
    -e 0 \
    -O 5 \
    --quality-cutoff 6 \
    -m 18 \
    -b TCGTATGCCGTCTTCTGCTTG \
    -b ATCTCGTATGCCGTCTTCTGCTTG \
    -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
    -b TGGAATTCTCGGGTGCCAAGG \
    -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
    -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
    -o "${OUTPUT_READS}" \
    "${INPUT_READS}"
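The `-m 18` setting above means reads shorter than 18 nt after trimming are discarded. As a minimal sketch of how that could be checked on a trimmed file, the snippet below counts reads under a length threshold with awk. The helper name `count_short_reads` and the demo reads are made up for illustration and are not part of the published pipeline; the helper also assumes an uncompressed FASTQ with strict 4-line records.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the published pipeline): count reads
# shorter than a minimum length in an uncompressed FASTQ file.
# Assumes strict 4-line records, so the sequence is line 2 of every 4.
count_short_reads() {
    local fastq="$1" min_len="$2"
    awk -v min="$min_len" \
        'NR % 4 == 2 && length($0) < min { n++ } END { print n + 0 }' "$fastq"
}

# Made-up demonstration data: one 20-nt read and one 10-nt read.
demo_fastq="$(mktemp)"
trap 'rm -f "$demo_fastq"' EXIT
printf '%s\n' \
    '@read1' 'ACGTACGTACGTACGTACGT' '+' 'IIIIIIIIIIIIIIIIIIII' \
    '@read2' 'ACGTACGTAC' '+' 'IIIIIIIIII' > "$demo_fastq"

short_count="$(count_short_reads "$demo_fastq" 18)"
echo "reads shorter than 18 nt: ${short_count}"
```

On the demo data the helper reports one short read. Run against real cutadapt output produced with `-m 18` (decompressed first, for example with `gunzip -c`), the count would be expected to be zero.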
2
Reads were then mapped against a database of repetitive elements derived from RepBase18.05.
$ Bash example
# Install STAR (example using conda)
# conda install -c bioconda star

# Note: the source text names Bowtie 1.0.0 for the Repbase mapping.
# This STAR command and its filter values are an illustrative alternative,
# not the published settings.

# Define variables
# Replace with your actual input FASTQ file
INPUT_FASTQ="input_reads.fastq.gz"
# Replace with the path to your STAR index for RepBase18.05
# This index needs to be pre-built using STAR --runMode genomeGenerate with RepBase18.05 sequences.
REPBASE_STAR_INDEX_DIR="/path/to/repbase_18_05_star_index"
# Prefix for output files
OUTPUT_PREFIX="sample_repbase_aligned"
# Number of threads to use
NUM_THREADS=8

# Align reads to the RepBase18.05 repetitive elements database
# --readFilesCommand is needed because the input placeholder is gzipped
STAR \
    --runThreadN "${NUM_THREADS}" \
    --genomeDir "${REPBASE_STAR_INDEX_DIR}" \
    --readFilesIn "${INPUT_FASTQ}" \
    --readFilesCommand gunzip -c \
    --outFileNamePrefix "${OUTPUT_PREFIX}" \
    --outFilterMultimapNmax 100 \
    --outFilterMismatchNmax 10 \
    --outFilterMismatchNoverLmax 0.05 \
    --outFilterScoreMin 10 \
    --outFilterScoreMinOverLread 0.3 \
    --outFilterMatchNminOverLread 0.3 \
    --outSAMattributes All \
    --outSAMtype BAM Unsorted \
    --outSAMunmapped Within \
    --outReadsUnmapped Fastx
3
Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).
$ Bash example
# Install Bowtie (if not already installed)
# conda install -c bioconda bowtie

# Align reads using Bowtie
bowtie -S -q -p 16 -e 100 -l 20 repbase_index reads.fastq > output.sam
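Only reads that fail to map to Repbase go on to the genome alignment, so the unmapped records have to be pulled out of the Bowtie SAM output. The source text does not say how the authors did this. One possible approach, sketched below, converts SAM records with FLAG bit 0x4 set back into FASTQ using awk. The helper name `sam_unmapped_to_fastq` and the demo records are hypothetical, and single-end reads are assumed. Bowtie 1 also has an `--un` option that writes unaligned reads to a file directly.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (an assumption, not the authors' stated method):
# print unmapped SAM records (FLAG bit 0x4 set) as FASTQ. Single-end only.
sam_unmapped_to_fastq() {
    awk -F'\t' '
        /^@/ { next }                 # skip SAM header lines
        int($2 / 4) % 2 == 1 {        # bit 0x4 set: read is unmapped
            printf "@%s\n%s\n+\n%s\n", $1, $10, $11
        }' "$1"
}

# Made-up demonstration SAM: r1 maps to a repeat, r2 is unmapped.
demo_sam="$(mktemp)"
trap 'rm -f "$demo_sam"' EXIT
{
    printf '@HD\tVN:1.0\tSO:unsorted\n'
    printf '@SQ\tSN:repeatA\tLN:100\n'
    printf 'r1\t0\trepeatA\t5\t255\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n'
    printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGGCC\tIIIIIIIIII\n'
} > "$demo_sam"

unmapped_fastq="$(sam_unmapped_to_fastq "$demo_sam")"
printf '%s\n' "$unmapped_fastq"
```

In a real run the output would be redirected to a FASTQ file and passed to the genome aligner in the next step.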
4
Reads not mapped to Repbase sequences were aligned to the mm9 mouse genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1.
$ Bash example
# Install STAR (example using conda; this exact version may not be packaged, so check availability)
# conda install -c bioconda star=2.3.0e

# Placeholder for STAR genome index directory (mm9 mouse genome, UCSC assembly)
# Note: mm9 is a mouse assembly, matching the Genome_build field of this dataset.
GENOME_DIR="/path/to/mm9_star_index"

# Placeholder for input FASTQ file (reads not mapped to Repbase sequences)
INPUT_FASTQ="input_reads_filtered_from_repbase.fastq.gz"

# Output directory for STAR alignment results
OUTPUT_DIR="star_alignment_output"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Align reads using STAR with the parameters given in the source text
# --outSAMunmapped Within: write unmapped reads into the main SAM output alongside the aligned reads.
# --outFilterMultimapNmax 1: keep only uniquely mapping reads; reads that map to more than 1 locus are treated as unmapped.
# --outFilterMultimapScoreRange 1: alignments scoring within 1 of the best score count as multimapping alignments.
# --readFilesCommand (for the gzipped placeholder input) and --runThreadN are additions for this example, not from the source text.
STAR --genomeDir "${GENOME_DIR}" \
    --readFilesIn "${INPUT_FASTQ}" \
    --readFilesCommand gunzip -c \
    --outFileNamePrefix "${OUTPUT_DIR}/" \
    --outSAMunmapped Within \
    --outFilterMultimapNmax 1 \
    --outFilterMultimapScoreRange 1 \
    --runThreadN 8  # Adjust number of threads as needed
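Because `--outSAMunmapped Within` keeps unmapped reads in the same SAM file as the aligned ones, a rough mapping summary can be computed from that single file. The sketch below tallies records by FLAG bit 0x4. The helper name `mapping_summary` and the demo records are hypothetical. It assumes single-end reads with at most one alignment record per read, which is what `--outFilterMultimapNmax 1` should produce.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count mapped and unmapped records in a SAM file.
# Assumes single-end reads and one record per read (unique mappers only).
mapping_summary() {
    awk -F'\t' '
        /^@/ { next }                               # skip SAM header lines
        int($2 / 4) % 2 == 1 { unmapped++; next }   # bit 0x4: unmapped
        { mapped++ }
        END { printf "mapped=%d unmapped=%d\n", mapped + 0, unmapped + 0 }
    ' "$1"
}

# Made-up demonstration SAM: two mapped reads and one unmapped read.
demo_sam="$(mktemp)"
trap 'rm -f "$demo_sam"' EXIT
{
    printf '@SQ\tSN:chr1\tLN:1000\n'
    printf 'r1\t0\tchr1\t10\t255\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n'
    printf 'r2\t16\tchr1\t50\t255\t10M\t*\t0\t0\tGGGGCCCCAA\tIIIIIIIIII\n'
    printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGGCC\tIIIIIIIIII\n'
} > "$demo_sam"

summary="$(mapping_summary "$demo_sam")"
echo "$summary"
```

STAR also writes its own statistics to a log file, so this is only a cross-check and not a replacement for the aligner's report.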
5
Counts of reads for each gene annotated in GENCODE vM1 were calculated using featureCounts.
$ Bash example
# Install featureCounts (part of the Subread package) if not already installed
# For example, using conda:
# conda install -c bioconda subread

# Define input and output files
INPUT_BAM="aligned_reads.bam"  # Placeholder for your input BAM file
OUTPUT_COUNTS="gene_counts.txt"

# Define the Gencode vM1 annotation GTF file
# Gencode vM1 is an older mouse annotation release.
# You might need to download it from the Gencode archives if not available locally.
# Example download (adjust URL for specific M1 release if needed):
# wget -O gencode.vM1.annotation.gtf.gz "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M1/gencode.vM1.annotation.gtf.gz"
# gunzip gencode.vM1.annotation.gtf.gz
GENCODE_GTF="gencode.vM1.annotation.gtf"  # Placeholder for the path to your Gencode vM1 GTF file

# Run featureCounts to calculate read counts for each gene
# -a: Annotation file (GTF)
# -o: Output file for counts
# -F GTF: Specify GTF format for annotation
# -t exon: Count features of type 'exon' (standard for gene-level counts)
# -g gene_id: Aggregate counts by 'gene_id' attribute in the GTF
# -T 8: Use 8 threads (adjust as needed for your system)
# -s 0: Unstranded (0), 1 for forward, 2 for reverse. Adjust if your library is stranded.
# -p: Add this flag if your reads are paired-end
featureCounts -a "${GENCODE_GTF}" -o "${OUTPUT_COUNTS}" -F GTF -t exon -g gene_id -T 8 -s 0 "${INPUT_BAM}"
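featureCounts output starts with a comment line and a header row, followed by one row per gene with annotation columns (Chr, Start, End, Strand, Length) before the counts. To get a plain count file like the one described for this dataset, those extra lines and columns can be stripped. The sketch below does this with awk for a single counted BAM, where the counts sit in column 7. The helper name `extract_counts`, the gene names, and the count values are all made up for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reduce featureCounts output to "gene_id<TAB>count".
# Assumes a single BAM was counted, so the counts are in column 7.
extract_counts() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }               # skip the leading comment line
        $1 == "Geneid" { next }     # skip the column header row
        { print $1, $7 }
    ' "$1"
}

# Made-up demonstration table in the featureCounts column layout.
demo_counts="$(mktemp)"
trap 'rm -f "$demo_counts"' EXIT
{
    printf '# Program:featureCounts (mock comment line for this demo)\n'
    printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\taligned_reads.bam\n'
    printf 'geneA\tchr1\t100\t500\t+\t401\t12\n'
    printf 'geneB\tchr2\t900\t1200\t-\t301\t0\n'
} > "$demo_counts"

counts="$(extract_counts "$demo_counts")"
printf '%s\n' "$counts"
```

With several BAM files counted in one featureCounts run, the counts occupy columns 7 onward, so the `print` statement would need to be extended accordingly.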
Tools Used
Raw Source Text
Sequencing reads from RNA-seq libraries were first trimmed of polyA tails, adapters, and low quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff' 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT. Reads were then mapped against a database of repetitive elements derived from RepBase18.05. Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009). Reads not mapped to Repbase sequences were aligned to the mm9 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1. counts of reads for each gene annotated in gencode vM1 were calculated from featureCounts
Genome_build: mm9
Supplementary_files_format_and_content: count file, contains counts of reads for each sample