GSE77704 Processing Pipeline
Publication
Distinct and shared functions of ALS-associated proteins TDP-43, FUS and TAF15 revealed by multisystem analyses. Nature Communications (2016). PMID 27378374
Dataset
GSE77704: Distinct and shared functions of ALS-associated TDP-43, FUS, and TAF15 revealed by comprehensive multi-system integrative analyses [RNA-Seq_Stability]
Processing Steps
1
RNA-seq libraries were first trimmed of polyA tails, adapters, and low-quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.
Bash example
```bash
# Install cutadapt (example using conda)
# conda install -c bioconda cutadapt

# Placeholder file names (single-end reads)
INPUT_FASTQ="input.fastq.gz"
OUTPUT_FASTQ="trimmed_output.fastq.gz"

# --match-read-wildcards : interpret IUPAC wildcards (e.g. N) in the reads
# --times 2              : remove up to two adapter occurrences per read
# -e 0 / -O 5            : no mismatches allowed; at least 5 bases of overlap
# --quality-cutoff 6     : trim low-quality 3' ends using cutoff 6
# -m 18                  : discard reads shorter than 18 nt after trimming
# -b SEQ                 : adapter that may occur anywhere in the read;
#                          the poly(A)/poly(T) entries remove polyA tails
cutadapt \
    --match-read-wildcards \
    --times 2 \
    -e 0 \
    -O 5 \
    --quality-cutoff 6 \
    -m 18 \
    -b TCGTATGCCGTCTTCTGCTTG \
    -b ATCTCGTATGCCGTCTTCTGCTTG \
    -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
    -b TGGAATTCTCGGGTGCCAAGG \
    -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
    -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
    -o "${OUTPUT_FASTQ}" \
    "${INPUT_FASTQ}"

# Paired-end data: in paired mode -b adapters are applied to read 1 only.
# Repeat each adapter with -B for read 2 and add -p for the second output:
# cutadapt <options above> -B <adapter> ... \
#     -o "${OUTPUT_FASTQ_R1}" -p "${OUTPUT_FASTQ_R2}" \
#     "${INPUT_FASTQ_R1}" "${INPUT_FASTQ_R2}"
```
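Because the trimming step ends with -m 18, every surviving read should be at least 18 nt long. The function below is a small sanity check under that assumption: it counts reads shorter than a threshold in a plain or gzipped FASTQ file. The function name count_short_reads is hypothetical, and it assumes standard four-line FASTQ records.

```bash
#!/usr/bin/env bash
# Count reads shorter than a minimum length in a FASTQ file (plain or .gz).
# After cutadapt with -m 18 the expected result is 0.
# Assumes standard 4-line FASTQ records (line 2 of each record = sequence).
count_short_reads() {
    local fastq="$1"
    local min_len="${2:-18}"   # default mirrors cutadapt -m 18
    local reader="cat"
    # Decompress on the fly when the file name ends in .gz
    case "$fastq" in
        *.gz) reader="gzip -dc" ;;
    esac
    # $reader is intentionally unquoted so "gzip -dc" splits into command + flag
    $reader "$fastq" | awk -v min="$min_len" \
        'NR % 4 == 2 && length($0) < min { n++ } END { print n + 0 }'
}
```

Usage: `count_short_reads trimmed_output.fastq.gz` should print 0 for the output of the command above; a non-zero value would suggest the length filter was not applied.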
2
Reads were then mapped against a database of repetitive elements derived from RepBase18.05.
Bowtie v1.0.0
Bash example
```bash
# Install Bowtie 1 (the source specifies Bowtie 1.0.0, not bowtie2)
# conda install -c bioconda bowtie=1.0.0   # pin only if that build is available

# --- Prepare the RepBase18.05 index ---
# RepBase18.05 must be obtained separately (Repbase is distributed by GIRI,
# www.girinst.org, and access typically requires registration or a license).
# 'RepBase18.05.fasta' is a placeholder for the repeat sequences in FASTA format.

# Build a Bowtie 1 index; 'RepBase18.05_index' is the prefix of the generated
# *.ebwt files and is what bowtie takes as its index argument.
bowtie-build RepBase18.05.fasta RepBase18.05_index

# Reads are then aligned to this index with the Bowtie parameters in step 3.
```
3
Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).
Bowtie v1.0.0
Bash example
```bash
# Install Bowtie 1 (example using conda)
# conda install -c bioconda bowtie=1.0.0

# Placeholders: replace 'path/to/repbase_index' with your Bowtie index prefix
# (e.g. 'repbase_index' if the files are repbase_index.1.ebwt, etc.), and the
# input FASTQ / output SAM paths with your own.
# -S     : write SAM output
# -q     : input reads are FASTQ
# -p 16  : use 16 threads
# -e 100 : maximum sum of quality values at mismatched positions (default 70)
# -l 20  : seed length (default 28)
bowtie -S -q -p 16 -e 100 -l 20 path/to/repbase_index path/to/input_reads.fastq > path/to/output_aligned_reads.sam
```
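Step 4 aligns only the reads that did not map to Repbase, but the command above writes all reads to one SAM file. One way to recover the unmapped reads, sketched here under the assumption of single-end SAM output, is to keep records whose FLAG has bit 0x4 set and write them back as FASTQ. The function name sam_unmapped_to_fastq is hypothetical. Bowtie 1 also has a built-in --un <filename> option that writes unaligned reads directly during alignment, which avoids this post-processing.

```bash
#!/usr/bin/env bash
# Convert unmapped records (SAM FLAG bit 0x4) from a single-end SAM file back
# into FASTQ, e.g. to feed the reads that missed Repbase into genome alignment.
# Unmapped reads keep their original orientation, so SEQ/QUAL are used as-is.
sam_unmapped_to_fastq() {
    local sam="$1"
    awk -F '\t' '
        /^@/ { next }                      # skip header lines (@HD, @SQ, @PG)
        int($2 / 4) % 2 == 1 {             # FLAG bit 0x4 set: read unmapped
            print "@" $1                   # QNAME
            print $10                      # SEQ
            print "+"
            print $11                      # QUAL
        }' "$sam"
}
```

Usage: `sam_unmapped_to_fastq repbase_mapped.sam > filtered_repbase_reads.fastq`. The bit test uses integer arithmetic rather than gawk's and() so it also runs under mawk or BSD awk.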
4
Reads not mapped to Repbase sequences were aligned to the hg19 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1.
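STAR v2.3.0e
Bash example
```bash
# Install STAR (example using conda; pin 2.3.0e only if that build is available)
# conda install -c bioconda star=2.3.0e

# Genome index generation (placeholder paths; run once). When a GTF is given,
# set --sjdbOverhang explicitly (ideally read length - 1).
# STAR --runMode genomeGenerate --genomeDir hg19_star_index \
#     --genomeFastaFiles hg19.fa --sjdbGTFfile hg19.gtf --sjdbOverhang 99 --runThreadN 8

# Align the reads that did not map to Repbase (placeholder file name).
# --outFilterMultimapNmax 1 keeps only uniquely mapping reads;
# --outSAMunmapped Within keeps unmapped reads in the SAM output.
# STAR 2.3.0e writes SAM only (--outSAMtype BAM output came in later releases),
# so the result is star_alignment_output_Aligned.out.sam.
STAR \
    --genomeDir hg19_star_index \
    --readFilesIn filtered_repbase_reads.fastq \
    --outFileNamePrefix star_alignment_output_ \
    --outSAMunmapped Within \
    --outFilterMultimapNmax 1 \
    --outFilterMultimapScoreRange 1 \
    --runThreadN 8

# Convert to a coordinate-sorted, indexed BAM for downstream counting
samtools sort -o star_alignment_output_sorted.bam star_alignment_output_Aligned.out.sam
samtools index star_alignment_output_sorted.bam
```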
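With --outFilterMultimapNmax 1 every mapped read should be unique, and --outSAMunmapped Within keeps the unmapped reads in the same file. The sketch below summarises a single-end STAR SAM file to check both points. It assumes the NH attribute is present in the output (it belongs to STAR's standard attribute set) and counts SAM records, so paired-end data would be counted per mate. The function name summarise_star_sam is hypothetical.

```bash
#!/usr/bin/env bash
# Summarise a single-end STAR SAM file written with --outSAMunmapped Within:
# number of mapped and unmapped records, and how many mapped records carry
# NH:i:1 (uniquely mapped). With --outFilterMultimapNmax 1 the "mapped" and
# "unique" counts are expected to match.
summarise_star_sam() {
    local sam="$1"
    awk -F '\t' '
        /^@/ { next }                             # skip header lines
        int($2 / 4) % 2 == 1 { unmapped++; next } # FLAG bit 0x4: unmapped
        {
            mapped++
            # optional SAM tags start at column 12
            for (i = 12; i <= NF; i++) if ($i == "NH:i:1") { unique++; break }
        }
        END { printf "mapped=%d unmapped=%d unique=%d\n", mapped, unmapped, unique }
    ' "$sam"
}
```

Usage: `summarise_star_sam star_alignment_output_Aligned.out.sam`. STAR's own Log.final.out reports similar figures; this check works directly on the alignment file.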
5
Read counts for each gene annotated in GENCODE v17 were calculated with featureCounts.
featureCounts v2.0.3
Bash example
```bash
# Install Subread (which includes featureCounts)
# conda install -c bioconda subread

# Placeholder for the GENCODE v17 GTF (download it from the GENCODE archive)
ANNOTATION_GTF="/path/to/gencode.v17.annotation.gtf"
# Placeholder for the BAM files from the alignment step
INPUT_BAMS="aligned_reads/*.bam"
OUTPUT_COUNTS="gene_counts.txt"

# -a: annotation file; -o: output file
# -F GTF: annotation format (the default, stated explicitly)
# -t exon / -g gene_id: count reads on exons and sum them per gene_id
# -s 0: unstranded counting. Library strandedness is not stated in the source,
#       so this is an assumption (1 = stranded, 2 = reversely stranded).
# -T 8: number of threads
# ${INPUT_BAMS} is left unquoted on purpose so the shell expands the glob.
featureCounts \
    -a "${ANNOTATION_GTF}" \
    -o "${OUTPUT_COUNTS}" \
    -F GTF \
    -t exon \
    -g gene_id \
    -s 0 \
    -T 8 \
    ${INPUT_BAMS}
```
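The featureCounts output starts with a "#" comment line, followed by a header of Geneid, Chr, Start, End, Strand, Length and then one count column per BAM file. The sketch below reduces it to a gene-by-sample count matrix and can optionally strip the version suffix from GENCODE gene IDs (e.g. ENSG00000223972.4 becomes ENSG00000223972), which helps when joining with tables keyed on unversioned Ensembl IDs. The function name featurecounts_to_matrix and its yes/no flag are hypothetical.

```bash
#!/usr/bin/env bash
# Turn featureCounts output into a plain count matrix:
#   - drop "#" comment lines
#   - keep Geneid (column 1) and the per-sample counts (column 7 onward)
#   - optionally strip a trailing ".<digits>" version from the gene ID
featurecounts_to_matrix() {
    local counts="$1"
    local strip_version="${2:-no}"   # pass "yes" to remove ID versions
    grep -v '^#' "$counts" | cut -f 1,7- |
    if [ "$strip_version" = "yes" ]; then
        # Only the first column is edited; other fields are left untouched
        awk 'BEGIN { FS = OFS = "\t" } { sub(/\.[0-9]+$/, "", $1); print }'
    else
        cat
    fi
}
```

Usage: `featurecounts_to_matrix gene_counts.txt yes > count_matrix.tsv`. The per-sample summary of assigned and unassigned reads is written separately by featureCounts to gene_counts.txt.summary.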
Raw Source Text
RNA-seq libraries were first trimmed of polyA tails, adapters, and low quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff' 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT. Reads were then mapped against a database of repetitive elements derived from RepBase18.05. Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009). Reads not mapped to Repbase sequences were aligned to the hg19 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1. counts of reads for each gene annotated in gencode v17 were calculated from featureCounts
Genome_build: hg19
Supplementary_files_format_and_content: count file, contains counts of reads for each sample