GSE157467 Processing Pipeline
RNA-Seq
code_examples
6 steps
Publication
Persistent mRNA localization defects and cell death in ALS neurons caused by transient cellular stress.Cell reports (2021) — PMID 34496257
Dataset
GSE157467Persistent mRNA localization defects and cell death in ALS neurons caused by transient cellular stress
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
Generate Jupyter Notebook-
1
Fastq files were trimmed for adapters using cutadapt
$ Bash example
# Install cutadapt (example using conda) # conda install -c bioconda cutadapt # Define input and output file names INPUT_R1="input_R1.fastq.gz" INPUT_R2="input_R2.fastq.gz" OUTPUT_R1="output_R1_trimmed.fastq.gz" OUTPUT_R2="output_R2_trimmed.fastq.gz" REPORT="cutadapt_report.txt" # Define adapter sequences (replace with actual sequences if known) # Example: Illumina universal adapter for Read 1 (3' end) ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" # Example: Illumina universal adapter for Read 2 (3' end) ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" # Or reverse complement of R1 adapter, depending on library prep # Execute cutadapt for paired-end trimming # -a ADAPTER_R1: 3' adapter for Read 1 # -A ADAPTER_R2: 3' adapter for Read 2 # -q 20,20: Trim low-quality bases from 5' and 3' ends (quality threshold 20) # --minimum-length 20: Discard reads shorter than 20 bp after trimming # -o: Output file for Read 1 # -p: Output file for Read 2 # --cores: Number of CPU cores to use (if available in version) # --report=file: Write a detailed report to a file cutadapt -a "${ADAPTER_R1}" -A "${ADAPTER_R2}" \ -q 20,20 --minimum-length 20 \ -o "${OUTPUT_R1}" -p "${OUTPUT_R2}" \ "${INPUT_R1}" "${INPUT_R2}" > "${REPORT}" 2>&1 -
2
Trimmed reads were aligned to the human genome build hg19 using STAR aligner
$ Bash example
# Install STAR (if not already installed) # conda install -c bioconda star # Define variables GENOME_DIR="/path/to/star_index/hg19" # Path to the pre-built STAR genome index for hg19 READS_R1="trimmed_reads_R1.fastq.gz" # Placeholder for trimmed forward reads READS_R2="trimmed_reads_R2.fastq.gz" # Placeholder for trimmed reverse reads (if paired-end) OUTPUT_PREFIX="star_aligned" # Prefix for output files THREADS=8 # Number of threads to use # --- Genome Index Generation (if not already done) --- # To build the hg19 STAR index, you would typically need: # 1. The hg19 primary assembly FASTA file (e.g., from UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz) # 2. The hg19 GTF annotation file (e.g., from GENCODE or UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.ncbiRefSeq.gtf.gz) # # Example command to build index (uncomment and modify if needed): # mkdir -p ${GENOME_DIR} # STAR --runMode genomeGenerate \ # --genomeDir ${GENOME_DIR} \ # --genomeFastaFiles /path/to/hg19.fa \ # --sjdbGTFfile /path/to/hg19.gtf \ # --sjdbOverhang 100 \ # --runThreadN ${THREADS} # Perform STAR alignment # The description implies trimmed reads, assuming paired-end for typical RNA-seq. STAR --genomeDir ${GENOME_DIR} \ --readFilesIn ${READS_R1} ${READS_R2} \ --runThreadN ${THREADS} \ --outFileNamePrefix ${OUTPUT_PREFIX}_ \ --outSAMtype BAM SortedByCoordinate \ --outSAMunmapped Within \ --outSAMattributes Standard \ --quantMode GeneCounts # Optional: output gene counts with RSEM-like quantification -
3
seCLIP : Binding region peaks were identified based on read pileup densities (Clipper)
$ Bash example
# Clone the CLIPper repository # git clone https://github.com/yeolab/clipper.git # cd clipper # CLIPper is a Python script, ensure Python 3 is available. # Common Python dependencies include numpy, scipy, pysam. # pip install numpy scipy pysam # Placeholder for input BAM files and chromosome sizes IP_BAM="path/to/your/ip_sample.bam" CONTROL_BAM="path/to/your/control_sample.bam" CHROM_SIZES="path/to/your/genome/hg38.chrom.sizes" # Example: from UCSC goldenPath (e.g., http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes) OUTPUT_PREFIX="sample_binding_regions" # Execute CLIPper to identify binding region peaks # -b: IP BAM file # -c: Control BAM file # -s: Chromosome sizes file # -o: Output prefix (CLIPper will append .bed) # -p: P-value threshold (default is 0.01) # -f: FDR threshold (default is 0.05) # -v: Verbose output # -t: Number of threads python clipper.py \ -b "${IP_BAM}" \ -c "${CONTROL_BAM}" \ -s "${CHROM_SIZES}" \ -o "${OUTPUT_PREFIX}" \ -p 0.01 \ -f 0.05 \ -v \ -t 8 -
4
seCLIP : Reproducible peaks were identified by comparing replicate read pileups along binding regions
merge_peaks (Inferred with models/gemini-2.5-flash) vlatest (Inferred with models/gemini-2.5-flash) GitHub$ Bash example
# Install idr (Irreproducible Discovery Rate tool), a dependency for merge_peaks # conda install -c bioconda idr # Clone the merge_peaks repository # git clone https://github.com/yeolab/merge_peaks.git # cd merge_peaks # Example: Prepare a genome size file (e.g., for hg38). This file lists chromosome names and their lengths. # You might need to generate this from a reference genome .fasta file or download it. # For example, using samtools faidx: # samtools faidx hg38.fa # cut -f1,2 hg38.fa.fai > hg38.chrom.sizes # Run the IDR analysis using run_idr.py from the merge_peaks pipeline. # This command assumes you have already called peaks for two replicates (rep1_peaks.bed, rep2_peaks.bed) # and a pooled sample (pooled_peaks.bed), typically in narrowPeak format. python merge_peaks/run_idr.py \ --peak_file_rep1 rep1_peaks.bed \ --peak_file_rep2 rep2_peaks.bed \ --peak_file_pooled pooled_peaks.bed \ --output_prefix reproducible_peaks \ --genome_size hg38.chrom.sizes \ --idr_threshold 0.05 \ --peak_type narrowPeak
-
5
wiggle files
deepTools (Inferred with models/gemini-2.5-flash) v3.5.1 (Inferred with models/gemini-2.5-flash) GitHub$ Bash example
# Install deepTools if not available # conda install -c bioconda deeptools # Example: Generate bigWig coverage file from a BAM file # This command generates a bigWig file normalized by RPKM, with a bin size of 10 bp. # Replace 'aligned_reads.bam' with your input BAM file. # Replace 'coverage.bw' with your desired output bigWig file name. # For effective genome size, refer to deepTools documentation or ENCODE guidelines for your specific organism/assembly. # Example effective genome size for hg38 (from ENCODE): 2913022398 bamCoverage \ -b aligned_reads.bam \ -o coverage.bw \ --normalizeUsingRPKM \ --binSize 10 \ --effectiveGenomeSize 2913022398 \ --numberOfProcessors auto -
6
peak text files
$ Bash example
# Install clipper using Docker # docker pull yeolab/clipper:0.0.1 # Define input files and parameters IP_BAM="ip.bam" # Placeholder for IP BAM file (e.g., from eCLIP IP sample) INPUT_BAM="input.bam" # Placeholder for input/control BAM file (e.g., from eCLIP input sample) SPECIES="hg38" # Placeholder for species (e.g., hg38, mm10). Reference genome source: UCSC/Ensembl OUTPUT_DIR="." # Output directory OUTPUT_PREFIX="peaks" # Prefix for output peak files # Run clipper for peak calling docker run --rm -v "$(pwd):/data" yeolab/clipper:0.0.1 \ --species "${SPECIES}" \ --output-dir "/data/${OUTPUT_DIR}" \ --output-prefix "${OUTPUT_PREFIX}" \ "/data/${IP_BAM}" \ "/data/${INPUT_BAM}"
Tools Used
Raw Source Text
Fastq files were trimmed for adapters using cutadapt Trimmed reads were aligned to the human genome build hg19 using STAR aligner seCLIP : Binding region peaks were identified based on read pileup densities (Clipper) seCLIP : Reproducible peaks were identified by comparing replicate read pileups along binding regions Genome_build: hg19 Supplementary_files_format_and_content: normalized read counts text file wiggle files peak text files