GSE157467 Processing Pipeline

RNA-Seq code_examples 6 steps

Publication

Persistent mRNA localization defects and cell death in ALS neurons caused by transient cellular stress.

Cell reports (2021) — PMID 34496257

Dataset

GSE157467

Persistent mRNA localization defects and cell death in ALS neurons caused by transient cellular stress

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Fastq files were trimmed for adapters using cutadapt

    cutadapt v4.0 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install cutadapt (example using conda)
    # conda install -c bioconda cutadapt
    
    # Define input and output file names
    INPUT_R1="input_R1.fastq.gz"
    INPUT_R2="input_R2.fastq.gz"
    OUTPUT_R1="output_R1_trimmed.fastq.gz"
    OUTPUT_R2="output_R2_trimmed.fastq.gz"
    REPORT="cutadapt_report.txt"
    
    # Define adapter sequences (replace with actual sequences if known)
    # Example: Illumina universal adapter for Read 1 (3' end)
    ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
    # Example: Illumina universal adapter for Read 2 (3' end)
    ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" # Or reverse complement of R1 adapter, depending on library prep
    
    # Execute cutadapt for paired-end trimming
    # -a ADAPTER_R1: 3' adapter for Read 1
    # -A ADAPTER_R2: 3' adapter for Read 2
    # -q 20,20: Trim low-quality bases from 5' and 3' ends (quality threshold 20)
    # --minimum-length 20: Discard reads shorter than 20 bp after trimming
    # -o: Output file for Read 1
    # -p: Output file for Read 2
    # --cores: Number of CPU cores to use (if available in version)
    # --report=file: Write a detailed report to a file
    cutadapt -a "${ADAPTER_R1}" -A "${ADAPTER_R2}" \
             -q 20,20 --minimum-length 20 \
             -o "${OUTPUT_R1}" -p "${OUTPUT_R2}" \
             "${INPUT_R1}" "${INPUT_R2}" > "${REPORT}" 2>&1
  2. 2

    Trimmed reads were aligned to the human genome build hg19 using STAR aligner

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables
    GENOME_DIR="/path/to/star_index/hg19" # Path to the pre-built STAR genome index for hg19
    READS_R1="trimmed_reads_R1.fastq.gz" # Placeholder for trimmed forward reads
    READS_R2="trimmed_reads_R2.fastq.gz" # Placeholder for trimmed reverse reads (if paired-end)
    OUTPUT_PREFIX="star_aligned" # Prefix for output files
    THREADS=8 # Number of threads to use
    
    # --- Genome Index Generation (if not already done) ---
    # To build the hg19 STAR index, you would typically need:
    # 1. The hg19 primary assembly FASTA file (e.g., from UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz)
    # 2. The hg19 GTF annotation file (e.g., from GENCODE or UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.ncbiRefSeq.gtf.gz)
    #
    # Example command to build index (uncomment and modify if needed):
    # mkdir -p ${GENOME_DIR}
    # STAR --runMode genomeGenerate \
    #      --genomeDir ${GENOME_DIR} \
    #      --genomeFastaFiles /path/to/hg19.fa \
    #      --sjdbGTFfile /path/to/hg19.gtf \
    #      --sjdbOverhang 100 \
    #      --runThreadN ${THREADS}
    
    # Perform STAR alignment
    # The description implies trimmed reads, assuming paired-end for typical RNA-seq.
    STAR --genomeDir ${GENOME_DIR} \
         --readFilesIn ${READS_R1} ${READS_R2} \
         --runThreadN ${THREADS} \
         --outFileNamePrefix ${OUTPUT_PREFIX}_ \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outSAMattributes Standard \
         --quantMode GeneCounts # Optional: output gene counts with RSEM-like quantification
    
  3. 3

    seCLIP : Binding region peaks were identified based on read pileup densities (Clipper)

    CLIPper vlatest (from source) GitHub
    $ Bash example
    # Clone the CLIPper repository
    # git clone https://github.com/yeolab/clipper.git
    # cd clipper
    # CLIPper is a Python script, ensure Python 3 is available.
    # Common Python dependencies include numpy, scipy, pysam.
    # pip install numpy scipy pysam
    
    # Placeholder for input BAM files and chromosome sizes
    IP_BAM="path/to/your/ip_sample.bam"
    CONTROL_BAM="path/to/your/control_sample.bam"
    CHROM_SIZES="path/to/your/genome/hg38.chrom.sizes" # Example: from UCSC goldenPath (e.g., http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes)
    OUTPUT_PREFIX="sample_binding_regions"
    
    # Execute CLIPper to identify binding region peaks
    # -b: IP BAM file
    # -c: Control BAM file
    # -s: Chromosome sizes file
    # -o: Output prefix (CLIPper will append .bed)
    # -p: P-value threshold (default is 0.01)
    # -f: FDR threshold (default is 0.05)
    # -v: Verbose output
    # -t: Number of threads
    python clipper.py \
        -b "${IP_BAM}" \
        -c "${CONTROL_BAM}" \
        -s "${CHROM_SIZES}" \
        -o "${OUTPUT_PREFIX}" \
        -p 0.01 \
        -f 0.05 \
        -v \
        -t 8
  4. 4

    seCLIP : Reproducible peaks were identified by comparing replicate read pileups along binding regions

    merge_peaks (Inferred with models/gemini-2.5-flash) vlatest (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install idr (Irreproducible Discovery Rate tool), a dependency for merge_peaks
    # conda install -c bioconda idr
    
    # Clone the merge_peaks repository
    # git clone https://github.com/yeolab/merge_peaks.git
    # cd merge_peaks
    
    # Example: Prepare a genome size file (e.g., for hg38). This file lists chromosome names and their lengths.
    # You might need to generate this from a reference genome .fasta file or download it.
    # For example, using samtools faidx:
    # samtools faidx hg38.fa
    # cut -f1,2 hg38.fa.fai > hg38.chrom.sizes
    
    # Run the IDR analysis using run_idr.py from the merge_peaks pipeline.
    # This command assumes you have already called peaks for two replicates (rep1_peaks.bed, rep2_peaks.bed)
    # and a pooled sample (pooled_peaks.bed), typically in narrowPeak format.
    python merge_peaks/run_idr.py \
      --peak_file_rep1 rep1_peaks.bed \
      --peak_file_rep2 rep2_peaks.bed \
      --peak_file_pooled pooled_peaks.bed \
      --output_prefix reproducible_peaks \
      --genome_size hg38.chrom.sizes \
      --idr_threshold 0.05 \
      --peak_type narrowPeak
  5. 5

    wiggle files

    deepTools (Inferred with models/gemini-2.5-flash) v3.5.1 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install deepTools if not available
    # conda install -c bioconda deeptools
    
    # Example: Generate bigWig coverage file from a BAM file
    # This command generates a bigWig file normalized by RPKM, with a bin size of 10 bp.
    # Replace 'aligned_reads.bam' with your input BAM file.
    # Replace 'coverage.bw' with your desired output bigWig file name.
    # For effective genome size, refer to deepTools documentation or ENCODE guidelines for your specific organism/assembly.
    # Example effective genome size for hg38 (from ENCODE): 2913022398
    
    bamCoverage \
        -b aligned_reads.bam \
        -o coverage.bw \
        --normalizeUsingRPKM \
        --binSize 10 \
        --effectiveGenomeSize 2913022398 \
        --numberOfProcessors auto
  6. 6

    peak text files

    clipper (Inferred with models/gemini-2.5-flash) v0.0.1 GitHub
    $ Bash example
    # Install clipper using Docker
    # docker pull yeolab/clipper:0.0.1
    
    # Define input files and parameters
    IP_BAM="ip.bam" # Placeholder for IP BAM file (e.g., from eCLIP IP sample)
    INPUT_BAM="input.bam" # Placeholder for input/control BAM file (e.g., from eCLIP input sample)
    SPECIES="hg38" # Placeholder for species (e.g., hg38, mm10). Reference genome source: UCSC/Ensembl
    OUTPUT_DIR="." # Output directory
    OUTPUT_PREFIX="peaks" # Prefix for output peak files
    
    # Run clipper for peak calling
    docker run --rm -v "$(pwd):/data" yeolab/clipper:0.0.1 \
        --species "${SPECIES}" \
        --output-dir "/data/${OUTPUT_DIR}" \
        --output-prefix "${OUTPUT_PREFIX}" \
        "/data/${IP_BAM}" \
        "/data/${INPUT_BAM}"

Tools Used

Raw Source Text
Fastq files were trimmed for adapters using cutadapt
Trimmed reads were aligned to the human genome build hg19 using STAR aligner
seCLIP : Binding region peaks were identified based on read pileup densities (Clipper)
seCLIP : Reproducible peaks were identified by comparing replicate read pileups along binding regions
Genome_build: hg19
Supplementary_files_format_and_content: normalized read counts text file
wiggle files
peak text files
← Back to Analysis