GSE176060 Processing Pipeline


Publication

Gain-of-function cardiomyopathic mutations in RBM20 rewire splicing regulation and re-distribute ribonucleoprotein granules within processing bodies.

Nature Communications (2021), PMID 34732726

Dataset

GSE176060

RNA-Seq of isogenic human iPS cell-derived cardiomyocytes with RBM20 mutations created by genome editing (eCLIP)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Step 1

    library strategy: eCLIP

    $ Bash example
    # Assuming you have cloned the skipper repository and are in its root directory:
    # git clone https://github.com/yeolab/skipper.git
    # cd skipper
    
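    # Create a placeholder configuration file (config.yaml).
    # It defines parameters, input/output directories, and reference genome paths.
    # The keys below are illustrative placeholders, not skipper's documented schema;
    # check the skipper README for the configuration format it expects.
    # Replace placeholder paths with actual paths to your data and reference files.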
    cat << EOF > config.yaml
    # General settings
    output_dir: results
    threads: 8
    
    # Reference genome settings (human hg19, the build reported for this dataset)
    genome_build: hg19
    genome_fasta: /path/to/reference/hg19.fa
    genome_gtf: /path/to/reference/gencode.v19.annotation.gtf
    genome_star_index: /path/to/reference/STAR_index_hg19
    genome_chrom_sizes: /path/to/reference/hg19.chrom.sizes
    genome_blacklist: /path/to/reference/hg19_blacklist.bed
    
    # Adapter sequences for trimming (example)
    adapters_fasta: /path/to/adapters/truseq_adapters.fa
    
    # Peak calling parameters (clipper)
    clipper_min_read_length: 15
    clipper_window_size: 20
    clipper_step_size: 1
    clipper_fdr_threshold: 0.05
    
    # IDR parameters (merge_peaks)
    idr_threshold: 0.05
    
    # Other tool-specific parameters can be added here
    EOF
    
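    # Create a placeholder samplesheet (samples.tsv).
    # It lists the RBM20 eCLIP and size-matched input samples for both genotypes
    # (WT and R636S HMZ), their FASTQ files, and metadata. The column layout is
    # illustrative; match it to the samplesheet format the workflow expects.
    # Replace placeholder paths with actual paths to your FASTQ files.
    cat << EOF > samples.tsv
    sample_id	fastq_r1	fastq_r2	antibody	replicate	condition
    WT_RBM20_rep1	/path/to/fastq/WT_RBM20_rep1_R1.fastq.gz	/path/to/fastq/WT_RBM20_rep1_R2.fastq.gz	RBM20	1	WT
    WT_RBM20_rep2	/path/to/fastq/WT_RBM20_rep2_R1.fastq.gz	/path/to/fastq/WT_RBM20_rep2_R2.fastq.gz	RBM20	2	WT
    WT_input_rep1	/path/to/fastq/WT_input_rep1_R1.fastq.gz	/path/to/fastq/WT_input_rep1_R2.fastq.gz	Input	1	WT
    R636S_RBM20_rep1	/path/to/fastq/R636S_RBM20_rep1_R1.fastq.gz	/path/to/fastq/R636S_RBM20_rep1_R2.fastq.gz	RBM20	1	R636S_HMZ
    R636S_RBM20_rep2	/path/to/fastq/R636S_RBM20_rep2_R1.fastq.gz	/path/to/fastq/R636S_RBM20_rep2_R2.fastq.gz	RBM20	2	R636S_HMZ
    R636S_input_rep1	/path/to/fastq/R636S_input_rep1_R1.fastq.gz	/path/to/fastq/R636S_input_rep1_R2.fastq.gz	Input	1	R636S_HMZ
    EOF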
    
    # Execute the Snakemake workflow with the config and samplesheet created above.
    # -s Snakefile: workflow entry point; adjust if the repository names it differently.
    # --use-conda: automatically create and manage conda environments for each rule.
    # --cores 8: use up to 8 CPU cores in parallel; adjust as needed.
    # --configfile config.yaml: load the configuration file written above.
    # A Snakemake profile can be added with --profile if you have a profile
    # directory; none is assumed here.
    # Ensure Snakemake is installed and on your PATH, e.g.:
    # conda install -c conda-forge -c bioconda snakemake
    snakemake -s Snakefile --use-conda --cores 8 --configfile config.yaml
  2. Step 2

    Reproducible RBM20 peaks (hg19), obtained from replicate WT and R636S HMZ iPSC-CMs and compared against size-matched input controls, were used for all downstream analyses.

    Tools: CLIPper (inferred with models/gemini-2.5-flash) and merge_peaks (inferred with models/gemini-2.5-flash); version: latest for both
    $ Bash example
    # --- Setup Environment ---
    # A dedicated conda environment is recommended. CLIPper has compiled
    # dependencies, so install it by following the README of its repository
    # (pip-installing numpy, scipy and pysam alone is not sufficient).
    # This script assumes the installed `clipper` command is on your PATH.
    # git clone https://github.com/yeolab/clipper.git
    # merge_peaks (input normalization + IDR) is distributed as CWL workflows
    # wrapping Perl scripts, not as a merge_peaks.py script, so it needs cwltool:
    # git clone https://github.com/yeolab/merge_peaks.git
    # pip install cwltool
    
    # --- Define Variables ---
    # CLIPper's -s option takes a genome/species keyword (e.g. hg19),
    # not a chrom.sizes file.
    GENOME="hg19"
    
    # Input BAM files (placeholders - replace with actual paths).
    # Assumed ENCODE-style design: two RBM20 IP replicates per genotype plus
    # one size-matched input (SMInput) per genotype. The SMInput BAMs are
    # referenced from the merge_peaks job files in step 2.
    WT_REP1_BAM="WT_iPSC_CM_rep1.bam"
    WT_REP2_BAM="WT_iPSC_CM_rep2.bam"
    WT_INPUT_BAM="WT_iPSC_CM_SMInput.bam"
    R636S_REP1_BAM="R636S_iPSC_CM_rep1.bam"
    R636S_REP2_BAM="R636S_iPSC_CM_rep2.bam"
    R636S_INPUT_BAM="R636S_iPSC_CM_SMInput.bam"
    
    OUTPUT_DIR="RBM20_peaks_analysis"
    mkdir -p "${OUTPUT_DIR}"
    
    # --- 1. Cluster calling with CLIPper ---
    # CLIPper is run on the IP BAMs only; input controls are used in step 2.
    echo "Calling RBM20 clusters for WT replicates..."
    clipper -b "${WT_REP1_BAM}" -s "${GENOME}" -o "${OUTPUT_DIR}/WT_rep1_RBM20_clusters.bed"
    clipper -b "${WT_REP2_BAM}" -s "${GENOME}" -o "${OUTPUT_DIR}/WT_rep2_RBM20_clusters.bed"
    
    echo "Calling RBM20 clusters for R636S replicates..."
    clipper -b "${R636S_REP1_BAM}" -s "${GENOME}" -o "${OUTPUT_DIR}/R636S_rep1_RBM20_clusters.bed"
    clipper -b "${R636S_REP2_BAM}" -s "${GENOME}" -o "${OUTPUT_DIR}/R636S_rep2_RBM20_clusters.bed"
    
    # --- 2. Input normalization and reproducible peaks (merge_peaks / IDR) ---
    # merge_peaks normalizes each replicate's clusters against the size-matched
    # input and then applies IDR across the two replicates. Its inputs (IP and
    # SMInput BAMs, CLIPper BEDs) and thresholds are declared in a CWL job file.
    # The workflow path and job-file names below are placeholders; take the real
    # workflow name and an example job file from the merge_peaks repository.
    MERGE_PEAKS_CWL="merge_peaks/cwl/wf_get_reproducible_eclip_peaks.cwl"  # assumed path, verify
    
    echo "Running merge_peaks for WT..."
    cwltool --outdir "${OUTPUT_DIR}/WT_reproducible" "${MERGE_PEAKS_CWL}" WT_merge_peaks_job.yaml
    
    echo "Running merge_peaks for R636S..."
    cwltool --outdir "${OUTPUT_DIR}/R636S_reproducible" "${MERGE_PEAKS_CWL}" R636S_merge_peaks_job.yaml
    
    echo "Reproducible RBM20 peak files for WT and R636S are in ${OUTPUT_DIR}/*_reproducible/"
  3. Step 3

    Downstream bioinformatic analyses followed the default ENCODE eCLIP pipeline described at https://www.encodeproject.org/eclip/.

    Tools: eCLIP pipeline (yeolab/eclip CWL workflow, GitHub) with STAR 2.7.10a, CLIPper 1.0.0, and merge_peaks 1.0.0
    $ Bash example
    # Install cwltool (if not already installed)
    # pip install cwltool
    
    # Clone the ENCODE eCLIP CWL workflow repository
    # git clone https://github.com/yeolab/eclip.git
    # cd eclip
    
    # --- Placeholder for reference genome data ---
    # The GEO records report hg19 alignments, so hg19 references are used here.
    # Download human genome (hg19) FASTA, GTF, chromosome sizes, and blacklist regions
    # mkdir -p /path/to/genome_data/hg19
    # cd /path/to/genome_data/hg19
    # wget -c https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
    # gunzip hg19.fa.gz
    # wget -c https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.ncbiRefSeq.gtf.gz
    # gunzip hg19.ncbiRefSeq.gtf.gz
    # wget -c https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes
    # wget -c https://raw.githubusercontent.com/Boyle-Lab/Blacklist/master/lists/hg19-blacklist.v2.bed.gz
    # gunzip hg19-blacklist.v2.bed.gz
    
    # --- Placeholder for STAR index generation (if not pre-built) ---
    # mkdir -p /path/to/genome_data/hg19/STAR_index
    # STAR \
    #   --runThreadN 8 \
    #   --runMode genomeGenerate \
    #   --genomeDir /path/to/genome_data/hg19/STAR_index \
    #   --genomeFastaFiles /path/to/genome_data/hg19/hg19.fa \
    #   --sjdbGTFfile /path/to/genome_data/hg19/hg19.ncbiRefSeq.gtf \
    #   --sjdbOverhang 100 # set to (read length - 1)
    # The ENCODE pipeline also maps reads to a repetitive-element index before
    # the genome; see the eclip repository for how that index is prepared.
    
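    # Define input files and parameters for the eCLIP pipeline.
    # Replace with actual paths to your FASTQ files and genome data.
    # The field names below are placeholders; the yeolab/eclip workflows define
    # their own input schema, so adapt this file to the repository's example job files.
    # Single-end reads are assumed for simplicity; adjust for paired-end data if needed.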
    cat << EOF > eclip_job.yaml
    fastq_rep1_r1:
      class: File
      path: /path/to/your/eclip_rep1.fastq.gz
    fastq_input_r1:
      class: File
      path: /path/to/your/input_control.fastq.gz
    genome_fasta:
      class: File
      path: /path/to/genome_data/hg19/hg19.fa
    genome_gtf:
      class: File
      path: /path/to/genome_data/hg19/hg19.ncbiRefSeq.gtf
    chrom_sizes:
      class: File
      path: /path/to/genome_data/hg19/hg19.chrom.sizes
    blacklist_regions:
      class: File
      path: /path/to/genome_data/hg19/hg19-blacklist.v2.bed
    output_prefix: my_eclip_experiment
    threads: 8
    # Optional parameters (uncomment and adjust as needed)
    # read_length: 50
    # min_read_length: 18
    # max_read_length: 100
    # min_mapq: 20
    # min_peak_width: 5
    # max_peak_width: 500
    # fdr_threshold: 0.05
    # idr_threshold: 0.1
    # min_fold_enrichment: 2.0
    # min_reads_in_peak: 10
    EOF
    
    # Execute the eCLIP CWL workflow with cwltool.
    # eclip.cwl is a placeholder: pass the full path to the top-level workflow
    # file from the yeolab/eclip repository.
    cwltool /path/to/eclip/eclip.cwl eclip_job.yaml
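Step 2 describes peaks "compared to size-matched input controls". As a minimal sketch of what that comparison yields, the hypothetical shell function below (not part of CLIPper or merge_peaks) keeps only peaks whose input-normalized enrichment passes fixed cutoffs, which can help when re-applying or tightening thresholds on an input-normalized peak file. The default cutoffs (log2 fold enrichment ≥ 3, -log10 p ≥ 3) are thresholds commonly applied to ENCODE eCLIP peaks, and the column positions (7 for log2 fold enrichment, 8 for -log10 p, as in narrowPeak-style files) are assumptions; verify both against the file you actually have.

```shell
#!/usr/bin/env bash
# filter_enriched_peaks: hypothetical helper (not part of CLIPper or merge_peaks).
# Keeps BED rows whose input-normalized enrichment passes both cutoffs.
# Assumed columns: 7 = log2 fold enrichment over SMInput, 8 = -log10(p),
# as in narrowPeak-style files; override with arguments 4 and 5 if yours differ.
filter_enriched_peaks() {
    local bed="$1"
    local min_l2fc="${2:-3}"     # 3 => at least 8-fold enrichment over input
    local min_log10p="${3:-3}"   # 3 => p <= 0.001
    local fc_col="${4:-7}"
    local p_col="${5:-8}"
    awk -F'\t' -v fc="$min_l2fc" -v p="$min_log10p" -v fcc="$fc_col" -v pc="$p_col" '
        /^(track|browser|#)/ { next }                   # skip header lines
        ($fcc + 0) >= (fc + 0) && ($pc + 0) >= (p + 0)  # print rows passing both cutoffs
    ' "$bed"
}
```

For example, `filter_enriched_peaks WT_RBM20_reproducible_peaks.bed > WT_RBM20_enriched_peaks.bed` (hypothetical file names) writes the passing rows unchanged, so the output stays a valid BED file for downstream tools.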
    


Raw Source Text
library strategy: eCLIP
Reproducible RBM20 peaks (hg19) obtained from replicate WT and R636S HMZ iPSC-CMs compared to size-matched input controls, were used for all down-stream analyses. Downstream bioinformatics were performed according to the default ENCODE eCLIP bioinformatics pipeline as described at from https://www.encodeproject.org/eclip/.
Genome_build: hg19
Supplementary_files_format_and_content: BED format text files of hg19-aligned RBM20 eCLIP in WT iPSC-CM  peak genomic coordinates and annotations
Supplementary_files_format_and_content: BED format text files of hg19-aligned RBM20 eCLIP in RBM20 R636S-HMZ iPSC-CM  peak genomic coordinates and annotations
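The supplementary files are described as BED files of hg19 peak coordinates and annotations. A quick sanity check before downstream use is sketched below as a hypothetical `summarize_peaks` function: it counts peaks by strand, reports mean peak width, and flags rows whose coordinates are not valid BED intervals. It assumes tab-separated columns with chromosome, start, and end in columns 1 to 3 and strand in column 6 (standard BED6); the annotation columns of the GEO files are not inspected.

```shell
#!/usr/bin/env bash
# summarize_peaks: hypothetical helper for a quick look at a BED peak file.
# Assumes tab-separated BED6 (chrom, start, end, name, score, strand);
# any extra annotation columns are ignored.
summarize_peaks() {
    local bed="$1"
    awk -F'\t' '
        /^(track|browser|#)/ { next }                       # skip header lines
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || ($3 + 0) <= ($2 + 0) {
            bad++                                           # not a valid BED interval
            next
        }
        {
            n++
            total += $3 - $2                                # BED is 0-based, half-open
            if ($6 == "+") plus++
            else if ($6 == "-") minus++
        }
        END {
            mean = (n > 0) ? total / n : 0
            printf "peaks=%d plus=%d minus=%d mean_width=%.1f malformed=%d\n", n, plus, minus, mean, bad
        }
    ' "$bed"
}
```

Running it on the WT and R636S-HMZ files (e.g. `summarize_peaks WT_RBM20_peaks.bed`, file name hypothetical) gives a first side-by-side look at peak numbers per genotype before any overlap or motif analysis.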