GSE176060 Processing Pipeline
3 steps
Publication
Gain-of-function cardiomyopathic mutations in RBM20 rewire splicing regulation and re-distribute ribonucleoprotein granules within processing bodies. Nature Communications (2021), PMID 34732726
Dataset
GSE176060: RNA-Seq of isogenic human iPS cell-derived cardiomyocytes with RBM20 mutations created by genome editing (eCLIP)
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
library strategy: eCLIP
$ Bash example
# Assuming you have cloned the skipper repository and are in its root directory:
# git clone https://github.com/yeolab/skipper.git
# cd skipper

# Create a placeholder configuration file (config.yaml).
# This file defines parameters, input/output directories, and reference genome paths.
# Replace placeholder paths with actual paths to your data and reference files.
cat << EOF > config.yaml
# General settings
output_dir: results
threads: 8

# Reference genome settings (hg19, the build reported for this dataset)
genome_build: hg19
genome_fasta: /path/to/reference/hg19.fa
genome_gtf: /path/to/reference/gencode.v19.annotation.gtf
genome_star_index: /path/to/reference/STAR_index_hg19
genome_chrom_sizes: /path/to/reference/hg19.chrom.sizes
genome_blacklist: /path/to/reference/hg19_blacklist.bed

# Adapter sequences for trimming (example)
adapters_fasta: /path/to/adapters/truseq_adapters.fa

# Peak calling parameters (clipper)
clipper_min_read_length: 15
clipper_window_size: 20
clipper_step_size: 1
clipper_fdr_threshold: 0.05

# IDR parameters (merge_peaks)
idr_threshold: 0.05

# Other tool-specific parameters can be added here
EOF

# Create a placeholder samplesheet (samples.tsv).
# This file lists your eCLIP and input samples, their FASTQ files, and metadata.
# Replace placeholder paths with actual paths to your FASTQ files.
# Columns must be separated by single TAB characters in the actual file.
cat << EOF > samples.tsv
sample_id fastq_r1 fastq_r2 antibody replicate condition
eCLIP_sample1_rep1 /path/to/fastq/eCLIP_sample1_rep1_R1.fastq.gz /path/to/fastq/eCLIP_sample1_rep1_R2.fastq.gz RBM20 1 treatment
eCLIP_sample1_rep2 /path/to/fastq/eCLIP_sample1_rep2_R1.fastq.gz /path/to/fastq/eCLIP_sample1_rep2_R2.fastq.gz RBM20 2 treatment
input_sample1_rep1 /path/to/fastq/input_sample1_rep1_R1.fastq.gz /path/to/fastq/input_sample1_rep1_R2.fastq.gz Input 1 treatment
EOF

# Execute the eCLIP Snakemake workflow using the created config and samplesheet.
# --use-conda: Automatically creates and manages conda environments for tools.
# --cores 8: Use 8 CPU cores for parallel execution. Adjust as needed.
# --configfile config.yaml: Specifies the configuration file.
# --profile profiles/conda: Uses a predefined profile for conda environment management.
# Ensure Snakemake is installed and accessible in your PATH.
# conda install -c conda-forge -c bioconda snakemake
snakemake -s Snakefile --use-conda --cores 8 --configfile config.yaml --profile profiles/conda
2
Reproducible RBM20 peaks (hg19), obtained from replicate WT and R636S HMZ iPSC-CMs and compared to size-matched input controls, were used for all downstream analyses.
Tools: Clipper (inferred with models/gemini-2.5-flash), merge_peaks (inferred with models/gemini-2.5-flash)
Versions: latest (Clipper), latest (merge_peaks)
$ Bash example
# --- Setup Environment ---
# It's recommended to use a virtual environment or conda for managing dependencies.
# For example, to install clipper and its dependencies:
# conda create -n eclip_env python=3.8
# conda activate eclip_env
# pip install numpy scipy pysam
# git clone https://github.com/yeolab/clipper.git
# git clone https://github.com/yeolab/merge_peaks.git
# export PATH=$PATH:$(pwd)/clipper:$(pwd)/merge_peaks  # Add scripts to PATH if not installed globally

# --- Define Variables ---
GENOME="hg19"
GENOME_SIZE_FILE="${GENOME}.chrom.sizes"  # Placeholder for genome size file
# Download hg19 chrom.sizes if not available
# wget -O ${GENOME_SIZE_FILE} http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes

# Input BAM files (placeholders - replace with actual paths)
# Assuming two replicates for WT and R636S, and two size-matched input controls
WT_REP1_BAM="WT_iPSC_CM_rep1.bam"
WT_REP2_BAM="WT_iPSC_CM_rep2.bam"
R636S_REP1_BAM="R636S_iPSC_CM_rep1.bam"
R636S_REP2_BAM="R636S_iPSC_CM_rep2.bam"
INPUT_REP1_BAM="Input_control_rep1.bam"
INPUT_REP2_BAM="Input_control_rep2.bam"

OUTPUT_DIR="RBM20_peaks_analysis"
mkdir -p ${OUTPUT_DIR}

# --- 1. Peak Calling with Clipper ---
# Call peaks for each replicate against its size-matched input control
echo "Calling RBM20 peaks for WT replicates..."
python clipper/clipper.py -b ${WT_REP1_BAM} -c ${INPUT_REP1_BAM} -s ${GENOME_SIZE_FILE} \
    -o ${OUTPUT_DIR}/WT_rep1_RBM20_peaks.bed
python clipper/clipper.py -b ${WT_REP2_BAM} -c ${INPUT_REP2_BAM} -s ${GENOME_SIZE_FILE} \
    -o ${OUTPUT_DIR}/WT_rep2_RBM20_peaks.bed

echo "Calling RBM20 peaks for R636S replicates..."
python clipper/clipper.py -b ${R636S_REP1_BAM} -c ${INPUT_REP1_BAM} -s ${GENOME_SIZE_FILE} \
    -o ${OUTPUT_DIR}/R636S_rep1_RBM20_peaks.bed
python clipper/clipper.py -b ${R636S_REP2_BAM} -c ${INPUT_REP2_BAM} -s ${GENOME_SIZE_FILE} \
    -o ${OUTPUT_DIR}/R636S_rep2_RBM20_peaks.bed

# --- 2. Identifying Reproducible Peaks with merge_peaks (IDR) ---
# Perform IDR analysis on replicates for each condition (WT and R636S)
# A common IDR threshold is 0.05
echo "Performing IDR for WT RBM20 peaks..."
python merge_peaks/merge_peaks.py \
    -i ${OUTPUT_DIR}/WT_rep1_RBM20_peaks.bed ${OUTPUT_DIR}/WT_rep2_RBM20_peaks.bed \
    -o ${OUTPUT_DIR}/WT_RBM20_reproducible_peaks -t 0.05

echo "Performing IDR for R636S RBM20 peaks..."
python merge_peaks/merge_peaks.py \
    -i ${OUTPUT_DIR}/R636S_rep1_RBM20_peaks.bed ${OUTPUT_DIR}/R636S_rep2_RBM20_peaks.bed \
    -o ${OUTPUT_DIR}/R636S_RBM20_reproducible_peaks -t 0.05

echo "Reproducible RBM20 peaks for WT and R636S conditions are generated in ${OUTPUT_DIR}/ (look for *_idr_peaks.bed files)"
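To make the idea of a "reproducible" peak concrete, the sketch below keeps replicate 1 peaks that overlap a same-strand peak in replicate 2. This is a deliberately naive stand-in for illustration only: it is not IDR and not the merge_peaks algorithm. The function name overlap_peaks, the toy coordinates, and the assumption of six-column BED input (strand in column 6) are all hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print peaks from REP1 that overlap (>= 1 bp, same
# chromosome, same strand) at least one peak in REP2.
# Assumes tab-separated BED6 input; quadratic in peak count, so toy-sized only.
# Usage: overlap_peaks REP1.bed REP2.bed
overlap_peaks() {
    awk -F'\t' '
        # First file on the command line is REP2: store its intervals.
        NR == FNR {
            chrom[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; strand[NR] = $6
            n = NR
            next
        }
        # Second file is REP1: print a peak once if any stored interval overlaps it.
        {
            for (i = 1; i <= n; i++) {
                if ($1 == chrom[i] && $6 == strand[i] &&
                    ($2 + 0) < e[i] && s[i] < ($3 + 0)) {
                    print
                    break
                }
            }
        }
    ' "$2" "$1"
}

# Toy replicate peak sets (made-up coordinates, half-open BED intervals).
demo_dir="$(mktemp -d)"
{
    printf 'chr1\t100\t150\t.\t0\t+\n'
    printf 'chr1\t300\t340\t.\t0\t+\n'
    printf 'chr2\t500\t560\t.\t0\t-\n'
} > "$demo_dir/rep1.bed"
{
    printf 'chr1\t120\t170\t.\t0\t+\n'   # overlaps the first rep1 peak
    printf 'chr1\t310\t330\t.\t0\t-\n'   # overlaps in position, but opposite strand
    printf 'chr2\t560\t600\t.\t0\t-\n'   # adjacent to the third rep1 peak, no shared base
} > "$demo_dir/rep2.bed"

overlap_peaks "$demo_dir/rep1.bed" "$demo_dir/rep2.bed"
```

Only the first toy peak survives, because strand mismatches and merely adjacent intervals do not count as overlap. A rank-based method such as IDR additionally weighs how consistently peaks score across replicates, which this sketch ignores.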
3
Downstream bioinformatics analyses were performed according to the default ENCODE eCLIP bioinformatics pipeline, as described at https://www.encodeproject.org/eclip/.
$ Bash example
# Install cwltool (if not already installed)
# pip install cwltool

# Clone the ENCODE eCLIP CWL workflow repository
# git clone https://github.com/yeolab/eclip.git
# cd eclip

# --- Placeholder for reference genome data ---
# Download human genome (hg19, the build reported for this dataset) FASTA, GTF,
# chromosome sizes, and blacklist regions
# mkdir -p /path/to/genome_data/hg19
# cd /path/to/genome_data/hg19
# wget -c https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
# gunzip hg19.fa.gz
# wget -c https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.ncbiRefSeq.gtf.gz
# gunzip hg19.ncbiRefSeq.gtf.gz
# wget -c https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes
# wget -c https://raw.githubusercontent.com/Boyle-Lab/Blacklist/master/lists/hg19-blacklist.v2.bed.gz
# gunzip hg19-blacklist.v2.bed.gz

# --- Placeholder for STAR index generation (if not pre-built) ---
# mkdir -p /path/to/genome_data/hg19/STAR_index
# STAR \
#   --runThreadN 8 \
#   --runMode genomeGenerate \
#   --genomeDir /path/to/genome_data/hg19/STAR_index \
#   --genomeFastaFiles /path/to/genome_data/hg19/hg19.fa \
#   --sjdbGTFfile /path/to/genome_data/hg19/hg19.ncbiRefSeq.gtf \
#   --sjdbOverhang 100  # Adjust based on read length - 1

# Define input files and parameters for the eCLIP pipeline.
# Replace with actual paths to your FASTQ files and genome data.
# Assuming single-end reads for simplicity. Adjust for paired-end if needed.
cat << EOF > eclip_job.yaml
fastq_rep1_r1:
  class: File
  path: /path/to/your/eclip_rep1.fastq.gz
fastq_input_r1:
  class: File
  path: /path/to/your/input_control.fastq.gz
genome_fasta:
  class: File
  path: /path/to/genome_data/hg19/hg19.fa
genome_gtf:
  class: File
  path: /path/to/genome_data/hg19/hg19.ncbiRefSeq.gtf
chrom_sizes:
  class: File
  path: /path/to/genome_data/hg19/hg19.chrom.sizes
blacklist_regions:
  class: File
  path: /path/to/genome_data/hg19/hg19-blacklist.v2.bed
output_prefix: my_eclip_experiment
threads: 8
# Optional parameters (uncomment and adjust as needed)
# read_length: 50
# min_read_length: 18
# max_read_length: 100
# min_mapq: 20
# min_peak_width: 5
# max_peak_width: 500
# fdr_threshold: 0.05
# idr_threshold: 0.1
# min_fold_enrichment: 2.0
# min_reads_in_peak: 10
EOF

# Execute the eCLIP CWL workflow using cwltool.
# Ensure you are in the directory containing eclip.cwl or provide its full path.
cwltool /path/to/eclip/eclip.cwl eclip_job.yaml
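The comparison against size-matched input controls typically ends in an enrichment filter on input-normalized peaks. The sketch below shows one way such a filter could look, assuming a tab-separated file whose column 4 holds -log10(p-value) and column 5 holds log2 fold enrichment over input. That column layout, the helper name filter_enriched_peaks, and the default cutoffs of 3 and 3 are assumptions for illustration; confirm the layout and thresholds against the pipeline version you actually run.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: keep peaks enriched over the size-matched input.
# Assumed columns (verify for your files):
#   1 chrom, 2 start, 3 end, 4 -log10(p-value), 5 log2(fold enrichment), 6 strand
# Usage: filter_enriched_peaks FILE [MIN_NEG_LOG10_P] [MIN_LOG2_FC]
filter_enriched_peaks() {
    awk -F'\t' -v min_p="${2:-3}" -v min_fc="${3:-3}" \
        '($4 + 0) >= (min_p + 0) && ($5 + 0) >= (min_fc + 0)' "$1"
}

# Toy input-normalized peaks (made-up values).
demo_dir="$(mktemp -d)"
{
    printf 'chr1\t100\t150\t5.2\t4.1\t+\n'   # passes both cutoffs
    printf 'chr1\t300\t340\t1.0\t5.0\t+\n'   # fails the p-value cutoff
    printf 'chr2\t500\t560\t6.0\t2.0\t-\n'   # fails the fold-enrichment cutoff
    printf 'chr2\t900\t950\t3.0\t3.0\t-\n'   # passes exactly at both cutoffs
} > "$demo_dir/normalized.bed"

# Default cutoffs, then a stricter custom pair.
filter_enriched_peaks "$demo_dir/normalized.bed"
filter_enriched_peaks "$demo_dir/normalized.bed" 5 4
```

Keeping the cutoffs as arguments rather than constants makes it easy to check how sensitive the final peak set is to the chosen thresholds.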
Tools Used
Raw Source Text
library strategy: eCLIP
Reproducible RBM20 peaks (hg19) obtained from replicate WT and R636S HMZ iPSC-CMs compared to size-matched input controls, were used for all down-stream analyses.
Downstream bioinformatics were performed according to the default ENCODE eCLIP bioinformatics pipeline as described at from https://www.encodeproject.org/eclip/.
Genome_build: hg19
Supplementary_files_format_and_content: BED format text files of hg19-aligned RBM20 eCLIP in WT iPSC-CM peak genomic coordinates and annotations
Supplementary_files_format_and_content: BED format text files of hg19-aligned RBM20 eCLIP in RBM20 R636S-HMZ iPSC-CM peak genomic coordinates and annotations
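Since the supplementary files are BED-format peak lists, a quick sanity check is to count peaks and average peak width per chromosome. The sketch below relies only on the first three standard BED columns (chrom, start, end), because the layout of the annotation columns in these particular files is not described here. The helper name summarize_peaks and the toy file are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: per-chromosome peak count and mean peak width.
# Uses only BED columns 1-3; skips comment, track, and browser header lines.
# Output columns: chrom, peak count, mean width in bp.
# Usage: summarize_peaks FILE.bed
summarize_peaks() {
    awk -F'\t' '
        $0 !~ /^(#|track|browser)/ && NF >= 3 {
            n[$1]++
            width[$1] += $3 - $2
        }
        END {
            for (c in n) printf "%s\t%d\t%.1f\n", c, n[c], width[c] / n[c]
        }
    ' "$1" | LC_ALL=C sort -k1,1
}

# Toy peak file (made-up coordinates) standing in for a downloaded BED file.
demo_dir="$(mktemp -d)"
{
    printf 'track name=demo\n'
    printf 'chr1\t100\t150\n'
    printf 'chr1\t300\t340\n'
    printf 'chr2\t500\t560\n'
} > "$demo_dir/peaks.bed"

summarize_peaks "$demo_dir/peaks.bed"
```

Running the same summary on the WT and R636S-HMZ files gives a first, coarse comparison of peak counts before any annotation-aware analysis.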