GSE176060 Processing Pipeline


Publication

Gain-of-function cardiomyopathic mutations in RBM20 rewire splicing regulation and re-distribute ribonucleoprotein granules within processing bodies.

Nature Communications (2021), PMID 34732726

Dataset

GSE176060

RNA-Seq of isogenic human iPS cell-derived cardiomyocytes with RBM20 mutations created by genome editing (eCLIP)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


library strategy: eCLIP

eCLIP v0.1.0

$ Bash example

# Assuming you have cloned the skipper repository and are in its root directory:
# git clone https://github.com/yeolab/skipper.git
# cd skipper

# Create a placeholder configuration file (config.yaml)
# This file defines parameters, input/output directories, and reference genome paths.
# Replace placeholder paths with actual paths to your data and reference files.
cat << EOF > config.yaml
# General settings
output_dir: results
threads: 8

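# Reference genome settings (example for human hg38).
# The peaks deposited for this dataset are hg19-aligned; substitute hg19 references to match them.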
genome_build: hg38
genome_fasta: /path/to/reference/hg38.fa
genome_gtf: /path/to/reference/gencode.v38.annotation.gtf
genome_star_index: /path/to/reference/STAR_index_hg38
genome_chrom_sizes: /path/to/reference/hg38.chrom.sizes
genome_blacklist: /path/to/reference/hg38_blacklist.bed

# Adapter sequences for trimming (example)
adapters_fasta: /path/to/adapters/truseq_adapters.fa

# Peak calling parameters (clipper)
clipper_min_read_length: 15
clipper_window_size: 20
clipper_step_size: 1
clipper_fdr_threshold: 0.05

# IDR parameters (merge_peaks)
idr_threshold: 0.05

# Other tool-specific parameters can be added here
EOF

# Create a placeholder samplesheet (samples.tsv)
# This file lists your eCLIP and input samples, their FASTQ files, and metadata.
# Replace placeholder paths with actual paths to your FASTQ files.
cat << EOF > samples.tsv
sample_id	fastq_r1	fastq_r2	antibody	replicate	condition
eCLIP_sample1_rep1	/path/to/fastq/eCLIP_sample1_rep1_R1.fastq.gz	/path/to/fastq/eCLIP_sample1_rep1_R2.fastq.gz	RBM20	1	treatment
eCLIP_sample1_rep2	/path/to/fastq/eCLIP_sample1_rep2_R1.fastq.gz	/path/to/fastq/eCLIP_sample1_rep2_R2.fastq.gz	RBM20	2	treatment
input_sample1_rep1	/path/to/fastq/input_sample1_rep1_R1.fastq.gz	/path/to/fastq/input_sample1_rep1_R2.fastq.gz	Input	1	treatment
EOF

# Execute the eCLIP Snakemake workflow using the created config and samplesheet.
# --use-conda: Automatically creates and manages conda environments for tools.
# --cores 8: Use 8 CPU cores for parallel execution. Adjust as needed.
# --configfile config.yaml: Specifies the configuration file.
# --profile profiles/conda: Uses a predefined profile for conda environment management.
# Ensure Snakemake is installed and accessible in your PATH.
# conda install -c conda-forge -c bioconda snakemake
snakemake -s Snakefile --use-conda --cores 8 --configfile config.yaml --profile profiles/conda
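
Because a malformed samplesheet is an easy way to break a run like the one above, a quick structural check before launching can help. The following is a minimal sketch, not part of the eCLIP workflow itself: the function name `check_samplesheet`, the toy sample names, and the paths are all hypothetical, and the column layout simply mirrors the placeholder samples.tsv shown above.

```shell
# Build a toy tab-separated samplesheet (hypothetical sample names and paths).
tmpdir=$(mktemp -d)
sheet="${tmpdir}/samples.tsv"
printf 'sample_id\tfastq_r1\tfastq_r2\tantibody\treplicate\tcondition\n' > "$sheet"
printf 'WT_rep1\t/path/WT_rep1_R1.fastq.gz\t/path/WT_rep1_R2.fastq.gz\tRBM20\t1\tWT\n' >> "$sheet"
printf 'WT_input\t/path/WT_input_R1.fastq.gz\t/path/WT_input_R2.fastq.gz\tInput\t1\tWT\n' >> "$sheet"

# check_samplesheet FILE (hypothetical helper): exits non-zero if any row has a
# different number of fields than the header, or if a sample_id is repeated.
check_samplesheet() {
  awk -F'\t' '
    NR == 1 { n = NF; next }
    NF != n { printf "line %d: expected %d fields, found %d\n", NR, n, NF; bad = 1 }
    seen[$1]++ { printf "line %d: duplicate sample_id %s\n", NR, $1; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

if check_samplesheet "$sheet"; then
  sheet_status="ok"
else
  sheet_status="invalid"
fi
echo "samplesheet: ${sheet_status}"
```

The check only looks at structure; it does not confirm that the FASTQ paths exist or that the column names match what the workflow expects.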


Reproducible RBM20 peaks (hg19), obtained from replicate WT and R636S HMZ iPSC-CMs compared to size-matched input controls, were used for all downstream analyses.

Clipper (latest), merge_peaks (latest). Both tools inferred with models/gemini-2.5-flash.

$ Bash example

# --- Setup Environment ---
# It's recommended to use a virtual environment or conda for managing dependencies.
# For example, to install clipper and its dependencies:
# conda create -n eclip_env python=3.8
# conda activate eclip_env
# pip install numpy scipy pysam
# git clone https://github.com/yeolab/clipper.git
# git clone https://github.com/yeolab/merge_peaks.git
# export PATH=$PATH:$(pwd)/clipper:$(pwd)/merge_peaks # Add scripts to PATH if not installed globally

# --- Define Variables ---
GENOME="hg19"
GENOME_SIZE_FILE="${GENOME}.chrom.sizes" # Placeholder for genome size file
# Download hg19 chrom.sizes if not available
# wget -O ${GENOME_SIZE_FILE} http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes

# Input BAM files (placeholders - replace with actual paths)
# Assuming two replicates for WT and R636S, and two size-matched input controls
WT_REP1_BAM="WT_iPSC_CM_rep1.bam"
WT_REP2_BAM="WT_iPSC_CM_rep2.bam"
R636S_REP1_BAM="R636S_iPSC_CM_rep1.bam"
R636S_REP2_BAM="R636S_iPSC_CM_rep2.bam"
INPUT_REP1_BAM="Input_control_rep1.bam"
INPUT_REP2_BAM="Input_control_rep2.bam"

OUTPUT_DIR="RBM20_peaks_analysis"
mkdir -p ${OUTPUT_DIR}

# --- 1. Peak Calling with Clipper ---
# Call peaks on each IP replicate.
# NOTE: the long options below (--species, --bam, --outfile) follow the ENCODE eCLIP SOP;
# confirm them with `clipper --help` for your installed version.
# Assumption: CLIPper takes no control BAM. In the ENCODE pipeline, enrichment over the
# size-matched input (INPUT_REP*_BAM above) is computed in a separate normalization step.

echo "Calling RBM20 peaks for WT replicates..."
clipper --species "${GENOME}" --bam "${WT_REP1_BAM}" --outfile "${OUTPUT_DIR}/WT_rep1_RBM20_peaks.bed"
clipper --species "${GENOME}" --bam "${WT_REP2_BAM}" --outfile "${OUTPUT_DIR}/WT_rep2_RBM20_peaks.bed"

echo "Calling RBM20 peaks for R636S replicates..."
clipper --species "${GENOME}" --bam "${R636S_REP1_BAM}" --outfile "${OUTPUT_DIR}/R636S_rep1_RBM20_peaks.bed"
clipper --species "${GENOME}" --bam "${R636S_REP2_BAM}" --outfile "${OUTPUT_DIR}/R636S_rep2_RBM20_peaks.bed"

# --- 2. Identifying Reproducible Peaks with merge_peaks (IDR) ---
# Perform IDR analysis on replicates for each condition (WT and R636S).
# A common IDR threshold is 0.05.
# NOTE: the merge_peaks.py script path and its -i/-o/-t flags are inferred placeholders,
# not a verified command line; check the merge_peaks repository for the actual entry point.

echo "Performing IDR for WT RBM20 peaks..."
python merge_peaks/merge_peaks.py -i "${OUTPUT_DIR}/WT_rep1_RBM20_peaks.bed" "${OUTPUT_DIR}/WT_rep2_RBM20_peaks.bed" -o "${OUTPUT_DIR}/WT_RBM20_reproducible_peaks" -t 0.05

echo "Performing IDR for R636S RBM20 peaks..."
python merge_peaks/merge_peaks.py -i "${OUTPUT_DIR}/R636S_rep1_RBM20_peaks.bed" "${OUTPUT_DIR}/R636S_rep2_RBM20_peaks.bed" -o "${OUTPUT_DIR}/R636S_RBM20_reproducible_peaks" -t 0.05

echo "Reproducible RBM20 peaks for WT and R636S are written under ${OUTPUT_DIR}/ (output file names depend on the merge_peaks version)"
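
The step description says peaks were compared to size-matched input controls. As a rough illustration of what such a comparison can feed into, the sketch below filters an input-normalized peak file by enrichment and significance. Everything specific here is an assumption: the toy coordinates are made up, the layout (column 4 holding -log10 p-value, column 5 holding log2 fold change over input) is assumed for illustration, and the cutoffs of 3 and 3 are a convention often quoted for eCLIP analyses rather than values stated in this record.

```shell
# Toy input-normalized peaks. Assumed columns:
# chrom, start, end, -log10(p-value), log2(fold change over input), strand.
tmpdir=$(mktemp -d)
peaks="${tmpdir}/toy_normalized_peaks.bed"
printf 'chr10\t100\t150\t5.2\t4.1\t+\n' > "$peaks"
printf 'chr10\t300\t340\t1.0\t3.5\t+\n' >> "$peaks"
printf 'chr2\t900\t960\t7.8\t2.0\t-\n' >> "$peaks"
printf 'chr2\t1200\t1275\t3.0\t3.0\t-\n' >> "$peaks"

# Assumed cutoffs; adjust to whatever the analysis actually calls for.
MIN_NEGLOG10P=3
MIN_LOG2FC=3

# Keep rows that meet both thresholds.
awk -F'\t' -v p="$MIN_NEGLOG10P" -v fc="$MIN_LOG2FC" \
  '$4 >= p && $5 >= fc' "$peaks" > "${tmpdir}/toy_significant_peaks.bed"

n_kept=$(wc -l < "${tmpdir}/toy_significant_peaks.bed" | tr -d ' ')
echo "kept ${n_kept} of $(wc -l < "$peaks" | tr -d ' ') peaks"
```

Before applying a filter like this to real output, check the header or documentation of the file in question, since column meanings differ between pipeline versions and file types.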

Downstream bioinformatics were performed according to the default ENCODE eCLIP bioinformatics pipeline as described at https://www.encodeproject.org/eclip/.

eCLIP: STAR 2.7.10a, CLIPper 1.0.0, merge_peaks 1.0.0 (from yeolab/eclip CWL workflow)

$ Bash example

# Install cwltool (if not already installed)
# pip install cwltool

# Clone the ENCODE eCLIP CWL workflow repository
# git clone https://github.com/yeolab/eclip.git
# cd eclip

# --- Placeholder for reference genome data ---
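# Download human genome (hg38) FASTA, GTF, chromosome sizes, and blacklist regions
# (the peaks deposited for this dataset are hg19-aligned; substitute hg19 equivalents to reproduce them)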
# mkdir -p /path/to/genome_data/hg38
# cd /path/to/genome_data/hg38
# wget -c https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
# gunzip hg38.fa.gz
# wget -c https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz
# gunzip hg38.ncbiRefSeq.gtf.gz
# wget -c https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
# wget -c https://raw.githubusercontent.com/Boyle-Lab/Blacklist/master/lists/hg38-blacklist.v2.bed.gz
# gunzip hg38-blacklist.v2.bed.gz

# --- Placeholder for STAR index generation (if not pre-built) ---
# mkdir -p /path/to/genome_data/hg38/STAR_index
# STAR \
#   --runThreadN 8 \
#   --runMode genomeGenerate \
#   --genomeDir /path/to/genome_data/hg38/STAR_index \
#   --genomeFastaFiles /path/to/genome_data/hg38/hg38.fa \
#   --sjdbGTFfile /path/to/genome_data/hg38/hg38.ncbiRefSeq.gtf \
#   --sjdbOverhang 100   # set to (read length - 1)

# Define input files and parameters for the eCLIP pipeline
# Replace with actual paths to your FASTQ files and genome data
# Assuming single-end reads for simplicity. Adjust for paired-end if needed.
cat << EOF > eclip_job.yaml
fastq_rep1_r1:
  class: File
  path: /path/to/your/eclip_rep1.fastq.gz
fastq_input_r1:
  class: File
  path: /path/to/your/input_control.fastq.gz
genome_fasta:
  class: File
  path: /path/to/genome_data/hg38/hg38.fa
genome_gtf:
  class: File
  path: /path/to/genome_data/hg38/hg38.ncbiRefSeq.gtf
chrom_sizes:
  class: File
  path: /path/to/genome_data/hg38/hg38.chrom.sizes
blacklist_regions:
  class: File
  path: /path/to/genome_data/hg38/hg38-blacklist.v2.bed
output_prefix: my_eclip_experiment
threads: 8
# Optional parameters (uncomment and adjust as needed)
# read_length: 50
# min_read_length: 18
# max_read_length: 100
# min_mapq: 20
# min_peak_width: 5
# max_peak_width: 500
# fdr_threshold: 0.05
# idr_threshold: 0.1
# min_fold_enrichment: 2.0
# min_reads_in_peak: 10
EOF

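# Execute the eCLIP CWL workflow using cwltool.
# The input names in eclip_job.yaml are illustrative; match them to the inputs declared in the workflow's .cwl file.
# Ensure you are in the directory containing eclip.cwl or provide its full path.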
cwltool /path/to/eclip/eclip.cwl eclip_job.yaml
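
Since the job file above is full of placeholder paths, it can be worth confirming that every referenced file exists before starting a long run. The sketch below is one hedged way to do that with standard tools. The helper name `list_missing_paths` and the toy job file are hypothetical, and the parsing assumes the simple `path: value` layout used in the example above rather than general YAML.

```shell
# Toy job file: one path that exists and one that does not.
tmpdir=$(mktemp -d)
touch "${tmpdir}/present.fa"
job="${tmpdir}/toy_job.yaml"
cat > "$job" << EOF
genome_fasta:
  class: File
  path: ${tmpdir}/present.fa
chrom_sizes:
  class: File
  path: ${tmpdir}/missing.chrom.sizes
EOF

# list_missing_paths FILE (hypothetical helper): print each "path:" value
# in FILE that does not exist on disk.
list_missing_paths() {
  sed -n 's/^[[:space:]]*path:[[:space:]]*//p' "$1" | while IFS= read -r p; do
    [ -e "$p" ] || printf '%s\n' "$p"
  done
}

missing=$(list_missing_paths "$job")
if [ -n "$missing" ]; then
  echo "Missing inputs:"
  echo "$missing"
fi
```

This handles unquoted, single-line path values only; quoted values or paths relative to another directory would need extra handling.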


Tools Used

eCLIP

Raw Source Text

library strategy: eCLIP
Reproducible RBM20 peaks (hg19) obtained from replicate WT and R636S HMZ iPSC-CMs compared to size-matched input controls, were used for all down-stream analyses. Downstream bioinformatics were performed according to the default ENCODE eCLIP bioinformatics pipeline as described at from https://www.encodeproject.org/eclip/.
Genome_build: hg19
Supplementary_files_format_and_content: BED format text files of hg19-aligned RBM20 eCLIP in WT iPSC-CM  peak genomic coordinates and annotations
Supplementary_files_format_and_content: BED format text files of hg19-aligned RBM20 eCLIP in RBM20 R636S-HMZ iPSC-CM  peak genomic coordinates and annotations
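
The supplementary files are described as BED-format peak coordinates, so a first look at one usually means counting peaks and checking that the intervals are well formed. The sketch below shows that on a toy file; the coordinates and peak names are made up, and only the first three standard BED columns (chrom, start, end) are relied on.

```shell
# Toy BED file standing in for a downloaded peak file.
tmpdir=$(mktemp -d)
bed="${tmpdir}/toy_peaks.bed"
printf 'chr1\t100\t160\tpeak1\t0\t+\n' > "$bed"
printf 'chr1\t500\t540\tpeak2\t0\t-\n' >> "$bed"
printf 'chr10\t2000\t2050\tpeak3\t0\t+\n' >> "$bed"

# Total number of peaks.
n_peaks=$(awk 'END { print NR }' "$bed")

# Mean peak width (end minus start, since BED intervals are half-open).
mean_width=$(awk -F'\t' '{ total += $3 - $2 } END { if (NR > 0) printf "%.1f", total / NR }' "$bed")

# Rows whose end is not greater than their start (should be zero).
bad_rows=$(awk -F'\t' '$3 <= $2' "$bed" | wc -l | tr -d ' ')

echo "peaks: ${n_peaks}, mean width: ${mean_width} nt, malformed rows: ${bad_rows}"

# Peaks per chromosome.
cut -f1 "$bed" | sort | uniq -c
```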
