GSE171553 Processing Pipeline

RIP-Seq, 2 steps

Publication

A multi-scale map of cell structure fusing protein images and interactions.

Nature (2021) — PMID 34819669

Dataset

Mapping cell structure across scales by fusing protein images and interactions

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

After sequencing, raw reads were aligned to GRCh38 and analyzed following the detailed instructions in ENCODE eCLIP-seq Processing Pipeline v2.2 (https://www.encodeproject.org/pipelines/ENCPL357ADL/).

eCLIP v2.2 GitHub

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star

# Define input files and reference genome index
READS_R1="raw_reads_R1.fastq.gz"
READS_R2="raw_reads_R2.fastq.gz"
GENOME_DIR="/path/to/STAR_index/GRCh38" # Placeholder for GRCh38 STAR index (e.g., from ENCODE or UCSC)
OUTPUT_PREFIX="aligned_eCLIP_sample"

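# Align raw reads to GRCh38 using STAR.
# NOTE: the parameter values below are illustrative and have not been checked against
# the ENCODE eCLIP-seq Processing Pipeline v2.2; confirm them in the pipeline documentation.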
STAR \
  --runThreadN 8 \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READS_R1}" "${READS_R2}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}_" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes All \
  --outFilterMultimapNmax 20 \
  --outFilterMismatchNmax 999 \
  --outFilterMismatchNoverLmax 0.04 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --outFilterScoreMinOverLread 0.75 \
  --outFilterMatchNminOverLread 0.75 \
  --limitBAMsortRAM 30000000000

# The ENCODE eCLIP-seq Processing Pipeline v2.2 involves further steps around alignment, such as:
# - UMI handling and adapter trimming before alignment (e.g., using UMI-tools and cutadapt)
# - Removal of PCR duplicates after alignment (e.g., using UMI-tools or custom scripts)
# - Filtering and sorting of BAM files (e.g., using samtools)
# - Peak calling (e.g., using CLIPper: https://github.com/yeolab/clipper)
# - Normalization of peaks against the paired input control
# - IDR analysis for reproducible peaks (e.g., using merge_peaks: https://github.com/yeolab/merge_peaks)
# - Generation of bigWig tracks for visualization

Consistent with the ENCODE standard, reads aligning to artifact-enriched or repetitive genomic regions were removed.

bedtools (Inferred with models/gemini-2.5-flash) v2.30.0 GitHub

$ Bash example

# Install bedtools if not already installed
# conda install -c bioconda bedtools=2.30.0

# Define input and output file paths
INPUT_BAM="aligned_reads.bam"
OUTPUT_BAM="filtered_reads.bam"
BLACKLIST_BED="GRCh38_unified_blacklist_V2.bed"

# Download the ENCODE blacklist file for GRCh38 if not available
# NOTE: the URL below is unverified; confirm the blacklist location before use
# wget -O "${BLACKLIST_BED}.gz" https://raw.githubusercontent.com/ENCODE-DCC/chip-seq-pipeline2/master/references/GRCh38_unified_blacklist_V2.bed.gz
# gunzip -f "${BLACKLIST_BED}.gz"

# Remove reads aligning to artifact-enriched or repetitive genomic regions using bedtools intersect -v
bedtools intersect -v -a "${INPUT_BAM}" -b "${BLACKLIST_BED}" > "${OUTPUT_BAM}"

Tools Used

eCLIP

Raw Source Text

After sequencing, raw reads were aligned to GRCh38 and analyzed following the detailed instructions in ENCODE eCLIP-seq Processing Pipeline v2.2 (https://www.encodeproject.org/pipelines/ENCPL357ADL/).
Consistent with the ENCODE standard, reads aligning to artifact-enriched or repetitive genomic regions were removed.
Genome_build: GRCh38
Supplementary_files_format_and_content: The processed file contains reproducible and significant peaks of aligned reads at IDR cutoff of 0.01, P ≤ 0.001, and fold enrichment ≥ 8. All peaks have annotated genic region based on overlap with GENCODE v26 transcripts.
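
The significance thresholds in the processed-file description can be expressed as a simple filter. The following is a minimal sketch, not the pipeline's own code. It assumes a tab-separated peak file whose fourth and fifth columns hold -log10(P) and log2(fold enrichment); the source does not confirm this layout for this dataset. The file names and toy values are hypothetical, and IDR filtering is not covered.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file names, for illustration only.
PEAKS_BED="toy_peaks.bed"
FILTERED_BED="toy_peaks.filtered.bed"

# Toy input with made-up values. Assumed columns:
# chrom, start, end, -log10(P), log2(fold enrichment), strand
printf 'chr1\t100\t200\t5.2\t4.1\t+\n'  > "${PEAKS_BED}"
printf 'chr1\t500\t600\t2.0\t5.0\t-\n' >> "${PEAKS_BED}"
printf 'chr2\t300\t380\t3.0\t3.0\t+\n' >> "${PEAKS_BED}"
printf 'chr2\t900\t990\t6.5\t1.2\t-\n' >> "${PEAKS_BED}"

# P <= 0.001 corresponds to -log10(P) >= 3.
# Fold enrichment >= 8 corresponds to log2(fold) >= 3.
MIN_NEG_LOG10_P=3
MIN_LOG2_FOLD=3

# Keep only peaks that pass both thresholds.
awk -F'\t' -v p="${MIN_NEG_LOG10_P}" -v f="${MIN_LOG2_FOLD}" \
  '$4 >= p && $5 >= f' "${PEAKS_BED}" > "${FILTERED_BED}"

KEPT=$(wc -l < "${FILTERED_BED}" | tr -d ' ')
echo "Kept ${KEPT} peaks"
```

If the peak file stores raw P values or linear fold enrichment instead, compare against 0.001 and 8 directly rather than the log-transformed cutoffs.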
