GSE171553 Processing Pipeline
RIP-Seq
2 steps
Publication
A multi-scale map of cell structure fusing protein images and interactions. Nature (2021), PMID 34819669
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
After sequencing, raw reads were aligned to GRCh38 and analyzed following the detailed instructions in ENCODE eCLIP-seq Processing Pipeline v2.2 (https://www.encodeproject.org/pipelines/ENCPL357ADL/).
$ Bash example
# Install STAR (example using conda)
# conda install -c bioconda star

# Define input files and reference genome index
READS_R1="raw_reads_R1.fastq.gz"
READS_R2="raw_reads_R2.fastq.gz"
GENOME_DIR="/path/to/STAR_index/GRCh38"  # Placeholder path to a GRCh38 STAR index
OUTPUT_PREFIX="aligned_eCLIP_sample"

# Align raw reads to GRCh38 using STAR.
# The parameter values below are illustrative; check them against the
# ENCODE eCLIP-seq Processing Pipeline v2.2 documentation before use.
STAR \
  --runThreadN 8 \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READS_R1}" "${READS_R2}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}_" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes All \
  --outFilterMultimapNmax 20 \
  --outFilterMismatchNmax 999 \
  --outFilterMismatchNoverLmax 0.04 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --outFilterScoreMinOverLread 0.75 \
  --outFilterMatchNminOverLread 0.75 \
  --limitBAMsortRAM 30000000000

# The ENCODE eCLIP-seq Processing Pipeline v2.2 also includes steps such as:
# - Adapter trimming (before alignment) and removal of PCR duplicates (often handled by UMI-tools or custom scripts)
# - Filtering and sorting of BAM files (e.g., using samtools and bedtools)
# - Peak calling (e.g., using CLIPper: https://github.com/yeolab/clipper)
# - Normalization of peaks against the input control
# - IDR analysis for reproducible peaks (e.g., using merge_peaks: https://github.com/yeolab/merge_peaks)
# - Generation of bigWig tracks for visualization
2
Consistent with the ENCODE standard, reads aligning to artifact-enriched or repetitive genomic regions were removed.
$ Bash example
# Install bedtools if not already installed
# conda install -c bioconda bedtools=2.30.0

# Define input and output file paths
INPUT_BAM="aligned_reads.bam"
OUTPUT_BAM="filtered_reads.bam"
BLACKLIST_BED="GRCh38_unified_blacklist_V2.bed"

# Download ENCODE blacklist file for GRCh38 if not available
# mkdir -p reference
# wget -O "${BLACKLIST_BED}.gz" https://raw.githubusercontent.com/ENCODE-DCC/chip-seq-pipeline2/master/references/GRCh38_unified_blacklist_V2.bed.gz
# gunzip -f "${BLACKLIST_BED}.gz"

# Remove reads aligning to artifact-enriched or repetitive genomic regions using bedtools intersect -v
bedtools intersect -v -a "${INPUT_BAM}" -b "${BLACKLIST_BED}" > "${OUTPUT_BAM}"
Tools Used
Raw Source Text
After sequencing, raw reads were aligned to GRCh38 and analyzed following the detailed instructions in ENCODE eCLIP-seq Processing Pipeline v2.2 (https://www.encodeproject.org/pipelines/ENCPL357ADL/). Consistent with the ENCODE standard, reads aligning to artifact-enriched or repetitive genomic regions were removed.
Genome_build: GRCh38
Supplementary_files_format_and_content: The processed file contains reproducible and significant peaks of aligned reads at IDR cutoff of 0.01, P ≤ 0.001, and fold enrichment ≥ 8. All peaks have annotated genic region based on overlap with GENCODE v26 transcripts.
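The significance thresholds quoted above (P ≤ 0.001 and fold enrichment ≥ 8) can be illustrated with a small filter. This is a minimal sketch, not the authors' script. It assumes a tab-separated, narrowPeak-like peak file in which column 7 holds log2 fold enrichment and column 8 holds -log10 P, so the thresholds become log2 fold enrichment ≥ 3 and -log10 P ≥ 3. The column layout, the file names, and the filter_peaks helper are all hypothetical, and the IDR cutoff of 0.01 is assumed to have been applied upstream.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a temporary directory so the demo leaves no files behind in the cwd.
workdir="$(mktemp -d)"

# Hypothetical demo peaks (tab-separated; this column layout is an assumption):
# chrom  start  end  name  score  strand  log2_fold_enrichment  minus_log10_p
printf 'chr1\t100\t150\tpeakA\t0\t+\t3.5\t4.2\n'  > "${workdir}/demo_peaks.bed"
printf 'chr1\t300\t360\tpeakB\t0\t-\t2.1\t5.0\n' >> "${workdir}/demo_peaks.bed"
printf 'chr2\t500\t540\tpeakC\t0\t+\t4.0\t1.3\n' >> "${workdir}/demo_peaks.bed"
printf 'chr2\t800\t870\tpeakD\t0\t-\t3.0\t3.0\n' >> "${workdir}/demo_peaks.bed"

# Keep peaks passing both thresholds:
#   fold enrichment >= 8  is equivalent to  log2 fold enrichment >= 3
#   P <= 0.001            is equivalent to  -log10 P >= 3
filter_peaks() {
  awk -F'\t' -v min_l2fe=3 -v min_l10p=3 \
    '$7 >= min_l2fe && $8 >= min_l10p' "$1"
}

filter_peaks "${workdir}/demo_peaks.bed" > "${workdir}/demo_peaks.filtered.bed"

# Print the names of the peaks that survive the filter.
cut -f4 "${workdir}/demo_peaks.filtered.bed"
```

In the demo data, peakB fails the fold-enrichment threshold and peakC fails the P-value threshold, so only peakA and peakD are kept. If a real peak file stores raw fold enrichment or raw P values instead of log-transformed ones, the comparisons would need to be adjusted accordingly.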