RNA binding proteins (RBPs) play essential roles in cellular physiology by interacting with target RNAs. As defects in protein-RNA recognition lead to human disease, UV-crosslinking and immunoprecipitation (CLIP) of ribonuclear complexes followed by deep sequencing (-seq) is critical in constructing protein-RNA maps to expand our understanding of RBP function. However, current CLIP protocols are technically demanding and produce low-complexity libraries, which waste sequencing on PCR duplicates and lead to high experimental failure rates. To enable truly large-scale implementation of CLIP-seq, we have developed an enhanced CLIP methodology (eCLIP) that requires ~10 fewer cycles of amplification, with a concomitant >60% decrease in discarded PCR duplicate reads, while maintaining the ability to identify RNA binding with single-nucleotide resolution. By simplifying the generation of paired IgG and size-matched input controls, eCLIP also dramatically improves specificity in the discovery of authentic binding sites. To demonstrate that eCLIP enables large-scale and robust profiling of RBPs, we completed 102 eCLIP experiments in biological duplicate for a diverse collection of 74 RBPs in HepG2 and K562 cells (available at https://www.encodeproject.org). We establish that eCLIP is comparable in amplification and sample requirements to ChIP-seq, and enables integrative analysis of diverse RBPs to reveal factor-specific profiles, common artifacts for CLIP experiments and RNA-centric perspectives of RBP activity.
Run to trim off both 5' and 3' adapters on both reads.
Command: quality-cutoff 6 -m 18 -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics
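To make the quality-cutoff 6 and -m 18 settings in the command above more concrete, here is a minimal Python sketch of 3' quality trimming followed by a minimum-length check. It follows the running-sum procedure described in the cutadapt documentation, but it is a simplified illustration, not cutadapt's own code. The function names `quality_trim_3prime` and `pair_passes_min_length` are hypothetical, the example quality values are the ones used in the cutadapt documentation rather than eCLIP data, and the pair rule (drop the pair if either read is too short) assumes cutadapt's default pair filter.

```python
def quality_trim_3prime(qualities, cutoff):
    """Return the number of leading bases to keep after 3' quality trimming.

    Walks from the 3' end, accumulating (cutoff - quality). The read is cut
    at the position where that running sum is largest; scanning stops as
    soon as the sum becomes negative.
    """
    running = 0
    best = 0
    keep = len(qualities)  # default: keep the whole read
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - qualities[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


def pair_passes_min_length(len_r1, len_r2, min_length=18):
    """Assumed pair rule: keep the pair only if both trimmed reads are long enough."""
    return len_r1 >= min_length and len_r2 >= min_length


# Phred scores from the cutadapt documentation's worked example.
example = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
print(quality_trim_3prime(example, 10))  # 4 bases kept at cutoff 10
print(quality_trim_3prime(example, 6))   # 7 bases kept at cutoff 6
```

With the lower cutoff of 6 used in this pipeline, fewer trailing bases are removed than at a cutoff of 10, so only clearly poor 3' ends are clipped before the 18-nt length filter is applied.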
Takes output from cutadapt round 1.
Run to trim off the 3' adapters on read 2, to control for double ligation events.
Command: cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics
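The twenty -A sequences in the command above appear to be every 15-nt window tiled across two 27-nt sequences. The Python sketch below regenerates the list on that basis. The two full-length sequences (`SEQ_A`, `SEQ_B`) are inferred here from the overlaps between the listed 15-mers and are not stated in the protocol; the helper names are hypothetical, and the sketch is not a description of how the authors produced the list.

```python
def tile_kmers(seq, k=15):
    """Return every k-nt window of seq, stepping one base at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Assumption: full-length sequences reconstructed from the overlapping -A 15-mers.
SEQ_A = "AACTTGTAGATCGGAAGAGCGTCGTGT"
SEQ_B = "AGGACCAAGATCGGAAGAGCGTCGTGT"


def adapter_args(seqs, k=15):
    """Build a flat ['-A', kmer, '-A', kmer, ...] argument list without duplicates."""
    unique = []
    for seq in seqs:
        for kmer in tile_kmers(seq, k):
            if kmer not in unique:  # the two sequences share their 3' windows
                unique.append(kmer)
    args = []
    for kmer in unique:
        args += ["-A", kmer]
    return args


args = adapter_args([SEQ_A, SEQ_B])
print(len(args) // 2)  # number of distinct 15-mers
print(" ".join(args))
```

The sketch emits the same set of 15-mers as the command, but grouped by source sequence rather than interleaved as in the command line.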
Takes output from cutadapt round 2.
Maps reads to a human-specific version of RepBase to remove repetitive elements; this helps control for spurious artifacts from rRNA (and other) repetitive reads.
Eric L Van Nostrand, Gabriel A Pratt, Alexander A Shishkin, Chelsea Gelboin-Burkhart, Mark Y Fang, Balaji Sundararaman, Steven M Blue, Thai B Nguyen, Christine Surka, Keri Elkins, Rebecca Stanton, Frank Rigo, Mitchell Guttman, Gene W Yeo