GSE279065 eCLIP Data Processing
Publication
LARP6 regulates the mRNA translation of fibrogenic genes in liver fibrosis. bioRxiv: the preprint server for biology (2025). PMID 39868246
Processing Steps
1. Data were processed using the eCLIP pipeline, available at: http://github.com/yeolab/eclip
2. Unique molecular identifiers (UMIs) were extracted from raw sequencing reads with umi_tools extract.
3. After UMI extraction, reads were trimmed of adapter sequences and barcode sequences (eCLIP samples) using cutadapt.
4. Trimmed reads were mapped against RepBase with STAR to remove reads mapping to repetitive sequences (--outFilterMultimapNmax 30 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All).
5. Remaining reads were mapped to the appropriate genome build (GRCh38) using STAR (--outFilterMultimapNmax 1 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All).
6. PCR duplicates were removed from uniquely mapped reads with umi_tools.
7. Peak clusters were identified with CLIPper, available at: https://github.com/YeoLab/clipper
8. Clusters enriched over the corresponding size-matched input (SMInput) were identified using a custom Perl script, available in the main eCLIP repository as: overlap_peakfi_with_bam.pl
9. Overlapping enriched clusters (peaks) were merged with a custom Perl script, available in the main eCLIP repository as: compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl
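The two STAR passes in steps 4 and 5 use the same flags except for --outFilterMultimapNmax (30 against RepBase, 1 against GRCh38). The sketch below shows one way the two command lines could be assembled in Python so that the shared flags are written once. The index directories, FASTQ names, output prefixes, thread count, and the helper name star_command are hypothetical and are not taken from this record. The input/output flags (--genomeDir, --readFilesIn, --outFileNamePrefix, --runThreadN) are standard STAR options that the record does not list. It is an illustration, not the pipeline's own wrapper code.

```python
import shlex

# Flags shared by both STAR passes, copied from processing steps 4 and 5.
SHARED_STAR_FLAGS = [
    "--alignEndsType", "EndToEnd",
    "--outFilterMultimapScoreRange", "1",
    "--outSAMmode", "Full",
    "--outFilterType", "BySJout",
    "--outSAMtype", "BAM", "Unsorted",
    "--outFilterScoreMin", "10",
    "--outReadsUnmapped", "Fastx",
    "--outSAMattributes", "All",
]


def star_command(index_dir, reads_fastq, out_prefix, max_multimap, threads=8):
    """Build a STAR argument list; only max_multimap differs between passes."""
    return [
        "STAR",
        "--runThreadN", str(threads),          # assumed thread count
        "--genomeDir", index_dir,              # hypothetical index location
        "--readFilesIn", reads_fastq,
        "--outFileNamePrefix", out_prefix,
        "--outFilterMultimapNmax", str(max_multimap),
        *SHARED_STAR_FLAGS,
    ]


# Pass 1: map against the repeat index (hypothetical paths).
repeat_cmd = star_command(
    "indexes/repbase", "sample.trimmed.fastq", "sample.rep.", max_multimap=30
)

# Pass 2: with --outReadsUnmapped Fastx, STAR writes the reads that did not
# map in pass 1 to <prefix>Unmapped.out.mate1; those go to the genome index.
genome_cmd = star_command(
    "indexes/GRCh38", "sample.rep.Unmapped.out.mate1", "sample.genome.",
    max_multimap=1,
)

# Print the commands for inspection; running them would need STAR installed
# and real index directories, e.g. via subprocess.run(repeat_cmd, check=True).
print(shlex.join(repeat_cmd))
print(shlex.join(genome_cmd))
```

Keeping the shared flags in one list makes the single intended difference between the passes explicit, and the second pass only sees reads that failed to map to repetitive elements.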
Raw Source Text
Data was processed using the eCLIP pipeline and available at: http://github.com/yeolab/eclip
Unique Molecular Identifiers (UMIs) were extracted from raw sequencing reads with umi_tools extract
Post-umi-extracted reads were trimmed for adapter sequences and barcode sequences (eCLIP samples) using cutadapt.
Trimmed reads were mapped against RepBase with STAR to remove reads mapping to repetitive sequences (--outFilterMultimapNmax 30 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)
Remaining reads were mapped to the appropriate genome build (GRCh38) using STAR aligner (--outFilterMultimapNmax 1 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)
Uniquely mapped reads were removed of PCR duplicates with umi_tools
Peak clusters were identified with CLIPper, available at: https://github.com/YeoLab/clipper
Clusters enriched over corresponding size-matched input (SMInput) were identified using a custom Perl script, available in the main eCLIP repository as: overlap_peakfi_with_bam.pl
Overlapping enriched clusters (peaks) were merged with a custom perl script, available in the main eCLIP repository as: compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl
Assembly: GRCh38
Supplementary files format and content: bigwigs contain RPM-normalized read densities of uniquely-mapped reads
Supplementary files format and content: BED files contain CLIPper peak clusters. Columns 4 and 5 describe the -log10(p-value) and log2(fold) enrichment IP over corresponding SMInput.
Supplementary files format and content: Tab-delimited text file contains the translation efficiency table comparing "rp" to "rnaseq" samples
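The BED description above says that columns 4 and 5 hold the -log10(p-value) and the log2 fold enrichment of IP over SMInput. A minimal sketch of reading such a file and keeping the strongly enriched peaks might look like the following. The cutoffs of 3 for both columns, the names Peak, read_peaks, and filter_peaks, and the example rows are illustrative assumptions and do not come from this record. Only the first five columns are interpreted, and any further columns are ignored.

```python
import io
from typing import Iterable, Iterator, List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    neg_log10_p: float   # BED column 4, per the record's description
    log2_fold: float     # BED column 5, per the record's description


def read_peaks(lines: Iterable[str]) -> Iterator[Peak]:
    """Parse tab-delimited BED lines, skipping blank and header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        yield Peak(fields[0], int(fields[1]), int(fields[2]),
                   float(fields[3]), float(fields[4]))


def filter_peaks(peaks: Iterable[Peak],
                 min_neg_log10_p: float = 3.0,   # assumed cutoff
                 min_log2_fold: float = 3.0      # assumed cutoff
                 ) -> List[Peak]:
    """Keep peaks that meet both thresholds (inclusive)."""
    return [p for p in peaks
            if p.neg_log10_p >= min_neg_log10_p
            and p.log2_fold >= min_log2_fold]


# Made-up rows standing in for a real supplementary BED file.
example = io.StringIO(
    "chr1\t1000\t1050\t5.2\t4.1\t+\n"
    "chr1\t2000\t2040\t1.3\t0.8\t-\n"
    "chr2\t500\t560\t12.0\t3.0\t+\n"
)

kept = filter_peaks(read_peaks(example))
print([(p.chrom, p.start, p.end) for p in kept])
```

For a real file, the io.StringIO object can be replaced with an open file handle, for example `with open(path) as handle: kept = filter_peaks(read_peaks(handle))`.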