GSE167466 Processing Pipeline

RIP-Seq code_examples 18 steps

Publication

Crosstalk between CRISPR-Cas9 and the human transcriptome.

Nature communications (2022) — PMID 35236841

Dataset

Crosstalk between CRISPR-Cas9 and the human transcriptome

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Generate Jupyter Notebook

Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0).

UMI-tools v1.0.0 GitHub

$ Bash example

# Install UMI-tools if not already installed
# conda install -c bioconda umi-tools

# Reformat sequenced reads to include randomers (UMIs) in read headers.
# This command extracts UMIs from the start of the read (e.g., 8bp) and appends them to the read header.
# Adjust --bc-pattern and input/output filenames as per actual data and UMI location.
umi_tools extract \
    --input input.fastq.gz \
    --output output.fastq.gz \
    --extract-method=string \
    --bc-pattern=NNNNNNNN \
    --log umi_tools_extract.log

View on GitHub

Args: --random-seed 1 --bc-pattern NNNNNNNNNN

umi_tools (Inferred with models/gemini-2.5-flash) v1.1.2 GitHub

$ Bash example

# Install umi_tools (example using conda)
# conda create -n umi_tools_env umi_tools=1.1.2 -c bioconda -c conda-forge
# conda activate umi_tools_env

# Example usage of umi_tools extract with the provided arguments.
# This command assumes input reads are in 'input.fastq.gz' and output will be written to 'output.fastq.gz'.
# The --bc-pattern NNNNNNNNNN specifies a 10-nucleotide barcode at the start of the read.
# Adjust input/output filenames and other parameters as per your specific pipeline.
umi_tools extract \
  --random-seed 1 \
  --bc-pattern NNNNNNNNNN \
  --stdin input.fastq.gz \
  --stdout output.fastq.gz

View on GitHub

Reads were then trimmed with cutadapt (1.14).

cutadapt v1.14 GitHub

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt=1.14

# Define input and output files
INPUT_READS="input_reads.fastq.gz"
TRIMMED_READS="trimmed_reads.fastq.gz"

# Define adapter sequence (replace with actual adapter sequence if known)
# Common Illumina adapters:
# TruSeq Universal Adapter: AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCT
# TruSeq Small RNA 3' Adapter: TGGAATTCTCGGGTGCCAAGGAACTCCAG
ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" # Example adapter

# Define quality threshold (e.g., Q20)
QUALITY_THRESHOLD=20

# Define minimum read length after trimming
MIN_LENGTH=20

# Execute cutadapt for trimming
cutadapt -a "${ADAPTER_SEQUENCE}" \
         -q "${QUALITY_THRESHOLD}" \
         -m "${MIN_LENGTH}" \
         -o "${TRIMMED_READS}" \
         "${INPUT_READS}"

View on GitHub

Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

eCLIP v1.0.0 GitHub

$ Bash example

# Install umi_tools if not already available
# conda install -c bioconda umi_tools

# Download reference adapter file
# wget https://raw.githubusercontent.com/YeoLab/eclip/master/example/inputs/InvRNA_adapters.fasta

# Define input and output files (placeholders for a paired-end eCLIP experiment)
INPUT_R1="input_R1.fastq.gz"
INPUT_R2="input_R2.fastq.gz"
OUTPUT_R1="output_R1.fastq.gz"
OUTPUT_R2="output_R2.fastq.gz"
ADAPTER_FILE="InvRNA_adapters.fasta"

# Execute umi_tools extract command
# This step extracts Unique Molecular Identifiers (UMIs) from reads
# and places them in the read header, often involving adapter trimming.
# The specific patterns for UMI extraction (bc-pattern, read2-pattern)
# are inferred from the YeoLab eCLIP CWL workflow, assuming an 18bp UMI
# located in Read 2, and using the provided adapter file.
umi_tools extract \
    --stdin "${INPUT_R1}" \
    --stdout "${OUTPUT_R1}" \
    --read2-in "${INPUT_R2}" \
    --read2-out "${OUTPUT_R2}" \
    --extract-method=regex \
    --bc-pattern=NNNNNNNNNNNNNNNNNN \
    --read2-pattern="^(?P<discard1>.*)(?P<umi_1>.{18})(?P<discard2>.*)" \
    --adapter-file "${ADAPTER_FILE}" \
    --match-read-wildcards \
    --output-encoding=1 \
    --error-correct-threshold=1 \
    --error-rate=0.1 \
    --quality-cutoff=6 \
    --min-umi-length=18

View on GitHub

Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events.

cutadapt v1.14 GitHub

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt=1.14

# Example command for trimming double-ligation events. 
# Replace 'ADAPTER_SEQUENCE' with the actual adapter sequence for double-ligation events.
# Replace 'input_R1.fastq.gz', 'input_R2.fastq.gz' with your input files.
# Replace 'output_R1_trimmed.fastq.gz', 'output_R2_trimmed.fastq.gz' with your desired output files.
# The specific adapter sequence for 'double-ligation events' is assay-dependent and not provided in the description.
# A common approach for eCLIP is to remove the 3' adapter, which might be involved in double-ligation.
# For example, if the 3' adapter is 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCA', it might be used with -a or -b.

cutadapt -a ADAPTER_SEQUENCE -A ADAPTER_SEQUENCE \
         -o output_R1_trimmed.fastq.gz -p output_R2_trimmed.fastq.gz \
         input_R1.fastq.gz input_R2.fastq.gz

View on GitHub

Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

eCLIP vNot explicitly versioned, inferred from `clipper` repository's commit history (main logic stable since ~2019) GitHub

$ Bash example

# Installation (if not already installed)
# It is recommended to install clipper in a dedicated conda environment:
# conda create -n clipper_env python=3.8
# conda activate clipper_env
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# pip install -r requirements.txt
# # Ensure the clipper.py script is executable or called with python
# # For example, add the clipper directory to your PATH:
# # export PATH=$(pwd):$PATH
# # Or call directly: python /path/to/clipper/clipper.py

# Download reference data specified in the description
# The description mentions 'InvRNA*.fasta', assuming 'InvRNA.fasta' from the example inputs.
wget -nc https://raw.githubusercontent.com/YeoLab/eclip/master/example/inputs/InvRNA.fasta

# Placeholder for input BAM files and output prefix, as they were not provided in the description.
# Replace these with your actual aligned read (input) and control BAM files.
INPUT_BAM="path/to/your/input.bam"
CONTROL_BAM="path/to/your/control.bam" # Optional, but recommended for eCLIP peak calling
OUTPUT_PREFIX="eclip_peaks_output"
SPECIES="hg38" # Example species, replace with your actual species (e.g., mm10 for mouse)

# Execute the clipper peak calling command
# Assuming clipper.py is in your PATH or called with its full path
clipper.py \
    -s "${SPECIES}" \
    -o "${OUTPUT_PREFIX}" \
    "${INPUT_BAM}" \
    "${CONTROL_BAM}" \
    --match-read-wildcards \
    -O 5 \
    --times 1 \
    -e 0.1 \
    --quality-cutoff 6 \
    -m 18 \
    -a InvRNA.fasta

View on GitHub

Trimmed reads were then mapped with STAR (2.4.0i) against a human-specific repeat element database (RepBase 18.05).

STAR v2.4.0i GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# --- Reference Data Preparation (RepBase 18.05) ---
# Placeholder for RepBase 18.05 FASTA file. 
# You would typically download this from a source like GIRI (Genetic Information Research Institute).
# Example: wget -O RepBase18.05.fasta.gz "http://www.girinst.org/repbase/update/RepBase18.05.fasta.gz"
# gunzip RepBase18.05.fasta.gz

REPBASE_FASTA="RepBase18.05.fasta"
STAR_INDEX_DIR="RepBase_STAR_index"

# Create STAR genome index for RepBase
mkdir -p "${STAR_INDEX_DIR}"

STAR --runMode genomeGenerate \
     --genomeDir "${STAR_INDEX_DIR}" \
     --genomeFastaFiles "${REPBASE_FASTA}" \
     --genomeSAindexNbases 10 # Adjust based on genome size; 10 is suitable for smaller genomes/databases

# --- Mapping Trimmed Reads with STAR ---
# Input trimmed reads file
TRIMMED_READS="trimmed_reads.fastq.gz"

# Output prefix for alignment files
OUTPUT_PREFIX="mapped_to_repbase"

# Run STAR alignment
STAR --runMode alignReads \
     --genomeDir "${STAR_INDEX_DIR}" \
     --readFilesIn "${TRIMMED_READS}" \
     --readFilesCommand zcat \
     --outFileNamePrefix "${OUTPUT_PREFIX}." \
     --outSAMtype BAM SortedByCoordinate \
     --outFilterMultimapNmax 1 \
     --outFilterMismatchNmax 3 \
     --outFilterScoreMinOverLread 0.66 \
     --outFilterMatchNminOverLread 0.66 \
     --outReadsUnmapped Fastx \
     --runThreadN 8 # Adjust number of threads as needed

View on GitHub

Args: --runThreadN 16 \ --genomeDir human_repbase \ --readFilesIn path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

STAR (Inferred with models/gemini-2.5-flash) v(Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install STAR if not already available
# conda install -c bioconda star

# Define variables for clarity
# The genome directory should contain a STAR index built from a human reference genome (e.g., GRCh38) 
# and potentially repeat sequences (e.g., from RepBase) as indicated by 'human_repbase'.
GENOME_DIR="human_repbase" 
READ_FILE_1="path/to/read1.fastq.gz" # Placeholder for input reads
OUT_PREFIX="out_prefix" # Prefix for output files

# Execute STAR alignment
STAR \
    --runThreadN 16 \
    --genomeDir "${GENOME_DIR}" \
    --readFilesIn "${READ_FILE_1}" \
    --outFileNamePrefix "${OUT_PREFIX}" \
    --outReadsUnmapped Fastx \
    --outSAMtype BAM Unsorted \
    --outSAMattributes All \
    --outSAMunmapped Within \
    --outSAMattrRGline ID:foo \
    --outFilterType BySJout \
    --outFilterMultimapNmax 30 \
    --outFilterMultimapScoreRange 1 \
    --outFilterScoreMin 10 \
    --alignEndsType EndToEnd

View on GitHub

Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (GRCh38).

STAR v2.4.0i GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star=2.4.0i

# Define variables
INPUT_READS="unmapped_filtered_reads.fastq.gz" # Placeholder for the input reads file
STAR_INDEX_DIR="/path/to/STAR_index/GRCh38" # Placeholder for the STAR genome index directory
OUTPUT_PREFIX="aligned_reads" # Prefix for output files
NUM_THREADS=8 # Placeholder for number of threads

# Run STAR alignment
STAR --runThreadN ${NUM_THREADS} \
     --genomeDir ${STAR_INDEX_DIR} \
     --readFilesIn ${INPUT_READS} \
     --outFileNamePrefix ${OUTPUT_PREFIX} \
     --outSAMtype BAM SortedByCoordinate \
     --outReadsUnmapped Fastx \
     --outFilterMultimapNmax 20 \
     --outFilterMismatchNmax 10 \
     --alignIntronMin 20 \
     --alignIntronMax 1000000 \
     --alignMatesGapMax 1000000

View on GitHub

Args: --runThreadN 16 \ --genomeDir genomedir \ --readFilesIn /path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

STAR (Inferred with models/gemini-2.5-flash) v2.7.10a GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables
# Replace with your actual paths and filenames
GENOME_DIR="/path/to/STAR_index/GRCh38" # Example: STAR genome index directory for human GRCh38
READ1_FILE="/path/to/your/sample_R1.fastq.gz" # Path to your input FASTQ file (read 1)
OUTPUT_PREFIX="my_sample_aligned" # Prefix for all output files

# Note: The genome index (GENOME_DIR) must be pre-built using STAR's genomeGenerate command.
# Example command for genome generation (run once per genome, adjust parameters as needed):
# STAR --runThreadN 16 --runMode genomeGenerate \
#      --genomeDir "${GENOME_DIR}" \
#      --genomeFastaFiles /path/to/GRCh38.primary_assembly.fa \
#      --sjdbGTFfile /path/to/gencode.v38.annotation.gtf \
#      --sjdbOverhang 100 # Recommended: (ReadLength - 1)

# Align reads using STAR
STAR --runThreadN 16 \
     --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READ1_FILE}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outReadsUnmapped Fastx \
     --outSAMtype BAM Unsorted \
     --outSAMattributes All \
     --outSAMunmapped Within \
     --outSAMattrRGline ID:foo \
     --outFilterType BySJout \
     --outFilterMultimapNmax 1 \
     --outFilterMultimapScoreRange 1 \
     --outFilterScoreMin 10 \
     --alignEndsType EndToEnd

View on GitHub

Aligned reads were sorted with samtools (1.6)

samtools v1.6 GitHub

$ Bash example

# Install samtools (if not already installed)
# conda install -c bioconda samtools=1.6

# Sort aligned reads
samtools sort -o sorted_reads.bam aligned_reads.bam

View on GitHub

Sorted reads were collapsed with umi_tools (1.0.0).

UMI-tools v1.0.0 GitHub

$ Bash example

# Install umi_tools if not already installed
# conda install -c bioconda umi-tools

# Placeholder for input sorted BAM file (e.g., from alignment)
INPUT_BAM="sorted_aligned_reads.bam"
# Placeholder for output collapsed BAM file
OUTPUT_BAM="collapsed_reads.bam"
# Placeholder for log file
LOG_FILE="umi_tools_collapse.log"

# Collapse sorted reads using umi_tools
# The 'collapse' command groups reads with identical UMIs and mapping positions.
# The '--method directional' is a common default for collapsing, considering strand information.
# The '--umi-separator' might be needed if UMIs are not in the read name (e.g., in a separate tag).
# Assuming UMIs are already extracted and present in the read names or a BAM tag.
umi_tools collapse \
    -I "${INPUT_BAM}" \
    -S "${OUTPUT_BAM}" \
    --log "${LOG_FILE}" \
    --log-level INFO \
    --method directional

View on GitHub

13
Args: --random-seed 1 --method unique

(Inferred with models/gemini-2.5-flash)
$ Bash example
```
some_tool --random-seed 1 --method unique
```

BAM files were used to identify peak clusters with Clipper (1.2.2).

CLIPper v1.2.2 GitHub

$ Bash example

# Define input and output files
INPUT_BAM="sample_ip.bam" # Placeholder for your IP BAM file
CONTROL_BAM="sample_input.bam" # Placeholder for your control/input BAM file
OUTPUT_DIR="clipper_peaks_output"
GENOME_SIZE_FILE="hg38.chrom.sizes" # Placeholder for human hg38 genome sizes

# --- Installation (commented out) ---
# It is recommended to run CLIPper within a dedicated Python environment.
# You can clone the repository and install dependencies:
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# pip install -r requirements.txt
# Or, if available via pip:
# pip install clipper

# --- Reference Data (commented out) ---
# Download hg38.chrom.sizes if not available.
# This file is crucial for CLIPper to determine chromosome lengths.
# mkdir -p references
# wget -O references/${GENOME_SIZE_FILE} http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
# GENOME_SIZE_FILE="references/${GENOME_SIZE_FILE}" # Update path if downloaded

# Create output directory
mkdir -p "${OUTPUT_DIR}"

# Execute CLIPper to identify peak clusters
# Parameters:
# -b: Input BAM file (IP sample)
# -c: Control BAM file (e.g., size-matched input or IgG control)
# -g: Genome size file (e.g., hg38.chrom.sizes)
# -o: Output directory for results
# -p: Peak type (e.g., 'CLIP' for eCLIP, 'ChIP' for ChIP-seq)
# -f: False Discovery Rate (FDR) threshold for peak calling (e.g., 0.05)
# Note: Other parameters like -s (size factor file) might be relevant depending on experimental design.
python clipper.py \
    -b "${INPUT_BAM}" \
    -c "${CONTROL_BAM}" \
    -g "${GENOME_SIZE_FILE}" \
    -o "${OUTPUT_DIR}" \
    -p CLIP \
    -f 0.05

View on GitHub

Args: --species GRCh38 --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam

Python-based BAM processing script (Inferred with models/gemini-2.5-flash) vUnknown GitHub

$ Bash example

# This command executes a Python-based script or tool for processing BAM files.
# It likely involves operations related to gene features, given the '--maxgenes' argument,
# and saves intermediate data using Python's pickle format.
# The '--species GRCh38' argument indicates that a GRCh38 genome annotation file (e.g., GTF/GFF)
# would be a required reference dataset for this tool. For GRCh38, common sources include Ensembl or GENCODE.
# Please replace 'process_bam_script.py' with the actual script or tool name.
# Also, ensure the GRCh38 annotation file (e.g., 'path/to/GRCh38.gtf') is accessible if required by the script.

process_bam_script.py \
    --species GRCh38 \
    --bam path/to/input.bam \
    --timeout 3600000 \
    --maxgenes 1000000 \
    --save-pickle \
    --outfile path/to/output.bam

View on GitHub

Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.

eCLIP v0.1.5 GitHub

$ Bash example

# Clone the eCLIP repository if not already available
# git clone https://github.com/yeolab/eclip.git
# export PATH=$PATH:/path/to/eclip/bin

# Define input files (placeholders)
# Replace with your actual peak file (e.g., BED or narrowPeak format)
PEAK_FILE="your_peak_clusters.bed"
# Replace with your actual IP BAM file
IP_BAM="your_ip_sample.bam"
# Replace with your actual INPUT BAM file
INPUT_BAM="your_input_sample.bam"
# Define an output prefix for the normalized peak file
OUTPUT_PREFIX="normalized_peak_clusters"

# Execute the peaksnormalize.pl script to normalize peak clusters
# The script takes a peak file, an IP BAM file, an INPUT BAM file, and an output prefix.
peaksnormalize.pl "${PEAK_FILE}" "${IP_BAM}" "${INPUT_BAM}" "${OUTPUT_PREFIX}"

View on GitHub

Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+

eCLIP v0.1.5 GitHub

$ Bash example

# Install Perl (if not already available)
# conda install -c bioconda perl

# Clone the merge_peaks repository which contains the script
# git clone https://github.com/yeolab/merge_peaks.git
# cd merge_peaks

# Example usage: Merge overlapping normalized peak regions from replicates
# Replace normalized_peaks_repX.bed with your actual input peak files.
# The script typically takes multiple input BED files and outputs a merged BED file.
perl compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl \
    normalized_peaks_rep1.bed \
    normalized_peaks_rep2.bed \
    normalized_peaks_rep3.bed \
    > merged_replicate_overlapping_peaks.bed

View on GitHub

Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.

IDR v2.0.2 GitHub

$ Bash example

# Assuming replicate1.compressed.bed and replicate2.compressed.bed are the normalized peak files
# and we need to rank them by entropy score.
# make_informationcontent_from_peaks.pl is included within the merge_peaks pipeline.
# Installation for merge_peaks (containing make_informationcontent_from_peaks.pl):
# git clone https://github.com/yeolab/merge_peaks.git
# export PATH=$PATH:/path/to/merge_peaks/scripts

# Rank replicate 1 peaks by entropy score
make_informationcontent_from_peaks.pl replicate1.compressed.bed replicate1.ranked.bed

# Rank replicate 2 peaks by entropy score
make_informationcontent_from_peaks.pl replicate2.compressed.bed replicate2.ranked.bed

# IDR installation:
# conda install -c bioconda idr=2.0.2

# Run IDR with the ranked peak files to determine reproducible peaks.
# The --rank signal.value parameter is common in eCLIP IDR pipelines, referring to the score column
# (which is the entropy score in this case, output by make_informationcontent_from_peaks.pl).
idr --samples replicate1.ranked.bed replicate2.ranked.bed \
    --output-file-prefix idr_reproducible_peaks \
    --rank signal.value \
    --plot \
    --log-output-file idr.log

View on GitHub

Tools Used

eCLIP STAR

Raw Source Text

Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0). Args: --random-seed 1 --bc-pattern NNNNNNNNNN
Reads were then trimmed with cutadapt (1.14). Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events. Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18  -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
Trimmed reads were then mapped with STAR (2.4.0i) against a human-specific repeat element database (RepBase 18.05). Args: --runThreadN 16 \  --genomeDir human_repbase \  --readFilesIn path/to/read1 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 30 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (GRCh38). Args: --runThreadN 16 \  --genomeDir genomedir \  --readFilesIn /path/to/read1 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM   Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 1 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Aligned reads were sorted with samtools (1.6)
Sorted reads were collapsed with umi_tools (1.0.0). Args: --random-seed 1 --method unique
BAM files were used to identify peak clusters with Clipper (1.2.2). Args: --species GRCh38 --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam
Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.
Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+
Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
Genome_build: GRCh38

← Back to Analysis