GSE166402 Processing Pipeline

RIP-Seq code_examples 18 steps

Publication

Robust single-cell discovery of RNA targets of RNA-binding proteins and ribosomes.

Nature methods (2021) — PMID 33963355

Dataset

GSE166402

Robust single-cell discovery of RNA targets of RNA binding proteins and ribosomes [STAMP_new_eclip]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0).

    UMI-tools v1.0.0 GitHub
    $ Bash example
    # Install umi_tools if not already installed
    # conda install -c bioconda umi_tools=1.0.0
    
    # Define input and output file names (placeholders)
    INPUT_R1="input_R1.fastq.gz"
    INPUT_R2="input_R2.fastq.gz"
    OUTPUT_R1="output_R1.fastq.gz"
    OUTPUT_R2="output_R2.fastq.gz"
    LOG_FILE="umi_tools_extract.log"
    
    # Extract UMIs from read sequences and add them to read headers.
    # This command assumes UMIs are 10bp long and located at the start of Read 1.
    # Adjust --bc-pattern if UMI length or location is different (e.g., NNNNNNNNNN for 10bp UMI).
    # If UMIs are in a separate index read, the command would be different (e.g., --bc-pattern-r2).
    umi_tools extract \
        --bc-pattern=NNNNNNNNNN \
        --read1-in "${INPUT_R1}" \
        --read2-in "${INPUT_R2}" \
        --read1-out "${OUTPUT_R1}" \
        --read2-out "${OUTPUT_R2}" \
        --log "${LOG_FILE}"
  2. 2

    Args: --random-seed 1 --bc-pattern NNNNNNNNNN

    umi_tools (Inferred with models/gemini-2.5-flash) v1.1.2 GitHub
    $ Bash example
    # Install umi_tools
    # conda install -c bioconda umi_tools=1.1.2
    
    # Execute umi_tools extract to extract UMIs based on the pattern
    umi_tools extract --random-seed 1 --bc-pattern NNNNNNNNNN -I input.fastq.gz -S output.fastq.gz
  3. 3

    Reads were then trimmed with cutadapt (1.14).

    cutadapt v1.14 GitHub
    $ Bash example
    # Install cutadapt via conda
    # conda create -n cutadapt_env cutadapt=1.14 -y
    # conda activate cutadapt_env
    
    # Example command for trimming adapters and quality
    # Replace 'input_reads.fastq.gz' with your actual input file path.
    # Replace 'trimmed_reads.fastq.gz' with your desired output file path.
    # The adapter sequence 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCA' is a common Illumina 3' adapter. Adjust if a different adapter was used.
    # -q 20: Trims low-quality bases from the 3' end with a quality threshold of 20.
    # -m 20: Discards reads shorter than 20 bp after trimming.
    cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -q 20 -m 20 -o trimmed_reads.fastq.gz input_reads.fastq.gz
  4. 4

    Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

    $ Bash example
    # Install cutadapt
    # conda install -c bioconda cutadapt=4.0
    
    # Download adapter sequence file (if not already present)
    # The description refers to 'InvRNA*.fasta', and the example inputs contain 'InvRNA_adapters.fasta'.
    # wget https://raw.githubusercontent.com/YeoLab/eclip/master/example/inputs/InvRNA_adapters.fasta
    
    # Example input files (replace 'raw_reads.fastq.gz' with your actual input FASTQ file)
    # raw_reads.fastq.gz: Input FASTQ file containing sequencing reads
    
    cutadapt \
      --match-read-wildcards \
      -O 1 \
      --times 1 \
      -e 0.1 \
      --quality-cutoff 6 \
      -m 18 \
      -a InvRNA_adapters.fasta \
      -o trimmed_reads.fastq.gz \
      raw_reads.fastq.gz
  5. 5

    Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events.

    cutadapt v1.14 GitHub
    $ Bash example
    # Install cutadapt (example using conda)
    # conda install -c bioconda cutadapt=1.14
    
    # Define input and output file paths
    INPUT_R1="input_reads_R1.fastq.gz"
    INPUT_R2="input_reads_R2.fastq.gz"
    OUTPUT_R1="trimmed_reads_R1.fastq.gz"
    OUTPUT_R2="trimmed_reads_R2.fastq.gz"
    
    # Define the adapter sequence to remove.
    # This sequence should correspond to the adapter involved in double-ligation events.
    # For eCLIP, this is typically the 3' adapter sequence.
    # Example: ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
    ADAPTER_SEQUENCE="YOUR_3_PRIME_ADAPTER_SEQUENCE"
    
    # Trim reads to remove double-ligation events using cutadapt
    # -a ADAPTER_SEQUENCE: Trims the 3' end of R1 if ADAPTER_SEQUENCE is found.
    # -A ADAPTER_SEQUENCE: Trims the 3' end of R2 if ADAPTER_SEQUENCE is found.
    # -o: Output file for R1
    # -p: Output file for R2 (paired-end)
    cutadapt -a "${ADAPTER_SEQUENCE}" -A "${ADAPTER_SEQUENCE}" \
             -o "${OUTPUT_R1}" -p "${OUTPUT_R2}" \
             "${INPUT_R1}" "${INPUT_R2}"
  6. 6

    Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

    $ Bash example
    # Install cutadapt if not already installed
    # conda install -c bioconda cutadapt=1.18
    
    # Download example input files (replace with actual paths if already available)
    # wget -P . https://raw.githubusercontent.com/YeoLab/eclip/master/example/inputs/InvRNA_adapter.fasta
    # wget -P . https://raw.githubusercontent.com/YeoLab/eclip/master/example/inputs/InvRNA_R1.fastq.gz
    
    cutadapt \
      -O 5 \
      --quality-cutoff 6 \
      -m 18 \
      -a file:InvRNA_adapter.fasta \
      -o InvRNA_R1_trimmed.fastq.gz \
      InvRNA_R1.fastq.gz
  7. 7

    Trimmed reads were then mapped with STAR (2.4.0i) against a human-specific repeat element database (RepBase 18.05).

    $ Bash example
    # Install STAR if not already available
    # conda install -c bioconda star=2.4.0i
    
    # Define variables
    TRIMMED_READS="trimmed_reads.fastq.gz" # Placeholder for actual trimmed reads file
    REPBASE_STAR_INDEX="/path/to/RepBase_18.05_STAR_index" # Placeholder for the STAR index built from RepBase 18.05
    OUTPUT_DIR="star_repbase_mapping"
    THREADS=8 # Adjust as needed
    
    # Create output directory
    mkdir -p "${OUTPUT_DIR}"
    
    # Run STAR mapping
    STAR --runThreadN "${THREADS}" \
         --genomeDir "${REPBASE_STAR_INDEX}" \
         --readFilesIn "${TRIMMED_READS}" \
         --outFileNamePrefix "${OUTPUT_DIR}/star_repbase_" \
         --outSAMtype BAM SortedByCoordinate \
         --outBAMcompression 6
  8. 8

    Args: --runThreadN 16 \ --genomeDir human_repbase \ --readFilesIn path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

    STAR (Inferred with models/gemini-2.5-flash) vNot specified GitHub
    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables for clarity
    # Replace 'path/to/human_repbase_STAR_index' with the actual path to your STAR genome index for human_repbase.
    GENOME_DIR="path/to/human_repbase_STAR_index"
    # Replace 'path/to/read1.fastq.gz' with the actual path to your input read file(s).
    # For paired-end reads, it would be "path/to/read1.fastq.gz path/to/read2.fastq.gz"
    READ_FILES="path/to/read1.fastq.gz"
    OUT_PREFIX="out_prefix"
    
    # Execute STAR alignment
    STAR \
        --runThreadN 16 \
        --genomeDir "${GENOME_DIR}" \
        --readFilesIn "${READ_FILES}" \
        --outFileNamePrefix "${OUT_PREFIX}" \
        --outReadsUnmapped Fastx \
        --outSAMtype BAM Unsorted \
        --outSAMattributes All \
        --outSAMunmapped Within \
        --outSAMattrRGline ID:foo \
        --outFilterType BySJout \
        --outFilterMultimapNmax 30 \
        --outFilterMultimapScoreRange 1 \
        --outFilterScoreMin 10 \
        --alignEndsType EndToEnd
  9. 9

    Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (hg19).

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.4.0i
    
    # Define variables
    STAR_VERSION="2.4.0i"
    GENOME_DIR="/path/to/hg19_star_index" # Placeholder for indexed hg19 genome directory
    READS_FILE="input_unmapped_filtered_reads.fastq.gz" # Placeholder for input FASTQ file
    OUTPUT_PREFIX="aligned_reads_hg19"
    
    # --- Genome Indexing (Run once for hg19 if index doesn't exist) ---
    # This step requires the hg19 FASTA and GTF files to build the STAR index.
    # Example:
    # GENOME_FASTA="/path/to/hg19.fa"
    # GENOME_GTF="/path/to/hg19.gtf"
    # mkdir -p "${GENOME_DIR}"
    # STAR --runMode genomeGenerate \
    #      --genomeDir "${GENOME_DIR}" \
    #      --genomeFastaFiles "${GENOME_FASTA}" \
    #      --sjdbGTFfile "${GENOME_GTF}" \
    #      --sjdbOverhang 100 \
    #      --runThreadN 8 # Adjust threads as needed based on available resources
    
    # --- Alignment Step ---
    # Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (hg19).
    STAR --runMode alignReads \
         --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READS_FILE}" \
         --readFilesCommand zcat \
         --outFileNamePrefix "${OUTPUT_PREFIX}." \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped None \
         --runThreadN 8 # Adjust threads as needed based on available resources
    
  10. 10

    Args: --runThreadN 16 \ --genomeDir genomedir \ --readFilesIn /path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

    STAR (Inferred with models/gemini-2.5-flash) v2.7.10a GitHub
    $ Bash example
    # Install STAR (example using conda)
    # conda install -c bioconda star
    
    # Define variables
    # Placeholder for human GRCh38 genome index. Replace with your actual path.
    GENOME_DIR="/path/to/STAR_genome_index_GRCh38"
    # Placeholder for input FASTQ files. Assuming paired-end reads based on common RNA-seq practices and 'readFilesIn' plural.
    READ1_FILE="/path/to/sample_R1.fastq.gz"
    READ2_FILE="/path/to/sample_R2.fastq.gz"
    # Placeholder for output file prefix.
    OUTPUT_PREFIX="sample_aligned_"
    
    # Execute STAR alignment
    STAR \
      --runThreadN 16 \
      --genomeDir "${GENOME_DIR}" \
      --readFilesIn "${READ1_FILE}" "${READ2_FILE}" \
      --outFileNamePrefix "${OUTPUT_PREFIX}" \
      --outReadsUnmapped Fastx \
      --outSAMtype BAM Unsorted \
      --outSAMattributes All \
      --outSAMunmapped Within \
      --outSAMattrRGline ID:foo \
      --outFilterType BySJout \
      --outFilterMultimapNmax 1 \
      --outFilterMultimapScoreRange 1 \
      --outFilterScoreMin 10 \
      --alignEndsType EndToEnd
  11. 11

    Aligned reads were sorted with samtools (1.6)

    samtools v1.6 GitHub
    $ Bash example
    # Install samtools (if not already installed)
    # conda install -c bioconda samtools=1.6
    
    # Sort aligned reads (e.g., BAM file)
    samtools sort -o sorted_reads.bam aligned_reads.bam
    
    # Index the sorted BAM file (common practice after sorting)
    samtools index sorted_reads.bam
  12. 12

    Sorted reads were collapsed with umi_tools (1.0.0).

    UMI-tools v1.0.0 GitHub
    $ Bash example
    # Install umi_tools (if not already installed)
    # conda install -c bioconda umi_tools=1.0.0
    
    # Placeholder for input and output files
    # Replace 'input_sorted.bam' with your actual sorted BAM file.
    # Replace 'output_deduplicated.bam' with your desired output file name.
    # This command assumes UMIs are already extracted and stored in the 'RX' tag (default for umi_tools extract).
    # The '--method=directional' is a common and robust deduplication strategy.
    
    umi_tools dedup \
      --input input_sorted.bam \
      --output output_deduplicated.bam \
      --extract-umi-method=tag \
      --umi-tag=RX \
      --method=directional
  13. 13

    Args: --random-seed 1 --method unique

    Generic Data Processing Tool (Inferred with models/gemini-2.5-flash) vUnknown
    $ Bash example
    # This command represents a step in a bioinformatics pipeline.
    # The specific tool is not explicitly stated in the description.
    # The arguments '--random-seed 1' and '--method unique' suggest a data processing
    # or analysis step involving a random seed for reproducibility and a 'unique'
    # method for handling or identifying unique features/items.
    #
    # Please replace 'your_actual_tool_name' with the specific tool
    # that uses these arguments in your pipeline context.
    your_actual_tool_name --random-seed 1 --method unique
  14. 14

    BAM files were used to identify peak clusters with Clipper (1.2.2).

    CLIPper v1.2.2 GitHub
    $ Bash example
    bash
    # Install CLIPper (if not already installed)
    # conda install -c bioconda clipper
    
    # Placeholder for genome size file (e.g., for hg38)
    # Download hg38.chrom.sizes from UCSC or generate it
    # Example: wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
    
    GENOME_SIZE_FILE="hg38.chrom.sizes" # Replace with actual path to your genome size file
    INPUT_BAM="input.bam" # Replace with your input BAM file
    OUTPUT_PREFIX="output_peaks" # Prefix for output files
    
    # Execute CLIPper to identify peak clusters
    # Using common parameters for eCLIP peak calling (FDR 0.05, window size 20, extension size 100)
    clipper -b "${INPUT_BAM}" \
            -s "${GENOME_SIZE_FILE}" \
            -o "${OUTPUT_PREFIX}" \
            -f 0.05 \
            -w 20 \
            -e 100 \
            -p 8 # Example: use 8 processes
    
  15. 15

    Args: --species hg19 --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam

    clipper (Inferred with models/gemini-2.5-flash) vNot specified (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Installation instructions (commented out)
    # Clone the clipper repository
    # git clone https://github.com/yeolab/clipper.git
    # cd clipper
    # Install dependencies (e.g., via pip or conda)
    # conda create -n clipper_env python=3.8
    # conda activate clipper_env
    # pip install -r requirements.txt
    
    # Ensure the clipper script is executable or run with python.
    # The --species hg19 argument indicates the human genome build 19.
    # For clipper, this typically refers to a pre-indexed genome directory or a configuration
    # that points to the hg19 genome FASTA and its annotations (e.g., GTF/GFF).
    # Make sure the necessary reference files for hg19 are accessible to clipper.
    
    # Execution command
    python clipper.py \
        --species hg19 \
        --bam path/to/input.bam \
        --timeout 3600000 \
        --maxgenes 1000000 \
        --save-pickle \
        --outfile path/to/output.bam
  16. 16

    Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.

    $ Bash example
    # Install eclip (if not already installed)
    # The eclip repository (https://github.com/yeolab/eclip) contains the peaksnormalize.pl script.
    # Ensure Perl and necessary dependencies (e.g., bedtools, samtools) are available.
    # For example, using conda:
    # conda create -n eclip_env python=3.8
    # conda activate eclip_env
    # conda install -c bioconda bedtools samtools
    # git clone https://github.com/yeolab/eclip.git
    # export PATH=$PATH:$(pwd)/eclip/bin
    
    # Define input files (placeholders based on description)
    IP_BAM="ip_sample.bam" # BAM file for IP sample
    INPUT_BAM="input_sample.bam" # BAM file for INPUT sample
    PEAK_FILE="input_peaks.bed" # File containing peak clusters (e.g., BED format)
    OUTPUT_PREFIX="normalized_peaks" # Prefix for output files
    
    # Execute the peak normalization command
    peaksnormalize.pl "${PEAK_FILE}" "${IP_BAM}" "${INPUT_BAM}" "${OUTPUT_PREFIX}"
  17. 17

    Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+

    $ Bash example
    # Install dependencies
    # This script is part of the 'merge_peaks' repository, which is often used within the eCLIP pipeline.
    # It may rely on standard Perl modules and external tools like bedtools.
    # conda install -c bioconda bedtools
    
    # Clone the merge_peaks repository to obtain the script
    # git clone https://github.com/yeolab/merge_peaks.git
    # SCRIPT_DIR="merge_peaks" # Directory where the repository was cloned
    
    # Placeholder for input normalized peak regions from replicates
    # These would be generated by previous steps in the eCLIP pipeline (e.g., peak calling and normalization)
    # Example: two replicate peak files in BED format
    INPUT_REPLICATE1_PEAKS="sample_rep1_normalized_peaks.bed"
    INPUT_REPLICATE2_PEAKS="sample_rep2_normalized_peaks.bed"
    # If more replicates, add them as additional arguments: INPUT_REPLICATE3_PEAKS="sample_rep3_normalized_peaks.bed"
    
    # Output merged peak file
    OUTPUT_MERGED_PEAKS="sample_merged_replicate_overlapping_peaks.bed"
    
    # Path to the script
    # Adjust SCRIPT_PATH if the script is in your system's PATH or a different location
    SCRIPT_PATH="${SCRIPT_DIR}/compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl"
    
    # Execute the script to merge overlapping normalized peak regions
    # The script typically takes multiple input BED files as arguments and outputs a single merged BED file to stdout.
    perl "${SCRIPT_PATH}" "${INPUT_REPLICATE1_PEAKS}" "${INPUT_REPLICATE2_PEAKS}" > "${OUTPUT_MERGED_PEAKS}"
  18. 18

    Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.

    IDR v2.0.2 GitHub
    $ Bash example
    # Install IDR (if not already installed)
    # conda install -c bioconda idr
    
    # Install merge_peaks (if not already installed)
    # git clone https://github.com/yeolab/merge_peaks.git
    # export PATH=$PATH:/path/to/merge_peaks/bin
    
    # Define input peak files (replace with actual file paths)
    REP1_PEAKS="rep1_peaks.compressed.bed"
    REP2_PEAKS="rep2_peaks.compressed.bed"
    OUTPUT_PREFIX="idr_reproducible_peaks"
    
    # Step 1: Rank normalized peak files by entropy score
    # make_informationcontent_from_peaks.pl adds entropy as the 5th column (1-indexed)
    make_informationcontent_from_peaks.pl "${REP1_PEAKS}" > "${REP1_PEAKS}.entropy"
    make_informationcontent_from_peaks.pl "${REP2_PEAKS}" > "${REP2_PEAKS}.entropy"
    
    # Sort the entropy-scored peaks by the 5th column (entropy score) in reverse numerical order
    # IDR expects the score column to be sorted for optimal performance, though it can sort internally.
    # The merge_peaks pipeline often sorts explicitly.
    sort -k5gr "${REP1_PEAKS}.entropy" > "${REP1_PEAKS}.entropy.sorted"
    sort -k5gr "${REP2_PEAKS}.entropy" > "${REP2_PEAKS}.entropy.sorted"
    
    # Step 2: Run IDR to determine reproducible peaks
    # --rank-column 5 specifies that the 5th column (entropy score) should be used for ranking
    idr --samples "${REP1_PEAKS}.entropy.sorted" "${REP2_PEAKS}.entropy.sorted" \
        --output-file "${OUTPUT_PREFIX}" \
        --rank-column 5 \
        --plot \
        --log-output-file "${OUTPUT_PREFIX}.log"
    
    # Clean up intermediate files (optional)
    # rm "${REP1_PEAKS}.entropy" "${REP2_PEAKS}.entropy"
    # rm "${REP1_PEAKS}.entropy.sorted" "${REP2_PEAKS}.entropy.sorted"

Tools Used

Raw Source Text
Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0). Args: --random-seed 1 --bc-pattern NNNNNNNNNN
Reads were then trimmed with cutadapt (1.14). Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events. Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18  -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
Trimmed reads were then mapped with STAR (2.4.0i) against a human-specific repeat element database (RepBase 18.05). Args: --runThreadN 16 \  --genomeDir human_repbase \  --readFilesIn path/to/read1 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 30 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (hg19). Args: --runThreadN 16 \  --genomeDir genomedir \  --readFilesIn /path/to/read1 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM   Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 1 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Aligned reads were sorted with samtools (1.6)
Sorted reads were collapsed with umi_tools (1.0.0). Args: --random-seed 1 --method unique
BAM files were used to identify peak clusters with Clipper (1.2.2). Args: --species hg19 --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam
Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.
Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+
Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
Genome_build: hg19
Supplementary_files_format_and_content: BigWig, BED
← Back to Analysis