GSE167466 Processing Pipeline

RIP-Seq code_examples 18 steps

Publication

Crosstalk between CRISPR-Cas9 and the human transcriptome.

Nature communications (2022) — PMID 35236841

Dataset

GSE167466

Crosstalk between CRISPR-Cas9 and the human transcriptome

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0).

    UMI-tools v1.0.0 GitHub
    $ Bash example
    # Install UMI-tools if not already installed
    # conda install -c bioconda umi-tools
    
    # Reformat sequenced reads to include randomers (UMIs) in read headers.
    # This command extracts UMIs from the start of the read (e.g., 8bp) and appends them to the read header.
    # Adjust --bc-pattern and input/output filenames as per actual data and UMI location.
    umi_tools extract \
        --input input.fastq.gz \
        --output output.fastq.gz \
        --extract-method=string \
        --bc-pattern=NNNNNNNN \
        --log umi_tools_extract.log
  2. 2

    Args: --random-seed 1 --bc-pattern NNNNNNNNNN

    umi_tools (Inferred with models/gemini-2.5-flash) v1.1.2 GitHub
    $ Bash example
    # Install umi_tools (example using conda)
    # conda create -n umi_tools_env umi_tools=1.1.2 -c bioconda -c conda-forge
    # conda activate umi_tools_env
    
    # Example usage of umi_tools extract with the provided arguments.
    # This command assumes input reads are in 'input.fastq.gz' and output will be written to 'output.fastq.gz'.
    # The --bc-pattern NNNNNNNNNN specifies a 10-nucleotide barcode at the start of the read.
    # Adjust input/output filenames and other parameters as per your specific pipeline.
    umi_tools extract \
      --random-seed 1 \
      --bc-pattern NNNNNNNNNN \
      --stdin input.fastq.gz \
      --stdout output.fastq.gz
  3. 3

    Reads were then trimmed with cutadapt (1.14).

    cutadapt v1.14 GitHub
    $ Bash example
    # Install cutadapt (if not already installed)
    # conda install -c bioconda cutadapt=1.14
    
    # Define input and output files
    INPUT_READS="input_reads.fastq.gz"
    TRIMMED_READS="trimmed_reads.fastq.gz"
    
    # Define adapter sequence (replace with actual adapter sequence if known)
    # Common Illumina adapters:
    # TruSeq Universal Adapter: AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCT
    # TruSeq Small RNA 3' Adapter: TGGAATTCTCGGGTGCCAAGGAACTCCAG
    ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" # Example adapter
    
    # Define quality threshold (e.g., Q20)
    QUALITY_THRESHOLD=20
    
    # Define minimum read length after trimming
    MIN_LENGTH=20
    
    # Execute cutadapt for trimming
    cutadapt -a "${ADAPTER_SEQUENCE}" \
             -q "${QUALITY_THRESHOLD}" \
             -m "${MIN_LENGTH}" \
             -o "${TRIMMED_READS}" \
             "${INPUT_READS}"
  4. 4

    Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

    $ Bash example
    # Install umi_tools if not already available
    # conda install -c bioconda umi_tools
    
    # Download reference adapter file
    # wget https://raw.githubusercontent.com/YeoLab/eclip/master/example/inputs/InvRNA_adapters.fasta
    
    # Define input and output files (placeholders for a paired-end eCLIP experiment)
    INPUT_R1="input_R1.fastq.gz"
    INPUT_R2="input_R2.fastq.gz"
    OUTPUT_R1="output_R1.fastq.gz"
    OUTPUT_R2="output_R2.fastq.gz"
    ADAPTER_FILE="InvRNA_adapters.fasta"
    
    # Execute umi_tools extract command
    # This step extracts Unique Molecular Identifiers (UMIs) from reads
    # and places them in the read header, often involving adapter trimming.
    # The specific patterns for UMI extraction (bc-pattern, read2-pattern)
    # are inferred from the YeoLab eCLIP CWL workflow, assuming an 18bp UMI
    # located in Read 2, and using the provided adapter file.
    umi_tools extract \
        --stdin "${INPUT_R1}" \
        --stdout "${OUTPUT_R1}" \
        --read2-in "${INPUT_R2}" \
        --read2-out "${OUTPUT_R2}" \
        --extract-method=regex \
        --bc-pattern=NNNNNNNNNNNNNNNNNN \
        --read2-pattern="^(?P<discard1>.*)(?P<umi_1>.{18})(?P<discard2>.*)" \
        --adapter-file "${ADAPTER_FILE}" \
        --match-read-wildcards \
        --output-encoding=1 \
        --error-correct-threshold=1 \
        --error-rate=0.1 \
        --quality-cutoff=6 \
        --min-umi-length=18
  5. 5

    Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events.

    cutadapt v1.14 GitHub
    $ Bash example
    # Install cutadapt (if not already installed)
    # conda install -c bioconda cutadapt=1.14
    
    # Example command for trimming double-ligation events. 
    # Replace 'ADAPTER_SEQUENCE' with the actual adapter sequence for double-ligation events.
    # Replace 'input_R1.fastq.gz', 'input_R2.fastq.gz' with your input files.
    # Replace 'output_R1_trimmed.fastq.gz', 'output_R2_trimmed.fastq.gz' with your desired output files.
    # The specific adapter sequence for 'double-ligation events' is assay-dependent and not provided in the description.
    # A common approach for eCLIP is to remove the 3' adapter, which might be involved in double-ligation.
    # For example, if the 3' adapter is 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCA', it might be used with -a or -b.
    
    cutadapt -a ADAPTER_SEQUENCE -A ADAPTER_SEQUENCE \
             -o output_R1_trimmed.fastq.gz -p output_R2_trimmed.fastq.gz \
             input_R1.fastq.gz input_R2.fastq.gz
  6. 6

    Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

    eCLIP vNot explicitly versioned, inferred from `clipper` repository's commit history (main logic stable since ~2019) GitHub
    $ Bash example
    # Installation (if not already installed)
    # It is recommended to install clipper in a dedicated conda environment:
    # conda create -n clipper_env python=3.8
    # conda activate clipper_env
    # git clone https://github.com/yeolab/clipper.git
    # cd clipper
    # pip install -r requirements.txt
    # # Ensure the clipper.py script is executable or called with python
    # # For example, add the clipper directory to your PATH:
    # # export PATH=$(pwd):$PATH
    # # Or call directly: python /path/to/clipper/clipper.py
    
    # Download reference data specified in the description
    # The description mentions 'InvRNA*.fasta', assuming 'InvRNA.fasta' from the example inputs.
    wget -nc https://raw.githubusercontent.com/YeoLab/eclip/master/example/inputs/InvRNA.fasta
    
    # Placeholder for input BAM files and output prefix, as they were not provided in the description.
    # Replace these with your actual aligned read (input) and control BAM files.
    INPUT_BAM="path/to/your/input.bam"
    CONTROL_BAM="path/to/your/control.bam" # Optional, but recommended for eCLIP peak calling
    OUTPUT_PREFIX="eclip_peaks_output"
    SPECIES="hg38" # Example species, replace with your actual species (e.g., mm10 for mouse)
    
    # Execute the clipper peak calling command
    # Assuming clipper.py is in your PATH or called with its full path
    clipper.py \
        -s "${SPECIES}" \
        -o "${OUTPUT_PREFIX}" \
        "${INPUT_BAM}" \
        "${CONTROL_BAM}" \
        --match-read-wildcards \
        -O 5 \
        --times 1 \
        -e 0.1 \
        --quality-cutoff 6 \
        -m 18 \
        -a InvRNA.fasta
  7. 7

    Trimmed reads were then mapped with STAR (2.4.0i) against a human-specific repeat element database (RepBase 18.05).

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # --- Reference Data Preparation (RepBase 18.05) ---
    # Placeholder for RepBase 18.05 FASTA file. 
    # You would typically download this from a source like GIRI (Genetic Information Research Institute).
    # Example: wget -O RepBase18.05.fasta.gz "http://www.girinst.org/repbase/update/RepBase18.05.fasta.gz"
    # gunzip RepBase18.05.fasta.gz
    
    REPBASE_FASTA="RepBase18.05.fasta"
    STAR_INDEX_DIR="RepBase_STAR_index"
    
    # Create STAR genome index for RepBase
    mkdir -p "${STAR_INDEX_DIR}"
    
    STAR --runMode genomeGenerate \
         --genomeDir "${STAR_INDEX_DIR}" \
         --genomeFastaFiles "${REPBASE_FASTA}" \
         --genomeSAindexNbases 10 # Adjust based on genome size; 10 is suitable for smaller genomes/databases
    
    # --- Mapping Trimmed Reads with STAR ---
    # Input trimmed reads file
    TRIMMED_READS="trimmed_reads.fastq.gz"
    
    # Output prefix for alignment files
    OUTPUT_PREFIX="mapped_to_repbase"
    
    # Run STAR alignment
    STAR --runMode alignReads \
         --genomeDir "${STAR_INDEX_DIR}" \
         --readFilesIn "${TRIMMED_READS}" \
         --readFilesCommand zcat \
         --outFileNamePrefix "${OUTPUT_PREFIX}." \
         --outSAMtype BAM SortedByCoordinate \
         --outFilterMultimapNmax 1 \
         --outFilterMismatchNmax 3 \
         --outFilterScoreMinOverLread 0.66 \
         --outFilterMatchNminOverLread 0.66 \
         --outReadsUnmapped Fastx \
         --runThreadN 8 # Adjust number of threads as needed
    
  8. 8

    Args: --runThreadN 16 \ --genomeDir human_repbase \ --readFilesIn path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

    STAR (Inferred with models/gemini-2.5-flash) v(Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install STAR if not already available
    # conda install -c bioconda star
    
    # Define variables for clarity
    # The genome directory should contain a STAR index built from a human reference genome (e.g., GRCh38) 
    # and potentially repeat sequences (e.g., from RepBase) as indicated by 'human_repbase'.
    GENOME_DIR="human_repbase" 
    READ_FILE_1="path/to/read1.fastq.gz" # Placeholder for input reads
    OUT_PREFIX="out_prefix" # Prefix for output files
    
    # Execute STAR alignment
    STAR \
        --runThreadN 16 \
        --genomeDir "${GENOME_DIR}" \
        --readFilesIn "${READ_FILE_1}" \
        --outFileNamePrefix "${OUT_PREFIX}" \
        --outReadsUnmapped Fastx \
        --outSAMtype BAM Unsorted \
        --outSAMattributes All \
        --outSAMunmapped Within \
        --outSAMattrRGline ID:foo \
        --outFilterType BySJout \
        --outFilterMultimapNmax 30 \
        --outFilterMultimapScoreRange 1 \
        --outFilterScoreMin 10 \
        --alignEndsType EndToEnd
  9. 9

    Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (GRCh38).

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.4.0i
    
    # Define variables
    INPUT_READS="unmapped_filtered_reads.fastq.gz" # Placeholder for the input reads file
    STAR_INDEX_DIR="/path/to/STAR_index/GRCh38" # Placeholder for the STAR genome index directory
    OUTPUT_PREFIX="aligned_reads" # Prefix for output files
    NUM_THREADS=8 # Placeholder for number of threads
    
    # Run STAR alignment
    STAR --runThreadN ${NUM_THREADS} \
         --genomeDir ${STAR_INDEX_DIR} \
         --readFilesIn ${INPUT_READS} \
         --outFileNamePrefix ${OUTPUT_PREFIX} \
         --outSAMtype BAM SortedByCoordinate \
         --outReadsUnmapped Fastx \
         --outFilterMultimapNmax 20 \
         --outFilterMismatchNmax 10 \
         --alignIntronMin 20 \
         --alignIntronMax 1000000 \
         --alignMatesGapMax 1000000
  10. 10

    Args: --runThreadN 16 \ --genomeDir genomedir \ --readFilesIn /path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

    STAR (Inferred with models/gemini-2.5-flash) v2.7.10a GitHub
    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables
    # Replace with your actual paths and filenames
    GENOME_DIR="/path/to/STAR_index/GRCh38" # Example: STAR genome index directory for human GRCh38
    READ1_FILE="/path/to/your/sample_R1.fastq.gz" # Path to your input FASTQ file (read 1)
    OUTPUT_PREFIX="my_sample_aligned" # Prefix for all output files
    
    # Note: The genome index (GENOME_DIR) must be pre-built using STAR's genomeGenerate command.
    # Example command for genome generation (run once per genome, adjust parameters as needed):
    # STAR --runThreadN 16 --runMode genomeGenerate \
    #      --genomeDir "${GENOME_DIR}" \
    #      --genomeFastaFiles /path/to/GRCh38.primary_assembly.fa \
    #      --sjdbGTFfile /path/to/gencode.v38.annotation.gtf \
    #      --sjdbOverhang 100 # Recommended: (ReadLength - 1)
    
    # Align reads using STAR
    STAR --runThreadN 16 \
         --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READ1_FILE}" \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outReadsUnmapped Fastx \
         --outSAMtype BAM Unsorted \
         --outSAMattributes All \
         --outSAMunmapped Within \
         --outSAMattrRGline ID:foo \
         --outFilterType BySJout \
         --outFilterMultimapNmax 1 \
         --outFilterMultimapScoreRange 1 \
         --outFilterScoreMin 10 \
         --alignEndsType EndToEnd
  11. 11

    Aligned reads were sorted with samtools (1.6)

    samtools v1.6 GitHub
    $ Bash example
    # Install samtools (if not already installed)
    # conda install -c bioconda samtools=1.6
    
    # Sort aligned reads
    samtools sort -o sorted_reads.bam aligned_reads.bam
  12. 12

    Sorted reads were collapsed with umi_tools (1.0.0).

    UMI-tools v1.0.0 GitHub
    $ Bash example
    # Install umi_tools if not already installed
    # conda install -c bioconda umi-tools
    
    # Placeholder for input sorted BAM file (e.g., from alignment)
    INPUT_BAM="sorted_aligned_reads.bam"
    # Placeholder for output collapsed BAM file
    OUTPUT_BAM="collapsed_reads.bam"
    # Placeholder for log file
    LOG_FILE="umi_tools_collapse.log"
    
    # Collapse sorted reads using umi_tools
    # The 'collapse' command groups reads with identical UMIs and mapping positions.
    # The '--method directional' is a common default for collapsing, considering strand information.
    # The '--umi-separator' might be needed if UMIs are not in the read name (e.g., in a separate tag).
    # Assuming UMIs are already extracted and present in the read names or a BAM tag.
    umi_tools collapse \
        -I "${INPUT_BAM}" \
        -S "${OUTPUT_BAM}" \
        --log "${LOG_FILE}" \
        --log-level INFO \
        --method directional
  13. 13

    Args: --random-seed 1 --method unique

    (Inferred with models/gemini-2.5-flash)
    $ Bash example
    some_tool --random-seed 1 --method unique
  14. 14

    BAM files were used to identify peak clusters with Clipper (1.2.2).

    CLIPper v1.2.2 GitHub
    $ Bash example
    # Define input and output files
    INPUT_BAM="sample_ip.bam" # Placeholder for your IP BAM file
    CONTROL_BAM="sample_input.bam" # Placeholder for your control/input BAM file
    OUTPUT_DIR="clipper_peaks_output"
    GENOME_SIZE_FILE="hg38.chrom.sizes" # Placeholder for human hg38 genome sizes
    
    # --- Installation (commented out) ---
    # It is recommended to run CLIPper within a dedicated Python environment.
    # You can clone the repository and install dependencies:
    # git clone https://github.com/yeolab/clipper.git
    # cd clipper
    # pip install -r requirements.txt
    # Or, if available via pip:
    # pip install clipper
    
    # --- Reference Data (commented out) ---
    # Download hg38.chrom.sizes if not available.
    # This file is crucial for CLIPper to determine chromosome lengths.
    # mkdir -p references
    # wget -O references/${GENOME_SIZE_FILE} http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
    # GENOME_SIZE_FILE="references/${GENOME_SIZE_FILE}" # Update path if downloaded
    
    # Create output directory
    mkdir -p "${OUTPUT_DIR}"
    
    # Execute CLIPper to identify peak clusters
    # Parameters:
    # -b: Input BAM file (IP sample)
    # -c: Control BAM file (e.g., size-matched input or IgG control)
    # -g: Genome size file (e.g., hg38.chrom.sizes)
    # -o: Output directory for results
    # -p: Peak type (e.g., 'CLIP' for eCLIP, 'ChIP' for ChIP-seq)
    # -f: False Discovery Rate (FDR) threshold for peak calling (e.g., 0.05)
    # Note: Other parameters like -s (size factor file) might be relevant depending on experimental design.
    python clipper.py \
        -b "${INPUT_BAM}" \
        -c "${CONTROL_BAM}" \
        -g "${GENOME_SIZE_FILE}" \
        -o "${OUTPUT_DIR}" \
        -p CLIP \
        -f 0.05
  15. 15

    Args: --species GRCh38 --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam

    Python-based BAM processing script (Inferred with models/gemini-2.5-flash) vUnknown GitHub
    $ Bash example
    # This command executes a Python-based script or tool for processing BAM files.
    # It likely involves operations related to gene features, given the '--maxgenes' argument,
    # and saves intermediate data using Python's pickle format.
    # The '--species GRCh38' argument indicates that a GRCh38 genome annotation file (e.g., GTF/GFF)
    # would be a required reference dataset for this tool. For GRCh38, common sources include Ensembl or GENCODE.
    # Please replace 'process_bam_script.py' with the actual script or tool name.
    # Also, ensure the GRCh38 annotation file (e.g., 'path/to/GRCh38.gtf') is accessible if required by the script.
    
    process_bam_script.py \
        --species GRCh38 \
        --bam path/to/input.bam \
        --timeout 3600000 \
        --maxgenes 1000000 \
        --save-pickle \
        --outfile path/to/output.bam
  16. 16

    Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.

    $ Bash example
    # Clone the eCLIP repository if not already available
    # git clone https://github.com/yeolab/eclip.git
    # export PATH=$PATH:/path/to/eclip/bin
    
    # Define input files (placeholders)
    # Replace with your actual peak file (e.g., BED or narrowPeak format)
    PEAK_FILE="your_peak_clusters.bed"
    # Replace with your actual IP BAM file
    IP_BAM="your_ip_sample.bam"
    # Replace with your actual INPUT BAM file
    INPUT_BAM="your_input_sample.bam"
    # Define an output prefix for the normalized peak file
    OUTPUT_PREFIX="normalized_peak_clusters"
    
    # Execute the peaksnormalize.pl script to normalize peak clusters
    # The script takes a peak file, an IP BAM file, an INPUT BAM file, and an output prefix.
    peaksnormalize.pl "${PEAK_FILE}" "${IP_BAM}" "${INPUT_BAM}" "${OUTPUT_PREFIX}"
  17. 17

    Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+

    $ Bash example
    # Install Perl (if not already available)
    # conda install -c bioconda perl
    
    # Clone the merge_peaks repository which contains the script
    # git clone https://github.com/yeolab/merge_peaks.git
    # cd merge_peaks
    
    # Example usage: Merge overlapping normalized peak regions from replicates
    # Replace normalized_peaks_repX.bed with your actual input peak files.
    # The script typically takes multiple input BED files and outputs a merged BED file.
    perl compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl \
        normalized_peaks_rep1.bed \
        normalized_peaks_rep2.bed \
        normalized_peaks_rep3.bed \
        > merged_replicate_overlapping_peaks.bed
  18. 18

    Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.

    IDR v2.0.2 GitHub
    $ Bash example
    # Assuming replicate1.compressed.bed and replicate2.compressed.bed are the normalized peak files
    # and we need to rank them by entropy score.
    # make_informationcontent_from_peaks.pl is included within the merge_peaks pipeline.
    # Installation for merge_peaks (containing make_informationcontent_from_peaks.pl):
    # git clone https://github.com/yeolab/merge_peaks.git
    # export PATH=$PATH:/path/to/merge_peaks/scripts
    
    # Rank replicate 1 peaks by entropy score
    make_informationcontent_from_peaks.pl replicate1.compressed.bed replicate1.ranked.bed
    
    # Rank replicate 2 peaks by entropy score
    make_informationcontent_from_peaks.pl replicate2.compressed.bed replicate2.ranked.bed
    
    # IDR installation:
    # conda install -c bioconda idr=2.0.2
    
    # Run IDR with the ranked peak files to determine reproducible peaks.
    # The --rank signal.value parameter is common in eCLIP IDR pipelines, referring to the score column
    # (which is the entropy score in this case, output by make_informationcontent_from_peaks.pl).
    idr --samples replicate1.ranked.bed replicate2.ranked.bed \
        --output-file-prefix idr_reproducible_peaks \
        --rank signal.value \
        --plot \
        --log-output-file idr.log

Tools Used

Raw Source Text
Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0). Args: --random-seed 1 --bc-pattern NNNNNNNNNN
Reads were then trimmed with cutadapt (1.14). Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events. Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18  -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
Trimmed reads were then mapped with STAR (2.4.0i) against a human-specific repeat element database (RepBase 18.05). Args: --runThreadN 16 \  --genomeDir human_repbase \  --readFilesIn path/to/read1 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 30 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (GRCh38). Args: --runThreadN 16 \  --genomeDir genomedir \  --readFilesIn /path/to/read1 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM   Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 1 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Aligned reads were sorted with samtools (1.6)
Sorted reads were collapsed with umi_tools (1.0.0). Args: --random-seed 1 --method unique
BAM files were used to identify peak clusters with Clipper (1.2.2). Args: --species GRCh38 --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam
Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.
Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+
Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
Genome_build: GRCh38
← Back to Analysis