GSE173498 Processing Pipeline


Publication

Discovery and functional interrogation of SARS-CoV-2 protein-RNA interactions.

Research Square (2022), PMID 35313591

Dataset

GSE173498

Discovery and functional interrogation of the virus and host RNA interactome of SARS-CoV-2 proteins [eCLIP]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0).

    UMI-tools v1.0.0 GitHub
    $ Bash example
    # Install UMI-tools if not already installed
    # conda install -c bioconda umi_tools=1.0.0
    
    # Placeholders: replace 'input.fastq.gz' with the FASTQ file that carries the inline randomer (UMI)
    # and 'output.fastq.gz' with the destination file whose read headers will hold the UMI.
    # '--bc-pattern=NNNNNNNNNN' is the pattern reported for this dataset: with the default
    # '--extract-method=string' it treats the first 10 bases of each read as the UMI.
    # A regex pattern such as '^(?P<umi_1>.{10})' would additionally require '--extract-method=regex'.
    # If the UMI sat in a separate read, a paired call with '--read2-in' would be needed instead.
    
    umi_tools extract --random-seed 1 --bc-pattern=NNNNNNNNNN -I input.fastq.gz -S output.fastq.gz --log=umi_tools_extract.log
  2. 2

    Args: --random-seed 1 --bc-pattern NNNNNNNNNN

    UMI-tools v1.0.0 GitHub
    $ Bash example
    # These arguments belong to the umi_tools extract command of step 1; the source record
    # does not mention a separate demultiplexing script.
    # conda install -c bioconda umi_tools=1.0.0
    
    # Placeholder file names; the reads are assumed to be single-end with an inline 10-nt randomer.
    umi_tools extract \
      --random-seed 1 \
      --bc-pattern NNNNNNNNNN \
      -I input_read1.fastq.gz \
      -S umi_extracted_read1.fastq.gz \
      --log umi_tools_extract.log
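    To picture what "reformatted to include randomers in read headers" means, the following is a minimal sketch in plain awk. It is not umi_tools itself: the helper name move_umi_to_header and the toy record are hypothetical, and it only mimics the effect of a string pattern of ten N characters on single-end reads (first 10 bases moved to the read ID, sequence and qualities shortened to match).

    ```shell
    # Hypothetical helper (not part of umi_tools): move the first N bases of each
    # read into its header and trim the sequence and quality strings accordingly.
    move_umi_to_header() {
      local umi_len="$1"
      awk -v n="$umi_len" '
        NR % 4 == 1 { header = $0; next }              # hold the header until the UMI is known
        NR % 4 == 2 {
          umi = substr($0, 1, n)                       # leading randomer
          split(header, parts, " ")                    # read ID is the first whitespace-delimited field
          rest = substr(header, length(parts[1]) + 1)  # anything after the read ID
          print parts[1] "_" umi rest
          print substr($0, n + 1)
          next
        }
        NR % 4 == 0 { print substr($0, n + 1); next }  # trim qualities to match the sequence
        { print }                                      # "+" separator line
      '
    }

    # Toy record: a 10-nt randomer followed by a 7-nt insert
    printf '@read1 1:N:0\nACGTACGTACGGATTAC\n+\nIIIIIIIIIIFFFFFFF\n' | move_umi_to_header 10
    ```

    Keeping the randomer in the read ID is what lets the later collapsing step group reads by UMI after alignment.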
  3. 3

    Reads were then trimmed with cutadapt (1.14).

    cutadapt v1.14 GitHub
    $ Bash example
    # Install cutadapt (if not already installed)
    # conda install -c bioconda cutadapt=1.14
    
    # Define input and output files (placeholders)
    INPUT_READS="input_reads.fastq.gz"
    TRIMMED_READS="trimmed_reads.fastq.gz"
    
    # Adapter sequence: placeholder only (a common Illumina adapter); the source record instead
    # points to the InvRNA*.fasta adapter files in the YeoLab/eclip repository.
    ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
    
    # Execute cutadapt for trimming, using the thresholds reported for this dataset
    # -a: 3' adapter sequence to remove
    # --quality-cutoff 6: trim low-quality bases from the 3' end below quality 6
    # -m 18: discard reads shorter than 18 bp after trimming
    # -o: output file for trimmed reads
    cutadapt -a "${ADAPTER_SEQUENCE}" --quality-cutoff 6 -m 18 -o "${TRIMMED_READS}" "${INPUT_READS}"
  4. 4

    Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

    $ Bash example
    # Install cutadapt if not already installed
    # conda install -c bioconda cutadapt=1.14
    
    # Obtain the InvRNA*.fasta adapter files from
    # https://github.com/YeoLab/eclip/tree/master/example/inputs/
    # 'InvRNA.fasta' below is a placeholder for whichever of those files matches your library.
    
    # Execute cutadapt for adapter trimming and quality filtering (first pass, -O 1)
    cutadapt \
      --match-read-wildcards \
      -O 1 \
      --times 1 \
      -e 0.1 \
      --quality-cutoff 6 \
      -m 18 \
      -a file:InvRNA.fasta \
      -o trimmed_reads.fastq.gz \
      input_reads.fastq.gz
  5. 5

    Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events.

    cutadapt v1.14 GitHub
    $ Bash example
    # Install cutadapt (if not already installed)
    # conda install -c bioconda cutadapt=1.14
    
    # Define input and output file paths (placeholders)
    INPUT_READ1="reads_R1.fastq.gz"
    INPUT_READ2="reads_R2.fastq.gz"
    OUTPUT_READ1="trimmed_reads_R1.fastq.gz"
    OUTPUT_READ2="trimmed_reads_R2.fastq.gz"
    
    # Adapter sequences: placeholders (common Illumina adapters), not the ones reported for this dataset.
    # The source record trims with the InvRNA*.fasta adapters and a minimum overlap of 5 (-O 5).
    # A double-ligation event leaves a second adapter copy at the 3' end, which the first pass
    # (--times 1) does not remove; hence the second pass.
    ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
    ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
    
    # Execute cutadapt a second time to remove the remaining adapter copy
    # -a: 3' adapter of read 1; -A: 3' adapter of read 2 (the paired-end layout is an assumption here)
    # --minimum-length 18 discards reads shorter than 18 bp after trimming (the reported -m 18)
    cutadapt -a "${ADAPTER_R1}" -A "${ADAPTER_R2}" \
             -o "${OUTPUT_READ1}" -p "${OUTPUT_READ2}" \
             --minimum-length 18 \
             "${INPUT_READ1}" "${INPUT_READ2}"
  6. 6

    Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

    $ Bash example
    # Install cutadapt (if not already installed)
    # conda install -c bioconda cutadapt=1.14
    
    # These arguments belong to the second cutadapt pass of step 5, not to a peak caller.
    # Obtain the InvRNA*.fasta adapter files from
    # https://github.com/YeoLab/eclip/tree/master/example/inputs/
    # 'InvRNA.fasta' is a placeholder for whichever of those files matches your library.
    
    # Placeholders: output of the first trimming pass and the twice-trimmed result
    INPUT_READS="trimmed_reads.fastq.gz"
    OUTPUT_READS="trimmed_twice_reads.fastq.gz"
    
    # Second pass: identical to the first except for the minimum adapter overlap (-O 5)
    cutadapt \
      --match-read-wildcards \
      -O 5 \
      --times 1 \
      -e 0.1 \
      --quality-cutoff 6 \
      -m 18 \
      -a file:InvRNA.fasta \
      -o "${OUTPUT_READS}" \
      "${INPUT_READS}"
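    The two reported cutadapt argument lists differ only in the minimum adapter overlap (-O 1 for the first pass, -O 5 for the second). A small dry-run sketch makes that explicit. The helper name trim_cmd, the file names, and the 'file:' prefix for passing an adapter FASTA are assumptions; the function only prints the commands and does not run cutadapt.

    ```shell
    # Hypothetical helper: print (not run) one cutadapt pass.
    # Only the minimum overlap (-O) changes between the two passes.
    trim_cmd() {
      local overlap="$1" adapters="$2" in_fq="$3" out_fq="$4"
      printf 'cutadapt --match-read-wildcards -O %s --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a file:%s -o %s %s\n' \
        "$overlap" "$adapters" "$out_fq" "$in_fq"
    }

    ADAPTERS="InvRNA_adapters.fasta"  # placeholder name for one of the InvRNA*.fasta files
    trim_cmd 1 "$ADAPTERS" umi_extracted.fastq.gz trimmed_once.fastq.gz
    trim_cmd 5 "$ADAPTERS" trimmed_once.fastq.gz trimmed_twice.fastq.gz
    ```

    Printing the commands first is a cheap way to review the chaining of input and output files before any data is touched.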
  7. 7

    Trimmed reads were then mapped with STAR (2.4.0i) against a repeat element database (RepBase 18.05).

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.4.0i
    
    # Define variables
    STAR_VERSION="2.4.0i"
    REPBASE_FASTA="repbase_18.05.fasta" # Placeholder for the RepBase 18.05 FASTA file. Obtain from RepBase (e.g., http://www.girinst.org/repbase/update/index.html)
    GENOME_DIR="STAR_RepBase_index"
    TRIMMED_READS="trimmed_reads.fastq.gz" # Placeholder for trimmed reads (e.g., output from a trimming step)
    OUTPUT_PREFIX="repbase_mapping"
    
    # 1. Create STAR genome index for RepBase 18.05
    # This step assumes you have the RepBase 18.05 FASTA file. 
    # For mapping against a repeat database, a GTF/GFF is typically not used, and splicing is disabled.
    mkdir -p "${GENOME_DIR}"
    STAR --runMode genomeGenerate \
         --genomeDir "${GENOME_DIR}" \
         --genomeFastaFiles "${REPBASE_FASTA}" \
         --runThreadN 8 # Adjust threads as needed
    
    # 2. Map trimmed reads to the RepBase index
    # Generic settings; the arguments reported for this dataset are listed in the next step.
    # Gzipped FASTQ input requires --readFilesCommand zcat.
    STAR --version # To confirm the version used
    STAR --runMode alignReads \
         --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${TRIMMED_READS}" \
         --readFilesCommand zcat \
         --runThreadN 8 \
         --outFileNamePrefix "${OUTPUT_PREFIX}_" \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outFilterMultimapNmax 100 \
         --outFilterMismatchNmax 10 \
         --alignIntronMax 1 \
         --alignMatesGapMax 1000000 \
         --limitBAMsortRAM 30000000000 # Adjust RAM based on available resources (e.g., 30GB)
    
    # Optional: Index the resulting BAM file
    samtools index "${OUTPUT_PREFIX}_Aligned.sortedByCoordinate.bam"
  8. 8

    Args: --runThreadN 16 \ --genomeDir human_repbase \ --readFilesIn path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

    STAR v2.4.0i GitHub
    $ Bash example
    # Install STAR (example using conda)
    # conda install -c bioconda star
    
    # Note: the 'human_repbase' directory must contain a pre-built STAR index of the
    # repeat element database (RepBase 18.05) named in step 7.
    # Example of index generation (run once, not part of this step; the FASTA path is a placeholder):
    # STAR --runThreadN <threads> --runMode genomeGenerate --genomeDir human_repbase \
    #      --genomeFastaFiles /path/to/repbase_sequences.fa
    
    # 'path/to/read1' is a placeholder for the trimmed reads; a single file implies single-end input.
    # For paired-end data the argument would be '--readFilesIn path/to/read1 path/to/read2'.
    # Gzipped input additionally needs '--readFilesCommand zcat'.
    
    STAR --runThreadN 16 \
         --genomeDir human_repbase \
         --readFilesIn path/to/read1 \
         --outFileNamePrefix out_prefix \
         --outReadsUnmapped Fastx \
         --outSAMtype BAM Unsorted \
         --outSAMattributes All \
         --outSAMunmapped Within \
         --outSAMattrRGline ID:foo \
         --outFilterType BySJout \
         --outFilterMultimapNmax 30 \
         --outFilterMultimapScoreRange 1 \
         --outFilterScoreMin 10 \
         --alignEndsType EndToEnd
  9. 9

    Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (hg19/ChlSab2).

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.4.0i
    
    # Define variables
    # Replace with the actual path to the STAR genome index.
    # ChlSab2 is the African green monkey assembly; whether it was indexed together with hg19 or
    # separately is not stated in the source, so the combined index below is only one possibility.
    GENOME_DIR="/path/to/STAR_genome_index/hg19_ChlSab2" # Example: /path/to/STAR_genome_index/hg19_ChlSab2
    INPUT_FASTQ="filtered_unmapped_reads.fastq.gz" # Replace with your input FASTQ file
    OUTPUT_PREFIX="aligned_reads"
    THREADS=8 # Adjust as needed
    
    # Example for creating a STAR genome index for hg19 and ChlSab2 (run once)
    # Ensure you have the fasta files for hg19 (e.g., from UCSC) and ChlSab2 (e.g., from NCBI/Ensembl), 
    # and optionally a GTF for hg19 (e.g., from GENCODE or UCSC).
    # STAR --runMode genomeGenerate \
    #      --genomeDir ${GENOME_DIR} \
    #      --genomeFastaFiles /path/to/hg19.fa /path/to/ChlSab2.fa \
    #      --sjdbGTFfile /path/to/hg19.gtf \
    #      --runThreadN ${THREADS}
    
    # Map reads with STAR (generic settings; the reported arguments are listed in the next step)
    # Gzipped FASTQ input requires --readFilesCommand zcat.
    STAR --runMode alignReads \
         --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${INPUT_FASTQ}" \
         --readFilesCommand zcat \
         --outFileNamePrefix "${OUTPUT_PREFIX}_" \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outFilterMultimapNmax 20 \
         --outFilterScoreMinOverLread 0.66 \
         --outFilterMatchNminOverLread 0.66 \
         --runThreadN "${THREADS}"
  10. 10

    Args: --runThreadN 16 \ --genomeDir genomedir \ --readFilesIn /path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

    STAR v2.4.0i GitHub
    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Placeholder for genome directory (hg19 or ChlSab2 for this dataset)
    # This directory should contain the STAR genome index generated with STAR --runMode genomeGenerate
    GENOME_DIR="genomedir" # Replace with actual path to STAR genome index
    
    # Placeholder for the input FASTQ: the reads left unmapped by the repeat-element step,
    # which STAR writes uncompressed as '<prefix>Unmapped.out.mate1' when run with '--outReadsUnmapped Fastx'
    READ_FILE_1="/path/to/read1" # Replace with actual path to your read 1 file
    # If paired-end, use: READ_FILE_2="/path/to/read2"
    
    # Placeholder for output prefix
    OUT_PREFIX="out_prefix" # Replace with desired output file prefix
    
    STAR \
      --runThreadN 16 \
      --genomeDir "${GENOME_DIR}" \
      --readFilesIn "${READ_FILE_1}" \
      --outFileNamePrefix "${OUT_PREFIX}" \
      --outReadsUnmapped Fastx \
      --outSAMtype BAM Unsorted \
      --outSAMattributes All \
      --outSAMunmapped Within \
      --outSAMattrRGline ID:foo \
      --outFilterType BySJout \
      --outFilterMultimapNmax 1 \
      --outFilterMultimapScoreRange 1 \
      --outFilterScoreMin 10 \
      --alignEndsType EndToEnd
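    The two reported STAR argument lists are identical except for the index directory and --outFilterMultimapNmax (30 against the repeat database, 1 against the genome), and the second run consumes the reads the first run left unmapped. The dry-run sketch below shows that handoff. The helper name star_cmd and the prefixes are hypothetical; the function prints the commands rather than running STAR.

    ```shell
    # Hypothetical helper: print (not run) a STAR call with the options shared by both mapping steps.
    star_cmd() {
      local genome_dir="$1" reads="$2" prefix="$3" multimap_max="$4"
      printf 'STAR --runThreadN 16 --genomeDir %s --readFilesIn %s --outFileNamePrefix %s --outReadsUnmapped Fastx --outSAMtype BAM Unsorted --outSAMattributes All --outSAMunmapped Within --outSAMattrRGline ID:foo --outFilterType BySJout --outFilterMultimapNmax %s --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --alignEndsType EndToEnd\n' \
        "$genome_dir" "$reads" "$prefix" "$multimap_max"
    }

    REPEAT_PREFIX="repeats_"  # placeholder output prefix for the repeat-element run
    star_cmd human_repbase trimmed_twice.fastq "$REPEAT_PREFIX" 30
    # With --outReadsUnmapped Fastx, STAR writes unmapped reads to <prefix>Unmapped.out.mate1,
    # which becomes the input of the genome run.
    star_cmd genomedir "${REPEAT_PREFIX}Unmapped.out.mate1" genome_ 1
    ```

    Setting --outFilterMultimapNmax 1 in the genome run keeps only uniquely mapping reads, whereas the repeat run tolerates up to 30 loci per read.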
  11. 11

    Aligned reads were sorted with samtools (1.6).

    samtools v1.6 GitHub
    $ Bash example
    # Install samtools if not already available
    # conda install -c bioconda samtools=1.6
    
    # Sort aligned reads (BAM format) by coordinate
    # Input: aligned_reads.bam
    # Output: aligned_reads.sorted.bam
    samtools sort -o aligned_reads.sorted.bam aligned_reads.bam
  12. 12

    Sorted reads were collapsed with umi_tools (1.0.0).

    UMI-tools v1.0.0 GitHub
    $ Bash example
    # Install umi_tools if not already installed
    # conda create -n umi_tools_env umi_tools=1.0.0 -c bioconda -y
    # conda activate umi_tools_env
    
    # Define input and output files
    INPUT_BAM="sorted_reads.bam"
    OUTPUT_DEDUP_BAM="collapsed_reads.dedup.bam"
    OUTPUT_STATS="deduplication_stats.txt"
    
    # Collapse sorted reads using umi_tools dedup
    # UMIs are read from the read ID (the default), where umi_tools extract placed them.
    # '--method unique' and '--random-seed 1' are the arguments reported for this dataset.
    # The input BAM must be coordinate-sorted and indexed (samtools index). If reads are paired-end, add --paired.
    umi_tools dedup \
        --random-seed 1 \
        -I "${INPUT_BAM}" \
        -S "${OUTPUT_DEDUP_BAM}" \
        --method unique \
        --output-stats "${OUTPUT_STATS}" \
        --log "umi_tools_dedup.log"
  13. 13

    Args: --random-seed 1 --method unique

    UMI-tools v1.0.0 GitHub
    $ Bash example
    # These arguments belong to the umi_tools dedup command of step 12.
    # File names are placeholders; the input BAM must be coordinate-sorted and indexed.
    umi_tools dedup --random-seed 1 --method unique -I sorted_reads.bam -S collapsed_reads.dedup.bam
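    The idea behind --method unique can be sketched on a toy table: reads are treated as duplicates only when they share the same mapping position, strand, and exactly the same UMI. This is a simplified illustration, not umi_tools: the helper name dedup_unique and the five-column layout (chrom, position, strand, UMI, read name) are hypothetical, and it keeps the first read seen per group, whereas umi_tools selects the representative by mapping quality and breaks ties at random (hence --random-seed).

    ```shell
    # Hypothetical helper: keep one read per (chrom, position, strand, UMI) combination.
    # Input columns (tab-separated): chrom, position, strand, UMI, read name.
    dedup_unique() {
      awk -F'\t' '
        { key = $1 FS $2 FS $3 FS $4 }            # exact-match grouping key
        !(key in seen) { seen[key] = 1; print }   # first read of each group is kept
      '
    }

    printf 'chr1\t100\t+\tACGTACGTAC\tread1\nchr1\t100\t+\tACGTACGTAC\tread2\nchr1\t100\t+\tACGTACGTAT\tread3\nchr1\t250\t-\tACGTACGTAC\tread4\n' \
      | dedup_unique
    ```

    In the toy input, read2 is dropped as a duplicate of read1, while read3 survives even though its UMI differs by a single base, because the unique method does not merge near-identical UMIs.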
  14. 14

    BAM files were used to identify peak clusters with Clipper (1.2.2).

    CLIPper v1.2.2 GitHub
    $ Bash example
    # Install CLIPper (if not already installed); see https://github.com/YeoLab/clipper for instructions
    
    # Species / genome annotation label understood by CLIPper.
    # The source record lists hg19 and ChlSab2_Sars for this dataset.
    SPECIES="hg19"
    
    # Input BAM file (one CLIPper run per deduplicated BAM)
    INPUT_BAM="input.bam"
    
    # Output peak file
    OUTPUT_BED="peaks.bed"
    
    # Run CLIPper to identify peak clusters.
    # Only options that appear in the source record are used here; the full reported
    # argument list is given in the next step.
    clipper --species "${SPECIES}" --bam "${INPUT_BAM}" --outfile "${OUTPUT_BED}"
  15. 15

    Args: --species (hg19/ChlSab2_Sars) --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam

    CLIPper v1.2.2 GitHub
    $ Bash example
    # These arguments belong to the CLIPper call of step 14.
    # Replace 'path/to/input.bam' and 'path/to/output.bam' with actual file paths.
    # '--outfile' is named as in the source record, although CLIPper reports peaks as BED intervals.
    # '--species' takes a single value: hg19 or ChlSab2_Sars, depending on the sample.
    
    clipper \
        --species hg19 \
        --bam path/to/input.bam \
        --timeout 3600000 \
        --maxgenes 1000000 \
        --save-pickle \
        --outfile path/to/output.bam
    
  16. 16

    Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.

    $ Bash example
    # Clone the eclip repository if not already available
    # git clone https://github.com/yeolab/eclip.git
    # export PATH=$PATH:/path/to/eclip/bin
    # Ensure Perl and any modules the script requires are installed
    
    # Define input files (placeholders)
    PEAK_FILE="peaks.bed" # Example: output from a peak caller like CLIPper
    IP_BAM="ip_replicate1.bam" # BAM file for IP sample
    INPUT_BAM="input_replicate1.bam" # BAM file for INPUT sample
    OUTPUT_PREFIX="normalized_peaks"
    
    # Normalize peak clusters using peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl).
    # The argument order below is an assumption; check the usage notes at the top of the script,
    # which may also expect files holding the mapped read counts of each BAM.
    peaksnormalize.pl "${PEAK_FILE}" "${IP_BAM}" "${INPUT_BAM}" "${OUTPUT_PREFIX}"
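    Normalizing IP against INPUT amounts to comparing the fraction of IP reads that fall in a peak with the fraction of INPUT reads in the same region. The sketch below computes a log2 fold enrichment under that reading. The helper name log2_fold_enrichment and the toy counts are hypothetical, and the eclip script may differ in details such as pseudocounts and significance testing, which are not reproduced here.

    ```shell
    # Hypothetical helper: log2 of (IP fraction in peak) / (INPUT fraction in peak).
    # Arguments: IP reads in peak, total IP reads, INPUT reads in peak, total INPUT reads.
    # Assumes all four counts are non-zero.
    log2_fold_enrichment() {
      awk -v ip="$1" -v ip_total="$2" -v inp="$3" -v inp_total="$4" 'BEGIN {
        printf "%.3f\n", log((ip / ip_total) / (inp / inp_total)) / log(2)
      }'
    }

    # 80 of 1,000,000 IP reads versus 10 of 2,000,000 INPUT reads: a 16-fold enrichment
    log2_fold_enrichment 80 1000000 10 2000000
    ```

    Dividing by library size before taking the ratio keeps a deeply sequenced IP sample from looking enriched everywhere.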
  17. 17

    Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+.

    $ Bash example
    # Clone the eCLIP repository if not already available, or ensure the script is in your PATH.
    # git clone https://github.com/yeolab/eclip.git
    # cd eclip
    
    # Assuming the script is located in a 'scripts' directory within the cloned eCLIP repository
    # or is otherwise accessible in your environment. Adjust the path as necessary.
    ECLIP_SCRIPTS_DIR="path/to/eclip/scripts" # Replace with the actual path to the eCLIP scripts directory
    
    # Placeholder for input normalized peak regions (BED format) from replicates.
    # These files would be the output from a previous peak calling and normalization step.
    INPUT_PEAKS_REP1="normalized_replicate1_peaks.bed"
    INPUT_PEAKS_REP2="normalized_replicate2_peaks.bed"
    # Add more input files for additional replicates as needed, e.g., INPUT_PEAKS_REP3="normalized_replicate3_peaks.bed"
    
    # Output file for the merged peak regions
    OUTPUT_MERGED_PEAKS="merged_replicate_overlapping_peaks.bed"
    
    # Execute the merging script.
    # The script 'compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl'
    # likely takes multiple input BED files (representing normalized peak regions from replicates)
    # and merges them based on overlap and L2 fold enrichment criteria, outputting a single BED file.
    # Specific parameters for L2 fold enrichment or overlap thresholds are not provided in the description,
    # so a generic call is used here, assuming it takes input files as positional arguments.
    perl "${ECLIP_SCRIPTS_DIR}/compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl" \
        "${INPUT_PEAKS_REP1}" \
        "${INPUT_PEAKS_REP2}" \
        > "${OUTPUT_MERGED_PEAKS}"
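    One plausible reading of "merging overlapping normalized peak regions" is to group peaks that overlap and keep the most enriched member of each group. The sketch below implements that reading on a toy table; it is an assumption for illustration, not a description of what the Perl script does. The helper name keep_best_overlapping and the four-column layout (chrom, start, end, score) are hypothetical, strand is ignored, and the input must already be sorted by chromosome and start.

    ```shell
    # Hypothetical helper: among overlapping peaks keep the one with the highest score.
    # Input columns (tab-separated): chrom, start, end, score; sorted by chrom, then start.
    keep_best_overlapping() {
      awk -F'\t' '
        function emit() { if (have) print best }
        have && $1 == chrom && ($2 + 0) < cluster_end {
          # Peak overlaps the current cluster: extend it and update the best peak if needed
          if (($3 + 0) > cluster_end) cluster_end = $3 + 0
          if (($4 + 0) > best_score) { best = $0; best_score = $4 + 0 }
          next
        }
        # No overlap: report the finished cluster and start a new one
        { emit(); chrom = $1; cluster_end = $3 + 0; best = $0; best_score = $4 + 0; have = 1 }
        END { emit() }
      '
    }

    printf 'chr1\t100\t150\t3.2\nchr1\t140\t200\t5.1\nchr1\t300\t350\t2.0\nchr2\t100\t150\t4.0\n' \
      | keep_best_overlapping
    ```

    In the toy input the first two chr1 peaks overlap, so only the higher-scoring one (140 to 200) is reported alongside the two isolated peaks.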
    
  18. 18

    Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.

    IDR v2.0.2 GitHub
    $ Bash example
    # Install IDR (e.g., via conda)
    # conda install -c bioconda idr=2.0.2
    
    # Install merge_peaks (assuming the scripts are accessible, e.g., cloned or in PATH)
    # git clone https://github.com/yeolab/merge_peaks.git
    # export PATH=$PATH:/path/to/merge_peaks/scripts
    
    # Placeholder for input normalized peak files (e.g., from two replicates)
    # These files are assumed to be in compressed.bed format as per the description.
    # Replace with actual file paths.
    INPUT_REP1_BED="replicate1.compressed.bed"
    INPUT_REP2_BED="replicate2.compressed.bed"
    
    # Output files after ranking by entropy score
    RANKED_REP1_BED="replicate1.ranked.bed"
    RANKED_REP2_BED="replicate2.ranked.bed"
    
    # Output prefix for IDR results
    IDR_OUTPUT_PREFIX="idr_reproducible_peaks"
    
    # Step 1: Rank normalized peak files by entropy score using make_informationcontent_from_peaks.pl
    # This script is included within the merge_peaks pipeline. The two-argument form below is an
    # assumption; the script may also need the mapped read counts of the IP and INPUT samples.
    perl make_informationcontent_from_peaks.pl "${INPUT_REP1_BED}" "${RANKED_REP1_BED}"
    perl make_informationcontent_from_peaks.pl "${INPUT_REP2_BED}" "${RANKED_REP2_BED}"
    
    # Step 2: Run IDR (2.0.2) to determine reproducible peaks
    # '--input-file-type bed' with '--rank 5' (rank on BED column 5) is an assumption about where the
    # entropy score is stored; no IDR threshold is given in the description, so none is set here.
    idr --samples "${RANKED_REP1_BED}" "${RANKED_REP2_BED}" --input-file-type bed --rank 5 --output-file "${IDR_OUTPUT_PREFIX}"
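    Ranking by an entropy score can be illustrated with a relative-entropy style quantity, p * log2(p / q), where p is the fraction of IP reads in a peak and q the fraction of INPUT reads in the same region. That formula is an assumption about what make_informationcontent_from_peaks.pl computes, and the helper name rank_by_entropy, the three-column layout (peak name, IP reads, INPUT reads), and the toy counts are hypothetical.

    ```shell
    # Hypothetical helper: score each peak as p * log2(p / q) and sort by descending score.
    # Arguments: total IP reads, total INPUT reads.
    # Input columns (tab-separated): peak name, IP reads in peak, INPUT reads in peak (all non-zero).
    rank_by_entropy() {
      local ip_total="$1" inp_total="$2"
      awk -F'\t' -v ipt="$ip_total" -v inpt="$inp_total" '{
        p = $2 / ipt; q = $3 / inpt
        printf "%s\t%.6f\n", $1, p * log(p / q) / log(2)
      }' | sort -t "$(printf '\t')" -k2,2nr
    }

    printf 'peakA\t80\t10\npeakB\t40\t10\npeakC\t200\t20\n' | rank_by_entropy 1000000 2000000
    ```

    Unlike a plain fold change, this score also rewards peaks with many reads, so a strongly enriched but very shallow peak ranks below a well-supported one.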
  19. 19

    Reproducible peaks were filtered for those ≥20 bases in length, and not overlapping with WT negative control samples.

    awk and bedtools intersect (assumed; the source does not name a tool for this step)
    $ Bash example
    # The source record names no tool for this step. The sketch below uses awk and
    # bedtools as stand-ins; it is not the authors' script.
    # conda install -c bioconda bedtools
    
    # Replace 'merged_reproducible_peaks.bed' with the actual file of reproducible peaks and
    # 'WT_negative_control.bed' with the peaks called in the WT negative control samples.
    
    # Keep peaks of at least 20 bases, then drop any that overlap a negative-control peak
    # (bedtools intersect -v reports entries of -a that have no overlap in -b).
    awk -F'\t' '($3 - $2) >= 20' merged_reproducible_peaks.bed \
        | bedtools intersect -v -a stdin -b WT_negative_control.bed \
        > filtered_reproducible_peaks.bed


Raw Source Text
Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0). Args: --random-seed 1 --bc-pattern NNNNNNNNNN
Reads were then trimmed with cutadapt (1.14). Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events. Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18  -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
Trimmed reads were then mapped with STAR (2.4.0i) against a repeat element database (RepBase 18.05). Args: --runThreadN 16 \  --genomeDir human_repbase \  --readFilesIn path/to/read1 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 30 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (hg19/ChlSab2). Args: --runThreadN 16 \  --genomeDir genomedir \  --readFilesIn /path/to/read1 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM   Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 1 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Aligned reads were sorted with samtools (1.6)
Sorted reads were collapsed with umi_tools (1.0.0). Args: --random-seed 1 --method unique
BAM files were used to identify peak clusters with Clipper (1.2.2). Args: --species (hg19/ChlSab2_Sars) --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam
Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.
Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+
Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
Reproducible peaks were filtered for those ≥20 bases in length, and not overlapping with WT negative control samples.
Genome_build: hg19
Genome_build: ChlSab2
Genome_build: Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome (MN908947.3)