GSE173498 Processing Pipeline


Publication

Discovery and functional interrogation of SARS-CoV-2 protein-RNA interactions.

Research Square (2022), PMID 35313591

Dataset

Discovery and functional interrogation of the virus and host RNA interactome of SARS-CoV-2 proteins [eCLIP]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0).

UMI-tools v1.0.0 GitHub

$ Bash example

# Install UMI-tools if not already installed
# conda install -c bioconda umi_tools=1.0.0

# Placeholders for the input and output files:
# 'input.fastq.gz' is the FASTQ file whose reads start with the inline UMI (randomer).
# 'output.fastq.gz' receives the reads with the UMI moved into the read name.
# In the default (string) extraction mode, each 'N' in --bc-pattern marks one UMI base,
# so 'NNNNNNNNNN' takes the first 10 bases of every read.
# A regex pattern such as '^(?P<umi_1>.{10})' additionally requires --extract-method=regex.
# This command assumes a single-end FASTQ with an inline UMI.

umi_tools extract --random-seed 1 --bc-pattern=NNNNNNNNNN -I input.fastq.gz -S output.fastq.gz --log=umi_tools_extract.log

Args: --random-seed 1 --bc-pattern NNNNNNNNNN
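
As a rough illustration of what this reformatting does to each FASTQ record, the sketch below mimics it with plain awk: the first 10 bases of every read are moved into the read name and the matching quality characters are trimmed. The function name `move_umi_to_header` is hypothetical, and the default length of 10 is an assumption taken from the ten `N` characters in the pattern. It is not a substitute for umi_tools, which also handles paired reads, filtering, and logging.

```bash
# Illustrative only: mimic moving an inline UMI into the read name.
# move_umi_to_header is a hypothetical helper, not part of umi_tools.
# Usage: move_umi_to_header [umi_length] < reads.fastq
move_umi_to_header() {
  awk -v n="${1:-10}" '
    NR % 4 == 1 {              # header line: hold it until the sequence is seen
      header = $0
      next
    }
    NR % 4 == 2 {              # sequence line: split off the UMI
      umi = substr($0, 1, n)
      # Append _UMI to the read ID (the first space-delimited token)
      cut = index(header, " ")
      if (cut > 0)
        header = substr(header, 1, cut - 1) "_" umi substr(header, cut)
      else
        header = header "_" umi
      print header
      print substr($0, n + 1)
      next
    }
    NR % 4 == 3 { print; next }              # "+" separator line
    NR % 4 == 0 { print substr($0, n + 1) }  # qualities, trimmed to match
  '
}
```

For a gzipped file this could be used as `zcat input.fastq.gz | move_umi_to_header 10 | gzip > output.fastq.gz`.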

UMI-tools v1.0.0 (tool inferred from the Args; originally labeled "demultiplex_fastq.py", inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# The Args for this step (--random-seed, --bc-pattern) are umi_tools extract options,
# so they are applied here with umi_tools rather than with a separate demultiplexing script.
# conda install -c bioconda umi_tools=1.0.0

# input_read1.fastq.gz and the output file names are placeholders.
umi_tools extract \
  --random-seed 1 \
  --bc-pattern NNNNNNNNNN \
  --stdin input_read1.fastq.gz \
  --stdout umi_extracted_read1.fastq.gz \
  --log umi_tools_extract.log

Reads were then trimmed with cutadapt (1.14).

cutadapt v1.14 GitHub

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt=1.14

# Define input and output files (placeholders)
INPUT_READS="input_reads.fastq.gz"
TRIMMED_READS="trimmed_reads.fastq.gz"

# Define adapter sequence (placeholder; replace with the adapter used in the experiment)
# Example: Illumina TruSeq adapter (Read 1)
ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"

# Execute cutadapt for trimming
# -a: 3' adapter sequence to remove
# -q 20,20: Trim low-quality bases from both ends with a quality threshold of 20
# -m 20: Discard reads shorter than 20 bp after trimming
# -o: Output file for trimmed reads
cutadapt -a "${ADAPTER_SEQUENCE}" -q 20,20 -m 20 -o "${TRIMMED_READS}" "${INPUT_READS}"


Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

eCLIP v2.10 GitHub

$ Bash example

# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=1.14

# Obtain the adapter FASTA for your barcode set from the directory named in the Args
# (https://github.com/YeoLab/eclip/tree/master/example/inputs/).
# 'InvRNA.fasta' below is a placeholder for the matching InvRNA*.fasta file.

# Execute cutadapt for adapter trimming and quality filtering
cutadapt \
  --match-read-wildcards \
  -O 1 \
  --times 1 \
  -e 0.1 \
  --quality-cutoff 6 \
  -m 18 \
  -a file:InvRNA.fasta \
  -o trimmed_reads.fastq.gz \
  input_reads.fastq.gz


Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events.

cutadapt v1.14 GitHub

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt=1.14

# Define input and output file paths (placeholders)
INPUT_READ1="reads_R1.fastq.gz"
INPUT_READ2="reads_R2.fastq.gz"
OUTPUT_READ1="trimmed_reads_R1.fastq.gz"
OUTPUT_READ2="trimmed_reads_R2.fastq.gz"

# Define adapter sequences (placeholders for common Illumina adapters)
# These sequences should be replaced with the actual adapters used in the experiment
# For double-ligation events, it's common to trim the sequencing adapter itself.
ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Execute cutadapt to remove double-ligation events (adapter sequences)
# -a for 3' adapter of read 1, -A for 3' adapter of read 2
# --minimum-length is often used to discard very short reads after trimming
cutadapt -a "${ADAPTER_R1}" -A "${ADAPTER_R2}" \
         -o "${OUTPUT_READ1}" -p "${OUTPUT_READ2}" \
         --minimum-length 18 \
         "${INPUT_READ1}" "${INPUT_READ2}"


Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

eCLIP v1.0.0 GitHub

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt=1.14

# Obtain the adapter FASTA for your barcode set from the directory named in the Args
# (https://github.com/YeoLab/eclip/tree/master/example/inputs/).
# 'InvRNA.fasta' below is a placeholder for the matching InvRNA*.fasta file.

# Placeholders: the input is the output of the first trimming round.
INPUT_READS="trimmed_reads.fastq.gz"
OUTPUT_READS="trimmed_round2_reads.fastq.gz"

# Second cutadapt pass to remove double-ligation events.
# The Args differ from the first pass only in the minimum adapter overlap (-O 5 instead of -O 1).
cutadapt \
  --match-read-wildcards \
  -O 5 \
  --times 1 \
  -e 0.1 \
  --quality-cutoff 6 \
  -m 18 \
  -a file:InvRNA.fasta \
  -o "${OUTPUT_READS}" \
  "${INPUT_READS}"

Trimmed reads were then mapped with STAR (2.4.0i) against a repeat element database (RepBase 18.05).

STAR v2.4.0i GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star=2.4.0i

# Define variables
STAR_VERSION="2.4.0i"
REPBASE_FASTA="repbase_18.05.fasta" # Placeholder for the RepBase 18.05 FASTA file. Obtain from RepBase (e.g., http://www.girinst.org/repbase/update/index.html)
GENOME_DIR="STAR_RepBase_index"
TRIMMED_READS="trimmed_reads.fastq.gz" # Placeholder for trimmed reads (e.g., output from a trimming step)
OUTPUT_PREFIX="repbase_mapping"

# 1. Create STAR genome index for RepBase 18.05
# This step assumes you have the RepBase 18.05 FASTA file. 
# For mapping against a repeat database, a GTF/GFF is typically not used, and splicing is disabled.
mkdir -p "${GENOME_DIR}"
STAR --runMode genomeGenerate \
     --genomeDir "${GENOME_DIR}" \
     --genomeFastaFiles "${REPBASE_FASTA}" \
     --runThreadN 8 # Adjust threads as needed

# 2. Map trimmed reads to the RepBase index
STAR --version # To confirm the version used
STAR --runMode alignReads \
     --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${TRIMMED_READS}" \
     --runThreadN 8 \
     --outFileNamePrefix "${OUTPUT_PREFIX}_" \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMunmapped Within \
     --outFilterMultimapNmax 100 \
     --outFilterMismatchNmax 10 \
     --alignIntronMax 1 \
     --alignMatesGapMax 1000000 \
     --limitBAMsortRAM 30000000000 # Adjust RAM based on available resources (e.g., 30GB)

# Optional: Index the resulting BAM file
samtools index "${OUTPUT_PREFIX}_Aligned.sortedByCoordinate.bam"


Args: --runThreadN 16 \ --genomeDir human_repbase \ --readFilesIn path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

STAR (Inferred with models/gemini-2.5-flash), version not specified GitHub

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star

# Note: The 'human_repbase' directory must contain a pre-built STAR index.
# Per the step description, this index represents the repeat element database (RepBase 18.05),
# built beforehand with STAR's genomeGenerate mode.
# Example of index generation (not part of this step; the FASTA path is a placeholder):
# STAR --runThreadN <threads> --runMode genomeGenerate --genomeDir human_repbase \
#      --genomeFastaFiles /path/to/repbase_sequences.fa

# Placeholder for input reads. The description 'readFilesIn path/to/read1' suggests a single input file.
# If paired-end, the argument would typically be '--readFilesIn path/to/read1 path/to/read2'.
# cp /path/to/your/actual_read_file.fastq.gz path/to/read1 # Example of placing input file

STAR --runThreadN 16 \
     --genomeDir human_repbase \
     --readFilesIn path/to/read1 \
     --outFileNamePrefix out_prefix \
     --outReadsUnmapped Fastx \
     --outSAMtype BAM Unsorted \
     --outSAMattributes All \
     --outSAMunmapped Within \
     --outSAMattrRGline ID:foo \
     --outFilterType BySJout \
     --outFilterMultimapNmax 30 \
     --outFilterMultimapScoreRange 1 \
     --outFilterScoreMin 10 \
     --alignEndsType EndToEnd


Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (hg19/ChlSab2).

STAR v2.4.0i GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star=2.4.0i

# Define variables
# Replace with the actual path to the STAR genome index.
# The source lists hg19 and ChlSab2 as separate genome builds (ChlSab2 is an African green
# monkey assembly, not a human one), so the index is presumably chosen per sample.
# Whether a combined index was used is not stated.
GENOME_DIR="/path/to/STAR_genome_index/hg19" # Placeholder; point to the ChlSab2 index for ChlSab2 samples
INPUT_FASTQ="filtered_unmapped_reads.fastq.gz" # Replace with your input FASTQ file
OUTPUT_PREFIX="aligned_reads"
THREADS=8 # Adjust as needed

# Example for creating a STAR genome index (run once per genome build)
# The FASTA and optional GTF paths are placeholders.
# STAR --runMode genomeGenerate \
#      --genomeDir ${GENOME_DIR} \
#      --genomeFastaFiles /path/to/hg19.fa \
#      --sjdbGTFfile /path/to/hg19.gtf \
#      --runThreadN ${THREADS}

# Map reads with STAR
STAR --runMode alignReads \
     --genomeDir ${GENOME_DIR} \
     --readFilesIn ${INPUT_FASTQ} \
     --outFileNamePrefix ${OUTPUT_PREFIX}_ \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMunmapped Within \
     --outFilterMultimapNmax 20 \
     --outFilterScoreMinOverLread 0.66 \
     --outFilterMatchNminOverLread 0.66 \
     --runThreadN ${THREADS}


Args: --runThreadN 16 \ --genomeDir genomedir \ --readFilesIn /path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

STAR (Inferred with models/gemini-2.5-flash) v2.4.0i (per the step description) GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Placeholder for genome directory (hg19 or ChlSab2, per the step description)
# This directory should contain the STAR genome index generated with STAR --runMode genomeGenerate
GENOME_DIR="genomedir" # Replace with actual path to STAR genome index

# Placeholder for input FASTQ file(s)
READ_FILE_1="/path/to/read1.fastq.gz" # Replace with actual path to your read 1 file
# If paired-end, use: READ_FILE_2="/path/to/read2.fastq.gz"

# Placeholder for output prefix
OUT_PREFIX="out_prefix" # Replace with desired output file prefix

STAR \
  --runThreadN 16 \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ_FILE_1}" \
  --outFileNamePrefix "${OUT_PREFIX}" \
  --outReadsUnmapped Fastx \
  --outSAMtype BAM Unsorted \
  --outSAMattributes All \
  --outSAMunmapped Within \
  --outSAMattrRGline ID:foo \
  --outFilterType BySJout \
  --outFilterMultimapNmax 1 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --alignEndsType EndToEnd


Aligned reads were sorted with samtools (1.6).

samtools v1.6 GitHub

$ Bash example

# Install samtools if not already available
# conda install -c bioconda samtools=1.6

# Sort aligned reads (BAM format) by coordinate
# Input: aligned_reads.bam
# Output: aligned_reads.sorted.bam
samtools sort -o aligned_reads.sorted.bam aligned_reads.bam


Sorted reads were collapsed with umi_tools (1.0.0).

UMI-tools v1.0.0 GitHub

$ Bash example

# Install umi_tools if not already installed
# conda create -n umi_tools_env umi_tools=1.0.0 -c bioconda -y
# conda activate umi_tools_env

# Define input and output files (placeholders)
INPUT_BAM="sorted_reads.bam"
OUTPUT_DEDUP_BAM="collapsed_reads.dedup.bam"
OUTPUT_STATS="deduplication_stats" # Prefix for the statistics files

# umi_tools dedup needs a coordinate-sorted, indexed BAM
samtools index "${INPUT_BAM}"

# Collapse sorted reads using umi_tools dedup
# UMIs are read from the read name (the default), where umi_tools extract placed them.
# --method unique and --random-seed 1 follow the Args for this step.
# If reads are paired-end, add --paired.
umi_tools dedup \
    --random-seed 1 \
    -I "${INPUT_BAM}" \
    -S "${OUTPUT_DEDUP_BAM}" \
    --method unique \
    --output-stats "${OUTPUT_STATS}" \
    --log "umi_tools_dedup.log"

Args: --random-seed 1 --method unique
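
To make the effect of `--method unique` concrete, the sketch below collapses reads that share the same chromosome, start position, strand, and exact UMI, keeping one read per combination. It works on a simplified tab-separated table rather than a BAM file, and the helper name `collapse_unique_umis` and the column layout are assumptions for illustration. Unlike umi_tools, which picks the retained read using mapping quality and a seeded random choice, this sketch simply keeps the first read it sees.

```bash
# Illustrative only: collapse reads sharing mapping position, strand and UMI.
# Input: tab-separated lines of read_name, chrom, start, strand, umi.
# collapse_unique_umis is a hypothetical helper, not part of umi_tools.
collapse_unique_umis() {
  awk -F '\t' '
    {
      key = $2 FS $3 FS $4 FS $5   # chrom, start, strand, UMI
      if (!(key in seen)) {        # first read with this exact combination
        seen[key] = 1
        print
      }
    }
  '
}
```

Reads whose UMIs differ by even one base are kept as separate molecules here, which is the behavior the "unique" method name describes.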

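UMI-tools v1.0.0 (tool inferred from the Args; originally labeled "Custom Data Processing Script", inferred with models/gemini-2.5-flash)

$ Bash example

# The Args for this step (--random-seed, --method) are umi_tools dedup options.
# File names are placeholders; the input BAM must be coordinate-sorted and indexed.
umi_tools dedup --random-seed 1 --method unique -I sorted_reads.bam -S collapsed_reads.dedup.bam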

BAM files were used to identify peak clusters with Clipper (1.2.2).

CLIPper v1.2.2 GitHub

$ Bash example

# Install CLIPper (if not already installed)
# git clone https://github.com/YeoLab/clipper.git  # then follow the repository's install instructions

# Species / annotation name; hg19 is one of the values given in the Args for this step
SPECIES="hg19"

# Input BAM file
# The description mentions "BAM files" (plural); this example processes a single BAM.
INPUT_BAM="input.bam"

# Output peak file
OUTPUT_BED="peaks.bed"

# Run CLIPper to identify peak clusters
# This is a basic command that uses only options appearing in the Args for this step,
# and assumes the installed entry point is named 'clipper'.
clipper --species "${SPECIES}" --bam "${INPUT_BAM}" --outfile "${OUTPUT_BED}"

Args: --species (hg19/ChlSab2_Sars) --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam

CLIPper v1.2.2 (tool inferred from the Args; originally labeled "Python script for gene feature extraction", inferred with models/gemini-2.5-flash)

$ Bash example

# The Args for this step belong to the CLIPper run named in the step description.
# Replace 'path/to/input.bam' and 'path/to/output.bam' with actual file paths.
# --species takes one of the values listed in the Args (hg19 or ChlSab2_Sars); hg19 is used here.
# The output name 'output.bam' is kept as given in the Args.

clipper \
    --species hg19 \
    --bam path/to/input.bam \
    --timeout 3600000 \
    --maxgenes 1000000 \
    --save-pickle \
    --outfile path/to/output.bam

Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.

eCLIP v0.1.5 GitHub

$ Bash example

# Clone the eclip repository if not already available
# git clone https://github.com/yeolab/eclip.git
# export PATH=$PATH:/path/to/eclip/bin
# Ensure Perl and any modules the script requires are installed

# Define input files (placeholders)
PEAK_FILE="peaks.bed" # Example: output from a peak caller like CLIPper
IP_BAM="ip_replicate1.bam" # BAM file for IP sample
INPUT_BAM="input_replicate1.bam" # BAM file for INPUT sample
OUTPUT_PREFIX="normalized_peaks"

# Normalize peak clusters using peaksnormalize.pl
# The positional argument order below is an assumption; check the script's usage message
# before running. The script may also expect mapped read counts for the IP and INPUT BAMs.
peaksnormalize.pl "${PEAK_FILE}" "${IP_BAM}" "${INPUT_BAM}" "${OUTPUT_PREFIX}"
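
The source does not spell out the normalization formula. One common way to express IP-over-INPUT enrichment for a peak is the log2 ratio of the read fractions in the two libraries, sketched below. The helper name `peak_log2_fold_enrichment`, the pseudocount of 1, and the formula itself are assumptions for illustration, not the confirmed behavior of peaksnormalize.pl.

```bash
# Illustrative only: log2 fold enrichment of IP over INPUT for one peak.
# peak_log2_fold_enrichment is a hypothetical helper; the formula and the
# pseudocount of 1 are assumptions, not taken from the eCLIP scripts.
# Usage: peak_log2_fold_enrichment IP_READS_IN_PEAK IP_TOTAL INPUT_READS_IN_PEAK INPUT_TOTAL
peak_log2_fold_enrichment() {
  awk -v ip="$1" -v ip_total="$2" -v inp="$3" -v inp_total="$4" 'BEGIN {
    pseudo = 1   # avoids division by zero when INPUT has no reads in the peak
    ratio = ((ip + pseudo) / ip_total) / ((inp + pseudo) / inp_total)
    printf "%.3f\n", log(ratio) / log(2)
  }'
}
```

Dividing each count by its library total keeps a deeper IP library from looking enriched merely because it has more reads overall.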


Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+

eCLIP v0.1.5 GitHub

$ Bash example

# Clone the eCLIP repository if not already available, or ensure the script is in your PATH.
# git clone https://github.com/yeolab/eclip.git
# cd eclip

# Assuming the script is located in a 'scripts' directory within the cloned eCLIP repository
# or is otherwise accessible in your environment. Adjust the path as necessary.
ECLIP_SCRIPTS_DIR="path/to/eclip/scripts" # Replace with the actual path to the eCLIP scripts directory

# Placeholder for input normalized peak regions (BED format) from replicates.
# These files would be the output from a previous peak calling and normalization step.
INPUT_PEAKS_REP1="normalized_replicate1_peaks.bed"
INPUT_PEAKS_REP2="normalized_replicate2_peaks.bed"
# Add more input files for additional replicates as needed, e.g., INPUT_PEAKS_REP3="normalized_replicate3_peaks.bed"

# Output file for the merged peak regions
OUTPUT_MERGED_PEAKS="merged_replicate_overlapping_peaks.bed"

# Execute the merging script.
# The script 'compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl'
# likely takes multiple input BED files (representing normalized peak regions from replicates)
# and merges them based on overlap and L2 fold enrichment criteria, outputting a single BED file.
# Specific parameters for L2 fold enrichment or overlap thresholds are not provided in the description,
# so a generic call is used here, assuming it takes input files as positional arguments.
perl "${ECLIP_SCRIPTS_DIR}/compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl" \
"${INPUT_PEAKS_REP1}" \
"${INPUT_PEAKS_REP2}" \
> "${OUTPUT_MERGED_PEAKS}"

View on GitHub
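
For readers who want to see what merging overlapping regions involves, the sketch below collapses overlapping BED6 intervals on the same chromosome and strand into one region. The helper name `merge_overlapping_peaks` is hypothetical, and the merge policy (take the union of the coordinates, keep the name and score of the highest-scoring member) is an assumption; the eCLIP Perl script may resolve overlaps differently.

```bash
# Illustrative only: merge overlapping BED6 regions per chromosome and strand.
# merge_overlapping_peaks is a hypothetical helper; its merge policy is an assumption.
# Input columns (tab-separated): chrom, start, end, name, score, strand.
merge_overlapping_peaks() {
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k6,6 -k2,2n | awk -F '\t' -v OFS='\t' '
    function emit() { if (have) print chrom, start, end, name, score, strand }
    {
      if (have && $1 == chrom && $6 == strand && $2 + 0 < end + 0) {
        # Overlaps the region being built: extend it and keep the best score
        if ($3 + 0 > end + 0) end = $3 + 0
        if ($5 + 0 > score + 0) { score = $5; name = $4 }
      } else {
        emit()
        chrom = $1; start = $2; end = $3 + 0; name = $4; score = $5; strand = $6
        have = 1
      }
    }
    END { emit() }
  '
}
```

Sorting by chromosome, strand, and start first means every region only needs to be compared with the one currently being built.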

Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.

IDR v2.0.2 GitHub

$ Bash example

# Install IDR (e.g., via conda)
# conda install -c bioconda idr=2.0.2

# Install merge_peaks (assuming the scripts are accessible, e.g., cloned or in PATH)
# git clone https://github.com/yeolab/merge_peaks.git
# export PATH=$PATH:/path/to/merge_peaks/scripts

# Placeholder for input normalized peak files (e.g., from two replicates)
# These files are assumed to be in compressed.bed format as per the description.
# Replace with actual file paths.
INPUT_REP1_BED="replicate1.compressed.bed"
INPUT_REP2_BED="replicate2.compressed.bed"

# Output files after ranking by entropy score
RANKED_REP1_BED="replicate1.ranked.bed"
RANKED_REP2_BED="replicate2.ranked.bed"

# Output prefix for IDR results
IDR_OUTPUT_PREFIX="idr_reproducible_peaks"

# Step 1: Rank normalized peak files by entropy score using make_informationcontent_from_peaks.pl
# This script is included within the merge_peaks pipeline.
# The two-argument form below is an assumption; check the script's usage message for the
# arguments it actually requires.
perl make_informationcontent_from_peaks.pl "${INPUT_REP1_BED}" "${RANKED_REP1_BED}"
perl make_informationcontent_from_peaks.pl "${INPUT_REP2_BED}" "${RANKED_REP2_BED}"

# Step 2: Run IDR (2.0.2) to determine reproducible peaks
# --input-file-type bed declares the inputs as BED; --rank selects the column used for ranking
# (column 5 is an assumption about where the entropy score is stored).
# No IDR threshold is given in the description, so none is set here.
idr --samples "${RANKED_REP1_BED}" "${RANKED_REP2_BED}" \
    --input-file-type bed --rank 5 \
    --output-file "${IDR_OUTPUT_PREFIX}"

Reproducible peaks were filtered for those ≥20 bases in length, and not overlapping with WT negative control samples.

filter_peaks.py (Inferred with models/gemini-2.5-flash) vN/A GitHub

$ Bash example

# Install merge_peaks (if not already installed)
# git clone https://github.com/yeolab/merge_peaks.git
# # Navigate into the cloned directory if needed, or adjust path
# # cd merge_peaks
# # Ensure Python environment is set up (e.g., with conda)
# # conda create -n merge_peaks_env python=3.8
# # conda activate merge_peaks_env
# # pip install -r requirements.txt # if a requirements.txt exists

# Execute filter_peaks.py
# Note: filter_peaks.py and its flags are inferred, not confirmed by the source. Treat them as
# hypothetical and check the merge_peaks repository for the actual filtering script.
# Replace '/path/to/merge_peaks' with the actual path to the cloned repository's root.
# Replace 'merged_reproducible_peaks.bed' with the actual input file containing reproducible peaks.
# Replace 'WT_negative_control.bed' with the peak file from the WT negative control samples.
python /path/to/merge_peaks/filter_peaks.py \
    --input merged_reproducible_peaks.bed \
    --output filtered_reproducible_peaks.bed \
    --min-length 20 \
    --blacklist WT_negative_control.bed
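
Because the filtering script above is inferred, the same two criteria can also be sketched directly with awk: keep peaks at least 20 bases long that do not overlap any peak from the WT negative control. The helper name `filter_reproducible_peaks` is hypothetical, and treating overlap as strand-agnostic is an assumption. For large files, `bedtools intersect -v` is the more usual way to do the overlap step.

```bash
# Illustrative only: keep peaks >= MIN_LEN bases that overlap no control peak.
# filter_reproducible_peaks is a hypothetical helper; overlap is checked
# without regard to strand, which is an assumption.
# Usage: filter_reproducible_peaks PEAKS_BED CONTROL_BED [MIN_LEN]
filter_reproducible_peaks() {
  awk -F '\t' -v min_len="${3:-20}" '
    FILENAME == ARGV[1] {             # first file: load control peaks
      n++
      c_chrom[n] = $1; c_start[n] = $2 + 0; c_end[n] = $3 + 0
      next
    }
    ($3 - $2) < min_len { next }      # drop peaks shorter than the minimum
    {
      for (i = 1; i <= n; i++)
        if ($1 == c_chrom[i] && $2 + 0 < c_end[i] && $3 + 0 > c_start[i])
          next                        # overlaps a control peak
      print
    }
  ' "$2" "$1"
}
```

Peak length is computed as end minus start, which matches the half-open coordinate convention of BED files.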


Tools Used

eCLIP STAR

Raw Source Text

Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0). Args: --random-seed 1 --bc-pattern NNNNNNNNNN
Reads were then trimmed with cutadapt (1.14). Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events. Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18  -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
Trimmed reads were then mapped with STAR (2.4.0i) against a repeat element database (RepBase 18.05). Args: --runThreadN 16 \  --genomeDir human_repbase \  --readFilesIn path/to/read1 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 30 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (hg19/ChlSab2). Args: --runThreadN 16 \  --genomeDir genomedir \  --readFilesIn /path/to/read1 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM   Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 1 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Aligned reads were sorted with samtools (1.6)
Sorted reads were collapsed with umi_tools (1.0.0). Args: --random-seed 1 --method unique
BAM files were used to identify peak clusters with Clipper (1.2.2). Args: --species (hg19/ChlSab2_Sars) --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam
Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.
Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+
Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
Reproducible peaks were filtered for those ≥20 bases in length, and not overlapping with WT negative control samples.
Genome_build: hg19
Genome_build: ChlSab2
Genome_build: Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome (MN908947.3)
