GSE173508 Processing Pipeline

GSE code_examples 19 steps

Publication

Discovery and functional interrogation of SARS-CoV-2 protein-RNA interactions.

Research square (2022) — PMID 35313591

Dataset

Discovery and functional interrogation of the virus and host RNA interactome of SARS-CoV-2 proteins

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Generate Jupyter Notebook

Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0).

UMI-tools v1.0.0

$ Bash example

# Install UMI-tools (if not already installed)
# conda install -c bioconda umi-tools=1.0.0

# Reformat sequenced reads to extract Unique Molecular Identifiers (UMIs) and include them in read headers.
# This command assumes UMIs are at the start of Read 1 and are 6 base pairs long.
# Adjust the --bc-pattern according to the actual UMI structure and position in your reads.
# For example:
#   - If the UMI is 6bp at the start of Read 1: --bc-pattern=NNNNNN
#   - If the UMI is 6bp at the start of Read 2 (for paired-end data): --bc-pattern=NNNNNN --read2-in=read2.fastq.gz --read2-out=read2_umi_extracted.fastq.gz
#   - If the UMI is in an index read: --bc-pattern=NNNNNN --index-in=index.fastq.gz

umi_tools extract \
    --input input_reads.fastq.gz \
    --output output_reads_with_umi_in_header.fastq.gz \
    --bc-pattern=NNNNNN \
    --log umi_tools_extract.log

Args: --random-seed 1 --bc-pattern NNNNNNNNNN

umi_tools (Inferred with models/gemini-2.5-flash) v1.1.2 GitHub

$ Bash example

# Install umi_tools via conda
# conda install -c bioconda umi_tools

# Example usage of umi_tools extract for processing Unique Molecular Identifiers (UMIs).
# Replace input.fastq.gz and output.fastq.gz with your actual input and output file paths.
# The --bc-pattern NNNNNNNNNN specifies a 10-nucleotide barcode at the start of the read.
# The --random-seed 1 ensures reproducibility for any random processes within umi_tools.
umi_tools extract --random-seed 1 --bc-pattern NNNNNNNNNN -I input.fastq.gz -S output.fastq.gz

View on GitHub

Reads were then trimmed with cutadapt (1.14).

cutadapt v1.14 GitHub

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt=1.14

# Define input and output files (placeholders)
INPUT_FASTQ="input.fastq.gz"
OUTPUT_FASTQ="trimmed.fastq.gz"

# Define common adapter sequence (e.g., Illumina universal adapter)
ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

# Define trimming parameters
MINIMUM_LENGTH=15  # Minimum read length after trimming
QUALITY_CUTOFF=20  # Quality cutoff for 3' end trimming
NUM_CORES=4        # Number of cores for parallel processing

# Execute cutadapt command
cutadapt \
  -a "${ADAPTER_SEQUENCE}" \
  -o "${OUTPUT_FASTQ}" \
  -m ${MINIMUM_LENGTH} \
  -q ${QUALITY_CUTOFF} \
  --cores ${NUM_CORES} \
  "${INPUT_FASTQ}"

View on GitHub

Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

eCLIP vNot explicitly versioned, using latest available from yeolab/eclip repository GitHub

$ Bash example

# Install STAR aligner (if not already installed)
# conda install -c bioconda star

# Clone the eCLIP repository (if not already cloned)
# git clone https://github.com/YeoLab/eclip.git
# cd eclip
# export PATH=$(pwd)/src:$PATH # Add eclip scripts to PATH

# Define input/output and reference paths (placeholders - replace with actual paths)
INPUT_FASTQ="path/to/your/input_reads.fastq.gz" # Example: reads.fastq.gz or R1.fastq.gz R2.fastq.gz for paired-end
OUTPUT_PREFIX="eclip_alignment_output" # Prefix for output files
GENOME_DIR="path/to/STAR_index/hg38" # Path to the STAR genome index (e.g., for human hg38)

# Download InvRNA*.fasta files (if not already present)
# mkdir -p example_inputs
# for i in $(seq 1 1560); do
#     wget -P example_inputs https://raw.githubusercontent.com/YeoLab/eclip/master/example/inputs/InvRNA_"$i".fasta
# done
# ANNOTATION_FASTA_FILES="example_inputs/InvRNA_*.fasta"

# For demonstration, assuming InvRNA*.fasta are in the current directory or specified path
ANNOTATION_FASTA_FILES="InvRNA*.fasta" # Adjust path if files are elsewhere

# Execute the eclip_align_star.py script
python eclip_align_star.py \
  -i "${INPUT_FASTQ}" \
  -o "${OUTPUT_PREFIX}" \
  -g "${GENOME_DIR}" \
  --match-read-wildcards \
  -O 1 \
  --times 1 \
  -e 0.1 \
  --quality-cutoff 6 \
  -m 18 \
  -a ${ANNOTATION_FASTA_FILES}

View on GitHub

Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events.

cutadapt v1.14 GitHub

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt=1.14

# Define input and output file paths
# Assuming single-end reads for this example. Adjust for paired-end if necessary.
INPUT_READS="input_reads_previously_trimmed.fastq.gz"
OUTPUT_TRIMMED_READS="reads_trimmed_double_ligation.fastq.gz"

# Define the 3' adapter sequence commonly associated with double-ligation events in eCLIP.
# This is a placeholder; the exact adapter might vary based on the specific library preparation
# and the nature of the double-ligation event. A common eCLIP 3' adapter is used here.
# Example eCLIP 3' adapter from Yeo lab protocols: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

# Trim reads with cutadapt to remove the 3' adapter (double-ligation events).
# The description implies a re-trimming step, so the input reads are already partially processed.
# -a: 3' adapter sequence to be removed from the 3' end of the reads.
# -o: Output file for trimmed reads.
# -m 18: Discard reads shorter than 18 bp after trimming (a common minimum length for eCLIP reads).
# -q 20,20: Trim low-quality bases from the 5' and 3' ends with a quality threshold of 20.
# --cores=4: Use 4 CPU cores for parallel processing (adjust based on available resources).
cutadapt -a "${ADAPTER_SEQUENCE}" \
         -o "${OUTPUT_TRIMMED_READS}" \
         -m 18 \
         -q 20,20 \
         --cores=4 \
         "${INPUT_READS}"

View on GitHub

Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

eCLIP v1.1.1 GitHub

$ Bash example

# Install umi_tools if not already installed
# conda install -c bioconda umi_tools

# Reference dataset: InvRNA_adapters.fasta
# This file (e.g., InvRNA_adapters.fasta from https://github.com/YeoLab/eclip/tree/master/example/inputs/)
# typically contains adapter sequences used for trimming reads, often with tools like cutadapt,
# which usually precedes UMI deduplication. The '-a InvRNA*.fasta' argument in the description
# is characteristic of adapter trimming, but the other arguments are specific to umi_tools dedup.
# For this step, we assume the primary operation is UMI deduplication.

# Placeholder for input BAM file (e.g., from alignment and adapter trimming step)
# The input BAM should be sorted by coordinate for umi_tools dedup.
INPUT_BAM="input_aligned_trimmed.bam"
OUTPUT_BAM="output_deduplicated.bam"

umi_tools dedup \
  --match-read-wildcards \
  -O 5 \
  --times 1 \
  -e 0.1 \
  --quality-cutoff 6 \
  -m 18 \
  -I "${INPUT_BAM}" \
  -S "${OUTPUT_BAM}"

View on GitHub

Trimmed reads were then mapped with STAR (2.4.0i) against a repeat element database (RepBase 18.05).

STAR v2.4.0i GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star=2.4.0i
# conda install -c bioconda samtools # For indexing BAM files

# --- Reference Data Setup ---
# Placeholder for RepBase 18.05 FASTA file. 
# Actual download might require a license or specific access from GIRINST.
# For demonstration, we assume 'RepBase18.05.fasta' is available in the working directory.
# Example download (replace with actual source if available and licensed):
# wget -O RepBase18.05.fasta.gz "http://www.girinst.org/repbase/update/RepBase18.05.fasta.gz"
# gunzip RepBase18.05.fasta.gz

repbase_fasta="RepBase18.05.fasta"
genomeDir="RepBase_STAR_index"

# Create STAR index for RepBase 18.05
# The --genomeSAindexNbases parameter is adjusted for shorter sequences typical of repeat elements.
# --runThreadN: Use multiple threads for indexing to speed up the process.
mkdir -p "${genomeDir}"
STAR --runMode genomeGenerate \
     --genomeDir "${genomeDir}" \
     --genomeFastaFiles "${repbase_fasta}" \
     --genomeSAindexNbases 10 \
     --runThreadN 8

# --- Alignment Step ---
# Input: Trimmed reads (placeholder). Replace with your actual trimmed reads file.
trimmed_reads="trimmed_reads.fastq.gz" 

# Output prefix for STAR alignment files
output_prefix="aligned_to_repeats"

# Align trimmed reads to the RepBase 18.05 index
# --outSAMtype BAM SortedByCoordinate: Output sorted BAM file.
# --outReadsUnmapped Fastx: Output unmapped reads to a separate FASTQ file.
# --outFilterMultimapNmax 100: Allow reads to map to up to 100 locations, which is common for repetitive sequences.
# --alignIntronMax 1: Set to 1 to effectively disable splicing, as repeats are not typically spliced regions.
# --runThreadN: Use multiple threads for alignment.
STAR --runMode alignReads \
     --genomeDir "${genomeDir}" \
     --readFilesIn "${trimmed_reads}" \
     --outFileNamePrefix "${output_prefix}" \
     --outSAMtype BAM SortedByCoordinate \
     --outReadsUnmapped Fastx \
     --outFilterMultimapNmax 100 \
     --alignIntronMax 1 \
     --runThreadN 8

# Index the resulting BAM file for easier viewing and downstream processing
samtools index "${output_prefix}Aligned.sortedByCoord.out.bam"

View on GitHub

Args: --runThreadN 16 \ --genomeDir human_repbase \ --readFilesIn path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

STAR (Inferred with models/gemini-2.5-flash) v(Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star

# Note: 'human_repbase' refers to a pre-indexed STAR genome directory.
# If this is a custom index including repetitive elements, ensure it's generated correctly.
# For a standard human genome (e.g., hg38), you would first generate the genome index:
# STAR --runThreadN 16 --runMode genomeGenerate --genomeDir /path/to/STAR_index/hg38 --genomeFastaFiles /path/to/hg38.fa --sjdbGTFfile /path/to/gencode.vXX.annotation.gtf

# Align reads using STAR
STAR \
    --runThreadN 16 \
    --genomeDir human_repbase \
    --readFilesIn path/to/read1 \
    --outFileNamePrefix out_prefix \
    --outReadsUnmapped Fastx \
    --outSAMtype BAM Unsorted \
    --outSAMattributes All \
    --outSAMunmapped Within \
    --outSAMattrRGline ID:foo \
    --outFilterType BySJout \
    --outFilterMultimapNmax 30 \
    --outFilterMultimapScoreRange 1 \
    --outFilterScoreMin 10 \
    --alignEndsType EndToEnd

View on GitHub

Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (hg19/ChlSab2).

STAR v2.4.0i GitHub

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star

# Define variables
READS_FILE="filtered_unmapped_reads.fastq.gz" # Placeholder for input reads (e.g., from a previous filtering step)
GENOME_DIR="/path/to/STAR_genome_index/hg19" # Placeholder for hg19 STAR genome index directory
OUTPUT_PREFIX="mapped_reads_" # Prefix for output files (e.g., mapped_reads_Aligned.sortedByCoord.out.bam)
NUM_THREADS=8 # Example number of threads, adjust based on available CPU cores

# Run STAR alignment
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READS_FILE}" \
     --runThreadN "${NUM_THREADS}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outSAMtype BAM SortedByCoordinate \
     --outBAMcompression 6 \
     --limitBAMsortRAM 30000000000 # Example: 30GB, adjust based on available RAM and expected BAM file size

View on GitHub

Args: --runThreadN 16 \ --genomeDir genomedir \ --readFilesIn /path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

STAR (Inferred with models/gemini-2.5-flash) v2.7.10a GitHub

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star

# Define variables
GENOME_DIR="/path/to/STAR_genome_index_GRCh38" # Placeholder for GRCh38 genome index
READ1_FILE="/path/to/sample_R1.fastq.gz" # Placeholder for input read 1 file
OUTPUT_PREFIX="sample_aligned" # Placeholder for output file prefix

# Execute STAR alignment
STAR \
  --runThreadN 16 \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ1_FILE}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outReadsUnmapped Fastx \
  --outSAMtype BAM Unsorted \
  --outSAMattributes All \
  --outSAMunmapped Within \
  --outSAMattrRGline ID:foo \
  --outFilterType BySJout \
  --outFilterMultimapNmax 1 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --alignEndsType EndToEnd

View on GitHub

Aligned reads were sorted with samtools (1.6)

samtools v1.6 GitHub

$ Bash example

# Install samtools if not already installed
# conda install -c bioconda samtools=1.6

# Sort an aligned BAM file
# Replace input.bam with your actual input file
# Replace output.sorted.bam with your desired output file name
samtools sort -o output.sorted.bam input.bam

View on GitHub

Sorted reads were collapsed with umi_tools (1.0.0).

UMI-tools v1.0.0 GitHub

$ Bash example

# Install umi_tools if not already installed
# conda install -c bioconda umi_tools=1.0.0

# Placeholder for input sorted BAM file
INPUT_SORTED_BAM="input.sorted.bam"
# Placeholder for output deduplicated BAM file
OUTPUT_DEDUPLICATED_BAM="output.deduplicated.bam"

# Collapse sorted reads using umi_tools dedup.
# This command assumes UMIs are present in the read names and uses the 'directional' method
# for deduplication, which is common for RNA-seq data to preserve strand information.
# Adjust --umi-separator or --extract-umi-method/--umi-tag if UMIs are stored differently
# (e.g., in a BAM tag).
umi_tools dedup \
    --input "${INPUT_SORTED_BAM}" \
    --output "${OUTPUT_DEDUPLICATED_BAM}" \
    --method directional \
    --log "umi_tools_dedup.log"

View on GitHub

Args: --random-seed 1 --method unique

Generic Data Processing Script (Inferred with models/gemini-2.5-flash) vUnknown

$ Bash example

# This is a placeholder command as the specific tool and context are not provided.
# It is assumed to be a custom script or a generic data processing step.
# Replace 'my_tool_or_script.sh' with the actual executable name and specify input/output files if applicable.
my_tool_or_script.sh --random-seed 1 --method unique

BAM files were used to identify peak clusters with Clipper (1.2.2).

CLIPper v1.2.2 GitHub

$ Bash example

# Install CLIPper (if not already installed)
# pip install clipper

# Example: Download a genome size file (e.g., for hg38) if not already present
# wget -O hg38.chrom.sizes http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes

# Run CLIPper to identify peak clusters from BAM files
# Replace 'input.bam' with your actual BAM file(s)
# Replace 'hg38.chrom.sizes' with the appropriate genome size file for your reference genome
# The p-value threshold (-p) is a common parameter; adjust as needed.
clipper.py -b input.bam -s hg38.chrom.sizes -o output_peaks.bed -p 0.01

View on GitHub

Args: --species (hg19/ChlSab2_Sars) --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam

Custom Python Script (Inferred with models/gemini-2.5-flash) vN/A GitHub

$ Bash example

# Define input and output paths
INPUT_BAM="path/to/input.bam"
OUTPUT_BAM="path/to/output.bam"

# Define species (reference genome)
# hg19 is a human reference genome, commonly available from UCSC or NCBI.
# ChlSab2_Sars is likely a specific reference, possibly for SARS-CoV-2 on Chlorocebus sabaeus (green monkey).
SPECIES="hg19" 

# Define other parameters
TIMEOUT_MS=3600000 # 1 hour in milliseconds
MAX_GENES=1000000

# Assuming a custom Python script based on the arguments like --save-pickle and --maxgenes,
# which are not typical for standard bioinformatics tools. The actual script name
# would need to be known from the pipeline context. For demonstration, we'll use
# a placeholder 'process_bam_genes.py'. This script would likely require a Python
# environment with necessary libraries (e.g., pysam).

# Example installation (commented out)
# conda create -n myenv python=3.8 pysam
# conda activate myenv

# Execute the inferred custom Python script
python process_bam_genes.py \
    --species "${SPECIES}" \
    --bam "${INPUT_BAM}" \
    --timeout "${TIMEOUT_MS}" \
    --maxgenes "${MAX_GENES}" \
    --save-pickle \
    --outfile "${OUTPUT_BAM}"

View on GitHub

Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.

eCLIP v0.1.5 GitHub

$ Bash example

# Clone the eclip repository if not already available
# git clone https://github.com/yeolab/eclip.git
# export PATH=$PATH:$(pwd)/eclip/bin

# Example input files (replace with actual paths)
# PEAK_FILE should be a peak file with read counts, typically generated by a script like overlap_peakfi_with_bam_PE.pl
# (as mentioned in the description) or another read counting tool.
PEAK_FILE="peaks_with_counts.bed" 
IP_BAM="ip.bam"
INPUT_BAM="input.bam"
OUTPUT_PREFIX="normalized_peaks"

# Run peaksnormalize.pl
# This script normalizes peak clusters using IP and INPUT BAM files.
peaksnormalize.pl "${PEAK_FILE}" "${IP_BAM}" "${INPUT_BAM}" "${OUTPUT_PREFIX}"

View on GitHub

Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+

eCLIP v0.1.5 GitHub

$ Bash example

# Install Perl if not available
# conda install -c bioconda perl

# Clone the eCLIP pipeline repository to get the script
# git clone https://github.com/yeolab/eclip.git
# cd eclip/scripts

# Define input and output file names (replace with actual paths and names)
# These would be the normalized peak regions from individual replicates
INPUT_PEAK_FILE_REP1="replicate1_normalized_peaks.bed"
INPUT_PEAK_FILE_REP2="replicate2_normalized_peaks.bed"
# Add more input files if there are more replicates

OUTPUT_MERGED_PEAKS="merged_replicate_overlapping_peaks.bed"

# Execute the merging script
# Assuming the script 'compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl' is in the current PATH or specified with its full path
perl compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl \
    "${INPUT_PEAK_FILE_REP1}" \
    "${INPUT_PEAK_FILE_REP2}" \
    > "${OUTPUT_MERGED_PEAKS}"

View on GitHub

Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.

IDR v2.0.2 GitHub

$ Bash example

# Install merge_peaks pipeline (which includes make_informationcontent_from_peaks.pl and a custom IDR script)
# git clone https://github.com/yeolab/merge_peaks.git
# export PATH=$PATH:$(pwd)/merge_peaks/bin

# Example input files (normalized and compressed .bed files)
# Assuming 'compressed.bed' refers to standard BED files that might have been gzipped and unzipped,
# or simply a naming convention for processed BED files.
INPUT_REP1="replicate1.bed"
INPUT_REP2="replicate2.bed"
OUTPUT_PREFIX="idr_output"

# Step 1: Rank peaks by entropy score (information content)
# make_informationcontent_from_peaks.pl is part of the merge_peaks pipeline.
# It takes a BED file and outputs a new BED file with an added 'signal.value' column
# representing the information content, which IDR uses for ranking.
make_informationcontent_from_peaks.pl "${INPUT_REP1}" > "${INPUT_REP1%.bed}.ranked.bed"
make_informationcontent_from_peaks.pl "${INPUT_REP2}" > "${INPUT_REP2%.bed}.ranked.bed"

# Step 2: Use ranked peaks as inputs to IDR (version 2.0.2 logic, as implemented in merge_peaks)
# The 'idr' script used here is from the merge_peaks pipeline (merge_peaks/bin/idr).
# It implements the IDR 2.0.2 logic for determining reproducible peaks.
idr --input-file-type bed \
    --rank-by signal.value \
    --output-file "${OUTPUT_PREFIX}.idr.bed" \
    "${INPUT_REP1%.bed}.ranked.bed" \
    "${INPUT_REP2%.bed}.ranked.bed"

View on GitHub

Reproducible peaks were filtered for those â¥20 bases in length, and not overlapping with WT negative control samples.

bedtools (Inferred with models/gemini-2.5-flash) v2.30.0 GitHub

$ Bash example

# Install bedtools if not already installed
# conda install -c bioconda bedtools

# Filter peaks for length >= 20 bases
awk 'BEGIN{OFS="\t"} {if ($3 - $2 >= 20) print}' reproducible_peaks.bed > reproducible_peaks_length_filtered.bed

# Filter out peaks overlapping with WT negative control samples
bedtools subtract -a reproducible_peaks_length_filtered.bed -b wt_negative_control.bed > final_filtered_peaks.bed

View on GitHub

Tools Used

eCLIP STAR

Raw Source Text

Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0). Args: --random-seed 1 --bc-pattern NNNNNNNNNN
Reads were then trimmed with cutadapt (1.14). Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events. Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18  -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
Trimmed reads were then mapped with STAR (2.4.0i) against a repeat element database (RepBase 18.05). Args: --runThreadN 16 \  --genomeDir human_repbase \  --readFilesIn path/to/read1 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 30 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (hg19/ChlSab2). Args: --runThreadN 16 \  --genomeDir genomedir \  --readFilesIn /path/to/read1 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM   Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 1 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Aligned reads were sorted with samtools (1.6)
Sorted reads were collapsed with umi_tools (1.0.0). Args: --random-seed 1 --method unique
BAM files were used to identify peak clusters with Clipper (1.2.2). Args: --species (hg19/ChlSab2_Sars) --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam
Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.
Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+
Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
Reproducible peaks were filtered for those â¥20 bases in length, and not overlapping with WT negative control samples.
Genome_build: hg19
Genome_build: ChlSab2
Genome_build: Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome (MN908947.3)

← Back to Analysis