GSE124023 Processing Pipeline

RIP-Seq · code examples · 22 steps

Publication

DDX5 promotes oncogene C3 and FABP1 expressions and drives intestinal inflammation and tumorigenesis.

Life science alliance (2020) — PMID 32817263

Dataset

DDX5 targets tissue specific RNAs to promote intestine tumorigenesis

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Sequenced reads were removed of inline barcodes and reformatted to include randomers in read headers with eclipdemux (v0.0.1).

eclipdemux v0.0.1 GitHub

$ Bash example

# eclipdemux is distributed with the Yeo lab eCLIP pipeline (yeolab/eclip);
# install it from that repository so the command is on PATH.

# Placeholders (hypothetical file names)
READ1="raw_R1.fastq.gz"
READ2="raw_R2.fastq.gz"
BARCODES_FILE="inline_barcodes.fasta" # Placeholder: file defining the inline barcodes to be removed
RANDOMER_LENGTH=10 # Documented argument for this dataset: --length 10

# Only --length 10 is documented for this step. The other option names below are
# assumptions based on the eCLIP pipeline, and the tool may require further options
# (for example sample naming); confirm with `eclipdemux --help` before running.
eclipdemux \
  --fastq_1 "${READ1}" \
  --fastq_2 "${READ2}" \
  --barcodesfile "${BARCODES_FILE}" \
  --length "${RANDOMER_LENGTH}"

Args: --length 10

eclipdemux v0.0.1

$ Bash example

# --length 10 is the eclipdemux argument for the demultiplexing step above:
# the number of randomer bases moved from the read into the read header.
# Input and output options are not documented for this dataset; list them with `eclipdemux --help`.
echo "eclipdemux argument documented for this dataset: --length 10"
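
To make the reformatting concrete, the sketch below moves a leading 10-nt randomer from a toy read into the read name using awk. It is a conceptual illustration only, not the eclipdemux implementation: the `RANDOMER:name` header layout, the function name `move_randomer_to_header`, and the toy record are all assumptions.

```shell
# Conceptual sketch only (not eclipdemux itself): move a leading randomer into the read name.
# Assumption: the reformatted header has the form "@RANDOMER:original_name".
RANDOMER_LENGTH=10   # matches the documented --length 10

move_randomer_to_header() {
  # Reads 4-line FASTQ records on stdin and writes reformatted records to stdout.
  awk -v n="$1" '
    NR % 4 == 1 { name = substr($0, 2) }                              # header without "@"
    NR % 4 == 2 { umi = substr($0, 1, n); seq = substr($0, n + 1) }   # split randomer from insert
    NR % 4 == 0 { print "@" umi ":" name; print seq; print "+"; print substr($0, n + 1) }
  '
}

# Toy record (hypothetical data): a 10-nt randomer followed by an 8-nt insert.
toy_fastq=$(printf '@read1\nACGTACGTACGGATTACA\n+\nIIIIIIIIIIFFFFFFFF\n')
reformatted=$(printf '%s\n' "$toy_fastq" | move_randomer_to_header "$RANDOMER_LENGTH")
printf '%s\n' "$reformatted"
```

Keeping the randomer in the read name lets it survive trimming and alignment, so that it is still available when PCR duplicates are collapsed later in the pipeline.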

Reads were then trimmed with cutadapt (1.9.1).

cutadapt v1.9.1 GitHub

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt=1.9.1

# Define input and output file names
INPUT_READ1="input_R1.fastq.gz"
INPUT_READ2="input_R2.fastq.gz"
OUTPUT_READ1="trimmed_R1.fastq.gz"
OUTPUT_READ2="trimmed_R2.fastq.gz"

# Define common Illumina adapter sequences (adjust if specific adapters are known)
# For Read 1 (standard TruSeq Read 1 adapter)
ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
# For Read 2 (standard TruSeq Read 2 adapter; it is not the reverse complement of the Read 1 adapter)
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Generic paired-end trimming sketch. The adapters and thresholds here are illustrative
# and differ from the arguments documented for this dataset
# (--quality-cutoff 6, -m 18, adapter FASTA files from parsebarcodes.sh).
# -a: 3' adapter for Read 1
# -A: 3' adapter for Read 2
# -q: Quality cutoff (e.g., 20 for both ends)
# -m: Minimum read length after trimming (e.g., 20 bp)
# -o: Output file for Read 1
# -p: Output file for Read 2
cutadapt -a "${ADAPTER_R1}" -A "${ADAPTER_R2}" \
         -q 20,20 -m 20 \
         -o "${OUTPUT_READ1}" -p "${OUTPUT_READ2}" \
         "${INPUT_READ1}" "${INPUT_READ2}"

Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -g g_adapters.fasta -A A_adapters.fasta -a a_adapters.fasta (fasta sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline)

eCLIP v0.1.5 GitHub

$ Bash example

# Install cutadapt if not already available
# conda install -c bioconda cutadapt=1.9.1

# Placeholders for the paired-end input (demultiplexed reads) and the trimmed output
INPUT_READ1="demuxed_R1.fastq.gz"
INPUT_READ2="demuxed_R2.fastq.gz"
OUTPUT_READ1="trimmed_R1.fastq.gz"
OUTPUT_READ2="trimmed_R2.fastq.gz"

# Adapter FASTA files generated by parsebarcodes.sh within the eclip 0.1.5+ pipeline.
# -g takes 5' adapters for read 1; -a and -A take 3' adapters for read 1 and read 2.
# Ensure these files are present in the working directory or provide their full paths.
G_ADAPTERS_FASTA="g_adapters.fasta"
A_ADAPTERS_FASTA="A_adapters.fasta"
a_ADAPTERS_FASTA="a_adapters.fasta"

# -A applies to read 2, so cutadapt needs paired-end input and a -p output.
# The "file:" prefix tells cutadapt to read adapter sequences from a FASTA file;
# the documented arguments list the bare file names.
cutadapt \
  --match-read-wildcards \
  -O 1 \
  --times 1 \
  -e 0.1 \
  --quality-cutoff 6 \
  -m 18 \
  -g "file:${G_ADAPTERS_FASTA}" \
  -A "file:${A_ADAPTERS_FASTA}" \
  -a "file:${a_ADAPTERS_FASTA}" \
  -o "${OUTPUT_READ1}" \
  -p "${OUTPUT_READ2}" \
  "${INPUT_READ1}" "${INPUT_READ2}"

Reads were then trimmed once more with cutadapt (1.9.1) to remove double-ligation events.

cutadapt v1.9.1 GitHub

$ Bash example

# Install cutadapt (version 1.9.1)
# conda create -n cutadapt_env cutadapt=1.9.1
# conda activate cutadapt_env

# Define input and output files
INPUT_FASTQ="input_reads.fastq.gz"
OUTPUT_FASTQ="trimmed_reads.fastq.gz"

# Define the 3' adapter sequence commonly used in eCLIP assays.
# This adapter helps remove double-ligation events where the adapter ligates to itself or to the 3' end of the read.
ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

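# Trim reads using cutadapt to remove the 3' adapter and filter for minimum length.
# This is a generic single-end sketch; the documented run used -A with an adapter FASTA file and -m 18.
# -a: Specifies a 3' adapter sequence.
# -m: Discard reads shorter than the given length (15 nt here).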
cutadapt -a "${ADAPTER_SEQUENCE}" \
         -m 15 \
         -o "${OUTPUT_FASTQ}" \
         "${INPUT_FASTQ}"

Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -A A_adapters.fasta (fasta sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline)

eCLIP v0.1.5 GitHub

$ Bash example

# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=1.9.1

# Placeholders for the paired-end input (reads from the first trimming round) and output
INPUT_READ1="trimmed_R1.fastq.gz"
INPUT_READ2="trimmed_R2.fastq.gz"
OUTPUT_READ1="trimmed2_R1.fastq.gz"
OUTPUT_READ2="trimmed2_R2.fastq.gz"
# Adapter file generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline
ADAPTER_FILE="A_adapters.fasta"

# -A trims 3' adapters from read 2, so paired-end input and a -p output are required.
# The "file:" prefix makes cutadapt read the adapter sequences from the FASTA file.
cutadapt \
  --match-read-wildcards \
  -O 1 \
  --times 1 \
  -e 0.1 \
  --quality-cutoff 6 \
  -m 18 \
  -A "file:${ADAPTER_FILE}" \
  -o "${OUTPUT_READ1}" \
  -p "${OUTPUT_READ2}" \
  "${INPUT_READ1}" "${INPUT_READ2}"

Trimmed reads were then mapped with STAR (2.4.0i) against a mouse-specific repeat element database (RepBase 18.05).

STAR v2.4.0i GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables for input and reference files
TRIMMED_READS="trimmed_reads.fastq.gz" # Replace with your actual trimmed reads file
MOUSE_REPBASE_STAR_INDEX="path/to/mouse_repbase_18.05_star_index" # Replace with the actual path to the STAR index built from RepBase 18.05 mouse sequences
OUTPUT_PREFIX="star_repbase_mapping_"
NUM_THREADS=8 # Adjust as needed

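# Run STAR to map trimmed reads against the mouse-specific repeat element database.
# --readFilesCommand zcat is needed because the placeholder input is gzip-compressed.
# The multimapper limit below is illustrative; the documented run used --outFilterMultimapNmax 30.
STAR \
  --genomeDir "${MOUSE_REPBASE_STAR_INDEX}" \
  --readFilesIn "${TRIMMED_READS}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outFilterMultimapNmax 100 \
  --runThreadN "${NUM_THREADS}"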

Args: --runThreadN 16 \ --genomeDir mouse_repbase \ --readFilesIn path/to/read1 path/to/read2 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

STAR v2.4.0i GitHub

$ Bash example

# Install STAR using conda (the pipeline used version 2.4.0i)
# conda install -c bioconda star

# Placeholder for the STAR index directory.
# This index should be built from the mouse RepBase 18.05 repeat sequences, not from the full genome.
# Example command for index generation (the repeat FASTA path is a placeholder):
# STAR --runThreadN <num_threads> --runMode genomeGenerate --genomeDir /path/to/STAR_index/mouse_repbase --genomeFastaFiles /path/to/mouse_repbase_18.05.fasta
GENOME_DIR="/path/to/STAR_index/mouse_repbase"

# Placeholder for input FASTQ files (trimmed reads, uncompressed;
# add --readFilesCommand zcat for .gz input)
READ1="path/to/read1.fastq"
READ2="path/to/read2.fastq"

# Placeholder for output prefix
OUTPUT_PREFIX="out_prefix"

STAR \
  --runThreadN 16 \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ1}" "${READ2}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outReadsUnmapped Fastx \
  --outSAMtype BAM Unsorted \
  --outSAMattributes All \
  --outSAMunmapped Within \
  --outSAMattrRGline ID:foo \
  --outFilterType BySJout \
  --outFilterMultimapNmax 30 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --alignEndsType EndToEnd
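
With `--outReadsUnmapped Fastx`, STAR writes the reads that did not align to files named `Unmapped.out.mate1` and `Unmapped.out.mate2`, prefixed by the value of `--outFileNamePrefix`. The short sketch below only derives those names so they can be passed to the genome-mapping step; the prefix is the placeholder from the documented arguments, and the variable names are hypothetical.

```shell
# Sketch: derive the unmapped-read files produced by the repeat-mapping run.
# REPEAT_PREFIX is the value given to --outFileNamePrefix in that run (placeholder).
REPEAT_PREFIX="out_prefix"

# STAR appends fixed names to the prefix when --outReadsUnmapped Fastx is set.
UNMAPPED_MATE1="${REPEAT_PREFIX}Unmapped.out.mate1"
UNMAPPED_MATE2="${REPEAT_PREFIX}Unmapped.out.mate2"

echo "Inputs for genome mapping: ${UNMAPPED_MATE1} ${UNMAPPED_MATE2}"
```

These files are uncompressed FASTQ, so they can be given to `--readFilesIn` in the next STAR run without `--readFilesCommand`.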

Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a mouse genome (mm10).

STAR v2.4.0i GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star=2.4.0i

# Define variables
# Input FASTQ file containing unmapped reads filtered of repeat elements
INPUT_FASTQ="unmapped_reads.fastq.gz"

# Path to the pre-built STAR genome index for the mouse genome (mm10).
# This directory should contain files like Genome, SA, SAindex, etc.
# Replace with the actual path to your mm10 STAR index.
GENOME_INDEX_DIR="/path/to/mm10_star_index"

# Prefix for output files (e.g., mapped_reads_Aligned.sortedByCoord.out.bam)
OUTPUT_PREFIX="mapped_reads_"

# Number of threads to use for alignment
NUM_THREADS=8 # Adjust based on available resources

# Run STAR alignment.
# --readFilesCommand zcat is needed because the placeholder input is gzip-compressed.
STAR --genomeDir "${GENOME_INDEX_DIR}" \
     --readFilesIn "${INPUT_FASTQ}" \
     --readFilesCommand zcat \
     --runThreadN "${NUM_THREADS}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes Standard

Args: --runThreadN 16 \ --genomeDir genomedir \ --readFilesIn /path/to/read1 /path/to/read2 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

STAR v2.4.0i GitHub

$ Bash example

# STAR index for the mouse genome (mm10); placeholder path
STAR_GENOME_DIR="/path/to/STAR_genome_index_mm10"
# Unmapped reads from the repeat-element mapping step (uncompressed FASTQ;
# add --readFilesCommand zcat for .gz input)
READ1_FILE="/path/to/read1.fastq"
READ2_FILE="/path/to/read2.fastq"
OUTPUT_PREFIX="aligned_output"

# Install STAR using conda (the pipeline used version 2.4.0i)
# conda install -c bioconda star

# Note: The genome index must be pre-built from the mm10 genome FASTA
# using STAR's genomeGenerate command.

STAR \
  --runThreadN 16 \
  --genomeDir "${STAR_GENOME_DIR}" \
  --readFilesIn "${READ1_FILE}" "${READ2_FILE}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outReadsUnmapped Fastx \
  --outSAMtype BAM Unsorted \
  --outSAMattributes All \
  --outSAMunmapped Within \
  --outSAMattrRGline ID:foo \
  --outFilterType BySJout \
  --outFilterMultimapNmax 1 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --alignEndsType EndToEnd

Aligned reads were sorted with samtools (1.4.1)

samtools v1.4.1 GitHub

$ Bash example

# Install samtools if not already available
# conda install -c bioconda samtools=1.4.1

# Sort aligned reads
# Replace 'aligned_reads.bam' with your actual input BAM file
# Replace 'sorted_aligned_reads.bam' with your desired output sorted BAM file
samtools sort -o sorted_aligned_reads.bam aligned_reads.bam

Sorted reads were collapsed with barcodecollapsepe.py included in eclip 0.1.5+ pipelines.

eCLIP v0.1.5 GitHub

$ Bash example

# barcodecollapsepe.py ships with the Yeo lab eCLIP pipeline (yeolab/eclip);
# install the pipeline from that repository so the script is on PATH.

# Define input and output files (placeholders)
INPUT_BAM="sorted_reads.bam"
OUTPUT_BAM="collapsed_reads.bam"
METRICS_FILE="collapse_metrics.txt"

# Collapse PCR duplicates in the sorted alignments.
# Assumption: -b is the sorted input BAM, -o the collapsed output BAM, and -m the metrics file;
# confirm with `barcodecollapsepe.py --help`.
barcodecollapsepe.py -b "${INPUT_BAM}" -o "${OUTPUT_BAM}" -m "${METRICS_FILE}"

Args: -o preRmDup.bam -m metrics_file -b rmDup.bam

barcodecollapsepe.py (eCLIP v0.1.5)

$ Bash example

# These arguments belong to barcodecollapsepe.py from the step above, not to a separate tool.
# File names are reproduced as recorded in the source.
barcodecollapsepe.py -o preRmDup.bam -m metrics_file -b rmDup.bam
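
The idea behind collapsing on randomers can be shown on a toy table: reads that share a mapping position, strand, and randomer are treated as PCR duplicates and only the first is kept. This is a conceptual sketch, not the logic of barcodecollapsepe.py, which operates on paired-end BAM records; the table layout and all values are hypothetical.

```shell
# Toy alignments (hypothetical): chrom, start, strand, randomer, read name.
toy_alignments=$(printf 'chr1\t100\t+\tACGTACGTAC\tr1\nchr1\t100\t+\tACGTACGTAC\tr2\nchr1\t100\t+\tTTTTACGTAC\tr3\nchr2\t500\t-\tACGTACGTAC\tr4\n')

# Keep the first read for each (chrom, start, strand, randomer) combination.
deduped=$(printf '%s\n' "$toy_alignments" | awk -F'\t' '!seen[$1, $2, $3, $4]++')

kept=$(printf '%s\n' "$deduped" | wc -l | tr -d ' ')
echo "kept ${kept} of 4 reads"
printf '%s\n' "$deduped"
```

Here r2 is dropped as a duplicate of r1, while r3 is kept because its randomer differs even though it maps to the same position.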

PCR de-duped reads from each inline barcode were then merged with samtools (1.4.1) merge (merged.bam)

samtools v1.4.1 GitHub

$ Bash example

# Install samtools if not already installed
# conda install -c bioconda samtools

# Merge PCR de-duped reads from each inline barcode
# Replace input_deduped_barcode_1.bam, input_deduped_barcode_2.bam with your actual de-duped BAM files
samtools merge merged.bam input_deduped_barcode_1.bam input_deduped_barcode_2.bam

Merged alignments were split to keep just read2 using samtools (1.4.1) view.

samtools v1.4.1 GitHub

$ Bash example

# Input: merged_alignments.bam (placeholder for merged alignment file)
# Output: read2_alignments.bam (placeholder for output file containing only read2)

# Split merged alignments to keep just read2
# -b: Output BAM format
# -f 0x80: Select alignments where the 'read is second in pair' flag (0x80) is set.
samtools view -b -f 0x80 merged_alignments.bam > read2_alignments.bam

Args: -h -b -f 128

samtools v1.4.1

$ Bash example

# These arguments belong to samtools view from the step above.
# -h: include the header; -b: write BAM; -f 128: keep only reads flagged as second in pair (read2).
# merged.r2.bam is a hypothetical output name.
samtools view -h -b -f 128 merged.bam > merged.r2.bam
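
The value 128 is the SAM flag bit 0x80, "second in pair". The sketch below checks that bit for a few typical paired-end flag values with shell arithmetic; the helper name `is_read2` is hypothetical.

```shell
# SAM flag bit 128 (0x80) marks the second read of a pair.
# samtools view -f 128 keeps only records with this bit set.
is_read2() {
  [ $(( $1 & 128 )) -ne 0 ]
}

# Typical flags for properly paired reads: 99 and 83 are read1, 147 and 163 are read2.
for flag in 99 147 83 163; do
  if is_read2 "$flag"; then
    echo "flag ${flag}: read2 (kept by -f 128)"
  else
    echo "flag ${flag}: read1 (dropped by -f 128)"
  fi
done
```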

Read2 BAM files were used to identify peak clusters with Clipper (1.2.2).

CLIPper v1.2.2 GitHub

$ Bash example

# Install CLIPper from the Yeo lab repository (yeolab/clipper) if not already installed.

# Placeholders for the input read2 BAM file and the output peak file
INPUT_BAM="input_read2.bam"
OUTPUT_PEAKS="output_peaks.bed"

# Run CLIPper to identify peak clusters on the mouse genome (mm10),
# using the option names documented for this step.
clipper --species mm10 --bam "${INPUT_BAM}" --outfile "${OUTPUT_PEAKS}"

Args: --species mm10 --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam

CLIPper v1.2.2

$ Bash example

# These arguments belong to CLIPper from the step above.
# Reference genome: mm10 (mouse, GRCm38 assembly).
# The output path is reproduced as recorded; CLIPper writes peaks in BED format,
# so the .bam extension is likely a naming slip in the source.
clipper --species mm10 \
        --bam path/to/input.bam \
        --timeout 3600000 \
        --maxgenes 1000000 \
        --save-pickle \
        --outfile path/to/output.bam

Peak clusters were normalized using read2 BAM files for IP against read2 BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.

eCLIP v0.1.5 GitHub

$ Bash example

# Install eCLIP (or ensure scripts are in PATH)
# For example, clone the repository and add scripts to PATH:
# git clone https://github.com/yeolab/eclip.git
# export PATH="$(pwd)/eclip/scripts:$PATH"

# Placeholder variables for input files
# Replace with actual paths to your IP and INPUT read2 BAM files and the peak file to be normalized
IP_READ2_BAM="ip_sample_read2.bam"
INPUT_READ2_BAM="input_sample_read2.bam"
PEAK_FILE="initial_peak_clusters.bed"
OUTPUT_PREFIX="normalized_peak_clusters"

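# Normalize peak clusters using the read2 BAM for IP against the read2 BAM for INPUT.
# Assumption: overlap_peakfi_with_bam_PE.pl takes positional arguments rather than named
# options, in the order IP BAM, INPUT BAM, peak BED, IP mapped-read count file,
# INPUT mapped-read count file, output file. Check the usage message in the script
# before running. The two count file names are hypothetical.
overlap_peakfi_with_bam_PE.pl \
    "${IP_READ2_BAM}" \
    "${INPUT_READ2_BAM}" \
    "${PEAK_FILE}" \
    ip_mapped_readnum.txt \
    input_mapped_readnum.txt \
    "${OUTPUT_PREFIX}.bed"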
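
The quantity at the heart of this normalization is a fold enrichment of IP over INPUT, each scaled by its library size. The sketch below computes a log2 fold enrichment for one peak with awk. It is a simplified illustration under stated assumptions: the read counts are hypothetical, and the real script also handles zero counts and computes p-values, which are not reproduced here.

```shell
log2_fold_enrichment() {
  # Arguments: IP reads in peak, total mapped IP reads,
  #            INPUT reads in peak, total mapped INPUT reads.
  LC_ALL=C awk -v ip="$1" -v ip_total="$2" -v inp="$3" -v inp_total="$4" \
    'BEGIN { printf "%.3f\n", log((ip / ip_total) / (inp / inp_total)) / log(2) }'
}

# Hypothetical peak: 80 of 1,000,000 IP reads versus 10 of 2,000,000 INPUT reads.
l2fc=$(log2_fold_enrichment 80 1000000 10 2000000)
echo "log2 fold enrichment: ${l2fc}"
```

In this example the peak holds 16 times the read fraction in IP that it does in INPUT, which gives a log2 fold enrichment of 4.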

Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+

eCLIP v0.1.5 GitHub

$ Bash example

# Clone the eCLIP pipeline repository to get the script
# git clone https://github.com/yeolab/eclip.git
# cd eclip

# Placeholder for input normalized peak regions from replicates
# These would typically be BED files generated by a previous peak calling step
INPUT_PEAKS_REP1="replicate1_normalized_peaks.bed"
INPUT_PEAKS_REP2="replicate2_normalized_peaks.bed"

# Placeholder for the output merged peak file
OUTPUT_MERGED_PEAKS="merged_replicate_overlapping_peaks.bed"

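# Execute the script to merge overlapping normalized peak regions.
# Assumption: the argument form shown here (input files as positional arguments,
# merged peaks written to standard output) is not confirmed by the source;
# check the usage message in the script before running.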
perl scripts/compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl \
    "${INPUT_PEAKS_REP1}" \
    "${INPUT_PEAKS_REP2}" \
    > "${OUTPUT_MERGED_PEAKS}"

Normalized peak files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
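
The entropy ranking mentioned above can be sketched on toy data. The formula used here, p × log2(p / q) with p the fraction of IP reads in the peak and q the fraction of INPUT reads, is an assumption about how the merge_peaks script scores peaks; the read totals and peak counts are hypothetical.

```shell
IP_TOTAL=1000000      # hypothetical total mapped IP reads
INPUT_TOTAL=2000000   # hypothetical total mapped INPUT reads

# Toy peaks (hypothetical): name, IP reads in peak, INPUT reads in peak.
toy_peaks=$(printf 'peakA\t80\t10\npeakB\t20\t10\npeakC\t40\t20\n')

# Score each peak as p * log2(p / q), then sort by descending score.
ranked=$(printf '%s\n' "$toy_peaks" |
  LC_ALL=C awk -F'\t' -v ip_total="$IP_TOTAL" -v input_total="$INPUT_TOTAL" '{
    p = $2 / ip_total
    q = $3 / input_total
    printf "%s\t%.8f\n", $1, p * log(p / q) / log(2)
  }' |
  LC_ALL=C sort -t "$(printf '\t')" -k2,2nr |
  cut -f1)

printf '%s\n' "$ranked"
```

Unlike fold enrichment alone, this score also rewards peaks that contain more reads: peakB and peakC have the same fold enrichment, but peakC ranks higher because it holds twice as many IP reads.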

IDR v2.0.2 GitHub

$ Bash example

# Install IDR (if not already installed)
# conda install -c bioconda idr=2.0.2

# Placeholders for the input files, which are normalized peak files ranked by entropy score.
# These are produced by make_informationcontent_from_peaks.pl from the merge_peaks pipeline;
# its arguments are not documented in the source, so no command is shown for it.

# Define input ranked peak files and output prefix
INPUT_PEAKS_REP1="replicate1.ranked.bed"
INPUT_PEAKS_REP2="replicate2.ranked.bed"
OUTPUT_PREFIX="sample_id"

# Run IDR to determine reproducible peaks between replicates.
# --samples is required to pass the two replicate files.
# Assumption: the inputs are BED files ranked on column 5; adjust --input-file-type
# and --rank to match the actual files.
idr --samples "${INPUT_PEAKS_REP1}" "${INPUT_PEAKS_REP2}" \
    --input-file-type bed \
    --rank 5 \
    --plot \
    --log-output-file "${OUTPUT_PREFIX}.idr.log" \
    --output-file "${OUTPUT_PREFIX}.idr.peaks"

Reproducible peaks were annotated by overlapping peak regions with Gencode M10 annotations

GENCODE M10 (the overlap example uses bedtools, which is inferred and not named in the source)

$ Bash example

# Install bedtools
# conda install -c bioconda bedtools

# Define input and output files
REPRODUCIBLE_PEAKS="reproducible_peaks.bed" # Placeholder for your reproducible peak regions (e.g., output from IDR)
GENCODE_GTF_URL="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M10/gencode.vM10.annotation.gtf.gz"
GENCODE_GENES_BED="gencode.vM10.genes.bed" # Placeholder for Gencode M10 gene annotations in BED format
ANNOTATED_PEAKS="annotated_peaks.bed"

# Download Gencode M10 annotation GTF (if not already present)
# wget -nc "${GENCODE_GTF_URL}"

# Extract gene features from the GTF and convert to BED format.
# This step assumes you want to annotate peaks with gene regions.
# A more robust GTF to BED conversion might be needed depending on specific annotation requirements (e.g., exons, transcripts).
# Example using awk to get gene regions (chr, start, end, gene_id|gene_name, score, strand).
# GTF columns are tab-separated, so the field separator is set explicitly:
# zcat gencode.vM10.annotation.gtf.gz | awk -F'\t' '$3 == "gene" {
#     gene_id = ""; gene_name = "";
#     n = split($9, a, ";");
#     for (i = 1; i <= n; i++) {
#         if (a[i] ~ /gene_id "/)   { split(a[i], q, "\""); gene_id = q[2]; }
#         if (a[i] ~ /gene_name "/) { split(a[i], q, "\""); gene_name = q[2]; }
#     }
#     print $1 "\t" ($4 - 1) "\t" $5 "\t" gene_id "|" gene_name "\t0\t" $7
# }' > "${GENCODE_GENES_BED}"

# Overlap reproducible peaks with Gencode M10 gene annotations
# -a: input reproducible peaks BED file
# -b: Gencode gene annotations BED file
# -loj: left outer join, reports all entries in A, with corresponding entries in B if an overlap is found.
#       If no overlap, B fields are reported as NULL. This is common for annotating peaks.
bedtools intersect -a "${REPRODUCIBLE_PEAKS}" -b "${GENCODE_GENES_BED}" -loj > "${ANNOTATED_PEAKS}"

Tools Used

eCLIP STAR

Raw Source Text

Sequenced reads were removed of inline barcodes and reformatted to include randomers in read headers with eclipdemux (v0.0.1). Args: --length 10
Reads were then trimmed with cutadapt (1.9.1). Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -g g_adapters.fasta -A A_adapters.fasta -a a_adapters.fasta (fasta sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline)
Reads were then trimmed once more with cutadapt (1.9.1) to remove double-ligation events. Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -A A_adapters.fasta (fasta sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline)
Trimmed reads were then mapped with STAR (2.4.0i) against a mouse-specific repeat element database (RepBase 18.05). Args: --runThreadN 16 \  --genomeDir mouse_repbase \  --readFilesIn path/to/read1 path/to/read2 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 30 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a mouse genome (mm10). Args: --runThreadN 16 \  --genomeDir genomedir \  --readFilesIn /path/to/read1 /path/to/read2 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM   Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 1 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Aligned reads were sorted with samtools (1.4.1)
Sorted reads were collapsed with barcodecollapsepe.py included in eclip 0.1.5+ pipelines. Args: -o preRmDup.bam -m metrics_file -b rmDup.bam
PCR de-duped reads from each inline barcode were then merged with samtools (1.4.1) merge (merged.bam)
Merged alignments were split to keep just read2 using samtools (1.4.1) view. Args: -h -b -f 128
Read2 BAM files were used to identify peak clusters with Clipper (1.2.2). Args: --species mm10 --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam
Peak clusters were normalized using read2 BAM files for IP against read2 BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.
Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+
Normalized peak files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
Reproducible peaks were annotated by overlapping peak regions with Gencode M10 annotations
Genome_build: mm10
Supplementary_files_format_and_content: tab-delimited text files include -log10pValues and Log2FoldChange values for each IDR peaks called after cutoff of 3,3 in each parameter
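
The "cutoff of 3,3" on the supplementary files can be illustrated with a small filter. The sketch assumes the cutoff means -log10(p-value) ≥ 3 and log2 fold change ≥ 3 (whether the bounds are inclusive is not stated in the source), and it uses a hypothetical three-column layout and toy values.

```shell
# Toy peak table (hypothetical): peak name, -log10(p-value), log2 fold change.
toy_peaks=$(printf 'peak1\t5.2\t4.1\npeak2\t2.9\t6.0\npeak3\t3.0\t3.0\npeak4\t7.5\t1.2\n')

# Keep peaks that meet both thresholds (assumed inclusive).
passing=$(printf '%s\n' "$toy_peaks" |
  awk -F'\t' -v min_p=3 -v min_fc=3 '$2 >= min_p && $3 >= min_fc { print $1 }')

printf '%s\n' "$passing"
```

Only peak1 and peak3 pass: peak2 fails the p-value threshold and peak4 fails the fold-change threshold.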
