GSE124023 Processing Pipeline

RIP-Seq, 22 steps

Publication

DDX5 promotes oncogene C3 and FABP1 expressions and drives intestinal inflammation and tumorigenesis.

Life science alliance (2020) — PMID 32817263

Dataset

GSE124023

DDX5 targets tissue specific RNAs to promote intestine tumorigenesis

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Sequenced reads were removed of inline barcodes and reformatted to include randomers in read headers with eclipdemux (v0.0.1).

    eclipdemux v0.0.1 GitHub
    $ Bash example
    # eclipdemux is distributed by the Yeo lab; the installation route below is an assumption
    # pip install eclipdemux
    
    # Define input and output files (placeholders)
    INPUT_FASTQ="raw_reads.fastq.gz"
    OUTPUT_FASTQ="demuxed_reads.fastq.gz"
    BARCODE_FILE="inline_barcodes.txt" # Placeholder: file defining inline barcodes to be removed
    RANDOMER_LENGTH=10 # Randomer length; the source records "--length 10" for this step
    
    # Remove inline barcodes and move randomers into the read headers.
    # Only --length is documented in the source. The module path and the -i/-o/-b options
    # are illustrative placeholders, so check the tool's own help output for the real interface.
    python -m eclipdemux.demux -i "${INPUT_FASTQ}" -o "${OUTPUT_FASTQ}" -b "${BARCODE_FILE}" --length "${RANDOMER_LENGTH}"
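    The reformatting described in this step can be pictured with a small awk sketch. It is not eclipdemux's implementation: it assumes the randomer is the first 10 nt of each read (taken from `--length 10`), that the input is uncompressed four-line FASTQ, and that the randomer is prepended to the read name with a colon. The function name `move_randomer_to_header` is hypothetical.

    ```shell
    # Hypothetical helper: move the first N bases of each read into its header.
    # Reads plain four-line FASTQ records on stdin and writes FASTQ to stdout.
    move_randomer_to_header() {
      awk -v n="${1:-10}" '
        NR % 4 == 1 { header = substr($0, 2); next }   # keep the read name without "@"
        NR % 4 == 2 { umi = substr($0, 1, n)            # randomer = first n bases
                      print "@" umi ":" header
                      print substr($0, n + 1); next }   # sequence without the randomer
        NR % 4 == 3 { print "+"; next }
        NR % 4 == 0 { print substr($0, n + 1) }         # trim qualities to match
      '
    }

    # Toy record: a 10-nt randomer followed by 10 nt of insert
    printf '@read1\nACGTACGTACGGGTTTCCAA\n+\nIIIIIIIIIIJJJJJJJJJJ\n' | move_randomer_to_header 10
    ```

    Keeping the randomer in the read name lets a later step compare randomers of reads that align to the same position without re-reading the FASTQ files.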
  2. 2

    Args: --length 10

    eclipdemux v0.0.1 (arguments for step 1)
    $ Bash example
    # These arguments belong to the eclipdemux call in step 1 (see Raw Source Text).
    # --length 10 is assumed here to set the randomer length in nucleotides.
    # The source lists no other options, so the rest of the call is left out:
    # eclipdemux --length 10 <other options>
  3. 3

    Reads were then trimmed with cutadapt (1.9.1).

    cutadapt v1.9.1 GitHub
    $ Bash example
    # Install cutadapt (if not already installed)
    # conda install -c bioconda cutadapt=1.9.1
    
    # Define input and output file names
    INPUT_READ1="input_R1.fastq.gz"
    INPUT_READ2="input_R2.fastq.gz"
    OUTPUT_READ1="trimmed_R1.fastq.gz"
    OUTPUT_READ2="trimmed_R2.fastq.gz"
    
    # Generic Illumina TruSeq adapter sequences, shown only as placeholders.
    # The adapters actually used for this dataset come from the FASTA files listed in step 4.
    # 3' adapter expected on Read 1
    ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
    # 3' adapter expected on Read 2 (a separate sequence, not the reverse complement of the Read 1 adapter)
    ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
    
    # Trim adapters and quality trim reads
    # -a: 3' adapter for Read 1
    # -A: 3' adapter for Read 2
    # -q: Quality cutoff (generic value; the source records --quality-cutoff 6)
    # -m: Minimum read length after trimming (generic value; the source records -m 18)
    # -o: Output file for Read 1
    # -p: Output file for Read 2
    cutadapt -a "${ADAPTER_R1}" -A "${ADAPTER_R2}" \
             -q 20,20 -m 20 \
             -o "${OUTPUT_READ1}" -p "${OUTPUT_READ2}" \
             "${INPUT_READ1}" "${INPUT_READ2}"
  4. 4

    Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -g g_adapters.fasta -A A_adapters.fasta -a a_adapters.fasta (fasta sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline)

    $ Bash example
    # Install cutadapt if not already available
    # conda install -c bioconda cutadapt
    
    # Placeholders for paired-end input FASTQ files (e.g., reads after demultiplexing).
    # The -A option applies to Read 2, so cutadapt needs both mates and both outputs.
    INPUT_READ1="input_R1.fastq.gz"
    INPUT_READ2="input_R2.fastq.gz"
    OUTPUT_READ1="trimmed_R1.fastq.gz"
    OUTPUT_READ2="trimmed_R2.fastq.gz"
    
    # Adapter sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline.
    # -g takes 5' adapters for Read 1, -a takes 3' adapters for Read 1, and -A takes 3' adapters for Read 2.
    # cutadapt reads adapters from a FASTA file only when the path is given with the "file:" prefix.
    # Ensure these files are present in the working directory or provide their full paths.
    G_ADAPTERS_FASTA="g_adapters.fasta"
    A_ADAPTERS_FASTA="A_adapters.fasta"
    a_ADAPTERS_FASTA="a_adapters.fasta"
    
    # Execute cutadapt with the specified arguments for adapter trimming and quality filtering
    cutadapt \
      --match-read-wildcards \
      -O 1 \
      --times 1 \
      -e 0.1 \
      --quality-cutoff 6 \
      -m 18 \
      -g "file:${G_ADAPTERS_FASTA}" \
      -A "file:${A_ADAPTERS_FASTA}" \
      -a "file:${a_ADAPTERS_FASTA}" \
      -o "${OUTPUT_READ1}" -p "${OUTPUT_READ2}" \
      "${INPUT_READ1}" "${INPUT_READ2}"
  5. 5

    Reads were then trimmed once more with cutadapt (1.9.1) to remove double-ligation events.

    cutadapt v1.9.1 GitHub
    $ Bash example
    # Install cutadapt (version 1.9.1)
    # conda create -n cutadapt_env cutadapt=1.9.1
    # conda activate cutadapt_env
    
    # Define input and output files
    INPUT_FASTQ="input_reads.fastq.gz"
    OUTPUT_FASTQ="trimmed_reads.fastq.gz"
    
    # Placeholder 3' adapter (a generic Illumina TruSeq sequence).
    # The source instead supplies Read 2 adapters through A_adapters.fasta (see step 6).
    # A second pass removes adapter copies left behind when two adapters were ligated in tandem.
    ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
    
    # Trim reads using cutadapt to remove the 3' adapter and filter for minimum length.
    # -a: Specifies a 3' adapter sequence.
    # -m: Discard reads shorter than this length (18, matching the -m 18 recorded in the source).
    cutadapt -a "${ADAPTER_SEQUENCE}" \
             -m 18 \
             -o "${OUTPUT_FASTQ}" \
             "${INPUT_FASTQ}"
  6. 6

    Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -A A_adapters.fasta (fasta sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline)

    $ Bash example
    # Install cutadapt if not already installed
    # conda install -c bioconda cutadapt
    
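    # Placeholders for paired-end FASTQ files from the first trimming pass.
    # The -A option applies to Read 2, so cutadapt needs both mates and both outputs.
    INPUT_READ1="input_R1.fastq.gz"
    INPUT_READ2="input_R2.fastq.gz"
    OUTPUT_READ1="output_trimmed_R1.fastq.gz"
    OUTPUT_READ2="output_trimmed_R2.fastq.gz"
    # Adapter file generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline.
    # cutadapt reads adapters from a FASTA file only when the path is given with the "file:" prefix.
    ADAPTER_FILE="A_adapters.fasta"
    
    cutadapt \
      --match-read-wildcards \
      -O 1 \
      --times 1 \
      -e 0.1 \
      --quality-cutoff 6 \
      -m 18 \
      -A "file:${ADAPTER_FILE}" \
      -o "${OUTPUT_READ1}" -p "${OUTPUT_READ2}" \
      "${INPUT_READ1}" "${INPUT_READ2}"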
  7. 7

    Trimmed reads were then mapped with STAR (2.4.0i) against a mouse-specific repeat element database (RepBase 18.05).

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables for input and reference files
    TRIMMED_READS="trimmed_reads.fastq.gz" # Replace with your actual trimmed reads file
    MOUSE_REPBASE_STAR_INDEX="path/to/mouse_repbase_18.05_star_index" # Replace with the actual path to the STAR index built from RepBase 18.05 mouse sequences
    OUTPUT_PREFIX="star_repbase_mapping_"
    NUM_THREADS=8 # Adjust as needed
    
    # Run STAR to map trimmed reads against the mouse-specific repeat element database
    STAR \
      --genomeDir "${MOUSE_REPBASE_STAR_INDEX}" \
      --readFilesIn "${TRIMMED_READS}" \
      --outFileNamePrefix "${OUTPUT_PREFIX}" \
      --outSAMtype BAM SortedByCoordinate \
      --outFilterMultimapNmax 100 \
      --runThreadN "${NUM_THREADS}"
    
  8. 8

    Args: --runThreadN 16 \ --genomeDir mouse_repbase \ --readFilesIn path/to/read1 path/to/read2 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

    STAR v2.4.0i (arguments for step 7)
    $ Bash example
    # Install STAR using conda (the source reports version 2.4.0i; pin that version if your channel provides it)
    # conda create -n star_env star -c bioconda -c conda-forge
    # conda activate star_env
    
    # Placeholder for the STAR index directory, which the source calls "mouse_repbase".
    # It should be built from the mouse RepBase 18.05 repeat sequences, not from the whole genome.
    # Example command for index generation (paths are placeholders):
    # STAR --runThreadN <num_threads> --runMode genomeGenerate --genomeDir mouse_repbase --genomeFastaFiles /path/to/mouse_repbase.fasta
    GENOME_DIR="mouse_repbase"
    
    # Placeholders for input FASTQ files (uncompressed, because the recorded arguments contain no --readFilesCommand)
    READ1="path/to/read1.fastq"
    READ2="path/to/read2.fastq"
    
    # Placeholder for output prefix
    OUTPUT_PREFIX="out_prefix"
    
    STAR \
      --runThreadN 16 \
      --genomeDir "${GENOME_DIR}" \
      --readFilesIn "${READ1}" "${READ2}" \
      --outFileNamePrefix "${OUTPUT_PREFIX}" \
      --outReadsUnmapped Fastx \
      --outSAMtype BAM Unsorted \
      --outSAMattributes All \
      --outSAMunmapped Within \
      --outSAMattrRGline ID:foo \
      --outFilterType BySJout \
      --outFilterMultimapNmax 30 \
      --outFilterMultimapScoreRange 1 \
      --outFilterScoreMin 10 \
      --alignEndsType EndToEnd
  9. 9

    Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a mouse genome (mm10).

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.4.0i
    
    # Define variables
    # Input FASTQ file containing unmapped reads filtered of repeat elements
    INPUT_FASTQ="unmapped_reads.fastq.gz"
    
    # Path to the pre-built STAR genome index for the mouse genome (mm10).
    # This directory should contain files like Genome, SA, SAindex, etc.
    # Replace with the actual path to your mm10 STAR index.
    GENOME_INDEX_DIR="/path/to/mm10_star_index"
    
    # Prefix for output files (e.g., mapped_reads_Aligned.sortedByCoord.out.bam)
    OUTPUT_PREFIX="mapped_reads_"
    
    # Number of threads to use for alignment
    NUM_THREADS=8 # Adjust based on available resources
    
    # Run STAR alignment
    STAR --genomeDir "${GENOME_INDEX_DIR}" \
         --readFilesIn "${INPUT_FASTQ}" \
         --runThreadN "${NUM_THREADS}" \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes Standard
    
  10. 10

    Args: --runThreadN 16 \ --genomeDir genomedir \ --readFilesIn /path/to/read1 /path/to/read2 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd

    STAR v2.4.0i (arguments for step 9)
    $ Bash example
    STAR_GENOME_DIR="/path/to/mm10_star_index" # Placeholder for the directory the source calls "genomedir"
    # Uncompressed FASTQ inputs: the recorded arguments contain no --readFilesCommand,
    # so STAR reads the files as plain text.
    READ1_FILE="/path/to/read1.fastq"
    READ2_FILE="/path/to/read2.fastq"
    OUTPUT_PREFIX="aligned_output"
    
    # Install STAR using conda
    # conda install -c bioconda star
    
    # Note: The genome index must be pre-built using STAR's genomeGenerate command.
    # For mouse (mm10), you might download pre-built indices or create your own.
    
    STAR \
      --runThreadN 16 \
      --genomeDir "${STAR_GENOME_DIR}" \
      --readFilesIn "${READ1_FILE}" "${READ2_FILE}" \
      --outFileNamePrefix "${OUTPUT_PREFIX}" \
      --outReadsUnmapped Fastx \
      --outSAMtype BAM Unsorted \
      --outSAMattributes All \
      --outSAMunmapped Within \
      --outSAMattrRGline ID:foo \
      --outFilterType BySJout \
      --outFilterMultimapNmax 1 \
      --outFilterMultimapScoreRange 1 \
      --outFilterScoreMin 10 \
      --alignEndsType EndToEnd
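    Steps 7 to 10 chain two STAR runs: reads that do not map to repeat elements are written out because of `--outReadsUnmapped Fastx`, and those files become the input of the mm10 run. With paired-end input STAR names them `<prefix>Unmapped.out.mate1` and `<prefix>Unmapped.out.mate2`. The sketch below only prints the resulting `--readFilesIn` argument and does not run STAR; the helper name `star_unmapped_mates` and the prefix `repeat_` are hypothetical.

    ```shell
    # Hypothetical helper: print the unmapped-read files STAR writes for a given
    # --outFileNamePrefix when --outReadsUnmapped Fastx is set on paired-end input.
    star_unmapped_mates() {
      printf '%s %s\n' "${1}Unmapped.out.mate1" "${1}Unmapped.out.mate2"
    }

    REPEAT_PREFIX="repeat_"   # placeholder prefix used for the repeat-element run

    # Show (without running STAR) the --readFilesIn argument for the mm10 run.
    echo "--readFilesIn $(star_unmapped_mates "${REPEAT_PREFIX}")"
    ```

    The files carry no `.fastq` extension and are uncompressed, which matches the recorded arguments, since they contain no `--readFilesCommand`.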
  11. 11

    Aligned reads were sorted with samtools (1.4.1)

    samtools v1.4.1 GitHub
    $ Bash example
    # Install samtools if not already available
    # conda install -c bioconda samtools=1.4.1
    
    # Sort aligned reads
    # Replace 'aligned_reads.bam' with your actual input BAM file
    # Replace 'sorted_aligned_reads.bam' with your desired output sorted BAM file
    samtools sort -o sorted_aligned_reads.bam aligned_reads.bam
  12. 12

    Sorted reads were collapsed with barcodecollapsepe.py included in eclip 0.1.5+ pipelines.

    $ Bash example
    # barcodecollapsepe.py ships with the eclip 0.1.5+ pipeline. The source gives no
    # installation route, so make sure the script is on your PATH.
    
    # This step operates on coordinate-sorted alignments (BAM), not on FASTQ files.
    # The arguments are copied verbatim from the source (they are listed in step 13).
    # The source does not state which of -o and -b is the input and which is the output,
    # so check the script's help output before running it.
    barcodecollapsepe.py -o preRmDup.bam -m metrics_file -b rmDup.bam
  13. 13

    Args: -o preRmDup.bam -m metrics_file -b rmDup.bam

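    barcodecollapsepe.py (arguments for step 12)
    $ Bash example
    # These arguments belong to the barcodecollapsepe.py call in step 12 (see Raw Source Text),
    # not to a separate duplicate-marking tool.
    # -m names the metrics file; -o and -b name the BAM files exactly as written in the source.
    barcodecollapsepe.py -o preRmDup.bam -m metrics_file -b rmDup.bam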
  14. 14

    PCR de-duped reads from each inline barcode were then merged with samtools (1.4.1) merge (merged.bam)

    samtools v1.4.1 GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools
    
    # Merge PCR de-duped reads from each inline barcode
    # Replace input_deduped_barcode_1.bam, input_deduped_barcode_2.bam with your actual de-duped BAM files
    samtools merge merged.bam input_deduped_barcode_1.bam input_deduped_barcode_2.bam
  15. 15

    Merged alignments were split to keep just read2 using samtools (1.4.1) view.

    samtools v1.4.1 GitHub
    $ Bash example
    # Input: merged_alignments.bam (placeholder for merged alignment file)
    # Output: read2_alignments.bam (placeholder for output file containing only read2)
    
    # Split merged alignments to keep just read2
    # -b: Output BAM format
    # -f 0x80: Select alignments where the 'read is second in pair' flag (0x80) is set.
    samtools view -b -f 0x80 merged_alignments.bam > read2_alignments.bam
  16. 16

    Args: -h -b -f 128

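    samtools v1.4.1 (arguments for step 15)
    $ Bash example
    # These arguments belong to the samtools view call in step 15 (see Raw Source Text).
    # -h: include the header; -b: output BAM; -f 128: keep only reads flagged as second in pair (0x80).
    # merged.bam is the output of step 14; read2_alignments.bam is a placeholder name.
    samtools view -h -b -f 128 merged.bam > read2_alignments.bam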
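    The value 128 passed to `-f` is a bit in the SAM FLAG field: 0x80 marks a read as the second in its pair, and `-f` keeps only records with that bit set. A small sketch of the same test in shell arithmetic follows; the function name `is_read2` is hypothetical and the flag values are common paired-end examples, not values taken from this dataset.

    ```shell
    # Hypothetical helper: report whether a SAM FLAG has the 0x80 (128) bit set,
    # which is the condition samtools view -f 128 selects on.
    is_read2() {
      if [ $(( $1 & 128 )) -ne 0 ]; then
        echo yes
      else
        echo no
      fi
    }

    is_read2 163   # 163 = 1 + 2 + 32 + 128 (paired, proper pair, mate reverse, second in pair)
    is_read2 99    # 99  = 1 + 2 + 32 + 64  (paired, proper pair, mate reverse, first in pair)
    ```

    The first call prints `yes` and the second prints `no`, so only the first record would pass the filter.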
  17. 17

    Read2 BAM files were used to identify peak clusters with Clipper (1.2.2).

    CLIPper v1.2.2 GitHub
    $ Bash example
    # Install CLIPper (if not already installed), for example from the YeoLab/clipper repository on GitHub
    
    # Placeholders for the input Read2 BAM file and the output peak file
    INPUT_BAM="input_read2.bam"
    OUTPUT_PEAKS="output_peaks.bed"
    
    # Run CLIPper to identify peak clusters.
    # --species mm10 selects the mouse annotation, so no separate FASTA or chrom.sizes file is passed.
    # The remaining options follow the arguments recorded in step 18.
    clipper --species mm10 \
            --bam "${INPUT_BAM}" \
            --timeout 3600000 \
            --maxgenes 1000000 \
            --save-pickle \
            --outfile "${OUTPUT_PEAKS}"
    
  18. 18

    Args: --species mm10 --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam

    CLIPper v1.2.2 (arguments for step 17)
    $ Bash example
    # These arguments belong to the CLIPper call in step 17 (see Raw Source Text).
    # Reference genome: mm10 (mouse, GRCm38 assembly)
    # The output path is reproduced as written in the source, including its .bam extension.
    
    clipper --species mm10 \
            --bam path/to/input.bam \
            --timeout 3600000 \
            --maxgenes 1000000 \
            --save-pickle \
            --outfile path/to/output.bam
  19. 19

    Peak clusters were normalized using read2 BAM files for IP against read2 BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.

    $ Bash example
    # Install eCLIP (or ensure scripts are in PATH)
    # For example, clone the repository and add scripts to PATH:
    # git clone https://github.com/yeolab/eclip.git
    # export PATH="$(pwd)/eclip/scripts:$PATH"
    
    # Placeholder variables for input files
    # Replace with actual paths to your IP and INPUT read2 BAM files and the peak file to be normalized
    IP_READ2_BAM="ip_sample_read2.bam"
    INPUT_READ2_BAM="input_sample_read2.bam"
    PEAK_FILE="initial_peak_clusters.bed"
    OUTPUT_PREFIX="normalized_peak_clusters"
    
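    # Normalize peak clusters using peaksnormalize.pl
    # The script uses read2 BAM files for IP against read2 BAM files for INPUT.
    # The source does not document the script's arguments. The option names below are
    # illustrative placeholders, so consult the script itself for the real call signature.
    peaksnormalize.pl \
        --ip_bam "${IP_READ2_BAM}" \
        --input_bam "${INPUT_READ2_BAM}" \
        --peak_file "${PEAK_FILE}" \
        --output_prefix "${OUTPUT_PREFIX}"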
  20. 20

    Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+

    $ Bash example
    # Clone the eCLIP pipeline repository to get the script
    # git clone https://github.com/yeolab/eclip.git
    # cd eclip
    
    # Placeholder for input normalized peak regions from replicates
    # These would typically be BED files generated by a previous peak calling step
    INPUT_PEAKS_REP1="replicate1_normalized_peaks.bed"
    INPUT_PEAKS_REP2="replicate2_normalized_peaks.bed"
    
    # Placeholder for the output merged peak file
    OUTPUT_MERGED_PEAKS="merged_replicate_overlapping_peaks.bed"
    
    # Execute the script to merge overlapping normalized peak regions.
    # The source does not document the call signature. Passing BED files as positional
    # arguments and reading the result from standard output is an assumption.
    perl scripts/compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl \
        "${INPUT_PEAKS_REP1}" \
        "${INPUT_PEAKS_REP2}" \
        > "${OUTPUT_MERGED_PEAKS}"
  21. 21

    Normalized peak files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
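    The entropy score used for ranking can be sketched as follows. This assumes the relative-information definition often used in eCLIP analyses, p × log2(p / q), where p is the fraction of IP reads falling in a peak and q is the fraction of INPUT reads falling in the same peak. Whether make_informationcontent_from_peaks.pl computes exactly this is an assumption, and the function name `peak_entropy` and the read counts are hypothetical.

    ```shell
    # Hypothetical helper: entropy-style score for one peak.
    # Arguments: IP reads in peak, total IP reads, INPUT reads in peak, total INPUT reads.
    # Requires non-zero counts, since the score is undefined for q = 0.
    peak_entropy() {
      awk -v ip="$1" -v ip_total="$2" -v inp="$3" -v inp_total="$4" 'BEGIN {
        p = ip / ip_total            # fraction of IP reads in the peak
        q = inp / inp_total          # fraction of INPUT reads in the peak
        printf "%.6f\n", p * log(p / q) / log(2)
      }'
    }

    # Toy peak: 80 of 1,000,000 IP reads versus 10 of 1,000,000 INPUT reads
    peak_entropy 80 1000000 10 1000000
    ```

    For the toy peak the enrichment is 8-fold (log2 = 3), so the score is 0.00008 × 3 = 0.000240. Unlike fold change alone, a score of this form also rewards peaks that hold a larger share of the IP reads.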

    IDR v2.0.2 GitHub
    $ Bash example
    # Install IDR (if not already installed)
    # conda install -c bioconda idr=2.0.2
    
    # Placeholder for input files, which are "Normalized peak files ranked by entropy score"
    # These files would typically be generated by 'make_informationcontent_from_peaks.pl'
    # from the merge_peaks pipeline, for example:
    # perl make_informationcontent_from_peaks.pl -i replicate1.peaks.bed -o replicate1.ranked.bed
    # perl make_informationcontent_from_peaks.pl -i replicate2.peaks.bed -o replicate2.ranked.bed
    
    # Define input ranked peak files and output prefix
    INPUT_PEAKS_REP1="replicate1.ranked.bed"
    INPUT_PEAKS_REP2="replicate2.ranked.bed"
    OUTPUT_PREFIX="sample_id"
    
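    # Run IDR to determine reproducible peaks between replicates.
    # --samples is required and takes the two replicate peak files.
    # --input-file-type bed tells IDR the inputs are plain BED; it then ranks on the score column by default.
    # Which column holds the entropy score depends on the ranking script's output, so set --rank if needed.
    idr --samples "${INPUT_PEAKS_REP1}" "${INPUT_PEAKS_REP2}" \
        --input-file-type bed \
        --plot \
        --log-output-file "${OUTPUT_PREFIX}.idr.log" \
        --output-file "${OUTPUT_PREFIX}.idr.peaks"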
  22. 22

    Reproducible peaks were annotated by overlapping peak regions with Gencode M10 annotations

    bedtools v2.27.1 (inferred; the source names only the Gencode M10 annotation, not the overlap tool)
    $ Bash example
    # Install bedtools
    # conda install -c bioconda bedtools
    
    # Define input and output files
    REPRODUCIBLE_PEAKS="reproducible_peaks.bed" # Placeholder for your reproducible peak regions (e.g., output from IDR)
    GENCODE_GTF_URL="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M10/gencode.vM10.annotation.gtf.gz"
    GENCODE_GENES_BED="gencode.vM10.genes.bed" # Placeholder for Gencode M10 gene annotations in BED format
    ANNOTATED_PEAKS="annotated_peaks.bed"
    
    # Download Gencode M10 annotation GTF (if not already present)
    # wget -nc "${GENCODE_GTF_URL}"
    
    # Extract gene features from GTF and convert to BED format.
    # This step assumes you want to annotate peaks with gene regions.
    # A more robust GTF to BED conversion might be needed depending on specific annotation requirements (e.g., exons, transcripts).
    # Example using awk to get gene regions (chr, start, end, gene_id|gene_name, score, strand):
    # zcat gencode.vM10.annotation.gtf.gz | awk '$3 == "gene" {
    #     split($9, a, ";");
    #     gene_id=""; gene_name="";
    #     for (i=1; i<=length(a); i++) {
    #         if (a[i] ~ /gene_id/) { gene_id = substr(a[i], index(a[i], "\"")+1, length(a[i])-index(a[i], "\"")-1); }
    #         if (a[i] ~ /gene_name/) { gene_name = substr(a[i], index(a[i], "\"")+1, length(a[i])-index(a[i], "\"")-1); }
    #     }
    #     print $1"\t"$4-1"\t"$5"\t"gene_id"|"gene_name"\t0\t"$7
    # }' > "${GENCODE_GENES_BED}"
    
    # Overlap reproducible peaks with Gencode M10 gene annotations
    # -a: input reproducible peaks BED file
    # -b: Gencode gene annotations BED file
    # -loj: left outer join, reports all entries in A, with corresponding entries in B if an overlap is found.
    #       If no overlap, B fields are reported as NULL. This is common for annotating peaks.
    bedtools intersect -a "${REPRODUCIBLE_PEAKS}" -b "${GENCODE_GENES_BED}" -loj > "${ANNOTATED_PEAKS}"

Raw Source Text
Sequenced reads were removed of inline barcodes and reformatted to include randomers in read headers with eclipdemux (v0.0.1). Args: --length 10
Reads were then trimmed with cutadapt (1.9.1). Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -g g_adapters.fasta -A A_adapters.fasta -a a_adapters.fasta (fasta sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline)
Reads were then trimmed once more with cutadapt (1.9.1) to remove double-ligation events. Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -A A_adapters.fasta (fasta sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline)
Trimmed reads were then mapped with STAR (2.4.0i) against a mouse-specific repeat element database (RepBase 18.05). Args: --runThreadN 16 \  --genomeDir mouse_repbase \  --readFilesIn path/to/read1 path/to/read2 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 30 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a mouse genome (mm10). Args: --runThreadN 16 \  --genomeDir genomedir \  --readFilesIn /path/to/read1 /path/to/read2 \  --outFileNamePrefix out_prefix \  --outReadsUnmapped Fastx \  --outSAMtype BAM   Unsorted \  --outSAMattributes All \  --outSAMunmapped Within \  --outSAMattrRGline ID:foo \  --outFilterType BySJout \  --outFilterMultimapNmax 1 \  --outFilterMultimapScoreRange 1 \  --outFilterScoreMin 10 \  --alignEndsType EndToEnd
Aligned reads were sorted with samtools (1.4.1)
Sorted reads were collapsed with barcodecollapsepe.py included in eclip 0.1.5+ pipelines. Args: -o preRmDup.bam -m metrics_file -b rmDup.bam
PCR de-duped reads from each inline barcode were then merged with samtools (1.4.1) merge (merged.bam)
Merged alignments were split to keep just read2 using samtools (1.4.1) view. Args: -h -b -f 128
Read2 BAM files were used to identify peak clusters with Clipper (1.2.2). Args: --species mm10 --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam
Peak clusters were normalized using read2 BAM files for IP against read2 BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.
Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+
Normalized peak files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
Reproducible peaks were annotated by overlapping peak regions with Gencode M10 annotations
Genome_build: mm10
Supplementary_files_format_and_content: tab-delimited text files include -log10pValues and Log2FoldChange values for each IDR peaks called after cutoff of 3,3 in each parameter
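The "cutoff of 3,3" on the supplementary files can be illustrated with a short awk filter. This is a sketch under assumptions: the column positions of the -log10 p-value and log2 fold change are passed in because the source does not give the file layout, the comparison is taken to be "greater than or equal to 3", and the function name `filter_peaks` and the example rows are hypothetical.

```shell
# Hypothetical helper: keep tab-delimited peak rows whose -log10(p) and log2 fold
# change are both at least 3.
# $1 = column index of -log10(p), $2 = column index of log2 fold change
filter_peaks() {
  awk -F '\t' -v pc="$1" -v fc="$2" '$pc >= 3 && $fc >= 3'
}

# Toy input: chrom, start, end, -log10(p), log2 fold change, strand
printf 'chr1\t100\t200\t5.2\t4.1\t+\nchr1\t300\t400\t2.0\t6.0\t-\nchr2\t50\t90\t3.0\t3.0\t+\n' \
  | filter_peaks 4 5
```

Only the first and third rows pass: the second has a strong fold change but a -log10 p-value of 2.0, which is below the cutoff.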