GSE113193 Processing Pipeline

OTHER code_examples 32 steps

Publication

Tissue-selective restriction of RNA editing of CaV1.3 by splicing factor SRSF9.

Nucleic acids research (2018) — PMID 29733375

Dataset

GSE113193

Tissue-selective restriction of RNA editing of CaV1.3 by splicing factor SRSF9

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Takes output from raw files.

    FastQC (Inferred with models/gemini-2.5-flash) v0.11.9 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Define input raw sequencing files (example)
    RAW_FILES="sample_R1.fastq.gz sample_R2.fastq.gz"
    
    # Define output directory for quality control reports
    OUTPUT_DIR="fastqc_reports"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Install FastQC (example using conda)
    # conda install -c bioconda fastqc
    
    # Run FastQC to generate quality control reports from raw files
    fastqc ${RAW_FILES} -o "${OUTPUT_DIR}"
  2. 2

    Run to trim off both 5’ and 3’ adapters on both reads.

    fastp (Inferred with models/gemini-2.5-flash) v0.23.2 GitHub
    $ Bash example
    # Install fastp (if not already installed)
    # conda install -c bioconda fastp=0.23.2
    
    # Define input and output file paths
    INPUT_READ1="read1.fastq.gz"
    INPUT_READ2="read2.fastq.gz"
    OUTPUT_READ1="trimmed_read1.fastq.gz"
    OUTPUT_READ2="trimmed_read2.fastq.gz"
    REPORT_JSON="fastp_report.json"
    REPORT_HTML="fastp_report.html"
    
    # Run fastp to trim adapters and perform quality filtering
    # -i: input read 1
    # -o: output read 1
    # -I: input read 2
    # -O: output read 2
    # --detect_adapter_for_pe: automatically detect and trim adapters for paired-end reads
    # --qualified_quality_phred 15: filter reads with quality score < 15
    # --length_required 20: discard reads shorter than 20 bp after trimming
    # --trim_poly_g: trim polyG tails (common in Illumina NextSeq/NovaSeq)
    # --trim_poly_x: trim polyX tails
    # -j 8: use 8 threads
    # -h: HTML report output
    # -j: JSON report output
    fastp \
      -i "${INPUT_READ1}" \
      -o "${OUTPUT_READ1}" \
      -I "${INPUT_READ2}" \
      -O "${OUTPUT_READ2}" \
      --detect_adapter_for_pe \
      --qualified_quality_phred 15 \
      --length_required 20 \
      --trim_poly_g \
      --trim_poly_x \
      -j "${REPORT_JSON}" \
      -h "${REPORT_HTML}" \
      -w 8
  3. 3

    Command: quality-cutoff 6 -m 18 -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGT AGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics

    cutadapt (Inferred with models/gemini-2.5-flash) v4.x (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install cutadapt (example using conda)
    # conda install -c bioconda cutadapt
    
    # Define input and output file paths
    INPUT_R1="/full/path/to/files/file_R1.C01.fastq.gz"
    INPUT_R2="/full/path/to/files/file_R2.C01.fastq.gz"
    OUTPUT_R1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz"
    OUTPUT_R2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz"
    METRICS_FILE="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics"
    
    # Define adapter sequences
    ADAPTER_R1="NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
    ADAPTER_G1="CTTCCGATCTACAAGTT"
    ADAPTER_G2="CTTCCGATCTTGGTCCT"
    ADAPTER_A_LIST=(
        "AACTTGTAGATCGGA"
        "AGGACCAAGATCGGA"
        "ACTTGTAGATCGGAA"
        "GGACCAAGATCGGAA"
        "CTTGT AGATCGGAAG" # Note: space in original command, likely a typo or specific sequence
        "GACCAAGATCGGAAG"
        "TTGTAGATCGGAAGA"
        "ACCAAGATCGGAAGA"
        "TGTAGATCGGAAGAG"
        "CCAAGATCGGAAGAG"
        "GTAGATCGGAAGAGC"
        "CAAGATCGGAAGAGC"
        "TAGATCGGAAGAGCG"
        "AAGATCGGAAGAGCG"
        "AGATCGGAAGAGCGT"
        "GATCGGAAGAGCGTC"
        "ATCGGAAGAGCGTCG"
        "TCGGAAGAGCGTCGT"
        "CGGAAGAGCGTCGTG"
        "GGAAGAGCGTCGTGT"
    )
    
    # Construct the -A options string
    ADAPTER_A_OPTIONS=""
    for adapter in "${ADAPTER_A_LIST[@]}"; do
        ADAPTER_A_OPTIONS+=" -A \"${adapter}\""
    done
    
    # Execute cutadapt for adapter and quality trimming
    # The original command 'quality-cutoff 6' is interpreted as 'cutadapt -q 6' for quality trimming.
    # The multiple -g and -A options are handled by cutadapt.
    cutadapt -q 6 \
             -m 18 \
             -a "${ADAPTER_R1}" \
             -g "${ADAPTER_G1}" \
             -g "${ADAPTER_G2}" \
             ${ADAPTER_A_OPTIONS} \
             -o "${OUTPUT_R1}" \
             -p "${OUTPUT_R2}" \
             "${INPUT_R1}" \
             "${INPUT_R2}" > "${METRICS_FILE}"
  4. 4

    Takes output from cutadapt round 1.

    cutadapt v1.18 GitHub
    $ Bash example
    # Install cutadapt if not already available
    # conda install -c bioconda cutadapt=1.18
    
    # Define input and output files
    INPUT_FASTQ="reads_from_cutadapt_round1.fastq.gz"
    OUTPUT_FASTQ="reads_trimmed_round2.fastq.gz"
    
    # Execute cutadapt for quality, length, and poly-A trimming
    # This step assumes a previous cutadapt round (round 1) for primary adapter trimming.
    # -a A{100}: Trim poly-A tails (up to 100 'A's) from the 3' end
    # -q 20,20: Trim low-quality bases from both ends (Phred score < 20)
    # --minimum-length 18: Discard reads shorter than 18 bp after trimming
    # -o: Output file
    cutadapt -a A{100} -q 20,20 --minimum-length 18 -o "${OUTPUT_FASTQ}" "${INPUT_FASTQ}"
  5. 5

    Run to trim off the 3’ adapters on read 2, to control for double ligation events.

    cutadapt (Inferred with models/gemini-2.5-flash) v4.0 GitHub
    $ Bash example
    # Install cutadapt (example using conda)
    # conda install -c bioconda cutadapt=4.0
    
    # Define variables
    INPUT_R1="input_R1.fastq.gz"
    INPUT_R2="input_R2.fastq.gz"
    OUTPUT_R1="trimmed_R1.fastq.gz"
    OUTPUT_R2="trimmed_R2.fastq.gz"
    
    # Adapter sequence for the 3' end of Read 2.
    # For eCLIP assays, this is often the rP5 adapter or a variant, 
    # which is ligated to the 3' end of the RNA fragment.
    # Example sequence from yeolab/skipper eCLIP pipeline:
    ADAPTER_R2_SEQUENCE="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"
    
    # Run cutadapt to trim 3' adapters from Read 2
    # -A ADAPTER_R2_SEQUENCE: Trims the given adapter from the 3' end of Read 2.
    #                         This is crucial for controlling double ligation events 
    #                         where the adapter might ligate to itself or another adapter.
    # -q 20: Trims low-quality bases from the 3' end (Phred score < 20).
    # -m 18: Discard reads shorter than 18 bp after trimming.
    # -o: Output file for Read 1
    # -p: Output file for Read 2
    cutadapt -A "${ADAPTER_R2_SEQUENCE}" \
             -q 20 \
             -m 18 \
             -o "${OUTPUT_R1}" \
             -p "${OUTPUT_R2}" \
             "${INPUT_R1}" \
             "${INPUT_R2}"
  6. 6

    Command: cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics

    cutadapt v4.0 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install cutadapt (e.g., using conda)
    # conda install -c bioconda cutadapt
    
    cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics
  7. 7

    Takes output from cutadapt round 2.

    cutadapt v4.0 GitHub
    $ Bash example
    # Install cutadapt (if not already installed)
    # conda install -c bioconda cutadapt=4.0
    
    # Define input and output files
    INPUT_FASTQ="input_from_cutadapt_round2.fastq.gz"
    OUTPUT_FASTQ="trimmed_round3.fastq.gz"
    LOG_FILE="cutadapt_round3.log"
    
    # Define parameters (these are placeholders and should be adjusted based on specific experimental design)
    # This step assumes further trimming is needed after an initial cutadapt round.
    # It could involve removing a different adapter, more stringent quality trimming, or length filtering.
    # Example: Trim remaining 3' adapters (e.g., poly-A or a different library adapter)
    # Example: Quality trim from 3' end with a cutoff of 20
    # Example: Discard reads shorter than 18 bp
    ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" # Placeholder for a common Illumina adapter or specific library adapter
    MIN_LENGTH=18
    QUALITY_CUTOFF=20
    
    # Execute cutadapt
    cutadapt \
        -a "${ADAPTER_SEQUENCE}" \
        --quality-cutoff="${QUALITY_CUTOFF}" \
        --minimum-length="${MIN_LENGTH}" \
        -o "${OUTPUT_FASTQ}" \
        "${INPUT_FASTQ}" \
        > "${LOG_FILE}" 2>&1
  8. 8

    Maps to human specific version of RepBase used to remove repetitive elements, helps control for spurious artifacts from rRNA (& other) repetitive reads.

    bbduk (Inferred with models/gemini-2.5-flash) v38.x GitHub
    $ Bash example
    # Install bbtools (if not already installed)
    # conda install -c bioconda bbtools
    
    # Download and prepare a human-specific RepBase library for bbduk.
    # This typically involves downloading RepBase sequences (e.g., from GIRI RepBase website),
    # filtering for human-specific elements, and compiling them into a single FASTA file.
    # Example placeholder for the RepBase reference file. The actual download and processing
    # of RepBase data may require registration or specific scripts.
    # For instance, one might download 'Homo_sapiens.fa' from RepBase and use it as the reference.
    REPBASE_REF="path/to/human_repbase.fasta" # Placeholder for the actual path to the RepBase reference file
    
    # Input FASTQ file (e.g., raw reads or adapter-trimmed reads)
    INPUT_FASTQ="input_reads.fastq.gz"
    
    # Output FASTQ file with repetitive reads removed
    OUTPUT_FASTQ="filtered_non_repetitive_reads.fastq.gz"
    
    # Run bbduk to remove reads mapping to human repetitive elements
    # k=31: kmer size for matching (default for contaminant removal)
    # hdist=1: allow 1 mismatch in kmer
    # stats: output statistics about removed reads
    # -Xmx: memory allocation
    bbduk.sh \
        in="$INPUT_FASTQ" \
        out="$OUTPUT_FASTQ" \
        ref="$REPBASE_REF" \
        k=31 \
        hdist=1 \
        stats=bbduk_repbase_filter_stats.txt \
        -Xmx4g
  9. 9

    Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/RepBase_human_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz --outSAMunmapped Within --outFilterMultimapNmax 30 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam --outSAMattributes All --readFilesCommand zcat --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam

    $ Bash example
    # Install STAR (example using conda)
    # conda install -c bioconda star
    
    # Define variables for clarity
    GENOME_DIR="/path/to/RepBase_human_database_file"
    READ_FILE_R1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz"
    READ_FILE_R2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz"
    OUTPUT_PREFIX="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam"
    FINAL_OUTPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam"
    
    # Execute STAR alignment
    STAR --runMode alignReads \
         --runThreadN 16 \
         --genomeDir "${GENOME_DIR}" \
         --genomeLoad LoadAndRemove \
         --readFilesIn "${READ_FILE_R1}" "${READ_FILE_R2}" \
         --outSAMunmapped Within \
         --outFilterMultimapNmax 30 \
         --outFilterMultimapScoreRange 1 \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outSAMattributes All \
         --readFilesCommand zcat \
         --outStd BAM_Unsorted \
         --outSAMtype BAM Unsorted \
         --outFilterType BySJout \
         --outReadsUnmapped Fastx \
         --outFilterScoreMin 10 \
         --outSAMattrRGline ID:foo \
         --alignEndsType EndToEnd > "${FINAL_OUTPUT_BAM}"
  10. 10

    Takes output from STAR rmRep.

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables
    GENOME_DIR="/path/to/STAR_genome_index/hg38" # Placeholder for STAR genome index directory (e.g., human hg38)
    READ1_FASTQ="input_read1.fastq.gz" # Placeholder for input FASTQ file (read 1)
    READ2_FASTQ="input_read2.fastq.gz" # Placeholder for input FASTQ file (read 2, remove if single-end)
    OUTPUT_PREFIX="aligned_reads" # Prefix for output files
    THREADS=8 # Number of threads to use
    RG_ID="sample_id" # Read group ID
    RG_SM="sample_name" # Sample name
    RG_LB="library_name" # Library name
    RG_PU="platform_unit" # Platform unit (e.g., flowcell-lane)
    
    # Create output directory if it doesn't exist
    mkdir -p star_output
    
    # Run STAR alignment (parameters adapted from Yeo lab eCLIP workflow)
    STAR \
      --runThreadN "${THREADS}" \
      --genomeDir "${GENOME_DIR}" \
      --readFilesIn "${READ1_FASTQ}" "${READ2_FASTQ}" \
      --readFilesCommand zcat \
      --outFileNamePrefix "star_output/${OUTPUT_PREFIX}_" \
      --outSAMtype BAM SortedByCoordinate \
      --outSAMattributes All \
      --outFilterMultimapNmax 1 \
      --outFilterMismatchNmax 999 \
      --outFilterMismatchNoverLmax 0.05 \
      --alignIntronMin 20 \
      --alignIntronMax 1000000 \
      --alignMatesGapMax 1000000 \
      --alignSJoverhangMin 8 \
      --alignSJDBoverhangMin 1 \
      --sjdbScore 1 \
      --limitSjdbInsertNsj 2000000 \
      --outFilterType BySJout \
      --outFilterScoreMinOverLread 0.3 \
      --outFilterMatchNoverLread 0.3 \
      --winAnchorMultimapNmax 50 \
      --seedSearchStartLmax 30 \
      --seedPerReadNmax 100000 \
      --seedPerWindowNmax 1000 \
      --alignTranscriptsPerReadNmax 100000 \
      --alignTranscriptsPerWindowNmax 10000 \
      --quantMode GeneCounts \
      --outSAMattrRGline ID:"${RG_ID}" SM:"${RG_SM}" LB:"${RG_LB}" PL:ILLUMINA PU:"${RG_PU}"
  11. 11

    Maps unique reads to the human genome.

    BWA-MEM (Inferred with models/gemini-2.5-flash) v0.7.17 GitHub
    $ Bash example
    # Install BWA and Samtools if not already available
    # conda install -c bioconda bwa samtools
    
    # Placeholder for human genome reference (e.g., hg38)
    # Ensure the reference genome is indexed with bwa index
    # bwa index human_genome.fa
    
    # Align reads to the human genome and filter for uniquely mapped reads
    # -t: number of threads
    # human_genome.fa: path to the indexed human reference genome
    # reads.fastq.gz: path to the input FASTQ file
    # | samtools view -bS -F 4 -q 20 - : pipe output to samtools to filter for mapped reads with MAPQ >= 20
    # | samtools sort -o aligned_unique_reads.bam - : pipe output to samtools to sort and save as BAM
    bwa mem -t 8 human_genome.fa reads.fastq.gz | \
    samtools view -bS -F 4 -q 20 - | \
    samtools sort -o aligned_unique_reads.bam -
    
    # Index the sorted BAM file
    samtools index aligned_unique_reads.bam
  12. 12

    Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/STAR_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1 /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2 --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --outSAMattributes All --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam

    $ Bash example
    # Install STAR (example using Conda):
    # conda install -c bioconda star
    
    # Define variables
    STAR_GENOME_DIR="/path/to/star_index/GRCh38" # Placeholder: Replace with the actual path to your STAR genome index directory (e.g., for human hg38)
    INPUT_R1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1" # Replace with actual path to mate1 FASTQ/FASTA file
    INPUT_R2="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2" # Replace with actual path to mate2 FASTQ/FASTA file
    OUTPUT_BAM_PATH="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam" # The final aligned BAM file name and prefix for auxiliary files
    
    # Ensure the output directory exists
    mkdir -p "$(dirname "${OUTPUT_BAM_PATH}")"
    
    # Run STAR alignment
    STAR \
      --runMode alignReads \
      --runThreadN 16 \
      --genomeDir "${STAR_GENOME_DIR}" \
      --genomeLoad LoadAndRemove \
      --readFilesIn "${INPUT_R1}" "${INPUT_R2}" \
      --outSAMunmapped Within \
      --outFilterMultimapNmax 1 \
      --outFilterMultimapScoreRange 1 \
      --outFileNamePrefix "${OUTPUT_BAM_PATH}" \
      --outSAMattributes All \
      --outStd BAM_Unsorted \
      --outSAMtype BAM Unsorted \
      --outFilterType BySJout \
      --outReadsUnmapped Fastx \
      --outFilterScoreMin 10 \
      --outSAMattrRGline ID:foo \
      --alignEndsType EndToEnd > "${OUTPUT_BAM_PATH}"
    
  13. 13

    Takes output from STAR genome mapping.

    STAR v2.7.10a (Inferred from common usage for RNA-seq alignment) GitHub
    $ Bash example
    # Install STAR (example using conda)
    # conda install -c bioconda star
    
    # Define variables
    # Placeholder for STAR genome index (e.g., hg38). 
    # This directory should contain files generated by STAR --runMode genomeGenerate.
    GENOME_DIR="/path/to/STAR_index_hg38"
    READ1_FASTQ="sample_R1.fastq.gz"
    READ2_FASTQ="sample_R2.fastq.gz" # Comment out or remove if single-end
    OUTPUT_PREFIX="sample_"
    THREADS=8 # Number of threads to use
    
    # Run STAR genome mapping
    STAR \
      --genomeDir "${GENOME_DIR}" \
      --readFilesIn "${READ1_FASTQ}" "${READ2_FASTQ}" \
      --runThreadN "${THREADS}" \
      --outFileNamePrefix "${OUTPUT_PREFIX}" \
      --outSAMtype BAM SortedByCoordinate \
      --outSAMunmapped Within \
      --outSAMattributes Standard \
      --readFilesCommand zcat \
      --limitBAMsortRAM 30000000000 # Adjust based on available RAM, ~30GB recommended for large genomes
  14. 14

    Custom random-mer-aware script for PCR duplicate removal.

    umi_tools dedup (Inferred with models/gemini-2.5-flash) v1.1.2 GitHub
    $ Bash example
    # Install umi_tools (example using conda)
    # conda install -c bioconda umi-tools
    
    # Example usage for PCR duplicate removal with umi_tools dedup
    # This assumes UMIs have already been extracted and appended to read names
    # (e.g., using 'umi_tools extract' prior to alignment).
    # The '--method directional' is commonly used for UMI-based deduplication.
    # The '--paired' flag is used for paired-end sequencing data.
    
    umi_tools dedup \
        -I input_aligned_reads.bam \
        -S deduplicated_reads.bam \
        --method directional \
        --paired \
        --log deduplication.log
  15. 15

    Command: barcode_collapse_pe.py --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics

    barcode_collapse_pe.py vNot specified, associated with yeolab/eclip project GitHub
    $ Bash example
    # Install dependencies (if not already installed)
    # This script requires pysam
    # conda install -c bioconda pysam
    
    # If barcode_collapse_pe.py is not in your PATH, you might need to clone the repository
    # and add its scripts directory to PATH, or call it directly:
    # git clone https://github.com/yeolab/eclip.git
    # # export PATH=$PATH:$(pwd)/eclip/scripts
    # # python eclip/scripts/barcode_collapse_pe.py ...
    
    # Define input and output file paths as they appear in the description
    INPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam"
    OUTPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam"
    METRICS_FILE="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics"
    
    # Execute the command
    barcode_collapse_pe.py --bam "${INPUT_BAM}" --out_file "${OUTPUT_BAM}" --metrics_file "${METRICS_FILE}"
  16. 16

    Takes output from barcode collapse PE.

    STAR (Inferred with models/gemini-2.5-flash) v2.7.10a GitHub
    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.7.10a
    
    # Define variables
    # INPUT_FASTQ: Placeholder for the gzipped FASTQ file containing deduplicated paired-end reads
    # from the barcode collapse PE step.
    INPUT_FASTQ="deduplicated_reads_PE.fastq.gz"
    
    # OUTPUT_PREFIX: Prefix for output files (e.g., aligned BAM, log files)
    OUTPUT_PREFIX="aligned_reads"
    
    # GENOME_DIR: Path to the pre-built STAR genome index directory.
    # For eCLIP, this is typically a human reference genome like hg38.
    # Replace with the actual path to your genome index.
    GENOME_DIR="/path/to/STAR_genome_index/hg38"
    
    # THREADS: Number of CPU threads to use for alignment.
    THREADS=8
    
    # RAM_LIMIT: Maximum RAM (in bytes) to be used for sorting BAM. Adjust based on available system RAM.
    # Example: 30GB = 30 * 1024^3 bytes
    RAM_LIMIT=30000000000
    
    # --- STAR Alignment Command ---
    # This command aligns the deduplicated paired-end reads to the reference genome.
    # It outputs a sorted BAM file and gene counts.
    STAR --runThreadN ${THREADS} \
         --genomeDir ${GENOME_DIR} \
         --readFilesIn ${INPUT_FASTQ} \
         --readFilesCommand zcat \
         --outFileNamePrefix ${OUTPUT_PREFIX}_ \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes All \
         --quantMode GeneCounts \
         --outFilterMultimapNmax 20 \
         --outFilterMismatchNmax 3 \
         --alignIntronMin 20 \
         --alignIntronMax 1000000 \
         --alignMatesGapMax 1000000 \
         --limitBAMsortRAM ${RAM_LIMIT}
    
  17. 17

    Sorts resulting bam file for use downstream.

    samtools (Inferred with models/gemini-2.5-flash) v1.10 GitHub
    $ Bash example
    # Install samtools (if not already installed)
    # conda install -c bioconda samtools=1.10
    
    # Sort the BAM file by coordinate
    # Replace 'input.bam' with your actual input BAM file
    # Replace 'output.sorted.bam' with your desired output sorted BAM file name
    # Adjust '-@ 8' for the number of threads/CPUs you want to use
    samtools sort -@ 8 -o output.sorted.bam input.bam
  18. 18

    Command: java -Xmx2048m -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Djava.io.tmpdir=/full/path/to/files/.queue/tmp -cp /path/to/gatk/dist/Queue.jar net.sf.picard.sam.SortSam INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam TMP_DIR=/full/path/to/files/.queue/tmp OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam VALIDATION_STRINGENCY=SILENT SO=coordinate CREATE_INDEX=true

    Picard v2.x (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install Picard (e.g., via Conda or by downloading the JAR)
    # conda install -c bioconda picard
    # Or download the picard.jar from https://github.com/broadinstitute/picard/releases (for standalone versions)
    
    # Define variables for paths and files
    # NOTE: The original command uses Queue.jar from a GATK distribution for the classpath.
    # This implies that the Picard classes were bundled within that specific GATK Queue.jar,
    # or that Queue.jar was part of a larger GATK distribution where Picard classes were accessible.
    # Adjust GATK_DIST_PATH to your specific GATK distribution path where Queue.jar resides.
    GATK_DIST_PATH="/path/to/gatk/dist" # Adjust this to your GATK distribution path
    INPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam" # Input BAM file
    OUTPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam" # Output sorted BAM file
    TMP_DIR="/full/path/to/files/.queue/tmp" # Temporary directory
    
    # Create the temporary directory if it doesn't exist
    mkdir -p "${TMP_DIR}"
    
    # Execute Picard SortSam command
    java -Xmx2048m \
      -XX:+UseParallelOldGC \
      -XX:ParallelGCThreads=4 \
      -XX:GCTimeLimit=50 \
      -XX:GCHeapFreeLimit=10 \
      -Djava.io.tmpdir="${TMP_DIR}" \
      -cp "${GATK_DIST_PATH}/Queue.jar" \
      net.sf.picard.sam.SortSam \
      INPUT="${INPUT_BAM}" \
      TMP_DIR="${TMP_DIR}" \
      OUTPUT="${OUTPUT_BAM}" \
      VALIDATION_STRINGENCY=SILENT \
      SO=coordinate \
      CREATE_INDEX=true
  19. 19

    Takes output from sortSam, makes bam index for use downstream.

    samtools index (Inferred with models/gemini-2.5-flash) v1.19 GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools
    
    # Assuming 'sorted.bam' is the output from sortSam
    INPUT_BAM="sorted.bam"
    
    samtools index "${INPUT_BAM}"
  20. 20

    Command: samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai

    samtools v1.10 GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools=1.10
    
    # Create an index for the sorted BAM file
    samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai
  21. 21

    Takes inputs from multiple final bam files.

    samtools (Inferred with models/gemini-2.5-flash) v1.19 GitHub
    $ Bash example
    bash
    # Install samtools if not already available
    # conda install -c bioconda samtools
    
    # Merges multiple sorted BAM files into a single sorted BAM file.
    # Replace input_1.bam, input_2.bam, etc., with your actual input BAM files.
    # Replace merged_output.bam with your desired output file name.
    samtools merge merged_output.bam input_1.bam input_2.bam input_3.bam
    
  22. 22

    Merges the two technical replicates for further downstream analysis.

    samtools merge (Inferred with models/gemini-2.5-flash) v1.10 GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools
    
    # Merge technical replicates (BAM files)
    # Replace replicate1.bam and replicate2.bam with your actual input BAM files.
    # Replace merged_replicates.bam with your desired output merged BAM file name.
    samtools merge merged_replicates.bam replicate1.bam replicate2.bam
  23. 23

    Command: samtools merge /full/path/to/files/CombinedID.merged.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam

    samtools v1.10 GitHub
    $ Bash example
    # Install samtools (e.g., using conda)
    # conda install -c bioconda samtools
    
    # Define input and output files
    OUTPUT_BAM="/full/path/to/files/CombinedID.merged.bam"
    INPUT_BAM_1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
    INPUT_BAM_2="/full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
    
    # Execute samtools merge
    samtools merge "${OUTPUT_BAM}" "${INPUT_BAM_1}" "${INPUT_BAM_2}"
  24. 24

    Takes output from sortSam, makes bam index for use downstream.

    samtools (Inferred with models/gemini-2.5-flash) v1.10 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools=1.10
    
    # Assuming 'sorted.bam' is the output from sortSam
    samtools index sorted.bam
  25. 25

    Command: samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai

    samtools v(Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install samtools if not already available
    # conda install -c bioconda samtools
    
    samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai
  26. 26

    Takes output from sortSam.

    samtools index (Inferred with models/gemini-2.5-flash) v1.19 GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools
    
    # This command creates an index file (.bai) for a sorted BAM file,
    # which is essential for many downstream applications and visualization.
    # Replace 'sorted.bam' with the actual output file from sortSam.
    samtools index sorted.bam
  27. 27

    Only outputs the second read in each pair for use with single stranded peak caller.

    reformat.sh (Inferred with models/gemini-2.5-flash) v38.90 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install BBMap suite (contains reformat.sh)
    # conda install -c bioconda bbmap
    
    # Define input and output file names
    INPUT_R1="input_R1.fastq.gz"
    INPUT_R2="input_R2.fastq.gz"
    OUTPUT_R2_ONLY="output_R2_only.fastq.gz"
    
    # Only outputs the second read in each pair for use with single stranded peak caller.
    # 'in1' and 'in2' specify the paired-end input files.
    # 'out1=null' discards the first read.
    # 'out2' specifies the output file for the second read.
    reformat.sh in1="${INPUT_R1}" in2="${INPUT_R2}" out1=null out2="${OUTPUT_R2_ONLY}"
  28. 28

    This is the final bam file to perform analysis on.

    samtools (Inferred with models/gemini-2.5-flash) v1.19 GitHub
    $ Bash example
    # Install samtools if not already available
    # conda install -c bioconda samtools
    
    # Placeholder for the input BAM file (e.g., from an alignment step)
    INPUT_BAM="input.bam"
    
    # Define the output sorted BAM file name
    OUTPUT_SORTED_BAM="final.sorted.bam"
    
    # Sort the BAM file to prepare it for indexing and downstream analysis
    samtools sort -o "${OUTPUT_SORTED_BAM}" "${INPUT_BAM}"
    
    # Index the sorted BAM file for quick access to specific regions
    samtools index "${OUTPUT_SORTED_BAM}"
    
    echo "Final BAM file '${OUTPUT_SORTED_BAM}' and its index '${OUTPUT_SORTED_BAM}.bai' are ready for analysis."
  29. 29

    Command: samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam

    samtools v1.10+ GitHub
    $ Bash example
    # Install samtools (if not already installed)
    # conda install -c bioconda samtools
    
    # Define input and output file paths
    INPUT_BAM="/full/path/to/files/CombinedID.merged.bam"
    OUTPUT_BAM="/full/path/to/files/CombinedID.merged.r2.bam"
    
    # Filter BAM file to extract reads that are the second in a pair
    # -h: Include header in the output
    # -b: Output in BAM format
    # -f 128: Only output reads with the FLAG bit 128 set (second in pair)
    samtools view -hb -f 128 "${INPUT_BAM}" > "${OUTPUT_BAM}"
  30. 30

    Takes results from samtools view.

    samtools v1.17 GitHub
    $ Bash example
    # Install samtools if not already available
    # conda install -c bioconda samtools
    
    # This command takes an input BAM file and outputs a SAM file, including the header.
    # Replace 'input.bam' with your actual input file and 'output.sam' with your desired output file.
    samtools view -h input.bam > output.sam
  31. 31

    Calls peaks on those files.

    clipper (Inferred with models/gemini-2.5-flash) vlatest (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install clipper (if not already installed)
    # git clone https://github.com/yeolab/clipper.git
    # cd clipper
    # pip install -r requirements.txt # Check if requirements.txt exists, if not, assume basic python dependencies
    
    # For demonstration, let's assume clipper.py is in the current PATH or directory
    # And placeholder BAM files exist: input_ip.bam, input_control.bam
    
    # Download hg38 chromosome sizes if not available (using UCSC as source)
    # wget -qO- http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes > hg38.chrom.sizes
    
    # Call peaks using clipper
    # Assuming input_ip.bam and input_control.bam are the input files
    # And hg38.chrom.sizes is the genome size file (placeholder for latest assembly)
    python clipper.py -b input_ip.bam -c input_control.bam -s hg38.chrom.sizes -o peaks_output
  32. 32

    Command: clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s hg19 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle

    CLIPper vNot specified GitHub
    $ Bash example
    # Install CLIPper (example, adjust as needed)
    # CLIPper is a Python-based tool. It can often be run directly after cloning or installed via pip.
    # git clone https://github.com/yeolab/clipper.git
    # cd clipper
    # python setup.py install
    
    # Ensure required Python dependencies are installed (e.g., numpy, scipy, pysam)
    # conda install -c bioconda numpy scipy pysam
    
    # Define input and output paths
    INPUT_BAM="/full/path/to/files/CombinedID.merged.r2.bam"
    OUTPUT_BED="/full/path/to/files/CombinedID.merged.r2.peaks.bed"
    GENOME_ASSEMBLY="hg19" # Reference genome assembly for CLIPper
    
    # Execute CLIPper command for peak calling
    clipper -b "${INPUT_BAM}" -s "${GENOME_ASSEMBLY}" -o "${OUTPUT_BED}" --bonferroni --superlocal --threshold-method binomial --save-pickle

Tools Used

Raw Source Text
Takes output from raw files. Run to trim off both 5’ and 3’ adapters on both reads. Command: quality-cutoff 6 -m 18 -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGT AGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics
Takes output from cutadapt round 1. Run to trim off the 3’ adapters on read 2, to control for double ligation events. Command: cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics
Takes output from cutadapt round 2. Maps to human specific version of RepBase used to remove repetitive elements, helps control for spurious artifacts from rRNA (& other) repetitive reads. Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/RepBase_human_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz --outSAMunmapped Within --outFilterMultimapNmax 30 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam --outSAMattributes All --readFilesCommand zcat --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam
Takes output from STAR rmRep. Maps unique reads to the human genome. Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/STAR_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1 /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2 --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --outSAMattributes All --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam
Takes output from STAR genome mapping. Custom random-mer-aware script for PCR duplicate removal. Command: barcode_collapse_pe.py --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics
Takes output from barcode collapse PE. Sorts resulting bam file for use downstream. Command: java -Xmx2048m -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Djava.io.tmpdir=/full/path/to/files/.queue/tmp -cp /path/to/gatk/dist/Queue.jar net.sf.picard.sam.SortSam INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam TMP_DIR=/full/path/to/files/.queue/tmp OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam VALIDATION_STRINGENCY=SILENT SO=coordinate CREATE_INDEX=true
Takes output from sortSam, makes bam index for use downstream. Command: samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai
Takes inputs from multiple final bam files. Merges the two technical replicates for further downstream analysis. Command: samtools merge /full/path/to/files/CombinedID.merged.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam
Takes output from sortSam, makes bam index for use downstream. Command: samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai
Takes output from sortSam. Only outputs the second read in each pair for use with single stranded peak caller. This is the final bam file to perform analysis on. Command: samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam
Takes results from samtools view. Calls peaks on those files. Command: clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s hg19 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle
Genome_build: hg19
Supplementary_files_format_and_content: bigWig format. Maps the raw reads to both hg19 and mm10 respectively. Describes the mapped reads from the eCLIP experiments.
Supplementary_files_format_and_content: bed format, contains clusters of predicted RBP binding (on hg19 and mm10 respectively); column 4 contains -log10(p-value) and column 5 contains log2(fold-enrichment) in eCLIP versus paired size-matched input
← Back to Analysis