GSE179634 Processing Pipeline

RIP-Seq code_examples 32 steps

Publication

Splicing factor SRSF1 deficiency in the liver triggers NASH-like pathology and cell death.

Nature communications (2023) — PMID 36759613

Dataset

GSE179634

Splicing Factor SRSF1 Deficiency in the Liver Triggers NASH-like Pathology via R-Loop Induced DNA Damage and Cell Death

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Takes output from raw files.

    N/A (Inferred with models/gemini-2.5-flash) vN/A
  2. 2

    Run to trim off both 5’ and 3’ adapters on both reads.

    cutadapt (Inferred with models/gemini-2.5-flash) v4.0 (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install cutadapt if not already installed
    # conda install -c bioconda cutadapt=4.0
    
    # Define input and output files
    INPUT_R1="input_R1.fastq.gz"
    INPUT_R2="input_R2.fastq.gz"
    OUTPUT_R1="trimmed_R1.fastq.gz"
    OUTPUT_R2="trimmed_R2.fastq.gz"
    
    # Define adapter sequences.
    # These are placeholders (standard Illumina TruSeq read-through sequences);
    # the actual adapters must be taken from the library prep protocol.
    # ADAPTER_FWD: sequence expected at the 3' end of Read 1 when the insert is short
    ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
    # ADAPTER_REV: sequence expected at the 3' end of Read 2 when the insert is short
    ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
    
    # Run cutadapt to trim 3' adapters from both reads.
    # -a (Read 1) and -A (Read 2) define 3' adapters: the adapter and everything
    # after it are removed. They do not trim 5' adapters. The step description
    # also mentions 5' adapters; those need -g (Read 1) or -G (Read 2), as in the
    # recorded command in step 3, and are omitted here because no 5' sequence is
    # given in the description.
    #
    # Optional common parameters (not included in the core command as not specified in description):
    # -j <threads>: Number of CPU threads to use.
    # -m <min_len>: Discard reads shorter than <min_len> after trimming.
    # -q <qual_trim>: Trim low-quality bases from 3' end.
    cutadapt -a "${ADAPTER_FWD}" \
             -A "${ADAPTER_REV}" \
             -o "${OUTPUT_R1}" \
             -p "${OUTPUT_R2}" \
             "${INPUT_R1}" \
             "${INPUT_R2}"
  3. 3

    Command: quality-cutoff 6 -m 18 -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics

    cutadapt vN/A (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install cutadapt if not already installed
    # conda install -c bioconda cutadapt
    
    # Define input and output paths
    INPUT_R1="/full/path/to/files/file_R1.C01.fastq.gz"
    INPUT_R2="/full/path/to/files/file_R2.C01.fastq.gz"
    OUTPUT_R1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz"
    OUTPUT_R2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz"
    METRICS_FILE="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics"
    
    # Run cutadapt round 1.
    # The recorded command above begins at "quality-cutoff 6", so its start appears
    # to be cut off. It is assumed here to be cutadapt's --quality-cutoff option;
    # any options that came before it are unknown and are not reproduced.
    # CTTGTAGATCGGAAG is passed as a single -A sequence and GACCAAGATCGGAAG is
    # included, matching the round-2 command in step 6.
    cutadapt \
      --quality-cutoff 6 \
      -m 18 \
      -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
      -g CTTCCGATCTACAAGTT \
      -g CTTCCGATCTTGGTCCT \
      -A AACTTGTAGATCGGA \
      -A AGGACCAAGATCGGA \
      -A ACTTGTAGATCGGAA \
      -A GGACCAAGATCGGAA \
      -A CTTGTAGATCGGAAG \
      -A GACCAAGATCGGAAG \
      -A TTGTAGATCGGAAGA \
      -A ACCAAGATCGGAAGA \
      -A TGTAGATCGGAAGAG \
      -A CCAAGATCGGAAGAG \
      -A GTAGATCGGAAGAGC \
      -A CAAGATCGGAAGAGC \
      -A TAGATCGGAAGAGCG \
      -A AAGATCGGAAGAGCG \
      -A AGATCGGAAGAGCGT \
      -A GATCGGAAGAGCGTC \
      -A ATCGGAAGAGCGTCG \
      -A TCGGAAGAGCGTCGT \
      -A CGGAAGAGCGTCGTG \
      -A GGAAGAGCGTCGTGT \
      -o "${OUTPUT_R1}" \
      -p "${OUTPUT_R2}" \
      "${INPUT_R1}" \
      "${INPUT_R2}" > "${METRICS_FILE}"
  4. 4

    Takes output from cutadapt round 1.

    cutadapt v2.10 GitHub
    $ Bash example
    # Install cutadapt if not already installed
    # conda install -c bioconda cutadapt=2.10
    
    # Define input and output files
    # INPUT_FASTQ is the output from a previous cutadapt round 1 (e.g., 3' adapter trimming)
    INPUT_FASTQ="round1_trimmed.fastq.gz"
    OUTPUT_FASTQ="round2_trimmed.fastq.gz"
    
    # Define parameters for this single-end illustration of a second trimming pass.
    # The recorded round-2 command (step 6) is paired-end and trims 3' adapters
    # from Read 2 with -A, using --quality-cutoff 6 and -m 18.
    # The placeholder below is the TruSeq read-through sequence, which normally
    # occurs at the 3' end of reads; replace it with the real 5' adapter for your
    # library before using -g.
    ADAPTER_5PRIME_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" # Placeholder, replace with actual
    QUALITY_CUTOFF="20,20" # Trim bases below Phred 20 from the 5' and 3' ends (5',3' cutoffs)
    MINIMUM_LENGTH="15" # Discard reads shorter than 15 nt after trimming
    NUM_THREADS=$(nproc) # Use all available CPU cores
    
    cutadapt \
      -g "${ADAPTER_5PRIME_SEQUENCE}" \
      -q "${QUALITY_CUTOFF}" \
      --minimum-length "${MINIMUM_LENGTH}" \
      --cores "${NUM_THREADS}" \
      -o "${OUTPUT_FASTQ}" \
      "${INPUT_FASTQ}"
  5. 5

    Run to trim off the 3’ adapters on read 2, to control for double ligation events.

    cutadapt (Inferred with models/gemini-2.5-flash) v4.0 GitHub
    $ Bash example
    # Install cutadapt (if not already installed)
    # conda install -c bioconda cutadapt=4.0
    
    # Define input and output file paths
    INPUT_R1="input_R1.fastq.gz"
    INPUT_R2="input_R2.fastq.gz"
    OUTPUT_R1="trimmed_R1.fastq.gz"
    OUTPUT_R2="trimmed_R2.fastq.gz"
    
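    # Define the 3' adapter sequence for Read 2.
    # Placeholder only: this TruSeq sequence is the one the recorded round-1 command
    # (step 3) trims from Read 1 with -a. The recorded round-2 command (step 6)
    # instead passes a set of 15-nt -A sequences for Read 2 to control for double
    # ligation events. Replace it with the sequence(s) for your library.
    ADAPTER_R2="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"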
    
    # Run cutadapt to trim the 3' adapter from Read 2.
    # -A: Specifies the 3' adapter sequence for Read 2.
    # -o: Output file for Read 1 (untrimmed in this specific step, but paired with R2 output).
    # -p: Output file for Read 2 (trimmed).
    # --minimum-length: Discard reads shorter than this length after trimming (e.g., 18bp is common in eCLIP).
    # -j: Number of CPU threads to use for parallel processing (e.g., 8).
    cutadapt -A "${ADAPTER_R2}" \
             -o "${OUTPUT_R1}" \
             -p "${OUTPUT_R2}" \
             --minimum-length 18 \
             -j 8 \
             "${INPUT_R1}" "${INPUT_R2}"
  6. 6

    Command: cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics

    cutadapt v1.18 GitHub
    $ Bash example
    # conda install -c bioconda cutadapt=1.18
    cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics
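    The 20 -A sequences above are consistent with every 15-nt window of two 27-nt sequences that share the same 3' portion. As a sketch, the list can be regenerated instead of typed by hand. SEQ_A and SEQ_B are assumptions reconstructed by overlapping the recorded 15-mers, and tile_windows is a hypothetical helper, not part of cutadapt or the eCLIP workflow.

    ```bash
    # Print every k-nt window of a sequence, one per line.
    tile_windows() {
        # $1 = sequence, $2 = window length
        printf '%s\n' "$1" |
            awk -v k="$2" '{ for (i = 1; i + k - 1 <= length($0); i++) print substr($0, i, k) }'
    }

    # Assumption: full sequences reconstructed by overlapping the recorded 15-mers.
    SEQ_A="AACTTGTAGATCGGAAGAGCGTCGTGT"
    SEQ_B="AGGACCAAGATCGGAAGAGCGTCGTGT"

    # Unique windows, in first-seen order.
    ADAPTER_TILES=$( { tile_windows "$SEQ_A" 15; tile_windows "$SEQ_B" 15; } | awk '!seen[$0]++' )

    # Format the windows as repeated "-A <sequence>" arguments.
    A_FLAGS=$(printf '%s\n' "$ADAPTER_TILES" | sed 's/^/-A /' | tr '\n' ' ')
    printf '%s\n' "$A_FLAGS"
    ```

    Comparing the printed flags with a recorded command is a quick way to spot a split or missing sequence.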
  7. 7

    Takes output from cutadapt round 2.

    cutadapt v4.0 GitHub
    $ Bash example
    # Install cutadapt if not already installed
    # conda install -c bioconda cutadapt
    
    # Define input and output files
    # INPUT_FASTQ represents the output from a previous cutadapt round (round 1).
    INPUT_FASTQ="input_from_cutadapt_round1.fastq.gz"
    OUTPUT_FASTQ="output_cutadapt_round2.fastq.gz"
    REPORT_FILE="cutadapt_round2_report.txt"
    
    # Define adapter sequences and trimming parameters for round 2.
    # These are placeholders; actual values depend on the specific eCLIP library preparation
    # and what was trimmed in round 1. Round 2 might focus on secondary adapters,
    # more stringent quality trimming, or length filtering.
    # Example 3' adapter (e.g., Illumina universal or specific RT primer adapter).
    ADAPTER_3PRIME="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
    # ADAPTER_5PRIME="GTTCAGAGTTCTACAGTCCGACGATC" # Uncomment and set if a 5' adapter needs trimming
    QUALITY_CUTOFF=20 # Phred quality score cutoff
    MIN_LENGTH=18     # Minimum read length after trimming
    CORES=4           # Number of CPU cores to use for parallel processing
    
    # Execute cutadapt for round 2 trimming.
    # This command assumes single-end reads. For paired-end reads, use -A and -G for the reverse read.
    # --discard-untrimmed drops every read in which no adapter was found. The recorded
    # commands in steps 3 and 6 do not use it; remove it to keep untrimmed reads.
    cutadapt \
        -a "${ADAPTER_3PRIME}" \
        --quality-cutoff="${QUALITY_CUTOFF}" \
        --minimum-length="${MIN_LENGTH}" \
        --discard-untrimmed \
        --cores="${CORES}" \
        -o "${OUTPUT_FASTQ}" \
        "${INPUT_FASTQ}" \
        > "${REPORT_FILE}" 2>&1
    
    # Note: For paired-end reads, the command would be more complex, e.g.:
    # cutadapt \
    #     -a "${ADAPTER_3PRIME_R1}" \
    #     -A "${ADAPTER_3PRIME_R2}" \
    #     --quality-cutoff="${QUALITY_CUTOFF}" \
    #     --minimum-length="${MIN_LENGTH}" \
    #     --discard-untrimmed \
    #     --cores="${CORES}" \
    #     -o "${OUTPUT_FASTQ_R1}" \
    #     -p "${OUTPUT_FASTQ_R2}" \
    #     "${INPUT_FASTQ_R1}" \
    #     "${INPUT_FASTQ_R2}" \
    #     > "${REPORT_FILE}" 2>&1
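    What a 3' adapter given with -a means can be shown with a toy function: the adapter and everything after it are dropped, and reads without the adapter pass through unchanged. This sketch uses exact matching only; cutadapt itself also tolerates mismatches and partial adapter matches at the read end, which is not reproduced here. trim_3prime, the adapter prefix and the example reads are hypothetical.

    ```bash
    # Toy 3' adapter trimming with exact matching only.
    trim_3prime() {
        # $1 = read sequence, $2 = adapter sequence
        printf '%s\n' "$1" |
            awk -v a="$2" '{ p = index($0, a); if (p > 0) print substr($0, 1, p - 1); else print $0 }'
    }

    ADAPTER="AGATCGGAAGAGC"                               # hypothetical adapter prefix
    trim_3prime "ACGTACGTAGATCGGAAGAGCTTTT" "$ADAPTER"    # adapter found: prints ACGTACGT
    trim_3prime "ACGTACGTACGT" "$ADAPTER"                 # no adapter: read printed unchanged
    ```

    With --discard-untrimmed, the second read would be dropped rather than kept.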
  8. 8

    Maps reads to a human-specific version of RepBase to remove repetitive elements, which helps control for spurious artifacts from rRNA (and other) repetitive reads.

    bbduk (Inferred with models/gemini-2.5-flash) vNot specified (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install BBMap suite if not available
    # conda install -c bioconda bbmap
    
    # Placeholder for human specific RepBase repeats FASTA file.
    # This file would typically be generated by extracting human repetitive elements from RepBase
    # or by using a pre-compiled contaminant file that includes common human repeats (e.g., rRNA, tRNAs, SINEs, LINEs).
    # For example, a file like 'human_repbase_repeats.fa' would contain sequences of known human repetitive elements.
    HUMAN_REPBASE_FASTA="/path/to/human_repbase_repeats.fa"
    
    # Input FASTQ file (e.g., raw reads from eCLIP)
    INPUT_FASTQ="input_reads.fastq.gz"
    
    # Output FASTQ file containing reads with repetitive elements removed
    OUTPUT_FASTQ="filtered_non_repetitive_reads.fastq.gz"
    
    # Remove reads that share k-mers with the RepBase repeats FASTA.
    # Note: bbduk matches k-mers rather than aligning reads; the recorded command
    # for this stage (step 9) aligns to a RepBase index with STAR instead.
    # 'k=31' sets the k-mer length to 31.
    # 'hdist=1' allows up to 1 mismatch (Hamming distance) per k-mer.
    # 'stats=repbase_filter_stats.txt' writes statistics on which reference sequences were matched.
    # '-Xmx4g' sets the Java heap to 4 GB; adjust to your input size and system resources.
    bbduk.sh in="$INPUT_FASTQ" \
             out="$OUTPUT_FASTQ" \
             ref="$HUMAN_REPBASE_FASTA" \
             k=31 hdist=1 \
             stats="repbase_filter_stats.txt" \
             -Xmx4g
  9. 9

    Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/RepBase_human_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz --outSAMunmapped Within --outFilterMultimapNmax 30 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam --outSAMattributes All --readFilesCommand zcat --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam

    $ Bash example
    # Install STAR (example using conda)
    # conda install -c bioconda star
    
    # Define variables for clarity
    GENOME_DIR="/path/to/RepBase_human_database_file"
    READ_FILE_1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz"
    READ_FILE_2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz"
    OUTPUT_PREFIX="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam"
    FINAL_OUTPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam"
    
    # Execute STAR alignment
    STAR \
      --runMode alignReads \
      --runThreadN 16 \
      --genomeDir "${GENOME_DIR}" \
      --genomeLoad LoadAndRemove \
      --readFilesIn "${READ_FILE_1}" "${READ_FILE_2}" \
      --outSAMunmapped Within \
      --outFilterMultimapNmax 30 \
      --outFilterMultimapScoreRange 1 \
      --outFileNamePrefix "${OUTPUT_PREFIX}" \
      --outSAMattributes All \
      --readFilesCommand zcat \
      --outStd BAM_Unsorted \
      --outSAMtype BAM Unsorted \
      --outFilterType BySJout \
      --outReadsUnmapped Fastx \
      --outFilterScoreMin 10 \
      --outSAMattrRGline ID:foo \
      --alignEndsType EndToEnd \
      > "${FINAL_OUTPUT_BAM}"
  10. 10

    Takes output from STAR rmRep.

    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools=1.10
    
    # Input BAM file from STAR alignment (e.g., aligned_reads.bam)
    # samtools markdup expects coordinate-sorted input that has first been run
    # through `samtools fixmate -m`, which adds the mate tags it relies on.
    INPUT_BAM="aligned_reads.bam"
    OUTPUT_BAM="aligned_reads.markdup.bam"
    METRICS_FILE="markdup_metrics.txt"
    
    # Remove PCR duplicates from the aligned BAM file
    # -r: Remove duplicates (rather than just marking them)
    # -s: Print statistics to stderr (captured with 2> below)
    samtools markdup -r -s "$INPUT_BAM" "$OUTPUT_BAM" 2> "$METRICS_FILE"
  11. 11

    Maps unique reads to the mouse genome.

    STAR (Inferred with models/gemini-2.5-flash) v2.7.10a GitHub
    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Placeholder for STAR genome index directory.
    # The mouse genome (e.g., mm10/GRCm38) STAR index needs to be pre-built or downloaded.
    # Example command to build index (run once, replace paths):
    # STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /path/to/STAR_index/mm10 \
    #      --genomeFastaFiles /path/to/mouse_genome.fa --sjdbGTFfile /path/to/mouse_annotations.gtf \
    #      --sjdbOverhang 100 # Adjust sjdbOverhang based on read length - 1
    
    # Align unique reads to the mouse genome
    # Input: reads.fastq.gz (replace with your actual input FASTQ file)
    # Output: aligned_Aligned.sortedByCoord.out.bam
    STAR --genomeDir /path/to/STAR_index/mm10 \
         --readFilesIn reads.fastq.gz \
         --outFileNamePrefix aligned_ \
         --outSAMtype BAM SortedByCoordinate \
         --runThreadN 8 \
         --outFilterMultimapNmax 1 \
         --outFilterMismatchNmax 10 \
         --outFilterScoreMinOverLread 0.66 \
         --outFilterMatchNminOverLread 0.66
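    The index-building comment above sets --sjdbOverhang to read length minus 1. After adapter trimming the reads vary in length, and the STAR manual's guidance for that case is the maximum read length minus 1. A minimal sketch, assuming uncompressed four-line FASTQ records on standard input; max_read_length is a hypothetical helper and the two reads are made up.

    ```bash
    # Report the longest sequence line in a four-line-per-record FASTQ stream.
    max_read_length() {
        awk 'NR % 4 == 2 { if (length($0) > m) m = length($0) } END { print m + 0 }'
    }

    # Two made-up reads of length 10 and 5.
    MAX_LEN=$(printf '@read1\nACGTACGTAC\n+\nIIIIIIIIII\n@read2\nACGTA\n+\nIIIII\n' | max_read_length)
    SJDB_OVERHANG=$((MAX_LEN - 1))
    echo "max read length: ${MAX_LEN}, --sjdbOverhang ${SJDB_OVERHANG}"
    ```

    For a gzipped file, the same function could be fed with zcat reads.fastq.gz | max_read_length.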
  12. 12

    Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/STAR_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1 /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2 --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --outSAMattributes All --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam

    STAR vNot specified (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install STAR (example using conda):
    # conda install -c bioconda star
    
    # Define variables for paths
    GENOME_DIR="/path/to/STAR_database_file" # Placeholder from the recorded command; step 11 indicates a mouse genome index
    READ_FILE_1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1"
    READ_FILE_2="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2"
    OUTPUT_PREFIX="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam"
    OUTPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam"
    
    # Execute STAR alignment command
    STAR \
      --runMode alignReads \
      --runThreadN 16 \
      --genomeDir "${GENOME_DIR}" \
      --genomeLoad LoadAndRemove \
      --readFilesIn "${READ_FILE_1}" "${READ_FILE_2}" \
      --outSAMunmapped Within \
      --outFilterMultimapNmax 1 \
      --outFilterMultimapScoreRange 1 \
      --outFileNamePrefix "${OUTPUT_PREFIX}" \
      --outSAMattributes All \
      --outStd BAM_Unsorted \
      --outSAMtype BAM Unsorted \
      --outFilterType BySJout \
      --outReadsUnmapped Fastx \
      --outFilterScoreMin 10 \
      --outSAMattrRGline ID:foo \
      --alignEndsType EndToEnd > "${OUTPUT_BAM}"
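    The mate1 and mate2 inputs of this command are the unmapped reads from the repeat-mapping run in step 9. With --outReadsUnmapped Fastx, STAR writes them as Unmapped.out.mate1 and Unmapped.out.mate2 appended directly to --outFileNamePrefix, which is why the file names contain "rep.bamUnmapped". A small sketch of that naming, using the paths recorded above (the variable names are illustrative):

    ```bash
    # Prefix passed to --outFileNamePrefix in the repeat-mapping run (step 9).
    REP_PREFIX="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam"

    # STAR appends these fixed suffixes to the prefix, with no separator.
    UNMAPPED_MATE1="${REP_PREFIX}Unmapped.out.mate1"
    UNMAPPED_MATE2="${REP_PREFIX}Unmapped.out.mate2"

    printf '%s\n%s\n' "$UNMAPPED_MATE1" "$UNMAPPED_MATE2"
    ```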
  13. 13

    Takes output from STAR genome mapping.

    STAR v2.7.10a (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install STAR (example using conda)
    # conda create -n star_env star=2.7.10a samtools -c bioconda -c conda-forge
    # conda activate star_env
    
    # --- Reference Data Setup (example for a mouse genome, per step 11) ---
    # This step assumes you have already built a STAR genome index.
    # If not, you would typically run:
    # STAR --runThreadN <num_threads> --runMode genomeGenerate \
    #      --genomeDir /path/to/STAR_index/mm10 \
    #      --genomeFastaFiles /path/to/mouse_genome.fa \
    #      --sjdbGTFfile /path/to/mouse_annotations.gtf \
    #      --sjdbOverhang 100 # (or read length - 1)
    
    # --- Define variables ---
    GENOME_DIR="/path/to/STAR_index/mm10" # Placeholder for STAR genome index directory (mouse, per step 11)
    READ1="sample_R1.fastq.gz"           # Placeholder for input FASTQ file (Read 1)
    READ2="sample_R2.fastq.gz"           # Placeholder for input FASTQ file (Read 2, remove if single-end)
    OUTPUT_PREFIX="sample_"              # Prefix for output files
    THREADS=8                            # Number of threads to use
    
    # --- Run STAR alignment ---
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READ1}" "${READ2}" \
         --runThreadN "${THREADS}" \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outSAMattributes Standard \
         --quantMode GeneCounts \
         --outFilterType BySJout \
         --outFilterMultimapNmax 20 \
         --alignSJDBoverhangMin 1 \
         --alignSJoverhangMin 8 \
         --alignIntronMin 20 \
         --alignIntronMax 1000000 \
         --alignMatesGapMax 1000000
    
    # --- Index the output BAM file ---
    samtools index "${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam"
  14. 14

    Custom random-mer-aware script for PCR duplicate removal.

    barcode_collapse_pe.py (Inferred with models/gemini-2.5-flash) vPart of yeolab/eclip workflow
    $ Bash example
    # The recorded command for this stage (step 15) calls barcode_collapse_pe.py
    # from the yeolab/eclip workflow. The script requires Python with pysam:
    # pip install pysam
    
    # Define paths (placeholders)
    SCRIPT_PATH="/path/to/barcode_collapse_pe.py"
    
    INPUT_BAM="genome_aligned.bam"             # Paired-end BAM from genome mapping
    OUTPUT_BAM="deduplicated_reads.bam"        # Output BAM file with PCR duplicates removed
    METRICS_FILE="deduplicated_reads.metrics"  # Duplicate-removal metrics
    
    # Execute the custom random-mer-aware PCR duplicate removal script
    python "${SCRIPT_PATH}" --bam "${INPUT_BAM}" --out_file "${OUTPUT_BAM}" --metrics_file "${METRICS_FILE}"
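    To illustrate what random-mer-aware duplicate removal means, the toy below keeps one record per combination of chromosome, start, strand and random-mer, so reads at the same position survive when their random-mers differ. This is a simplified sketch on a made-up tab-separated table, not the logic of the actual script; collapse_by_umi and the column layout are assumptions.

    ```bash
    # Keep the first row for each (chrom, start, strand, random-mer) key.
    collapse_by_umi() {
        # stdin: tab-separated chrom, start, strand, random-mer
        awk -F '\t' '!seen[$1 FS $2 FS $3 FS $4]++'
    }

    # Rows 1 and 2 are identical (a PCR duplicate); row 3 shares the position
    # but has a different random-mer; row 4 is at another position.
    DEDUPED=$(printf 'chr1\t100\t+\tACGTAC\nchr1\t100\t+\tACGTAC\nchr1\t100\t+\tTTGCAA\nchr2\t500\t-\tACGTAC\n' | collapse_by_umi)
    printf '%s\n' "$DEDUPED"
    ```

    Position-only duplicate removal would have collapsed the first three rows into one, discarding a read that the random-mer shows to be an independent molecule.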
  15. 15

    Command: barcode_collapse_pe.py --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics

    barcode_collapse_pe.py (Inferred with models/gemini-2.5-flash) v1.2 (from yeolab/eclip pipeline) GitHub
    $ Bash example
    # Install Miniconda or Anaconda if not already installed
    # wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
    # bash miniconda.sh -b -p $HOME/miniconda
    # export PATH="$HOME/miniconda/bin:$PATH"
    # conda init bash
    # source ~/.bashrc
    
    # Create and activate a conda environment for eCLIP tools (requires Python 2.7 and pysam)
    # conda create -n eclip_env python=2.7 pysam=0.10.0 -y
    # conda activate eclip_env
    
    # Clone the eclip repository to get the script
    # git clone https://github.com/yeolab/eclip.git
    # cd eclip/src
    
    # Execute the barcode collapse command
    # Ensure you are in the directory containing barcode_collapse_pe.py or it's in your PATH
    python barcode_collapse_pe.py \
        --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam \
        --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam \
        --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics
  16. 16

    Takes output from barcode collapse PE.

    STAR (Inferred with models/gemini-2.5-flash) v2.7.0f
    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables (replace with actual paths and filenames)
    GENOME_DIR="/path/to/STAR_index/GRCh38" # Placeholder genome index (step 11 indicates mouse; use the matching index)
    # Note: the recorded barcode collapse command (step 15) writes a BAM file, not FASTQ.
    # The FASTQ names below are hypothetical and would have to be produced from that BAM first.
    READ1_FASTQ="collapsed_R1.fastq.gz" # Hypothetical Read 1 FASTQ
    READ2_FASTQ="collapsed_R2.fastq.gz" # Hypothetical Read 2 FASTQ
    OUTPUT_PREFIX="aligned_sample_prefix_"
    THREADS=8 # Number of threads to use
    
    # Run STAR alignment
    STAR \
      --genomeDir "${GENOME_DIR}" \
      --readFilesIn "${READ1_FASTQ}" "${READ2_FASTQ}" \
      --runThreadN "${THREADS}" \
      --outFileNamePrefix "${OUTPUT_PREFIX}" \
      --outSAMtype BAM SortedByCoordinate \
      --outSAMattributes All \
      --quantMode GeneCounts \
      --outFilterMultimapNmax 20 \
      --outFilterMismatchNoverLmax 0.04 \
      --alignIntronMin 20 \
      --alignIntronMax 1000000 \
      --alignMatesGapMax 1000000 \
      --limitBAMsortRAM 30000000000 # 30GB RAM for sorting, adjust as needed
    
  17. 17

    Sorts resulting bam file for use downstream.

    samtools (Inferred with models/gemini-2.5-flash) v1.10 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install samtools if not already available
    # conda install -c bioconda samtools=1.10
    
    # Define input and output file names
    INPUT_BAM="input.bam"
    OUTPUT_SORTED_BAM="output_sorted.bam"
    
    # Sort the BAM file by coordinate
    # The -o flag specifies the output file.
    samtools sort -o "${OUTPUT_SORTED_BAM}" "${INPUT_BAM}"
  18. 18

    Command: java -Xmx2048m -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Djava.io.tmpdir=/full/path/to/files/.queue/tmp -cp /path/to/gatk/dist/Queue.jar net.sf.picard.sam.SortSam INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam TMP_DIR=/full/path/to/files/.queue/tmp OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam VALIDATION_STRINGENCY=SILENT SO=coordinate CREATE_INDEX=true

    Picard vNot specified GitHub
    $ Bash example
    # Picard tools are typically run via Java. You can download the latest Picard JAR from the Broad Institute GitHub releases.
    # For example, using conda:
    # conda install -c bioconda picard
    
    java -Xmx2048m -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Djava.io.tmpdir=/full/path/to/files/.queue/tmp -cp /path/to/gatk/dist/Queue.jar net.sf.picard.sam.SortSam INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam TMP_DIR=/full/path/to/files/.queue/tmp OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam VALIDATION_STRINGENCY=SILENT SO=coordinate CREATE_INDEX=true
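    SO=coordinate requests coordinate-sorted output, and the sort order is recorded in the @HD line of the BAM header. One way to confirm it before indexing is to read the SO field from the header text. A minimal sketch, assuming header lines on standard input (for a real file they could come from samtools view -H); header_sort_order is a hypothetical helper and the header shown is made up.

    ```bash
    # Print the SO (sort order) value from the @HD line of a SAM header.
    header_sort_order() {
        awk -F '\t' '$1 == "@HD" { for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) { print substr($i, 4); exit } }'
    }

    # Made-up two-line header standing in for the output of samtools view -H.
    SORT_ORDER=$(printf '@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' | header_sort_order)

    if [ "$SORT_ORDER" = "coordinate" ]; then
        echo "coordinate-sorted: safe to index"
    else
        echo "not coordinate-sorted: sort before indexing"
    fi
    ```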
  19. 19

    Takes output from sortSam, makes bam index for use downstream.

    samtools (Inferred with models/gemini-2.5-flash) v1.15.1 GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools=1.15.1
    
    # Define input BAM file (output from sortSam)
    INPUT_BAM="sorted.bam" # Placeholder for the sorted BAM file
    
    # Create BAM index for downstream use
    samtools index "${INPUT_BAM}"
  20. 20

    Command: samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai

    samtools v1.19 GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools=1.19
    
    # Create an index for the sorted BAM file
    samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai
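    Because the index is written to a separate .bai file, it can go stale if the BAM is regenerated later. The sketch below checks that an index exists and is newer than its BAM before downstream use. index_is_current is a hypothetical helper, the files are empty placeholders rather than real BAMs, and the -nt test is supported by bash and dash although POSIX does not require it.

    ```bash
    # True when "<bam>.bai" exists and is newer than the BAM itself.
    index_is_current() {
        [ -f "$1.bai" ] && [ "$1.bai" -nt "$1" ]
    }

    # Demonstration with empty placeholder files and fixed timestamps.
    WORKDIR=$(mktemp -d)
    DEMO_BAM="${WORKDIR}/example.sorted.bam"
    touch -t 202001010000 "$DEMO_BAM"        # BAM written first
    touch -t 202001010001 "${DEMO_BAM}.bai"  # index written one minute later

    if index_is_current "$DEMO_BAM"; then
        echo "index is current"
    else
        echo "index missing or stale: re-run samtools index"
    fi
    rm -rf "$WORKDIR"
    ```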
  21. 21

    Takes inputs from multiple final bam files.

    samtools (Inferred with models/gemini-2.5-flash) v1.19.2 GitHub
    $ Bash example
    # Install samtools if not already available
    # conda install -c bioconda samtools
    
    # Example: Merge multiple final BAM files into a single BAM file.
    # This step takes multiple input BAM files (e.g., from technical replicates or different lanes)
    # and combines them into one consolidated BAM file for downstream analysis.
    # Replace input1.bam, input2.bam, etc., with your actual input BAM file paths.
    # Replace merged_output.bam with your desired output merged BAM file name.
    # -@ specifies the number of threads to use.
    samtools merge -@ 4 merged_output.bam input1.bam input2.bam input3.bam
  22. 22

    Merges the two technical replicates for further downstream analysis.

    samtools (Inferred with models/gemini-2.5-flash) v1.19 GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools=1.19
    
    # Define input and output file paths
    INPUT_REPLICATE1_BAM="replicate1.bam"
    INPUT_REPLICATE2_BAM="replicate2.bam"
    OUTPUT_MERGED_BAM="merged_replicates.bam"
    
    # Merge the two technical replicates (BAM files)
    samtools merge "${OUTPUT_MERGED_BAM}" "${INPUT_REPLICATE1_BAM}" "${INPUT_REPLICATE2_BAM}"
  23. 23

    Command: samtools merge /full/path/to/files/CombinedID.merged.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam

    samtools vInfer from description (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install samtools (e.g., using conda)
    # conda install -c bioconda samtools
    
    # Define input and output files
    INPUT_BAM_1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
    INPUT_BAM_2="/full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
    OUTPUT_MERGED_BAM="/full/path/to/files/CombinedID.merged.bam"
    
    # Execute samtools merge command
    samtools merge "${OUTPUT_MERGED_BAM}" "${INPUT_BAM_1}" "${INPUT_BAM_2}"
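    The two inputs differ only in the well identifier (C01 and D08), so the merge command can be assembled from a list of replicate identifiers. The sketch below only prints the command as a dry run and does not call samtools; sorted_bam_for, DATA_DIR and SUFFIX are hypothetical names, and the paths are the placeholders from the recorded command.

    ```bash
    DATA_DIR="/full/path/to/files"
    SUFFIX="fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"

    # Build the sorted, deduplicated BAM path for one replicate identifier.
    sorted_bam_for() {
        printf '%s/file_R1.%s.%s' "$DATA_DIR" "$1" "$SUFFIX"
    }

    # Dry run: assemble and print the merge command for the two replicates.
    MERGE_CMD="samtools merge ${DATA_DIR}/CombinedID.merged.bam $(sorted_bam_for C01) $(sorted_bam_for D08)"
    printf '%s\n' "$MERGE_CMD"
    ```

    Keeping the command in a string is only suitable for printing; to execute it with paths that may contain spaces, pass the quoted paths to samtools merge directly.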
  24. 24

    Takes output from sortSam, makes bam index for use downstream.

    samtools index (Inferred with models/gemini-2.5-flash) v1.19.1 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools
    
    # Assuming 'sorted.bam' is the output from sortSam
    samtools index sorted.bam
  25. 25

    Command: samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai

    samtools vNot specified (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools
    
    samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai
  26. 26

    Takes output from sortSam.

    samtools (Inferred with models/gemini-2.5-flash) v1.19.1 GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools=1.19.1
    
    # This step takes a sorted BAM file (output from sortSam) and creates an index (.bai) file.
    # The index file is crucial for efficient random access to reads within the BAM file,
    # enabling many downstream tools to function correctly and quickly.
    # Replace 'sorted_input.bam' with the actual path to your sorted BAM file.
    samtools index sorted_input.bam
  27. 27

    Only outputs the second read in each pair for use with single stranded peak caller.

    samtools (Inferred with models/gemini-2.5-flash) vN/A GitHub
    $ Bash example
    # Install samtools if not already available
    # conda install -c bioconda samtools
    
    # Keep only the second read of each pair from a paired-end BAM file.
    # -f 128 selects alignments with the "second in pair" flag set, as in the
    # recorded command in step 29. The file names here are placeholders.
    samtools view -hb -f 128 merged.bam > merged.r2.bam
  28. 28

    This is the final bam file to perform analysis on.

    samtools (Inferred with models/gemini-2.5-flash) v1.19 GitHub
    $ Bash example
    # Install samtools if not already available
    # conda install -c bioconda samtools
    
    # Per the Raw Source Text, the final BAM for analysis is the read-2-only file written by
    # `samtools view -hb -f 128` (CombinedID.merged.r2.bam); no further command is listed.
    # The sort and index below are an inferred, optional finalization of a placeholder 'input.bam'.
    samtools sort -o final.bam input.bam
    
    # Index the coordinate-sorted BAM so downstream tools and genome browsers can use random access.
    samtools index final.bam
  29. 29

    Command: samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam

    samtools v1.9 GitHub
    $ Bash example
    # Install samtools if not already available
    # conda install -c bioconda samtools=1.9
    
    # Define input and output file paths
    INPUT_BAM="/full/path/to/files/CombinedID.merged.bam"
    OUTPUT_BAM="/full/path/to/files/CombinedID.merged.r2.bam"
    
    # Extract reads that are the second in a pair (flag 128)
    # -h: Include header in the output
    # -b: Output in BAM format
    # -f 128: Only output reads with flag 128 set (second in pair)
    samtools view -hb -f 128 "${INPUT_BAM}" > "${OUTPUT_BAM}"
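    The `-f 128` filter keeps an alignment only when bit 0x80 (second read in the pair) is set in its SAM FLAG field. The sketch below shows that bit test in plain shell arithmetic. `is_second_in_pair` is a hypothetical helper for illustration, and the FLAG values are common textbook examples rather than values taken from this dataset.

    ```shell
    # Hypothetical helper: succeeds when the SAM FLAG has bit 128 (0x80) set,
    # which is the condition `samtools view -f 128` requires.
    is_second_in_pair() {
        [ $(( $1 & 128 )) -ne 0 ]
    }

    # Illustrative FLAG values (not from this dataset):
    #   99  = 1 + 2 + 32 + 64   (paired, proper pair, mate reverse, first in pair)
    #   147 = 1 + 2 + 16 + 128  (paired, proper pair, read reverse, second in pair)
    for flag in 99 147; do
        if is_second_in_pair "${flag}"; then
            echo "FLAG ${flag}: kept by -f 128"
        else
            echo "FLAG ${flag}: dropped by -f 128"
        fi
    done
    ```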
  30. 30

    Takes results from samtools view.

    samtools v1.9 GitHub
    $ Bash example
    # Install samtools (if not already installed)
    # conda install -c bioconda samtools=1.9
    
    # This step passes the read-2-only BAM written by `samtools view -hb -f 128` to the
    # peak caller; the Raw Source Text gives no separate command for it.
    # Optional sanity checks (an inferred addition, not part of the source pipeline):
    R2_BAM="/full/path/to/files/CombinedID.merged.r2.bam"
    #   quickcheck: exits non-zero if the BAM looks truncated or has a malformed header.
    samtools quickcheck "${R2_BAM}"
    #   view -c: counts the alignments that the peak caller will receive.
    samtools view -c "${R2_BAM}"
  31. 31

    Calls peaks on those files.

    clipper (Inferred with models/gemini-2.5-flash) vNot specified GitHub
    $ Bash example
    # Clone the clipper repository if not already available
    # git clone https://github.com/yeolab/clipper.git
    # cd clipper
    
    # Ensure Python and required libraries (e.g., pysam) are installed
    # conda install -c bioconda pysam
    
    # Define the input BAM, species, and output BED file.
    # The Raw Source Text runs CLIPper on the read-2-only merged BAM, with no control BAM.
    IP_BAM="/full/path/to/files/CombinedID.merged.r2.bam"
    SPECIES="mm9" # Genome build stated in the Raw Source Text
    OUTPUT_BED="/full/path/to/files/CombinedID.merged.r2.peaks.bed"
    
    # Execute CLIPper to call peaks (remaining flags as given in the Raw Source Text)
    clipper -b "${IP_BAM}" -s "${SPECIES}" -o "${OUTPUT_BED}" --bonferroni --superlocal --threshold-method binomial --save-pickle
  32. 32

    Command: clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s mm9 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle

    CLIPper v0.0.1 GitHub
    $ Bash example
    # Install CLIPper
    # conda install -c bioconda clipper
    
    # Execute CLIPper
    clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s mm9 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle
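    The `-o` argument above names a BED file of peak clusters. A per-strand tally is a cheap sanity check before downstream work. This sketch assumes a tab-separated BED layout with the strand in column 6, which is an assumption about CLIPper's output that should be confirmed against the actual file. `count_peaks_by_strand` is a hypothetical helper.

    ```shell
    # Hypothetical helper: count BED records per strand.
    # Assumes tab-separated BED with strand in column 6 (verify for your file);
    # skips comment, track, and browser lines.
    count_peaks_by_strand() {
        awk -F '\t' '
            /^#/ || /^track/ || /^browser/ { next }
            NF >= 6 { n[$6]++ }
            END { for (s in n) printf "%s\t%d\n", s, n[s] }
        ' "$1" | LC_ALL=C sort
    }

    PEAKS_BED="/full/path/to/files/CombinedID.merged.r2.peaks.bed"
    if [ -f "${PEAKS_BED}" ]; then
        count_peaks_by_strand "${PEAKS_BED}"
    else
        echo "peaks file not found: ${PEAKS_BED}"
    fi
    ```

    A strong imbalance between strands is not necessarily an error, but an empty tally suggests the peak caller wrote no records or the column assumption is wrong.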

Raw Source Text
Takes output from raw files.  Run to trim off both 5’ and 3’ adapters on both reads. Command: quality-cutoff 6  -m 18  -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC  -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT  -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT  -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz  -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz  /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics
Takes output from cutadapt round 1. Run to trim off the 3’ adapters on read 2, to control for double ligation events. Command: cutadapt -f fastq --match-read-wildcards  --times 1  -e 0.1  -O 5  --quality-cutoff 6  -m 18  -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT  -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz  -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz  /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics
Takes output from cutadapt round 2.  Maps to human specific version of RepBase used to remove repetitive elements, helps control for spurious artifacts from rRNA (& other) repetitive reads.  Command: STAR  --runMode alignReads  --runThreadN 16  --genomeDir /path/to/RepBase_human_database_file --genomeLoad LoadAndRemove  --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz --outSAMunmapped Within  --outFilterMultimapNmax 30  --outFilterMultimapScoreRange 1  --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam --outSAMattributes All  --readFilesCommand zcat  --outStd BAM_Unsorted  --outSAMtype BAM Unsorted  --outFilterType BySJout  --outReadsUnmapped Fastx  --outFilterScoreMin 10  --outSAMattrRGline ID:foo  --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam
Takes output from STAR rmRep.  Maps unique reads to the mouse genome.  Command: STAR  --runMode alignReads  --runThreadN 16  --genomeDir  /path/to/STAR_database_file --genomeLoad LoadAndRemove  --readFilesIn  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2  --outSAMunmapped Within  --outFilterMultimapNmax 1  --outFilterMultimapScoreRange 1  --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam  --outSAMattributes All  --outStd BAM_Unsorted  --outSAMtype BAM Unsorted  --outFilterType BySJout  --outReadsUnmapped Fastx  --outFilterScoreMin 10  --outSAMattrRGline ID:foo  --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam
takes output from STAR genome mapping.  Custom random-mer-aware script for PCR duplicate removal. Command: barcode_collapse_pe.py  --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam  --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam  --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics
Takes output from barcode collapse PE.  Sorts resulting bam file for use downstream.  Command: java  -Xmx2048m  -XX:+UseParallelOldGC  -XX:ParallelGCThreads=4  -XX:GCTimeLimit=50  -XX:GCHeapFreeLimit=10  -Djava.io.tmpdir=/full/path/to/files/.queue/tmp  -cp /path/to/gatk/dist/Queue.jar  net.sf.picard.sam.SortSam  INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam  TMP_DIR=/full/path/to/files/.queue/tmp  OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam  VALIDATION_STRINGENCY=SILENT  SO=coordinate  CREATE_INDEX=true
Takes output from sortSam, makes bam index for use downstream.  Command: samtools index  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai
Takes inputs from multiple final bam files.  Merges the two technical replicates for further downstream analysis.  Command: samtools  merge  /full/path/to/files/CombinedID.merged.bam  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam
Takes output from sortSam, makes bam index for use downstream.  Command: samtools  index  /full/path/to/files/CombinedID.merged.bam  /full/path/to/files/CombinedID.merged.bam.bai
Takes output from sortSam.  Only outputs the second read in each pair for use with single stranded peak caller.  This is the final bam file to perform analysis on.  Command: samtools view -hb -f 128  /full/path/to/files/CombinedID.merged.bam  >  /full/path/to/files/CombinedID.merged.r2.bam
Takes results from samtools view.  Calls peaks on those files.  Command: clipper  -b /full/path/to/files/CombinedID.merged.r2.bam  -s mm9  -o /full/path/to/files/CombinedID.merged.r2.peaks.bed  --bonferroni  --superlocal  --threshold-method binomial  --save-pickle
Genome_build: mm9
Supplementary_files_format_and_content: bed format, contains clusters of predicted RBP binding
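The commands above build each output name by appending a suffix to the previous stage's file name. As a sketch of that naming chain, the following prints the expected per-stage files for a given read-1 FASTQ. The function name `stage_outputs` is hypothetical; the suffixes are copied from the commands above.

```shell
# Hypothetical helper: list the per-stage outputs that the commands above
# derive from one read-1 FASTQ path. Suffixes are copied from those commands.
stage_outputs() {
    base="$1"
    for suffix in \
        ".adapterTrim.fastq.gz" \
        ".adapterTrim.round2.fastq.gz" \
        ".adapterTrim.round2.rep.bam" \
        ".adapterTrim.round2.rmRep.bam" \
        ".adapterTrim.round2.rmRep.rmDup.bam" \
        ".adapterTrim.round2.rmRep.rmDup.sorted.bam"
    do
        printf '%s%s\n' "${base}" "${suffix}"
    done
}

stage_outputs "/full/path/to/files/file_R1.C01.fastq.gz"
```

Checking that each listed file exists after a run is one way to find the stage at which a sample failed.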