GSE153279 Processing Pipeline

OTHER code_examples 13 steps

Publication

Identification of the global miR-130a targetome reveals a role for TBL1XR1 in hematopoietic stem cell self-renewal and t(8;21) AML.

Cell reports (2022) — PMID 35263585

Dataset

GSE153279

Definition of a Small Core Transcriptional Circuit Regulated by AML1-ETO [ChIP-seq, Cut&Run]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Library strategy: Cut&Run

    $ Bash example
    # Install Cromwell (Java-based workflow engine) if not already installed.
    # Download the latest stable release from https://github.com/broadinstitute/cromwell/releases
    # For example:
    # wget https://github.com/broadinstitute/cromwell/releases/download/86/cromwell-86.jar
    
    # Clone the ENCODE ChIP-seq pipeline repository if not already cloned.
    # git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2.git
    # cd chip-seq-pipeline2
    
    # Create a placeholder input JSON file for Cut&Run analysis.
    # Replace 'your_sample_R1.fastq.gz', 'your_sample_R2.fastq.gz' with actual paths to your FASTQ files.
    # Replace 'your_output_directory' with your desired output location.
    # The reference genome (hg38) and associated files are placeholders using ENCODE's GCS paths.
    # For actual analysis, ensure these files are accessible or provide local paths.
    cat << EOF > inputs.json
    {
      "chip.fastq_replicates": [
        [
          "your_sample_R1.fastq.gz",
          "your_sample_R2.fastq.gz"
        ]
      ],
      "chip.genome_tsv": "gs://encode-pipeline-genome-data/vERCC/GRCh38_vERCC/GRCh38_vERCC.tsv",
      "chip.chrom_sizes": "gs://encode-pipeline-genome-data/vERCC/GRCh38_vERCC/GRCh38_vERCC.chrom.sizes",
      "chip.blacklist": "gs://encode-pipeline-genome-data/vERCC/GRCh38_vERCC/GRCh38_vERCC.blacklist.bed",
      "chip.paired_end": true,
      "chip.pipeline_type": "cutnrun",
      "chip.aligner": "bwa",
      "chip.peak_caller": "macs2",
      "chip.output_directory": "your_output_directory"
    }
    EOF
    
    # Execute the ENCODE ChIP-seq pipeline for Cut&Run using Cromwell.
    # Ensure you have the 'chip_seq.wdl' file from the cloned repository in the current directory
    # and 'cromwell-X.Y.Z.jar' (e.g., cromwell-86.jar) is accessible.
    java -jar cromwell-86.jar run chip_seq.wdl -i inputs.json
  2. 2

    For Cut&Run:

    $ Bash example
    # This pipeline is designed for ChIP-seq but is adaptable for Cut&Run data analysis.
    
    # Install Cromwell (Java-based workflow management system) if not already installed.
    # Ensure you have Java 8 or higher installed.
    # For example, download the latest stable release from Cromwell's GitHub releases page:
    # wget https://github.com/broadinstitute/cromwell/releases/download/X.Y.Z/cromwell-X.Y.Z.jar
    # mv cromwell-X.Y.Z.jar /usr/local/bin/cromwell.jar
    
    # Clone the ENCODE ChIP-seq pipeline repository
    git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2.git
    cd chip-seq-pipeline2
    
    # (Optional) Checkout a specific version of the pipeline, e.g., v2.0.0
    # git checkout v2.0.0
    
    # --- Prepare input.json for Cut&Run data --- 
    # The ENCODE ChIP-seq pipeline uses WDL (Workflow Description Language) and requires an input JSON file.
    # Replace placeholders with actual paths to your data and reference files.
    # For human (hg38) reference, you would need a genome_tsv file containing paths to:
    #   - Reference FASTA
    #   - Chromosome sizes file
    #   - Blacklist BED file
    #   - Bowtie2 index
    #   - BWA index
    #   - MACS2 genome size parameter (e.g., "hs" for human)
    # Example hg38.tsv can be found in the pipeline's 'genome_data' directory or generated using provided scripts.
    
    cat << EOF > cutnrun_input.json
    {
      "chip.fastqs_rep1_R1": ["/path/to/your/cutnrun_experiment_rep1_R1.fastq.gz"],
      "chip.fastqs_rep1_R2": ["/path/to/your/cutnrun_experiment_rep1_R2.fastq.gz"],
      "chip.fastqs_ctl_rep1_R1": ["/path/to/your/cutnrun_control_rep1_R1.fastq.gz"],
      "chip.fastqs_ctl_rep1_R2": ["/path/to/your/cutnrun_control_rep1_R2.fastq.gz"],
      "chip.paired_end": true,
      "chip.genome_tsv": "/path/to/your/genome_data/hg38.tsv",
      "chip.output_prefix": "cutnrun_hg38_experiment",
      "chip.aligner": "bowtie2",
      "chip.peak_caller": "macs2",
      "chip.macs2_gsize": "hs",
      "chip.blacklist_bed": "/path/to/your/blacklist/hg38-blacklist.bed",
      "chip.enable_idr": false, # Set to true if you have multiple replicates for IDR analysis
      "chip.fraglen_range": "0 1000" # Typical fragment size range for Cut&Run
    }
    EOF
    
    # --- Execute the pipeline using Cromwell --- 
    # Replace '/path/to/cromwell.jar' with the actual path to your Cromwell JAR file.
    # You might need to configure Cromwell for a specific backend (e.g., local, SGE, SLURM) 
    # via a Cromwell configuration file (e.g., cromwell.conf).
    java -jar /path/to/cromwell.jar run chip.wdl -i cutnrun_input.json
  3. 3

    Adaptor trimming with Trimmomatic-0.32.

    Trimmomatic v0.32 GitHub
    $ Bash example
    bash
    # Install Trimmomatic if not already installed
    # conda install -c bioconda trimmomatic=0.32
    
    # Define input and output file names
    INPUT_R1="input_R1.fastq.gz"
    INPUT_R2="input_R2.fastq.gz"
    OUTPUT_R1_PAIRED="output_R1_paired.fastq.gz"
    OUTPUT_R1_UNPAIRED="output_R1_unpaired.fastq.gz"
    OUTPUT_R2_PAIRED="output_R2_paired.fastq.gz"
    OUTPUT_R2_UNPAIRED="output_R2_unpaired.fastq.gz"
    
    # Define adaptor file path (adjust path as necessary for your Trimmomatic installation)
    # A common adaptor file for Illumina paired-end data is TruSeq3-PE.fa, often found in the 'adapters' directory of Trimmomatic.
    # For Trimmomatic 0.32, this file would typically be located within the Trimmomatic installation directory.
    ADAPTOR_FILE="/path/to/Trimmomatic-0.32/adapters/TruSeq3-PE.fa"
    
    # Run Trimmomatic for paired-end adaptor trimming
    # Adjust '/path/to/Trimmomatic-0.32/trimmomatic-0.32.jar' to the actual location of your Trimmomatic JAR file.
    java -jar /path/to/Trimmomatic-0.32/trimmomatic-0.32.jar PE \
        -phred33 \
        "${INPUT_R1}" "${INPUT_R2}" \
        "${OUTPUT_R1_PAIRED}" "${OUTPUT_R1_UNPAIRED}" \
        "${OUTPUT_R2_PAIRED}" "${OUTPUT_R2_UNPAIRED}" \
        ILLUMINACLIP:"${ADAPTOR_FILE}":2:30:10 \
        LEADING:3 \
        TRAILING:3 \
        SLIDINGWINDOW:4:15 \
        MINLEN:36
    
  4. 4

    Alignment to hg19+scer genome with bowtie2.

    $ Bash example
    # Install Bowtie2 (if not already installed)
    # conda install -c bioconda bowtie2=2.3.4.3
    
    # Define variables
    INPUT_FASTQ="input.fastq.gz" # Placeholder for your input FASTQ file
    OUTPUT_SAM="output.sam"
    OUTPUT_BAM="output.bam"
    OUTPUT_SORTED_BAM="output.sorted.bam"
    
    # Reference genome index: hg19+scer (Saccharomyces cerevisiae) combined index
    # This is a custom index that needs to be pre-built using bowtie2-build from a combined FASTA file
    # containing both human hg19 and S. cerevisiae reference sequences.
    # Example path for the index prefix (replace with your actual path):
    GENOME_INDEX_PREFIX="/path/to/genome_indices/hg19_scer"
    
    # Perform alignment using Bowtie2
    # --local: local alignment mode (soft-clipping allowed, common for eCLIP)
    # -p 8: use 8 threads for parallel processing
    # -x: specify the index prefix for the reference genome
    # -U: specify single-end FASTQ input file
    # -S: specify SAM output file
    bowtie2 --local -p 8 -x "${GENOME_INDEX_PREFIX}" -U "${INPUT_FASTQ}" -S "${OUTPUT_SAM}"
    
    # Convert SAM to BAM, sort, and index (common post-alignment steps in eCLIP pipelines)
    # Install Samtools (if not already installed)
    # conda install -c bioconda samtools
    
    samtools view -bS "${OUTPUT_SAM}" > "${OUTPUT_BAM}"
    samtools sort "${OUTPUT_BAM}" -o "${OUTPUT_SORTED_BAM}"
    samtools index "${OUTPUT_SORTED_BAM}"
  5. 5

    High quality reads (F4q10) were filtered using samtools.

    samtools v1.10 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools=1.10
    
    # Filter high quality reads (F4q10)
    # -b: output BAM format
    # -F 4: exclude reads where the FLAG bit 4 is set (unmapped reads)
    # -q 10: include reads with mapping quality >= 10
    samtools view -b -F 4 -q 10 -o filtered.bam input.bam
  6. 6

    Peaks called with MACS2 (narrowPeak, q=0.001).

    MACS2 v2.2.7.1 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install MACS2 (if not already installed)
    # conda install -c bioconda macs2=2.2.7.1
    
    # Define input files and parameters
    TREATMENT_BAM="treatment.bam" # Replace with your actual treatment BAM file
    CONTROL_BAM="control.bam"     # Replace with your actual control BAM file (optional but recommended for ChIP-seq)
    GENOME_SIZE="hs"              # Genome size (e.g., hs for human, mm for mouse, dm for drosophila, ce for C. elegans). Adjust as needed.
    Q_VALUE="0.001"
    OUTPUT_PREFIX="my_experiment_peaks"
    OUTPUT_DIR="macs2_output"
    INPUT_FORMAT="BAMPE"          # Input format: BAM, BED, BAMPE (for paired-end BAM). Adjust based on your data.
    
    mkdir -p "${OUTPUT_DIR}"
    
    macs2 callpeak \
        -t "${TREATMENT_BAM}" \
        -c "${CONTROL_BAM}" \
        -f "${INPUT_FORMAT}" \
        -g "${GENOME_SIZE}" \
        -q "${Q_VALUE}" \
        --outdir "${OUTPUT_DIR}" \
        -n "${OUTPUT_PREFIX}" \
        --keep-dup all \
        --call-summits # Ensures narrowPeak output and identifies peak summits
  7. 7

    For HA Cut&Run +/-dTAG, differential peaks were identified by Diffbind/DESeq2 using yeast spike-in for normalization.

    $ Bash example
    # Install R and DESeq2 (if not already installed)
    # sudo apt-get update
    # sudo apt-get install r-base
    # R -e "install.packages('BiocManager', repos='https://cloud.r-project.org')"
    # R -e "BiocManager::install('DESeq2')"
    
    # --- Placeholder for input files ---
    # These files would typically be generated by upstream steps (e.g., DiffBind for peak counts).
    # primary_counts.tsv: Tab-separated file with peak IDs as rows and sample names as columns, containing raw counts for the primary organism (e.g., human hg38).
    # yeast_counts.tsv: Tab-separated file with yeast feature IDs as rows and sample names as columns, containing raw counts for the yeast spike-in (e.g., sacCer3).
    # sample_metadata.tsv: Tab-separated file with sample names, condition, and any other relevant metadata.
    # Example content for sample_metadata.tsv:
    # sample_id\tcondition
    # sample1\tHA_CutRun_dTAG_pos
    # sample2\tHA_CutRun_dTAG_pos
    # sample3\tHA_CutRun_dTAG_neg
    # sample4\tHA_CutRun_dTAG_neg
    
    # Create an R script for DESeq2 analysis with spike-in normalization
    cat << 'EOF' > run_deseq2_spikein.R
    library(DESeq2)
    
    # Load primary organism counts (e.g., from DiffBind output)
    primary_counts <- read.csv("primary_counts.tsv", sep="\t", row.names=1)
    primary_counts <- as.matrix(primary_counts)
    
    # Load yeast spike-in counts
    yeast_counts <- read.csv("yeast_counts.tsv", sep="\t", row.names=1)
    yeast_counts <- as.matrix(yeast_counts)
    
    # Load sample metadata
    sample_metadata <- read.csv("sample_metadata.tsv", sep="\t", row.names=1)
    
    # Ensure sample order matches between counts and metadata
    sample_metadata <- sample_metadata[colnames(primary_counts), , drop=FALSE]
    yeast_counts <- yeast_counts[, colnames(primary_counts)] # Ensure yeast counts match primary samples
    
    # Calculate spike-in normalization factors
    # Sum counts for each sample from the yeast spike-in
    spikein_sums <- colSums(yeast_counts)
    # Calculate size factors based on spike-in sums
    # A common approach is to use the median ratio method on spike-in counts
    # Or simply scale by total spike-in counts
    spikein_sf <- spikein_sums / exp(mean(log(spikein_sums)))
    
    # Create DESeqDataSet object
    dds <- DESeqDataSetFromMatrix(countData = primary_counts,
                                  colData = sample_metadata,
                                  design = ~ condition)
    
    # Assign custom size factors based on spike-in
    sizeFactors(dds) <- spikein_sf
    
    # Run DESeq2 analysis
    dds <- DESeq(dds)
    
    # Get results for the comparison of interest (e.g., HA_CutRun_dTAG_pos vs HA_CutRun_dTAG_neg)
    res <- results(dds, contrast=c("condition", "HA_CutRun_dTAG_pos", "HA_CutRun_dTAG_neg"))
    
    # Order results by adjusted p-value
    res <- res[order(res$padj),]
    
    # Save results
    write.csv(as.data.frame(res), "differential_peaks_deseq2.csv")
    
    # Optional: Save normalized counts
    norm_counts <- counts(dds, normalized=TRUE)
    write.csv(as.data.frame(norm_counts), "normalized_peak_counts_deseq2.csv")
    
    # Session info for reproducibility
    sessionInfo()
    EOF
    
    # Execute the R script
    Rscript run_deseq2_spikein.R
  8. 8

    bigWig files were generated using deepTools and bedGraph files were generated using Homer.

    HOMER vv4.11.1 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install HOMER (if not already installed)
    # git clone https://github.com/homer-tools/homer.git
    # cd homer
    # perl configureHomer.pl -install
    
    # Define input and output paths
    INPUT_BAM="input.bam" # Replace with your actual input BAM file
    OUTPUT_TAG_DIR="homer_tags"
    OUTPUT_BEDGRAPH="output.bedGraph"
    FRAGMENT_SIZE="150" # Adjust fragment size as appropriate for your assay (e.g., 150 for ChIP-seq, or estimated from cross-correlation analysis)
    
    # 1. Create a HOMER tag directory from the BAM file
    # This step processes the BAM file to create a directory of tag files, which HOMER uses for downstream analysis.
    makeTagDirectory "${OUTPUT_TAG_DIR}" "${INPUT_BAM}"
    
    # 2. Generate bedGraph file from the tag directory
    # -o: Specify the output bedGraph file name
    # -fsize: Fragment size (e.g., 150bp for ChIP-seq). This parameter extends reads to the estimated fragment length.
    # -norm 1e7: Normalize read counts to 10 million total reads (a common practice for comparing samples).
    # -res 1: Set the resolution to 1bp, meaning each base pair will have a coverage value.
    # -tbp 1: Output tags per base pair.
    # -bedGraph: Specify that the output format should be bedGraph.
    makeUCSCfile "${OUTPUT_TAG_DIR}" -o "${OUTPUT_BEDGRAPH}" -fsize "${FRAGMENT_SIZE}" -norm 1e7 -res 1 -tbp 1 -bedGraph
  9. 9

    For ChIP-seq:

    $ Bash example
    # This script demonstrates how to run the ENCODE ChIP-seq pipeline locally using Cromwell.
    # Ensure Java (JRE 8 or higher) and Cromwell are installed and accessible.
    # Download Cromwell JAR from https://github.com/broadinstitute/cromwell/releases
    # Example: wget https://github.com/broadinstitute/cromwell/releases/download/85/cromwell-85.jar
    
    # 1. Clone the ENCODE ChIP-seq pipeline repository
    # git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2.git
    # cd chip-seq-pipeline2
    # PIPELINE_DIR=$(pwd)
    
    # 2. Prepare input files and reference genome data
    # Replace these with your actual data paths and chosen reference genome (e.g., hg38, mm10).
    # For reference data, you can download pre-built ENCODE reference bundles or create your own.
    # As a placeholder, we use hg38. You would need to download the actual files.
    # Example for hg38 reference files (these are placeholders, actual download needed):
    # REF_GENOME_DIR="/path/to/your/reference_genome/hg38"
    # mkdir -p $REF_GENOME_DIR
    # Download hg38.tsv, hg38.blacklist.bed.gz, hg38.chrom.sizes, hg38.fa, bowtie2_indexes/hg38.*, bwa_indexes/hg38.* 
    # from ENCODE DCC or similar sources into $REF_GENOME_DIR.
    
    # Example input FASTQ files (replace with your actual data)
    # FASTQ_REP1_R1="/path/to/your/data/sample_rep1_R1.fastq.gz"
    # FASTQ_REP1_R2="/path/to/your/data/sample_rep1_R2.fastq.gz" # Use an empty array [] if single-end
    # FASTQ_CTL_REP1_R1="/path/to/your/data/control_rep1_R1.fastq.gz"
    # FASTQ_CTL_REP1_R2="/path/to/your/data/control_rep1_R2.fastq.gz" # Use an empty array [] if single-end
    
    # 3. Create an input JSON file for the WDL workflow
    # This is a minimal example. Adjust parameters based on your experiment (e.g., paired_end, pipeline_type, peak_caller).
    # Refer to the pipeline's documentation for all available parameters:
    # https://github.com/ENCODE-DCC/chip-seq-pipeline2/blob/master/chip.wdl
    # https://github.com/ENCODE-DCC/chip-seq-pipeline2/blob/master/docs/inputs.md
    cat << EOF > chip_inputs.json
    {
      "chip.genome_tsv": "${REF_GENOME_DIR}/hg38.tsv",
      "chip.fastqs_rep1_R1": ["${FASTQ_REP1_R1}"],
      "chip.fastqs_rep1_R2": ["${FASTQ_REP1_R2}"],
      "chip.fastqs_ctl_rep1_R1": ["${FASTQ_CTL_REP1_R1}"],
      "chip.fastqs_ctl_rep1_R2": ["${FASTQ_CTL_REP1_R2}"],
      "chip.paired_end": true,
      "chip.pipeline_type": "histone", # or "tf" for transcription factor
      "chip.peak_caller": "macs2", # or "spp"
      "chip.blacklist_bed": "${REF_GENOME_DIR}/hg38.blacklist.bed.gz",
      "chip.chrsz": "${REF_GENOME_DIR}/hg38.chrom.sizes",
      "chip.ref_fasta": "${REF_GENOME_DIR}/hg38.fa",
      "chip.bowtie2_idx": "${REF_GENOME_DIR}/bowtie2_indexes/hg38",
      "chip.bwa_idx": "${REF_GENOME_DIR}/bwa_indexes/hg38",
      "chip.output_prefix": "my_chipseq_experiment"
    }
    EOF
    
    # 4. Run the ChIP-seq pipeline using Cromwell
    # Replace /path/to/cromwell-*.jar with the actual path to your Cromwell JAR file.
    # Replace /path/to/chip-seq-pipeline2/chip.wdl with the actual path to the WDL file.
    java -jar /path/to/cromwell-*.jar run /path/to/chip-seq-pipeline2/chip.wdl -i chip_inputs.json
  10. 10

    Adaptor trimming with Trimmomatic-0.32.

    Trimmomatic v0.32
    $ Bash example
    # Install Trimmomatic (if not already installed)
    # For example, using conda:
    # conda install -c bioconda trimmomatic
    
    # Define input and output file names (placeholders)
    INPUT_R1="sample_R1.fastq.gz"
    INPUT_R2="sample_R2.fastq.gz"
    OUTPUT_R1_PAIRED="sample_R1_trimmed_paired.fastq.gz"
    OUTPUT_R1_UNPAIRED="sample_R1_trimmed_unpaired.fastq.gz"
    OUTPUT_R2_PAIRED="sample_R2_trimmed_paired.fastq.gz"
    OUTPUT_R2_UNPAIRED="sample_R2_trimmed_unpaired.fastq.gz"
    
    # Define the path to Trimmomatic JAR and adaptor files
    # If installed via conda, you might find them like this:
    # TRIMMOMATIC_JAR=$(find "$(conda env list | grep '*' | awk '{print $NF}')" -name "trimmomatic-*.jar" | head -n 1)
    # ADAPTOR_FILE="$(dirname "${TRIMMOMATIC_JAR}")/adapters/TruSeq3-PE.fa"
    # Or, manually specify if not using conda or if paths differ:
    TRIMMOMATIC_JAR="/path/to/trimmomatic-0.32.jar" # Adjust this path to your Trimmomatic JAR file
    ADAPTOR_FILE="/path/to/adapters/TruSeq3-PE.fa" # Adjust this path to the appropriate adaptor file (e.g., TruSeq3-PE.fa, NexteraPE-PE.fa, etc.)
    
    # Execute Trimmomatic for paired-end adaptor trimming and quality filtering
    # Parameters:
    # PE: Paired-end mode
    # -phred33: Phred score encoding (common for Illumina data)
    # ILLUMINACLIP: Adaptor trimming (adaptor_file:seed_mismatches:palindrome_clip_threshold:simple_clip_threshold)
    #   - 2: Maximum mismatch count that will be tolerated when aligning the adapter to a read.
    #   - 30: Specifies the 'palindrome clip threshold'. A score of 30 is a good default.
    #   - 10: Specifies the 'simple clip threshold'. A score of 10 is a good default.
    # LEADING: Remove leading low quality bases (below quality 3)
    # TRAILING: Remove trailing low quality bases (below quality 3)
    # SLIDINGWINDOW: Scan with a 4-base window, cut if average quality below 15
    # MINLEN: Drop reads shorter than 36 bases
    java -jar "${TRIMMOMATIC_JAR}" PE \
        -phred33 \
        "${INPUT_R1}" "${INPUT_R2}" \
        "${OUTPUT_R1_PAIRED}" "${OUTPUT_R1_UNPAIRED}" \
        "${OUTPUT_R2_PAIRED}" "${OUTPUT_R2_UNPAIRED}" \
        ILLUMINACLIP:"${ADAPTOR_FILE}":2:30:10 \
        LEADING:3 \
        TRAILING:3 \
        SLIDINGWINDOW:4:15 \
        MINLEN:36
  11. 11

    Alignment to hg19+dm3 genome with bowtie2.

    $ Bash example
    # Install Bowtie2 and Samtools if not already installed
    # conda install -c bioconda bowtie2 samtools
    
    # Define variables (replace with actual paths and filenames)
    READ1="input_R1.fastq.gz"
    READ2="input_R2.fastq.gz"
    OUTPUT_BAM="aligned_to_hg19_dm3.bam"
    BOWTIE2_INDEX_PREFIX="/path/to/your/hg19_dm3_index/hg19_dm3" # This prefix points to the Bowtie2 index files (e.g., hg19_dm3.1.bt2, hg19_dm3.2.bt2, etc.)
    NUM_THREADS=8 # Adjust based on available CPU cores
    
    # Perform alignment with Bowtie2 using parameters common for eCLIP (very-sensitive-local)
    # and pipe the SAM output to Samtools for conversion to BAM format.
    bowtie2 \
      --very-sensitive-local \
      -p ${NUM_THREADS} \
      -x ${BOWTIE2_INDEX_PREFIX} \
      -1 ${READ1} \
      -2 ${READ2} \
      | samtools view -bS - \
      > ${OUTPUT_BAM}
    
    # Optional: Sort and index the resulting BAM file
    # samtools sort -o ${OUTPUT_BAM%.bam}.sorted.bam ${OUTPUT_BAM}
    # samtools index ${OUTPUT_BAM%.bam}.sorted.bam
  12. 12

    High quality reads (F4q10) were filtered using samtools.

    samtools v1.19 GitHub
    $ Bash example
    # Install samtools (if not already installed)
    # conda install -c bioconda samtools
    
    # Filter high quality reads (F4q10) using samtools view
    # -F 4: Exclude reads where the 0x4 flag is set (i.e., exclude unmapped reads)
    # -q 10: Include only reads with a mapping quality of 10 or higher
    # -b: Output in BAM format
    samtools view -F 4 -q 10 -b input.bam > filtered.bam
  13. 13

    Normalized bigWig files were generated with deepTools.

    deepTools v3.5.1 (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install deepTools (if not already installed)
    # conda create -n deeptools_env deeptools=3.5.1
    # conda activate deeptools_env
    
    # Placeholder variables
    INPUT_BAM="sample.bam"
    OUTPUT_BIGWIG="sample_normalized.bw"
    BIN_SIZE=10 # Bin size in base pairs
    NUMBER_OF_PROCESSORS="auto" # Use all available processors
    
    # Generate normalized bigWig file using CPM (Counts Per Million mapped reads) normalization
    bamCoverage \
        --bam "${INPUT_BAM}" \
        --outFileName "${OUTPUT_BIGWIG}" \
        --normalizeUsing CPM \
        --binSize "${BIN_SIZE}" \
        --numberOfProcessors "${NUMBER_OF_PROCESSORS}" \
        --verbose

Tools Used

Raw Source Text
Library strategy: Cut&Run
For Cut&Run:
Adaptor trimming with Trimmomatic-0.32.
Alignment to hg19+scer genome with bowtie2.
High quality reads (F4q10) were filtered using samtools.
Peaks called with MACS2 (narrowPeak, q=0.001).
For HA Cut&Run +/-dTAG, differential peaks were identified by Diffbind/DESeq2 using yeast spike-in for normalization.
bigWig files were generated using deepTools and bedGraph files were generated using Homer.
For ChIP-seq:
Adaptor trimming with Trimmomatic-0.32.
Alignment to hg19+dm3 genome with bowtie2.
High quality reads (F4q10) were filtered using samtools.
Normalized bigWig files were generated with deepTools.
Genome_build: hg19 (GRCh37)
Supplementary_files_format_and_content: bigWig, bedGraph files
Supplementary_files_format_and_content: 042420_HA_AE_Diffbind-peakinfo_q0.001.csv: Diffbind peak info
Supplementary_files_format_and_content: 042420_HA-AE_cutandrun_diffbind-deseq2_q0.001.csv: Differential peak DESeq2
Supplementary_files_format_and_content: New_ZnF_q0.001_peaks.txt: MACS2 peaks
Supplementary_files_format_and_content: CD34_HA_merged_noNeg_nomodel_extsize300_q0.001_peaks.txt: MACS2 peaks
← Back to Analysis