GSE153279 Processing Pipeline

OTHER code_examples 13 steps

Publication

Identification of the global miR-130a targetome reveals a role for TBL1XR1 in hematopoietic stem cell self-renewal and t(8;21) AML.

Cell reports (2022) — PMID 35263585

Dataset

GSE153279

Definition of a Small Core Transcriptional Circuit Regulated by AML1-ETO [ChIP-seq, Cut&Run]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Generate Jupyter Notebook

Library strategy: Cut&Run

ChIP-seq vv2.0.0 GitHub

$ Bash example

# Install Cromwell (Java-based workflow engine) if not already installed.
# Download the latest stable release from https://github.com/broadinstitute/cromwell/releases
# For example:
# wget https://github.com/broadinstitute/cromwell/releases/download/86/cromwell-86.jar

# Clone the ENCODE ChIP-seq pipeline repository if not already cloned.
# git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2.git
# cd chip-seq-pipeline2

# Create a placeholder input JSON file for Cut&Run analysis.
# Replace 'your_sample_R1.fastq.gz', 'your_sample_R2.fastq.gz' with actual paths to your FASTQ files.
# Replace 'your_output_directory' with your desired output location.
# The reference genome (hg38) and associated files are placeholders using ENCODE's GCS paths.
# For actual analysis, ensure these files are accessible or provide local paths.
cat << EOF > inputs.json
{
  "chip.fastq_replicates": [
    [
      "your_sample_R1.fastq.gz",
      "your_sample_R2.fastq.gz"
    ]
  ],
  "chip.genome_tsv": "gs://encode-pipeline-genome-data/vERCC/GRCh38_vERCC/GRCh38_vERCC.tsv",
  "chip.chrom_sizes": "gs://encode-pipeline-genome-data/vERCC/GRCh38_vERCC/GRCh38_vERCC.chrom.sizes",
  "chip.blacklist": "gs://encode-pipeline-genome-data/vERCC/GRCh38_vERCC/GRCh38_vERCC.blacklist.bed",
  "chip.paired_end": true,
  "chip.pipeline_type": "cutnrun",
  "chip.aligner": "bwa",
  "chip.peak_caller": "macs2",
  "chip.output_directory": "your_output_directory"
}
EOF

# Execute the ENCODE ChIP-seq pipeline for Cut&Run using Cromwell.
# Ensure you have the 'chip_seq.wdl' file from the cloned repository in the current directory
# and 'cromwell-X.Y.Z.jar' (e.g., cromwell-86.jar) is accessible.
java -jar cromwell-86.jar run chip_seq.wdl -i inputs.json

View on GitHub

For Cut&Run:

ChIP-seq vv2.0.0 GitHub

$ Bash example

# This pipeline is designed for ChIP-seq but is adaptable for Cut&Run data analysis.

# Install Cromwell (Java-based workflow management system) if not already installed.
# Ensure you have Java 8 or higher installed.
# For example, download the latest stable release from Cromwell's GitHub releases page:
# wget https://github.com/broadinstitute/cromwell/releases/download/X.Y.Z/cromwell-X.Y.Z.jar
# mv cromwell-X.Y.Z.jar /usr/local/bin/cromwell.jar

# Clone the ENCODE ChIP-seq pipeline repository
git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2.git
cd chip-seq-pipeline2

# (Optional) Checkout a specific version of the pipeline, e.g., v2.0.0
# git checkout v2.0.0

# --- Prepare input.json for Cut&Run data --- 
# The ENCODE ChIP-seq pipeline uses WDL (Workflow Description Language) and requires an input JSON file.
# Replace placeholders with actual paths to your data and reference files.
# For human (hg38) reference, you would need a genome_tsv file containing paths to:
#   - Reference FASTA
#   - Chromosome sizes file
#   - Blacklist BED file
#   - Bowtie2 index
#   - BWA index
#   - MACS2 genome size parameter (e.g., "hs" for human)
# Example hg38.tsv can be found in the pipeline's 'genome_data' directory or generated using provided scripts.

cat << EOF > cutnrun_input.json
{
  "chip.fastqs_rep1_R1": ["/path/to/your/cutnrun_experiment_rep1_R1.fastq.gz"],
  "chip.fastqs_rep1_R2": ["/path/to/your/cutnrun_experiment_rep1_R2.fastq.gz"],
  "chip.fastqs_ctl_rep1_R1": ["/path/to/your/cutnrun_control_rep1_R1.fastq.gz"],
  "chip.fastqs_ctl_rep1_R2": ["/path/to/your/cutnrun_control_rep1_R2.fastq.gz"],
  "chip.paired_end": true,
  "chip.genome_tsv": "/path/to/your/genome_data/hg38.tsv",
  "chip.output_prefix": "cutnrun_hg38_experiment",
  "chip.aligner": "bowtie2",
  "chip.peak_caller": "macs2",
  "chip.macs2_gsize": "hs",
  "chip.blacklist_bed": "/path/to/your/blacklist/hg38-blacklist.bed",
  "chip.enable_idr": false, # Set to true if you have multiple replicates for IDR analysis
  "chip.fraglen_range": "0 1000" # Typical fragment size range for Cut&Run
}
EOF

# --- Execute the pipeline using Cromwell --- 
# Replace '/path/to/cromwell.jar' with the actual path to your Cromwell JAR file.
# You might need to configure Cromwell for a specific backend (e.g., local, SGE, SLURM) 
# via a Cromwell configuration file (e.g., cromwell.conf).
java -jar /path/to/cromwell.jar run chip.wdl -i cutnrun_input.json

View on GitHub

Adaptor trimming with Trimmomatic-0.32.

Trimmomatic v0.32 GitHub

$ Bash example

bash
# Install Trimmomatic if not already installed
# conda install -c bioconda trimmomatic=0.32

# Define input and output file names
INPUT_R1="input_R1.fastq.gz"
INPUT_R2="input_R2.fastq.gz"
OUTPUT_R1_PAIRED="output_R1_paired.fastq.gz"
OUTPUT_R1_UNPAIRED="output_R1_unpaired.fastq.gz"
OUTPUT_R2_PAIRED="output_R2_paired.fastq.gz"
OUTPUT_R2_UNPAIRED="output_R2_unpaired.fastq.gz"

# Define adaptor file path (adjust path as necessary for your Trimmomatic installation)
# A common adaptor file for Illumina paired-end data is TruSeq3-PE.fa, often found in the 'adapters' directory of Trimmomatic.
# For Trimmomatic 0.32, this file would typically be located within the Trimmomatic installation directory.
ADAPTOR_FILE="/path/to/Trimmomatic-0.32/adapters/TruSeq3-PE.fa"

# Run Trimmomatic for paired-end adaptor trimming
# Adjust '/path/to/Trimmomatic-0.32/trimmomatic-0.32.jar' to the actual location of your Trimmomatic JAR file.
java -jar /path/to/Trimmomatic-0.32/trimmomatic-0.32.jar PE \
    -phred33 \
    "${INPUT_R1}" "${INPUT_R2}" \
    "${OUTPUT_R1_PAIRED}" "${OUTPUT_R1_UNPAIRED}" \
    "${OUTPUT_R2_PAIRED}" "${OUTPUT_R2_UNPAIRED}" \
    ILLUMINACLIP:"${ADAPTOR_FILE}":2:30:10 \
    LEADING:3 \
    TRAILING:3 \
    SLIDINGWINDOW:4:15 \
    MINLEN:36

View on GitHub

Alignment to hg19+scer genome with bowtie2.

Bowtie2 v2.3.4.3 GitHub

$ Bash example

# Install Bowtie2 (if not already installed)
# conda install -c bioconda bowtie2=2.3.4.3

# Define variables
INPUT_FASTQ="input.fastq.gz" # Placeholder for your input FASTQ file
OUTPUT_SAM="output.sam"
OUTPUT_BAM="output.bam"
OUTPUT_SORTED_BAM="output.sorted.bam"

# Reference genome index: hg19+scer (Saccharomyces cerevisiae) combined index
# This is a custom index that needs to be pre-built using bowtie2-build from a combined FASTA file
# containing both human hg19 and S. cerevisiae reference sequences.
# Example path for the index prefix (replace with your actual path):
GENOME_INDEX_PREFIX="/path/to/genome_indices/hg19_scer"

# Perform alignment using Bowtie2
# --local: local alignment mode (soft-clipping allowed, common for eCLIP)
# -p 8: use 8 threads for parallel processing
# -x: specify the index prefix for the reference genome
# -U: specify single-end FASTQ input file
# -S: specify SAM output file
bowtie2 --local -p 8 -x "${GENOME_INDEX_PREFIX}" -U "${INPUT_FASTQ}" -S "${OUTPUT_SAM}"

# Convert SAM to BAM, sort, and index (common post-alignment steps in eCLIP pipelines)
# Install Samtools (if not already installed)
# conda install -c bioconda samtools

samtools view -bS "${OUTPUT_SAM}" > "${OUTPUT_BAM}"
samtools sort "${OUTPUT_BAM}" -o "${OUTPUT_SORTED_BAM}"
samtools index "${OUTPUT_SORTED_BAM}"

View on GitHub

High quality reads (F4q10) were filtered using samtools.

samtools v1.10 (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install samtools if not already installed
# conda install -c bioconda samtools=1.10

# Filter high quality reads (F4q10)
# -b: output BAM format
# -F 4: exclude reads where the FLAG bit 4 is set (unmapped reads)
# -q 10: include reads with mapping quality >= 10
samtools view -b -F 4 -q 10 -o filtered.bam input.bam

View on GitHub

Peaks called with MACS2 (narrowPeak, q=0.001).

MACS2 v2.2.7.1 (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install MACS2 (if not already installed)
# conda install -c bioconda macs2=2.2.7.1

# Define input files and parameters
TREATMENT_BAM="treatment.bam" # Replace with your actual treatment BAM file
CONTROL_BAM="control.bam"     # Replace with your actual control BAM file (optional but recommended for ChIP-seq)
GENOME_SIZE="hs"              # Genome size (e.g., hs for human, mm for mouse, dm for drosophila, ce for C. elegans). Adjust as needed.
Q_VALUE="0.001"
OUTPUT_PREFIX="my_experiment_peaks"
OUTPUT_DIR="macs2_output"
INPUT_FORMAT="BAMPE"          # Input format: BAM, BED, BAMPE (for paired-end BAM). Adjust based on your data.

mkdir -p "${OUTPUT_DIR}"

macs2 callpeak \
    -t "${TREATMENT_BAM}" \
    -c "${CONTROL_BAM}" \
    -f "${INPUT_FORMAT}" \
    -g "${GENOME_SIZE}" \
    -q "${Q_VALUE}" \
    --outdir "${OUTPUT_DIR}" \
    -n "${OUTPUT_PREFIX}" \
    --keep-dup all \
    --call-summits # Ensures narrowPeak output and identifies peak summits

View on GitHub

For HA Cut&Run +/-dTAG, differential peaks were identified by Diffbind/DESeq2 using yeast spike-in for normalization.

DESeq2 v1.40.2 GitHub

$ Bash example

# Install R and DESeq2 (if not already installed)
# sudo apt-get update
# sudo apt-get install r-base
# R -e "install.packages('BiocManager', repos='https://cloud.r-project.org')"
# R -e "BiocManager::install('DESeq2')"

# --- Placeholder for input files ---
# These files would typically be generated by upstream steps (e.g., DiffBind for peak counts).
# primary_counts.tsv: Tab-separated file with peak IDs as rows and sample names as columns, containing raw counts for the primary organism (e.g., human hg38).
# yeast_counts.tsv: Tab-separated file with yeast feature IDs as rows and sample names as columns, containing raw counts for the yeast spike-in (e.g., sacCer3).
# sample_metadata.tsv: Tab-separated file with sample names, condition, and any other relevant metadata.
# Example content for sample_metadata.tsv:
# sample_id\tcondition
# sample1\tHA_CutRun_dTAG_pos
# sample2\tHA_CutRun_dTAG_pos
# sample3\tHA_CutRun_dTAG_neg
# sample4\tHA_CutRun_dTAG_neg

# Create an R script for DESeq2 analysis with spike-in normalization
cat << 'EOF' > run_deseq2_spikein.R
library(DESeq2)

# Load primary organism counts (e.g., from DiffBind output)
primary_counts <- read.csv("primary_counts.tsv", sep="\t", row.names=1)
primary_counts <- as.matrix(primary_counts)

# Load yeast spike-in counts
yeast_counts <- read.csv("yeast_counts.tsv", sep="\t", row.names=1)
yeast_counts <- as.matrix(yeast_counts)

# Load sample metadata
sample_metadata <- read.csv("sample_metadata.tsv", sep="\t", row.names=1)

# Ensure sample order matches between counts and metadata
sample_metadata <- sample_metadata[colnames(primary_counts), , drop=FALSE]
yeast_counts <- yeast_counts[, colnames(primary_counts)] # Ensure yeast counts match primary samples

# Calculate spike-in normalization factors
# Sum counts for each sample from the yeast spike-in
spikein_sums <- colSums(yeast_counts)
# Calculate size factors based on spike-in sums
# A common approach is to use the median ratio method on spike-in counts
# Or simply scale by total spike-in counts
spikein_sf <- spikein_sums / exp(mean(log(spikein_sums)))

# Create DESeqDataSet object
dds <- DESeqDataSetFromMatrix(countData = primary_counts,
                              colData = sample_metadata,
                              design = ~ condition)

# Assign custom size factors based on spike-in
sizeFactors(dds) <- spikein_sf

# Run DESeq2 analysis
dds <- DESeq(dds)

# Get results for the comparison of interest (e.g., HA_CutRun_dTAG_pos vs HA_CutRun_dTAG_neg)
res <- results(dds, contrast=c("condition", "HA_CutRun_dTAG_pos", "HA_CutRun_dTAG_neg"))

# Order results by adjusted p-value
res <- res[order(res$padj),]

# Save results
write.csv(as.data.frame(res), "differential_peaks_deseq2.csv")

# Optional: Save normalized counts
norm_counts <- counts(dds, normalized=TRUE)
write.csv(as.data.frame(norm_counts), "normalized_peak_counts_deseq2.csv")

# Session info for reproducibility
sessionInfo()
EOF

# Execute the R script
Rscript run_deseq2_spikein.R

View on GitHub

bigWig files were generated using deepTools and bedGraph files were generated using Homer.

HOMER vv4.11.1 (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install HOMER (if not already installed)
# git clone https://github.com/homer-tools/homer.git
# cd homer
# perl configureHomer.pl -install

# Define input and output paths
INPUT_BAM="input.bam" # Replace with your actual input BAM file
OUTPUT_TAG_DIR="homer_tags"
OUTPUT_BEDGRAPH="output.bedGraph"
FRAGMENT_SIZE="150" # Adjust fragment size as appropriate for your assay (e.g., 150 for ChIP-seq, or estimated from cross-correlation analysis)

# 1. Create a HOMER tag directory from the BAM file
# This step processes the BAM file to create a directory of tag files, which HOMER uses for downstream analysis.
makeTagDirectory "${OUTPUT_TAG_DIR}" "${INPUT_BAM}"

# 2. Generate bedGraph file from the tag directory
# -o: Specify the output bedGraph file name
# -fsize: Fragment size (e.g., 150bp for ChIP-seq). This parameter extends reads to the estimated fragment length.
# -norm 1e7: Normalize read counts to 10 million total reads (a common practice for comparing samples).
# -res 1: Set the resolution to 1bp, meaning each base pair will have a coverage value.
# -tbp 1: Output tags per base pair.
# -bedGraph: Specify that the output format should be bedGraph.
makeUCSCfile "${OUTPUT_TAG_DIR}" -o "${OUTPUT_BEDGRAPH}" -fsize "${FRAGMENT_SIZE}" -norm 1e7 -res 1 -tbp 1 -bedGraph

View on GitHub

For ChIP-seq:

ChIP-seq vv1.8.0 GitHub

$ Bash example

# This script demonstrates how to run the ENCODE ChIP-seq pipeline locally using Cromwell.
# Ensure Java (JRE 8 or higher) and Cromwell are installed and accessible.
# Download Cromwell JAR from https://github.com/broadinstitute/cromwell/releases
# Example: wget https://github.com/broadinstitute/cromwell/releases/download/85/cromwell-85.jar

# 1. Clone the ENCODE ChIP-seq pipeline repository
# git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2.git
# cd chip-seq-pipeline2
# PIPELINE_DIR=$(pwd)

# 2. Prepare input files and reference genome data
# Replace these with your actual data paths and chosen reference genome (e.g., hg38, mm10).
# For reference data, you can download pre-built ENCODE reference bundles or create your own.
# As a placeholder, we use hg38. You would need to download the actual files.
# Example for hg38 reference files (these are placeholders, actual download needed):
# REF_GENOME_DIR="/path/to/your/reference_genome/hg38"
# mkdir -p $REF_GENOME_DIR
# Download hg38.tsv, hg38.blacklist.bed.gz, hg38.chrom.sizes, hg38.fa, bowtie2_indexes/hg38.*, bwa_indexes/hg38.* 
# from ENCODE DCC or similar sources into $REF_GENOME_DIR.

# Example input FASTQ files (replace with your actual data)
# FASTQ_REP1_R1="/path/to/your/data/sample_rep1_R1.fastq.gz"
# FASTQ_REP1_R2="/path/to/your/data/sample_rep1_R2.fastq.gz" # Use an empty array [] if single-end
# FASTQ_CTL_REP1_R1="/path/to/your/data/control_rep1_R1.fastq.gz"
# FASTQ_CTL_REP1_R2="/path/to/your/data/control_rep1_R2.fastq.gz" # Use an empty array [] if single-end

# 3. Create an input JSON file for the WDL workflow
# This is a minimal example. Adjust parameters based on your experiment (e.g., paired_end, pipeline_type, peak_caller).
# Refer to the pipeline's documentation for all available parameters:
# https://github.com/ENCODE-DCC/chip-seq-pipeline2/blob/master/chip.wdl
# https://github.com/ENCODE-DCC/chip-seq-pipeline2/blob/master/docs/inputs.md
cat << EOF > chip_inputs.json
{
  "chip.genome_tsv": "${REF_GENOME_DIR}/hg38.tsv",
  "chip.fastqs_rep1_R1": ["${FASTQ_REP1_R1}"],
  "chip.fastqs_rep1_R2": ["${FASTQ_REP1_R2}"],
  "chip.fastqs_ctl_rep1_R1": ["${FASTQ_CTL_REP1_R1}"],
  "chip.fastqs_ctl_rep1_R2": ["${FASTQ_CTL_REP1_R2}"],
  "chip.paired_end": true,
  "chip.pipeline_type": "histone", # or "tf" for transcription factor
  "chip.peak_caller": "macs2", # or "spp"
  "chip.blacklist_bed": "${REF_GENOME_DIR}/hg38.blacklist.bed.gz",
  "chip.chrsz": "${REF_GENOME_DIR}/hg38.chrom.sizes",
  "chip.ref_fasta": "${REF_GENOME_DIR}/hg38.fa",
  "chip.bowtie2_idx": "${REF_GENOME_DIR}/bowtie2_indexes/hg38",
  "chip.bwa_idx": "${REF_GENOME_DIR}/bwa_indexes/hg38",
  "chip.output_prefix": "my_chipseq_experiment"
}
EOF

# 4. Run the ChIP-seq pipeline using Cromwell
# Replace /path/to/cromwell-*.jar with the actual path to your Cromwell JAR file.
# Replace /path/to/chip-seq-pipeline2/chip.wdl with the actual path to the WDL file.
java -jar /path/to/cromwell-*.jar run /path/to/chip-seq-pipeline2/chip.wdl -i chip_inputs.json

View on GitHub

Adaptor trimming with Trimmomatic-0.32.

Trimmomatic v0.32

$ Bash example

# Install Trimmomatic (if not already installed)
# For example, using conda:
# conda install -c bioconda trimmomatic

# Define input and output file names (placeholders)
INPUT_R1="sample_R1.fastq.gz"
INPUT_R2="sample_R2.fastq.gz"
OUTPUT_R1_PAIRED="sample_R1_trimmed_paired.fastq.gz"
OUTPUT_R1_UNPAIRED="sample_R1_trimmed_unpaired.fastq.gz"
OUTPUT_R2_PAIRED="sample_R2_trimmed_paired.fastq.gz"
OUTPUT_R2_UNPAIRED="sample_R2_trimmed_unpaired.fastq.gz"

# Define the path to Trimmomatic JAR and adaptor files
# If installed via conda, you might find them like this:
# TRIMMOMATIC_JAR=$(find "$(conda env list | grep '*' | awk '{print $NF}')" -name "trimmomatic-*.jar" | head -n 1)
# ADAPTOR_FILE="$(dirname "${TRIMMOMATIC_JAR}")/adapters/TruSeq3-PE.fa"
# Or, manually specify if not using conda or if paths differ:
TRIMMOMATIC_JAR="/path/to/trimmomatic-0.32.jar" # Adjust this path to your Trimmomatic JAR file
ADAPTOR_FILE="/path/to/adapters/TruSeq3-PE.fa" # Adjust this path to the appropriate adaptor file (e.g., TruSeq3-PE.fa, NexteraPE-PE.fa, etc.)

# Execute Trimmomatic for paired-end adaptor trimming and quality filtering
# Parameters:
# PE: Paired-end mode
# -phred33: Phred score encoding (common for Illumina data)
# ILLUMINACLIP: Adaptor trimming (adaptor_file:seed_mismatches:palindrome_clip_threshold:simple_clip_threshold)
#   - 2: Maximum mismatch count that will be tolerated when aligning the adapter to a read.
#   - 30: Specifies the 'palindrome clip threshold'. A score of 30 is a good default.
#   - 10: Specifies the 'simple clip threshold'. A score of 10 is a good default.
# LEADING: Remove leading low quality bases (below quality 3)
# TRAILING: Remove trailing low quality bases (below quality 3)
# SLIDINGWINDOW: Scan with a 4-base window, cut if average quality below 15
# MINLEN: Drop reads shorter than 36 bases
java -jar "${TRIMMOMATIC_JAR}" PE \
    -phred33 \
    "${INPUT_R1}" "${INPUT_R2}" \
    "${OUTPUT_R1_PAIRED}" "${OUTPUT_R1_UNPAIRED}" \
    "${OUTPUT_R2_PAIRED}" "${OUTPUT_R2_UNPAIRED}" \
    ILLUMINACLIP:"${ADAPTOR_FILE}":2:30:10 \
    LEADING:3 \
    TRAILING:3 \
    SLIDINGWINDOW:4:15 \
    MINLEN:36

Alignment to hg19+dm3 genome with bowtie2.

Bowtie2 v2.3.4.3 GitHub

$ Bash example

# Install Bowtie2 and Samtools if not already installed
# conda install -c bioconda bowtie2 samtools

# Define variables (replace with actual paths and filenames)
READ1="input_R1.fastq.gz"
READ2="input_R2.fastq.gz"
OUTPUT_BAM="aligned_to_hg19_dm3.bam"
BOWTIE2_INDEX_PREFIX="/path/to/your/hg19_dm3_index/hg19_dm3" # This prefix points to the Bowtie2 index files (e.g., hg19_dm3.1.bt2, hg19_dm3.2.bt2, etc.)
NUM_THREADS=8 # Adjust based on available CPU cores

# Perform alignment with Bowtie2 using parameters common for eCLIP (very-sensitive-local)
# and pipe the SAM output to Samtools for conversion to BAM format.
bowtie2 \
  --very-sensitive-local \
  -p ${NUM_THREADS} \
  -x ${BOWTIE2_INDEX_PREFIX} \
  -1 ${READ1} \
  -2 ${READ2} \
  | samtools view -bS - \
  > ${OUTPUT_BAM}

# Optional: Sort and index the resulting BAM file
# samtools sort -o ${OUTPUT_BAM%.bam}.sorted.bam ${OUTPUT_BAM}
# samtools index ${OUTPUT_BAM%.bam}.sorted.bam

View on GitHub

High quality reads (F4q10) were filtered using samtools.

samtools v1.19 GitHub

$ Bash example

# Install samtools (if not already installed)
# conda install -c bioconda samtools

# Filter high quality reads (F4q10) using samtools view
# -F 4: Exclude reads where the 0x4 flag is set (i.e., exclude unmapped reads)
# -q 10: Include only reads with a mapping quality of 10 or higher
# -b: Output in BAM format
samtools view -F 4 -q 10 -b input.bam > filtered.bam

View on GitHub

Normalized bigWig files were generated with deepTools.

deepTools v3.5.1 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install deepTools (if not already installed)
# conda create -n deeptools_env deeptools=3.5.1
# conda activate deeptools_env

# Placeholder variables
INPUT_BAM="sample.bam"
OUTPUT_BIGWIG="sample_normalized.bw"
BIN_SIZE=10 # Bin size in base pairs
NUMBER_OF_PROCESSORS="auto" # Use all available processors

# Generate normalized bigWig file using CPM (Counts Per Million mapped reads) normalization
bamCoverage \
    --bam "${INPUT_BAM}" \
    --outFileName "${OUTPUT_BIGWIG}" \
    --normalizeUsing CPM \
    --binSize "${BIN_SIZE}" \
    --numberOfProcessors "${NUMBER_OF_PROCESSORS}" \
    --verbose

Tools Used

ChIP-seq Bowtie2 DESeq2 HOMER

Raw Source Text

Library strategy: Cut&Run
For Cut&Run:
Adaptor trimming with Trimmomatic-0.32.
Alignment to hg19+scer genome with bowtie2.
High quality reads (F4q10) were filtered using samtools.
Peaks called with MACS2 (narrowPeak, q=0.001).
For HA Cut&Run +/-dTAG, differential peaks were identified by Diffbind/DESeq2 using yeast spike-in for normalization.
bigWig files were generated using deepTools and bedGraph files were generated using Homer.
For ChIP-seq:
Adaptor trimming with Trimmomatic-0.32.
Alignment to hg19+dm3 genome with bowtie2.
High quality reads (F4q10) were filtered using samtools.
Normalized bigWig files were generated with deepTools.
Genome_build: hg19 (GRCh37)
Supplementary_files_format_and_content: bigWig, bedGraph files
Supplementary_files_format_and_content: 042420_HA_AE_Diffbind-peakinfo_q0.001.csv: Diffbind peak info
Supplementary_files_format_and_content: 042420_HA-AE_cutandrun_diffbind-deseq2_q0.001.csv: Differential peak DESeq2
Supplementary_files_format_and_content: New_ZnF_q0.001_peaks.txt: MACS2 peaks
Supplementary_files_format_and_content: CD34_HA_merged_noNeg_nomodel_extsize300_q0.001_peaks.txt: MACS2 peaks

← Back to Analysis