GSE12680 Processing Pipeline

ChIP-Seq, 4 steps

Publication

Divergent transcription from active promoters.

Science (New York, N.Y.) (2008) — PMID 19056940

Dataset

Divergent transcription from active promoters

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Image analysis and base calling were done using the Solexa pipeline, and reads were aligned to mouse NCBI build 36 using ELAND.

ELAND (early Illumina/Solexa pipeline version, circa 2006-2008)

$ Bash example

# ELAND was a proprietary aligner bundled with the early Solexa/Illumina sequencing pipeline.
# It was not typically installed via public package managers like conda.
# Installation involved setting up the full Solexa pipeline software suite on dedicated hardware.

# Define the path to the indexed mouse NCBI build 36 reference genome.
# This genome assembly is also known as mm8 (UCSC).
# Reference data would have been prepared by the Solexa pipeline's indexing tools.
# For a modern equivalent, you would download the mm8 genome from UCSC or NCBI and index it.
MOUSE_NCBI36_REFERENCE_INDEX="/path/to/mouse/NCBI36/eland_index"

# Define the input FASTQ file containing the sequencing reads.
INPUT_READS_FASTQ="sample_reads.fastq"

# Define the output file for ELAND alignment results.
# ELAND typically produced a custom text-based alignment format.
OUTPUT_ELAND_ALIGNMENT="sample_aligned.eland"

# Conceptual ELAND alignment command.
# ELAND was normally invoked by the Solexa pipeline's own workflow scripts rather than
# run by hand, so the executable name `eland_aligner` and the -f/-g/-o flags below are
# hypothetical placeholders, not the real interface. They only illustrate the three
# things the step needs: the input reads, the reference index, and an output path.
eland_aligner -f "${INPUT_READS_FASTQ}" -g "${MOUSE_NCBI36_REFERENCE_INDEX}" -o "${OUTPUT_ELAND_ALIGNMENT}"

For ChIP-Seq, sequences from all lanes were extended 200bp (maximum fragment length accounting for ~100bp of primer sequence), and allocated into 25 bp bins.

deepTools bamCoverage v3.5.1

$ Bash example

# conda install -c bioconda deeptools

# Placeholder for input BAM file and output BigWig file
INPUT_BAM="aligned_reads.bam"
OUTPUT_BIGWIG="extended_binned_coverage.bw"

# The reads in this dataset were aligned to mouse NCBI build 36 (mm8), not a human assembly.
# RPGC normalization is an addition of this example (the source text does not describe it)
# and needs the effective genome size of the assembly used.
# For mm8 this can be estimated by counting non-N bases, e.g. with UCSC faCount.
# The value below is a placeholder, not a real genome size; replace it before running.
EFFECTIVE_GENOME_SIZE="REPLACE_WITH_MM8_EFFECTIVE_GENOME_SIZE"

bamCoverage \
    -b "${INPUT_BAM}" \
    -o "${OUTPUT_BIGWIG}" \
    --extendReads 200 \
    --binSize 25 \
    --normalizeUsing RPGC \
    --effectiveGenomeSize "${EFFECTIVE_GENOME_SIZE}" \
    --numberOfProcessors max/2
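
The extension and binning arithmetic can also be sketched without deepTools. The following is a minimal illustration, not the authors' code: it assumes reads are available as BED-like lines (chrom, start, end, strand), reads "extended 200bp" as extension to a 200 bp fragment in the strand direction, and uses a hypothetical helper named bin_extended_reads.

```shell
#!/usr/bin/env bash
# Sketch only: assumes BED-like input on stdin (0-based, half-open):
#   chrom <TAB> start <TAB> end <TAB> strand
EXTEND_TO=200   # assumed fragment length after extension
BIN_SIZE=25     # bin width from the description

# Hypothetical helper: prints "chrom <TAB> bin_start <TAB> fragment_count".
bin_extended_reads() {
    awk -v ext="$EXTEND_TO" -v bin="$BIN_SIZE" 'BEGIN { OFS = "\t" }
    {
        # Extend from the 5-prime end in the direction of the strand.
        if ($4 == "+") { s = $2; e = $2 + ext }
        else           { e = $3; s = $3 - ext; if (s < 0) s = 0 }
        # Count the fragment once in every bin it overlaps.
        first = int(s / bin); last = int((e - 1) / bin)
        for (b = first; b <= last; b++) count[$1 OFS (b * bin)]++
    }
    END { for (key in count) print key, count[key] }' | sort -k1,1 -k2,2n
}

# Toy input (made-up reads): one forward and one reverse read on chr1.
binned=$(printf 'chr1\t100\t125\t+\nchr1\t250\t275\t-\n' | bin_extended_reads)
printf '%s\n' "$binned"
```

With the two toy reads, the forward read covers 100-300 and the reverse read covers 75-275, so bins starting at 100 through 250 receive a count of 2 and the two flanking bins a count of 1.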

Genomic bins containing statistically significant ChIP-seq enrichment were identified by comparison to a Poissonian background model, using a p-value threshold of 10^-9.

MACS2 v2.2.7.1

$ Bash example

# Install macs2 (e.g., using conda)
# conda install -c bioconda macs2

# Define input files and output prefix (placeholders)
# Replace 'chip.bam' with your actual ChIP-seq alignment file
# Replace 'control.bam' with your actual control/input alignment file
# Replace 'chip_peaks' with your desired output prefix
# '-g mm' below selects MACS2's built-in mouse genome size, matching the mouse reads in this dataset
# (use 'hs' for human or 'ce' for C. elegans with other data)

# Call peaks using macs2 with a Poissonian background model and specified p-value threshold
macs2 callpeak \
  -t chip.bam \
  -c control.bam \
  -f BAM \
  -g mm \
  -n chip_peaks \
  --pvalue 1e-9 \
  --outdir .
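
To make the Poisson threshold concrete, the sketch below computes the smallest per-bin read count whose upper-tail probability falls below a cutoff. It is an illustration under assumptions, not the authors' implementation: the expected count per bin (lambda) is supplied by hand, whereas the source does not say how the background rate was estimated, and poisson_min_count is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: smallest count k with P(X >= k) < cutoff for X ~ Poisson(lambda).
# $1 = expected reads per bin (lambda, an assumed input), $2 = p-value cutoff.
poisson_min_count() {
    awk -v lambda="$1" -v cutoff="$2" 'BEGIN {
        # Build the probability mass function iteratively:
        #   pmf(0) = exp(-lambda), pmf(i) = pmf(i-1) * lambda / i
        pmf[0] = exp(-lambda)
        n = 0
        while (n < lambda || pmf[n] > 1e-30) {
            n++
            pmf[n] = pmf[n - 1] * lambda / n
        }
        # Accumulate the upper tail from the top down and keep the
        # smallest k whose tail probability is still below the cutoff.
        tail = 0
        best = n + 1
        for (k = n; k >= 0; k--) {
            tail += pmf[k]
            if (tail < cutoff) best = k
            else break
        }
        print best
    }'
}

# Example: with an expected 1 read per bin, how many reads are needed?
poisson_min_count 1 1e-9
```

The direct exp(-lambda) evaluation underflows for very large lambda, so this form is only suitable for the small per-bin expectations typical of 25 bp bins.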

Additionally, we used an empirical background model obtained from identical Solexa sequencing of DNA from whole cell extract (WCE) from matched cell samples (>5X normalized enrichment across the entire region).

clipper v1.0.0 (inferred with models/gemini-2.5-flash)

$ Bash example

# clipper (YeoLab) is a peak caller written for CLIP-seq data, so its use here is an
# inferred stand-in, not the method documented by the authors.
# Install it from the project's source repository or from a container image available
# in your environment; the exact package or image name is not confirmed by the source.

# Placeholder for the input file: an aligned, sorted, and indexed BAM file.
# IP_BAM: The BAM file for the immunoprecipitated sample.
IP_BAM="path/to/your/ip_sample.bam"
# WCE_BAM: The BAM file for the Whole Cell Extract (WCE) control sample.
# This serves as the "empirical background model" described. clipper's documented usage
# takes a single BAM file, so the WCE comparison has to be run as a separate step.
WCE_BAM="path/to/your/wce_control.bam"

# Output BED file for the peak calling results.
OUTPUT_BED="chip_peaks.bed"

# Reference genome key. The reads in this dataset were aligned to mouse NCBI build 36 (mm8).
# clipper bundles annotations for only some assemblies, so check that yours is supported
# and replace the placeholder below.
SPECIES="REPLACE_WITH_SUPPORTED_MOUSE_ASSEMBLY"

# Execute clipper for peak calling on the IP sample.
# The description mentions ">5X normalized enrichment across the entire region".
# This is likely a post-processing filter comparing IP signal to WCE signal,
# as clipper itself uses statistical thresholds for peak calling.
clipper -b "${IP_BAM}" \
        -s "${SPECIES}" \
        -o "${OUTPUT_BED}"
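
The ">5X normalized enrichment" filter can be sketched as a depth-normalized ratio of ChIP to WCE read counts per region. This is a minimal illustration under assumptions the source does not state: normalization is by total read count per library, a pseudocount of 1 is added to the WCE count to avoid division by zero, and normalized_enrichment is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: keep regions whose depth-normalized ChIP/WCE ratio exceeds 5.
# $1 = total ChIP reads, $2 = total WCE reads (library sizes).
# stdin: region <TAB> chip_count <TAB> wce_count
normalized_enrichment() {
    awk -v chip_total="$1" -v wce_total="$2" -v min_fold=5 'BEGIN { OFS = "\t" }
    {
        # (chip / chip_total) / ((wce + 1) / wce_total), rearranged so the
        # numerator and denominator stay exact for integer counts.
        # The +1 is an assumed pseudocount, not part of the source description.
        fold = ($2 * wce_total) / (($3 + 1) * chip_total)
        if (fold > min_fold) print $1, sprintf("%.2f", fold)
    }'
}

# Toy input (made-up counts): 2,000,000 ChIP reads and 1,000,000 WCE reads.
enriched=$(printf 'regionA\t120\t4\nregionB\t30\t9\n' | normalized_enrichment 2000000 1000000)
printf '%s\n' "$enriched"
```

In the toy data regionA has a 12-fold normalized enrichment and is kept, while regionB has 1.5-fold and is dropped.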

Tools Used

ChIP-seq

Raw Source Text

Images analysis and base calling was done using solexa pipeline and reads aligned to both mouse NCBI build 36 using ELAND.  For ChIP-Seq, sequences from all lanes were extended 200bp (maximum fragment length accounting for ~100bp of primer sequence), and allocated into 25 bp bins.  Genomic bins containing statistically significant ChIP-seq enrichment were identified by comparison to a Poissonian background model, using a p-value threshold of 10^-9.  Additionally, we used an empirical background model obtained from identical Solexa sequencing of DNA from whole cell extract (WCE) from matched cell samples (>5X normalized enrichment across the entire region).
