GSE12680 Processing Pipeline

ChIP-Seq, 4 steps, with code examples

Publication

Divergent transcription from active promoters.

Science (New York, N.Y.) (2008) — PMID 19056940

Dataset

GSE12680

Divergent transcription from active promoters

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Image analysis and base calling were done using the Solexa pipeline, and reads were aligned to mouse NCBI build 36 using ELAND.

    ELAND (early Illumina/Solexa pipeline version, circa 2006-2008)
    $ Bash example
    # ELAND was a proprietary aligner bundled with the early Solexa/Illumina sequencing pipeline.
    # It was not typically installed via public package managers like conda.
    # Installation involved setting up the full Solexa pipeline software suite on dedicated hardware.
    
    # Define the path to the indexed mouse NCBI build 36 reference genome.
    # This genome assembly is also known as mm8 (UCSC).
    # Reference data would have been prepared by the Solexa pipeline's indexing tools.
    # For a modern equivalent, you would download the mm8 genome from UCSC or NCBI and index it.
    MOUSE_NCBI36_REFERENCE_INDEX="/path/to/mouse/NCBI36/eland_index"
    
    # Define the input FASTQ file containing the sequencing reads.
    INPUT_READS_FASTQ="sample_reads.fastq"
    
    # Define the output file for ELAND alignment results.
    # ELAND typically produced a custom text-based alignment format.
    OUTPUT_ELAND_ALIGNMENT="sample_aligned.eland"
    
    # Conceptual ELAND alignment command.
    # NOTE: `eland_aligner` and its -f/-g/-o flags are hypothetical placeholders, not a real ELAND
    # command line. ELAND was normally invoked through the Solexa pipeline's own workflow scripts
    # rather than as a simple standalone executable, so this line only illustrates the core
    # inputs and output: input reads, reference index, and alignment file.
    eland_aligner -f "${INPUT_READS_FASTQ}" -g "${MOUSE_NCBI36_REFERENCE_INDEX}" -o "${OUTPUT_ELAND_ALIGNMENT}"
  2.

    For ChIP-Seq, sequences from all lanes were extended 200 bp (maximum fragment length accounting for ~100 bp of primer sequence), and allocated into 25 bp bins.

    $ Bash example
    # conda install -c bioconda deeptools
    
    # Placeholders for the input BAM file (sorted and indexed) and the output bigWig file
    INPUT_BAM="aligned_reads.bam"
    OUTPUT_BIGWIG="extended_binned_coverage.bw"
    
    # The step describes only read extension and binning, so no normalization is applied here
    # (an assumption). If you add RPGC normalization, supply an effective genome size for the
    # mouse assembly you aligned to, not a human (hg38) value.
    
    bamCoverage \
        -b "${INPUT_BAM}" \
        -o "${OUTPUT_BIGWIG}" \
        --extendReads 200 \
        --binSize 25 \
        --normalizeUsing None \
        --numberOfProcessors max
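
The extension and binning described in this step can also be sketched without any external tools. This is a minimal illustration, not the authors' code: the input layout (a 3-column TSV of chromosome, 0-based 5' position, and strand) and the function name `extend_and_bin` are assumptions made for the example.

```shell
# Sketch only: stdin is a hypothetical TSV of chrom, 0-based 5-prime position, strand.
EXTEND_BP=200   # extension length from the step description
BIN_BP=25       # bin width from the step description

extend_and_bin() {
    awk -F'\t' -v ext="$EXTEND_BP" -v bin="$BIN_BP" '
    {
        # Extend downstream of the 5-prime end on "+" and upstream on "-".
        if ($3 == "+") { s = $2; e = $2 + ext } else { e = $2 + 1; s = e - ext }
        if (s < 0) s = 0
        # Count the read once in every bin that its half-open interval [s, e) overlaps.
        for (b = int(s / bin); b <= int((e - 1) / bin); b++)
            count[$1 "\t" (b * bin)]++
    }
    END { for (key in count) print key "\t" count[key] }
    ' | sort -k1,1 -k2,2n
}
```

Piping reads into `extend_and_bin` prints one line per occupied bin (chromosome, bin start, count). Counting a read in every bin it overlaps is one possible reading of "allocated into 25 bp bins"; the source does not say how partial overlaps were handled.
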
  3.

    Genomic bins containing statistically significant ChIP-seq enrichment were identified by comparison to a Poissonian background model, using a p-value threshold of 10^-9.

    $ Bash example
    # Install macs2 (e.g., using conda)
    # conda install -c bioconda macs2
    
    # Define input files and output prefix (placeholders)
    # Replace 'chip.bam' with your actual ChIP-seq alignment file
    # Replace 'control.bam' with your actual control/input alignment file
    # Replace 'chip_peaks' with your desired output prefix
    # '-g mm' is the MACS2 effective-genome-size shortcut for mouse, matching this dataset
    
    # Call peaks using macs2 with a Poissonian background model and specified p-value threshold
    macs2 callpeak \
      -t chip.bam \
      -c control.bam \
      -f BAM \
      -g mm \
      -n chip_peaks \
      --pvalue 1e-9 \
      --outdir .
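
To make the Poisson threshold in this step concrete, the sketch below finds the smallest bin count whose upper-tail probability under a Poisson background falls below a given p-value, then keeps bins at or above that count. It assumes a single genome-wide mean count per bin (`lambda`), which is a simplification; the source does not state how the background rate was estimated. The function names and the 3-column bin format are hypothetical.

```shell
# Smallest count k with P(X >= k) < p for X ~ Poisson(lambda).
# Args: lambda (assumed mean reads per bin), p (p-value threshold).
# Intended for small lambda; exp(-lambda) underflows for very large values.
poisson_min_count() {
    awk -v lambda="$1" -v p="$2" 'BEGIN {
        term = exp(-lambda)   # P(X = 0)
        cdf = term            # P(X <= k), starting at k = 0
        k = 0
        # 1 - cdf is P(X >= k + 1); advance k until that tail drops below p.
        while (1 - cdf >= p + 0) { k++; term *= lambda / k; cdf += term }
        print k + 1
    }'
}

# Keep bins whose count reaches the threshold.
# Arg: minimum count; stdin: TSV of chrom, bin start, count.
call_enriched_bins() {
    awk -F'\t' -v min="$1" '$3 >= min + 0'
}
```

For example, `poisson_min_count 1 1e-9` prints 12: with a background mean of one read per bin, a bin needs at least 12 reads to pass the 10^-9 threshold.
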
  4.

    Additionally, we used an empirical background model obtained from identical Solexa sequencing of DNA from whole cell extract (WCE) from matched cell samples (>5X normalized enrichment across the entire region).

    clipper v1.0.0 (inferred with models/gemini-2.5-flash)
    $ Bash example
    # NOTE: clipper was inferred for this step and is a CLIP-seq/eCLIP peak caller, not a ChIP-seq tool.
    # Treat this block as a rough placeholder and check `clipper --help` before use.
    # Install clipper (example, adjust for your environment), e.g. via the Docker image
    # specified in the eCLIP CWL workflow:
    # docker pull yeolab/clipper:v1.0.0
    
    # Placeholder for the aligned BAM file of the immunoprecipitated (IP) sample.
    IP_BAM="path/to/your/ip_sample.bam"
    # Placeholder for the whole cell extract (WCE) control, the "empirical background model" described.
    # clipper's documented usage takes a single BAM, so the WCE comparison has to happen downstream.
    WCE_BAM="path/to/your/wce_control.bam"
    
    # Output BED file for the peak calls.
    OUTPUT_BED="ip_peaks.bed"
    
    # Reference genome species. Step 1 aligned reads to mouse NCBI build 36, so use a mouse assembly;
    # "mm9" is a placeholder and must match both your alignments and your clipper installation.
    SPECIES="mm9"
    
    # Execute clipper for peak calling on the IP sample.
    # The ">5X normalized enrichment across the entire region" criterion is likely a post-processing
    # filter: compare normalized IP signal to WCE signal in each region and keep regions above 5X.
    clipper -s "${SPECIES}" \
            -b "${IP_BAM}" \
            -o "${OUTPUT_BED}"
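
One possible reading of the ">5X normalized enrichment" criterion in this step is a fold-change filter against the WCE control, sketched below. The normalization (dividing each count by its library's total read count), the pseudocount of 1 for empty WCE bins, the 4-column input layout, and the function name are all assumptions for illustration; the source does not specify them.

```shell
# Keep bins whose library-size-normalized ChIP signal exceeds the WCE signal by more than min_fold.
# Args: total ChIP reads, total WCE reads, minimum fold enrichment.
# stdin: hypothetical TSV of chrom, bin start, ChIP count, WCE count.
filter_by_wce_enrichment() {
    awk -F'\t' -v OFS='\t' -v ct="$1" -v wt="$2" -v min="$3" '
    {
        wce = ($4 > 0) ? $4 : 1           # pseudocount avoids division by zero (assumption)
        fold = ($3 / ct) / (wce / wt)     # ratio of per-library read fractions
        if (fold > min + 0) print $1, $2, $3, $4, sprintf("%.2f", fold)
    }'
}
```

Calling `filter_by_wce_enrichment 1000000 2000000 5` on a table of bin counts prints only the bins above 5X, with the fold enrichment appended as a fifth column.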

Tools Used

Raw Source Text
Images analysis and base calling was done using solexa pipeline and reads aligned to both mouse NCBI build 36 using ELAND.  For ChIP-Seq, sequences from all lanes were extended 200bp (maximum fragment length accounting for ~100bp of primer sequence), and allocated into 25 bp bins.  Genomic bins containing statistically significant ChIP-seq enrichment were identified by comparison to a Poissonian background model, using a p-value threshold of 10^-9.  Additionally, we used an empirical background model obtained from identical Solexa sequencing of DNA from whole cell extract (WCE) from matched cell samples (>5X normalized enrichment across the entire region).