GSE12680 Processing Pipeline
ChIP-Seq
4 steps
Publication
Divergent transcription from active promoters. Science (New York, N.Y.) (2008). PMID 19056940
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Image analysis and base calling were done using the Solexa pipeline, and reads were aligned to mouse NCBI build 36 using ELAND.
$ Bash example
# ELAND was a proprietary aligner bundled with the early Solexa/Illumina sequencing pipeline.
# It was not distributed through public package managers such as conda; it was installed
# as part of the full Solexa pipeline software suite.

# Path to the indexed mouse NCBI build 36 reference genome (UCSC name: mm8).
# For a modern equivalent, download mm8 from UCSC or NCBI and index it for your aligner.
MOUSE_NCBI36_REFERENCE_INDEX="/path/to/mouse/NCBI36/eland_index"

# Input file containing the sequencing reads.
INPUT_READS_FASTQ="sample_reads.fastq"

# Output file for the alignment results (ELAND wrote its own text-based format).
OUTPUT_ELAND_ALIGNMENT="sample_aligned.eland"

# Conceptual alignment command. "eland_aligner" and its -f/-g/-o flags are hypothetical
# placeholders, not a real executable: ELAND was normally invoked by the Solexa pipeline's
# own workflow scripts rather than run standalone by end users.
eland_aligner -f "${INPUT_READS_FASTQ}" -g "${MOUSE_NCBI36_REFERENCE_INDEX}" -o "${OUTPUT_ELAND_ALIGNMENT}"
2
For ChIP-Seq, sequences from all lanes were extended 200bp (maximum fragment length accounting for ~100bp of primer sequence), and allocated into 25 bp bins.
$ Bash example
# conda install -c bioconda deeptools

# Placeholder input and output files. bamCoverage needs a coordinate-sorted, indexed BAM,
# so ELAND output would first have to be converted (or the reads re-aligned).
INPUT_BAM="aligned_reads.bam"
OUTPUT_BIGWIG="extended_binned_coverage.bw"

# Extend each read to 200 bp and summarize coverage in 25 bp bins.
# The source describes no normalization at this step, so none is applied here.
bamCoverage \
  -b "${INPUT_BAM}" \
  -o "${OUTPUT_BIGWIG}" \
  --extendReads 200 \
  --binSize 25 \
  --normalizeUsing None \
  --numberOfProcessors max
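The extension and binning in this step can also be sketched without any ChIP-seq tooling, which makes the arithmetic explicit. The function below is a minimal illustration, not the authors' code. The function name `bin_extended_reads` and the four-column tab-separated input (chromosome, 0-based start, end, strand) are assumptions made for the example. So is the choice to count a read in every 25 bp bin that its 200 bp extension overlaps.

```shell
# Hypothetical helper: extend reads to 200 bp and count them per 25 bp bin.
# Input (stdin): tab-separated chrom, start (0-based), end, strand.
# Output: tab-separated chrom, bin start, number of extended reads overlapping the bin.
bin_extended_reads() {
  awk -v ext=200 -v bin=25 'BEGIN { FS = OFS = "\t" }
  {
    # Extend from the 5-prime end of the read in the direction of its strand.
    if ($4 == "+") { s = $2; e = $2 + ext }
    else           { e = $3; s = $3 - ext; if (s < 0) s = 0 }
    # Count the read in every bin that the extended interval [s, e) overlaps.
    for (b = int(s / bin); b * bin < e; b++) count[$1 OFS (b * bin)]++
  }
  END { for (k in count) print k, count[k] }' | sort -k1,1 -k2,2n
}

# Example call with two made-up reads on opposite strands.
printf 'chr1\t100\t136\t+\nchr1\t400\t436\t-\n' | bin_extended_reads
```

The two example reads extend to the intervals 100-300 and 236-436, so the bins starting at 225, 250, and 275 are counted twice and every other covered bin once.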
3
Genomic bins containing statistically significant ChIP-seq enrichment were identified by comparison to a Poissonian background model, using a p-value threshold of 10^-9.
$ Bash example
# Install MACS2 (e.g., using conda)
# conda install -c bioconda macs2

# MACS2 is a modern stand-in here, not the tool named in the source.
# Placeholders: replace 'chip.bam' with the ChIP-seq alignments, 'control.bam' with the
# control/input alignments, and 'chip_peaks' with the desired output prefix.
# '-g mm' selects the mouse genome size, matching the mouse NCBI build 36 alignment.

# Call peaks with MACS2's Poisson-based model at a p-value threshold of 1e-9.
macs2 callpeak \
  -t chip.bam \
  -c control.bam \
  -f BAM \
  -g mm \
  -n chip_peaks \
  --pvalue 1e-9 \
  --outdir .
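To make the Poisson threshold concrete, the sketch below finds the smallest per-bin read count whose upper-tail probability falls below a chosen p-value. It illustrates the statistical idea only and is not the authors' implementation. The function name `poisson_min_count` is hypothetical, and the expected count per bin (lambda) is assumed to be supplied by the caller, since the source does not say how the background rate was estimated.

```shell
# Hypothetical helper: smallest count k such that P(X >= k) < alpha for X ~ Poisson(lambda).
# $1 = expected reads per bin under the background model (lambda, assumed known)
# $2 = p-value threshold (alpha, must be greater than 0)
poisson_min_count() {
  awk -v lambda="$1" -v alpha="$2" 'BEGIN {
    alpha += 0                # force numeric comparison
    term = exp(-lambda)       # P(X = 0)
    cdf = 0; k = 0
    # Invariant: cdf = P(X < k), so 1 - cdf = P(X >= k).
    while (1 - cdf >= alpha) {
      cdf += term             # add P(X = k)
      k++
      term *= lambda / k      # advance to P(X = k) for the new k
    }
    print k
  }'
}

# Example: with one expected read per bin and the threshold used in this step.
poisson_min_count 1 1e-9
```

With an expected count of 1 read per bin, the example call prints 12: a bin needs at least 12 extended reads before its Poisson tail probability drops below 10^-9. The plain summation is adequate for small per-bin rates; very large lambda values would underflow `exp(-lambda)` and need a log-space formulation.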
4
Additionally, we used an empirical background model obtained from identical Solexa sequencing of DNA from whole cell extract (WCE) from matched cell samples (>5X normalized enrichment across the entire region).
$ Bash example
# conda install -c bioconda deeptools

# Placeholder inputs: coordinate-sorted, indexed BAM files aligned to mouse NCBI build 36 (mm8).
# IP_BAM: the immunoprecipitated (ChIP) sample.
IP_BAM="path/to/your/ip_sample.bam"
# WCE_BAM: the matched whole cell extract (WCE) control, i.e. the empirical background.
WCE_BAM="path/to/your/wce_control.bam"

# Output file of per-bin IP/WCE ratios.
OUTPUT_BEDGRAPH="ip_over_wce_ratio.bedgraph"

# deepTools bamCompare is a modern stand-in, not the tool used in the source.
# It scales the two samples by read count and reports the IP/WCE ratio per 25 bp bin.
bamCompare \
  -b1 "${IP_BAM}" \
  -b2 "${WCE_BAM}" \
  --operation ratio \
  --scaleFactorsMethod readCount \
  --extendReads 200 \
  --binSize 25 \
  --outFileFormat bedgraph \
  -o "${OUTPUT_BEDGRAPH}"

# Keep bins with more than 5X normalized enrichment over WCE. The source applies this
# threshold "across the entire region"; filtering bin by bin is a simplification.
awk '$4 > 5' "${OUTPUT_BEDGRAPH}" > ip_over_wce_gt5x.bedgraph
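The ">5X normalized enrichment" criterion can be written out as a small calculation: scale the WCE count in each bin to the IP sequencing depth, divide, and keep bins above the fold threshold. The sketch below is one possible reading of that criterion, not the authors' method. The function name `enriched_bins`, the four-column input (chromosome, bin start, IP count, WCE count), and the pseudocount of 1 added to the WCE count are all assumptions made for the example.

```shell
# Hypothetical helper: report bins whose depth-normalized IP/WCE ratio exceeds a threshold.
# $1 = total IP reads, $2 = total WCE reads, $3 = minimum fold enrichment
# Input (stdin): tab-separated chrom, bin start, IP count, WCE count.
# Output: tab-separated chrom, bin start, fold enrichment (two decimals).
enriched_bins() {
  awk -v ip_total="$1" -v wce_total="$2" -v min_fold="$3" 'BEGIN { FS = OFS = "\t" }
  {
    # Scale the WCE count to the IP library size; the pseudocount of 1
    # (an assumption) avoids division by zero in bins with no WCE reads.
    scaled_wce = ($4 + 1) * ip_total / wce_total
    fold = $3 / scaled_wce
    if (fold > min_fold + 0) print $1, $2, sprintf("%.2f", fold)
  }'
}

# Example call with made-up counts: 1M IP reads, 2M WCE reads, 5X threshold.
printf 'chr1\t100\t30\t9\nchr1\t125\t30\t19\n' | enriched_bins 1000000 2000000 5
```

In the example, the first bin has a scaled WCE count of 5 and a fold enrichment of 6.00, so it is reported; the second has a fold enrichment of 3.00 and is dropped.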
Tools Used
Raw Source Text
Images analysis and base calling was done using solexa pipeline and reads aligned to both mouse NCBI build 36 using ELAND. For ChIP-Seq, sequences from all lanes were extended 200bp (maximum fragment length accounting for ~100bp of primer sequence), and allocated into 25 bp bins. Genomic bins containing statistically significant ChIP-seq enrichment were identified by comparison to a Poissonian background model, using a p-value threshold of 10-9. Additionally, we used an empirical background model obtained from identical Solexa sequencing of DNA from whole cell extract (WCE) from matched cell samples (>5X normalized enrichment across the entire region).