GSE68618 Processing Pipeline

ChIP-Seq code_examples 11 steps

Publication

Pseudotemporal Ordering of Single Cells Reveals Metabolic Control of Postnatal β Cell Proliferation.

Cell metabolism (2017) — PMID 28467932

Dataset

Aging-dependent demethylation of regulatory elements correlates with chromatin state and improved insulin secretion by pancreatic Î² cells

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Generate Jupyter Notebook

Illumina Casava-1.8.2 software was used for basecalling.

Illumina Casava v1.8.2 GitHub

$ Bash example

# Illumina Casava 1.8.2 is typically installed as part of the Illumina sequencing system software or as a standalone package.
# Installation instructions are highly dependent on the specific Illumina system and OS.
# Example (installation path may vary):
# wget https://support.illumina.com/content/dam/illumina-support/documents/downloads/software/casava/casava_1_8_2_installer.run
# chmod +x casava_1_8_2_installer.run
# ./casava_1_8_2_installer.run

# Assuming the run folder is named '123456_A00001_0001_AHXXXXXX' and output directory is 'fastq_output'
# Replace with actual paths and SampleSheet.csv
RUN_FOLDER="/path/to/your/Illumina/RunFolder" # e.g., /data/runs/123456_A00001_0001_AHXXXXXX
OUTPUT_DIR="fastq_output"
SAMPLE_SHEET="/path/to/SampleSheet.csv" # Required for demultiplexing

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Step 1: Configure Bcl to Fastq conversion and demultiplexing
# This command generates a Makefile in the OUTPUT_DIR based on the run folder and sample sheet.
# Parameters like --fastq-cluster-count, --mismatches, --no-aligned-reads are common defaults.
configureBclToFastq.pl \
    --input-dir "${RUN_FOLDER}" \
    --output-dir "${OUTPUT_DIR}" \
    --sample-sheet "${SAMPLE_SHEET}" \
    --fastq-cluster-count 0 \
    --mismatches 1 \
    --no-aligned-reads

# Step 2: Execute the Makefile to perform basecalling and demultiplexing
# This will run the actual conversion of BCL files to FASTQ files.
make -C "${OUTPUT_DIR}"

View on GitHub

RNA-Seq: Low quality reads as well as ribosomal and other repeat sequences were filtered out before alignment.

RNA-seq vlatest stable GitHub

$ Bash example

# Install BBMap suite (contains BBDuk)
# conda install -c bioconda bbmap

# Define input and output files
INPUT_FASTQ="input_reads.fastq.gz"
OUTPUT_FASTQ="filtered_reads.fastq.gz"

# Define reference files for ribosomal and repeat sequences.
# These files need to be prepared or downloaded. Examples:
# - human_rRNA.fa: Can be obtained from NCBI RefSeq or specialized rRNA databases (e.g., SILVA).
# - human_repeats.fa: Can be obtained from RepeatMasker database or UCSC Genome Browser.
# For demonstration, we assume these files exist in the current directory or are specified by their full path.

# Placeholder reference files (replace with actual paths to your rRNA and repeat FASTA files)
RR_REFERENCE="path/to/human_rRNA.fa,path/to/human_repeats.fa"

# Run BBDuk to filter low quality reads and ribosomal/repeat sequences
bbduk.sh \
  in="${INPUT_FASTQ}" \
  out="${OUTPUT_FASTQ}" \
  ref="${RR_REFERENCE}" \
  k=31 hdist=1 \
  qtrim=rl trimq=20 \
  minlen=25 \
  stats="${OUTPUT_FASTQ%.fastq.gz}_bbduk_stats.txt" \
  overwrite=true

View on GitHub

RUM software was used for aligning reads to mouse genome mm9 and Refseq transcriptome and for transcript quantitation.

RefSeq vNot specified

$ Bash example

# Install RUM (RNA-seq Unified Mapper) - Example, actual installation may vary.
# RUM is typically downloaded as a tarball and run via Perl scripts.
# Example:
# wget http://cgr.wustl.edu/rum/RUM_v1.11.tar.gz
# tar -xzf RUM_v1.11.tar.gz
# export PATH=$PATH:$(pwd)/RUM_v1.11/bin

# Define input and output paths
READ1="input_reads_R1.fastq.gz"
READ2="input_reads_R2.fastq.gz" # Assuming paired-end reads for typical RNA-seq
OUTPUT_DIR="rum_output"
GENOME_FASTA="mm9.fa"
TRANSCRIPTOME_FASTA="RefSeq_mm9_transcripts.fa" # RUM typically uses transcriptome FASTA for indexing
RUM_INDEX_DIR="rum_index_mm9_refseq"

# Create dummy input files and directories for demonstration
mkdir -p "${OUTPUT_DIR}"
mkdir -p "${RUM_INDEX_DIR}"
# touch "${READ1}" "${READ2}" # In a real scenario, these would be actual read files

# Placeholder for reference genome and transcriptome files
# Download mm9 genome from UCSC (example source)
# wget -P . https://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz
# tar -xzf chromFa.tar.gz
# cat chr*.fa > "${GENOME_FASTA}"
# rm chr*.fa chromFa.tar.gz

# Download RefSeq transcriptome for mm9 (example source, actual file might need generation from annotations)
# This is a placeholder; actual RefSeq transcriptome for mm9 might need to be generated
# from RefSeq annotations (e.g., GTF/GFF) and the genome FASTA. RUM expects a FASTA file of transcript sequences.
# Example (general mouse RefSeq mRNA, not specific to mm9 assembly):
# wget -P . https://ftp.ncbi.nlm.nih.gov/refseq/M_musculus/mRNA_FASTA/mouse.1.rna.fna.gz
# gunzip mouse.1.rna.fna.gz
# mv mouse.1.rna.fna "${TRANSCRIPTOME_FASTA}"

# Create a minimal RUM configuration file (rum_config.txt)
# This file specifies paths to reference files and other parameters required by RUM.
cat <<EOF > rum_config.txt
# RUM Configuration File
# General settings
RUM_HOME=$(pwd)/RUM_v1.11 # Adjust if RUM is installed elsewhere
OUTPUT_DIR=${OUTPUT_DIR}
INDEX_DIR=${RUM_INDEX_DIR}
GENOME_FASTA=${GENOME_FASTA}
TRANSCRIPTOME_FASTA=${TRANSCRIPTOME_FASTA}
# Other parameters can be added here, e.g., number of threads, mismatch limits, etc.
# For example:
# THREADS=8
# MAX_MISMATCHES=2
EOF

# Step 1: Build RUM index for mm9 genome and Refseq transcriptome
# This step needs to be run once for the given reference files.
# rum_index_builder.pl -c rum_config.txt
# Note: The above command assumes rum_index_builder.pl is in PATH or RUM_HOME/bin.
# For demonstration, we'll assume the index is already built or the command is run separately.

# Step 2: Run RUM for alignment and transcript quantitation
# This command aligns reads to the mm9 genome and Refseq transcriptome,
# and performs transcript quantitation.
rum_runner.pl -c rum_config.txt -1 "${READ1}" -2 "${READ2}" -o "${OUTPUT_DIR}"

Differential expression analysis was carried out using a custom script implementing EdgeR software.

edgeR v3.42.0 (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install R and Bioconductor if not already present
# conda install -c conda-forge r-base
# conda install -c bioconda bioconductor-edger

# Create a placeholder R script for differential expression analysis using edgeR
# This script assumes 'counts.tsv' contains raw gene counts and 'sample_info.tsv' contains sample metadata.
cat << 'EOF' > custom_edgeR_script.R
#!/usr/bin/env Rscript

# Load necessary libraries
library(edgeR)
library(getopt) # For parsing command-line arguments

# Define command-line arguments
spec <- matrix(c(
  'counts',    'c', 1, "character", "Path to the count matrix (TSV)",
  'samples',   's', 1, "character", "Path to the sample information file (TSV)",
  'output',    'o', 1, "character", "Path for the output differential expression results (TSV)",
  'design_formula', 'd', 1, "character", "R formula for the design matrix (e.g., '~ condition')",
  'contrast_coef', 't', 1, "character", "Coefficient or contrast string for testing (e.g., 'conditiontreated' or 'conditiontreated - conditionuntreated')",
  'help',      'h', 0, "logical",   "Show this help message"
), byrow=TRUE, ncol=5)

opt <- getopt(spec)

# If help was asked for, print a friendly message
if ( !is.null(opt$help) ) {
  cat(getopt(spec, usage=TRUE));
  q(status=1);
}

# Check for required arguments
if ( is.null(opt$counts) || is.null(opt$samples) || is.null(opt$output) || is.null(opt$design_formula) || is.null(opt$contrast_coef) ) {
  cat("Missing required arguments.\n");
  cat(getopt(spec, usage=TRUE));
  q(status=1);
}

counts_file <- opt$counts
samples_file <- opt$samples
output_file <- opt$output
design_formula_str <- opt$design_formula
contrast_coef_str <- opt$contrast_coef

message(paste("Loading counts from:", counts_file))
message(paste("Loading sample info from:", samples_file))
message(paste("Using design formula:", design_formula_str))
message(paste("Using contrast coefficient/string:", contrast_coef_str))

# Load count data
counts <- read.delim(counts_file, row.names = 1, check.names = FALSE)

# Load sample information
sample_info <- read.delim(samples_file, row.names = 1, check.names = FALSE)

# Ensure sample order matches between counts and sample_info
sample_info <- sample_info[colnames(counts), , drop = FALSE]

# Create DGEList object
dge <- DGEList(counts = counts)

# Filter out lowly expressed genes (optional, but good practice)
# Assuming 'condition' is a column in sample_info for filtering example
if ("condition" %in% colnames(sample_info)) {
  keep <- filterByExpr(dge, group = sample_info$condition)
} else {
  keep <- filterByExpr(dge)
}
dge <- dge[keep, , keep.lib.sizes = FALSE]

# Normalize library sizes
dge <- calcNormFactors(dge)

# Create design matrix
design <- model.matrix(as.formula(design_formula_str), data = sample_info)

# Estimate dispersion
dge <- estimateDisp(dge, design)

# Fit GLM
fit <- glmQLFit(dge, design)

# Perform differential expression testing
# Handle contrast: if it contains '-' or '+', treat as a contrast string for makeContrasts
# otherwise, treat as a coefficient name.
if (grepl("-", contrast_coef_str) || grepl("\\+", contrast_coef_str)) {
  message(paste("Interpreting contrast as a comparison string:", contrast_coef_str))
  contrast_matrix <- makeContrasts(contrasts = contrast_coef_str, levels = design)
  lrt <- glmQLFTest(fit, contrast = contrast_matrix)
} else {
  message(paste("Interpreting contrast as a single coefficient name:", contrast_coef_str))
  lrt <- glmQLFTest(fit, coef = contrast_coef_str)
}

# Get results
results <- topTags(lrt, n = Inf, sort.by = "PValue")$table

# Add adjusted p-value
results$FDR <- p.adjust(results$PValue, method = "BH")

# Write results to file
write.table(results, file = output_file, sep = "\t", quote = FALSE, row.names = TRUE)

message(paste("Differential expression results written to:", output_file))

EOF

# Create dummy input files for demonstration
# Dummy counts.tsv
cat << 'EOF' > counts.tsv
GeneID  Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
GeneA   100     120     110     200     210     190
GeneB   50      60      55      100     110     95
GeneC   200     210     190     100     120     110
GeneD   10      12      11      20      22      19
GeneE   5       6       5       10      11      9
EOF

# Dummy sample_info.tsv
cat << 'EOF' > sample_info.tsv
SampleID        condition       batch
Sample1         untreated       batch1
Sample2         untreated       batch1
Sample3         untreated       batch2
Sample4         treated         batch2
Sample5         treated         batch3
Sample6         treated         batch3
EOF

# Execute the custom edgeR script
# Example 1: Differential expression between 'treated' and 'untreated' conditions, adjusting for 'batch'
# Here, 'conditiontreated' tests the effect of 'treated' vs the reference level of 'condition' (e.g., 'untreated')
Rscript custom_edgeR_script.R \
  --counts counts.tsv \
  --samples sample_info.tsv \
  --output differential_expression_results_1.tsv \
  --design_formula "~batch + condition" \
  --contrast_coef "conditiontreated"

# Example 2: If using a design like '~0 + condition' to explicitly model each condition group,
# you can specify a direct contrast between groups.
Rscript custom_edgeR_script.R \
  --counts counts.tsv \
  --samples sample_info.tsv \
  --output differential_expression_results_2.tsv \
  --design_formula "~0 + condition + batch" \
  --contrast_coef "conditiontreated - conditionuntreated"

# Clean up dummy files (optional)
# rm counts.tsv sample_info.tsv custom_edgeR_script.R

View on GitHub

ChIP-Seq: Reads were aligned to mm9 genome using bowtie version 0.12.7 with parameters -v 3 -k 1 -m 1 --best --strata.

Bowtie v0.12.7 GitHub

$ Bash example

# Install Bowtie (example using conda)
# conda install -c bioconda bowtie=0.12.7

# Define reference genome index path (placeholder)
MM9_INDEX="/path/to/mm9_index" # Replace with actual path to mm9 Bowtie index files (e.g., mm9)

# Define input reads file (placeholder)
INPUT_READS="reads.fastq" # Replace with actual input FASTQ file

# Define output SAM file (placeholder)
OUTPUT_SAM="output.sam" # Replace with desired output SAM file name

# Align reads to mm9 genome using bowtie
bowtie -v 3 -k 1 -m 1 --best --strata "${MM9_INDEX}" "${INPUT_READS}" > "${OUTPUT_SAM}"

View on GitHub

Redundant reads were discarded before creating bedgraph profiles.

sambamba (Inferred with models/gemini-2.5-flash) v0.8.2 GitHub

$ Bash example

# Install sambamba if not already installed
# conda install -c bioconda sambamba

# Assuming 'aligned_reads.bam' is the input sorted BAM file after alignment.
# The '-r' flag ensures that duplicate reads are removed, not just marked.
# The '-p' flag specifies the number of threads to use for parallel processing.
# Replace 'aligned_reads.bam' with your actual input file and 'deduplicated_reads.bam' with your desired output file name.
# Adjust the number of threads (e.g., 8) based on your system's resources.
sambamba markdup -r -p 8 aligned_reads.bam deduplicated_reads.bam

View on GitHub

Peak calling was done using HOMER for H3K4me1 and H3K27Ac marks at FDR=0.01 with respect to corresponding Input.

HOMER vNot explicitly inferable from description GitHub

$ Bash example

# Define input BAM files and output directories (placeholders)
# Replace with actual paths to your aligned BAM files
H3K4ME1_BAM="h3k4me1_rep1.bam"
H3K27AC_BAM="h3k27ac_rep1.bam"
INPUT_BAM="input_rep1.bam"

# Define output tag directories and peak files
H3K4ME1_TAG_DIR="h3k4me1_tag_dir"
H3K27AC_TAG_DIR="h3k27ac_tag_dir"
INPUT_TAG_DIR="input_tag_dir"

H3K4ME1_PEAKS_OUTPUT="h3k4me1_peaks.txt"
H3K27AC_PEAKS_OUTPUT="h3k27ac_peaks.txt"

# Install HOMER (example using conda)
# conda create -n homer_env homer
# conda activate homer_env

# Make tag directories for each sample
# Note: HOMER's makeTagDirectory can take a genome for normalization, but it's not strictly required for basic tag directory creation.
# If you have a specific genome, you might add -genome <genome_name> (e.g., -genome hg38)
makeTagDirectory "${H3K4ME1_TAG_DIR}" "${H3K4ME1_BAM}"
makeTagDirectory "${H3K27AC_TAG_DIR}" "${H3K27AC_BAM}"
makeTagDirectory "${INPUT_TAG_DIR}" "${INPUT_BAM}"

# Perform peak calling for H3K4me1
# -style histone is appropriate for histone marks
# -FDR 0.01 is specified in the description
# -i specifies the input/control tag directory
findPeaks "${H3K4ME1_TAG_DIR}" \
    -o "${H3K4ME1_PEAKS_OUTPUT}" \
    -i "${INPUT_TAG_DIR}" \
    -style histone \
    -FDR 0.01

# Perform peak calling for H3K27Ac
findPeaks "${H3K27AC_TAG_DIR}" \
    -o "${H3K27AC_PEAKS_OUTPUT}" \
    -i "${INPUT_TAG_DIR}" \
    -style histone \
    -FDR 0.01

View on GitHub

Peak calling using STAR was done for H3K27me3 marks at FDR=0.01.

STAR v2.7.10a GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables
GENOME_DIR="/path/to/STAR_genome_index/hg38" # Placeholder for human genome hg38 index
GENOME_FASTA="/path/to/hg38.fa" # Placeholder for human genome hg38 fasta
READS_R1="sample_H3K27me3_R1.fastq.gz"
READS_R2="sample_H3K27me3_R2.fastq.gz" # Assuming paired-end reads; adjust if single-end
OUTPUT_PREFIX="H3K27me3_aligned_STAR_"
FDR_THRESHOLD="0.01"

# 1. Build STAR genome index (if not already built)
# This step is typically done once per genome. Replace paths with actual genome files.
# mkdir -p "${GENOME_DIR}"
# STAR --runMode genomeGenerate \
#      --genomeDir "${GENOME_DIR}" \
#      --genomeFastaFiles "${GENOME_FASTA}" \
#      --runThreadN 8 # Adjust thread count as needed

# 2. Align reads using STAR
# While STAR is primarily an RNA-seq aligner, it can be used for DNA-seq (like ChIP-seq) alignment.
# This command performs the alignment step, which is a prerequisite for peak calling.
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READS_R1}" "${READS_R2}" \
     --runThreadN 8 \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outSAMtype BAM SortedByCoordinate \
     --outBAMcompression 6

# 3. Peak calling using STAR output with FDR threshold
# STAR itself does not perform peak calling. The description "Peak calling using STAR"
# implies a custom script or a downstream tool that processes STAR's alignment output
# (e.g., Aligned.sortedByCoord.out.bam) and applies peak calling logic with the specified FDR.
# This is a conceptual placeholder for such a custom peak calling step, as STAR's primary function is alignment.

# Example of a conceptual custom peak calling script using STAR's output:
# Assuming 'custom_star_peak_caller.py' is a hypothetical script that takes STAR's BAM output
# and performs peak calling with an FDR threshold.
# python custom_star_peak_caller.py \
#        --input_bam "${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam" \
#        --fdr "${FDR_THRESHOLD}" \
#        --output_peaks "H3K27me3_peaks_FDR${FDR_THRESHOLD}.bed"

# If a standard ChIP-seq peak caller like MACS2 were used on STAR's output (common practice):
# macs2 callpeak -t "${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam" \
#                -f BAMPE \
#                -g hs \
#                -n H3K27me3_peaks \
#                --outdir . \
#                -q "${FDR_THRESHOLD}" # -q for FDR (q-value) threshold

View on GitHub

BIS-Seq: DNA sequencing data was aligned to mouse genome (mm9) with the BS Seeker program.

BS Seeker vUnknown GitHub

$ Bash example

# Install BS Seeker (original version, typically downloaded from SourceForge)
# BS Seeker is a Python script and requires Python to be installed.
# It also relies on an external aligner like Bowtie or Bowtie2, which must be installed and in your system's PATH.
# Example manual installation:
# wget https://sourceforge.net/projects/bsseeker/files/BS_Seeker_v1.0.5.tar.gz # Or the latest available version
# tar -xzf BS_Seeker_v1.0.5.tar.gz
# cd BS_Seeker_v1.0.5
# Make sure 'BS_Seeker.py' is executable or run with 'python'

# Define input and output files
input_fastq="dna_sequencing_data.fastq" # Placeholder for input DNA sequencing data (e.g., FASTQ file)
output_bam="aligned_bisulfite_reads.bam" # Placeholder for output aligned reads (BAM file)

# Define reference genome (mm9)
# Download the mouse genome (mm9) FASTA file if not already available.
# A common source is UCSC Genome Browser.
# Example download command:
# wget -P /path/to/genomes/mm9/ http://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/mm9.fa.gz
# gunzip /path/to/genomes/mm9/mm9.fa.gz
mm9_genome_fasta="/path/to/genomes/mm9/mm9.fa"

# Align DNA sequencing data to the mouse genome (mm9) using BS Seeker
# BS Seeker will automatically build a Bowtie/Bowtie2 index for the genome if it doesn't exist.
# Ensure 'BS_Seeker.py' is in your PATH or specify its full path.
python BS_Seeker.py \
    -i "${input_fastq}" \
    -g "${mm9_genome_fasta}" \
    -o "${output_bam}" \
    --aligner=bowtie # Specify the aligner to use (e.g., bowtie or bowtie2)
    # Additional parameters can be added as needed, e.g., -p for number of threads, -m for maximum mismatches, etc.

View on GitHub

Data was reported as the fraction of reads that were methylated, calculated as the ratio of cytosine bases (indicating methylated CpGs) to total reads for each CpG coordinate.

methylDackel (Inferred with models/gemini-2.5-flash) vv0.6.1 (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install methylDackel (if not already installed)
# conda install -c bioconda methyldackel

# Define input and output files
# Replace 'path/to/your/reference_genome.fa' with the actual path to your reference genome FASTA file (e.g., hg38.fa)
REFERENCE_GENOME="path/to/your/reference_genome.fa"
INPUT_BAM="sample.bam"
OUTPUT_PREFIX="sample_methylation"

# Extract methylation information for CpG sites from a bisulfite-sequenced BAM file.
# This command will generate two output files:
# 1. ${OUTPUT_PREFIX}_CpG.bedGraph: A bedGraph file containing methylation percentages for each CpG site.
# 2. ${OUTPUT_PREFIX}_CpG.tsv: A tab-separated file with detailed CpG methylation calls.
methylDackel extract \
    -g "${REFERENCE_GENOME}" \
    "${INPUT_BAM}" \
    -o "${OUTPUT_PREFIX}"

View on GitHub

Unmethylated cytosines are ultimately converted to thymine bases during the bisulfite conversion process.

Bismark (Inferred with models/gemini-2.5-flash) v(Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install Bismark (example using conda)
# conda install -c bioconda bismark

# Define variables
# Replace /path/to/bisulfite_indexed_genome/hg38 with the actual path to your Bismark-indexed reference genome directory.
# To create a Bismark-indexed genome, run: bismark_genome_preparation --path_to_bowtie2 /path/to/bowtie2_exec /path/to/reference_fasta_dir
REFERENCE_GENOME_DIR="/path/to/bisulfite_indexed_genome/hg38"
READ1="sample_R1.fastq.gz"
READ2="sample_R2.fastq.gz"
OUTPUT_DIR="bismark_alignment_output"
SAMPLE_NAME="sample_name"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Align bisulfite-converted paired-end reads using Bismark with Bowtie2
# This command performs alignment and generates a BAM file.
# Further steps would involve methylation extraction using bismark_methylation_extractor.
bismark \
    --bowtie2 \
    --temp_dir "${OUTPUT_DIR}/temp" \
    --output_dir "${OUTPUT_DIR}" \
    --basename "${SAMPLE_NAME}" \
    "${REFERENCE_GENOME_DIR}" \
    -1 "${READ1}" \
    -2 "${READ2}"

View on GitHub

Tools Used

RNA-seq HOMER STAR

Raw Source Text

Illumina Casava-1.8.2 software was used for basecalling.
RNA-Seq: Low quality reads as well as ribosomal and other repeat sequences were filtered out before alignment. RUM software was used for aligning reads to mouse genome mm9 and Refseq transcriptome and for transcript quantitation. Differential expression analysis was carried out using a custom script implementing EdgeR software.
ChIP-Seq: Reads were aligned to mm9 genome using bowtie version 0.12.7 with parameters -v 3 -k 1 -m 1 --best --strata. Redundant reads were discarded before creating bedgraph profiles.
Peak calling was done using HOMER for H3K4me1 and H3K27Ac marks at FDR=0.01 with respect to corresponding Input. Peak calling using STAR was done for H3K27me3 marks at FDR=0.01.
BIS-Seq: DNA sequencing data was aligned to mouse genome (mm9) with the BS Seeker program. Data was reported as the fraction of reads that were methylated, calculated as the ratio of cytosine bases (indicating methylated CpGs) to total reads for each CpG coordinate. Unmethylated cytosines are ultimately converted to thymine bases during the bisulfite conversion process.
Genome_build: mm9
Supplementary_files_format_and_content: Tab delimited file of transcript raw read counts.
Supplementary_files_format_and_content: Tab delimited file of transcript FPKM counts.
Supplementary_files_format_and_content: Peak files are tab-delimited listing genomic coordinates and their respective scores and p-values. Scores represent fold-change with respect to Input.
Supplementary_files_format_and_content: Tab-delimited files of loci, fraction of methylated cytosines and total read count.
Supplementary_files_format_and_content: RawReadCounts.txt: Raw transcript counts
Supplementary_files_format_and_content: TrancriptFPKMCounts.txt: Transcript FPKM counts
Supplementary_files_format_and_content: Old_H3K4me1.peaks.txt: peak
Supplementary_files_format_and_content: Old_H3K27me3.peaks.txt: peak
Supplementary_files_format_and_content: Young_H3K4me1.peaks.txt: peak
Supplementary_files_format_and_content: Young_H3K27me3.peaks.txt: peak
Supplementary_files_format_and_content: Young_H3K27Ac.peaks.txt: peak
Supplementary_files_format_and_content: OldBeta_bs_seeker-CG.tab: Percent methylated cytosines
Supplementary_files_format_and_content: YoungBeta_bs_seeker-CG.tab: Percent methylated cytosines

← Back to Analysis