GSE68618 Processing Pipeline
Publication
Pseudotemporal Ordering of Single Cells Reveals Metabolic Control of Postnatal β Cell Proliferation.Cell metabolism (2017) — PMID 28467932
Dataset
GSE68618Aging-dependent demethylation of regulatory elements correlates with chromatin state and improved insulin secretion by pancreatic β cells
Processing Steps
Generate Jupyter Notebook-
1
Illumina Casava-1.8.2 software was used for basecalling.
$ Bash example
# Illumina Casava 1.8.2 is typically installed as part of the Illumina sequencing system software or as a standalone package. # Installation instructions are highly dependent on the specific Illumina system and OS. # Example (installation path may vary): # wget https://support.illumina.com/content/dam/illumina-support/documents/downloads/software/casava/casava_1_8_2_installer.run # chmod +x casava_1_8_2_installer.run # ./casava_1_8_2_installer.run # Assuming the run folder is named '123456_A00001_0001_AHXXXXXX' and output directory is 'fastq_output' # Replace with actual paths and SampleSheet.csv RUN_FOLDER="/path/to/your/Illumina/RunFolder" # e.g., /data/runs/123456_A00001_0001_AHXXXXXX OUTPUT_DIR="fastq_output" SAMPLE_SHEET="/path/to/SampleSheet.csv" # Required for demultiplexing # Create output directory if it doesn't exist mkdir -p "${OUTPUT_DIR}" # Step 1: Configure Bcl to Fastq conversion and demultiplexing # This command generates a Makefile in the OUTPUT_DIR based on the run folder and sample sheet. # Parameters like --fastq-cluster-count, --mismatches, --no-aligned-reads are common defaults. configureBclToFastq.pl \ --input-dir "${RUN_FOLDER}" \ --output-dir "${OUTPUT_DIR}" \ --sample-sheet "${SAMPLE_SHEET}" \ --fastq-cluster-count 0 \ --mismatches 1 \ --no-aligned-reads # Step 2: Execute the Makefile to perform basecalling and demultiplexing # This will run the actual conversion of BCL files to FASTQ files. make -C "${OUTPUT_DIR}" -
2
RNA-Seq: Low quality reads as well as ribosomal and other repeat sequences were filtered out before alignment.
$ Bash example
# Install BBMap suite (contains BBDuk) # conda install -c bioconda bbmap # Define input and output files INPUT_FASTQ="input_reads.fastq.gz" OUTPUT_FASTQ="filtered_reads.fastq.gz" # Define reference files for ribosomal and repeat sequences. # These files need to be prepared or downloaded. Examples: # - human_rRNA.fa: Can be obtained from NCBI RefSeq or specialized rRNA databases (e.g., SILVA). # - human_repeats.fa: Can be obtained from RepeatMasker database or UCSC Genome Browser. # For demonstration, we assume these files exist in the current directory or are specified by their full path. # Placeholder reference files (replace with actual paths to your rRNA and repeat FASTA files) RR_REFERENCE="path/to/human_rRNA.fa,path/to/human_repeats.fa" # Run BBDuk to filter low quality reads and ribosomal/repeat sequences bbduk.sh \ in="${INPUT_FASTQ}" \ out="${OUTPUT_FASTQ}" \ ref="${RR_REFERENCE}" \ k=31 hdist=1 \ qtrim=rl trimq=20 \ minlen=25 \ stats="${OUTPUT_FASTQ%.fastq.gz}_bbduk_stats.txt" \ overwrite=true -
3
RUM software was used for aligning reads to mouse genome mm9 and Refseq transcriptome and for transcript quantitation.
RefSeq vNot specified$ Bash example
# Install RUM (RNA-seq Unified Mapper) - Example, actual installation may vary. # RUM is typically downloaded as a tarball and run via Perl scripts. # Example: # wget http://cgr.wustl.edu/rum/RUM_v1.11.tar.gz # tar -xzf RUM_v1.11.tar.gz # export PATH=$PATH:$(pwd)/RUM_v1.11/bin # Define input and output paths READ1="input_reads_R1.fastq.gz" READ2="input_reads_R2.fastq.gz" # Assuming paired-end reads for typical RNA-seq OUTPUT_DIR="rum_output" GENOME_FASTA="mm9.fa" TRANSCRIPTOME_FASTA="RefSeq_mm9_transcripts.fa" # RUM typically uses transcriptome FASTA for indexing RUM_INDEX_DIR="rum_index_mm9_refseq" # Create dummy input files and directories for demonstration mkdir -p "${OUTPUT_DIR}" mkdir -p "${RUM_INDEX_DIR}" # touch "${READ1}" "${READ2}" # In a real scenario, these would be actual read files # Placeholder for reference genome and transcriptome files # Download mm9 genome from UCSC (example source) # wget -P . https://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz # tar -xzf chromFa.tar.gz # cat chr*.fa > "${GENOME_FASTA}" # rm chr*.fa chromFa.tar.gz # Download RefSeq transcriptome for mm9 (example source, actual file might need generation from annotations) # This is a placeholder; actual RefSeq transcriptome for mm9 might need to be generated # from RefSeq annotations (e.g., GTF/GFF) and the genome FASTA. RUM expects a FASTA file of transcript sequences. # Example (general mouse RefSeq mRNA, not specific to mm9 assembly): # wget -P . https://ftp.ncbi.nlm.nih.gov/refseq/M_musculus/mRNA_FASTA/mouse.1.rna.fna.gz # gunzip mouse.1.rna.fna.gz # mv mouse.1.rna.fna "${TRANSCRIPTOME_FASTA}" # Create a minimal RUM configuration file (rum_config.txt) # This file specifies paths to reference files and other parameters required by RUM. cat <<EOF > rum_config.txt # RUM Configuration File # General settings RUM_HOME=$(pwd)/RUM_v1.11 # Adjust if RUM is installed elsewhere OUTPUT_DIR=${OUTPUT_DIR} INDEX_DIR=${RUM_INDEX_DIR} GENOME_FASTA=${GENOME_FASTA} TRANSCRIPTOME_FASTA=${TRANSCRIPTOME_FASTA} # Other parameters can be added here, e.g., number of threads, mismatch limits, etc. # For example: # THREADS=8 # MAX_MISMATCHES=2 EOF # Step 1: Build RUM index for mm9 genome and Refseq transcriptome # This step needs to be run once for the given reference files. # rum_index_builder.pl -c rum_config.txt # Note: The above command assumes rum_index_builder.pl is in PATH or RUM_HOME/bin. # For demonstration, we'll assume the index is already built or the command is run separately. # Step 2: Run RUM for alignment and transcript quantitation # This command aligns reads to the mm9 genome and Refseq transcriptome, # and performs transcript quantitation. rum_runner.pl -c rum_config.txt -1 "${READ1}" -2 "${READ2}" -o "${OUTPUT_DIR}" -
4
Differential expression analysis was carried out using a custom script implementing EdgeR software.
$ Bash example
# Install R and Bioconductor if not already present # conda install -c conda-forge r-base # conda install -c bioconda bioconductor-edger # Create a placeholder R script for differential expression analysis using edgeR # This script assumes 'counts.tsv' contains raw gene counts and 'sample_info.tsv' contains sample metadata. cat << 'EOF' > custom_edgeR_script.R #!/usr/bin/env Rscript # Load necessary libraries library(edgeR) library(getopt) # For parsing command-line arguments # Define command-line arguments spec <- matrix(c( 'counts', 'c', 1, "character", "Path to the count matrix (TSV)", 'samples', 's', 1, "character", "Path to the sample information file (TSV)", 'output', 'o', 1, "character", "Path for the output differential expression results (TSV)", 'design_formula', 'd', 1, "character", "R formula for the design matrix (e.g., '~ condition')", 'contrast_coef', 't', 1, "character", "Coefficient or contrast string for testing (e.g., 'conditiontreated' or 'conditiontreated - conditionuntreated')", 'help', 'h', 0, "logical", "Show this help message" ), byrow=TRUE, ncol=5) opt <- getopt(spec) # If help was asked for, print a friendly message if ( !is.null(opt$help) ) { cat(getopt(spec, usage=TRUE)); q(status=1); } # Check for required arguments if ( is.null(opt$counts) || is.null(opt$samples) || is.null(opt$output) || is.null(opt$design_formula) || is.null(opt$contrast_coef) ) { cat("Missing required arguments.\n"); cat(getopt(spec, usage=TRUE)); q(status=1); } counts_file <- opt$counts samples_file <- opt$samples output_file <- opt$output design_formula_str <- opt$design_formula contrast_coef_str <- opt$contrast_coef message(paste("Loading counts from:", counts_file)) message(paste("Loading sample info from:", samples_file)) message(paste("Using design formula:", design_formula_str)) message(paste("Using contrast coefficient/string:", contrast_coef_str)) # Load count data counts <- read.delim(counts_file, row.names = 1, check.names = FALSE) # Load sample information sample_info <- read.delim(samples_file, row.names = 1, check.names = FALSE) # Ensure sample order matches between counts and sample_info sample_info <- sample_info[colnames(counts), , drop = FALSE] # Create DGEList object dge <- DGEList(counts = counts) # Filter out lowly expressed genes (optional, but good practice) # Assuming 'condition' is a column in sample_info for filtering example if ("condition" %in% colnames(sample_info)) { keep <- filterByExpr(dge, group = sample_info$condition) } else { keep <- filterByExpr(dge) } dge <- dge[keep, , keep.lib.sizes = FALSE] # Normalize library sizes dge <- calcNormFactors(dge) # Create design matrix design <- model.matrix(as.formula(design_formula_str), data = sample_info) # Estimate dispersion dge <- estimateDisp(dge, design) # Fit GLM fit <- glmQLFit(dge, design) # Perform differential expression testing # Handle contrast: if it contains '-' or '+', treat as a contrast string for makeContrasts # otherwise, treat as a coefficient name. if (grepl("-", contrast_coef_str) || grepl("\\+", contrast_coef_str)) { message(paste("Interpreting contrast as a comparison string:", contrast_coef_str)) contrast_matrix <- makeContrasts(contrasts = contrast_coef_str, levels = design) lrt <- glmQLFTest(fit, contrast = contrast_matrix) } else { message(paste("Interpreting contrast as a single coefficient name:", contrast_coef_str)) lrt <- glmQLFTest(fit, coef = contrast_coef_str) } # Get results results <- topTags(lrt, n = Inf, sort.by = "PValue")$table # Add adjusted p-value results$FDR <- p.adjust(results$PValue, method = "BH") # Write results to file write.table(results, file = output_file, sep = "\t", quote = FALSE, row.names = TRUE) message(paste("Differential expression results written to:", output_file)) EOF # Create dummy input files for demonstration # Dummy counts.tsv cat << 'EOF' > counts.tsv GeneID Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 GeneA 100 120 110 200 210 190 GeneB 50 60 55 100 110 95 GeneC 200 210 190 100 120 110 GeneD 10 12 11 20 22 19 GeneE 5 6 5 10 11 9 EOF # Dummy sample_info.tsv cat << 'EOF' > sample_info.tsv SampleID condition batch Sample1 untreated batch1 Sample2 untreated batch1 Sample3 untreated batch2 Sample4 treated batch2 Sample5 treated batch3 Sample6 treated batch3 EOF # Execute the custom edgeR script # Example 1: Differential expression between 'treated' and 'untreated' conditions, adjusting for 'batch' # Here, 'conditiontreated' tests the effect of 'treated' vs the reference level of 'condition' (e.g., 'untreated') Rscript custom_edgeR_script.R \ --counts counts.tsv \ --samples sample_info.tsv \ --output differential_expression_results_1.tsv \ --design_formula "~batch + condition" \ --contrast_coef "conditiontreated" # Example 2: If using a design like '~0 + condition' to explicitly model each condition group, # you can specify a direct contrast between groups. Rscript custom_edgeR_script.R \ --counts counts.tsv \ --samples sample_info.tsv \ --output differential_expression_results_2.tsv \ --design_formula "~0 + condition + batch" \ --contrast_coef "conditiontreated - conditionuntreated" # Clean up dummy files (optional) # rm counts.tsv sample_info.tsv custom_edgeR_script.R -
5
ChIP-Seq: Reads were aligned to mm9 genome using bowtie version 0.12.7 with parameters -v 3 -k 1 -m 1 --best --strata.
$ Bash example
# Install Bowtie (example using conda) # conda install -c bioconda bowtie=0.12.7 # Define reference genome index path (placeholder) MM9_INDEX="/path/to/mm9_index" # Replace with actual path to mm9 Bowtie index files (e.g., mm9) # Define input reads file (placeholder) INPUT_READS="reads.fastq" # Replace with actual input FASTQ file # Define output SAM file (placeholder) OUTPUT_SAM="output.sam" # Replace with desired output SAM file name # Align reads to mm9 genome using bowtie bowtie -v 3 -k 1 -m 1 --best --strata "${MM9_INDEX}" "${INPUT_READS}" > "${OUTPUT_SAM}" -
6
Redundant reads were discarded before creating bedgraph profiles.
$ Bash example
# Install sambamba if not already installed # conda install -c bioconda sambamba # Assuming 'aligned_reads.bam' is the input sorted BAM file after alignment. # The '-r' flag ensures that duplicate reads are removed, not just marked. # The '-p' flag specifies the number of threads to use for parallel processing. # Replace 'aligned_reads.bam' with your actual input file and 'deduplicated_reads.bam' with your desired output file name. # Adjust the number of threads (e.g., 8) based on your system's resources. sambamba markdup -r -p 8 aligned_reads.bam deduplicated_reads.bam
-
7
Peak calling was done using HOMER for H3K4me1 and H3K27Ac marks at FDR=0.01 with respect to corresponding Input.
$ Bash example
# Define input BAM files and output directories (placeholders) # Replace with actual paths to your aligned BAM files H3K4ME1_BAM="h3k4me1_rep1.bam" H3K27AC_BAM="h3k27ac_rep1.bam" INPUT_BAM="input_rep1.bam" # Define output tag directories and peak files H3K4ME1_TAG_DIR="h3k4me1_tag_dir" H3K27AC_TAG_DIR="h3k27ac_tag_dir" INPUT_TAG_DIR="input_tag_dir" H3K4ME1_PEAKS_OUTPUT="h3k4me1_peaks.txt" H3K27AC_PEAKS_OUTPUT="h3k27ac_peaks.txt" # Install HOMER (example using conda) # conda create -n homer_env homer # conda activate homer_env # Make tag directories for each sample # Note: HOMER's makeTagDirectory can take a genome for normalization, but it's not strictly required for basic tag directory creation. # If you have a specific genome, you might add -genome <genome_name> (e.g., -genome hg38) makeTagDirectory "${H3K4ME1_TAG_DIR}" "${H3K4ME1_BAM}" makeTagDirectory "${H3K27AC_TAG_DIR}" "${H3K27AC_BAM}" makeTagDirectory "${INPUT_TAG_DIR}" "${INPUT_BAM}" # Perform peak calling for H3K4me1 # -style histone is appropriate for histone marks # -FDR 0.01 is specified in the description # -i specifies the input/control tag directory findPeaks "${H3K4ME1_TAG_DIR}" \ -o "${H3K4ME1_PEAKS_OUTPUT}" \ -i "${INPUT_TAG_DIR}" \ -style histone \ -FDR 0.01 # Perform peak calling for H3K27Ac findPeaks "${H3K27AC_TAG_DIR}" \ -o "${H3K27AC_PEAKS_OUTPUT}" \ -i "${INPUT_TAG_DIR}" \ -style histone \ -FDR 0.01 -
8
Peak calling using STAR was done for H3K27me3 marks at FDR=0.01.
$ Bash example
# Install STAR (if not already installed) # conda install -c bioconda star # Define variables GENOME_DIR="/path/to/STAR_genome_index/hg38" # Placeholder for human genome hg38 index GENOME_FASTA="/path/to/hg38.fa" # Placeholder for human genome hg38 fasta READS_R1="sample_H3K27me3_R1.fastq.gz" READS_R2="sample_H3K27me3_R2.fastq.gz" # Assuming paired-end reads; adjust if single-end OUTPUT_PREFIX="H3K27me3_aligned_STAR_" FDR_THRESHOLD="0.01" # 1. Build STAR genome index (if not already built) # This step is typically done once per genome. Replace paths with actual genome files. # mkdir -p "${GENOME_DIR}" # STAR --runMode genomeGenerate \ # --genomeDir "${GENOME_DIR}" \ # --genomeFastaFiles "${GENOME_FASTA}" \ # --runThreadN 8 # Adjust thread count as needed # 2. Align reads using STAR # While STAR is primarily an RNA-seq aligner, it can be used for DNA-seq (like ChIP-seq) alignment. # This command performs the alignment step, which is a prerequisite for peak calling. STAR --genomeDir "${GENOME_DIR}" \ --readFilesIn "${READS_R1}" "${READS_R2}" \ --runThreadN 8 \ --outFileNamePrefix "${OUTPUT_PREFIX}" \ --outSAMtype BAM SortedByCoordinate \ --outBAMcompression 6 # 3. Peak calling using STAR output with FDR threshold # STAR itself does not perform peak calling. The description "Peak calling using STAR" # implies a custom script or a downstream tool that processes STAR's alignment output # (e.g., Aligned.sortedByCoord.out.bam) and applies peak calling logic with the specified FDR. # This is a conceptual placeholder for such a custom peak calling step, as STAR's primary function is alignment. # Example of a conceptual custom peak calling script using STAR's output: # Assuming 'custom_star_peak_caller.py' is a hypothetical script that takes STAR's BAM output # and performs peak calling with an FDR threshold. # python custom_star_peak_caller.py \ # --input_bam "${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam" \ # --fdr "${FDR_THRESHOLD}" \ # --output_peaks "H3K27me3_peaks_FDR${FDR_THRESHOLD}.bed" # If a standard ChIP-seq peak caller like MACS2 were used on STAR's output (common practice): # macs2 callpeak -t "${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam" \ # -f BAMPE \ # -g hs \ # -n H3K27me3_peaks \ # --outdir . \ # -q "${FDR_THRESHOLD}" # -q for FDR (q-value) threshold -
9
BIS-Seq: DNA sequencing data was aligned to mouse genome (mm9) with the BS Seeker program.
$ Bash example
# Install BS Seeker (original version, typically downloaded from SourceForge) # BS Seeker is a Python script and requires Python to be installed. # It also relies on an external aligner like Bowtie or Bowtie2, which must be installed and in your system's PATH. # Example manual installation: # wget https://sourceforge.net/projects/bsseeker/files/BS_Seeker_v1.0.5.tar.gz # Or the latest available version # tar -xzf BS_Seeker_v1.0.5.tar.gz # cd BS_Seeker_v1.0.5 # Make sure 'BS_Seeker.py' is executable or run with 'python' # Define input and output files input_fastq="dna_sequencing_data.fastq" # Placeholder for input DNA sequencing data (e.g., FASTQ file) output_bam="aligned_bisulfite_reads.bam" # Placeholder for output aligned reads (BAM file) # Define reference genome (mm9) # Download the mouse genome (mm9) FASTA file if not already available. # A common source is UCSC Genome Browser. # Example download command: # wget -P /path/to/genomes/mm9/ http://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/mm9.fa.gz # gunzip /path/to/genomes/mm9/mm9.fa.gz mm9_genome_fasta="/path/to/genomes/mm9/mm9.fa" # Align DNA sequencing data to the mouse genome (mm9) using BS Seeker # BS Seeker will automatically build a Bowtie/Bowtie2 index for the genome if it doesn't exist. # Ensure 'BS_Seeker.py' is in your PATH or specify its full path. python BS_Seeker.py \ -i "${input_fastq}" \ -g "${mm9_genome_fasta}" \ -o "${output_bam}" \ --aligner=bowtie # Specify the aligner to use (e.g., bowtie or bowtie2) # Additional parameters can be added as needed, e.g., -p for number of threads, -m for maximum mismatches, etc. -
10
Data was reported as the fraction of reads that were methylated, calculated as the ratio of cytosine bases (indicating methylated CpGs) to total reads for each CpG coordinate.
methylDackel (Inferred with models/gemini-2.5-flash) vv0.6.1 (Inferred with models/gemini-2.5-flash) GitHub$ Bash example
# Install methylDackel (if not already installed) # conda install -c bioconda methyldackel # Define input and output files # Replace 'path/to/your/reference_genome.fa' with the actual path to your reference genome FASTA file (e.g., hg38.fa) REFERENCE_GENOME="path/to/your/reference_genome.fa" INPUT_BAM="sample.bam" OUTPUT_PREFIX="sample_methylation" # Extract methylation information for CpG sites from a bisulfite-sequenced BAM file. # This command will generate two output files: # 1. ${OUTPUT_PREFIX}_CpG.bedGraph: A bedGraph file containing methylation percentages for each CpG site. # 2. ${OUTPUT_PREFIX}_CpG.tsv: A tab-separated file with detailed CpG methylation calls. methylDackel extract \ -g "${REFERENCE_GENOME}" \ "${INPUT_BAM}" \ -o "${OUTPUT_PREFIX}" -
11
Unmethylated cytosines are ultimately converted to thymine bases during the bisulfite conversion process.
$ Bash example
# Install Bismark (example using conda) # conda install -c bioconda bismark # Define variables # Replace /path/to/bisulfite_indexed_genome/hg38 with the actual path to your Bismark-indexed reference genome directory. # To create a Bismark-indexed genome, run: bismark_genome_preparation --path_to_bowtie2 /path/to/bowtie2_exec /path/to/reference_fasta_dir REFERENCE_GENOME_DIR="/path/to/bisulfite_indexed_genome/hg38" READ1="sample_R1.fastq.gz" READ2="sample_R2.fastq.gz" OUTPUT_DIR="bismark_alignment_output" SAMPLE_NAME="sample_name" # Create output directory if it doesn't exist mkdir -p "${OUTPUT_DIR}" # Align bisulfite-converted paired-end reads using Bismark with Bowtie2 # This command performs alignment and generates a BAM file. # Further steps would involve methylation extraction using bismark_methylation_extractor. bismark \ --bowtie2 \ --temp_dir "${OUTPUT_DIR}/temp" \ --output_dir "${OUTPUT_DIR}" \ --basename "${SAMPLE_NAME}" \ "${REFERENCE_GENOME_DIR}" \ -1 "${READ1}" \ -2 "${READ2}"
Raw Source Text
Illumina Casava-1.8.2 software was used for basecalling. RNA-Seq: Low quality reads as well as ribosomal and other repeat sequences were filtered out before alignment. RUM software was used for aligning reads to mouse genome mm9 and Refseq transcriptome and for transcript quantitation. Differential expression analysis was carried out using a custom script implementing EdgeR software. ChIP-Seq: Reads were aligned to mm9 genome using bowtie version 0.12.7 with parameters -v 3 -k 1 -m 1 --best --strata. Redundant reads were discarded before creating bedgraph profiles. Peak calling was done using HOMER for H3K4me1 and H3K27Ac marks at FDR=0.01 with respect to corresponding Input. Peak calling using STAR was done for H3K27me3 marks at FDR=0.01. BIS-Seq: DNA sequencing data was aligned to mouse genome (mm9) with the BS Seeker program. Data was reported as the fraction of reads that were methylated, calculated as the ratio of cytosine bases (indicating methylated CpGs) to total reads for each CpG coordinate. Unmethylated cytosines are ultimately converted to thymine bases during the bisulfite conversion process. Genome_build: mm9 Supplementary_files_format_and_content: Tab delimited file of transcript raw read counts. Supplementary_files_format_and_content: Tab delimited file of transcript FPKM counts. Supplementary_files_format_and_content: Peak files are tab-delimited listing genomic coordinates and their respective scores and p-values. Scores represent fold-change with respect to Input. Supplementary_files_format_and_content: Tab-delimited files of loci, fraction of methylated cytosines and total read count. Supplementary_files_format_and_content: RawReadCounts.txt: Raw transcript counts Supplementary_files_format_and_content: TrancriptFPKMCounts.txt: Transcript FPKM counts Supplementary_files_format_and_content: Old_H3K4me1.peaks.txt: peak Supplementary_files_format_and_content: Old_H3K27me3.peaks.txt: peak Supplementary_files_format_and_content: Young_H3K4me1.peaks.txt: peak Supplementary_files_format_and_content: Young_H3K27me3.peaks.txt: peak Supplementary_files_format_and_content: Young_H3K27Ac.peaks.txt: peak Supplementary_files_format_and_content: OldBeta_bs_seeker-CG.tab: Percent methylated cytosines Supplementary_files_format_and_content: YoungBeta_bs_seeker-CG.tab: Percent methylated cytosines