GSE112439 Processing Pipeline

Bisulfite-Seq · 4 steps

Publication

The RNA Helicase DDX6 Controls Cellular Plasticity by Modulating P-Body Homeostasis.

Cell Stem Cell (2019), PMID 31588046

Dataset

GSE112439

The RNA helicase DDX6 regulates self-renewal and differentiation of human and mouse stem cells [RRBS]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Preprocessing: the first 10 bp of each read were trimmed to remove possible adapter contamination.

    cutadapt (Inferred with models/gemini-2.5-flash) v1.18
    $ Bash example
    # Install cutadapt (if not already installed)
    # conda install -c bioconda cutadapt=1.18
    
    # Define input and output file paths
    INPUT_FASTQ="input.fastq.gz"
    OUTPUT_FASTQ="output.fastq.gz"
    
    # Trim 10bp from the beginning (5' end) of each read
    cutadapt -u 10 -o "${OUTPUT_FASTQ}" "${INPUT_FASTQ}"
  2. Alignment: Reads were aligned using Bsmap with the following flags: -v 0.05 -s 16 -w 100 -S 1 -p 8 -

    Bsmap
    $ Bash example
    # Install Bsmap (example using conda)
    # conda install -c bioconda bsmap
    
    # Placeholder paths; replace them with your own files.
    # The source text lists UCSC hg19 as the genome build.
    # The source does not say whether reads are single- or paired-end;
    # for paired-end data, pass the second mate file with -b.
    REFERENCE_GENOME="path/to/hg19.fa"
    READS="path/to/reads_trimmed.fastq.gz"
    OUTPUT_BAM="aligned.bam"
    
    # Align reads using Bsmap with the flags given in the source text.
    # The flag list there ends in a dangling "-", so further options may be missing.
    # Bsmap takes the read file via -a, not as a positional argument.
    bsmap -a "${READS}" -d "${REFERENCE_GENOME}" -o "${OUTPUT_BAM}" -v 0.05 -s 16 -w 100 -S 1 -p 8
  3. Methylation calling: CpGs in the reference sequence were compared to the sequence in the aligned read.

    MethylDackel (Inferred with models/gemini-2.5-flash) v0.9.1
    $ Bash example
    # Install MethylDackel (example using conda)
    # conda install -c bioconda methyldackel
    
    # Placeholder for the reference genome; the source text lists UCSC hg19
    # Download the hg19 reference genome if not available
    # wget -O hg19.fa.gz http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
    # gunzip hg19.fa.gz
    # samtools faidx hg19.fa # Index the reference for MethylDackel
    
    REFERENCE_GENOME="hg19.fa"
    ALIGNED_READS="aligned_reads.bam" # Must be coordinate-sorted and indexed
    OUTPUT_PREFIX="methylation_calls"
    
    # Extract methylation information from aligned reads
    # This command writes ${OUTPUT_PREFIX}_CpG.bedGraph with per-CpG methylation
    MethylDackel extract \
        -o "${OUTPUT_PREFIX}" \
        "${REFERENCE_GENOME}" \
        "${ALIGNED_READS}"
  4. If bisulfite conversion (signifying an unmethylated cytosine) was detected, the read was added to the unmethylated count for that CpG; otherwise, it was added to the methylated count.

    MethylDackel (Inferred with models/gemini-2.5-flash) v0.6.0
    $ Bash example
    # Install MethylDackel (e.g., via conda)
    # conda install -c bioconda methyldackel
    
    # Define variables
    REFERENCE_GENOME="/path/to/hg19.fa" # Placeholder for reference genome (the source text lists UCSC hg19)
    INPUT_BAM="aligned_bisulfite_reads.bam" # Input BAM file containing bisulfite-converted aligned reads
    OUTPUT_PREFIX="methylation_calls" # Prefix for output files
    
    # Execute MethylDackel to extract methylation calls at CpG sites.
    # This command processes bisulfite-converted reads from the BAM file
    # and outputs per-CpG methylation information in a methylKit-compatible format.
    # It counts reads as methylated or unmethylated based on bisulfite conversion status
    # (C to T conversion indicates unmethylated, C remaining C indicates methylated).
    MethylDackel extract \
        --methylKit \
        -o "${OUTPUT_PREFIX}" \
        "${REFERENCE_GENOME}" \
        "${INPUT_BAM}"
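
The counting rule described in steps 3 and 4 can be illustrated with a toy sketch. This is not how MethylDackel or the authors' pipeline is implemented. It assumes a hypothetical three-column input (chromosome, position of the reference CpG cytosine, base observed in a forward-strand read) and a hypothetical helper named count_cpg_calls. A read base of T is counted as unmethylated (bisulfite-converted) and C as methylated; skipping any other base is an assumption of this sketch, not something the source states.

```shell
# Toy illustration of per-CpG counting (hypothetical input format, forward strand only).
# Input columns: chrom, position of the reference CpG cytosine, base seen in the read.
# Output columns: chrom, position, methylated count, unmethylated count.
count_cpg_calls() {
    awk 'BEGIN { OFS = "\t" }
    {
        key = $1 OFS $2
        if (!(key in seen)) { seen[key] = 1; order[++n] = key }
        if ($3 == "T")      unmeth[key]++   # C read as T: converted, so unmethylated
        else if ($3 == "C") meth[key]++     # C retained: methylated
        # any other base is skipped (assumption: treated as a mismatch)
    }
    END {
        # report CpGs in the order they were first seen
        for (i = 1; i <= n; i++) {
            k = order[i]
            print k, meth[k] + 0, unmeth[k] + 0
        }
    }'
}

# Usage: three reads cover chr1:100, one read covers chr1:250
printf 'chr1 100 C\nchr1 100 T\nchr1 100 C\nchr1 250 T\n' | count_cpg_calls
```

For the usage line, chr1:100 ends up with 2 methylated and 1 unmethylated read, and chr1:250 with 0 methylated and 1 unmethylated read.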
Raw Source Text
Preprocessing: 10bp were trimmed from the beginning of each read to remove possible adapter contamination
Alignment: Reads were aligned using Bsmap with the following flags: -v 0.05 -s 16 -w 100 -S 1 -p 8 -
Methylation calling: CpGs in the reference sequence were compared to the sequence in the aligned read. If bisulfite conversion (signifying an unmethylated cytosine) was detected, the read added to the unmethylated count for that CpG, otherwise, it added to the methylated count.
Genome_build: Homo sapiens UCSC hg19
Supplementary_files_format_and_content: Bed files contain the number of methylated and total reads for CpGs covered by RRBS with the columns: chromosome, start, end, number of reads methylated divided by number of reads seen, methylation percent*1000
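
As a starting point for reading the supplementary Bed files, the sketch below splits the fourth column into separate counts. It assumes that this column is written literally as methylated/total (for example 3/10, possibly wrapped in quotes), which the source text suggests but does not confirm; inspect a real file before relying on it. The helper name bed_to_counts is hypothetical.

```shell
# Sketch: split an assumed "methylated/total" fourth column into two count columns.
# Assumed input columns: chrom, start, end, methylated/total, score (fifth column, passed through).
# Output columns: chrom, start, end, methylated reads, total reads, score.
bed_to_counts() {
    awk 'BEGIN { OFS = "\t" }
    {
        ratio = $4
        gsub("[^0-9/]", "", ratio)                  # drop quotes or other decoration
        if (split(ratio, parts, "/") != 2) next     # skip rows that do not match the assumed layout
        if (parts[2] == 0) next                     # skip rows with zero coverage
        print $1, $2, $3, parts[1], parts[2], $5
    }' "$@"
}

# Usage with a single made-up row
printf "chr1\t1000\t1001\t'3/10'\t300\n" | bed_to_counts
```

The function reads from standard input or from file names given as arguments, so it can also be called as bed_to_counts sample.bed (file name hypothetical).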