GSE68618 Processing Pipeline

ChIP-Seq code_examples 11 steps

Publication

Pseudotemporal Ordering of Single Cells Reveals Metabolic Control of Postnatal β Cell Proliferation.

Cell metabolism (2017) — PMID 28467932

Dataset

GSE68618

Aging-dependent demethylation of regulatory elements correlates with chromatin state and improved insulin secretion by pancreatic β cells

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Illumina Casava-1.8.2 software was used for basecalling.

    Illumina Casava v1.8.2 GitHub
    $ Bash example
    # Illumina Casava 1.8.2 is typically installed as part of the Illumina sequencing system software or as a standalone package.
    # Installation instructions are highly dependent on the specific Illumina system and OS.
    # Example (installation path may vary):
    # wget https://support.illumina.com/content/dam/illumina-support/documents/downloads/software/casava/casava_1_8_2_installer.run
    # chmod +x casava_1_8_2_installer.run
    # ./casava_1_8_2_installer.run
    
    # Assuming the run folder is named '123456_A00001_0001_AHXXXXXX' and output directory is 'fastq_output'
    # Replace with actual paths and SampleSheet.csv
    RUN_FOLDER="/path/to/your/Illumina/RunFolder" # e.g., /data/runs/123456_A00001_0001_AHXXXXXX
    OUTPUT_DIR="fastq_output"
    SAMPLE_SHEET="/path/to/SampleSheet.csv" # Required for demultiplexing
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Step 1: Configure Bcl to Fastq conversion and demultiplexing
    # This command generates a Makefile in the OUTPUT_DIR based on the run folder and sample sheet.
    # Parameters like --fastq-cluster-count, --mismatches, --no-aligned-reads are common defaults.
    configureBclToFastq.pl \
        --input-dir "${RUN_FOLDER}" \
        --output-dir "${OUTPUT_DIR}" \
        --sample-sheet "${SAMPLE_SHEET}" \
        --fastq-cluster-count 0 \
        --mismatches 1 \
        --no-aligned-reads
    
    # Step 2: Execute the Makefile to perform basecalling and demultiplexing
    # This will run the actual conversion of BCL files to FASTQ files.
    make -C "${OUTPUT_DIR}"
  2. 2

    RNA-Seq: Low quality reads as well as ribosomal and other repeat sequences were filtered out before alignment.

    RNA-seq vlatest stable GitHub
    $ Bash example
    # Install BBMap suite (contains BBDuk)
    # conda install -c bioconda bbmap
    
    # Define input and output files
    INPUT_FASTQ="input_reads.fastq.gz"
    OUTPUT_FASTQ="filtered_reads.fastq.gz"
    
    # Define reference files for ribosomal and repeat sequences.
    # These files need to be prepared or downloaded. Examples:
    # - human_rRNA.fa: Can be obtained from NCBI RefSeq or specialized rRNA databases (e.g., SILVA).
    # - human_repeats.fa: Can be obtained from RepeatMasker database or UCSC Genome Browser.
    # For demonstration, we assume these files exist in the current directory or are specified by their full path.
    
    # Placeholder reference files (replace with actual paths to your rRNA and repeat FASTA files)
    RR_REFERENCE="path/to/human_rRNA.fa,path/to/human_repeats.fa"
    
    # Run BBDuk to filter low quality reads and ribosomal/repeat sequences
    bbduk.sh \
      in="${INPUT_FASTQ}" \
      out="${OUTPUT_FASTQ}" \
      ref="${RR_REFERENCE}" \
      k=31 hdist=1 \
      qtrim=rl trimq=20 \
      minlen=25 \
      stats="${OUTPUT_FASTQ%.fastq.gz}_bbduk_stats.txt" \
      overwrite=true
    
  3. 3

    RUM software was used for aligning reads to mouse genome mm9 and Refseq transcriptome and for transcript quantitation.

    RefSeq vNot specified
    $ Bash example
    # Install RUM (RNA-seq Unified Mapper) - Example, actual installation may vary.
    # RUM is typically downloaded as a tarball and run via Perl scripts.
    # Example:
    # wget http://cgr.wustl.edu/rum/RUM_v1.11.tar.gz
    # tar -xzf RUM_v1.11.tar.gz
    # export PATH=$PATH:$(pwd)/RUM_v1.11/bin
    
    # Define input and output paths
    READ1="input_reads_R1.fastq.gz"
    READ2="input_reads_R2.fastq.gz" # Assuming paired-end reads for typical RNA-seq
    OUTPUT_DIR="rum_output"
    GENOME_FASTA="mm9.fa"
    TRANSCRIPTOME_FASTA="RefSeq_mm9_transcripts.fa" # RUM typically uses transcriptome FASTA for indexing
    RUM_INDEX_DIR="rum_index_mm9_refseq"
    
    # Create dummy input files and directories for demonstration
    mkdir -p "${OUTPUT_DIR}"
    mkdir -p "${RUM_INDEX_DIR}"
    # touch "${READ1}" "${READ2}" # In a real scenario, these would be actual read files
    
    # Placeholder for reference genome and transcriptome files
    # Download mm9 genome from UCSC (example source)
    # wget -P . https://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz
    # tar -xzf chromFa.tar.gz
    # cat chr*.fa > "${GENOME_FASTA}"
    # rm chr*.fa chromFa.tar.gz
    
    # Download RefSeq transcriptome for mm9 (example source, actual file might need generation from annotations)
    # This is a placeholder; actual RefSeq transcriptome for mm9 might need to be generated
    # from RefSeq annotations (e.g., GTF/GFF) and the genome FASTA. RUM expects a FASTA file of transcript sequences.
    # Example (general mouse RefSeq mRNA, not specific to mm9 assembly):
    # wget -P . https://ftp.ncbi.nlm.nih.gov/refseq/M_musculus/mRNA_FASTA/mouse.1.rna.fna.gz
    # gunzip mouse.1.rna.fna.gz
    # mv mouse.1.rna.fna "${TRANSCRIPTOME_FASTA}"
    
    # Create a minimal RUM configuration file (rum_config.txt)
    # This file specifies paths to reference files and other parameters required by RUM.
    cat <<EOF > rum_config.txt
    # RUM Configuration File
    # General settings
    RUM_HOME=$(pwd)/RUM_v1.11 # Adjust if RUM is installed elsewhere
    OUTPUT_DIR=${OUTPUT_DIR}
    INDEX_DIR=${RUM_INDEX_DIR}
    GENOME_FASTA=${GENOME_FASTA}
    TRANSCRIPTOME_FASTA=${TRANSCRIPTOME_FASTA}
    # Other parameters can be added here, e.g., number of threads, mismatch limits, etc.
    # For example:
    # THREADS=8
    # MAX_MISMATCHES=2
    EOF
    
    # Step 1: Build RUM index for mm9 genome and Refseq transcriptome
    # This step needs to be run once for the given reference files.
    # rum_index_builder.pl -c rum_config.txt
    # Note: The above command assumes rum_index_builder.pl is in PATH or RUM_HOME/bin.
    # For demonstration, we'll assume the index is already built or the command is run separately.
    
    # Step 2: Run RUM for alignment and transcript quantitation
    # This command aligns reads to the mm9 genome and Refseq transcriptome,
    # and performs transcript quantitation.
    rum_runner.pl -c rum_config.txt -1 "${READ1}" -2 "${READ2}" -o "${OUTPUT_DIR}"
    
  4. 4

    Differential expression analysis was carried out using a custom script implementing EdgeR software.

    edgeR v3.42.0 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install R and Bioconductor if not already present
    # conda install -c conda-forge r-base
    # conda install -c bioconda bioconductor-edger
    
    # Create a placeholder R script for differential expression analysis using edgeR
    # This script assumes 'counts.tsv' contains raw gene counts and 'sample_info.tsv' contains sample metadata.
    cat << 'EOF' > custom_edgeR_script.R
    #!/usr/bin/env Rscript
    
    # Load necessary libraries
    library(edgeR)
    library(getopt) # For parsing command-line arguments
    
    # Define command-line arguments
    spec <- matrix(c(
      'counts',    'c', 1, "character", "Path to the count matrix (TSV)",
      'samples',   's', 1, "character", "Path to the sample information file (TSV)",
      'output',    'o', 1, "character", "Path for the output differential expression results (TSV)",
      'design_formula', 'd', 1, "character", "R formula for the design matrix (e.g., '~ condition')",
      'contrast_coef', 't', 1, "character", "Coefficient or contrast string for testing (e.g., 'conditiontreated' or 'conditiontreated - conditionuntreated')",
      'help',      'h', 0, "logical",   "Show this help message"
    ), byrow=TRUE, ncol=5)
    
    opt <- getopt(spec)
    
    # If help was asked for, print a friendly message
    if ( !is.null(opt$help) ) {
      cat(getopt(spec, usage=TRUE));
      q(status=1);
    }
    
    # Check for required arguments
    if ( is.null(opt$counts) || is.null(opt$samples) || is.null(opt$output) || is.null(opt$design_formula) || is.null(opt$contrast_coef) ) {
      cat("Missing required arguments.\n");
      cat(getopt(spec, usage=TRUE));
      q(status=1);
    }
    
    counts_file <- opt$counts
    samples_file <- opt$samples
    output_file <- opt$output
    design_formula_str <- opt$design_formula
    contrast_coef_str <- opt$contrast_coef
    
    message(paste("Loading counts from:", counts_file))
    message(paste("Loading sample info from:", samples_file))
    message(paste("Using design formula:", design_formula_str))
    message(paste("Using contrast coefficient/string:", contrast_coef_str))
    
    # Load count data
    counts <- read.delim(counts_file, row.names = 1, check.names = FALSE)
    
    # Load sample information
    sample_info <- read.delim(samples_file, row.names = 1, check.names = FALSE)
    
    # Ensure sample order matches between counts and sample_info
    sample_info <- sample_info[colnames(counts), , drop = FALSE]
    
    # Create DGEList object
    dge <- DGEList(counts = counts)
    
    # Filter out lowly expressed genes (optional, but good practice)
    # Assuming 'condition' is a column in sample_info for filtering example
    if ("condition" %in% colnames(sample_info)) {
      keep <- filterByExpr(dge, group = sample_info$condition)
    } else {
      keep <- filterByExpr(dge)
    }
    dge <- dge[keep, , keep.lib.sizes = FALSE]
    
    # Normalize library sizes
    dge <- calcNormFactors(dge)
    
    # Create design matrix
    design <- model.matrix(as.formula(design_formula_str), data = sample_info)
    
    # Estimate dispersion
    dge <- estimateDisp(dge, design)
    
    # Fit GLM
    fit <- glmQLFit(dge, design)
    
    # Perform differential expression testing
    # Handle contrast: if it contains '-' or '+', treat as a contrast string for makeContrasts
    # otherwise, treat as a coefficient name.
    if (grepl("-", contrast_coef_str) || grepl("\\+", contrast_coef_str)) {
      message(paste("Interpreting contrast as a comparison string:", contrast_coef_str))
      contrast_matrix <- makeContrasts(contrasts = contrast_coef_str, levels = design)
      lrt <- glmQLFTest(fit, contrast = contrast_matrix)
    } else {
      message(paste("Interpreting contrast as a single coefficient name:", contrast_coef_str))
      lrt <- glmQLFTest(fit, coef = contrast_coef_str)
    }
    
    # Get results
    results <- topTags(lrt, n = Inf, sort.by = "PValue")$table
    
    # Add adjusted p-value
    results$FDR <- p.adjust(results$PValue, method = "BH")
    
    # Write results to file
    write.table(results, file = output_file, sep = "\t", quote = FALSE, row.names = TRUE)
    
    message(paste("Differential expression results written to:", output_file))
    
    EOF
    
    # Create dummy input files for demonstration
    # Dummy counts.tsv
    cat << 'EOF' > counts.tsv
    GeneID  Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
    GeneA   100     120     110     200     210     190
    GeneB   50      60      55      100     110     95
    GeneC   200     210     190     100     120     110
    GeneD   10      12      11      20      22      19
    GeneE   5       6       5       10      11      9
    EOF
    
    # Dummy sample_info.tsv
    cat << 'EOF' > sample_info.tsv
    SampleID        condition       batch
    Sample1         untreated       batch1
    Sample2         untreated       batch1
    Sample3         untreated       batch2
    Sample4         treated         batch2
    Sample5         treated         batch3
    Sample6         treated         batch3
    EOF
    
    # Execute the custom edgeR script
    # Example 1: Differential expression between 'treated' and 'untreated' conditions, adjusting for 'batch'
    # Here, 'conditiontreated' tests the effect of 'treated' vs the reference level of 'condition' (e.g., 'untreated')
    Rscript custom_edgeR_script.R \
      --counts counts.tsv \
      --samples sample_info.tsv \
      --output differential_expression_results_1.tsv \
      --design_formula "~batch + condition" \
      --contrast_coef "conditiontreated"
    
    # Example 2: If using a design like '~0 + condition' to explicitly model each condition group,
    # you can specify a direct contrast between groups.
    Rscript custom_edgeR_script.R \
      --counts counts.tsv \
      --samples sample_info.tsv \
      --output differential_expression_results_2.tsv \
      --design_formula "~0 + condition + batch" \
      --contrast_coef "conditiontreated - conditionuntreated"
    
    # Clean up dummy files (optional)
    # rm counts.tsv sample_info.tsv custom_edgeR_script.R
  5. 5

    ChIP-Seq: Reads were aligned to mm9 genome using bowtie version 0.12.7 with parameters -v 3 -k 1 -m 1 --best --strata.

    Bowtie v0.12.7 GitHub
    $ Bash example
    # Install Bowtie (example using conda)
    # conda install -c bioconda bowtie=0.12.7
    
    # Define reference genome index path (placeholder)
    MM9_INDEX="/path/to/mm9_index" # Replace with actual path to mm9 Bowtie index files (e.g., mm9)
    
    # Define input reads file (placeholder)
    INPUT_READS="reads.fastq" # Replace with actual input FASTQ file
    
    # Define output SAM file (placeholder)
    OUTPUT_SAM="output.sam" # Replace with desired output SAM file name
    
    # Align reads to mm9 genome using bowtie
    bowtie -v 3 -k 1 -m 1 --best --strata "${MM9_INDEX}" "${INPUT_READS}" > "${OUTPUT_SAM}"
  6. 6

    Redundant reads were discarded before creating bedgraph profiles.

    sambamba (Inferred with models/gemini-2.5-flash) v0.8.2 GitHub
    $ Bash example
    # Install sambamba if not already installed
    # conda install -c bioconda sambamba
    
    # Assuming 'aligned_reads.bam' is the input sorted BAM file after alignment.
    # The '-r' flag ensures that duplicate reads are removed, not just marked.
    # The '-p' flag specifies the number of threads to use for parallel processing.
    # Replace 'aligned_reads.bam' with your actual input file and 'deduplicated_reads.bam' with your desired output file name.
    # Adjust the number of threads (e.g., 8) based on your system's resources.
    sambamba markdup -r -p 8 aligned_reads.bam deduplicated_reads.bam
  7. 7

    Peak calling was done using HOMER for H3K4me1 and H3K27Ac marks at FDR=0.01 with respect to corresponding Input.

    HOMER vNot explicitly inferable from description GitHub
    $ Bash example
    # Define input BAM files and output directories (placeholders)
    # Replace with actual paths to your aligned BAM files
    H3K4ME1_BAM="h3k4me1_rep1.bam"
    H3K27AC_BAM="h3k27ac_rep1.bam"
    INPUT_BAM="input_rep1.bam"
    
    # Define output tag directories and peak files
    H3K4ME1_TAG_DIR="h3k4me1_tag_dir"
    H3K27AC_TAG_DIR="h3k27ac_tag_dir"
    INPUT_TAG_DIR="input_tag_dir"
    
    H3K4ME1_PEAKS_OUTPUT="h3k4me1_peaks.txt"
    H3K27AC_PEAKS_OUTPUT="h3k27ac_peaks.txt"
    
    # Install HOMER (example using conda)
    # conda create -n homer_env homer
    # conda activate homer_env
    
    # Make tag directories for each sample
    # Note: HOMER's makeTagDirectory can take a genome for normalization, but it's not strictly required for basic tag directory creation.
    # If you have a specific genome, you might add -genome <genome_name> (e.g., -genome hg38)
    makeTagDirectory "${H3K4ME1_TAG_DIR}" "${H3K4ME1_BAM}"
    makeTagDirectory "${H3K27AC_TAG_DIR}" "${H3K27AC_BAM}"
    makeTagDirectory "${INPUT_TAG_DIR}" "${INPUT_BAM}"
    
    # Perform peak calling for H3K4me1
    # -style histone is appropriate for histone marks
    # -FDR 0.01 is specified in the description
    # -i specifies the input/control tag directory
    findPeaks "${H3K4ME1_TAG_DIR}" \
        -o "${H3K4ME1_PEAKS_OUTPUT}" \
        -i "${INPUT_TAG_DIR}" \
        -style histone \
        -FDR 0.01
    
    # Perform peak calling for H3K27Ac
    findPeaks "${H3K27AC_TAG_DIR}" \
        -o "${H3K27AC_PEAKS_OUTPUT}" \
        -i "${INPUT_TAG_DIR}" \
        -style histone \
        -FDR 0.01
  8. 8

    Peak calling using STAR was done for H3K27me3 marks at FDR=0.01.

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables
    GENOME_DIR="/path/to/STAR_genome_index/hg38" # Placeholder for human genome hg38 index
    GENOME_FASTA="/path/to/hg38.fa" # Placeholder for human genome hg38 fasta
    READS_R1="sample_H3K27me3_R1.fastq.gz"
    READS_R2="sample_H3K27me3_R2.fastq.gz" # Assuming paired-end reads; adjust if single-end
    OUTPUT_PREFIX="H3K27me3_aligned_STAR_"
    FDR_THRESHOLD="0.01"
    
    # 1. Build STAR genome index (if not already built)
    # This step is typically done once per genome. Replace paths with actual genome files.
    # mkdir -p "${GENOME_DIR}"
    # STAR --runMode genomeGenerate \
    #      --genomeDir "${GENOME_DIR}" \
    #      --genomeFastaFiles "${GENOME_FASTA}" \
    #      --runThreadN 8 # Adjust thread count as needed
    
    # 2. Align reads using STAR
    # While STAR is primarily an RNA-seq aligner, it can be used for DNA-seq (like ChIP-seq) alignment.
    # This command performs the alignment step, which is a prerequisite for peak calling.
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READS_R1}" "${READS_R2}" \
         --runThreadN 8 \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outSAMtype BAM SortedByCoordinate \
         --outBAMcompression 6
    
    # 3. Peak calling using STAR output with FDR threshold
    # STAR itself does not perform peak calling. The description "Peak calling using STAR"
    # implies a custom script or a downstream tool that processes STAR's alignment output
    # (e.g., Aligned.sortedByCoord.out.bam) and applies peak calling logic with the specified FDR.
    # This is a conceptual placeholder for such a custom peak calling step, as STAR's primary function is alignment.
    
    # Example of a conceptual custom peak calling script using STAR's output:
    # Assuming 'custom_star_peak_caller.py' is a hypothetical script that takes STAR's BAM output
    # and performs peak calling with an FDR threshold.
    # python custom_star_peak_caller.py \
    #        --input_bam "${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam" \
    #        --fdr "${FDR_THRESHOLD}" \
    #        --output_peaks "H3K27me3_peaks_FDR${FDR_THRESHOLD}.bed"
    
    # If a standard ChIP-seq peak caller like MACS2 were used on STAR's output (common practice):
    # macs2 callpeak -t "${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam" \
    #                -f BAMPE \
    #                -g hs \
    #                -n H3K27me3_peaks \
    #                --outdir . \
    #                -q "${FDR_THRESHOLD}" # -q for FDR (q-value) threshold
    
  9. 9

    BIS-Seq: DNA sequencing data was aligned to mouse genome (mm9) with the BS Seeker program.

    BS Seeker vUnknown GitHub
    $ Bash example
    # Install BS Seeker (original version, typically downloaded from SourceForge)
    # BS Seeker is a Python script and requires Python to be installed.
    # It also relies on an external aligner like Bowtie or Bowtie2, which must be installed and in your system's PATH.
    # Example manual installation:
    # wget https://sourceforge.net/projects/bsseeker/files/BS_Seeker_v1.0.5.tar.gz # Or the latest available version
    # tar -xzf BS_Seeker_v1.0.5.tar.gz
    # cd BS_Seeker_v1.0.5
    # Make sure 'BS_Seeker.py' is executable or run with 'python'
    
    # Define input and output files
    input_fastq="dna_sequencing_data.fastq" # Placeholder for input DNA sequencing data (e.g., FASTQ file)
    output_bam="aligned_bisulfite_reads.bam" # Placeholder for output aligned reads (BAM file)
    
    # Define reference genome (mm9)
    # Download the mouse genome (mm9) FASTA file if not already available.
    # A common source is UCSC Genome Browser.
    # Example download command:
    # wget -P /path/to/genomes/mm9/ http://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/mm9.fa.gz
    # gunzip /path/to/genomes/mm9/mm9.fa.gz
    mm9_genome_fasta="/path/to/genomes/mm9/mm9.fa"
    
    # Align DNA sequencing data to the mouse genome (mm9) using BS Seeker
    # BS Seeker will automatically build a Bowtie/Bowtie2 index for the genome if it doesn't exist.
    # Ensure 'BS_Seeker.py' is in your PATH or specify its full path.
    python BS_Seeker.py \
        -i "${input_fastq}" \
        -g "${mm9_genome_fasta}" \
        -o "${output_bam}" \
        --aligner=bowtie # Specify the aligner to use (e.g., bowtie or bowtie2)
        # Additional parameters can be added as needed, e.g., -p for number of threads, -m for maximum mismatches, etc.
    
  10. 10

    Data was reported as the fraction of reads that were methylated, calculated as the ratio of cytosine bases (indicating methylated CpGs) to total reads for each CpG coordinate.

    methylDackel (Inferred with models/gemini-2.5-flash) vv0.6.1 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install methylDackel (if not already installed)
    # conda install -c bioconda methyldackel
    
    # Define input and output files
    # Replace 'path/to/your/reference_genome.fa' with the actual path to your reference genome FASTA file (e.g., hg38.fa)
    REFERENCE_GENOME="path/to/your/reference_genome.fa"
    INPUT_BAM="sample.bam"
    OUTPUT_PREFIX="sample_methylation"
    
    # Extract methylation information for CpG sites from a bisulfite-sequenced BAM file.
    # This command will generate two output files:
    # 1. ${OUTPUT_PREFIX}_CpG.bedGraph: A bedGraph file containing methylation percentages for each CpG site.
    # 2. ${OUTPUT_PREFIX}_CpG.tsv: A tab-separated file with detailed CpG methylation calls.
    methylDackel extract \
        -g "${REFERENCE_GENOME}" \
        "${INPUT_BAM}" \
        -o "${OUTPUT_PREFIX}"
  11. 11

    Unmethylated cytosines are ultimately converted to thymine bases during the bisulfite conversion process.

    Bismark (Inferred with models/gemini-2.5-flash) v(Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install Bismark (example using conda)
    # conda install -c bioconda bismark
    
    # Define variables
    # Replace /path/to/bisulfite_indexed_genome/hg38 with the actual path to your Bismark-indexed reference genome directory.
    # To create a Bismark-indexed genome, run: bismark_genome_preparation --path_to_bowtie2 /path/to/bowtie2_exec /path/to/reference_fasta_dir
    REFERENCE_GENOME_DIR="/path/to/bisulfite_indexed_genome/hg38"
    READ1="sample_R1.fastq.gz"
    READ2="sample_R2.fastq.gz"
    OUTPUT_DIR="bismark_alignment_output"
    SAMPLE_NAME="sample_name"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Align bisulfite-converted paired-end reads using Bismark with Bowtie2
    # This command performs alignment and generates a BAM file.
    # Further steps would involve methylation extraction using bismark_methylation_extractor.
    bismark \
        --bowtie2 \
        --temp_dir "${OUTPUT_DIR}/temp" \
        --output_dir "${OUTPUT_DIR}" \
        --basename "${SAMPLE_NAME}" \
        "${REFERENCE_GENOME_DIR}" \
        -1 "${READ1}" \
        -2 "${READ2}"
    

Tools Used

Raw Source Text
Illumina Casava-1.8.2 software was used for basecalling.
RNA-Seq: Low quality reads as well as ribosomal and other repeat sequences were filtered out before alignment. RUM software was used for aligning reads to mouse genome mm9 and Refseq transcriptome and for transcript quantitation. Differential expression analysis was carried out using a custom script implementing EdgeR software.
ChIP-Seq: Reads were aligned to mm9 genome using bowtie version 0.12.7 with parameters -v 3 -k 1 -m 1 --best --strata. Redundant reads were discarded before creating bedgraph profiles.
Peak calling was done using HOMER for H3K4me1 and H3K27Ac marks at FDR=0.01 with respect to corresponding Input. Peak calling using STAR was done for H3K27me3 marks at FDR=0.01.
BIS-Seq: DNA sequencing data was aligned to mouse genome (mm9) with the BS Seeker program. Data was reported as the fraction of reads that were methylated, calculated as the ratio of cytosine bases (indicating methylated CpGs) to total reads for each CpG coordinate.  Unmethylated cytosines are ultimately converted to thymine bases during the bisulfite conversion process.
Genome_build: mm9
Supplementary_files_format_and_content: Tab delimited file of transcript raw read counts.
Supplementary_files_format_and_content: Tab delimited file of transcript FPKM counts.
Supplementary_files_format_and_content: Peak files are tab-delimited listing genomic coordinates and their respective scores and p-values. Scores represent fold-change with respect to Input.
Supplementary_files_format_and_content: Tab-delimited files of loci, fraction of methylated cytosines and total read count.
Supplementary_files_format_and_content: RawReadCounts.txt: Raw transcript counts
Supplementary_files_format_and_content: TrancriptFPKMCounts.txt: Transcript FPKM counts
Supplementary_files_format_and_content: Old_H3K4me1.peaks.txt: peak
Supplementary_files_format_and_content: Old_H3K27me3.peaks.txt: peak
Supplementary_files_format_and_content: Young_H3K4me1.peaks.txt: peak
Supplementary_files_format_and_content: Young_H3K27me3.peaks.txt: peak
Supplementary_files_format_and_content: Young_H3K27Ac.peaks.txt: peak
Supplementary_files_format_and_content: OldBeta_bs_seeker-CG.tab: Percent methylated cytosines
Supplementary_files_format_and_content: YoungBeta_bs_seeker-CG.tab: Percent methylated cytosines
← Back to Analysis