GSE124071 Processing Pipeline

ATAC-seq | 3 processing steps with code examples

Publication

The RNA Helicase DDX6 Controls Cellular Plasticity by Modulating P-Body Homeostasis.

Cell Stem Cell (2019), PMID 31588046

Dataset

The RNA helicase DDX6 regulates self-renewal and differentiation of human and mouse stem cells [ATAC-Seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Illumina CASAVA 1.7 software was used for basecalling.

Illumina CASAVA v1.7

$ Bash example

# Basecalling and demultiplexing with Illumina CASAVA: BCL files written by the
# sequencer are converted into FASTQ files.
# Assumption: configureBclToFastq.pl is the BCL-to-FASTQ script documented for
# CASAVA 1.8 and bcl2fastq 1.8.x; earlier CASAVA releases used a different
# conversion workflow, so the exact commands used for this dataset are not known.

# Placeholder paths for an Illumina run folder; adjust them to the actual run.
BCL_INPUT_DIR="/path/to/illumina_run/Data/Intensities/BaseCalls/"
SAMPLE_SHEET="/path/to/illumina_run/SampleSheet.csv"
OUTPUT_DIR="/path/to/output/fastq/"

# Ensure the output directory exists (--force below permits reusing it)
mkdir -p "${OUTPUT_DIR}"

# configureBclToFastq.pl must be on PATH or called with its full path.
# It only configures the run: it writes a Makefile into OUTPUT_DIR.
configureBclToFastq.pl --input-dir "${BCL_INPUT_DIR}" \
                       --output-dir "${OUTPUT_DIR}" \
                       --sample-sheet "${SAMPLE_SHEET}" \
                       --force

# The actual BCL-to-FASTQ conversion runs when make is invoked in OUTPUT_DIR.
cd "${OUTPUT_DIR}" && make -j 8

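CASAVA 1.7 wrote FASTQ quality scores with an ASCII offset of 64 (the Illumina 1.5+ encoding), whereas CASAVA 1.8 switched to the Phred+33 offset, so it is worth confirming the encoding before trimming. A minimal sketch, assuming plain four-line FASTQ records on standard input; the function name `guess_phred_offset` is hypothetical. It reports the encoding implied by the lowest quality character seen: characters below `;` (ASCII 59) occur only with Phred+33, while a minimum of `@` (ASCII 64) or higher suggests Phred+64.

```shell
# Hypothetical helper: guess the FASTQ quality offset from the lowest quality character.
guess_phred_offset() {
  awk '
    BEGIN {
      # Map each printable ASCII character to its code
      for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c
      min = 127
    }
    # Every 4th line of a FASTQ record holds the quality string
    NR % 4 == 0 {
      for (i = 1; i <= length($0); i++) {
        v = ord[substr($0, i, 1)]
        if (v < min) min = v
      }
    }
    END {
      if (min == 127) print "unknown"       # no quality lines were read
      else if (min < 59) print "phred33"    # below ";" only occurs with Phred+33
      else if (min >= 64) print "phred64"   # "@" or higher: Phred+64 likely
      else print "ambiguous"                # 59-63: Solexa+64 or high-quality Phred+33
    }
  '
}
```

Usage: `zcat input.fastq.gz | head -n 400000 | guess_phred_offset`. A high-quality Phred+33 file can still look like Phred+64 if no base falls below Q31, so sample a reasonably large number of reads; if the result is `phred64`, fastp accepts `--phred64`.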

Sequenced reads were trimmed for adaptor sequence, masked for low-complexity or low-quality sequence, and then mapped to the hg19 reference genome using BWA.

BWA (version not stated; the example below assumes the BWA-MEM algorithm)

$ Bash example

# Install the tools used below (if not already installed):
# conda install -c bioconda bwa samtools fastp
set -euo pipefail

# Reference genome: hg19, as stated in the GEO record
REF_GENOME_DIR="./reference"
REF_GENOME_NAME="hg19"
REF_GENOME_FA="${REF_GENOME_DIR}/${REF_GENOME_NAME}.fa"
REF_GENOME_URL="https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz"

# Create the reference directory if it doesn't exist
mkdir -p "${REF_GENOME_DIR}"

# Download and decompress the hg19 reference genome if not present
if [ ! -f "${REF_GENOME_FA}" ]; then
    echo "Downloading hg19 reference genome..."
    wget -O "${REF_GENOME_FA}.gz" "${REF_GENOME_URL}"
    gunzip "${REF_GENOME_FA}.gz"
fi

# Build the BWA index once (creates ${REF_GENOME_FA}.bwt and related files)
if [ ! -f "${REF_GENOME_FA}.bwt" ]; then
    echo "Indexing hg19 reference genome with BWA..."
    bwa index "${REF_GENOME_FA}"
fi

# Input and output files (single-end placeholders; rename for your data)
INPUT_FASTQ="input.fastq.gz"
TRIMMED_FASTQ="trimmed_reads.fastq.gz"
SORTED_BAM="sorted_aligned_reads.bam"
THREADS=4

# Step 1: adapter trimming plus quality and complexity filtering with fastp.
# fastp is a stand-in: the original trimming tool is not named. In single-end
# mode fastp auto-detects adapters. Note that --low_complexity_filter and
# --qualified_quality_phred discard whole reads rather than masking bases.
# Add --phred64 if the FASTQ uses the older Phred+64 quality encoding.
echo "Trimming and quality filtering reads with fastp..."
fastp -i "${INPUT_FASTQ}" -o "${TRIMMED_FASTQ}" \
      --trim_poly_g \
      --trim_poly_x \
      --low_complexity_filter \
      --qualified_quality_phred 15 \
      --length_required 30 \
      --thread "${THREADS}" \
      --json "${TRIMMED_FASTQ%.fastq.gz}.json" \
      --html "${TRIMMED_FASTQ%.fastq.gz}.html"

# Step 2: map trimmed reads to hg19. The source says only "BWA"; bwa mem is an
# assumption (older short-read pipelines may have used bwa aln + bwa samse).
# Step 3: stream the alignments straight into a coordinate-sorted BAM, then index it.
echo "Mapping reads to hg19 with BWA-MEM, then sorting and indexing..."
bwa mem -t "${THREADS}" "${REF_GENOME_FA}" "${TRIMMED_FASTQ}" \
    | samtools sort -@ "${THREADS}" -o "${SORTED_BAM}" -
samtools index "${SORTED_BAM}"

echo "Pipeline completed. Sorted BAM file: ${SORTED_BAM}"

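fastp filters whole reads, whereas the GEO text describes masking. The sketch below shows what 3' adapter trimming followed by low-quality base masking could look like, assuming single-end four-line FASTQ with Phred+33 qualities, exact (mismatch-free) adapter matching, a Q20 masking threshold, and the Nextera/Tn5 adapter sequence `CTGTCTCTTATACACATCT` that is typical of ATAC-seq libraries. None of these values come from the source, the function name `trim_and_mask` is hypothetical, and low-complexity masking is not covered.

```shell
# Hypothetical helper: trim a 3' adapter, then mask low-quality bases as N.
# Assumptions: 4-line FASTQ on stdin, Phred+33 qualities, exact adapter matching.
trim_and_mask() {
  local adapter="${1:-CTGTCTCTTATACACATCT}"  # assumed Nextera/Tn5 adapter
  local min_overlap="${2:-3}"                 # shortest adapter prefix trimmed at the read end
  local min_qual="${3:-20}"                   # bases below this Phred score become N
  awk -v ad="${adapter}" -v minov="${min_overlap}" -v minq="${min_qual}" '
    BEGIN {
      # Map each printable ASCII character to its code (Phred+33 offset assumed)
      for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c
      alen = length(ad)
    }
    # 1-based start of a full adapter, or of an adapter prefix running off
    # the read end (at least minov bases); 0 when no adapter is found
    function adapter_start(s,    i, rest, n) {
      for (i = 1; i <= length(s); i++) {
        rest = substr(s, i)
        n = length(rest)
        if (n >= alen) {
          if (substr(rest, 1, alen) == ad) return i
        } else if (n >= minov && rest == substr(ad, 1, n)) {
          return i
        }
      }
      return 0
    }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      qual = $0
      cut = adapter_start(seq)
      if (cut > 0) {                      # drop the adapter and everything after it
        seq = substr(seq, 1, cut - 1)
        qual = substr(qual, 1, cut - 1)
      }
      masked = ""
      for (i = 1; i <= length(seq); i++) {
        base = substr(seq, i, 1)
        if (ord[substr(qual, i, 1)] - 33 < minq) base = "N"
        masked = masked base
      }
      print header; print masked; print plus; print qual
    }
  '
}
```

Usage: `zcat input.fastq.gz | trim_and_mask CTGTCTCTTATACACATCT 3 20 | gzip > masked.fastq.gz`. Replacing low-quality bases with `N` keeps read length and base positions intact, which is the practical difference from read-level filtering.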

Peaks were called using HOTSPOT with default parameters.

HOTSPOT (version unspecified; inferred with models/gemini-2.5-flash)

$ Bash example

# HOTSPOT is distributed as source code from the University of Washington
# (Stam lab); build it following the README of the release you obtain. The
# repository and version used for this dataset are not stated in the source.
set -euo pipefail

SORTED_BAM="sorted_aligned_reads.bam"   # coordinate-sorted output of the mapping step
GENOME="hg19"                           # genome build stated in the GEO record

# Chromosome sizes for hg19 from UCSC
wget -qO "${GENOME}.chrom.sizes" \
    "https://hgdownload.soe.ucsc.edu/goldenPath/${GENOME}/bigZips/${GENOME}.chrom.sizes"

# Aligned reads as BED tags, a common input format for peak callers
bedtools bamtobed -i "${SORTED_BAM}" > reads.bed

# Running HOTSPOT itself. Assumption (check the README of your release):
# HOTSPOT releases are configured through a tokens file (tag file, genome,
# read length, chromosome and mappability files, FDR levels) and launched
# with a driver script rather than via command-line flags. The source only
# states that default parameters were used.
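HOTSPOT scores local enrichment by comparing the tag count in a small window with a larger surrounding background window; its published description uses a 250-bp window, a 50-kb background and a binomial z-score, z = (n - Np) / sqrt(Np(1 - p)) with p = 250 / 50000, where n and N are the tag counts in the small and background windows. The sketch below computes only that core statistic for each tag, assuming BED input whose start column is used as the tag position. The function name `hotspot_zscores` is hypothetical, and mappability correction, FDR estimation and hotspot merging are omitted, so its output is not HOTSPOT output.

```shell
# Hypothetical helper: binomial z-score of each tag's small-window count
# against the count in a larger background window centred on the same tag.
# Input: BED lines (chrom, start, ...) on stdin. Output columns:
# chrom, position, small-window count n, background count N, z-score.
hotspot_zscores() {
  local small="${1:-250}" big="${2:-50000}"   # window widths in bp
  awk -v small="${small}" -v big="${big}" '
    { chrom[NR] = $1; pos[NR] = $2 }          # BED start used as the tag position
    END {
      p = small / big                         # chance a background tag lands in the small window
      for (i = 1; i <= NR; i++) {
        n = 0; N = 0
        for (j = 1; j <= NR; j++) {           # quadratic scan: fine for small inputs only
          if (chrom[j] != chrom[i]) continue
          d = pos[j] - pos[i]; if (d < 0) d = -d
          if (d <= small / 2) n++
          if (d <= big / 2) N++
        }
        mu = N * p                            # expected small-window count
        sd = sqrt(N * p * (1 - p))            # binomial standard deviation
        z = (sd > 0) ? (n - mu) / sd : 0
        printf "%s\t%d\t%d\t%d\t%.2f\n", chrom[i], pos[i], n, N, z
      }
    }
  '
}
```

Usage: `hotspot_zscores < reads.bed | awk '$5 >= 2'` lists tags whose z-score reaches an arbitrary illustrative cutoff of 2. Because the nested loop is quadratic in the number of tags, this is meant only for small examples; a real caller would scan sorted positions with a sliding window.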

Raw Source Text

Illumina Casava1.7 software used for basecalling.
Sequenced reads were trimmed for adaptor sequence, and masked for low-complexity or low-quality sequence, then mapped to reference genome hg19 using BWA
peaks were called using HOTSPOT with default parameter
Genome_build: hg19
Supplementary_files_format_and_content: bed files for peak calls
