GSE124071 Processing Pipeline

ATAC-seq: 3 steps, with code examples

Publication

The RNA Helicase DDX6 Controls Cellular Plasticity by Modulating P-Body Homeostasis.

Cell Stem Cell (2019), PMID 31588046

Dataset

GSE124071

The RNA helicase DDX6 regulates self-renewal and differentiation of human and mouse stem cells [ATAC-Seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Basecalling

    Illumina CASAVA 1.7 software was used for basecalling.

    Illumina CASAVA v1.7
    $ Bash example
    # Illumina CASAVA 1.7 was used for basecalling and demultiplexing,
    # i.e. converting the sequencer's BCL output into FASTQ files.
    # Caveat: configureBclToFastq.pl, shown below, ships with CASAVA 1.8 and later;
    # the exact CASAVA 1.7 workflow may differ, so treat this as a placeholder.
    
    set -euo pipefail
    
    # Placeholder paths for an Illumina run directory; actual paths and the
    # sample sheet depend on the specific run.
    BCL_INPUT_DIR="/path/to/illumina_run/Data/Intensities/BaseCalls/"
    SAMPLE_SHEET="/path/to/illumina_run/SampleSheet.csv"
    OUTPUT_DIR="/path/to/output/fastq/"
    
    # Ensure the output directory exists (--force below lets the script use it)
    mkdir -p "${OUTPUT_DIR}"
    
    # configureBclToFastq.pl must be on PATH or called with its full path.
    # It does not convert anything itself: it writes a Makefile into OUTPUT_DIR.
    configureBclToFastq.pl --input-dir "${BCL_INPUT_DIR}" \
                           --output-dir "${OUTPUT_DIR}" \
                           --sample-sheet "${SAMPLE_SHEET}" \
                           --force
    
    # Run the generated Makefile to perform the actual BCL-to-FASTQ conversion
    # (adjust -j to the number of available cores).
    make -C "${OUTPUT_DIR}" -j 8
  2. Trimming, masking, and alignment

    Sequenced reads were trimmed of adaptor sequence and masked for low-complexity or low-quality sequence, then mapped to the hg19 reference genome using BWA.

    BWA (algorithm not stated; BWA-MEM assumed)
    $ Bash example
    # Install necessary tools (if not already installed)
    # conda install -c bioconda bwa samtools fastp
    
    set -euo pipefail
    
    # Reference genome location (hg19, as stated for this dataset)
    REF_GENOME_DIR="./reference"
    REF_GENOME_NAME="hg19"
    REF_GENOME_FA="${REF_GENOME_DIR}/${REF_GENOME_NAME}.fa"
    REF_GENOME_URL="http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz"
    THREADS=8   # adjust to available cores
    
    # Create reference directory if it doesn't exist
    mkdir -p "${REF_GENOME_DIR}"
    
    # Download and decompress the hg19 reference genome if not present
    if [ ! -f "${REF_GENOME_FA}" ]; then
        echo "Downloading hg19 reference genome..."
        wget -O "${REF_GENOME_FA}.gz" "${REF_GENOME_URL}"
        gunzip "${REF_GENOME_FA}.gz"
    fi
    
    # Index the reference genome for BWA if not already indexed
    if [ ! -f "${REF_GENOME_FA}.bwt" ]; then
        echo "Indexing hg19 reference genome with BWA..."
        bwa index "${REF_GENOME_FA}"
    fi
    
    # Placeholder file names; single-end input is assumed (the GEO record does not say)
    INPUT_FASTQ="input.fastq.gz"
    TRIMMED_FASTQ="trimmed_reads.fastq.gz"
    SORTED_BAM="sorted_aligned_reads.bam"
    
    # Step 1: adapter and quality trimming with fastp (the tool choice is an assumption).
    # For single-end input fastp auto-detects adapters by default, so the
    # paired-end-only --detect_adapter_for_pe option is not used.
    # Note: fastp's low-complexity and quality filters DISCARD reads; they do not
    # mask bases, which is what the original description states.
    echo "Trimming and quality filtering reads with fastp..."
    fastp -i "${INPUT_FASTQ}" -o "${TRIMMED_FASTQ}" \
          --trim_poly_g \
          --trim_poly_x \
          --low_complexity_filter \
          --qualified_quality_phred 15 \
          --length_required 30 \
          --thread "${THREADS}" \
          --json "${TRIMMED_FASTQ%.fastq.gz}.json" \
          --html "${TRIMMED_FASTQ%.fastq.gz}.html"
    
    # Step 2: map trimmed reads to hg19 with BWA-MEM and stream the SAM output
    # straight into samtools sort, avoiding a large intermediate SAM file.
    echo "Mapping reads to hg19 with BWA-MEM and sorting..."
    bwa mem -t "${THREADS}" "${REF_GENOME_FA}" "${TRIMMED_FASTQ}" \
        | samtools sort -@ "${THREADS}" -o "${SORTED_BAM}" -
    
    # Step 3: index the sorted BAM file
    samtools index "${SORTED_BAM}"
    
    echo "Pipeline completed. Sorted BAM file: ${SORTED_BAM}"
  3. Peak calling

    Peaks were called using HOTSPOT with default parameters.

    HOTSPOT (version unspecified)
    $ Bash example
    # HOTSPOT (UW, Thurman lab) is distributed as source code; installation varies,
    # and it is not assumed to be available from Bioconda.
    # One possible source (assumption): git clone https://github.com/rthurman/hotspot.git
    
    set -euo pipefail
    
    # Placeholder input: the sorted, indexed BAM from the mapping step (hg19).
    INPUT_BAM="sorted_aligned_reads.bam"
    
    # Chromosome sizes for hg19, the genome build of this dataset (not hg38).
    # HOTSPOT may expect chromosome extents in another format (e.g. BED); check its docs.
    wget -qO hg19.chrom.sizes \
        http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes
    
    # Run HOTSPOT with default parameters.
    # HOTSPOT is configured through a tokens file (input tags, genome, read length,
    # chromosome and mappability files) rather than command-line flags: copy the
    # example tokens file shipped with the distribution, point it at ${INPUT_BAM}
    # and hg19, and run the distribution's driver script as its README describes.
    # The line below is a hypothetical placeholder, not a real HOTSPOT command:
    # run_hotspot_with_tokens my_tokens.txt
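
For step 1 (basecalling), a quick way to confirm that the conversion produced well-formed FASTQ is to count records and check the four-line record structure. This is a minimal sketch, not part of the original pipeline; the function name `fastq_record_count` and any file names passed to it are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count records in a (optionally gzipped) FASTQ file and
# fail if the 4-line structure is broken (header '@', separator '+',
# quality string the same length as the sequence).
fastq_record_count() {
    local fq="$1"
    if [[ "$fq" == *.gz ]]; then gzip -dc -- "$fq"; else cat -- "$fq"; fi | awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }   # read header
        NR % 4 == 2 { seqlen = length($0) }                   # sequence line
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }   # separator line
        NR % 4 == 0 && length($0) != seqlen { bad = 1 }      # quality line
        END {
            if (bad || NR % 4 != 0) { print "malformed FASTQ" > "/dev/stderr"; exit 1 }
            print NR / 4
        }'
}
```

Usage (placeholder file name): `fastq_record_count sample_R1.fastq.gz` prints the number of reads, or exits non-zero on a truncated or malformed file.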
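
Step 2 says reads were masked, not just filtered, for low-quality sequence, whereas fastp removes whole reads. The sketch below shows one possible quality-masking approach, assuming Phred+33 quality encoding: every base whose quality falls below a threshold is replaced with N, so read length and quality strings are preserved. The function name `mask_low_quality` and the threshold value are assumptions, not the authors' documented method; low-complexity masking (for example a DUST-style algorithm) is not shown.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read FASTQ on stdin, write FASTQ on stdout with every
# base whose Phred+33 quality is below the given threshold replaced by 'N'.
mask_low_quality() {
    local min_q="$1"   # Phred threshold, e.g. 20
    awk -v min_q="$min_q" '
        BEGIN {
            thr = min_q + 0
            # awk has no ord(): build a character -> ASCII code lookup table
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
        }
        NR % 4 == 2 { seq = $0; next }   # sequence line
        NR % 4 == 3 { sep = $0; next }   # "+" separator line
        NR % 4 == 0 {                    # quality line: mask and emit the record
            masked = ""
            for (j = 1; j <= length(seq); j++) {
                q = ord[substr($0, j, 1)] - 33
                masked = masked (q < thr ? "N" : substr(seq, j, 1))
            }
            print header; print masked; print sep; print $0
            next
        }
        { header = $0 }                  # NR % 4 == 1: read header
    '
}
```

Usage (placeholder file names): `gzip -dc trimmed_reads.fastq.gz | mask_low_quality 20 | gzip -c > masked_reads.fastq.gz`. Masking instead of trimming keeps read lengths fixed, which some downstream tools prefer; the aligner then treats the N positions as mismatches.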
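
The GEO record lists BED files of peak calls as the output of step 3. A minimal sketch for summarizing such a file (peak count, summed width, mean width) is shown below; the function name `summarize_peaks` is hypothetical, and the sketch assumes standard BED coordinates (0-based start, end-exclusive), so each peak's width is end minus start.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize a BED file of peak calls.
# total_bp is the sum of peak widths, so overlapping peaks are counted twice.
summarize_peaks() {
    local bed="$1"
    awk '
        /^(track|browser|#)/ { next }        # skip header/comment lines
        NF >= 3 { n++; total += $3 - $2 }    # BED width = end - start
        END {
            if (n == 0) { print "peaks=0"; exit }
            printf "peaks=%d\ttotal_bp=%.0f\tmean_bp=%.1f\n", n, total, total / n
        }' "$bed"
}
```

Usage (placeholder file name): `summarize_peaks peaks.bed` prints one tab-separated summary line.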
Raw Source Text
Illumina Casava1.7 software used for basecalling.
Sequenced reads were trimmed for adaptor sequence, and masked for low-complexity or low-quality sequence, then mapped to reference genome hg19 using BWA
peaks were called using HOTSPOT with default parameter
Genome_build: hg19
Supplementary_files_format_and_content: bed files for peak calls