GSE124071 Processing Pipeline
ATAC-seq
3 steps
Publication
The RNA Helicase DDX6 Controls Cellular Plasticity by Modulating P-Body Homeostasis. Cell Stem Cell (2019), PMID 31588046
Dataset
GSE124071: The RNA helicase DDX6 regulates self-renewal and differentiation of human and mouse stem cells [ATAC-Seq]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Illumina CASAVA 1.7 software was used for basecalling.
$ Bash example
# Illumina CASAVA 1.7 was used for basecalling (per the source text). The exact
# commands are not given, so everything below is a placeholder.
# NOTE: configureBclToFastq.pl is documented as the BCL-to-FASTQ converter of
# CASAVA 1.8 and later; whether and how it applies to a CASAVA 1.7 run is an assumption.
# Actual paths and the sample sheet will vary with the specific run.

# Define placeholder paths for an Illumina run directory structure
BCL_INPUT_DIR="/path/to/illumina_run/Data/Intensities/BaseCalls/"
SAMPLE_SHEET="/path/to/illumina_run/SampleSheet.csv"
OUTPUT_DIR="/path/to/output/fastq/"

# Ensure the output directory exists (--force below allows reusing an existing directory)
mkdir -p "${OUTPUT_DIR}"

# Configure the BCL-to-FASTQ conversion.
# configureBclToFastq.pl must be on the PATH or be called with its full path.
configureBclToFastq.pl --input-dir "${BCL_INPUT_DIR}" \
  --output-dir "${OUTPUT_DIR}" \
  --sample-sheet "${SAMPLE_SHEET}" \
  --force

# The script only writes a Makefile; running make performs the actual conversion
make -C "${OUTPUT_DIR}"
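Illumina pipelines older than CASAVA 1.8 typically wrote quality scores with an ASCII offset of 64 rather than 33, so it can be worth checking the encoding of the FASTQ files before trimming or masking. The following is a minimal sketch of the usual heuristic, not part of the published pipeline; the function name `guess_phred_offset` is hypothetical, and an answer of 64 is only a guess, because a Phred+33 file containing nothing but high-quality bases looks the same.

```shell
#!/usr/bin/env bash
# Sketch: guess the quality-score offset of a FASTQ stream read from stdin.
# guess_phred_offset is a hypothetical helper name used for illustration.
guess_phred_offset() {
  awk '
    BEGIN {
      # Map each printable ASCII character to its code point.
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
      min = 127
      seen = 0
    }
    NR % 4 == 0 {
      # Every fourth line of a FASTQ record holds the quality string.
      for (i = 1; i <= length($0); i++) {
        c = ord[substr($0, i, 1)]
        if (c == "") continue
        seen = 1
        if (c < min) min = c
      }
    }
    END {
      if (!seen) print "unknown"
      else if (min < 59) print "33"   # characters below ";" cannot occur in +64 encodings
      else print "64"                 # a guess: consistent with +64, not proof of it
    }
  '
}

# Demo on a single inline record ("#" has ASCII code 35, so this prints 33).
printf '@read1\nACGT\n+\nII#I\n' | guess_phred_offset
```

For a real file, the function can be fed a sample of the reads, for example `gzip -cd input.fastq.gz | head -n 40000 | guess_phred_offset`, where `input.fastq.gz` is a placeholder name.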
2
Sequenced reads were trimmed for adaptor sequence and masked for low-complexity or low-quality sequence, then mapped to the hg19 reference genome using BWA.
$ Bash example
# Install necessary tools (if not already installed).
# fastp and samtools are assumptions; the source text names only BWA.
# conda install -c bioconda bwa samtools fastp

# Define reference genome path and name
REF_GENOME_DIR="./reference"
REF_GENOME_NAME="hg19"
REF_GENOME_FA="${REF_GENOME_DIR}/${REF_GENOME_NAME}.fa"
REF_GENOME_URL="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz"

# Create reference directory if it doesn't exist
mkdir -p "${REF_GENOME_DIR}"

# Download and decompress hg19 reference genome if not present
if [ ! -f "${REF_GENOME_FA}" ]; then
  echo "Downloading hg19 reference genome..."
  wget -O "${REF_GENOME_DIR}/${REF_GENOME_NAME}.fa.gz" "${REF_GENOME_URL}"
  gunzip "${REF_GENOME_DIR}/${REF_GENOME_NAME}.fa.gz"
fi

# Index the reference genome for BWA if not already indexed
if [ ! -f "${REF_GENOME_FA}.bwt" ]; then
  echo "Indexing hg19 reference genome with BWA..."
  bwa index "${REF_GENOME_FA}"
fi

# Define input and output file names
INPUT_FASTQ="input.fastq.gz"  # Placeholder for your input reads
TRIMMED_FASTQ="trimmed_reads.fastq.gz"
ALIGNED_SAM="aligned_reads.sam"
SORTED_BAM="sorted_aligned_reads.bam"

# Step 1: Trim adaptor sequences and remove low-quality/low-complexity reads using fastp.
# This approximates "trimmed for adaptor sequence, and masked for low-complexity or
# low-quality sequence": fastp discards such reads rather than masking them.
# For single-end input fastp detects adapters automatically, so the paired-end-only
# option --detect_adapter_for_pe is not used here.
echo "Trimming and quality filtering reads with fastp..."
fastp -i "${INPUT_FASTQ}" -o "${TRIMMED_FASTQ}" \
  --trim_poly_g \
  --trim_poly_x \
  --low_complexity_filter \
  --qualified_quality_phred 15 \
  --length_required 30 \
  --json "${TRIMMED_FASTQ%.fastq.gz}.json" \
  --html "${TRIMMED_FASTQ%.fastq.gz}.html"

# Step 2: Map trimmed reads to reference genome hg19.
# The source says only "BWA"; using the BWA-MEM algorithm is an assumption.
echo "Mapping reads to hg19 with BWA-MEM..."
bwa mem "${REF_GENOME_FA}" "${TRIMMED_FASTQ}" > "${ALIGNED_SAM}"

# Step 3: Convert SAM to BAM, sort, and index the BAM file
echo "Converting SAM to BAM, sorting, and indexing..."
samtools view -b "${ALIGNED_SAM}" | samtools sort -o "${SORTED_BAM}" -
samtools index "${SORTED_BAM}"

echo "Pipeline completed. Sorted BAM file: ${SORTED_BAM}"
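The source text says reads were "masked" for low-quality sequence, whereas the fastp call above filters reads out. To illustrate what masking means, here is a minimal sketch that replaces low-quality bases with `N` while keeping every read. It is not the authors' method: the function name `mask_low_quality`, the default threshold of 20, and the Phred+33 encoding are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch: mask bases below a quality threshold with N in a FASTQ stream (stdin to stdout).
# mask_low_quality, the default threshold (20) and Phred+33 encoding are assumptions.
mask_low_quality() {
  local min_q="${1:-20}"
  awk -v min_q="${min_q}" '
    BEGIN {
      # Lookup table from quality character to Phred score (Phred+33).
      for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33
    }
    NR % 4 == 1 { print; next }             # header line passes through unchanged
    NR % 4 == 2 { seq = $0; next }          # hold the sequence until its qualities arrive
    NR % 4 == 3 { plus = $0; next }         # hold the separator line
    NR % 4 == 0 {
      masked = ""
      for (i = 1; i <= length(seq); i++) {
        q = phred[substr($0, i, 1)]
        masked = masked ((q + 0 < min_q + 0) ? "N" : substr(seq, i, 1))
      }
      print masked                          # sequence with low-quality bases masked
      print plus
      print                                 # quality string is left untouched
    }
  '
}

# Demo: the third base has quality "#" (Phred 2) and is masked.
printf '@read1\nACGT\n+\nII#I\n' | mask_low_quality 20
```

Masking keeps read lengths and read counts unchanged, which is the main practical difference from filtering. On real data the function would sit between decompression and alignment, for example `gzip -cd input.fastq.gz | mask_low_quality 20`, with `input.fastq.gz` as a placeholder.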
3
Peaks were called using HOTSPOT with default parameters.
HOTSPOT (version unspecified; inferred with models/gemini-2.5-flash)
$ Bash example
# The source does not describe how HOTSPOT was installed or invoked. The install
# commands, the 'hotspot.py' name and its -i/-o/-g flags below are unverified
# placeholders; check the documentation of the HOTSPOT distribution you use.
# conda install -c bioconda hotspot  # If available via Bioconda
# Or clone from GitHub and install dependencies:
# git clone https://github.com/ENCODE-DCC/hotspot.git
# cd hotspot
# python setup.py install  # Or just run the script directly

# Placeholder for input BED file (e.g., from aligned reads, pre-processed for signal).
# Replace 'input.bed' with the actual path to your input BED file.
# Replace 'hg19.chrom.sizes' with the path to your genome size file. The reads were
# mapped to hg19, so the sizes file must be for hg19 as well.
# hg19.chrom.sizes can be obtained from UCSC:
# wget -qO- http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes > hg19.chrom.sizes

# Run HOTSPOT with default parameters
# Assuming hotspot.py is in your PATH or specified with its full path
hotspot.py -i input.bed -o hotspot_peaks.bed -g hg19.chrom.sizes
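Because the reads were mapped to hg19, any chromosome sizes file and any BED file used at this step should refer to the same build. A mismatch often shows up as intervals on unknown chromosomes or past a chromosome end. The sketch below lists such intervals; the function name `check_bed_bounds` is hypothetical, and the check is only a sanity test, not a proof that the builds match.

```shell
#!/usr/bin/env bash
# Sketch: print BED intervals that do not fit a chrom.sizes file.
# check_bed_bounds is a hypothetical helper; usage: check_bed_bounds SIZES BED
check_bed_bounds() {
  local sizes="$1" bed="$2"
  awk -F'\t' '
    NR == FNR { size[$1] = $2 + 0; next }   # first file: chromosome name and length
    /^(#|track|browser)/ { next }           # skip BED comment and header lines
    {
      start = $2 + 0
      end = $3 + 0
      # Report unknown chromosomes, empty or inverted intervals, and overruns.
      if (!($1 in size) || start < 0 || start >= end || end > size[$1]) print
    }
  ' "${sizes}" "${bed}"
}

# Demo with tiny inline files (values are made up for illustration).
demo_dir="$(mktemp -d)"
printf 'chr1\t1000\n' > "${demo_dir}/demo.chrom.sizes"
printf 'chr1\t10\t20\nchr1\t990\t1010\n' > "${demo_dir}/demo.bed"
check_bed_bounds "${demo_dir}/demo.chrom.sizes" "${demo_dir}/demo.bed"
rm -r "${demo_dir}"
```

Empty output means every interval fits within the listed chromosomes. The sizes file must be non-empty for the two-file awk idiom (`NR == FNR`) to work as intended.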
Raw Source Text
Illumina Casava1.7 software used for basecalling.
Sequenced reads were trimmed for adaptor sequence, and masked for low-complexity or low-quality sequence, then mapped to reference genome hg19 using BWA
peaks were called using HOTSPOT with default parameter
Genome_build: hg19
Supplementary_files_format_and_content: bed files for peak calls
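The supplementary files are BED files of peak calls, so a quick summary is a reasonable first check after downloading them. This is a minimal sketch that assumes a tab-separated BED file with at least three columns. The function name `summarize_peaks` is hypothetical, and `total_bp` is a plain sum of interval lengths, so overlapping peaks are counted more than once.

```shell
#!/usr/bin/env bash
# Sketch: count peaks and sum their lengths in a BED file.
# summarize_peaks is a hypothetical helper; usage: summarize_peaks PEAKS_BED
summarize_peaks() {
  awk -F'\t' '
    /^(#|track|browser)/ || NF < 3 { next }   # skip headers and malformed lines
    { n++; bp += $3 - $2 }                    # BED intervals are half-open: length = end - start
    END { printf "peaks=%d total_bp=%d\n", n, bp }
  ' "$1"
}

# Demo with a tiny inline file (coordinates are made up for illustration).
demo_bed="$(mktemp)"
printf 'chr1\t10\t20\nchr2\t100\t150\n' > "${demo_bed}"
summarize_peaks "${demo_bed}"
rm "${demo_bed}"
```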