GSE271652 Processing Pipeline

ATAC-seq · 5 steps

Publication

Autism-associated CHD8 controls reactive gliosis and neuroinflammation via remodeling chromatin in astrocytes.

Cell Reports (2024), PMID 39154337

Dataset

Autism-associated CHD8 controls reactive gliosis and neuroinflammation via remodeling chromatin in astrocytes [ATAC-seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Reads were downsampled to an equivalent number of reads per sample using seqtk sample (version 1.2)

seqtk sample v1.2 GitHub

$ Bash example

# Install seqtk if not already available
# conda install -c bioconda seqtk

# Define input and output files (placeholders)
INPUT_FASTQ="sample_R1.fastq.gz" # Replace with your actual input FASTQ file
OUTPUT_FASTQ="sample_downsampled_R1.fastq.gz" # Replace with your desired output FASTQ file

# Determine the target number of reads.
# This typically involves finding the minimum read count across all samples
# and using that as the target for downsampling all samples to an equivalent number.
# For example, if the minimum read count across all samples is 10,000,000:
TARGET_READ_COUNT="10000000" # Placeholder: Replace with the actual target read count

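# Downsample reads using seqtk sample
# -s11 sets a random seed for reproducibility (11 is an arbitrary placeholder value)
# For paired-end data, run the same command on R2 with the same seed so that mates stay paired
# seqtk writes uncompressed FASTQ to stdout, so compress it to match the .gz output name
seqtk sample -s11 "${INPUT_FASTQ}" "${TARGET_READ_COUNT}" | gzip > "${OUTPUT_FASTQ}"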
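
The target read count described in the comments above can be derived from the data rather than typed in by hand. The following is a minimal sketch, assuming standard 4-line FASTQ records that end with a newline; the helper names count_reads and min_read_count and the file names in the usage comment are hypothetical and not part of seqtk or the original pipeline.

```shell
# Sketch: find the smallest read count across a set of FASTQ files.
# Assumption: every record spans exactly 4 lines (no wrapped sequences).

# Print the number of reads in one FASTQ file (plain or gzip-compressed).
count_reads() (
  fq="$1"
  case "${fq}" in
    *.gz) lines=$(gzip -dc "${fq}" | wc -l) ;;
    *)    lines=$(wc -l < "${fq}") ;;
  esac
  echo $(( ${lines} / 4 ))
)

# Print the minimum read count over all files given as arguments.
min_read_count() (
  min=""
  for fq in "$@"; do
    n=$(count_reads "${fq}")
    if [ -z "${min}" ] || [ "${n}" -lt "${min}" ]; then
      min="${n}"
    fi
  done
  echo "${min}"
)

# Usage (hypothetical file names):
#   TARGET_READ_COUNT=$(min_read_count sampleA_R1.fastq.gz sampleB_R1.fastq.gz)
```

For paired-end libraries, counting R1 alone should be enough, since R2 is expected to hold the same number of records.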

Adaptors were trimmed with trimmomatic (version 0.39) [1] with the options ILLUMINACLIP:Trimmomatic-0.39/adapters/NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:25 MINLEN:24

Trimmomatic v0.39

$ Bash example

# Install Trimmomatic (if not already installed)
# wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.39.zip
# unzip Trimmomatic-0.39.zip
# TRIMMOMATIC_DIR="$(pwd)/Trimmomatic-0.39"

# Define input and output file paths (placeholders)
# Replace with actual input/output file names
INPUT_R1="input_R1.fastq.gz"
INPUT_R2="input_R2.fastq.gz"
OUTPUT_R1_PAIRED="output_R1_paired.fastq.gz"
OUTPUT_R1_UNPAIRED="output_R1_unpaired.fastq.gz"
OUTPUT_R2_PAIRED="output_R2_paired.fastq.gz"
OUTPUT_R2_UNPAIRED="output_R2_unpaired.fastq.gz"

# Define Trimmomatic JAR path and adapter file path
# Adjust TRIMMOMATIC_JAR and ADAPTER_FILE paths if Trimmomatic is installed elsewhere
TRIMMOMATIC_JAR="Trimmomatic-0.39/trimmomatic-0.39.jar"
ADAPTER_FILE="Trimmomatic-0.39/adapters/NexteraPE-PE.fa"

# Run Trimmomatic
java -jar "${TRIMMOMATIC_JAR}" PE \
  "${INPUT_R1}" "${INPUT_R2}" \
  "${OUTPUT_R1_PAIRED}" "${OUTPUT_R1_UNPAIRED}" \
  "${OUTPUT_R2_PAIRED}" "${OUTPUT_R2_UNPAIRED}" \
  ILLUMINACLIP:"${ADAPTER_FILE}":2:30:10 \
  LEADING:3 TRAILING:3 SLIDINGWINDOW:4:25 MINLEN:24

Reads were then aligned to the mouse genome (mm10) with bowtie2 (version 2.4.4) using default parameters

Bowtie2 v2.4.4 GitHub

$ Bash example

# Install Bowtie2 (if not already installed)
# conda install -c bioconda bowtie2=2.4.4

# Placeholder for Bowtie2 index for mouse genome (mm10)
# Ensure the index is built and available at this path.
# Example: bowtie2-build /path/to/mm10.fa /path/to/bowtie2_indices/mm10_index
MM10_INDEX_PREFIX="/path/to/bowtie2_indices/mm10_index"

# Placeholders for input FASTQ files
# The Trimmomatic step runs in paired-end (PE) mode and peak calling uses BAMPE,
# so paired-end alignment is shown here.
INPUT_FASTQ_R1="output_R1_paired.fastq.gz" # Replace with your trimmed R1 file
INPUT_FASTQ_R2="output_R2_paired.fastq.gz" # Replace with your trimmed R2 file

# Placeholder for output SAM file
OUTPUT_SAM="aligned_reads.sam" # Replace with your desired output SAM file

# Align reads to the mouse genome (mm10) with bowtie2 (version 2.4.4) using default parameters
# -x: specifies the index prefix
# -1 / -2: specify the paired-end mate files
# -S: specifies the output SAM file
# Default parameters are used as stated in the description.
bowtie2 -x "${MM10_INDEX_PREFIX}" -1 "${INPUT_FASTQ_R1}" -2 "${INPUT_FASTQ_R2}" -S "${OUTPUT_SAM}"

# For single-end reads, use -U instead:
# bowtie2 -x "${MM10_INDEX_PREFIX}" -U "input_reads.fastq" -S "${OUTPUT_SAM}"

# Optional: Convert SAM to BAM and sort (requires samtools)
# samtools view -bS "${OUTPUT_SAM}" | samtools sort -o "aligned_reads.bam"
# samtools index "aligned_reads.bam"

Duplicates were removed with picard through gatk MarkDuplicates (version 4.2.5.0)

GATK (Picard MarkDuplicates) v4.2.5.0

$ Bash example

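# Install GATK (which bundles Picard tools such as MarkDuplicates)
# conda install -c bioconda gatk4

# Run MarkDuplicates to remove duplicates
# By default MarkDuplicates only flags duplicates; --REMOVE_DUPLICATES true drops them from the output.
# The source says duplicates were removed but lists no flags, so this flag is an assumption.
# The input BAM should be sorted (for example with samtools sort) before this step.
# -I: Input BAM file
# -O: Output BAM file with duplicates removed
# -M: File to write duplication metrics to
gatk MarkDuplicates \
  -I input.bam \
  -O output.bam \
  -M metrics.txt \
  --REMOVE_DUPLICATES true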
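
The metrics file written with -M can be checked quickly to see how much duplication each library carried. The following is a minimal sketch, assuming the usual tab-separated Picard metrics layout (a "## METRICS CLASS" line, then a header row, then one row per library, ended by a blank line); the function name percent_duplication is hypothetical and not part of GATK.

```shell
# Sketch: print library name and PERCENT_DUPLICATION from a MarkDuplicates metrics file.
# Assumption: the metrics table follows the "## METRICS CLASS" line and ends at a blank line.
percent_duplication() {
  awk -F'\t' '
    /^## METRICS CLASS/ { in_metrics = 1; next }   # table starts on the next line
    in_metrics && NF == 0 { exit }                 # blank line ends the table
    in_metrics && !col {                           # header row: locate the column by name
      for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
      next
    }
    in_metrics { print $1 "\t" $col }              # data rows: library and duplication fraction
  ' "$1"
}

# Usage:
#   percent_duplication metrics.txt
```

The value is reported as a fraction between 0 and 1, so 0.25 would mean a quarter of the examined reads were flagged as duplicates.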

Peak detection was performed using macs2 callpeak (version 2.2.7.1) with the parameters "-g mm --qvalue 0.05 --shift 100 --extsize 200 --format BAMPE --keep-dup=all --cutoff-analysis --bdg".

MACS2 v2.2.7.1 GitHub

$ Bash example

# Install MACS2 if not already installed
# conda install -c bioconda macs2

# Placeholders for the input file and output prefix
# Replace treatment.bam with your deduplicated BAM file
# Replace output_prefix with your desired output file prefix
# The source parameters list no control sample; add -c control.bam only if you have one.

macs2 callpeak -t treatment.bam -f BAMPE -g mm -n output_prefix --qvalue 0.05 --shift 100 --extsize 200 --keep-dup=all --cutoff-analysis --bdg
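
With -n output_prefix, callpeak is expected to write its peaks to output_prefix_peaks.narrowPeak, where column 9 holds -log10(q-value). The following is a minimal sketch for re-filtering that file at a stricter q-value cutoff, assuming the standard 10-column tab-separated narrowPeak layout; the function name filter_peaks_by_q is hypothetical and not part of MACS2.

```shell
# Sketch: keep only peaks whose q-value is at or below a chosen cutoff.
# Assumption: column 9 of the narrowPeak file is -log10(q-value).
filter_peaks_by_q() {
  awk -F'\t' -v q="$2" '
    BEGIN { thr = -log(q) / log(10) }   # convert the q-value cutoff to -log10 scale
    $9 >= thr                           # print rows that pass the threshold
  ' "$1"
}

# Usage:
#   filter_peaks_by_q output_prefix_peaks.narrowPeak 0.01 > peaks_q01.narrowPeak
```

Filtering afterwards leaves the original callpeak run unchanged, which makes it easy to compare peak sets at several cutoffs.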

Tools Used

Bowtie2

Raw Source Text

Reads were downsampled to an equivalent number of reads per sample using seqtk sample (version 1.2)
Adaptors were trimmed with trimmomatic (version 0.39) [1] with the options ILLUMINACLIP:Trimmomatic-0.39/adapters/NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:25 MINLEN:24
Reads were then aligned to the mouse genome (mm10) with bowtie2 (version 2.4.4) using default parameters
Duplicates were removed with picard through gatk MarkDuplicates (version 4.2.5.0)
Peak detection was performed using macs2 callpeak (version 2.2.7.1) with the parameters "-g mm --qvalue 0.05 --shift 100 --extsize 200 --format BAMPE --keep-dup=all --cutoff-analysis --bdg".
Assembly: mm10
Supplementary files format and content: bedgraph data files for each sample