GSE271652 Processing Pipeline
Publication
Autism-associated CHD8 controls reactive gliosis and neuroinflammation via remodeling chromatin in astrocytes. Cell Reports (2024), PMID 39154337
Dataset
GSE271652: Autism-associated CHD8 controls reactive gliosis and neuroinflammation via remodeling chromatin in astrocytes [ATAC-seq]
Processing Steps
1. Reads were downsampled to an equivalent number of reads per sample using seqtk sample (version 1.2).
$ Bash example
# Install seqtk if not already available
# conda install -c bioconda seqtk

# Define input and output files (placeholders)
INPUT_FASTQ="sample_R1.fastq.gz"                # Replace with your actual input FASTQ file
OUTPUT_FASTQ="sample_downsampled_R1.fastq.gz"   # Replace with your desired output FASTQ file

# Determine the target number of reads.
# This typically involves finding the minimum read count across all samples
# and using that as the target when downsampling every sample.
# For example, if the minimum read count across all samples is 10,000,000:
TARGET_READ_COUNT="10000000"   # Placeholder: replace with the actual target read count

# Downsample reads using seqtk sample.
# -s11 sets a random seed for reproducibility; for paired-end data, run the same
# command with the same seed on the R2 file so that mates stay in sync.
# seqtk writes uncompressed FASTQ, so pipe through gzip to match the .gz output name.
seqtk sample -s11 "${INPUT_FASTQ}" "${TARGET_READ_COUNT}" | gzip > "${OUTPUT_FASTQ}"
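The comments in the example above mention using the minimum read count across samples as the downsampling target. The source does not say how that number was obtained, so the following is only a sketch of one way to compute it. The function names `count_reads` and `min_read_count` and the file names in the usage comment are hypothetical, and the sketch assumes standard four-line FASTQ records in gzip-compressed files.

```shell
# Count reads in one gzipped FASTQ file (assumes 4 lines per record).
count_reads() {
    local fastq_gz="$1"
    local lines
    lines=$(gzip -dc "$fastq_gz" | wc -l)
    echo $(( lines / 4 ))
}

# Print the smallest read count among the FASTQ files given as arguments.
min_read_count() {
    local min="" n f
    for f in "$@"; do
        n=$(count_reads "$f")
        if [ -z "$min" ] || [ "$n" -lt "$min" ]; then
            min="$n"
        fi
    done
    echo "$min"
}

# Hypothetical usage (file names are placeholders):
# TARGET_READ_COUNT=$(min_read_count sampleA_R1.fastq.gz sampleB_R1.fastq.gz)
```

Counting only the R1 files should be enough for paired-end data, since R1 and R2 of a sample hold the same number of reads.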
2. Adaptors were trimmed with trimmomatic (version 0.39) [1] with the options ILLUMINACLIP:Trimmomatic-0.39/adapters/NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:25 MINLEN:24
Trimmomatic v0.39
$ Bash example
# Install Trimmomatic (if not already installed)
# wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.39.zip
# unzip Trimmomatic-0.39.zip
# TRIMMOMATIC_DIR="$(pwd)/Trimmomatic-0.39"

# Define input and output file paths (placeholders)
# Replace with actual input/output file names
INPUT_R1="input_R1.fastq.gz"
INPUT_R2="input_R2.fastq.gz"
OUTPUT_R1_PAIRED="output_R1_paired.fastq.gz"
OUTPUT_R1_UNPAIRED="output_R1_unpaired.fastq.gz"
OUTPUT_R2_PAIRED="output_R2_paired.fastq.gz"
OUTPUT_R2_UNPAIRED="output_R2_unpaired.fastq.gz"

# Define Trimmomatic JAR path and adapter file path
# Adjust TRIMMOMATIC_JAR and ADAPTER_FILE if Trimmomatic is installed elsewhere
TRIMMOMATIC_JAR="Trimmomatic-0.39/trimmomatic-0.39.jar"
ADAPTER_FILE="Trimmomatic-0.39/adapters/NexteraPE-PE.fa"

# Run Trimmomatic in paired-end mode
java -jar "${TRIMMOMATIC_JAR}" PE \
  "${INPUT_R1}" "${INPUT_R2}" \
  "${OUTPUT_R1_PAIRED}" "${OUTPUT_R1_UNPAIRED}" \
  "${OUTPUT_R2_PAIRED}" "${OUTPUT_R2_UNPAIRED}" \
  ILLUMINACLIP:"${ADAPTER_FILE}":2:30:10 \
  LEADING:3 TRAILING:3 SLIDINGWINDOW:4:25 MINLEN:24
3. Reads were then aligned to the mouse genome (mm10) with bowtie2 (version 2.4.4) using default parameters.
$ Bash example
# Install Bowtie2 (if not already installed)
# conda install -c bioconda bowtie2=2.4.4

# Bowtie2 index prefix for the mouse genome (mm10); placeholder path.
# Ensure the index is built and available at this path.
# Example: bowtie2-build /path/to/mm10.fa /path/to/bowtie2_indices/mm10_index
MM10_INDEX_PREFIX="/path/to/bowtie2_indices/mm10_index"

# Input FASTQ files (placeholders). The other steps use a paired-end adapter file
# (NexteraPE-PE.fa) and --format BAMPE, so the paired-end form is shown here.
INPUT_FASTQ_R1="input_reads_R1.fastq"   # Replace with your actual R1 FASTQ file
INPUT_FASTQ_R2="input_reads_R2.fastq"   # Replace with your actual R2 FASTQ file

# Output SAM file (placeholder)
OUTPUT_SAM="aligned_reads.sam"          # Replace with your desired output SAM file

# Align reads to mm10 with bowtie2 using default parameters, as stated in the description.
# -x: index prefix
# -1 / -2: paired-end mate files
# -S: output SAM file
bowtie2 -x "${MM10_INDEX_PREFIX}" -1 "${INPUT_FASTQ_R1}" -2 "${INPUT_FASTQ_R2}" -S "${OUTPUT_SAM}"

# For single-end reads, use -U instead:
# bowtie2 -x "${MM10_INDEX_PREFIX}" -U "input_reads.fastq" -S "${OUTPUT_SAM}"

# Optional: convert SAM to a sorted, indexed BAM (requires samtools)
# samtools view -bS "${OUTPUT_SAM}" | samtools sort -o "aligned_reads.bam"
# samtools index "aligned_reads.bam"
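bowtie2 prints an alignment summary to standard error that ends with a line of the form `NN.NN% overall alignment rate`. The source does not describe any alignment QC, so the sketch below is only an illustration of how that figure could be pulled from a saved log to compare samples. The function name `alignment_rate` and the log file name are hypothetical.

```shell
# Print the overall alignment rate (without the % sign) from a saved bowtie2 log.
alignment_rate() {
    local log_file="$1"
    awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' "$log_file"
}

# Hypothetical usage: capture stderr when aligning, then read the rate back.
# bowtie2 -x "${MM10_INDEX_PREFIX}" -1 R1.fastq -2 R2.fastq -S out.sam 2> sample.bowtie2.log
# alignment_rate sample.bowtie2.log
```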
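4. Duplicates were removed with picard through gatk MarkDuplicates (version 4.2.5.0).
Picard MarkDuplicates via GATK v4.2.5.0
$ Bash example
# Install GATK (which includes Picard tools like MarkDuplicates)
# conda install -c bioconda gatk4

# Run MarkDuplicates on a sorted BAM file.
# -I: input BAM file
# -O: output BAM file
# -M: file to write duplication metrics to
# --REMOVE_DUPLICATES true: drop duplicates from the output instead of only flagging them.
#   The source does not list the exact options; this flag is assumed here because
#   the description says duplicates were removed.
gatk MarkDuplicates \
  -I input.bam \
  -O output.bam \
  -M metrics.txt \
  --REMOVE_DUPLICATES true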
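MarkDuplicates writes its duplication statistics to the file given with `-M`. As a sketch, and assuming the usual Picard metrics layout (a `## METRICS CLASS` line followed by a tab-separated header row and one value row per library), the duplication fraction of the first library can be read back like this. The function name `percent_duplication` is hypothetical.

```shell
# Print the PERCENT_DUPLICATION value for the first library in a Picard metrics file.
percent_duplication() {
    local metrics_file="$1"
    awk -F'\t' '
        /^## METRICS CLASS/ {
            getline                               # header row with column names
            for (i = 1; i <= NF; i++)
                if ($i == "PERCENT_DUPLICATION") col = i
            getline                               # first row of values
            if (col) print $col
            exit
        }
    ' "$metrics_file"
}

# Hypothetical usage:
# percent_duplication metrics.txt
```

Looking the column up by name rather than by position keeps the sketch working if the column order differs between versions.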
5. Peak detection was performed using macs2 callpeak (version 2.2.7.1) with the parameters "-g mm --qvalue 0.05 --shift 100 --extsize 200 --format BAMPE --keep-dup=all --cutoff-analysis --bdg".
$ Bash example
# Install MACS2 if not already installed
# conda install -c bioconda macs2

# Placeholders for the input file and output prefix:
# replace treatment.bam with your deduplicated BAM file
# and output_prefix with your desired output file prefix.
# The source description lists no control sample; if you have one, add -c control.bam.
macs2 callpeak \
  -t treatment.bam \
  -f BAMPE \
  -g mm \
  -n output_prefix \
  --qvalue 0.05 \
  --shift 100 \
  --extsize 200 \
  --keep-dup=all \
  --cutoff-analysis \
  --bdg
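Tools Used
seqtk (version 1.2)
Trimmomatic (version 0.39)
bowtie2 (version 2.4.4)
Picard MarkDuplicates via GATK (version 4.2.5.0)
MACS2 (version 2.2.7.1)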
Raw Source Text
Reads were downsampled to an equivalent number of reads per sample using seqtk sample (version 1.2)
Adaptors were trimmed with trimmomatic (version 0.39) [1] with the options ILLUMINACLIP:Trimmomatic-0.39/adapters/NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:25 MINLEN:24
Reads were then aligned to the mouse genome (mm10) with bowtie2 (version 2.4.4) using default parameters
Duplicates were removed with picard through gatk MarkDuplicates (version 4.2.5.0)
Peak detection was performed using macs2 callpeak (version 2.2.7.1) with the parameters "-g mm --qvalue 0.05 --shift 100 --extsize 200 --format BAMPE --keep-dup=all --cutoff-analysis --bdg".
Assembly: mm10
Supplementary files format and content: bedgraph data files for each sample