GSE273094 Processing Pipeline
ChIP-Seq
3 steps
Publication
Zfp697 is an RNA-binding protein that regulates skeletal muscle inflammation and remodeling. Proceedings of the National Academy of Sciences of the United States of America (2024). PMID 39141348
Dataset
GSE273094: Zfp697 is an RNA-binding protein that regulates skeletal muscle inflammation and remodeling (Zfp697 transduced primary mouse myotubes ChIP-Seq)
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Reads were aligned to the GRCm38.p6 version of the mouse genome using STAR and duplicated reads were removed with MarkDuplicates.
$ Bash example
# Install STAR (example)
# conda install -c bioconda star
# Install Picard (example)
# conda install -c bioconda picard

# Define variables
# Replace with actual paths to your reference genome and input reads
GENOME_DIR="path/to/STAR_index/GRCm38.p6"       # Directory containing STAR genome index
GENOME_FASTA="path/to/reference/GRCm38.p6.fa"   # Reference FASTA for index generation
GENOME_GTF="path/to/annotation/GRCm38.p6.gtf"   # GTF for splice junction annotation (optional for ChIP-seq)
READS_R1="sample_R1.fastq.gz"                   # Input R1 FASTQ file
READS_R2="sample_R2.fastq.gz"                   # Input R2 FASTQ file (remove if single-end)
OUTPUT_PREFIX="sample_aligned"                  # Prefix for output files
# STAR names its coordinate-sorted output <prefix>Aligned.sortedByCoord.out.bam
ALIGNED_BAM="${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam"
DEDUP_BAM="${OUTPUT_PREFIX}.deduplicated.bam"
DEDUP_METRICS="${OUTPUT_PREFIX}.deduplication_metrics.txt"
NUM_THREADS=8                                   # Number of threads to use for STAR
PICARD_JAR="picard.jar"                         # Path to picard.jar (a conda install provides a `picard` wrapper instead)

# --- STAR Alignment ---
# Generate STAR genome index (run once per genome version)
# STAR --runMode genomeGenerate \
#   --genomeDir "${GENOME_DIR}" \
#   --genomeFastaFiles "${GENOME_FASTA}" \
#   --sjdbGTFfile "${GENOME_GTF}" \
#   --runThreadN "${NUM_THREADS}"

# Align reads to the GRCm38.p6 mouse genome
# The filtering and intron settings below are RNA-seq-style defaults, not the
# authors' documented parameters. For ChIP-seq, spliced alignment is often
# disabled (e.g. --alignIntronMax 1 --alignEndsType EndToEnd).
STAR --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READS_R1}" "${READS_R2}" \
  --runThreadN "${NUM_THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outSAMattributes All \
  --outFilterMultimapNmax 20 \
  --outFilterMismatchNmax 999 \
  --outFilterMismatchNoverLmax 0.1 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --limitBAMsortRAM 30000000000  # Adjust based on available RAM (e.g., 30GB)

# --- MarkDuplicates (Picard) ---
# Remove duplicated reads
java -jar "${PICARD_JAR}" MarkDuplicates \
  I="${ALIGNED_BAM}" \
  O="${DEDUP_BAM}" \
  M="${DEDUP_METRICS}" \
  REMOVE_DUPLICATES=true \
  ASSUME_SORTED=true
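After deduplication it can be useful to check how much of the library was flagged as duplicate. The sketch below reads the PERCENT_DUPLICATION column from the metrics file that Picard MarkDuplicates writes. The function name `percent_duplication` and the metrics file name are illustrative assumptions, not part of the published pipeline.

```bash
# Print the PERCENT_DUPLICATION value from a Picard MarkDuplicates metrics file.
# The column is located by name in the header row that follows "## METRICS CLASS".
percent_duplication() {
    awk -F'\t' '
        /^## METRICS CLASS/ { in_metrics = 1; next }
        in_metrics && !header_seen {
            for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
            header_seen = 1
            next
        }
        header_seen && col && NF >= col { print $col; exit }
    ' "$1"
}

# Hypothetical file name; only runs if the metrics file is present.
if [ -f "sample_aligned.deduplication_metrics.txt" ]; then
    percent_duplication "sample_aligned.deduplication_metrics.txt"
fi
```

The value is a fraction between 0 and 1. Only the first data row is reported, so a BAM with several libraries would need the `exit` removed.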
2
We used MACS3 callpeak for peak calling (q-value < 0.05 and fold change ≥ 1.5).
$ Bash example
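# Install MACS3 (example using conda)
# conda install -c bioconda macs3

# Define input and output files
TREATMENT_BAM="treatment.bam"   # Replace with your actual treatment/IP BAM file
CONTROL_BAM="control.bam"       # Replace with your actual control/input BAM file
OUTPUT_PREFIX="my_peaks"        # Prefix for output files
OUTPUT_DIR="./"

# Run MACS3 callpeak
# -g mm selects the mouse effective genome size (the data are aligned to GRCm38.p6)
# Use -f BAMPE instead of -f BAM for paired-end libraries
macs3 callpeak \
  -t "${TREATMENT_BAM}" \
  -c "${CONTROL_BAM}" \
  -f BAM \
  -g mm \
  -n "${OUTPUT_PREFIX}" \
  --outdir "${OUTPUT_DIR}" \
  -q 0.05

# callpeak has no fold-change option. Fold enrichment is reported in column 7 of
# ${OUTPUT_PREFIX}_peaks.narrowPeak, so the >= 1.5 cutoff is applied to that file afterwards.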
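The fold-change cutoff from the description (≥ 1.5) has to be applied to the peak file after calling. A minimal sketch, assuming the standard MACS narrowPeak layout (column 7 is fold enrichment, column 9 is -log10 q-value). The function name `filter_peaks` and the file names are hypothetical, and the authors' actual filtering procedure is not documented in the source.

```bash
# Keep peaks with fold enrichment >= min_fc and q-value < max_q.
# Usage: filter_peaks <narrowPeak> [min_fc] [max_q]
filter_peaks() {
    awk -F'\t' -v min_fc="${2:-1.5}" -v max_q="${3:-0.05}" '
        # narrowPeak stores -log10(q), so convert the q-value cutoff once
        BEGIN { min_neglog_q = -log(max_q) / log(10) }
        $7 >= min_fc && $9 > min_neglog_q
    ' "$1"
}

# Hypothetical file names; only runs if the peak file is present.
if [ -f "my_peaks_peaks.narrowPeak" ]; then
    filter_peaks "my_peaks_peaks.narrowPeak" > "my_peaks_filtered.narrowPeak"
fi
```

Re-checking the q-value is redundant when callpeak was run with `-q 0.05`, but it keeps the filter usable on peak files called with a looser cutoff.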
3
Bedtools were used to generate coverage tracks.
$ Bash example
# Install bedtools if not already installed
# conda install -c bioconda bedtools

# Example: Generate a bedGraph file representing genome-wide coverage from a BAM file.
# This bedGraph can then be converted to a bigWig file for visualization as a coverage track.
# Input:
#   - aligned_reads.bam: A sorted and indexed BAM file containing aligned sequencing reads.
# Output:
#   - coverage.bedgraph: A bedGraph file showing per-base coverage depth across the genome.
bedtools genomecov -ibam aligned_reads.bam -bg > coverage.bedgraph
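The dataset's supplementary tracks are bigWig files, so a bedGraph would still need converting. The sketch below shows one common route using the UCSC `bedGraphToBigWig` utility, which expects input sorted by chromosome and then start position. The tool choice, the helper name `sort_bedgraph`, and the file names (including `GRCm38.chrom.sizes`) are assumptions; the source does not say how the authors produced their bigWig files.

```bash
# Sort a bedGraph by chromosome name, then numerically by start coordinate.
# LC_ALL=C forces byte-order sorting of chromosome names.
sort_bedgraph() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Hypothetical inputs; the conversion only runs if the files and the tool are available.
if [ -f coverage.bedgraph ] && [ -f GRCm38.chrom.sizes ] \
   && command -v bedGraphToBigWig >/dev/null 2>&1; then
    sort_bedgraph coverage.bedgraph > coverage.sorted.bedgraph
    # Arguments: sorted bedGraph, two-column chromosome sizes file, output bigWig
    bedGraphToBigWig coverage.sorted.bedgraph GRCm38.chrom.sizes coverage.bw
fi
```

The chromosome sizes file is a two-column table (name, length) and must use the same chromosome names as the BAM.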
Tools Used
STAR, MarkDuplicates, MACS3, bedtools
Raw Source Text
Reads were aligned to the GRCm38.p6 version of the mouse genome using STAR and duplicated reads were removed with MarkDuplicates. We used MACS3 callpeak for peak calling (q-value < 0.05 and fold change ≥ 1.5). Bedtools were used to generate coverage tracks.
Assembly: GRCm38.p6
Supplementary files format and content: Excel xlsx files containing peak information
Supplementary files format and content: Bigwig (bw) files containing track coverage information