GSE273094 Processing Pipeline

ChIP-Seq, 3 steps

Publication

Zfp697 is an RNA-binding protein that regulates skeletal muscle inflammation and remodeling.

Proceedings of the National Academy of Sciences of the United States of America (2024) — PMID 39141348

Dataset

GSE273094

Zfp697 is an RNA-binding protein that regulates skeletal muscle inflammation and remodeling (Zfp697 transduced primary mouse myotubes ChIP-Seq)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Reads were aligned to the GRCm38.p6 version of the mouse genome using STAR and duplicated reads were removed with MarkDuplicates.

    $ Bash example
    # Install STAR (example)
    # conda install -c bioconda star
    
    # Install Picard (example)
    # conda install -c bioconda picard
    
    # Define variables
    # Replace with actual paths to your reference genome and input reads
    GENOME_DIR="path/to/STAR_index/GRCm38.p6" # Directory containing STAR genome index
    GENOME_FASTA="path/to/reference/GRCm38.p6.fa" # Reference FASTA for index generation
    GENOME_GTF="path/to/annotation/GRCm38.p6.gtf" # Optional GTF; splice junction annotation matters for RNA-seq, not ChIP-seq
    READS_R1="sample_R1.fastq.gz" # Input R1 FASTQ file
    READS_R2="sample_R2.fastq.gz" # Input R2 FASTQ file (remove if single-end)
    OUTPUT_PREFIX="sample_aligned" # Prefix for output files
    ALIGNED_BAM="${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam" # File name STAR gives its coordinate-sorted BAM
    DEDUP_BAM="${OUTPUT_PREFIX}.deduplicated.bam"
    DEDUP_METRICS="${OUTPUT_PREFIX}.deduplication_metrics.txt"
    NUM_THREADS=8 # Number of threads to use for STAR
    PICARD_JAR="picard.jar" # Path to picard.jar (java -jar takes a file path; it does not search PATH)
    
    # --- STAR Alignment ---
    # Generate STAR genome index (run once per genome version).
    # No GTF is supplied here: splice junction annotation is only useful for RNA-seq.
    # STAR --runMode genomeGenerate \
    #      --genomeDir "${GENOME_DIR}" \
    #      --genomeFastaFiles "${GENOME_FASTA}" \
    #      --runThreadN "${NUM_THREADS}"
    
    # Align reads to the GRCm38.p6 mouse genome.
    # Assumption: the source does not list STAR parameters. ChIP-seq reads come from
    # genomic DNA, so spliced alignment is disabled (--alignIntronMax 1), reads are
    # aligned end to end, and the mate gap is capped at an illustrative 2000 bp.
    # --readFilesCommand zcat is required because the FASTQ files are gzipped.
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READS_R1}" "${READS_R2}" \
         --readFilesCommand zcat \
         --runThreadN "${NUM_THREADS}" \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outSAMattributes All \
         --outFilterMultimapNmax 20 \
         --outFilterMismatchNmax 999 \
         --outFilterMismatchNoverLmax 0.1 \
         --alignIntronMax 1 \
         --alignEndsType EndToEnd \
         --alignMatesGapMax 2000 \
         --limitBAMsortRAM 30000000000 # Adjust based on available RAM (e.g., 30GB)
    
    # --- MarkDuplicates (Picard) ---
    # Remove duplicated reads
    java -jar "${PICARD_JAR}" MarkDuplicates \
         I="${ALIGNED_BAM}" \
         O="${DEDUP_BAM}" \
         M="${DEDUP_METRICS}" \
         REMOVE_DUPLICATES=true \
         ASSUME_SORTED=true
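
After duplicate removal it is usually worth checking how many reads were flagged. The sketch below reads the duplication rate from a MarkDuplicates metrics file by looking up the PERCENT_DUPLICATION column by name. The metrics file here is a toy stand-in with made-up numbers and a reduced set of columns; in practice METRICS_FILE would point at the file written through the M= argument.

```shell
set -eu

# Toy metrics file (made-up numbers) mimicking the layout Picard writes:
# "##" comment lines, a tab-separated header row, then one row per library.
METRICS_FILE="$(mktemp)"
printf '## METRICS CLASS\tpicard.sam.DuplicationMetrics\n' > "${METRICS_FILE}"
printf 'LIBRARY\tUNPAIRED_READS_EXAMINED\tREAD_PAIRS_EXAMINED\tPERCENT_DUPLICATION\n' >> "${METRICS_FILE}"
printf 'lib1\t0\t1000000\t0.125\n' >> "${METRICS_FILE}"

# Locate the PERCENT_DUPLICATION column in the header, then report it per library.
DUP_RATE="$(awk -F'\t' '
  /^#/    { next }                       # skip comment lines
  NF == 0 { if (col) exit; next }        # a blank line after the table ends it
  col == 0 {                             # first non-comment line is the header
    for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
    next
  }
  { printf "%s\t%.1f%%\n", $1, $col * 100 }
' "${METRICS_FILE}")"

echo "${DUP_RATE}"
```

Looking the column up by name rather than by position keeps the snippet independent of the exact column order in a given Picard version.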
  2.

    We used MACS3 callpeak for peak calling (q-value < 0.05 and fold change ≥ 1.5).

    MACS3
    $ Bash example
    # Install MACS3 (example using conda)
    # conda install -c bioconda macs3
    
    # Define input and output files
    TREATMENT_BAM="treatment.bam" # Replace with your actual treatment/IP BAM file
    CONTROL_BAM="control.bam"   # Replace with your actual control/input BAM file
    OUTPUT_PREFIX="my_peaks"    # Prefix for output files
    OUTPUT_DIR="./"
    
    # Run MACS3 callpeak
    macs3 callpeak \
      -t "${TREATMENT_BAM}" \
      -c "${CONTROL_BAM}" \
      -f BAM \
      -g mm \
      -n "${OUTPUT_PREFIX}" \
      --outdir "${OUTPUT_DIR}" \
      -q 0.05 \
      --fe-cutoff 1.5
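
The two thresholds can also be checked, or reapplied, on the narrowPeak output itself. In that format column 7 holds the fold enrichment and column 9 holds -log10(q-value), so q < 0.05 corresponds to a column 9 value above roughly 1.3. The following is a minimal sketch on toy rows with made-up values; the variable names are illustrative and do not come from the source.

```shell
set -eu

# Toy narrowPeak rows (made-up values). Columns: chrom, start, end, name, score,
# strand, fold enrichment, -log10(p-value), -log10(q-value), summit offset.
PEAKS="$(mktemp)"
printf 'chr1\t100\t300\tpeak_1\t50\t.\t4.2\t8.1\t5.6\t90\n'  >  "${PEAKS}"
printf 'chr1\t900\t1100\tpeak_2\t12\t.\t1.2\t2.0\t1.5\t80\n' >> "${PEAKS}"
printf 'chr2\t500\t800\tpeak_3\t20\t.\t2.0\t2.5\t1.0\t120\n' >> "${PEAKS}"

MIN_FOLD=1.5   # fold change >= 1.5
MAX_Q=0.05     # q-value < 0.05

# Column 9 stores -log10(q), so q < MAX_Q is equivalent to $9 > -log10(MAX_Q).
KEPT="$(awk -F'\t' -v fe="${MIN_FOLD}" -v q="${MAX_Q}" '
  BEGIN { min_log_q = -log(q) / log(10) }
  $7 >= fe && $9 > min_log_q { print $4 }
' "${PEAKS}")"

echo "${KEPT}"
```

Here peak_2 fails the fold-change cutoff and peak_3 fails the q-value cutoff, so only peak_1 is kept.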
  3.

    Bedtools were used to generate coverage tracks.

    bedtools v2.31.0 (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install bedtools if not already installed
    # conda install -c bioconda bedtools
    
    # Example: Generate a bedGraph file representing genome-wide coverage from a BAM file.
    # This bedGraph can then be converted to a bigWig file for visualization as a coverage track.
    # Input:
    #   - aligned_reads.bam: A coordinate-sorted BAM file of aligned reads (e.g., the deduplicated BAM from step 1).
    # Output:
    #   - coverage.bedgraph: A bedGraph file showing per-base coverage depth across the genome.
    
    bedtools genomecov -ibam aligned_reads.bam -bg > coverage.bedgraph
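
The deposited tracks are bigWig files, so the bedGraph needs one more conversion step. The source does not say which tool was used; the sketch below assumes the UCSC bedGraphToBigWig utility, which takes a position-sorted bedGraph and a two-column chromosome sizes file. All inputs here are toy files with made-up values, and the conversion is skipped if the utility is not installed.

```shell
set -eu

WORKDIR="$(mktemp -d)"

# Toy inputs (made-up values): an unsorted bedGraph and a FASTA index (.fai),
# whose first two columns are sequence name and sequence length.
printf 'chr2\t0\t50\t3\nchr1\t100\t200\t7\nchr1\t0\t100\t2\n' > "${WORKDIR}/coverage.bedgraph"
printf 'chr1\t1000\t6\t60\t61\nchr2\t500\t1030\t60\t61\n'      > "${WORKDIR}/genome.fa.fai"

# chrom.sizes: sequence name and length, taken from the FASTA index.
cut -f1,2 "${WORKDIR}/genome.fa.fai" > "${WORKDIR}/chrom.sizes"

# bedGraphToBigWig expects input sorted by chromosome, then by start position.
LC_ALL=C sort -k1,1 -k2,2n "${WORKDIR}/coverage.bedgraph" > "${WORKDIR}/coverage.sorted.bedgraph"

# Convert only if the UCSC utility is available (assumed tool, not named in the source).
if command -v bedGraphToBigWig >/dev/null 2>&1; then
  bedGraphToBigWig "${WORKDIR}/coverage.sorted.bedgraph" "${WORKDIR}/chrom.sizes" "${WORKDIR}/coverage.bw"
else
  echo "bedGraphToBigWig not found; skipping conversion" >&2
fi
```

For real data, the .fai file would come from running samtools faidx on the GRCm38.p6 FASTA used to build the alignment index.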

Raw Source Text
Reads were aligned to the GRCm38.p6 version of the mouse genome using STAR and duplicated reads were removed with MarkDuplicates.
We used MACS3 callpeak for peak calling (q-value < 0.05 and fold change ≥ 1.5).
Bedtools were used to generate coverage tracks.
Assembly: GRCm38.p6
Supplementary files format and content: Excel xlsx files containing peak information
Supplementary files format and content: Bigwig (bw) files containing track coverage information