GSE273094 Processing Pipeline

ChIP-Seq, 3 steps

Publication

Zfp697 is an RNA-binding protein that regulates skeletal muscle inflammation and remodeling.

Proceedings of the National Academy of Sciences of the United States of America (2024) — PMID 39141348

Dataset

Zfp697 is an RNA-binding protein that regulates skeletal muscle inflammation and remodeling (Zfp697 transduced primary mouse myotubes ChIP-Seq)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Reads were aligned to the GRCm38.p6 version of the mouse genome using STAR and duplicated reads were removed with MarkDuplicates.

STAR v2.7.10a

$ Bash example

# Install STAR (example)
# conda install -c bioconda star

# Install Picard (example)
# conda install -c bioconda picard

# Define variables
# Replace with actual paths to your reference genome and input reads
GENOME_DIR="path/to/STAR_index/GRCm38.p6" # Directory containing STAR genome index
GENOME_FASTA="path/to/reference/GRCm38.p6.fa" # Reference FASTA for index generation
GENOME_GTF="path/to/annotation/GRCm38.p6.gtf" # GTF annotation for index generation (optional; mainly useful for RNA-seq, not required for ChIP-seq)
READS_R1="sample_R1.fastq.gz" # Input R1 FASTQ file
READS_R2="sample_R2.fastq.gz" # Input R2 FASTQ file (remove if single-end)
OUTPUT_PREFIX="sample_aligned" # Prefix for output files
ALIGNED_BAM="${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam" # STAR appends this suffix directly to the prefix
DEDUP_BAM="${OUTPUT_PREFIX}.deduplicated.bam"
DEDUP_METRICS="${OUTPUT_PREFIX}.deduplication_metrics.txt"
NUM_THREADS=8 # Number of threads to use for STAR
PICARD_JAR="picard.jar" # Path to picard.jar (a conda install also provides a "picard" wrapper command)

# --- STAR Alignment ---
# Generate STAR genome index (run once per genome version)
# STAR --runMode genomeGenerate \
#      --genomeDir "${GENOME_DIR}" \
#      --genomeFastaFiles "${GENOME_FASTA}" \
#      --sjdbGTFfile "${GENOME_GTF}" \
#      --runThreadN "${NUM_THREADS}"

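# Align reads to the GRCm38.p6 mouse genome
# Assumption: the authors' exact STAR parameters are not reported. Splicing is
# disabled here (--alignIntronMax 1, --alignEndsType EndToEnd), as is common
# when STAR is used for ChIP-seq.
# --readFilesCommand zcat is needed for gzipped FASTQ input (use "gunzip -c" on macOS).
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READS_R1}" "${READS_R2}" \
     --readFilesCommand zcat \
     --runThreadN "${NUM_THREADS}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMunmapped Within \
     --outSAMattributes All \
     --outFilterMultimapNmax 20 \
     --outFilterMismatchNmax 999 \
     --outFilterMismatchNoverLmax 0.1 \
     --alignIntronMax 1 \
     --alignEndsType EndToEnd \
     --limitBAMsortRAM 30000000000 # Adjust based on available RAM (e.g., 30GB)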

# --- MarkDuplicates (Picard) ---
# Remove duplicated reads
java -jar "${PICARD_JAR}" MarkDuplicates \
     I="${ALIGNED_BAM}" \
     O="${DEDUP_BAM}" \
     M="${DEDUP_METRICS}" \
     REMOVE_DUPLICATES=true \
     ASSUME_SORTED=true
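
After duplicate removal it can be useful to check how many reads were flagged. The sketch below reads the duplication rate from a Picard MarkDuplicates metrics file by looking up the PERCENT_DUPLICATION column by name. The helper name percent_duplication and the demo file are hypothetical; the demo metrics are synthetic and abbreviated (real files contain more columns), so treat this as a starting point only.

```shell
# Hypothetical helper: print the PERCENT_DUPLICATION value from a Picard
# MarkDuplicates metrics file, locating the column by its header name.
percent_duplication() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }   # skip comment and blank lines
    !col { for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i; next }
    { print $col; exit }       # first data row after the header
  ' "$1"
}

# Synthetic, abbreviated metrics file for illustration only (not real data).
demo_metrics="$(mktemp)"
printf '## METRICS CLASS\tpicard.sam.DuplicationMetrics\n' > "${demo_metrics}"
printf 'LIBRARY\tUNPAIRED_READS_EXAMINED\tREAD_PAIRS_EXAMINED\tPERCENT_DUPLICATION\n' >> "${demo_metrics}"
printf 'demo_lib\t0\t1000\t0.125\n' >> "${demo_metrics}"

percent_duplication "${demo_metrics}"
```

With real data, pass the metrics file written by MarkDuplicates (the M= argument) instead of the demo file.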

We used MACS3 callpeak for peak calling (q-value < 0.05 and fold change ≥ 1.5).

MACS3

$ Bash example

# Install MACS3 (example using conda)
# conda install -c bioconda macs3

# Define input and output files
TREATMENT_BAM="treatment.bam" # Replace with your actual treatment/IP BAM file
CONTROL_BAM="control.bam"   # Replace with your actual control/input BAM file
OUTPUT_PREFIX="my_peaks"    # Prefix for output files
OUTPUT_DIR="./"

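# Run MACS3 callpeak
# -g mm sets the effective genome size for mouse (GRCm38).
# For paired-end data, -f BAMPE may be more appropriate than -f BAM.
# MACS3 callpeak has no fold-change cutoff option, so the fold change >= 1.5
# threshold must be applied to the output afterwards (fold enrichment is
# column 7 of the narrowPeak file).
macs3 callpeak \
  -t "${TREATMENT_BAM}" \
  -c "${CONTROL_BAM}" \
  -f BAM \
  -g mm \
  -n "${OUTPUT_PREFIX}" \
  --outdir "${OUTPUT_DIR}" \
  -q 0.05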
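
MACS3 callpeak applies the q-value cutoff itself but exposes no fold-change cutoff, so the fold change ≥ 1.5 criterion is presumably applied to the peak table afterwards. A minimal sketch, assuming the filter is run on the narrowPeak output, where column 7 holds the fold enrichment at the peak summit. The helper name filter_peaks_by_fold and the demo file are hypothetical, and the two demo peaks are synthetic.

```shell
# Hypothetical helper: keep narrowPeak rows whose fold enrichment
# (column 7, signalValue) is at least the given threshold.
filter_peaks_by_fold() {
  awk -F'\t' -v min="$1" '$7 + 0 >= min + 0' "$2"
}

# Synthetic two-peak narrowPeak file for illustration only (not real data).
demo_peaks="$(mktemp)"
printf 'chr1\t100\t300\tpeak_1\t50\t.\t2.40\t6.1\t3.2\t90\n' > "${demo_peaks}"
printf 'chr1\t900\t1100\tpeak_2\t20\t.\t1.20\t2.5\t1.4\t80\n' >> "${demo_peaks}"

# Keep peaks with fold enrichment >= 1.5.
filtered_peaks="$(filter_peaks_by_fold 1.5 "${demo_peaks}")"
echo "${filtered_peaks}"
```

With real data, the second argument would be the narrowPeak file that MACS3 writes under the chosen output prefix.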

Bedtools were used to generate coverage tracks.

bedtools v2.31.0 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install bedtools if not already installed
# conda install -c bioconda bedtools

# Example: Generate a bedGraph file representing genome-wide coverage from a BAM file.
# This bedGraph can then be converted to a bigWig file for visualization as a coverage track.
# Input:
#   - aligned_reads.bam: A coordinate-sorted BAM file of aligned reads (e.g., the deduplicated BAM from the alignment step).
# Output:
#   - coverage.bedgraph: A bedGraph file giving read depth for each covered interval of the genome.

bedtools genomecov -ibam aligned_reads.bam -bg > coverage.bedgraph
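
The supplementary files are bigWig tracks, so the bedGraph was presumably converted at some point. The source does not say how; one common route is UCSC bedGraphToBigWig, which expects the bedGraph sorted by chromosome and start. The sketch below shows only that sorting step on synthetic data and leaves the conversion as commented commands. The helper name sort_bedgraph and all file names are hypothetical.

```shell
# Hypothetical helper: sort a bedGraph by chromosome, then numerically by
# start, using byte order (LC_ALL=C) as bedGraphToBigWig expects.
sort_bedgraph() {
  LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Synthetic, deliberately unsorted bedGraph for illustration only.
demo_bg="$(mktemp)"
printf 'chr2\t0\t50\t3\nchr1\t200\t260\t5\nchr1\t10\t60\t2\n' > "${demo_bg}"

sorted_bg="$(mktemp)"
sort_bedgraph "${demo_bg}" > "${sorted_bg}"
cat "${sorted_bg}"

# With real data, the conversion also needs a chromosome sizes file, e.g. the
# first two columns of the genome FASTA index created by "samtools faidx":
#   cut -f1,2 GRCm38.p6.fa.fai > chrom.sizes
#   bedGraphToBigWig coverage.sorted.bedgraph chrom.sizes coverage.bw
```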

Tools Used

STAR

Raw Source Text

Reads were aligned to the GRCm38.p6 version of the mouse genome using STAR and duplicated reads were removed with MarkDuplicates.
We used MACS3 callpeak for peak calling (q- value < 0.05 and fold change ≥ 1.5).
Bedtools were used to generate coverage tracks.
Assembly: GRCm38.p6
Supplementary files format and content: Excel xlsx files containing peak information
Supplementary files format and content: Bigwig (bw) files containing track coverage information
