GSE142307 Processing Pipeline

ChIP-Seq, 2 steps with code examples

Publication

An in vivo genome-wide CRISPR screen identifies the RNA-binding protein Staufen2 as a key regulator of myeloid leukemia.

Nature Cancer (2020), PMID 34109316

Dataset

GSE142307

Effect of Stau2 knockdown on H3K4 methylation in human bcCML cells (K562)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Bowtie2: alignment tool

    $ Bash example
    # Install Bowtie2 and Samtools (if not already installed)
    # conda install -c bioconda bowtie2=2.4.5 samtools=1.17
    
    set -euo pipefail  # Stop on errors, including a failure of either command in the pipe
    
    # Define variables
    GENOME_INDEX="/path/to/genome/index/hg38" # Placeholder: basename of a Bowtie2 hg38 index (hg38.1.bt2, hg38.2.bt2, ...)
    INPUT_FASTQ="input.fastq.gz"           # Single-end FASTQ file
    OUTPUT_BAM="aligned_reads.sorted.bam"  # Coordinate-sorted output BAM
    THREADS=8                              # Number of threads to use
    SAMPLE_ID="sample_1"                   # Unique read-group identifier (ID)
    SAMPLE_NAME="MySample"                 # Sample name (SM)
    LIBRARY_NAME="ChIP_Library"            # Library name (LB)
    FLOWCELL_LANE="FCID_Lane1"             # Flowcell ID and lane (PU)
    
    # Align reads with Bowtie2, then coordinate-sort and index the BAM with samtools
    # -x: Path to the genome index basename
    # -U: Path to the single-end FASTQ file
    # -p: Number of threads
    # --rg-id, --rg: Read group information for the SAM/BAM header
    # Without -S, Bowtie2 writes SAM to stdout (its alignment summary goes to stderr)
    bowtie2 -x "${GENOME_INDEX}" \
            -U "${INPUT_FASTQ}" \
            -p "${THREADS}" \
            --rg-id "${SAMPLE_ID}" \
            --rg "SM:${SAMPLE_NAME}" \
            --rg "LB:${LIBRARY_NAME}" \
            --rg "PL:ILLUMINA" \
            --rg "PU:${FLOWCELL_LANE}" \
        | samtools sort -@ "${THREADS}" -o "${OUTPUT_BAM}" -
    samtools index "${OUTPUT_BAM}"
  2. MACS2: peak calling

    MACS2 v2.2.7.1
    $ Bash example
    # Install MACS2 (if not already installed)
    # conda install -c bioconda macs2=2.2.7.1
    
    # Define input files and parameters
    TREATMENT_BAM="treatment.sorted.bam" # Path to the treatment BAM file (e.g., ChIP-seq IP sample)
    CONTROL_BAM="control.sorted.bam"     # Path to the control BAM file (e.g., Input or IgG control)
    GENOME_SIZE="hs"                     # Effective genome size. Use 'hs' for human, 'mm' for mouse, 'ce' for C. elegans, 'dm' for D. melanogaster.
                                         # Other genomes take their mappable size in base pairs (e.g., 2.7e9, the value 'hs' stands for).
    OUTPUT_PREFIX="my_chip_peaks"        # Prefix for all output files (e.g., my_chip_peaks_peaks.narrowPeak)
    OUTPUT_DIR="macs2_output"            # Directory where all output files will be saved
    Q_VALUE_CUTOFF="0.01"                # FDR cutoff for peak calling. Common values are 0.01 (1%) or 0.05 (5%).
    
    # Create the output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Execute MACS2 peak calling
    # -t: Treatment file (ChIP-seq IP)
    # -c: Control file (Input or IgG)
    # -f: Input format; BAM matches the single-end alignments from step 1 (use BAMPE only for paired-end BAMs)
    # -g: Effective genome size
    # -n: Name of the experiment, used as prefix for output files
    # --outdir: Output directory
    # -q: Q-value (FDR) cutoff for peak detection
    # --keep-dup all: Keep every read at the same position and strand (the MACS2 default, 1, keeps only one, discarding likely PCR duplicates)
    # --verbose 2: Report process information (already the MACS2 default; 3 adds debug messages)
    macs2 callpeak \
        -t "${TREATMENT_BAM}" \
        -c "${CONTROL_BAM}" \
        -f BAM \
        -g "${GENOME_SIZE}" \
        -n "${OUTPUT_PREFIX}" \
        --outdir "${OUTPUT_DIR}" \
        -q "${Q_VALUE_CUTOFF}" \
        --keep-dup all \
        --verbose 2
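Neither step checks its output before moving on. Below is a minimal sketch of two hypothetical helpers, `check_alignment_rate` and `summarize_peaks`, that need only bash and awk; their names, defaults, and thresholds are assumptions, not part of the original pipeline. The first reads the overall alignment rate from a saved Bowtie2 summary. The second summarizes a MACS2 narrowPeak file, whose column 9 stores -log10(q), so the q = 0.01 cutoff in step 2 corresponds to a column-9 value of 2.

```bash
#!/usr/bin/env bash
# Hypothetical QC helpers for the two steps above (bash + awk only).

# Step 1 check: read the "NN.NN% overall alignment rate" line that Bowtie2 prints
# to stderr, print the rate, and return non-zero if it is below a minimum.
# The 80% default is an assumed threshold, not a value from the study.
check_alignment_rate() {
    local log_file="$1"
    local min_rate="${2:-80}"
    local rate
    rate=$(awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' "${log_file}")
    if [ -z "${rate}" ]; then
        echo "ERROR: no 'overall alignment rate' line in ${log_file}" >&2
        return 2
    fi
    echo "${rate}"
    # bash arithmetic is integer-only, so compare the percentages in awk
    awk -v r="${rate}" -v m="${min_rate}" 'BEGIN { exit (r >= m) ? 0 : 1 }'
}

# Step 2 summary: number of peaks, total bp covered, and peaks passing a q-value
# cutoff in a MACS2 narrowPeak file (column 9 = -log10 q-value).
summarize_peaks() {
    local peak_file="$1"
    local q_cutoff="${2:-0.01}"
    awk -v q="${q_cutoff}" '
        BEGIN { OFS = "\t"; min_log_q = -log(q) / log(10) }  # e.g. q = 0.01 gives about 2
        NF >= 10 && $1 !~ /^(track|browser|#)/ {
            n++
            bp += $3 - $2                 # BED coordinates are half-open: width = end - start
            if ($9 >= min_log_q) pass++
        }
        END {
            print "peaks", n + 0
            print "total_bp", bp + 0
            print "peaks_q<=" q, pass + 0
        }
    ' "${peak_file}"
}
```

For example, redirect the aligner's summary by adding `2> "${SAMPLE_ID}.bowtie2.log"` to the `bowtie2` command in step 1 and then run `check_alignment_rate "${SAMPLE_ID}.bowtie2.log"`. After step 2, `summarize_peaks "${OUTPUT_DIR}/${OUTPUT_PREFIX}_peaks.narrowPeak" 0.001` reports how many peaks survive a stricter cutoff than the one used for calling. The float comparison is delegated to awk because bash cannot compare decimals natively.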

Tools Used

Raw Source Text
Bowtie2-alignment tool
Macs2-peak calling
Genome_build: hg38