GSE142307 Processing Pipeline
ChIP-Seq
2 steps
Publication
An in vivo genome-wide CRISPR screen identifies the RNA-binding protein Staufen2 as a key regulator of myeloid leukemia. Nature Cancer (2020), PMID 34109316
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1. Bowtie2: alignment tool
$ Bash example
# Install Bowtie2 and Samtools (if not already installed)
# conda install -c bioconda bowtie2=2.4.5 samtools=1.17  # Samtools version for compatibility

# Define variables
GENOME_INDEX="/path/to/genome/index/hg38"  # Placeholder for hg38 genome index basename (e.g., from ENCODE)
INPUT_FASTQ="input.fastq.gz"               # Single-end FASTQ file
OUTPUT_BAM="aligned_reads.bam"
THREADS=8                                  # Number of threads to use
SAMPLE_ID="sample_1"                       # Unique sample identifier
SAMPLE_NAME="MySample"                     # Sample name
LIBRARY_NAME="eCLIP_Library"               # Library name
FLOWCELL_LANE="FCID_Lane1"                 # Flowcell ID and lane

# Align reads with Bowtie2 and pipe to samtools for BAM conversion
# -x: Path to the genome index basename
# -U: Path to the single-end FASTQ file
# -p: Number of threads
# --rg-id, --rg: Read group information for the SAM/BAM header
# No -S option is given, so Bowtie2 writes SAM to stdout, which is piped to samtools
# samtools view -b: Convert the SAM stream read from stdin ("-") to BAM
bowtie2 -x "${GENOME_INDEX}" \
  -U "${INPUT_FASTQ}" \
  -p "${THREADS}" \
  --rg-id "${SAMPLE_ID}" \
  --rg "SM:${SAMPLE_NAME}" \
  --rg "LB:${LIBRARY_NAME}" \
  --rg "PL:ILLUMINA" \
  --rg "PU:${FLOWCELL_LANE}" \
  | samtools view -b -o "${OUTPUT_BAM}" -
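The alignment step writes an unsorted BAM (aligned_reads.bam), while the peak-calling step reads files named *.sorted.bam. The source text does not describe a sorting step, so the following is only a minimal sketch of how that gap might be bridged with samtools sort and samtools index. The helper names run and sort_and_index and the DRY_RUN switch are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: coordinate-sort a BAM and index the result.
# Writes <name>.sorted.bam next to the input <name>.bam.
sort_and_index() {
  local in_bam="$1"
  local out_bam="${in_bam%.bam}.sorted.bam"
  # -@: number of additional threads; -o: output file
  run samtools sort -@ "${THREADS:-4}" -o "${out_bam}" "${in_bam}"
  # Creates <name>.sorted.bam.bai alongside the sorted BAM
  run samtools index "${out_bam}"
}

# Example (assumed file name from the alignment step):
#   DRY_RUN=1 sort_and_index aligned_reads.bam   # preview the commands
#   sort_and_index aligned_reads.bam             # actually run them
```

Running with DRY_RUN=1 first makes it easy to check the derived file names before any data is touched.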
2. MACS2: peak calling
$ Bash example
# Install MACS2 (if not already installed)
# conda install -c bioconda macs2

# Define input files and parameters
TREATMENT_BAM="treatment.sorted.bam"  # Path to the treatment BAM file (e.g., ChIP-seq IP sample)
CONTROL_BAM="control.sorted.bam"      # Path to the control BAM file (e.g., Input or IgG control)
GENOME_SIZE="hs"                      # Effective genome size. Use 'hs' for human, 'mm' for mouse,
                                      # 'ce' for C. elegans, 'dm' for D. melanogaster.
                                      # For other genomes, provide the estimated mappable genome size
                                      # in base pairs (e.g., 2.7e9 for human hg38).
OUTPUT_PREFIX="my_chip_peaks"         # Prefix for all output files (e.g., my_chip_peaks_peaks.narrowPeak)
OUTPUT_DIR="macs2_output"             # Directory where all output files will be saved
Q_VALUE_CUTOFF="0.01"                 # FDR cutoff for peak calling. Common values are 0.01 (1%) or 0.05 (5%).

# Create the output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Execute MACS2 peak calling
# -t: Treatment file (ChIP-seq IP)
# -c: Control file (Input or IgG)
# -f: Format of input files (BAM for single-end BAM, BAMPE for paired-end BAM).
#     BAM is used here because the alignment example above is single-end;
#     switch to BAMPE only if the reads were aligned as pairs.
# -g: Effective genome size
# -n: Name of the experiment, used as prefix for output files
# --outdir: Output directory
# -q: Q-value (FDR) cutoff for peak detection
# --keep-dup all: Keep all reads at the same genomic location (the MACS2 default is 1)
# --verbose 2: Verbosity level 2 (normal progress messages)
macs2 callpeak \
  -t "${TREATMENT_BAM}" \
  -c "${CONTROL_BAM}" \
  -f BAM \
  -g "${GENOME_SIZE}" \
  -n "${OUTPUT_PREFIX}" \
  --outdir "${OUTPUT_DIR}" \
  -q "${Q_VALUE_CUTOFF}" \
  --keep-dup all \
  --verbose 2
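Once peak calling finishes, a quick sanity check of the narrowPeak output can help catch problems early. The sketch below is not part of the described pipeline; it assumes the standard 10-column narrowPeak layout, in which column 9 holds -log10(q-value), so a q-value of 0.01 corresponds to a value of 2. The function name summarize_peaks is hypothetical.

```shell
# Hypothetical helper: summarize a MACS2 narrowPeak file.
# $1: path to the narrowPeak file (tab-separated, 10 columns)
# $2: minimum -log10(q-value) a peak must reach to be counted (default 2, i.e. q <= 0.01)
summarize_peaks() {
  local peak_file="$1"
  local min_neglog10_q="${2:-2}"
  awk -F'\t' -v min_q="${min_neglog10_q}" '
    { total++ }                                    # every peak in the file
    $9 >= min_q { kept++; width += $3 - $2 }       # peaks passing the threshold
    END {
      printf "total_peaks\t%d\n", total
      printf "peaks_passing\t%d\n", kept
      if (kept > 0) printf "mean_width_bp\t%.1f\n", width / kept
    }
  ' "${peak_file}"
}

# Example (path assumed from OUTPUT_DIR and OUTPUT_PREFIX in the peak-calling step):
#   summarize_peaks macs2_output/my_chip_peaks_peaks.narrowPeak
#   summarize_peaks macs2_output/my_chip_peaks_peaks.narrowPeak 5   # stricter: q <= 1e-5
```

Because MACS2 has already applied its own -q cutoff, the default threshold here should keep every peak; passing a larger second argument shows how many peaks would survive a stricter filter.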
Tools Used
Raw Source Text
Bowtie2-alignment tool
Macs2-peak calling
Genome_build: hg38