GSE55887 Processing Pipeline

ChIP-Seq, 5 processing steps with code examples

Publication

Crosstalk between CRISPR-Cas9 and the human transcriptome.

Nature Communications (2022), PMID 35236841

Dataset

Genome-wide analysis reveals characteristics of off-target sites bound by the Cas9 endonuclease

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

An Illumina MiSeq sequencer was used for base calling.

MiSeq Reporter, version not specified (tool and version inferred with models/gemini-2.5-flash)

$ Bash example

# Note: On the MiSeq, base calling is performed on the instrument by Illumina's Real-Time Analysis (RTA) software. MiSeq Reporter then handles secondary analysis such as demultiplexing and FASTQ generation.
# The following command is not taken from the source. It shows one common way to convert raw BCL files (the output of base calling) into FASTQ files offline using bcl2fastq2.

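# Installation: bcl2fastq2 is distributed by Illumina. Whether a conda package is available depends on the channel, so verify before relying on a line like this:
# conda install -c bioconda bcl2fastq2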

# Example command for bcl2fastq:
# Replace /path/to/MiSeq/run/folder with the actual path to your MiSeq run directory containing BCL files.
# Replace /path/to/output/fastq with your desired output directory for FASTQ files.
# The --ignore-missing-* flags let the conversion continue when some run files are absent; drop them to fail on an incomplete run instead.
bcl2fastq --runfolder-dir /path/to/MiSeq/run/folder \
          --output-dir /path/to/output/fastq \
          --no-lane-splitting \
          --minimum-trimmed-read-length 8 \
          --mask-short-adapter-reads 8 \
          --ignore-missing-bcls \
          --ignore-missing-filter
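
Before alignment, it can help to confirm that a converted FASTQ file is structurally intact. The sketch below is not part of the source pipeline: `count_fastq_reads` is a hypothetical helper, the demo reads are made up, and it assumes an uncompressed FASTQ with exactly four lines per record.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the number of reads in an uncompressed FASTQ
# file, and fail if the line count is not a multiple of four.
count_fastq_reads() {
    local fastq="$1"
    local lines
    lines=$(wc -l < "$fastq")
    if (( lines % 4 != 0 )); then
        echo "ERROR: ${fastq} has ${lines} lines, which is not a multiple of 4" >&2
        return 1
    fi
    echo $(( lines / 4 ))
}

# Demo on a tiny made-up file with two reads.
demo_fastq=$(mktemp)
printf '@read1\nAGCTAGCT\n+\nIIIIIIII\n@read2\nTTGACCGA\n+\nIIIIIIII\n' > "$demo_fastq"
count_fastq_reads "$demo_fastq"
```

A truncated transfer or interrupted conversion often leaves a partial final record, which this line-count check catches. It does not validate sequence or quality strings.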

Sequence reads were aligned to the hg19 reference genome using Bowtie for HA-ChIP data and Bowtie2 for deep sequencing data.

Bowtie v1.x and Bowtie2 v2.x (versions inferred with models/gemini-2.5-flash)

$ Bash example

# Install Bowtie and Bowtie2
# conda install -c bioconda bowtie bowtie2

# Create a directory for reference genome and indices
mkdir -p reference_hg19
cd reference_hg19

# Download hg19 reference genome (UCSC build)
# wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
# gunzip hg19.fa.gz

# For demonstration, let's assume hg19.fa is already present in reference_hg19/
# If you downloaded it, uncomment the lines above and ensure the file is named hg19.fa

# Build Bowtie index for hg19
# bowtie-build hg19.fa hg19_index_bowtie

# Build Bowtie2 index for hg19
# bowtie2-build hg19.fa hg19_index_bowtie2

cd ..

# --- Example for HA-ChIP data alignment with Bowtie ---
# Assume input_ha_chip.fastq is your HA-ChIP sequencing data (single-end)
# For demonstration, create a dummy FASTQ file (printf interprets \n portably, unlike plain echo):
# printf '@read1\nAGCTAGCTAGCTAGCT\n+\nIIIIIIIIIIIIIIII\n' > input_ha_chip.fastq

bowtie -S -p 8 reference_hg19/hg19_index_bowtie input_ha_chip.fastq > output_ha_chip.sam

# --- Example for deep sequencing data alignment with Bowtie2 ---
# Assume input_deep_seq.fastq is your deep sequencing data (single-end)
# For demonstration, create a dummy fastq file:
# printf '@read1\nAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCT\n+\nIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\n' > input_deep_seq.fastq

bowtie2 -x reference_hg19/hg19_index_bowtie2 -p 8 -U input_deep_seq.fastq -S output_deep_seq.sam
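
A quick look at how many reads aligned is a useful check on the SAM output. The following is a minimal sketch, not a step from the source: `sam_mapping_summary` is a hypothetical helper, the demo records are made up, and it assumes single-end output with one SAM record per read (no secondary or supplementary alignments). It relies only on the SAM convention that FLAG bit 0x4 marks an unmapped read.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "<mapped> <total>" for a SAM file.
sam_mapping_summary() {
    awk -F'\t' '
        /^@/ { next }                          # skip header lines
        {
            total++
            # FLAG is column 2; bit 0x4 set means the read is unmapped
            if (int($2 / 4) % 2 == 0) mapped++
        }
        END { printf "%d %d\n", mapped, total }
    ' "$1"
}

# Demo on a tiny made-up SAM file: two mapped reads and one unmapped read.
demo_sam=$(mktemp)
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$demo_sam"
printf '@SQ\tSN:chr1\tLN:1000\n' >> "$demo_sam"
printf 'read1\t0\tchr1\t100\t42\t16M\t*\t0\t0\tAGCTAGCTAGCTAGCT\tIIIIIIIIIIIIIIII\n' >> "$demo_sam"
printf 'read2\t16\tchr1\t200\t42\t16M\t*\t0\t0\tAGCTAGCTAGCTAGCT\tIIIIIIIIIIIIIIII\n' >> "$demo_sam"
printf 'read3\t4\t*\t0\t0\t*\t*\t0\t0\tAGCTAGCTAGCTAGCT\tIIIIIIIIIIIIIIII\n' >> "$demo_sam"
sam_mapping_summary "$demo_sam"   # prints: 2 3
```

For real data, running the helper on output_ha_chip.sam and output_deep_seq.sam gives a rough alignment rate. A dedicated tool such as samtools reports more detail if it is available.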

Two re-sequenced files, sgRNA2_1_Replicate1 and sgRNA3_1_Replicate1, were merged with sgRNA2_Replicate1 and sgRNA3_Replicate1, respectively.

cat (tool inferred with models/gemini-2.5-flash; version not applicable)
$ Bash example
```
# Merge sgRNA2 files
cat sgRNA2_1_Replicate1.fastq sgRNA2_Replicate1.fastq > sgRNA2_merged.fastq

# Merge sgRNA3 files
cat sgRNA3_1_Replicate1.fastq sgRNA3_Replicate1.fastq > sgRNA3_merged.fastq
```
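
If the FASTQ files are gzip-compressed, they can still be merged with cat, because concatenated gzip members form a valid gzip stream. The sketch below adds a read-count check after merging. It is an illustration under assumptions: `fastq_lines` and `merge_fastq` are hypothetical helpers, the `.fastq.gz` names and the demo reads are made up, and the source does not say whether the merge was done on FASTQ files or at a later stage.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Count lines in a FASTQ file, decompressing first if the name ends in .gz.
fastq_lines() {
    case "$1" in
        *.gz) gzip -cd "$1" | wc -l ;;
        *)    wc -l < "$1" ;;
    esac
}

# Hypothetical helper: merge_fastq OUTPUT INPUT1 INPUT2
# Concatenates the inputs, then checks that no lines were lost and that the
# result still consists of whole four-line records. Prints the merged read count.
merge_fastq() {
    local out="$1" a="$2" b="$3"
    cat "$a" "$b" > "$out"
    local expected=$(( $(fastq_lines "$a") + $(fastq_lines "$b") ))
    local observed=$(( $(fastq_lines "$out") ))
    if (( observed != expected || observed % 4 != 0 )); then
        echo "ERROR: merged file has ${observed} lines, expected ${expected} (multiple of 4)" >&2
        return 1
    fi
    echo $(( observed / 4 ))
}

# Demo with made-up reads: one read in the original run, two in the re-sequenced run.
workdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/sgRNA2_Replicate1.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n' | gzip -c > "$workdir/sgRNA2_1_Replicate1.fastq.gz"
merge_fastq "$workdir/sgRNA2_merged.fastq.gz" \
            "$workdir/sgRNA2_1_Replicate1.fastq.gz" \
            "$workdir/sgRNA2_Replicate1.fastq.gz"
```

The multiple-of-four check also catches an input whose last line lacks a trailing newline, which would otherwise fuse two records at the join.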

Peaks were called for the HA-ChIP data using MACS14 (-c=Control_dCas9only -p=1e-6).

MACS v1.4

$ Bash example

# Install MACS14 (MACS 1.4.2 is often referred to as MACS14)
# conda install -c bioconda macs=1.4.2

# Define input files and parameters
# Placeholder: Replace with actual treatment and control BAM files
HA_CHIP_BAM="HA_ChIP.bam"
CONTROL_BAM="Control_dCas9only.bam"

# Placeholder: Define genome size (e.g., 'hs' for human, 'mm' for mouse, or a specific number)
# For a specific number, e.g., 2.7e9 for human, use -g 2.7e9
GENOME_SIZE="hs"

# Output prefix for MACS files
OUTPUT_PREFIX="HA_ChIP_peaks"

# Run MACS14 peak calling
# -t: Treatment file (HA-ChIP alignments)
# -c: Control file (here the dCas9-only sample, per the source description)
# -f: Format of input files (e.g., 'BAM', 'SAM', 'BED', 'ELAND', 'BOWTIE')
# -g: Genome size ('hs' for human, 'mm' for mouse, or a specific number)
# -n: Experiment name, used as the prefix for output files
# -p: P-value cutoff for peak detection (1e-6, as stated in the source)
macs14 -t "${HA_CHIP_BAM}" -c "${CONTROL_BAM}" -f BAM -g "${GENOME_SIZE}" -n "${OUTPUT_PREFIX}" -p 1e-6
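
Once peaks are called, a short summary of the resulting BED file helps confirm that the output looks plausible. This is a sketch under assumptions, not part of the source: `summarize_peaks` is a hypothetical helper, the demo peaks are made up, and the file is assumed to be a tab-separated five-column BED (chromosome, start, end, name, score) such as the `_peaks.bed` file MACS 1.4 is expected to write. Check how the score column is defined in your MACS version before filtering on it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: summarize_peaks BED [MIN_SCORE]
# Prints "<peak count> <mean peak width>" for peaks with score >= MIN_SCORE.
summarize_peaks() {
    local bed="$1" min_score="${2:-0}"
    awk -F'\t' -v min="$min_score" '
        /^(track|#)/ { next }                  # skip optional header lines
        $5 + 0 >= min { n++; width += $3 - $2 }
        END {
            if (n > 0) printf "%d %.1f\n", n, width / n
            else       print "0 0.0"
        }
    ' "$bed"
}

# Demo on three made-up peaks.
demo_bed=$(mktemp)
printf 'chr1\t100\t300\tMACS_peak_1\t75.2\n'    > "$demo_bed"
printf 'chr1\t1000\t1500\tMACS_peak_2\t120.0\n' >> "$demo_bed"
printf 'chr2\t50\t150\tMACS_peak_3\t61.5\n'     >> "$demo_bed"
summarize_peaks "$demo_bed"       # prints: 3 266.7
summarize_peaks "$demo_bed" 70    # prints: 2 350.0
```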

Further data processing steps are described in the Methods section of the publication.

No tool or version specified (inferred with models/gemini-2.5-flash)
$ Bash example
```
# No specific command can be inferred from the generic description: 'Further data processing steps have been described in the method sections.'
```

Raw Source Text

Illlumina MiSeq Sequencer was used for base calling.
Sequence reads were aligned to hg19 reference genome using Bowtie for HA-Chip data and Bowtie2 for deep sequencing data.
Two re-sequenced files : sgRNA2_1_Replicate1 and sgRNA3_1_Replicate1 were merged with sgRNA2_Replicate1 and sgRNA3_Replicate1 respectively
Peaks were called by using MACS14 tool for HA-ChIP (-c=Control_dCas9only -p=1e-6)
Further data processing steps have been described in the method sections.
Genome_build: hg19
Supplementary_files_format_and_content: bigWig and Bed files containing peaks.
