GSE55887 Processing Pipeline
ChIP-Seq
5 steps
Publication
Crosstalk between CRISPR-Cas9 and the human transcriptome. Nature Communications (2022), PMID 35236841
Dataset
GSE55887: Genome-wide analysis reveals characteristics of off-target sites bound by the Cas9 endonuclease
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
An Illumina MiSeq sequencer was used for base calling.
Tool: MiSeq Reporter (Inferred with models/gemini-2.5-flash); version: not specified (Inferred with models/gemini-2.5-flash)
$ Bash example
# Note: base calling is performed on the instrument by Illumina Real-Time Analysis (RTA).
# MiSeq Reporter can then demultiplex the run and write FASTQ files. The command below shows
# the equivalent off-instrument step: converting the BCL files (base-calling output) to FASTQ
# with bcl2fastq2.
# Installation: obtain bcl2fastq2 from Illumina (availability through conda channels varies).
# Replace /path/to/MiSeq/run/folder with the MiSeq run directory containing the BCL files,
# and /path/to/output/fastq with the desired FASTQ output directory.
# Add --ignore-missing-bcls / --ignore-missing-filter only if the run folder is incomplete.
bcl2fastq --runfolder-dir /path/to/MiSeq/run/folder \
  --output-dir /path/to/output/fastq \
  --no-lane-splitting \
  --minimum-trimmed-read-length 8 \
  --mask-short-adapter-reads 8
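Once FASTQ files exist, a quick structural check can catch truncated or malformed output before alignment. This is a minimal sketch, not part of the original pipeline: the file name sample_S1_R1_001.fastq is hypothetical, and a two-read demo file is written in a scratch directory so the example runs on its own. For gzipped output, feed the file through gzip -cd first.

```bash
#!/usr/bin/env bash
# Sketch: structural sanity check of a FASTQ file (hypothetical name, demo content).
cd "$(mktemp -d)"   # scratch directory, so no real file is overwritten
fastq="sample_S1_R1_001.fastq"
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nTTGCA\n+\nIIIII\n' > "$fastq"

# Each record is 4 lines: "@" header, sequence, "+" separator, and a quality
# string as long as the sequence. Count records and any violations.
summary=$(awk '
  NR % 4 == 1 && substr($0, 1, 1) != "@" { bad++ }
  NR % 4 == 2 { seqlen = length($0) }
  NR % 4 == 3 && substr($0, 1, 1) != "+" { bad++ }
  NR % 4 == 0 && length($0) != seqlen { bad++ }
  END {
    if (NR % 4 != 0) bad++   # truncated final record
    printf "reads=%d malformed=%d\n", int(NR / 4), bad + 0
  }' "$fastq")
echo "$summary"
```

With the demo file this prints a read count of 2 and no malformed records.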
2
Sequence reads were aligned to the hg19 reference genome using Bowtie for the HA-ChIP data and Bowtie2 for the deep-sequencing data.
$ Bash example
# Install Bowtie and Bowtie2 (example using conda):
# conda install -c bioconda bowtie bowtie2

# Create a directory for the reference genome and indices
mkdir -p reference_hg19
cd reference_hg19

# Download the hg19 reference genome (UCSC build); uncomment if hg19.fa is not present yet
# wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
# gunzip hg19.fa.gz

# Build the Bowtie index (HA-ChIP) and the Bowtie2 index (deep sequencing); uncomment to run
# bowtie-build hg19.fa hg19_index_bowtie
# bowtie2-build hg19.fa hg19_index_bowtie2
cd ..

# --- HA-ChIP alignment with Bowtie (single-end reads in input_ha_chip.fastq) ---
# Optional dummy input for a dry run (printf, unlike plain echo in bash, expands \n):
# printf '@read1\nAGCTAGCTAGCTAGCT\n+\nIIIIIIIIIIIIIIII\n' > input_ha_chip.fastq
bowtie -S -p 8 reference_hg19/hg19_index_bowtie input_ha_chip.fastq > output_ha_chip.sam

# --- Deep-sequencing alignment with Bowtie2 (single-end reads in input_deep_seq.fastq) ---
# Optional dummy input for a dry run:
# printf '@read1\nAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCT\n+\nIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\n' > input_deep_seq.fastq
bowtie2 -x reference_hg19/hg19_index_bowtie2 -p 8 -U input_deep_seq.fastq -S output_deep_seq.sam
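A common follow-up to the alignment commands is checking what fraction of reads aligned, using the FLAG column of the SAM output (for example output_ha_chip.sam or output_deep_seq.sam). The sketch below is illustrative only: it writes a tiny invented SAM file to a scratch directory, skips secondary (0x100) and supplementary (0x800) records, and treats a primary record as mapped when bit 0x4 is clear. The bit tests use integer division so that the script also works in awk implementations without bitwise functions.

```bash
#!/usr/bin/env bash
# Sketch: alignment rate from SAM flags (demo SAM with invented reads).
cd "$(mktemp -d)"
sam="demo.sam"
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$sam"
printf 'r1\t0\tchr1\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> "$sam"
printf 'r2\t16\tchr2\t500\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> "$sam"
printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> "$sam"
printf 'r1\t256\tchr3\t700\t0\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> "$sam"

rate=$(awk -F'\t' '!/^@/ {
    flag = $2
    # skip secondary (0x100) and supplementary (0x800) records
    if (int(flag / 256) % 2 == 1 || int(flag / 2048) % 2 == 1) next
    total++
    # bit 0x4 clear means the read is mapped
    if (int(flag / 4) % 2 == 0) mapped++
  }
  END { printf "%d/%d mapped (%.1f%%)\n", mapped, total, (total ? 100 * mapped / total : 0) }' "$sam")
echo "$rate"
```

On the demo file, two of the three primary reads are mapped; the secondary record for r1 is not counted twice.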
3
Two re-sequenced files, sgRNA2_1_Replicate1 and sgRNA3_1_Replicate1, were merged with sgRNA2_Replicate1 and sgRNA3_Replicate1, respectively.
Tool: cat (Inferred with models/gemini-2.5-flash); version: N/A
$ Bash example
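# Assumption: the merge is done on FASTQ files (the description does not state at which stage).
# cat also works on .fastq.gz files, because concatenated gzip streams form a valid gzip file.
# If the merge was instead done after alignment, BAM files would be combined with samtools merge.
# Merge sgRNA2 files
cat sgRNA2_1_Replicate1.fastq sgRNA2_Replicate1.fastq > sgRNA2_merged.fastq
# Merge sgRNA3 files
cat sgRNA3_1_Replicate1.fastq sgRNA3_Replicate1.fastq > sgRNA3_merged.fastq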
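Because cat silently succeeds even if an input is truncated or missing reads, it can help to confirm that the merged file holds exactly the reads of its two parts. This is a minimal sketch under stated assumptions: the files are uncompressed FASTQ with the .fastq suffix, the demo contents are invented, and everything runs in a scratch directory so that real files with the same names are never overwritten.

```bash
#!/usr/bin/env bash
# Sketch: merge a re-sequenced run into its replicate and verify read counts.
cd "$(mktemp -d)"   # scratch directory: demo files reuse the real file names
printf '@a\nACGT\n+\nIIII\n' > sgRNA2_Replicate1.fastq
printf '@b\nTTGG\n+\nIIII\n@c\nCCAA\n+\nIIII\n' > sgRNA2_1_Replicate1.fastq

# Number of reads in an uncompressed FASTQ file (4 lines per record).
count_reads() { echo $(( $(wc -l < "$1") / 4 )); }

cat sgRNA2_1_Replicate1.fastq sgRNA2_Replicate1.fastq > sgRNA2_merged.fastq

expected=$(( $(count_reads sgRNA2_1_Replicate1.fastq) + $(count_reads sgRNA2_Replicate1.fastq) ))
merged=$(count_reads sgRNA2_merged.fastq)
if [ "$merged" -eq "$expected" ]; then
  echo "OK: $merged reads in sgRNA2_merged.fastq"
else
  echo "Read count mismatch: expected $expected, found $merged" >&2
fi
```

The same check applies unchanged to the sgRNA3 pair.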
4
Peaks were called on the HA-ChIP data with MACS14, using Control_dCas9only as the control (-c) and a p-value cutoff of 1e-6 (-p).
$ Bash example
# Install MACS 1.4.2 (invoked as macs14; it requires Python 2):
# conda install -c bioconda macs=1.4.2   (package name and channel availability may vary)

# Input files (placeholders: replace with the actual treatment and control BAM files)
HA_CHIP_BAM="HA_ChIP.bam"
CONTROL_BAM="Control_dCas9only.bam"
# Effective genome size: 'hs' (human, 2.7e9), 'mm' (mouse), or a number such as 2.7e9
GENOME_SIZE="hs"
# Prefix for the MACS output files
OUTPUT_PREFIX="HA_ChIP_peaks"

# Run MACS14 peak calling
# -t: treatment file (HA-ChIP data)
# -c: control file (here the dCas9-only control)
# -f: input format (e.g. BAM, SAM, BED, ELAND, BOWTIE)
# -g: effective genome size
# -n: experiment name, used as the prefix for output files
# -p: p-value cutoff for peak detection
macs14 -t "${HA_CHIP_BAM}" -c "${CONTROL_BAM}" -f BAM -g "${GENOME_SIZE}" -n "${OUTPUT_PREFIX}" -p 1e-6
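With -n HA_ChIP_peaks, MACS14 writes its peaks to HA_ChIP_peaks_peaks.bed. If a stricter peak set is wanted after calling, the BED file can be filtered on its score column. This sketch assumes the MACS 1.4 layout in which the fifth column holds -10*log10(p-value); under that assumption the pipeline's -p 1e-6 corresponds to a score of 60, and a hypothetical stricter cutoff of p <= 1e-8 corresponds to 80. The demo rows are invented and the output name HA_ChIP_strict.bed is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: keep only stronger MACS14 peaks (assumed column 5 = -10*log10 p-value).
cd "$(mktemp -d)"   # scratch directory, so no real MACS output is overwritten
bed="HA_ChIP_peaks_peaks.bed"
printf 'chr1\t1000\t1400\tMACS_peak_1\t65.20\n' > "$bed"
printf 'chr3\t800\t1100\tMACS_peak_3\t85.00\n' >> "$bed"
printf 'chr2\t5000\t5300\tMACS_peak_2\t120.50\n' >> "$bed"

# -10*log10(1e-6) = 60 (the pipeline cutoff); -10*log10(1e-8) = 80 (stricter).
cutoff=80
awk -F'\t' -v min="$cutoff" '$5 >= min' "$bed" \
  | LC_ALL=C sort -k1,1 -k2,2n > HA_ChIP_strict.bed
n_strict=$(awk 'END { print NR }' HA_ChIP_strict.bed)
echo "$n_strict peaks with -10*log10(p) >= $cutoff"
```

Sorting by chromosome and start keeps the filtered file usable by tools that expect coordinate-sorted BED input.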
5
Further data processing steps are described in the Methods section of the publication.
Tool and version: not specified (Inferred with models/gemini-2.5-flash)
$ Bash example
# No specific command can be inferred from the generic description: 'Further data processing steps have been described in the method sections.'
Raw Source Text
Illlumina MiSeq Sequencer was used for base calling.
Sequence reads were aligned to hg19 reference genome using Bowtie for HA-Chip data and Bowtie2 for deep sequencing data.
Two re-sequenced files : sgRNA2_1_Replicate1 and sgRNA3_1_Replicate1 were merged with sgRNA2_Replicate1 and sgRNA3_Replicate1 respectively
Peaks were called by using MACS14 tool for HA-ChIP (-c=Control_dCas9only -p=1e-6)
Further data processing steps have been described in the method sections.
Genome_build: hg19
Supplementary_files_format_and_content: bigWig and Bed files containing peaks.
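The supplementary files include BED files containing peaks. As an illustration of a first look at such a file, the sketch below counts peaks and covered bases per chromosome. The file name GSE55887_peaks.bed and its rows are hypothetical; only the first three BED columns (chrom, start, end) are used, so the same command works on peak files with extra columns.

```bash
#!/usr/bin/env bash
# Sketch: per-chromosome summary of a peak BED file (hypothetical name, demo rows).
cd "$(mktemp -d)"
bed="GSE55887_peaks.bed"
printf 'chr1\t100\t300\tp1\nchr1\t900\t1000\tp2\nchrX\t50\t250\tp3\n' > "$bed"

# BED intervals are 0-based and half-open, so each peak spans end - start bases.
summary=$(awk -F'\t' '{ n[$1]++; bp[$1] += $3 - $2 }
  END { for (c in n) printf "%s\t%d\t%d\n", c, n[c], bp[c] }' "$bed" | LC_ALL=C sort -k1,1)
printf 'chrom\tpeaks\tbp\n%s\n' "$summary"
```

For the demo rows, chr1 has 2 peaks covering 300 bp and chrX has 1 peak covering 200 bp.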