GSE55887 Processing Pipeline

ChIP-Seq, 5 steps

Publication

Crosstalk between CRISPR-Cas9 and the human transcriptome.

Nature Communications (2022), PMID 35236841

Dataset

GSE55887

Genome-wide analysis reveals characteristics of off-target sites bound by the Cas9 endonuclease

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    An Illumina MiSeq sequencer was used for base calling.

    MiSeq Reporter (inferred with models/gemini-2.5-flash); version not specified
    $ Bash example
    # Note: Base calling is performed on the instrument by Illumina's Real-Time Analysis (RTA) software; MiSeq Reporter then carries out secondary analysis, including FASTQ generation.
    # The command below shows a commonly used alternative route: converting the raw BCL files (the output of base calling) into FASTQ files with bcl2fastq.
    
    # Installation: bcl2fastq2 is distributed by Illumina. The conda package name below is an assumption and may not exist in your channels.
    # conda install -c bioconda bcl2fastq2
    
    # Example command for bcl2fastq (v2.x option names):
    # Replace /path/to/MiSeq/run/folder with the actual path to your MiSeq run directory containing BCL files.
    # Replace /path/to/output/fastq with your desired output directory for FASTQ files.
    bcl2fastq --runfolder-dir /path/to/MiSeq/run/folder \
              --output-dir /path/to/output/fastq \
              --no-lane-splitting \
              --minimum-trimmed-read-length 8 \
              --mask-short-adapter-reads 8
    # Flags that ignore missing BCL or filter files are left out so that an incomplete run folder fails loudly.
    # Their exact names differ between bcl2fastq versions; check `bcl2fastq --help` before adding them.
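Once FASTQ files exist, a quick structural check can catch truncated or corrupted output before alignment. The following is a minimal sketch, not part of the source pipeline: the function name `check_fastq` and the file `example.fastq` are hypothetical, and it assumes uncompressed FASTQ with exactly four lines per record.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_fastq FILE (hypothetical helper): print the number of records,
# or exit non-zero if the file is not well-formed four-line FASTQ.
check_fastq() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" {
      printf("line %d: header does not start with @\n", NR) > "/dev/stderr"
      bad = 1; exit
    }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" {
      printf("line %d: separator does not start with +\n", NR) > "/dev/stderr"
      bad = 1; exit
    }
    NR % 4 == 0 && length($0) != seqlen {
      printf("line %d: quality length differs from sequence length\n", NR) > "/dev/stderr"
      bad = 1; exit
    }
    END {
      if (bad) exit 1
      if (NR % 4 != 0) { print "truncated record at end of file" > "/dev/stderr"; exit 1 }
      print NR / 4
    }
  ' "$1"
}

# Demonstration on a dummy two-record file (not real data)
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIIII\n' > "$workdir/example.fastq"
n_records=$(check_fastq "$workdir/example.fastq")
echo "records: $n_records"
```

For gzipped output, the same check could be run on a decompressed stream instead of a file path.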
  2.

    Sequence reads were aligned to the hg19 reference genome using Bowtie for HA-ChIP data and Bowtie2 for deep sequencing data.

    Bowtie v1.x and Bowtie2 v2.x (versions inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install Bowtie and Bowtie2
    # conda install -c bioconda bowtie bowtie2
    
    # Create a directory for reference genome and indices
    mkdir -p reference_hg19
    cd reference_hg19
    
    # Download hg19 reference genome (UCSC build)
    # wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
    # gunzip hg19.fa.gz
    
    # The steps below assume hg19.fa is present in reference_hg19/.
    # If it is not, uncomment the download lines above and ensure the file is named hg19.fa
    
    # Build the Bowtie index for hg19 (run once; required by the bowtie command below)
    # bowtie-build hg19.fa hg19_index_bowtie
    
    # Build the Bowtie2 index for hg19 (run once; required by the bowtie2 command below)
    # bowtie2-build hg19.fa hg19_index_bowtie2
    
    cd ..
    
    # --- Example for HA-ChIP data alignment with Bowtie ---
    # Assume input_ha_chip.fastq is your HA-ChIP sequencing data (single-end)
    # For demonstration, create a dummy FASTQ file (printf interprets \n; plain echo in bash does not):
    # printf '@read1\nAGCTAGCTAGCTAGCT\n+\nIIIIIIIIIIIIIIII\n' > input_ha_chip.fastq
    
    bowtie -S -p 8 reference_hg19/hg19_index_bowtie input_ha_chip.fastq > output_ha_chip.sam
    
    # --- Example for deep sequencing data alignment with Bowtie2 ---
    # Assume input_deep_seq.fastq is your deep sequencing data (single-end)
    # For demonstration, create a dummy FASTQ file (printf interprets \n; plain echo in bash does not):
    # printf '@read1\nAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCT\n+\nIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\n' > input_deep_seq.fastq
    
    bowtie2 -x reference_hg19/hg19_index_bowtie2 -p 8 -U input_deep_seq.fastq -S output_deep_seq.sam
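After alignment, it is often useful to check how many records actually mapped before calling peaks. The sketch below is an illustration only, not a step from the source: it reads a SAM file with awk and tests FLAG bit 0x4 (read unmapped). The function name `sam_mapping_summary` and the dummy file `example.sam` are hypothetical. It counts alignment records, which equals the read count only when each read produces a single record, as in single-end runs that report one alignment per read.

```shell
#!/usr/bin/env bash
set -euo pipefail

# sam_mapping_summary FILE (hypothetical helper): print "total mapped unmapped"
# for the alignment records in a SAM file.
sam_mapping_summary() {
  awk -F '\t' '
    /^@/ { next }                      # skip SAM header lines
    {
      total++
      # FLAG bit 0x4 marks an unmapped read; test it with integer arithmetic
      if (int($2 / 4) % 2 == 1) unmapped++
    }
    END { printf "%d %d %d\n", total, total - unmapped, unmapped }
  ' "$1"
}

# Demonstration on a dummy SAM file with two mapped reads and one unmapped read
workdir=$(mktemp -d)
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$workdir/example.sam"
printf '@SQ\tSN:chr1\tLN:1000\n' >> "$workdir/example.sam"
printf 'read1\t0\tchr1\t100\t255\t16M\t*\t0\t0\tAGCTAGCTAGCTAGCT\tIIIIIIIIIIIIIIII\n' >> "$workdir/example.sam"
printf 'read2\t16\tchr1\t200\t255\t16M\t*\t0\t0\tAGCTAGCTAGCTAGCT\tIIIIIIIIIIIIIIII\n' >> "$workdir/example.sam"
printf 'read3\t4\t*\t0\t0\t*\t*\t0\t0\tAGCTAGCTAGCTAGCT\tIIIIIIIIIIIIIIII\n' >> "$workdir/example.sam"

summary=$(sam_mapping_summary "$workdir/example.sam")
echo "total mapped unmapped: $summary"
```

On real data the same function could be pointed at output_ha_chip.sam or output_deep_seq.sam from the commands above.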
  3.

    Two re-sequenced files, sgRNA2_1_Replicate1 and sgRNA3_1_Replicate1, were merged with sgRNA2_Replicate1 and sgRNA3_Replicate1, respectively.

    cat (inferred with models/gemini-2.5-flash); version not applicable
    $ Bash example
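    # The source does not say whether merging happened at the FASTQ or the alignment level; FASTQ concatenation is assumed here.
    # Merge sgRNA2 files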
    cat sgRNA2_1_Replicate1.fastq sgRNA2_Replicate1.fastq > sgRNA2_merged.fastq
    
    # Merge sgRNA3 files
    cat sgRNA3_1_Replicate1.fastq sgRNA3_Replicate1.fastq > sgRNA3_merged.fastq
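A simple way to confirm that a concatenation lost nothing is to compare read counts before and after the merge. The sketch below assumes uncompressed four-line FASTQ and uses tiny dummy files that borrow the sample names from the step description. The helper `count_reads` is hypothetical and not part of the source.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_reads FILE (hypothetical helper): number of FASTQ records,
# assuming exactly four lines per record.
count_reads() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# Dummy inputs standing in for the re-sequenced and original files
workdir=$(mktemp -d)
printf '@a1\nACGT\n+\nIIII\n@a2\nTTGA\n+\nIIII\n' > "$workdir/sgRNA2_1_Replicate1.fastq"
printf '@b1\nGGCC\n+\nIIII\n' > "$workdir/sgRNA2_Replicate1.fastq"

cat "$workdir/sgRNA2_1_Replicate1.fastq" "$workdir/sgRNA2_Replicate1.fastq" \
  > "$workdir/sgRNA2_merged.fastq"

n_reseq=$(count_reads "$workdir/sgRNA2_1_Replicate1.fastq")
n_orig=$(count_reads "$workdir/sgRNA2_Replicate1.fastq")
n_merged=$(count_reads "$workdir/sgRNA2_merged.fastq")

# The merged file should hold exactly the sum of the two inputs
if [ "$n_merged" -eq $(( n_reseq + n_orig )) ]; then
  echo "merge OK: $n_merged reads"
else
  echo "read count mismatch: $n_merged vs $(( n_reseq + n_orig ))" >&2
  exit 1
fi
```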
  4.

    Peaks were called using the MACS14 tool for HA-ChIP (-c=Control_dCas9only -p=1e-6).

    MACS v1.4
    $ Bash example
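    # Install MACS 1.4 (its executable is named macs14). MACS 1.4 runs on Python 2.
    # The conda package name and version below are assumptions; check your channels.
    # conda install -c bioconda macs=1.4.2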
    
    # Define input files and parameters
    # Placeholder: replace with the actual treatment and control BAM files
    # (the alignment examples above write SAM; convert to BAM first or pass -f SAM)
    HA_CHIP_BAM="HA_ChIP.bam"
    CONTROL_BAM="Control_dCas9only.bam"
    
    # Placeholder: Define genome size (e.g., 'hs' for human, 'mm' for mouse, or a specific number)
    # For a specific number, e.g., 2.7e9 for human, use -g 2.7e9
    GENOME_SIZE="hs"
    
    # Output prefix for MACS files
    OUTPUT_PREFIX="HA_ChIP_peaks"
    
    # Run MACS14 peak calling
    # -t: Treatment file (ChIP-seq data)
    # -c: Control file (here the dCas9-only control named in the source)
    # -f: Format of input files (e.g., 'BAM', 'SAM', 'BED', 'BOWTIE', 'ELAND')
    # -g: Genome size (e.g., 'hs' for human, 'mm' for mouse, or a specific number)
    # -n: Name of the experiment, which will be used as a prefix for output files
    # -p: P-value cutoff for peak detection (1e-6, as stated in the source)
    macs14 -t "${HA_CHIP_BAM}" -c "${CONTROL_BAM}" -f BAM -g "${GENOME_SIZE}" -n "${OUTPUT_PREFIX}" -p 1e-6
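MACS14 writes its peak calls to a BED file named after the output prefix, so the command above would be expected to produce HA_ChIP_peaks_peaks.bed. The sketch below summarizes such a file with awk. It assumes the usual five-column layout (chromosome, start, end, name, score) and that the fifth column holds -10*log10(p-value), which should be checked against the MACS 1.4 documentation. The helper names and the dummy peak values are hypothetical, not results from this dataset.

```shell
#!/usr/bin/env bash
set -euo pipefail

# filter_peaks BED MIN_SCORE (hypothetical helper): keep peaks whose
# fifth column is at least MIN_SCORE, skipping track/browser/comment lines.
filter_peaks() {
  awk -F '\t' -v min="$2" '!/^(track|browser|#)/ && $5 + 0 >= min + 0' "$1"
}

# peaks_per_chrom BED (hypothetical helper): tab-separated peak count per chromosome.
peaks_per_chrom() {
  awk -F '\t' '
    !/^(track|browser|#)/ { n[$1]++ }
    END { for (c in n) print c "\t" n[c] }
  ' "$1" | sort -k1,1
}

# Dummy peak file in the assumed layout (values are made up for illustration)
workdir=$(mktemp -d)
bed="$workdir/HA_ChIP_peaks_peaks.bed"
printf 'chr1\t100\t500\tMACS_peak_1\t75.20\n' > "$bed"
printf 'chr1\t900\t1300\tMACS_peak_2\t61.05\n' >> "$bed"
printf 'chr2\t50\t400\tMACS_peak_3\t120.00\n' >> "$bed"

# Under the assumed scoring, 70 corresponds to a p-value of 1e-7,
# a stricter cutoff than the 1e-6 used for calling.
n_strict=$(filter_peaks "$bed" 70 | wc -l | tr -d ' ')
echo "peaks with score >= 70: $n_strict"
peaks_per_chrom "$bed"
```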
  5.

    Further data processing steps are described in the Methods section of the publication.

    Tool and version not specified (inferred with models/gemini-2.5-flash)
    $ Bash example
    # No specific command can be inferred from the generic description: 'Further data processing steps have been described in the method sections.'
Raw Source Text
Illlumina MiSeq Sequencer was used for base calling.
Sequence reads were aligned to hg19 reference genome using Bowtie for HA-Chip data and Bowtie2 for deep sequencing data.
Two re-sequenced files : sgRNA2_1_Replicate1 and sgRNA3_1_Replicate1 were merged with sgRNA2_Replicate1 and sgRNA3_Replicate1 respectively
Peaks were called by using MACS14 tool for HA-ChIP (-c=Control_dCas9only -p=1e-6)
Further data processing steps have been described in the method sections.
Genome_build: hg19
Supplementary_files_format_and_content: bigWig and Bed files containing peaks.