GSE234078 Processing Pipeline

Publication

A super-enhancer-regulated RNA-binding protein cascade drives pancreatic cancer.

Nature Communications (2023), PMID 37673892

Dataset

GSE234078

A super-enhancer regulated RNA-binding protein cascade drives pancreatic cancer

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    For human samples, reads were aligned using Bowtie2 to GRCh37 and SE peaks were called using HOMER v4.11.1 default settings (-style super, Fold change >4, p-value <0.0001).

    Bowtie2 (version not specified)
    $ Bash example
    # --- Installation (commented out) ---
    # conda install -c bioconda bowtie2 samtools homer
    
    # --- Reference Data Setup ---
    # Download GRCh37 (hg19) primary assembly FASTA
    # wget -O GRCh37.fa.gz ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
    # gunzip GRCh37.fa.gz
    GENOME_FASTA="GRCh37.fa" # Path to your unzipped GRCh37 FASTA file
    
    # Build Bowtie2 index for GRCh37
    BOWTIE2_INDEX_PREFIX="GRCh37_index" # Prefix for Bowtie2 index files
    # bowtie2-build "${GENOME_FASTA}" "${BOWTIE2_INDEX_PREFIX}"
    
    # Configure HOMER genome (hg19 is GRCh37)
    HOMER_GENOME="hg19" # HOMER uses UCSC genome names; hg19 corresponds to GRCh37
    # perl /path/to/homer/bin/configureHomer.pl -install "${HOMER_GENOME}"
    
    # --- Input/Output Variables ---
    READS_FASTQ="reads.fastq" # Replace with your actual input FASTQ file
    OUTPUT_SAM="aligned.sam"
    OUTPUT_BAM="aligned.bam"
    OUTPUT_SORTED_BAM="aligned.sorted.bam"
    CONTROL_BAM="control.sorted.bam" # Replace with your actual control BAM file (crucial for Super Enhancer calling)
    OUTPUT_PEAKS_DIR="homer_super_enhancer_peaks"
    OUTPUT_PEAKS_FILE="${OUTPUT_PEAKS_DIR}/super_enhancers.txt"
    
    # --- 1. Alignment using Bowtie2 ---
    # Reads were aligned using Bowtie2 to GRCh37
    echo "Starting Bowtie2 alignment..."
    bowtie2 -x "${BOWTIE2_INDEX_PREFIX}" -U "${READS_FASTQ}" -S "${OUTPUT_SAM}"
    echo "Bowtie2 alignment complete."
    
    # Convert SAM to BAM, sort, and index
    echo "Converting SAM to BAM, sorting, and indexing..."
    samtools view -bS "${OUTPUT_SAM}" > "${OUTPUT_BAM}"
    samtools sort "${OUTPUT_BAM}" -o "${OUTPUT_SORTED_BAM}"
    samtools index "${OUTPUT_SORTED_BAM}"
    echo "BAM processing complete."
    
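    # --- 2. Super Enhancer (SE) Peak Calling using HOMER v4.11.1 ---
    # SE peaks were called using HOMER v4.11.1 default settings (-style super, Fold change >4, p-value <0.0001)
    # findPeaks operates on HOMER tag directories, not BAM files, so build those first.
    # (Tag directory names are placeholders; HOMER needs samtools on the PATH to read BAM input.)
    echo "Starting HOMER Super Enhancer peak calling..."
    mkdir -p "${OUTPUT_PEAKS_DIR}"
    makeTagDirectory "${OUTPUT_PEAKS_DIR}/chip_tags" "${OUTPUT_SORTED_BAM}"
    makeTagDirectory "${OUTPUT_PEAKS_DIR}/control_tags" "${CONTROL_BAM}"
    # -i: control tag directory; -F: fold change over control; -P: Poisson p-value over control.
    # 4 and 0.0001 are written out here to mirror the settings stated in the source.
    findPeaks "${OUTPUT_PEAKS_DIR}/chip_tags" -style super -i "${OUTPUT_PEAKS_DIR}/control_tags" \
              -F 4 -P 0.0001 -o "${OUTPUT_PEAKS_FILE}"
    echo "HOMER Super Enhancer peak calling complete. Results in ${OUTPUT_PEAKS_FILE}"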
  2.

    For mouse samples, reads were aligned using Bowtie2 to MGSCv37 (mm9) and differential peaks were called using HOMER's (ref. 79) default settings (Fold change >4, p-value <0.0001) with '-style factor'.

    $ Bash example
    # Install Bowtie2 and samtools (example using conda; the source does not state a Bowtie2 version)
    # conda install -c bioconda bowtie2 samtools
    
    # Install HOMER (example using conda)
    # conda install -c bioconda homer=4.11
    
    # --- Bowtie2 Alignment ---
    # Define input and output files
    READS_R1="sample_R1.fastq.gz" # Replace with actual R1 reads file
    READS_R2="sample_R2.fastq.gz" # Replace with actual R2 reads file (remove if single-end)
    BOWTIE2_INDEX_PREFIX="/path/to/bowtie2_indexes/mm9" # Path to Bowtie2 index for MGSCv37 (mm9)
    ALIGNED_BAM="aligned_reads.bam"
    
    # Align reads using Bowtie2
    # Assuming paired-end reads based on common practice, adjust if single-end
    bowtie2 -x "${BOWTIE2_INDEX_PREFIX}" -1 "${READS_R1}" -2 "${READS_R2}" \
            -S "aligned_reads.sam" \
            --threads 8 # Example: use 8 threads, adjust as needed
    
    # Convert SAM to BAM and sort
    samtools view -bS "aligned_reads.sam" | samtools sort -o "${ALIGNED_BAM}" -
    
    # Remove intermediate SAM file
    rm "aligned_reads.sam"
    
    # --- HOMER Differential Peak Calling ---
    # Define HOMER genome directory (e.g., where mm9 is installed)
    HOMER_GENOME_DIR="/path/to/homer/data/genomes/mm9" # Path to HOMER genome data for mm9
    
    # Define input BAM files for treatment and control
    TREATMENT_BAM="${ALIGNED_BAM}" # Output from Bowtie2 for treatment sample
    CONTROL_BAM="control_aligned_reads.bam" # Replace with actual control BAM file
    
    # Define output directory and peak file name
    OUTPUT_DIR="homer_peaks"
    PEAK_FILE="${OUTPUT_DIR}/differential_peaks.txt"
    
    mkdir -p "${OUTPUT_DIR}"
    
    # 1. Create Tag Directories for treatment and control
    # This step processes BAM files into HOMER's tag directory format
    makeTagDirectory "${OUTPUT_DIR}/treatment_tags" "${TREATMENT_BAM}" -genome "${HOMER_GENOME_DIR}"
    makeTagDirectory "${OUTPUT_DIR}/control_tags" "${CONTROL_BAM}" -genome "${HOMER_GENOME_DIR}"
    
    # 2. Call differential peaks using findPeaks with specified parameters
    # -style factor: appropriate for transcription factor ChIP-seq
    # -i: tag directory to compare against (control or other condition)
    # -F 4: Fold change > 4
    # -P 0.0001: p-value < 0.0001
    findPeaks "${OUTPUT_DIR}/treatment_tags" \
              -o "${PEAK_FILE}" \
              -i "${OUTPUT_DIR}/control_tags" \
              -style factor \
              -F 4 \
              -P 0.0001
  3.

    Data tracks were visualized using IGV v2.3.90.

    IGV v2.3.90
    $ Bash example
    # Install IGV via conda (whether this exact version is available on bioconda is not verified)
    # conda create -n igv_env igv=2.3.90 -c bioconda -y
    # conda activate igv_env
    
    # Alternatively, download the specific version from the Broad Institute archive:
    # wget https://data.broadinstitute.org/igv/projects/downloads/2.3/IGV_2.3.90.zip
    # unzip IGV_2.3.90.zip
    # cd IGV_2.3.90
    # chmod +x igv.sh
    
    # Launch IGV with a reference genome and data tracks for visualization.
    # The human samples were aligned to GRCh37, so hg19 is used here; use mm9 for the mouse samples.
    # Replace the file names with the actual tracks (the GEO supplementary files are bedGraph and bed).
    # IGV is primarily a GUI tool; this command launches the application and loads the listed files.
    # Multiple files go in one comma-separated argument (use ./igv.sh when running from the unzipped archive).
    igv -g hg19 your_coverage.bedGraph,your_peaks.bed
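
Step 3 only states that tracks were viewed in IGV v2.3.90. When the same view has to be reproduced, IGV can also be driven by a batch script. The sketch below writes one using the IGV batch commands `new`, `genome`, `load`, `snapshotDirectory`, `goto`, `snapshot` and `exit`. The function name `write_igv_batch`, the track file names, the output file name and the locus are hypothetical placeholders, not values from the source.

```shell
# Sketch: write an IGV batch script that loads tracks and saves one snapshot.
# write_igv_batch, the file names and the locus are hypothetical placeholders.

write_igv_batch() {
    # Usage: write_igv_batch GENOME LOCUS SNAPSHOT_DIR TRACK [TRACK...]
    local genome="$1" locus="$2" snapshot_dir="$3"
    shift 3
    echo "new"                                 # start a fresh session
    echo "genome ${genome}"                    # hg19 for human, mm9 for mouse samples
    local track
    for track in "$@"; do
        echo "load ${track}"                   # one load command per track file
    done
    echo "snapshotDirectory ${snapshot_dir}"   # where snapshot images are written
    echo "goto ${locus}"                       # region to display
    echo "snapshot"                            # save an image of the current view
    echo "exit"                                # close IGV when done
}

# Example call with placeholder tracks and an arbitrary hg19 region.
write_igv_batch hg19 "chr8:128700000-128800000" snapshots \
    your_coverage.bedGraph your_peaks.bed > igv_batch.txt

# To run it (requires an IGV installation; not executed here):
# igv.sh -b igv_batch.txt
```

Keeping the batch file alongside the tracks records which genome and region were displayed, which the interactive GUI does not do on its own.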

Tools Used

Raw Source Text
For human samples, reads were aligned using Bowtie2 to GRCh37 and SE peaks were called using HOMER v4.11.1 default settings (-style super, Fold change >4, p-value <0.0001).
For mouse samples, reads were aligned using Bowtie2 to MGSCv37 (mm9) and differential peaks were called using HOMER's (ref. 79) default settings (Fold change >4, p-value <0.0001) using '-style factor'.
Data tracks were visualized using IGV v2.3.90.
Assembly: GRCh37 (human) and MGSCv37 (mm9) for mouse
Supplementary files format and content: bedGraph, bed
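
The deposited files are bedGraph and bed, while HOMER's `findPeaks` writes its own tab-delimited peak table. As a rough sketch of how such a table could be re-checked against the stated cutoffs (fold change >4, p-value <0.0001) and exported to BED, the two helpers below use only `awk`. Several things here are assumptions to verify against your own files: the header names `Fold Change vs Control` and `p-value vs Control`, the column order (ID, chr, start, end, strand), and the treatment of the start coordinate as 1-based. The function names and the demo rows are made up for illustration.

```shell
# Sketch: apply fold-change / p-value cutoffs to a HOMER-style peak table, then export BED.
# Header names, column order and the 1-based start are assumptions; check your own file.

filter_peaks() {
    # Usage: filter_peaks PEAK_FILE MIN_FOLD MAX_PVALUE [FOLD_HEADER] [PVALUE_HEADER]
    local peak_file="$1" min_fold="$2" max_p="$3"
    local fold_header="${4:-Fold Change vs Control}"
    local p_header="${5:-p-value vs Control}"
    awk -F'\t' -v fh="$fold_header" -v ph="$p_header" \
        -v min_fold="$min_fold" -v max_p="$max_p" '
        # The column header is the comment line that begins with "#PeakID".
        /^#PeakID/ {
            for (i = 1; i <= NF; i++) {
                if ($i == fh) fold_col = i
                if ($i == ph) p_col = i
            }
            print; next
        }
        /^#/ { print; next }    # pass other comment lines through unchanged
        # Keep a row only when both columns were found and both cutoffs hold.
        fold_col && p_col && ($fold_col + 0) > (min_fold + 0) && ($p_col + 0) < (max_p + 0)
    ' "$peak_file"
}

peaks_to_bed() {
    # Usage: peaks_to_bed [PEAK_FILE]   (reads stdin when no file is given)
    awk -F'\t' -v OFS='\t' '
        /^#/ || NF < 5 { next }         # skip comments and malformed lines
        {
            start = $3 - 1              # assumed 1-based start -> 0-based BED start
            if (start < 0) start = 0
            print $2, start, $4, $1, 0, $5   # chr, start, end, name, score, strand
        }
    ' "${1:--}"
}

# Tiny made-up table, only to show the call pattern (not real data).
demo="$(mktemp)"
printf '#PeakID\tchr\tstart\tend\tstrand\tFold Change vs Control\tp-value vs Control\n' > "$demo"
printf 'peak1\tchr1\t101\t300\t+\t8.5\t1.0e-10\n' >> "$demo"
printf 'peak2\tchr2\t501\t700\t+\t2.0\t1.0e-10\n' >> "$demo"
filter_peaks "$demo" 4 0.0001 | peaks_to_bed
rm -f "$demo"
```

In the demo only `peak1` passes both cutoffs, so a single BED line is printed. If the header names in a real file differ, they can be passed as the fourth and fifth arguments to `filter_peaks`.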