GSE234078 Processing Pipeline


Publication

A super-enhancer-regulated RNA-binding protein cascade drives pancreatic cancer.

Nature Communications (2023), PMID 37673892

Dataset

A super-enhancer regulated RNA-binding protein cascade drives pancreatic cancer

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


For human samples, reads were aligned to GRCh37 with Bowtie2, and super-enhancer (SE) peaks were called with HOMER v4.11.1 using default settings (-style super, fold change >4, p-value <0.0001).

Bowtie2 (version not specified)

$ Bash example

# --- Installation (commented out) ---
# conda install -c bioconda bowtie2 samtools homer

# --- Reference Data Setup ---
# Download GRCh37 (hg19) primary assembly FASTA
# wget -O GRCh37.fa.gz ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
# gunzip GRCh37.fa.gz
GENOME_FASTA="GRCh37.fa" # Path to your unzipped GRCh37 FASTA file

# Build Bowtie2 index for GRCh37
BOWTIE2_INDEX_PREFIX="GRCh37_index" # Prefix for Bowtie2 index files
# bowtie2-build "${GENOME_FASTA}" "${BOWTIE2_INDEX_PREFIX}"

# (Optional) Configure the HOMER hg19 genome (UCSC hg19 corresponds to GRCh37).
# findPeaks does not need it; it is used by downstream tools such as annotatePeaks.pl.
HOMER_GENOME="hg19"
# perl /path/to/homer/configureHomer.pl -install "${HOMER_GENOME}"

# --- Input/Output Variables ---
READS_FASTQ="reads.fastq" # Replace with your actual input FASTQ file
OUTPUT_SAM="aligned.sam"
OUTPUT_BAM="aligned.bam"
OUTPUT_SORTED_BAM="aligned.sorted.bam"
CONTROL_BAM="control.sorted.bam" # Replace with your actual control BAM file (crucial for Super Enhancer calling)
OUTPUT_PEAKS_DIR="homer_super_enhancer_peaks"
OUTPUT_PEAKS_FILE="${OUTPUT_PEAKS_DIR}/super_enhancers.txt"

# --- 1. Alignment using Bowtie2 ---
# Reads were aligned using Bowtie2 to GRCh37
echo "Starting Bowtie2 alignment..."
bowtie2 -x "${BOWTIE2_INDEX_PREFIX}" -U "${READS_FASTQ}" -S "${OUTPUT_SAM}"
echo "Bowtie2 alignment complete."

# Convert SAM to BAM, sort, and index
echo "Converting SAM to BAM, sorting, and indexing..."
samtools view -bS "${OUTPUT_SAM}" > "${OUTPUT_BAM}"
samtools sort "${OUTPUT_BAM}" -o "${OUTPUT_SORTED_BAM}"
samtools index "${OUTPUT_SORTED_BAM}"
echo "BAM processing complete."

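# --- 2. Super-Enhancer (SE) Peak Calling using HOMER v4.11.1 ---
# SE peaks were called using HOMER v4.11.1 default settings (-style super, fold change >4, p-value <0.0001).
# findPeaks reads HOMER tag directories, not BAM files, so both BAMs are converted first.
mkdir -p "${OUTPUT_PEAKS_DIR}"
echo "Creating HOMER tag directories..."
makeTagDirectory "${OUTPUT_PEAKS_DIR}/chip_tags" "${OUTPUT_SORTED_BAM}"
makeTagDirectory "${OUTPUT_PEAKS_DIR}/control_tags" "${CONTROL_BAM}"

echo "Starting HOMER super-enhancer peak calling..."
# -i: control tag directory; -F 4: fold enrichment over control; -P 0.0001: Poisson p-value vs control
findPeaks "${OUTPUT_PEAKS_DIR}/chip_tags" -style super -i "${OUTPUT_PEAKS_DIR}/control_tags" \
  -F 4 -P 0.0001 -o "${OUTPUT_PEAKS_FILE}"
echo "HOMER super-enhancer peak calling complete. Results in ${OUTPUT_PEAKS_FILE}"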

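HOMER's -style super ranks enhancer regions by signal and separates super-enhancers from typical enhancers at the point where the ranked signal curve turns steep; findPeaks exposes this threshold as -superSlope (default 1 according to its help text, which is worth checking for your version). The sketch below illustrates that slope cutoff on a two-column table of already-merged regions. It is a simplified ROSE-style approximation, not HOMER's implementation; the function name call_super_by_slope and the tab-separated input layout (region ID, signal) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: call_super_by_slope <regions.tsv> [slope]
# Input: tab-separated "region_id<TAB>signal" (assumed layout, one region per line).
# Output: IDs of regions at or above the slope cutoff, in ascending signal order.
# Simplified illustration only; HOMER's -style super logic may differ in detail.
set -euo pipefail

call_super_by_slope() {
  local regions="$1" slope="${2:-1}"
  sort -t $'\t' -k2,2g "${regions}" | awk -F '\t' -v s="${slope}" '
    { id[NR] = $1; sig[NR] = $2 }
    END {
      n = NR
      if (n == 0 || sig[n] <= 0) exit      # nothing to rank
      max = sig[n]                         # largest signal after ascending sort
      cut = n + 1                          # default: no region passes
      for (i = 2; i <= n; i++) {
        # x = rank / n and y = signal / max, so the slope between
        # neighbouring points is ((sig[i] - sig[i-1]) / max) / (1 / n)
        if ((sig[i] - sig[i-1]) / max * n >= s) { cut = i; break }
      }
      for (i = cut; i <= n; i++) print id[i]
    }'
}
```

With toy signals 1 to 8 for regions r1 to r8, 30 for r9 and 100 for r10, only r9 and r10 pass at the default slope of 1: the jump from 8 to 30 is the first step steeper than 1 once both axes are scaled to [0, 1]. Raising the slope to 3 leaves only r10.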

For mouse samples, reads were aligned to MGSCv37 (mm9) with Bowtie2, and differential peaks were called with HOMER (ref. 79) using default settings (fold change >4, p-value <0.0001) and '-style factor'.

Bowtie2 (version not specified; the example pins v2.4.5)

$ Bash example

# Install Bowtie2 (example using conda)
# conda install -c bioconda bowtie2=2.4.5

# Install HOMER (example using conda)
# conda install -c bioconda homer=4.11

# --- Bowtie2 Alignment ---
# Define input and output files
READS_R1="sample_R1.fastq.gz" # Replace with actual R1 reads file
READS_R2="sample_R2.fastq.gz" # Replace with actual R2 reads file (remove if single-end)
BOWTIE2_INDEX_PREFIX="/path/to/bowtie2_indexes/mm9" # Path to Bowtie2 index for MGSCv37 (mm9)
ALIGNED_BAM="aligned_reads.bam"

# Align reads using Bowtie2
# Assuming paired-end reads based on common practice, adjust if single-end
bowtie2 -x "${BOWTIE2_INDEX_PREFIX}" -1 "${READS_R1}" -2 "${READS_R2}" \
        -S "aligned_reads.sam" \
        --threads 8 # Example: use 8 threads, adjust as needed

# Convert SAM to BAM and sort
samtools view -bS "aligned_reads.sam" | samtools sort -o "${ALIGNED_BAM}" -

# Remove intermediate SAM file
rm "aligned_reads.sam"

# --- HOMER Differential Peak Calling ---
# Define HOMER genome directory (e.g., where mm9 is installed)
HOMER_GENOME_DIR="/path/to/homer/data/genomes/mm9" # Path to HOMER genome data for mm9

# Define input BAM files for treatment and control
TREATMENT_BAM="${ALIGNED_BAM}" # Output from Bowtie2 for treatment sample
CONTROL_BAM="control_aligned_reads.bam" # Replace with actual control BAM file

# Define output directory and peak file name
OUTPUT_DIR="homer_peaks"
PEAK_FILE="${OUTPUT_DIR}/differential_peaks.txt"

mkdir -p "${OUTPUT_DIR}"

# 1. Create Tag Directories for treatment and control
# This step processes BAM files into HOMER's tag directory format
makeTagDirectory "${OUTPUT_DIR}/treatment_tags" "${TREATMENT_BAM}" -genome "${HOMER_GENOME_DIR}"
makeTagDirectory "${OUTPUT_DIR}/control_tags" "${CONTROL_BAM}" -genome "${HOMER_GENOME_DIR}"

# 2. Call differential peaks using findPeaks with specified parameters
# -style factor: appropriate for transcription factor ChIP-seq
# -F 4: Fold change > 4
# -P 0.0001: p-value < 0.0001
findPeaks "${OUTPUT_DIR}/treatment_tags" \
          -o "${PEAK_FILE}" \
          -i "${OUTPUT_DIR}/control_tags" \
          -style factor \
          -F 4 \
          -P 0.0001

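HOMER applies the fold-change (-F) and p-value (-P) cutoffs relative to the control tag count in each candidate peak. To illustrate why both cutoffs matter, the sketch below treats the control count, scaled to the ChIP library size, as the expected value of a Poisson distribution and computes the upper-tail probability of the observed ChIP tag count. The helper check_enrichment is hypothetical, and HOMER's own normalization and pseudocounts may differ, so the numbers will not match findPeaks output exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_enrichment <chip_tags> <expected_tags> [min_fold] [max_p]
# ASSUMPTION: <expected_tags> is the control tag count in the same region,
# already scaled to the ChIP library size; HOMER's normalization may differ.
set -euo pipefail

check_enrichment() {
  awk -v k="$1" -v lambda="$2" -v mf="${3:-4}" -v mp="${4:-0.0001}" '
    # Upper tail P(X >= k) for X ~ Poisson(lambda), accumulated in log space
    # so that large k or small tail probabilities do not overflow.
    function poisson_upper(k, lambda,    i, lt, term, sum) {
      lt = -lambda + k * log(lambda)
      for (i = 2; i <= k; i++) lt -= log(i)     # subtract log(k!)
      sum = 0
      for (i = k; ; i++) {
        term = exp(lt)
        sum += term
        if (i > lambda && term <= sum * 1e-15) break
        lt += log(lambda) - log(i + 1)          # P(i+1) = P(i) * lambda / (i+1)
      }
      return sum
    }
    BEGIN {
      if (lambda <= 0) { print "expected count must be > 0"; exit 1 }
      fold = k / lambda
      p = poisson_upper(k, lambda)
      verdict = (fold > mf && p < mp) ? "PASS" : "FAIL"
      printf "tags=%d expected=%g fold=%.2f p=%.3g %s\n", k, lambda, fold, p, verdict
    }'
}
```

For example, 6 tags against an expected 1 is a 6-fold enrichment but has p of about 6e-4, so it fails the p-value cutoff; 150 tags against 50 is highly significant but only 3-fold, so it fails the fold-change cutoff; 25 against 5 passes both.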

Data tracks were visualized using IGV v2.3.90.

IGV v2.3.90

$ Bash example

# Install IGV via conda (check that bioconda still provides version 2.3.90)
# conda create -n igv_env igv=2.3.90 -c bioconda -y
# conda activate igv_env

# Alternatively, download the specific version from the Broad Institute archive:
# wget https://data.broadinstitute.org/igv/projects/downloads/2.3/IGV_2.3.90.zip
# unzip IGV_2.3.90.zip
# cd IGV_2.3.90
# chmod +x igv.sh

# Launch IGV with a reference genome and the processed data tracks.
# The study used GRCh37 (hg19) for human and MGSCv37 (mm9) for mouse; use "--genome mm9" for mouse tracks.
# Replace the placeholder file names with the files you want to view (GEO supplementary files are bedGraph and BED).
# IGV is primarily a GUI tool. Pass multiple files as one comma-separated list; a second
# space-separated argument is interpreted as a locus. Use ./igv.sh if you installed from the zip archive.
igv --genome hg19 your_alignment.bam,your_coverage.bedGraph,your_peaks.bed

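IGV can also be driven non-interactively with a batch script passed via -b, which is convenient for rendering the same set of tracks at several loci. The sketch below writes such a script using standard IGV batch commands (new, genome, load, snapshotDirectory, goto, snapshot, exit). The function write_igv_batch, the locus and the file names in the usage line are hypothetical; an absolute snapshot directory is the safest choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write_igv_batch <genome> <locus> <snapshot_dir> <out.bat> <track>...
# Writes an IGV batch script that loads tracks, jumps to a locus, saves a PNG, and exits.
set -euo pipefail

write_igv_batch() {
  local genome="$1" locus="$2" snapdir="$3" out="$4"
  shift 4
  # Snapshot file name derived from the locus, e.g. chr1:100-200 -> chr1_100_200.png
  local name="${locus//[-:]/_}"
  {
    echo "new"
    echo "genome ${genome}"
    local track
    for track in "$@"; do
      echo "load ${track}"
    done
    echo "snapshotDirectory ${snapdir}"
    echo "goto ${locus}"
    echo "snapshot ${name}.png"
    echo "exit"
  } > "${out}"
}
```

Usage: write_igv_batch hg19 chr1:1000000-1100000 /abs/path/snapshots view.bat sample.bedGraph peaks.bed, then launch igv -b view.bat (or ./igv.sh -b view.bat for the zip install).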

Tools Used

Bowtie2

HOMER

IGV

Raw Source Text

For human samples, reads were aligned using Bowtie2 to GRCh37 and SE peaks were called using HOMER v4.11.1 default settings (-style super, Fold change >4, p-value <0.0001).
For mouse samples, reads were aligned using Bowtie2 to MGSCv37 (mm9) and differential peaks were called using HOMER's (ref. 79) default settings (Fold change >4, p-value <0.0001) using '-style factor'.
Data tracks were visualized using IGV v2.3.90.
Assembly: GRCh37 (human) and MGSCv37 (mm9) for mouse
Supplementary files format and content: bedGraph, bed
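The supplementary files are listed as bedGraph and BED. One possible way to export HOMER peak calls to BED, optionally with stricter cutoffs, is the awk sketch below; HOMER also ships pos2bed.pl for plain conversion. It is not the authors' documented procedure. The function name filter_homer_peaks is hypothetical, and it assumes that the '#PeakID' header line contains 'Fold Change vs Control' and 'p-value vs Control' columns (expected when a control is supplied) and that peak starts are 1-based.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter_homer_peaks <peaks.txt> [min_fold] [max_p]
# ASSUMPTIONS: header line starts with "#PeakID" and names the columns
# "Fold Change vs Control" and "p-value vs Control"; columns 2-5 are
# chr/start/end/strand with a 1-based start (converted to 0-based for BED).
set -euo pipefail

filter_homer_peaks() {
  local peaks="$1" min_fold="${2:-4}" max_p="${3:-0.0001}"
  awk -F '\t' -v OFS='\t' -v mf="${min_fold}" -v mp="${max_p}" '
    /^#PeakID/ {                         # locate the statistic columns by name
      for (c = 1; c <= NF; c++) {
        if ($c == "Fold Change vs Control") fc = c
        if ($c == "p-value vs Control") pc = c
      }
      next
    }
    /^#/ { next }                        # skip other comment/parameter lines
    fc && pc && $fc + 0 > mf && $pc + 0 < mp {
      # BED6: chrom, 0-based start, end, name, score (fold change), strand
      print $2, $3 - 1, $4, $1, $fc, $5
    }' "${peaks}"
}
```

Usage: filter_homer_peaks homer_peaks/differential_peaks.txt 4 0.0001 > peaks.bed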
