GSE234078 Processing Pipeline
3 steps
Publication
A super-enhancer-regulated RNA-binding protein cascade drives pancreatic cancer. Nature Communications (2023). PMID: 37673892
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
For human samples, reads were aligned using Bowtie2 to GRCh37 and SE peaks were called using HOMER v4.11.1 default settings (-style super, Fold change >4, p-value <0.0001).
$ Bash example
# --- Installation (commented out) ---
# conda install -c bioconda bowtie2 samtools homer

# --- Reference data setup ---
# Download the GRCh37 (hg19) primary assembly FASTA
# wget -O GRCh37.fa.gz ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
# gunzip GRCh37.fa.gz
GENOME_FASTA="GRCh37.fa"              # Path to your unzipped GRCh37 FASTA file

# Build the Bowtie2 index for GRCh37
BOWTIE2_INDEX_PREFIX="GRCh37_index"   # Prefix for Bowtie2 index files
# bowtie2-build "${GENOME_FASTA}" "${BOWTIE2_INDEX_PREFIX}"

# Optional: install the HOMER hg19 package (HOMER uses UCSC names; hg19 corresponds to GRCh37).
# It is only needed for downstream annotation, not for findPeaks itself.
# Note that Ensembl FASTA files name chromosomes "1", "2", ... while hg19 uses "chr1", "chr2", ...
# perl /path/to/homer/bin/configureHomer.pl -install hg19

# --- Input/output variables ---
READS_FASTQ="reads.fastq"             # Replace with your input FASTQ (single-end is an assumption)
OUTPUT_SAM="aligned.sam"
OUTPUT_BAM="aligned.bam"
OUTPUT_SORTED_BAM="aligned.sorted.bam"
CONTROL_BAM="control.sorted.bam"      # Replace with your control (input) BAM file
OUTPUT_PEAKS_DIR="homer_super_enhancer_peaks"
OUTPUT_PEAKS_FILE="${OUTPUT_PEAKS_DIR}/super_enhancers.txt"
TAG_DIR="${OUTPUT_PEAKS_DIR}/sample_tags"
CONTROL_TAG_DIR="${OUTPUT_PEAKS_DIR}/control_tags"

# --- 1. Alignment using Bowtie2 ---
echo "Starting Bowtie2 alignment..."
bowtie2 -x "${BOWTIE2_INDEX_PREFIX}" -U "${READS_FASTQ}" -S "${OUTPUT_SAM}"
echo "Bowtie2 alignment complete."

# Convert SAM to BAM, sort, and index
echo "Converting SAM to BAM, sorting, and indexing..."
samtools view -bS "${OUTPUT_SAM}" > "${OUTPUT_BAM}"
samtools sort "${OUTPUT_BAM}" -o "${OUTPUT_SORTED_BAM}"
samtools index "${OUTPUT_SORTED_BAM}"
echo "BAM processing complete."

# --- 2. Super-enhancer (SE) peak calling using HOMER v4.11.1 ---
# findPeaks operates on HOMER tag directories rather than BAM files, so build those first.
echo "Starting HOMER super-enhancer peak calling..."
mkdir -p "${OUTPUT_PEAKS_DIR}"
makeTagDirectory "${TAG_DIR}" "${OUTPUT_SORTED_BAM}"
makeTagDirectory "${CONTROL_TAG_DIR}" "${CONTROL_BAM}"

# -style super: super-enhancer mode
# -i: control tag directory
# -F 4: fold change over control > 4 (HOMER default, stated explicitly)
# -P 0.0001: Poisson p-value over control < 0.0001 (HOMER default, stated explicitly)
findPeaks "${TAG_DIR}" -style super -i "${CONTROL_TAG_DIR}" -F 4 -P 0.0001 -o "${OUTPUT_PEAKS_FILE}"
echo "HOMER super-enhancer peak calling complete. Results in ${OUTPUT_PEAKS_FILE}"
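The deposited supplementary files include BED, so a peak file from the step above usually needs converting. The following is a minimal sketch, not the authors' method: `homer_peaks_to_bed` is a hypothetical helper name, and it assumes the usual HOMER column order (peak ID, chr, start, end, strand, normalized tag count) with 1-based start coordinates. HOMER also ships `pos2bed.pl` for the same purpose.

```shell
# Hypothetical helper: convert a HOMER peak file to 6-column BED on stdout.
# Assumptions: tab-delimited input, columns ordered as
# ID, chr, start, end, strand, normalized tag count; starts are 1-based.
homer_peaks_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF < 6 { next }       # skip header/comment and short lines
         {
             start = $3 - 1            # shift to a 0-based BED start
             if (start < 0) start = 0
             # HOMER may encode strand as +/- or 0/1
             strand = ($5 == "-" || $5 == "1") ? "-" : "+"
             print $2, start, $4, $1, $6, strand
         }' "$1"
}

# Example usage (file names are placeholders):
# homer_peaks_to_bed homer_super_enhancer_peaks/super_enhancers.txt > super_enhancers.bed
```

The tag count is written to the BED score column as-is; rescale it if a downstream tool expects integer scores between 0 and 1000.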
2
For mouse samples, reads were aligned using Bowtie2 to MGSCv37 (mm9) and differential peaks were called using HOMER's default settings (Fold change >4, p-value <0.0001) using '-style factor'.
$ Bash example
# Install Bowtie2 (example using conda)
# conda install -c bioconda bowtie2=2.4.5
# Install HOMER and samtools (example using conda)
# conda install -c bioconda homer=4.11 samtools

# --- Bowtie2 alignment ---
READS_R1="sample_R1.fastq.gz"   # Replace with the actual R1 reads file
READS_R2="sample_R2.fastq.gz"   # Replace with the actual R2 reads file
BOWTIE2_INDEX_PREFIX="/path/to/bowtie2_indexes/mm9"   # Bowtie2 index for MGSCv37 (mm9)
ALIGNED_BAM="aligned_reads.bam"

# Paired-end reads are an assumption; for single-end data use -U <reads> instead of -1/-2.
bowtie2 -x "${BOWTIE2_INDEX_PREFIX}" -1 "${READS_R1}" -2 "${READS_R2}" \
    -S "aligned_reads.sam" \
    --threads 8   # Example: use 8 threads, adjust as needed

# Convert SAM to BAM and sort, then remove the intermediate SAM file
samtools view -bS "aligned_reads.sam" | samtools sort -o "${ALIGNED_BAM}" -
rm "aligned_reads.sam"

# --- HOMER differential peak calling ---
TREATMENT_BAM="${ALIGNED_BAM}"            # Bowtie2 output for the treatment sample
CONTROL_BAM="control_aligned_reads.bam"   # Replace with the actual control BAM file
OUTPUT_DIR="homer_peaks"
PEAK_FILE="${OUTPUT_DIR}/differential_peaks.txt"
mkdir -p "${OUTPUT_DIR}"

# 1. Create tag directories for treatment and control
# This step converts BAM files into HOMER's tag directory format.
makeTagDirectory "${OUTPUT_DIR}/treatment_tags" "${TREATMENT_BAM}"
makeTagDirectory "${OUTPUT_DIR}/control_tags" "${CONTROL_BAM}"

# 2. Call peaks enriched in treatment over control with findPeaks
# -style factor: appropriate for transcription factor ChIP-seq
# -i: control tag directory used as background
# -F 4: fold change over control > 4
# -P 0.0001: Poisson p-value over control < 0.0001
findPeaks "${OUTPUT_DIR}/treatment_tags" \
    -style factor \
    -i "${OUTPUT_DIR}/control_tags" \
    -F 4 \
    -P 0.0001 \
    -o "${PEAK_FILE}"
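The fold-change and p-value cutoffs described above can be re-checked, or tightened, on an existing peak file. This is a rough sketch under stated assumptions: `filter_homer_peaks` is a hypothetical helper, and it assumes the peak file has a header line starting with `#PeakID` whose columns include `Fold Change vs Control` and `p-value vs Control`, as HOMER typically writes when a control is supplied. Columns are located by name rather than by position, so the helper does nothing if those headers are absent.

```shell
# Hypothetical helper: keep peaks with fold change > min_fc and
# p-value < max_p relative to the control.
# Usage: filter_homer_peaks <peak_file> [min_fc=4] [max_p=0.0001]
filter_homer_peaks() {
    awk -v min_fc="${2:-4}" -v max_p="${3:-0.0001}" '
        BEGIN { FS = OFS = "\t" }
        /^#PeakID/ {
            # Locate the control comparison columns from the header line
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^Fold Change vs Control/) fc = i
                if ($i ~ /^p-value vs Control/)     pv = i
            }
            print
            next
        }
        /^#/ { next }                  # drop the other comment lines
        fc && pv && ($fc + 0) > (min_fc + 0) && ($pv + 0) < (max_p + 0)
    ' "$1"
}

# Example usage (file name is a placeholder; 8 is an arbitrary stricter cutoff):
# filter_homer_peaks homer_peaks/differential_peaks.txt 8 0.0001 > strict_peaks.txt
```

Since findPeaks already applies `-F 4 -P 0.0001`, running the helper with its defaults should return every peak; it is mainly useful for exploring stricter thresholds without re-running peak calling.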
3
Data tracks were visualized using IGV v2.3.90.
$ Bash example
# Install IGV via conda (availability of this exact version on bioconda should be checked)
# conda create -n igv_env igv=2.3.90 -c bioconda -y
# conda activate igv_env

# Alternatively, download the specific version from the Broad Institute archive:
# wget https://data.broadinstitute.org/igv/projects/downloads/2.3/IGV_2.3.90.zip
# unzip IGV_2.3.90.zip
# cd IGV_2.3.90
# chmod +x igv.sh

# Launch IGV with a reference genome and data tracks.
# Use 'hg19' (GRCh37) for the human samples or 'mm9' (MGSCv37) for the mouse samples.
# Replace the file names with the actual tracks; the deposited files are bedGraph and BED.
# IGV is primarily a GUI tool; this command opens the application with the listed files.
# The IGV command line takes files as one comma-delimited argument; use 'igv' instead of
# 'igv.sh' if the conda launcher is installed.
igv.sh -g hg19 your_alignment.bam,your_coverage.bedGraph,your_peaks.bed
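For reproducible figures, track loading can also be scripted with an IGV batch file instead of the GUI. The sketch below is illustrative only: `write_igv_batch` is a hypothetical helper, and the locus, output directory, and file names in the usage comment are placeholders. It relies on the standard IGV batch commands `new`, `genome`, `load`, `snapshotDirectory`, `goto`, `snapshot`, and `exit`.

```shell
# Hypothetical helper: print an IGV batch script to stdout.
# Usage: write_igv_batch <genome> <locus> <snapshot_dir> <track> [<track> ...]
write_igv_batch() {
    genome=$1
    locus=$2
    outdir=$3
    shift 3
    echo "new"                          # start a fresh session
    echo "genome ${genome}"             # e.g. hg19 or mm9
    for track in "$@"; do
        echo "load ${track}"            # one load command per track
    done
    echo "snapshotDirectory ${outdir}"
    echo "goto ${locus}"
    echo "snapshot region.png"
    echo "exit"
}

# Example usage (all arguments are placeholders):
# write_igv_batch hg19 chr1:1,000,000-1,100,000 snapshots sample.bedGraph peaks.bed > igv_batch.txt
# igv.sh -b igv_batch.txt
```

Keeping the batch file alongside the tracks records which genome and region each figure was drawn from.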
Tools Used
Bowtie2, HOMER v4.11.1, IGV v2.3.90
Raw Source Text
For human samples, reads were aligned using Bowtie2 to GRCh37 and SE peaks were called using HOMER v4.11.1 default settings (-style super, Fold change >4, p-value <0.0001). For mouse samples, reads were aligned using Bowtie2 to MGSCv37 (mm9) and differential peaks were called using HOMER's default settings (Fold change >4, p-value <0.0001) using '-style factor'. Data tracks were visualized using IGV v2.3.90.
Assembly: GRCh37 (human) and MGSCv37 (mm9) for mouse
Supplementary files format and content: bedGraph, bed