GSE55727 Processing Pipeline

RNA-Seq code_examples 5 steps

Publication

Zmat3 Is a Key Splicing Regulator in the p53 Tumor Suppression Program.

Molecular cell (2020) — PMID 33157015

Dataset

GSE55727

Integrative Genomic Analysis Reveals Widespread Enhancer Regulation by p53 in Response to DNA Damage

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    RNA-Seq Read Alignment: TopHat 2.0 with default options using UCSC known genes from hg19 or mm9 as reference transcriptome

    $ Bash example
    # Install TopHat 2.0 (and Bowtie2, which TopHat depends on)
    # conda install -c bioconda tophat=2.0 bowtie2
    
    # --- Reference Preparation (hg19 example) ---
    # Create a directory for reference files
    # mkdir -p references/hg19
    # cd references/hg19
    
    # Download hg19 genome FASTA
    # wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
    # gunzip hg19.fa.gz
    
    # Build Bowtie2 index for hg19 genome
    # bowtie2-build hg19.fa hg19_index
    
    # Download a GTF file representing UCSC known genes for hg19.
    # Gencode v19 is a comprehensive annotation for hg19 and often used as a proxy for "known genes".
    # wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
    # gunzip gencode.v19.annotation.gtf.gz
    # mv gencode.v19.annotation.gtf genes.gtf
    # cd ../../ # Go back to the main working directory
    
    # --- RNA-Seq Read Alignment with TopHat 2.0 ---
    # Define variables
    GENOME_INDEX="references/hg19/hg19_index" # Path to the Bowtie2 index for hg19
    GTF_FILE="references/hg19/genes.gtf"     # Path to the GTF file (e.g., Gencode v19 for hg19 known genes)
    READS_R1="input_reads_R1.fastq.gz"        # Placeholder for input forward reads
    READS_R2="input_reads_R2.fastq.gz"        # Placeholder for input reverse reads (remove if single-end)
    OUTPUT_DIR="tophat_alignment_output"      # Output directory for TopHat results
    
    # Execute TopHat 2.0 with default options and specified GTF
    # TopHat 2.0 uses 'tophat2' command.
    tophat2 -o "${OUTPUT_DIR}" \
            --GTF "${GTF_FILE}" \
            "${GENOME_INDEX}" \
            "${READS_R1}" "${READS_R2}"
  2. 2

    RNA-Seq Differential Expression: Cuffdiff 2.1 with default options using UCSC known genes from hg19 or mm9 as reference transcriptome

    $ Bash example
    # Install Cufflinks (which includes Cuffdiff)
    # conda install -c bioconda cufflinks=2.1
    
    # Placeholder for reference transcriptome (UCSC known genes from hg19 or mm9)
    # This GTF file would contain the gene and transcript annotations.
    # You would typically download this from UCSC's Table Browser or a similar resource.
    # For example, a GTF derived from UCSC known genes for hg19:
    REFERENCE_GTF="hg19_UCSC_known_genes.gtf" # Or mm9_UCSC_known_genes.gtf
    
    # Placeholder for input BAM files (aligned RNA-Seq reads for different conditions/replicates)
    # Replace with your actual BAM files. Cuffdiff expects comma-separated BAMs for replicates within a condition.
    CONDITION_A_REPS="conditionA_rep1.bam,conditionA_rep2.bam"
    CONDITION_B_REPS="conditionB_rep1.bam,conditionB_rep2.bam"
    
    # Output directory for Cuffdiff results
    OUTPUT_DIR="cuffdiff_results"
    
    # Run Cuffdiff with default options
    # The command requires the reference GTF, followed by comma-separated BAM files for each condition.
    cuffdiff -o "${OUTPUT_DIR}" "${REFERENCE_GTF}" "${CONDITION_A_REPS}" "${CONDITION_B_REPS}"
  3. 3

    ChIP-Seq Read Alignment: Bowtie 2.0 with default options using hg19 or mm9 as reference genome

    Bowtie v2.0 GitHub
    $ Bash example
    # Install Bowtie 2 (if not already installed)
    # conda install -c bioconda bowtie2
    
    # Define variables
    # REFERENCE_INDEX: Path to the Bowtie2 index prefix (e.g., /path/to/bowtie2_indexes/hg19/hg19)
    # Pre-built indices for hg19 and mm9 can be downloaded from sources like the Bowtie 2 website (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
    # or UCSC Genome Browser (e.g., for hg19: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz, then build index with bowtie2-build)
    REFERENCE_INDEX="/path/to/bowtie2_indexes/hg19/hg19" # Example for hg19. Replace with mm9 for mouse.
    READS_R1="sample_R1.fastq.gz" # Replace with your actual R1 FASTQ file
    READS_R2="sample_R2.fastq.gz" # Replace with your actual R2 FASTQ file (remove if single-end)
    OUTPUT_SAM="sample_aligned.sam"
    
    # Perform paired-end alignment with Bowtie 2 using default options
    # -x: reference index basename
    # -1: reads file 1
    # -2: reads file 2
    # -S: output SAM file
    # Default options for Bowtie 2 are generally suitable for ChIP-Seq.
    # For single-end reads, use -U instead of -1 and -2: bowtie2 -x "${REFERENCE_INDEX}" -U "${READS_R1}" -S "${OUTPUT_SAM}"
    bowtie2 -x "${REFERENCE_INDEX}" -1 "${READS_R1}" -2 "${READS_R2}" -S "${OUTPUT_SAM}"
    
    # Common post-alignment steps (e.g., convert SAM to BAM, sort, and index)
    # OUTPUT_BAM="sample_aligned.bam"
    # OUTPUT_SORTED_BAM="sample_aligned.sorted.bam"
    # samtools view -bS "${OUTPUT_SAM}" > "${OUTPUT_BAM}"
    # samtools sort "${OUTPUT_BAM}" -o "${OUTPUT_SORTED_BAM}"
    # samtools index "${OUTPUT_SORTED_BAM}"
  4. 4

    ChIP-Seq general peak calling: Scripture using hg19 or mm9 as reference genome

    ChIP-seq v1.0 (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install Scripture (requires Java)
    # Download Scripture from the Broad Institute website (e.g., https://www.broadinstitute.org/cancer/cga/scripture_download)
    # Ensure Java is installed and in your PATH
    # For example, if you downloaded scripture.jar to your current directory:
    
    # Define variables for input files, genome, and output
    CHIP_BAM="chip_sample.bam" # Aligned ChIP-seq reads (BAM format)
    CONTROL_BAM="control_sample.bam" # Aligned control/input reads (BAM format)
    GENOME_FASTA="/path/to/reference/hg19.fa" # Reference genome FASTA (hg19 or mm9)
    GENOME_SIZE_FILE="/path/to/reference/hg19.chrom.sizes" # Chromosome sizes file (e.g., from UCSC)
    OUTPUT_DIR="scripture_output"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Run Scripture for peak calling
    # Note: Scripture requires specific input formats (e.g., .bed or .wig for reads, .fasta for genome, .chrom.sizes for genome sizes).
    # If your input is BAM, you might need to convert it to BED or WIG first.
    # Example conversion from BAM to BED (adjust for paired-end if necessary):
    # samtools view -b -F 0x04 "${CHIP_BAM}" | bedtools bamtobed -i - > "${OUTPUT_DIR}/chip_sample.bed"
    # samtools view -b -F 0x04 "${CONTROL_BAM}" | bedtools bamtobed -i - > "${OUTPUT_DIR}/control_sample.bed"
    
    # Assuming input BED files are available:
    CHIP_BED="${OUTPUT_DIR}/chip_sample.bed"
    CONTROL_BED="${OUTPUT_DIR}/control_sample.bed"
    
    # Example command for Scripture (parameters may need adjustment based on specific data and desired sensitivity/specificity)
    # This is a general example, actual parameters might vary.
    # -r: ChIP-seq read file (BED format)
    # -c: Control read file (BED format)
    # -g: Reference genome FASTA file
    # -s: Genome sizes file
    # -o: Output directory
    # -p: P-value threshold (e.g., 0.05)
    # -f: FDR threshold (e.g., 0.05)
    # -w: Window size (e.g., 100 bp)
    # -e: Extension size (e.g., 150 bp for typical fragment length)
    
    java -Xmx4g -jar scripture.jar \
      -r "${CHIP_BED}" \
      -c "${CONTROL_BED}" \
      -g "${GENOME_FASTA}" \
      -s "${GENOME_SIZE_FILE}" \
      -o "${OUTPUT_DIR}" \
      -p 0.05 \
      -f 0.05 \
      -w 100 \
      -e 150
    
    echo "Scripture peak calling completed. Results are in ${OUTPUT_DIR}"
  5. 5

    ChIP-Seq significant peak calling: Cuffdiff 2.0 using Scripture output as peak reference

    $ Bash example
    # Install Cufflinks suite (includes Cuffdiff)
    # conda install -c bioconda cufflinks=2.0
    
    # Placeholder for Scripture output converted to GTF format.
    # Scripture typically outputs BED files of transcribed regions.
    # Cuffdiff requires a GTF file of transcripts for quantification.
    # This step assumes a conversion from Scripture BED to GTF has been performed.
    # Replace 'scripture_output.gtf' with the actual path to your converted Scripture output.
    SCRIPTURE_GTF="scripture_output.gtf"
    
    # Placeholder for ChIP-Seq alignment BAM files for two conditions/samples.
    # Replace with actual paths to your aligned ChIP-Seq data for each replicate.
    CHIP_BAM_SAMPLE1_REP1="chip_sample1_conditionA_replicate1.bam"
    CHIP_BAM_SAMPLE1_REP2="chip_sample1_conditionA_replicate2.bam"
    CHIP_BAM_SAMPLE2_REP1="chip_sample2_conditionB_replicate1.bam"
    CHIP_BAM_SAMPLE2_REP2="chip_sample2_conditionB_replicate2.bam"
    
    # Output directory for Cuffdiff results
    OUTPUT_DIR="cuffdiff_chipseq_results"
    
    # Run Cuffdiff for differential expression analysis.
    # Note: Cuffdiff is primarily designed for RNA-Seq differential expression analysis,
    # not direct ChIP-Seq peak calling. Its application here for "ChIP-Seq significant peak calling"
    # using "Scripture output as peak reference" is highly unconventional.
    # This command will perform differential quantification of features
    # defined in the SCRIPTURE_GTF across the provided ChIP-Seq BAM files,
    # treating them as expression data. The output will be differential abundance
    # of these features, not traditional ChIP-Seq peaks.
    cuffdiff \
        -o "${OUTPUT_DIR}" \
        -L "ConditionA,ConditionB" \
        "${SCRIPTURE_GTF}" \
        "${CHIP_BAM_SAMPLE1_REP1},${CHIP_BAM_SAMPLE1_REP2}" \
        "${CHIP_BAM_SAMPLE2_REP1},${CHIP_BAM_SAMPLE2_REP2}"
    

Tools Used

Raw Source Text
RNA-Seq Read Alignment: TopHat 2.0 with default options using UCSC known genes from hg19 or mm9 as reference transcriptome
RNA-Seq Differential Expression: Cuffdiff 2.1 with default options using UCSC known genes from hg19 or mm9 as reference transcriptome
ChIP-Seq Read Alignment: Bowtie 2.0 with default options using hg19 or mm9 as reference genome
ChIP-Seq general peak calling: Scripture using hg19 or mm9 as reference genome
ChIP-Seq significant peak calling: Cuffdiff 2.0 using Scripture output as peak reference
Genome_build: hg19 / mm9
Supplementary_files_format_and_content: TXT files are expression matrices with FPKM values for each sample / BED files are p53 ChIP-Seq peak coordinates
← Back to Analysis