GSE55727 Processing Pipeline
RNA-Seq
code_examples
5 steps
Publication
Zmat3 Is a Key Splicing Regulator in the p53 Tumor Suppression Program.Molecular cell (2020) — PMID 33157015
Dataset
GSE55727Integrative Genomic Analysis Reveals Widespread Enhancer Regulation by p53 in Response to DNA Damage
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
Generate Jupyter Notebook-
1
RNA-Seq Read Alignment: TopHat 2.0 with default options using UCSC known genes from hg19 or mm9 as reference transcriptome
$ Bash example
# Install TopHat 2.0 (and Bowtie2, which TopHat depends on) # conda install -c bioconda tophat=2.0 bowtie2 # --- Reference Preparation (hg19 example) --- # Create a directory for reference files # mkdir -p references/hg19 # cd references/hg19 # Download hg19 genome FASTA # wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz # gunzip hg19.fa.gz # Build Bowtie2 index for hg19 genome # bowtie2-build hg19.fa hg19_index # Download a GTF file representing UCSC known genes for hg19. # Gencode v19 is a comprehensive annotation for hg19 and often used as a proxy for "known genes". # wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz # gunzip gencode.v19.annotation.gtf.gz # mv gencode.v19.annotation.gtf genes.gtf # cd ../../ # Go back to the main working directory # --- RNA-Seq Read Alignment with TopHat 2.0 --- # Define variables GENOME_INDEX="references/hg19/hg19_index" # Path to the Bowtie2 index for hg19 GTF_FILE="references/hg19/genes.gtf" # Path to the GTF file (e.g., Gencode v19 for hg19 known genes) READS_R1="input_reads_R1.fastq.gz" # Placeholder for input forward reads READS_R2="input_reads_R2.fastq.gz" # Placeholder for input reverse reads (remove if single-end) OUTPUT_DIR="tophat_alignment_output" # Output directory for TopHat results # Execute TopHat 2.0 with default options and specified GTF # TopHat 2.0 uses 'tophat2' command. tophat2 -o "${OUTPUT_DIR}" \ --GTF "${GTF_FILE}" \ "${GENOME_INDEX}" \ "${READS_R1}" "${READS_R2}" -
2
RNA-Seq Differential Expression: Cuffdiff 2.1 with default options using UCSC known genes from hg19 or mm9 as reference transcriptome
$ Bash example
# Install Cufflinks (which includes Cuffdiff) # conda install -c bioconda cufflinks=2.1 # Placeholder for reference transcriptome (UCSC known genes from hg19 or mm9) # This GTF file would contain the gene and transcript annotations. # You would typically download this from UCSC's Table Browser or a similar resource. # For example, a GTF derived from UCSC known genes for hg19: REFERENCE_GTF="hg19_UCSC_known_genes.gtf" # Or mm9_UCSC_known_genes.gtf # Placeholder for input BAM files (aligned RNA-Seq reads for different conditions/replicates) # Replace with your actual BAM files. Cuffdiff expects comma-separated BAMs for replicates within a condition. CONDITION_A_REPS="conditionA_rep1.bam,conditionA_rep2.bam" CONDITION_B_REPS="conditionB_rep1.bam,conditionB_rep2.bam" # Output directory for Cuffdiff results OUTPUT_DIR="cuffdiff_results" # Run Cuffdiff with default options # The command requires the reference GTF, followed by comma-separated BAM files for each condition. cuffdiff -o "${OUTPUT_DIR}" "${REFERENCE_GTF}" "${CONDITION_A_REPS}" "${CONDITION_B_REPS}" -
3
ChIP-Seq Read Alignment: Bowtie 2.0 with default options using hg19 or mm9 as reference genome
$ Bash example
# Install Bowtie 2 (if not already installed) # conda install -c bioconda bowtie2 # Define variables # REFERENCE_INDEX: Path to the Bowtie2 index prefix (e.g., /path/to/bowtie2_indexes/hg19/hg19) # Pre-built indices for hg19 and mm9 can be downloaded from sources like the Bowtie 2 website (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) # or UCSC Genome Browser (e.g., for hg19: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz, then build index with bowtie2-build) REFERENCE_INDEX="/path/to/bowtie2_indexes/hg19/hg19" # Example for hg19. Replace with mm9 for mouse. READS_R1="sample_R1.fastq.gz" # Replace with your actual R1 FASTQ file READS_R2="sample_R2.fastq.gz" # Replace with your actual R2 FASTQ file (remove if single-end) OUTPUT_SAM="sample_aligned.sam" # Perform paired-end alignment with Bowtie 2 using default options # -x: reference index basename # -1: reads file 1 # -2: reads file 2 # -S: output SAM file # Default options for Bowtie 2 are generally suitable for ChIP-Seq. # For single-end reads, use -U instead of -1 and -2: bowtie2 -x "${REFERENCE_INDEX}" -U "${READS_R1}" -S "${OUTPUT_SAM}" bowtie2 -x "${REFERENCE_INDEX}" -1 "${READS_R1}" -2 "${READS_R2}" -S "${OUTPUT_SAM}" # Common post-alignment steps (e.g., convert SAM to BAM, sort, and index) # OUTPUT_BAM="sample_aligned.bam" # OUTPUT_SORTED_BAM="sample_aligned.sorted.bam" # samtools view -bS "${OUTPUT_SAM}" > "${OUTPUT_BAM}" # samtools sort "${OUTPUT_BAM}" -o "${OUTPUT_SORTED_BAM}" # samtools index "${OUTPUT_SORTED_BAM}" -
4
ChIP-Seq general peak calling: Scripture using hg19 or mm9 as reference genome
ChIP-seq v1.0 (Inferred with models/gemini-2.5-flash)$ Bash example
# Install Scripture (requires Java) # Download Scripture from the Broad Institute website (e.g., https://www.broadinstitute.org/cancer/cga/scripture_download) # Ensure Java is installed and in your PATH # For example, if you downloaded scripture.jar to your current directory: # Define variables for input files, genome, and output CHIP_BAM="chip_sample.bam" # Aligned ChIP-seq reads (BAM format) CONTROL_BAM="control_sample.bam" # Aligned control/input reads (BAM format) GENOME_FASTA="/path/to/reference/hg19.fa" # Reference genome FASTA (hg19 or mm9) GENOME_SIZE_FILE="/path/to/reference/hg19.chrom.sizes" # Chromosome sizes file (e.g., from UCSC) OUTPUT_DIR="scripture_output" # Create output directory if it doesn't exist mkdir -p "${OUTPUT_DIR}" # Run Scripture for peak calling # Note: Scripture requires specific input formats (e.g., .bed or .wig for reads, .fasta for genome, .chrom.sizes for genome sizes). # If your input is BAM, you might need to convert it to BED or WIG first. # Example conversion from BAM to BED (adjust for paired-end if necessary): # samtools view -b -F 0x04 "${CHIP_BAM}" | bedtools bamtobed -i - > "${OUTPUT_DIR}/chip_sample.bed" # samtools view -b -F 0x04 "${CONTROL_BAM}" | bedtools bamtobed -i - > "${OUTPUT_DIR}/control_sample.bed" # Assuming input BED files are available: CHIP_BED="${OUTPUT_DIR}/chip_sample.bed" CONTROL_BED="${OUTPUT_DIR}/control_sample.bed" # Example command for Scripture (parameters may need adjustment based on specific data and desired sensitivity/specificity) # This is a general example, actual parameters might vary. # -r: ChIP-seq read file (BED format) # -c: Control read file (BED format) # -g: Reference genome FASTA file # -s: Genome sizes file # -o: Output directory # -p: P-value threshold (e.g., 0.05) # -f: FDR threshold (e.g., 0.05) # -w: Window size (e.g., 100 bp) # -e: Extension size (e.g., 150 bp for typical fragment length) java -Xmx4g -jar scripture.jar \ -r "${CHIP_BED}" \ -c "${CONTROL_BED}" \ -g "${GENOME_FASTA}" \ -s "${GENOME_SIZE_FILE}" \ -o "${OUTPUT_DIR}" \ -p 0.05 \ -f 0.05 \ -w 100 \ -e 150 echo "Scripture peak calling completed. Results are in ${OUTPUT_DIR}" -
5
ChIP-Seq significant peak calling: Cuffdiff 2.0 using Scripture output as peak reference
$ Bash example
# Install Cufflinks suite (includes Cuffdiff) # conda install -c bioconda cufflinks=2.0 # Placeholder for Scripture output converted to GTF format. # Scripture typically outputs BED files of transcribed regions. # Cuffdiff requires a GTF file of transcripts for quantification. # This step assumes a conversion from Scripture BED to GTF has been performed. # Replace 'scripture_output.gtf' with the actual path to your converted Scripture output. SCRIPTURE_GTF="scripture_output.gtf" # Placeholder for ChIP-Seq alignment BAM files for two conditions/samples. # Replace with actual paths to your aligned ChIP-Seq data for each replicate. CHIP_BAM_SAMPLE1_REP1="chip_sample1_conditionA_replicate1.bam" CHIP_BAM_SAMPLE1_REP2="chip_sample1_conditionA_replicate2.bam" CHIP_BAM_SAMPLE2_REP1="chip_sample2_conditionB_replicate1.bam" CHIP_BAM_SAMPLE2_REP2="chip_sample2_conditionB_replicate2.bam" # Output directory for Cuffdiff results OUTPUT_DIR="cuffdiff_chipseq_results" # Run Cuffdiff for differential expression analysis. # Note: Cuffdiff is primarily designed for RNA-Seq differential expression analysis, # not direct ChIP-Seq peak calling. Its application here for "ChIP-Seq significant peak calling" # using "Scripture output as peak reference" is highly unconventional. # This command will perform differential quantification of features # defined in the SCRIPTURE_GTF across the provided ChIP-Seq BAM files, # treating them as expression data. The output will be differential abundance # of these features, not traditional ChIP-Seq peaks. cuffdiff \ -o "${OUTPUT_DIR}" \ -L "ConditionA,ConditionB" \ "${SCRIPTURE_GTF}" \ "${CHIP_BAM_SAMPLE1_REP1},${CHIP_BAM_SAMPLE1_REP2}" \ "${CHIP_BAM_SAMPLE2_REP1},${CHIP_BAM_SAMPLE2_REP2}"
Raw Source Text
RNA-Seq Read Alignment: TopHat 2.0 with default options using UCSC known genes from hg19 or mm9 as reference transcriptome RNA-Seq Differential Expression: Cuffdiff 2.1 with default options using UCSC known genes from hg19 or mm9 as reference transcriptome ChIP-Seq Read Alignment: Bowtie 2.0 with default options using hg19 or mm9 as reference genome ChIP-Seq general peak calling: Scripture using hg19 or mm9 as reference genome ChIP-Seq significant peak calling: Cuffdiff 2.0 using Scripture output as peak reference Genome_build: hg19 / mm9 Supplementary_files_format_and_content: TXT files are expression matrices with FPKM values for each sample / BED files are p53 ChIP-Seq peak coordinates