GSE46240 Processing Pipeline

RNA-Seq code_examples 5 steps

Publication

Zmat3 Is a Key Splicing Regulator in the p53 Tumor Suppression Program.

Molecular cell (2020) — PMID 33157015

Dataset

GSE46240

Global genomic profiling of p53-regulated genes

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    ChIP-seq and RNA-seq reads were aligned to the mm9 genome assembly using DNAnexus.

    RNA-seq v2.7.10a (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install STAR (example using conda)
    # conda install -c bioconda star
    
    # Placeholder for genome index directory
    # You would typically download or build the STAR index for mm9.
    # Example for building index (replace with actual mm9 files, e.g., from UCSC or Ensembl):
    # mkdir -p star_index_mm9
    # STAR --runMode genomeGenerate \
    #      --genomeDir star_index_mm9 \
    #      --genomeFastaFiles Mus_musculus.NCBIM37.67.dna.primary_assembly.fa.gz \
    #      --sjdbGTFfile Mus_musculus.NCBIM37.67.gtf.gz \
    #      --runThreadN 8
    
    # Assuming STAR index for mm9 is available at /path/to/star_index_mm9
    GENOME_DIR="/path/to/star_index_mm9" # Replace with actual path to mm9 STAR index
    READ1="sample_R1.fastq.gz" # Replace with actual R1 fastq file
    READ2="sample_R2.fastq.gz" # Replace with actual R2 fastq file (assuming paired-end)
    OUTPUT_PREFIX="sample_aligned"
    THREADS=8 # Adjust as needed
    
    STAR --runThreadN ${THREADS} \
         --genomeDir ${GENOME_DIR} \
         --readFilesIn ${READ1} ${READ2} \
         --readFilesCommand zcat \
         --outFileNamePrefix ${OUTPUT_PREFIX} \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outSAMattributes Standard
  2. 2

    For ChIP-seq: Peaks were called using the DNAnexus ChIP-seq analyses tool with the following specifications: peak enrichment >20, peak-to background enrichment >3, a minimum ratio between uniquely and repetitively mapped reads of 3:1, a kernel bandwith of 30.0, FDR calculation enabled.

    ChIP-seq vInferred with models/gemini-2.5-flash GitHub
    $ Bash example
    # This bash code block represents the likely underlying peak calling logic
    # using MACS2, which is commonly employed in ChIP-seq analysis pipelines
    # like the DNAnexus ChIP-seq analyses tool. The ENCODE ChIP-seq pipeline,
    # a common standard, utilizes MACS2 for peak calling.
    # The specific parameters mentioned in the description are mapped to MACS2
    # where applicable, or noted as post-processing/QC criteria.
    
    # Installation (uncomment to install via conda)
    # conda create -n macs2_env macs2=2.2.7.1
    # conda activate macs2_env
    
    # Define input files (placeholders - replace with actual paths)
    # TREATMENT_BAM: Path to the alignment file for the ChIP sample (e.g., IP sample)
    # CONTROL_BAM: Path to the alignment file for the input/control sample (e.g., Input DNA)
    # Example:
    TREATMENT_BAM="path/to/your/treatment.bam"
    CONTROL_BAM="path/to/your/control.bam"
    OUTPUT_DIR="macs2_peaks_output"
    GENOME_SIZE="hs" # For human (hg38), use 'hs'. For mouse (mm10), use 'mm'.
                     # Refer to MACS2 documentation for other genome sizes or use --gsize <effective_genome_size>
    FDR_CUTOFF=0.05  # "FDR calculation enabled" - common default FDR cutoff is 0.05 if not specified
    KERNEL_BANDWIDTH=30.0 # "a kernel bandwith of 30.0"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Execute MACS2 peak calling
    # Assuming paired-end reads (BAMPE) as is common for modern ChIP-seq. If single-end reads, change -f BAMPE to -f BAM.
    macs2 callpeak \
        -t "${TREATMENT_BAM}" \
        -c "${CONTROL_BAM}" \
        -f BAMPE \
        -g "${GENOME_SIZE}" \
        -n "chipseq_peaks" \
        --outdir "${OUTPUT_DIR}" \
        --bw "${KERNEL_BANDWIDTH}" \
        -q "${FDR_CUTOFF}" \
        --keep-dup all # Keep all duplicate reads, common for ChIP-seq. Adjust if needed.
    
    # The following parameters are likely post-processing filters or quality control metrics
    # applied after initial peak calling, or specific to the DNAnexus tool's internal logic:
    # - Peak enrichment >20 (This would be a filter applied to the fold_enrichment column in MACS2 output)
    # - Peak-to background enrichment >3 (Similar to peak enrichment, often derived from signal over control)
    # - Minimum ratio between uniquely and repetitively mapped reads of 3:1 (This is a QC metric for input BAMs, not a direct peak calling parameter)
  3. 3

    Background was combined input reads and reads from p53 ChIP in p53-/- MEFs.

    samtools (Inferred with models/gemini-2.5-flash) v1.19 GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools
    
    # Combine aligned input reads (BAM) and aligned p53 ChIP reads from p53-/- MEFs (BAM)
    # to create a pooled background BAM file for downstream peak calling.
    # Replace 'input_control.bam' and 'p53_chip_ko.bam' with the actual paths to your input BAM files.
    # The output 'combined_background.bam' will be sorted if the input BAMs are sorted.
    samtools merge combined_background.bam input_control.bam p53_chip_ko.bam
    
    # Index the merged background BAM file, which is often required for downstream tools like peak callers.
    samtools index combined_background.bam
  4. 4

    For RNA-seq: Tag counts were derived using the DNAnexus 3SEQ transcriptome-based quantification tool with the following specifications: Sense only, RefSeq genes.

    RefSeq vNot specified (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # This command is conceptual as DNAnexus 3SEQ is a platform-specific application.
    # Actual execution would involve running an app on the DNAnexus platform.
    # Example using the DNAnexus CLI (dx tool):
    
    # Assuming 'app-3seq' is the name of the 3SEQ quantification app on DNAnexus
    # and 'project-xxxx' is your project ID.
    # Input files (e.g., FASTQ reads) would need to be uploaded to DNAnexus first.
    # Reference genome and annotation (RefSeq genes) would also be available on the platform.
    
    # Placeholder for input FASTQ file ID on DNAnexus
    INPUT_FASTQ_ID="file-xxxx"
    # Placeholder for RefSeq gene annotation file ID on DNAnexus (e.g., GRCh38 RefSeq GTF)
    # The specific RefSeq GTF file would need to be available or uploaded to the DNAnexus project.
    REFSEQ_GTF_ID="file-yyyy" # Example: GRCh38 RefSeq GTF
    
    # Execute the 3SEQ app with specified parameters
    # The exact parameter names (e.g., --sense_only, --refseq_genes_file) are illustrative
    # and would depend on the specific app's input specifications.
    dx run app-3seq \
      -iinput_reads="${INPUT_FASTQ_ID}" \
      -iref_genes="${REFSEQ_GTF_ID}" \
      -isense_strand="true" \
      -ooutput_counts="3seq_tag_counts.tsv" \
      --wait # Wait for the job to complete
  5. 5

    The DESeq library was used to normalize tag counts between samples and to call differentially expressed genes.

    DESeq2 v1.40.2 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install R and Bioconductor if not already present
    # sudo apt-get update
    # sudo apt-get install r-base
    # R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager"); BiocManager::install("DESeq2")'
    
    # Create dummy count data and sample information for demonstration
    # In a real scenario, these would be generated by upstream steps (e.g., featureCounts)
    cat <<EOF > counts.tsv
    gene_id	sample1	sample2	sample3	sample4
    geneA	100	120	50	60
    geneB	20	25	10	12
    geneC	500	550	200	220
    geneD	5	6	2	3
    EOF
    
    cat <<EOF > sample_info.tsv
    sample_id	condition
    sample1	treated
    sample2	treated
    sample3	control
    sample4	control
    EOF
    
    # R script for DESeq2 analysis
    Rscript -e '
      library(DESeq2)
    
      # Load count data
      count_data <- read.table("counts.tsv", header = TRUE, row.names = 1, sep = "\t")
    
      # Load sample information
      sample_info <- read.table("sample_info.tsv", header = TRUE, row.names = 1, sep = "\t")
    
      # Ensure sample order matches between count data and sample info
      sample_info <- sample_info[colnames(count_data), , drop = FALSE]
    
      # Create DESeqDataSet object
      dds <- DESeqDataSetFromMatrix(countData = count_data,
                                    colData = sample_info,
                                    design = ~ condition)
    
      # Run DESeq2 analysis
      dds <- DESeq(dds)
    
      # Get results for the "treated" vs "control" comparison
      # The contrast argument specifies the comparison: c("design_variable", "level_numerator", "level_denominator")
      res <- results(dds, contrast = c("condition", "treated", "control"))
    
      # Order results by adjusted p-value
      res_ordered <- res[order(res$padj),]
    
      # Save results to a file
      write.table(as.data.frame(res_ordered), file = "deseq2_results.tsv", sep = "\t", quote = FALSE, row.names = TRUE)
    
      message("DESeq2 analysis complete. Results saved to deseq2_results.tsv")
    '
    

Tools Used

Raw Source Text
ChIP-seq and RNA-seq reads were aligned to the mm9 genome assembly using DNAnexus.
For ChIP-seq: Peaks were called using the DNAnexus ChIP-seq analyses tool with the following specifications: peak enrichment >20, peak-to background enrichment >3, a minimum ratio between uniquely and repetitively mapped reads of 3:1, a kernel bandwith of 30.0, FDR calculation enabled. Background was combined input  reads and reads from p53 ChIP in p53-/- MEFs.
For RNA-seq: Tag counts were derived using the DNAnexus 3SEQ transcriptome-based quantification tool with the following specifications: Sense only, RefSeq genes. The DESeq library was used to normalize tag counts between samples and to call differentially expressed genes.
Genome_build: mm9
Supplementary_files_format_and_content: ChIP-seq: Processed data files are tab-delimited text files containing 1) Chromosome name, 2) Position (summit of the peak), 3) enrichment (based on background reads), 4) neg_log10_qvalue (negative log of q-value of peak) and 5) peak rank.
Supplementary_files_format_and_content: RNA-seq: Processed files are tab-delimited text files containing normalized gene expression, obtained using DESeq.
← Back to Analysis