GSE46240 Processing Pipeline

RNA-Seq code_examples 5 steps

Publication

Zmat3 Is a Key Splicing Regulator in the p53 Tumor Suppression Program.

Molecular cell (2020) — PMID 33157015

Dataset

Global genomic profiling of p53-regulated genes

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Generate Jupyter Notebook

ChIP-seq and RNA-seq reads were aligned to the mm9 genome assembly using DNAnexus.

RNA-seq v2.7.10a (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star

# Placeholder for genome index directory
# You would typically download or build the STAR index for mm9.
# Example for building index (replace with actual mm9 files, e.g., from UCSC or Ensembl):
# mkdir -p star_index_mm9
# STAR --runMode genomeGenerate \
#      --genomeDir star_index_mm9 \
#      --genomeFastaFiles Mus_musculus.NCBIM37.67.dna.primary_assembly.fa.gz \
#      --sjdbGTFfile Mus_musculus.NCBIM37.67.gtf.gz \
#      --runThreadN 8

# Assuming STAR index for mm9 is available at /path/to/star_index_mm9
GENOME_DIR="/path/to/star_index_mm9" # Replace with actual path to mm9 STAR index
READ1="sample_R1.fastq.gz" # Replace with actual R1 fastq file
READ2="sample_R2.fastq.gz" # Replace with actual R2 fastq file (assuming paired-end)
OUTPUT_PREFIX="sample_aligned"
THREADS=8 # Adjust as needed

STAR --runThreadN ${THREADS} \
     --genomeDir ${GENOME_DIR} \
     --readFilesIn ${READ1} ${READ2} \
     --readFilesCommand zcat \
     --outFileNamePrefix ${OUTPUT_PREFIX} \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMunmapped Within \
     --outSAMattributes Standard

View on GitHub

For ChIP-seq: Peaks were called using the DNAnexus ChIP-seq analyses tool with the following specifications: peak enrichment >20, peak-to background enrichment >3, a minimum ratio between uniquely and repetitively mapped reads of 3:1, a kernel bandwith of 30.0, FDR calculation enabled.

ChIP-seq vInferred with models/gemini-2.5-flash GitHub

$ Bash example

# This bash code block represents the likely underlying peak calling logic
# using MACS2, which is commonly employed in ChIP-seq analysis pipelines
# like the DNAnexus ChIP-seq analyses tool. The ENCODE ChIP-seq pipeline,
# a common standard, utilizes MACS2 for peak calling.
# The specific parameters mentioned in the description are mapped to MACS2
# where applicable, or noted as post-processing/QC criteria.

# Installation (uncomment to install via conda)
# conda create -n macs2_env macs2=2.2.7.1
# conda activate macs2_env

# Define input files (placeholders - replace with actual paths)
# TREATMENT_BAM: Path to the alignment file for the ChIP sample (e.g., IP sample)
# CONTROL_BAM: Path to the alignment file for the input/control sample (e.g., Input DNA)
# Example:
TREATMENT_BAM="path/to/your/treatment.bam"
CONTROL_BAM="path/to/your/control.bam"
OUTPUT_DIR="macs2_peaks_output"
GENOME_SIZE="hs" # For human (hg38), use 'hs'. For mouse (mm10), use 'mm'.
                 # Refer to MACS2 documentation for other genome sizes or use --gsize <effective_genome_size>
FDR_CUTOFF=0.05  # "FDR calculation enabled" - common default FDR cutoff is 0.05 if not specified
KERNEL_BANDWIDTH=30.0 # "a kernel bandwith of 30.0"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Execute MACS2 peak calling
# Assuming paired-end reads (BAMPE) as is common for modern ChIP-seq. If single-end reads, change -f BAMPE to -f BAM.
macs2 callpeak \
    -t "${TREATMENT_BAM}" \
    -c "${CONTROL_BAM}" \
    -f BAMPE \
    -g "${GENOME_SIZE}" \
    -n "chipseq_peaks" \
    --outdir "${OUTPUT_DIR}" \
    --bw "${KERNEL_BANDWIDTH}" \
    -q "${FDR_CUTOFF}" \
    --keep-dup all # Keep all duplicate reads, common for ChIP-seq. Adjust if needed.

# The following parameters are likely post-processing filters or quality control metrics
# applied after initial peak calling, or specific to the DNAnexus tool's internal logic:
# - Peak enrichment >20 (This would be a filter applied to the fold_enrichment column in MACS2 output)
# - Peak-to background enrichment >3 (Similar to peak enrichment, often derived from signal over control)
# - Minimum ratio between uniquely and repetitively mapped reads of 3:1 (This is a QC metric for input BAMs, not a direct peak calling parameter)

View on GitHub

Background was combined input reads and reads from p53 ChIP in p53-/- MEFs.

samtools (Inferred with models/gemini-2.5-flash) v1.19 GitHub

$ Bash example

# Install samtools if not already installed
# conda install -c bioconda samtools

# Combine aligned input reads (BAM) and aligned p53 ChIP reads from p53-/- MEFs (BAM)
# to create a pooled background BAM file for downstream peak calling.
# Replace 'input_control.bam' and 'p53_chip_ko.bam' with the actual paths to your input BAM files.
# The output 'combined_background.bam' will be sorted if the input BAMs are sorted.
samtools merge combined_background.bam input_control.bam p53_chip_ko.bam

# Index the merged background BAM file, which is often required for downstream tools like peak callers.
samtools index combined_background.bam

View on GitHub

For RNA-seq: Tag counts were derived using the DNAnexus 3SEQ transcriptome-based quantification tool with the following specifications: Sense only, RefSeq genes.

RefSeq vNot specified (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# This command is conceptual as DNAnexus 3SEQ is a platform-specific application.
# Actual execution would involve running an app on the DNAnexus platform.
# Example using the DNAnexus CLI (dx tool):

# Assuming 'app-3seq' is the name of the 3SEQ quantification app on DNAnexus
# and 'project-xxxx' is your project ID.
# Input files (e.g., FASTQ reads) would need to be uploaded to DNAnexus first.
# Reference genome and annotation (RefSeq genes) would also be available on the platform.

# Placeholder for input FASTQ file ID on DNAnexus
INPUT_FASTQ_ID="file-xxxx"
# Placeholder for RefSeq gene annotation file ID on DNAnexus (e.g., GRCh38 RefSeq GTF)
# The specific RefSeq GTF file would need to be available or uploaded to the DNAnexus project.
REFSEQ_GTF_ID="file-yyyy" # Example: GRCh38 RefSeq GTF

# Execute the 3SEQ app with specified parameters
# The exact parameter names (e.g., --sense_only, --refseq_genes_file) are illustrative
# and would depend on the specific app's input specifications.
dx run app-3seq \
  -iinput_reads="${INPUT_FASTQ_ID}" \
  -iref_genes="${REFSEQ_GTF_ID}" \
  -isense_strand="true" \
  -ooutput_counts="3seq_tag_counts.tsv" \
  --wait # Wait for the job to complete

View on GitHub

The DESeq library was used to normalize tag counts between samples and to call differentially expressed genes.

DESeq2 v1.40.2 (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install R and Bioconductor if not already present
# sudo apt-get update
# sudo apt-get install r-base
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager"); BiocManager::install("DESeq2")'

# Create dummy count data and sample information for demonstration
# In a real scenario, these would be generated by upstream steps (e.g., featureCounts)
cat <<EOF > counts.tsv
gene_id	sample1	sample2	sample3	sample4
geneA	100	120	50	60
geneB	20	25	10	12
geneC	500	550	200	220
geneD	5	6	2	3
EOF

cat <<EOF > sample_info.tsv
sample_id	condition
sample1	treated
sample2	treated
sample3	control
sample4	control
EOF

# R script for DESeq2 analysis
Rscript -e '
  library(DESeq2)

  # Load count data
  count_data <- read.table("counts.tsv", header = TRUE, row.names = 1, sep = "\t")

  # Load sample information
  sample_info <- read.table("sample_info.tsv", header = TRUE, row.names = 1, sep = "\t")

  # Ensure sample order matches between count data and sample info
  sample_info <- sample_info[colnames(count_data), , drop = FALSE]

  # Create DESeqDataSet object
  dds <- DESeqDataSetFromMatrix(countData = count_data,
                                colData = sample_info,
                                design = ~ condition)

  # Run DESeq2 analysis
  dds <- DESeq(dds)

  # Get results for the "treated" vs "control" comparison
  # The contrast argument specifies the comparison: c("design_variable", "level_numerator", "level_denominator")
  res <- results(dds, contrast = c("condition", "treated", "control"))

  # Order results by adjusted p-value
  res_ordered <- res[order(res$padj),]

  # Save results to a file
  write.table(as.data.frame(res_ordered), file = "deseq2_results.tsv", sep = "\t", quote = FALSE, row.names = TRUE)

  message("DESeq2 analysis complete. Results saved to deseq2_results.tsv")
'

View on GitHub

Tools Used

RNA-seq ChIP-seq DESeq2

Raw Source Text

ChIP-seq and RNA-seq reads were aligned to the mm9 genome assembly using DNAnexus.
For ChIP-seq: Peaks were called using the DNAnexus ChIP-seq analyses tool with the following specifications: peak enrichment >20, peak-to background enrichment >3, a minimum ratio between uniquely and repetitively mapped reads of 3:1, a kernel bandwith of 30.0, FDR calculation enabled. Background was combined input  reads and reads from p53 ChIP in p53-/- MEFs.
For RNA-seq: Tag counts were derived using the DNAnexus 3SEQ transcriptome-based quantification tool with the following specifications: Sense only, RefSeq genes. The DESeq library was used to normalize tag counts between samples and to call differentially expressed genes.
Genome_build: mm9
Supplementary_files_format_and_content: ChIP-seq: Processed data files are tab-delimited text files containing 1) Chromosome name, 2) Position (summit of the peak), 3) enrichment (based on background reads), 4) neg_log10_qvalue (negative log of q-value of peak) and 5) peak rank.
Supplementary_files_format_and_content: RNA-seq: Processed files are tab-delimited text files containing normalized gene expression, obtained using DESeq.

← Back to Analysis