GSE46240 Processing Pipeline
Publication
Zmat3 Is a Key Splicing Regulator in the p53 Tumor Suppression Program.Molecular cell (2020) — PMID 33157015
Processing Steps
Generate Jupyter Notebook-
1
ChIP-seq and RNA-seq reads were aligned to the mm9 genome assembly using DNAnexus.
$ Bash example
# Install STAR (example using conda) # conda install -c bioconda star # Placeholder for genome index directory # You would typically download or build the STAR index for mm9. # Example for building index (replace with actual mm9 files, e.g., from UCSC or Ensembl): # mkdir -p star_index_mm9 # STAR --runMode genomeGenerate \ # --genomeDir star_index_mm9 \ # --genomeFastaFiles Mus_musculus.NCBIM37.67.dna.primary_assembly.fa.gz \ # --sjdbGTFfile Mus_musculus.NCBIM37.67.gtf.gz \ # --runThreadN 8 # Assuming STAR index for mm9 is available at /path/to/star_index_mm9 GENOME_DIR="/path/to/star_index_mm9" # Replace with actual path to mm9 STAR index READ1="sample_R1.fastq.gz" # Replace with actual R1 fastq file READ2="sample_R2.fastq.gz" # Replace with actual R2 fastq file (assuming paired-end) OUTPUT_PREFIX="sample_aligned" THREADS=8 # Adjust as needed STAR --runThreadN ${THREADS} \ --genomeDir ${GENOME_DIR} \ --readFilesIn ${READ1} ${READ2} \ --readFilesCommand zcat \ --outFileNamePrefix ${OUTPUT_PREFIX} \ --outSAMtype BAM SortedByCoordinate \ --outSAMunmapped Within \ --outSAMattributes Standard -
2
For ChIP-seq: Peaks were called using the DNAnexus ChIP-seq analyses tool with the following specifications: peak enrichment >20, peak-to background enrichment >3, a minimum ratio between uniquely and repetitively mapped reads of 3:1, a kernel bandwith of 30.0, FDR calculation enabled.
$ Bash example
# This bash code block represents the likely underlying peak calling logic # using MACS2, which is commonly employed in ChIP-seq analysis pipelines # like the DNAnexus ChIP-seq analyses tool. The ENCODE ChIP-seq pipeline, # a common standard, utilizes MACS2 for peak calling. # The specific parameters mentioned in the description are mapped to MACS2 # where applicable, or noted as post-processing/QC criteria. # Installation (uncomment to install via conda) # conda create -n macs2_env macs2=2.2.7.1 # conda activate macs2_env # Define input files (placeholders - replace with actual paths) # TREATMENT_BAM: Path to the alignment file for the ChIP sample (e.g., IP sample) # CONTROL_BAM: Path to the alignment file for the input/control sample (e.g., Input DNA) # Example: TREATMENT_BAM="path/to/your/treatment.bam" CONTROL_BAM="path/to/your/control.bam" OUTPUT_DIR="macs2_peaks_output" GENOME_SIZE="hs" # For human (hg38), use 'hs'. For mouse (mm10), use 'mm'. # Refer to MACS2 documentation for other genome sizes or use --gsize <effective_genome_size> FDR_CUTOFF=0.05 # "FDR calculation enabled" - common default FDR cutoff is 0.05 if not specified KERNEL_BANDWIDTH=30.0 # "a kernel bandwith of 30.0" # Create output directory if it doesn't exist mkdir -p "${OUTPUT_DIR}" # Execute MACS2 peak calling # Assuming paired-end reads (BAMPE) as is common for modern ChIP-seq. If single-end reads, change -f BAMPE to -f BAM. macs2 callpeak \ -t "${TREATMENT_BAM}" \ -c "${CONTROL_BAM}" \ -f BAMPE \ -g "${GENOME_SIZE}" \ -n "chipseq_peaks" \ --outdir "${OUTPUT_DIR}" \ --bw "${KERNEL_BANDWIDTH}" \ -q "${FDR_CUTOFF}" \ --keep-dup all # Keep all duplicate reads, common for ChIP-seq. Adjust if needed. # The following parameters are likely post-processing filters or quality control metrics # applied after initial peak calling, or specific to the DNAnexus tool's internal logic: # - Peak enrichment >20 (This would be a filter applied to the fold_enrichment column in MACS2 output) # - Peak-to background enrichment >3 (Similar to peak enrichment, often derived from signal over control) # - Minimum ratio between uniquely and repetitively mapped reads of 3:1 (This is a QC metric for input BAMs, not a direct peak calling parameter) -
3
Background was combined input reads and reads from p53 ChIP in p53-/- MEFs.
$ Bash example
# Install samtools if not already installed # conda install -c bioconda samtools # Combine aligned input reads (BAM) and aligned p53 ChIP reads from p53-/- MEFs (BAM) # to create a pooled background BAM file for downstream peak calling. # Replace 'input_control.bam' and 'p53_chip_ko.bam' with the actual paths to your input BAM files. # The output 'combined_background.bam' will be sorted if the input BAMs are sorted. samtools merge combined_background.bam input_control.bam p53_chip_ko.bam # Index the merged background BAM file, which is often required for downstream tools like peak callers. samtools index combined_background.bam
-
4
For RNA-seq: Tag counts were derived using the DNAnexus 3SEQ transcriptome-based quantification tool with the following specifications: Sense only, RefSeq genes.
$ Bash example
# This command is conceptual as DNAnexus 3SEQ is a platform-specific application. # Actual execution would involve running an app on the DNAnexus platform. # Example using the DNAnexus CLI (dx tool): # Assuming 'app-3seq' is the name of the 3SEQ quantification app on DNAnexus # and 'project-xxxx' is your project ID. # Input files (e.g., FASTQ reads) would need to be uploaded to DNAnexus first. # Reference genome and annotation (RefSeq genes) would also be available on the platform. # Placeholder for input FASTQ file ID on DNAnexus INPUT_FASTQ_ID="file-xxxx" # Placeholder for RefSeq gene annotation file ID on DNAnexus (e.g., GRCh38 RefSeq GTF) # The specific RefSeq GTF file would need to be available or uploaded to the DNAnexus project. REFSEQ_GTF_ID="file-yyyy" # Example: GRCh38 RefSeq GTF # Execute the 3SEQ app with specified parameters # The exact parameter names (e.g., --sense_only, --refseq_genes_file) are illustrative # and would depend on the specific app's input specifications. dx run app-3seq \ -iinput_reads="${INPUT_FASTQ_ID}" \ -iref_genes="${REFSEQ_GTF_ID}" \ -isense_strand="true" \ -ooutput_counts="3seq_tag_counts.tsv" \ --wait # Wait for the job to complete -
5
The DESeq library was used to normalize tag counts between samples and to call differentially expressed genes.
$ Bash example
# Install R and Bioconductor if not already present # sudo apt-get update # sudo apt-get install r-base # R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager"); BiocManager::install("DESeq2")' # Create dummy count data and sample information for demonstration # In a real scenario, these would be generated by upstream steps (e.g., featureCounts) cat <<EOF > counts.tsv gene_id sample1 sample2 sample3 sample4 geneA 100 120 50 60 geneB 20 25 10 12 geneC 500 550 200 220 geneD 5 6 2 3 EOF cat <<EOF > sample_info.tsv sample_id condition sample1 treated sample2 treated sample3 control sample4 control EOF # R script for DESeq2 analysis Rscript -e ' library(DESeq2) # Load count data count_data <- read.table("counts.tsv", header = TRUE, row.names = 1, sep = "\t") # Load sample information sample_info <- read.table("sample_info.tsv", header = TRUE, row.names = 1, sep = "\t") # Ensure sample order matches between count data and sample info sample_info <- sample_info[colnames(count_data), , drop = FALSE] # Create DESeqDataSet object dds <- DESeqDataSetFromMatrix(countData = count_data, colData = sample_info, design = ~ condition) # Run DESeq2 analysis dds <- DESeq(dds) # Get results for the "treated" vs "control" comparison # The contrast argument specifies the comparison: c("design_variable", "level_numerator", "level_denominator") res <- results(dds, contrast = c("condition", "treated", "control")) # Order results by adjusted p-value res_ordered <- res[order(res$padj),] # Save results to a file write.table(as.data.frame(res_ordered), file = "deseq2_results.tsv", sep = "\t", quote = FALSE, row.names = TRUE) message("DESeq2 analysis complete. Results saved to deseq2_results.tsv") '
Raw Source Text
ChIP-seq and RNA-seq reads were aligned to the mm9 genome assembly using DNAnexus. For ChIP-seq: Peaks were called using the DNAnexus ChIP-seq analyses tool with the following specifications: peak enrichment >20, peak-to background enrichment >3, a minimum ratio between uniquely and repetitively mapped reads of 3:1, a kernel bandwith of 30.0, FDR calculation enabled. Background was combined input reads and reads from p53 ChIP in p53-/- MEFs. For RNA-seq: Tag counts were derived using the DNAnexus 3SEQ transcriptome-based quantification tool with the following specifications: Sense only, RefSeq genes. The DESeq library was used to normalize tag counts between samples and to call differentially expressed genes. Genome_build: mm9 Supplementary_files_format_and_content: ChIP-seq: Processed data files are tab-delimited text files containing 1) Chromosome name, 2) Position (summit of the peak), 3) enrichment (based on background reads), 4) neg_log10_qvalue (negative log of q-value of peak) and 5) peak rank. Supplementary_files_format_and_content: RNA-seq: Processed files are tab-delimited text files containing normalized gene expression, obtained using DESeq.