GSE80664 Processing Pipeline
Publication
Dysregulation of RBFOX2 Is an Early Event in Cardiac Pathogenesis of Diabetes. Cell Reports (2016), PMID 27239029
Dataset
GSE80664: Transcriptome-wide mRNA alterations in Streptozotocin induced Type I diabetic mouse heart
Processing Steps
1
Base calling was performed with CASAVA 1.8.2
CASAVA v1.8.2 · Bash example
# CASAVA 1.8.2 is legacy Illumina software (later superseded by bcl2fastq); it is obtained from Illumina, not via Conda.
# Conceptual example of BCL-to-FASTQ conversion and demultiplexing with CASAVA 1.8.2.
# Replace the placeholder paths with your Illumina run folder, output location and sample sheet.
# Step 1: configure the conversion; this writes a Makefile into --output-dir.
# --no-eamss is optional: it disables the EAMSS (End-Anchored Max Scoring Segments) filter,
# which otherwise masks low-quality stretches at read ends.
configureBclToFastq.pl \
    --input-dir /path/to/illumina/run_directory/Data/Intensities/BaseCalls \
    --output-dir /path/to/output_directory \
    --sample-sheet /path/to/SampleSheet.csv \
    --no-eamss
# Step 2: run the generated Makefile from the output directory
# (set -j to the number of available CPU cores).
cd /path/to/output_directory
make -j 8
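A quick sanity check after demultiplexing is to count the reads in each FASTQ file, since every FASTQ record spans four lines. The Bash sketch below assumes unwrapped FASTQ records and accepts either plain or gzip-compressed files (CASAVA 1.8 writes `.fastq.gz`); the function name `count_fastq_reads` and the glob in the usage comment are hypothetical.

```shell
# count_fastq_reads: hypothetical helper printing the number of reads in a FASTQ file
# (plain or .gz), assuming the standard 4-lines-per-record layout.
count_fastq_reads() {
    local fastq="$1"
    local lines
    if [[ "${fastq}" == *.gz ]]; then
        lines=$(gzip -dc -- "${fastq}" | wc -l)
    else
        lines=$(wc -l < "${fastq}")
    fi
    # A line count that is not a multiple of 4 points to a truncated or malformed file.
    if (( lines % 4 != 0 )); then
        echo "warning: ${fastq} has ${lines} lines, not a multiple of 4" >&2
    fi
    echo $(( lines / 4 ))
}

# Usage (placeholder path): read counts for every demultiplexed FASTQ file
# for f in /path/to/output_directory/*/*/*.fastq.gz; do
#     printf '%s\t%s\n' "${f}" "$(count_fastq_reads "${f}")"
# done
```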
2
Sequences were trimmed and then aligned to the mm9 mouse genome using TopHat.
TopHat · Bash example
# Install TopHat and its dependencies (example using Conda).
# TopHat 2 uses Bowtie 2 for alignment and Samtools for BAM handling.
# conda install -c bioconda tophat bowtie2 samtools
# --- Prepare reference genome and annotation ---
# UCSC file names differ between assemblies; check the listing at
# http://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/ before downloading.
# 1. Download the mm9 reference genome FASTA, e.g.:
# wget -P /path/to/genome/ http://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/mm9.fa.gz
# gunzip /path/to/genome/mm9.fa.gz
# 2. Build the Bowtie 2 index for mm9:
# mkdir -p /path/to/bowtie_indexes
# bowtie2-build /path/to/genome/mm9.fa /path/to/bowtie_indexes/mm9
# 3. Download a GTF annotation in mm9 coordinates (e.g., from UCSC or Ensembl), e.g.:
# wget -P /path/to/annotations/ http://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/genes/mm9.ncbiRefSeq.gtf.gz
# gunzip /path/to/annotations/mm9.ncbiRefSeq.gtf.gz
# --- Define variables (replace with actual paths) ---
BOWTIE_INDEX="/path/to/bowtie_indexes/mm9"          # Bowtie 2 index prefix for mm9
GTF_FILE="/path/to/annotations/mm9.ncbiRefSeq.gtf"   # mm9 GTF annotation
TRIMMED_READS_R1="trimmed_reads_R1.fastq"            # trimmed forward reads (FASTQ)
TRIMMED_READS_R2="trimmed_reads_R2.fastq"            # trimmed reverse reads; omit for single-end data
OUTPUT_DIR="tophat_alignment_mm9"                    # TopHat output directory
NUM_THREADS=8                                        # number of CPU threads
# --- Run TopHat ---
# -p: number of threads
# -G: known transcript annotation (GTF); TopHat uses it for known junctions
#     and still discovers novel splice junctions de novo
# -o: output directory
# Positional arguments: the Bowtie 2 index prefix, then the FASTQ file(s).
# For single-end data, drop "${TRIMMED_READS_R2}".
tophat -p "${NUM_THREADS}" -G "${GTF_FILE}" -o "${OUTPUT_DIR}" \
    "${BOWTIE_INDEX}" "${TRIMMED_READS_R1}" "${TRIMMED_READS_R2}"
# Aligned reads are written to "${OUTPUT_DIR}/accepted_hits.bam".
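TopHat treats a second FASTQ file as the mate file of a paired-end library, so the command must drop R2 for single-end data. The Bash sketch below (the helper name `build_tophat_cmd` is hypothetical) builds the argument list in an array and appends R2 only when it is given; it prints the shell-quoted command instead of running it, so the command can be checked first.

```shell
# build_tophat_cmd: hypothetical helper that assembles a TopHat command line.
# Arguments: <threads> <gtf> <outdir> <bowtie2_index_prefix> <reads_R1> [reads_R2]
# R2 is appended only when supplied (paired-end); otherwise the command is single-end.
build_tophat_cmd() {
    local threads="$1" gtf="$2" outdir="$3" index="$4" r1="$5" r2="${6:-}"
    local cmd=(tophat -p "${threads}" -G "${gtf}" -o "${outdir}" "${index}" "${r1}")
    if [[ -n "${r2}" ]]; then
        cmd+=("${r2}")
    fi
    # Print a shell-quoted version of the command; run "${cmd[@]}" instead to execute it.
    printf '%q ' "${cmd[@]}"
    printf '\n'
}

# Usage (placeholder paths):
# build_tophat_cmd 8 mm9.gtf tophat_out /path/to/bowtie_indexes/mm9 R1.fastq R2.fastq
```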
3
Probabilities of isoform abundances were computed using MISO
MISO v0.5.3 (inferred with models/gemini-2.5-flash) · Bash example
# Install MISO (distributed as "misopy"), e.g.: pip install misopy   or   conda install -c bioconda misopy
# Directory of indexed alternative-splicing events for mm9 (the genome used in this study),
# built once from a MISO GFF3 event annotation with:
#   index_gff --index /path/to/mm9_events.gff3 /path/to/miso_events_index
MISO_EVENTS_INDEX_DIR="/path/to/miso_events_index"
# Sorted BAM from the alignment step (e.g., TopHat's accepted_hits.bam); MISO also
# needs the .bai index next to it: samtools index sample_aligned.bam
INPUT_BAM="sample_aligned.bam"
READ_LEN=50   # placeholder; --read-len must match the read length of the data
OUTPUT_DIR="miso_output"
mkdir -p "${OUTPUT_DIR}"
# Sample the posterior distribution of isoform abundances (Psi) for each event.
# For paired-end data MISO also takes --paired-end <insert_mean> <insert_sd>.
miso --run "${MISO_EVENTS_INDEX_DIR}" "${INPUT_BAM}" \
    --output-dir "${OUTPUT_DIR}" --read-len "${READ_LEN}"
# Summarize the posterior samples (posterior mean Psi and confidence intervals).
summarize_miso --summarize-samples "${OUTPUT_DIR}" "${OUTPUT_DIR}/summary"
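MISO's `--read-len` should match the sequenced read length. One way to pick it is to tabulate read lengths in the input FASTQ; the Bash sketch below (hypothetical helper `modal_read_length`) prints the most common length among the first N reads. After trimming, lengths can vary, so it is worth checking the full distribution and MISO's documentation on variable-length reads before relying on a single value.

```shell
# modal_read_length: hypothetical helper printing the most common read length among the
# first N records (default 100000) of a plain or gzip-compressed FASTQ file.
modal_read_length() {
    local fastq="$1" max_reads="${2:-100000}"
    local reader=(cat --)
    if [[ "${fastq}" == *.gz ]]; then
        reader=(gzip -dc --)
    fi
    "${reader[@]}" "${fastq}" |
        awk -v max="${max_reads}" '
            NR % 4 == 2 { count[length($0)]++; n++ }   # line 2 of each record is the sequence
            n >= max    { exit }                       # exit still runs the END block
            END {
                best = 0
                # Highest count wins; ties go to the longer length for a deterministic result.
                for (len in count)
                    if (count[len] > best || (count[len] == best && len + 0 > mode + 0)) {
                        best = count[len]; mode = len
                    }
                print mode
            }'
}

# Usage (placeholder file name):
# READ_LEN=$(modal_read_length trimmed_reads_R1.fastq)
```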
4
Data were filtered based on a Bayes factor of at least 1.
awk (inferred with models/gemini-2.5-flash) · Bash example
# Assumes 'input.tsv' is tab-separated, has a header line, and holds the Bayes factor in column 5.
# Adjust the column number ($5) to your file; in MISO comparison output (.miso_bf files)
# the column is named "bayes_factor".
# NR == 1 keeps the header (drop it if the file has none); "+ 0" forces a numeric comparison.
awk -F'\t' 'NR == 1 || $5 + 0 >= 1' input.tsv > filtered_output.tsv
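Hard-coding a column number is fragile when the table has a header. The Bash sketch below locates the Bayes factor column by name instead, assuming a tab-separated file whose header names it `bayes_factor` (as in MISO's `.miso_bf` comparison tables); the helper name `filter_by_bayes_factor` is hypothetical, and the threshold defaults to the value of 1 used in this step.

```shell
# filter_by_bayes_factor: hypothetical helper that keeps the header plus every row whose
# "bayes_factor" column is >= a threshold (default 1). Reads a TSV file, writes to stdout.
filter_by_bayes_factor() {
    local table="$1" threshold="${2:-1}"
    awk -F'\t' -v min="${threshold}" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "bayes_factor") col = i
            if (!col) { print "no bayes_factor column in header" > "/dev/stderr"; exit 1 }
            print
            next
        }
        $col + 0 >= min + 0
    ' "${table}"
}

# Usage (placeholder file names):
# filter_by_bayes_factor control_vs_diabetic.miso_bf 1 > filtered_output.tsv
```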
5
Additional details are included in Verma, S.K., Deshmukh, V., Liu, P., Nutter, C.A., Espejo, R., Hung, M.L., Wang, G.S., Yeo, G.W., and Kuyumcu-Martinez, M.N. (2013). Reactivation of fetal splicing programs in diabetic hearts is mediated by protein kinase C signaling. The Journal of Biological Chemistry 288, 35372-35386.
This entry is a literature reference rather than a processing step, so no command corresponds to it.
6
Processed data files (format and content): MISO analysis of skipped exons in comparisons between control and diabetic samples.
MISO (version not specified; inferred with models/gemini-2.5-flash) · Bash example
# Install MISO (distributed as "misopy"), e.g.: pip install misopy
# miso --run expects a directory of *indexed* events, built once from a
# skipped-exon (SE) GFF3 annotation for mm9:
#   index_gff --index /path/to/mm9_SE.gff3 /path/to/miso_index_se
MISO_INDEX_SE="/path/to/miso_index_se"
# Directories holding the sorted, indexed (.bam + .bai) alignments of each group
CONTROL_BAM_DIR="/path/to/control_bams"
DIABETIC_BAM_DIR="/path/to/diabetic_bams"
READ_LEN=50   # placeholder; must match the read length of your RNA-Seq data
QUANT_DIR="miso_quant"
COMPARISON_DIR="miso_comparison_output"
mkdir -p "${QUANT_DIR}/control" "${QUANT_DIR}/diabetic" "${COMPARISON_DIR}"
# --- Quantify skipped-exon Psi separately for every sample ---
# usage: quantify_group <bam_dir> <group_name>
quantify_group() {
    for bam_file in "$1"/*.bam; do
        sample_name=$(basename "${bam_file}" .bam)
        echo "Running MISO quantification for $2 sample: ${sample_name}"
        miso --run "${MISO_INDEX_SE}" "${bam_file}" \
            --output-dir "${QUANT_DIR}/$2/${sample_name}" --read-len "${READ_LEN}"
    done
}
quantify_group "${CONTROL_BAM_DIR}" control
quantify_group "${DIABETIC_BAM_DIR}" diabetic
# --- Compare control and diabetic samples ---
# compare_miso compares two MISO output directories (one sample each) and writes, for
# every event, the Psi difference and the Bayes factor to a .miso_bf file.
# Because it works on pairs of samples, every control/diabetic pair is compared here.
for ctrl in "${QUANT_DIR}"/control/*/; do
    for diab in "${QUANT_DIR}"/diabetic/*/; do
        echo "Comparing $(basename "${ctrl}") vs. $(basename "${diab}")"
        compare_miso --compare-samples "${ctrl%/}" "${diab%/}" "${COMPARISON_DIR}"
    done
done
7
The processed data are tab-delimited .txt files.
N/A (inferred with models/gemini-2.5-flash) · Bash example
# This entry describes the output format of the previous step, not a processing step,
# so no tool command applies. To inspect such a file:
# head -n 5 processed_data.txt
# Count the tab-separated columns in the header line:
# head -n 1 processed_data.txt | awk -F'\t' '{print NF}'
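Because the processed files are tab-delimited text, a quick integrity check is that every row has as many tab-separated fields as the header line. The Bash sketch below (hypothetical helper `check_tsv_columns`) reports each mismatching line on stderr and returns non-zero if any exist; it makes no assumption about column names.

```shell
# check_tsv_columns: hypothetical helper verifying that every line of a tab-delimited file
# has the same number of fields as its header. Offending lines are reported on stderr;
# the return status is 1 if any are found, 0 otherwise.
check_tsv_columns() {
    awk -F'\t' '
        NR == 1 { expected = NF; next }
        NF != expected {
            printf "line %d: %d fields (header has %d)\n", NR, NF, expected > "/dev/stderr"
            bad++
        }
        END { exit (bad > 0) }
    ' "$1"
}

# Usage (placeholder file name):
# check_tsv_columns processed_data.txt && echo "column counts are consistent"
```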
Raw Source Text
Base calling was performed with CASAVA 1.8.2 Sequences were trimmed then aligned to the mm9 mouse genome using TOPHAT Probabilities of isoform abundances were computed using MISO Data was filtered based on a Bayes factor of at least 1 Additional details are included in Verma, S.K., Deshmukh, V., Liu, P., Nutter, C.A., Espejo, R., Hung, M.L., Wang, G.S., Yeo, G.W., and Kuyumcu-Martinez, M.N. (2013). Reactivation of fetal splicing programs in diabetic hearts is mediated by protein kinase C signaling. The Journal of biological chemistry 288, 35372-35386. genome build: mm9 processed data files format and content: Proccessed data is MISO analysis of skipped exons in comparisions between control and diabetic samples. The proccessed data format is tab deliniated .txt files.