GSE80664 Processing Pipeline

RNA-Seq, 9 steps

Publication

Dysregulation of RBFOX2 Is an Early Event in Cardiac Pathogenesis of Diabetes.

Cell Reports (2016), PMID 27239029

Dataset

Transcriptome-wide mRNA alterations in Streptozotocin induced Type I diabetic mouse heart

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Base calling was performed with CASAVA 1.8.2

CASAVA v1.8.2

$ Bash example

# Install CASAVA 1.8.2 (typically part of Illumina's instrument software)
# This is a conceptual command for base calling and demultiplexing with CASAVA 1.8.2.
# Replace /path/to/illumina/run_directory with the actual path to your Illumina run folder.
# Replace /path/to/output_directory with the desired output location for FASTQ files.
# Replace /path/to/SampleSheet.csv with the actual path to your sample sheet.

# Step 1: Configure the base calling and demultiplexing
# This command generates a Makefile in the output directory.
configureBclToFastq.pl --input-dir /path/to/illumina/run_directory/Data/Intensities/BaseCalls \
                           --output-dir /path/to/output_directory \
                           --sample-sheet /path/to/SampleSheet.csv \
                           --no-eamss # Optional: disable EAMSS (End Anchored Max Scoring Segments) quality masking of read ends

# Step 2: Execute the base calling and demultiplexing using the generated Makefile
# Run make from the output directory, where the Makefile was created
cd /path/to/output_directory
make -j 8 # Use an appropriate number of parallel jobs (e.g., number of CPU cores)

Sequences were trimmed then aligned to the mm9 mouse genome using TOPHAT

TopHat v2.1.1 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install TopHat (example using Conda)
# TopHat requires Bowtie2 and Samtools as dependencies.
# conda install -c bioconda tophat bowtie2 samtools

# --- Prepare Reference Genome and Annotation ---
# 1. Download the mm9 reference genome FASTA file (e.g., from UCSC).
#    Example: wget -P /path/to/genome/ http://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/mm9.fa.gz
#    gunzip /path/to/genome/mm9.fa.gz

# 2. Build the Bowtie2 index for mm9.
#    bowtie2-build /path/to/genome/mm9.fa /path/to/bowtie_indexes/mm9

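# 3. Download a GTF annotation file for mm9 (e.g., from UCSC or Ensembl) and save it as GTF_FILE below.
#    The file name here is an assumption; check the directory listing for the gene sets available for mm9.
#    Example: wget -P /path/to/annotations/ http://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/genes/mm9.refGene.gtf.gz
#    gunzip -c /path/to/annotations/mm9.refGene.gtf.gz > /path/to/annotations/mm9.gtf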

# --- Define Variables ---
# Replace with actual paths to your files
BOWTIE_INDEX="/path/to/bowtie_indexes/mm9" # Path to the Bowtie2 index prefix for mm9
GTF_FILE="/path/to/annotations/mm9.gtf"   # Path to the mm9 GTF annotation file
TRIMMED_READS_R1="trimmed_reads_R1.fastq" # Path to the trimmed forward reads (FASTQ)
TRIMMED_READS_R2="trimmed_reads_R2.fastq" # Path to the trimmed reverse reads (FASTQ) - omit if single-end
OUTPUT_DIR="tophat_alignment_mm9"         # Directory for TopHat output
NUM_THREADS=8                              # Number of CPU threads to use

# --- Run TopHat Alignment ---
# -p: Number of threads
# -G: GTF annotation of known transcripts (optional; TopHat maps to these first and still searches for novel junctions)
# -o: Output directory
# The Bowtie2 index prefix and input FASTQ files are positional arguments.
# For paired-end reads, provide both R1 and R2 files.
# For single-end reads, provide only the R1 file.
tophat -p "${NUM_THREADS}" -G "${GTF_FILE}" -o "${OUTPUT_DIR}" "${BOWTIE_INDEX}" "${TRIMMED_READS_R1}" "${TRIMMED_READS_R2}"


Probabilities of isoform abundances were computed using MISO

MISO v0.5.3 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install MISO (if not already installed); the PyPI package is named 'misopy'
# pip install misopy

# Placeholder for the MISO events index directory (pre-built for the genome used here, mm9)
# This directory contains the indexed alternative splicing events, created with MISO's 'index_gff' script
# Example: MISO_EVENTS_INDEX_DIR="/path/to/miso_events/mm9/indexed_SE_events"
MISO_EVENTS_INDEX_DIR="/path/to/miso_events_index"

# Placeholder for input RNA-Seq alignment file (BAM format)
INPUT_BAM="sample_aligned.bam"

# Placeholder for output directory where MISO results will be stored
OUTPUT_DIR="miso_output"

# Create the output directory if it does not exist
mkdir -p "${OUTPUT_DIR}"

# Run MISO to compute posterior distributions over isoform abundances (PSI values).
# MISO expects --read-len to match the read length of the data; 50 is a placeholder.
# The input BAM file must be coordinate-sorted and indexed (e.g., with samtools index).
miso --run "${MISO_EVENTS_INDEX_DIR}" "${INPUT_BAM}" --output-dir "${OUTPUT_DIR}" --read-len 50

Data was filtered based on a Bayes factor of at least 1

awk (Inferred with models/gemini-2.5-flash) vN/A

$ Bash example

# Assuming 'input.tsv' is a tab-separated MISO comparison file (e.g., a .miso_bf file from compare_miso).
# In that format the 'bayes_factor' column is usually the 9th; confirm with: head -n 1 input.tsv
# Adjust the column number ($9) and delimiter (-F'\t') to match your input file format.
# 'NR == 1' keeps the header row; '$9 + 0' forces a numeric comparison.
awk -F'\t' 'NR == 1 || $9 + 0 >= 1' input.tsv > filtered_output.tsv
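
Because column positions differ between tools and versions, looking the column up by its header name can be less fragile than a fixed column number. The sketch below is an illustration only, not the authors' method: the function name `filter_by_bayes_factor` and the default column name `bayes_factor` are assumptions, and it assumes a tab-separated file with one header row and a single numeric Bayes factor per row.

```shell
# Hypothetical helper: keep the header plus rows whose Bayes factor column is >= a threshold.
# Usage: filter_by_bayes_factor FILE [THRESHOLD] [COLUMN_NAME]
filter_by_bayes_factor() {
    local file="$1" threshold="${2:-1}" column="${3:-bayes_factor}"
    awk -F'\t' -v thr="${threshold}" -v col="${column}" '
        NR == 1 {
            # Locate the column by its header name.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            print
            next
        }
        # Adding 0 forces a numeric comparison.
        ($idx + 0) >= (thr + 0)
    ' "${file}"
}
```

A call might look like `filter_by_bayes_factor comparison.miso_bf 1 > filtered_output.tsv`, where the input file name is a placeholder.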

Additional details are included in Verma, S.K., Deshmukh, V., Liu, P., Nutter, C.A., Espejo, R., Hung, M.L., Wang, G.S., Yeo, G.W., and Kuyumcu-Martinez, M.N. (2013).

R v3.x

$ Bash example

# Install R (if not already available)
# conda install -c conda-forge r-base

# This is a placeholder command: the source text only cites a publication here
# (Verma et al., 2013, on reactivation of fetal splicing programs in diabetic hearts)
# and names no R script, package, or parameters. The use of R at this step is itself inferred.
# Replace 'my_analysis_script.R', 'input_data.txt', and 'output_results.txt'
# with actual script and file names relevant to the specific analysis.
Rscript my_analysis_script.R input_data.txt output_results.txt

Reactivation of fetal splicing programs in diabetic hearts is mediated by protein kinase C signaling.

rMATS (Inferred with models/gemini-2.5-flash) v4.1.2

$ Bash example

# Install rMATS (example using conda; rMATS turbo is distributed through Bioconda as 'rmats')
# conda create -n rmats_env -c conda-forge -c bioconda rmats
# conda activate rmats_env

# Define input files and reference
# Placeholder annotation for the mouse mm9 genome build used in this dataset.
# rMATS is not named in the source text; this step is inferred.
GTF_ANNOTATION="/path/to/annotations/mm9.gtf"

# Files listing paths to BAM files for each group
# In a real scenario, these files would be populated with actual paths to aligned RNA-seq BAMs.
BAM_LIST_DIABETIC="diabetic_bams.txt"
BAM_LIST_CONTROL="control_bams.txt"
OUTPUT_DIR="rmats_output_diabetic_vs_control"

# Create placeholder BAM list files for demonstration.
# rMATS expects each list file to hold a single line of comma-separated BAM paths.
echo "/path/to/aligned_reads/diabetic_sample1.bam,/path/to/aligned_reads/diabetic_sample2.bam" > "${BAM_LIST_DIABETIC}"
echo "/path/to/aligned_reads/control_sample1.bam,/path/to/aligned_reads/control_sample2.bam" > "${BAM_LIST_CONTROL}"

# Run rMATS for differential splicing analysis
# This command compares splicing events between diabetic and control heart samples.
# --readLength and --libType are placeholders; set them from the actual library preparation.
# --task both runs the prep and post stages in a single call.
rmats.py --b1 "${BAM_LIST_DIABETIC}" \
         --b2 "${BAM_LIST_CONTROL}" \
         --gtf "${GTF_ANNOTATION}" \
         --readLength 100 \
         --nthread 8 \
         --tmp rmats_tmp \
         --od "${OUTPUT_DIR}" \
         --task both \
         --libType fr-firststrand


The Journal of biological chemistry 288, 35372-35386.

Unknown (Inferred with models/gemini-2.5-flash) vUnknown

$ Bash example

# No specific bioinformatics tool or command could be inferred from the provided publication reference.
# The description refers to a scientific publication, not a bioinformatics step or tool.

processed data files format and content: Processed data is MISO analysis of skipped exons in comparisons between control and diabetic samples.

MISO vNot specified (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install MISO (if not already installed); the PyPI package is named 'misopy'
# pip install misopy

# Define variables
# Indexed MISO annotation directory for skipped exon (SE) events, built for the mm9 assembly.
# It is produced from a GFF3 event annotation with MISO's 'index_gff' script, e.g.:
# index_gff --index path/to/SE.mm9.gff3 path/to/indexed_SE_events/
MISO_ANNOTATIONS="path/to/indexed_SE_events"

# Input BAM files directories for control and diabetic samples.
# These directories should contain the aligned RNA-Seq BAM files for each group.
CONTROL_BAM_DIR="path/to/control_bams"
DIABETIC_BAM_DIR="path/to/diabetic_bams"

# Output directories for MISO quantification results for each sample group
OUTPUT_DIR_CONTROL_QUANT="miso_quant_control"
OUTPUT_DIR_DIABETIC_QUANT="miso_quant_diabetic"

# Output directory for the MISO comparison results
OUTPUT_DIR_COMPARISON="miso_comparison_output"

# Create output directories if they don't exist
mkdir -p "${OUTPUT_DIR_CONTROL_QUANT}"
mkdir -p "${OUTPUT_DIR_DIABETIC_QUANT}"
mkdir -p "${OUTPUT_DIR_COMPARISON}"

# --- MISO Quantification for Control Samples ---
# Iterate through control BAM files and run MISO for each sample to quantify isoform usage (PSI values).
for bam_file in "${CONTROL_BAM_DIR}"/*.bam; do
    sample_name=$(basename "${bam_file}" .bam)
    echo "Running MISO quantification for control sample: ${sample_name}"
    # The --read-len parameter should match the actual read length of your RNA-Seq data.
    # Adjust other parameters like --num-reads-per-group if needed.
    miso --run "${MISO_ANNOTATIONS}" "${bam_file}" --output-dir "${OUTPUT_DIR_CONTROL_QUANT}/${sample_name}" --read-len 50
done

# --- MISO Quantification for Diabetic Samples ---
# Iterate through diabetic BAM files and run MISO for each sample.
for bam_file in "${DIABETIC_BAM_DIR}"/*.bam; do
    sample_name=$(basename "${bam_file}" .bam)
    echo "Running MISO quantification for diabetic sample: ${sample_name}"
    # The --read-len parameter should match the actual read length of your RNA-Seq data.
    miso --run "${MISO_ANNOTATIONS}" "${bam_file}" --output-dir "${OUTPUT_DIR_DIABETIC_QUANT}/${sample_name}" --read-len 50
done

# --- MISO Comparison between Control and Diabetic Samples ---
# compare_miso compares two MISO sample output directories (one sample each)
# and reports Bayes factors and delta PSI values for each event.
# The sample directory names below are hypothetical; repeat for each control/diabetic pair of interest.
echo "Running MISO comparison between control and diabetic samples"
compare_miso --compare-samples "${OUTPUT_DIR_CONTROL_QUANT}/control_sample1" "${OUTPUT_DIR_DIABETIC_QUANT}/diabetic_sample1" "${OUTPUT_DIR_COMPARISON}"

The processed data format is tab-delimited .txt files.

N/A (Inferred with models/gemini-2.5-flash) vN/A

$ Bash example

# This step description states the output format of the preceding step (tab-delimited .txt files); it is not a processing step.
# No tool, process, or parameters are mentioned, so no processing command applies here.
# In this pipeline the files in question are the MISO skipped-exon comparison results.
#
# Example of how one might inspect such a file (not a processing command):
# head -n 5 processed_data.txt
# # To check the number of columns (assuming tab-delimited):
# # head -n 1 processed_data.txt | awk -F'\t' '{print NF}'

Tools Used

TopHat, R

Raw Source Text

Base calling was performed with CASAVA 1.8.2
Sequences were trimmed then aligned to the mm9 mouse genome using TOPHAT
Probabilities of isoform abundances were computed using MISO
Data was filtered based on a Bayes factor of at least 1
Additional details are included in Verma, S.K., Deshmukh, V., Liu, P., Nutter, C.A., Espejo, R., Hung, M.L., Wang, G.S., Yeo, G.W., and Kuyumcu-Martinez, M.N. (2013). Reactivation of fetal splicing programs in diabetic hearts is mediated by protein kinase C signaling. The Journal of biological chemistry 288, 35372-35386.
genome build: mm9
processed data files format and content: Proccessed data is MISO analysis of skipped exons in comparisions between control and diabetic samples. The proccessed data format is tab deliniated .txt files.
