GSE69889 Processing Pipeline

RNA-Seq, 5 steps

Publication

A role for alternative splicing in circadian control of exocytosis and glucose homeostasis.

Genes & Development (2020), PMID 32616519

Dataset

Genome-wide Circadian Control of Transcription at Active Enhancers Regulates Insulin Secretion and Diabetes Risk

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

The data was analyzed by RNA Express v1.0 in BaseSpace

RNA Express v1.0

$ Bash example

# The analysis was performed using "RNA Express v1.0" within the Illumina BaseSpace platform.
# This is an application typically run via a graphical user interface or API, not a direct command-line tool.
# Specific command-line parameters and an executable bash command cannot be generated from the provided description.

# Placeholder for the reference genome; the source states the build is mm10 (mouse).
# The annotation path below is a hypothetical example:
# REF_GENOME="mm10"
# REF_GTF="/path/to/mm10.gtf"

Reads were aligned to mm10 using STAR with the following parameters: readFilesCommand zcat outFilterType BySJout outSJfilterCountUniqueMin -1 2 2 2 outSJfilterCountTotalMin -1 2 2 2 outFilterIntronMotifs RemoveNoncanonical genomeLoad LoadAndKeep

STAR v2.7.10a (version inferred with models/gemini-2.5-flash, not stated in the source)

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star

# Create a genome directory (if not already present)
# mkdir -p /path/to/STAR_genome_index/mm10

# Build STAR genome index (if not already present, using mm10 as reference)
# This step is usually done once per genome assembly
# STAR --runMode genomeGenerate \
#      --genomeDir /path/to/STAR_genome_index/mm10 \
#      --genomeFastaFiles /path/to/mm10.fa \
#      --sjdbGTFfile /path/to/mm10.gtf \
#      --runThreadN <number_of_threads>

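# Align reads using STAR
# The options from --readFilesCommand through --genomeLoad come from the source description.
# The output options (--outSAMtype, --outSAMunmapped, --outSAMattributes), the thread count,
# the single-end input, and all paths are assumptions added for illustration.
# STAR needs an explicit --limitBAMsortRAM when coordinate-sorted BAM output is combined
# with a shared-memory genome (--genomeLoad LoadAndKeep); the 10 GB value is an example.
THREADS=8
STAR --genomeDir /path/to/STAR_genome_index/mm10 \
     --readFilesIn input_reads.fastq.gz \
     --readFilesCommand zcat \
     --outFileNamePrefix output_aligned_prefix_ \
     --outFilterType BySJout \
     --outSJfilterCountUniqueMin -1 2 2 2 \
     --outSJfilterCountTotalMin -1 2 2 2 \
     --outFilterIntronMotifs RemoveNoncanonical \
     --genomeLoad LoadAndKeep \
     --outSAMtype BAM SortedByCoordinate \
     --limitBAMsortRAM 10000000000 \
     --outSAMunmapped Within \
     --outSAMattributes Standard \
     --runThreadN "$THREADS"

# Optionally release the shared-memory genome once all samples are aligned
# STAR --genomeLoad Remove --genomeDir /path/to/STAR_genome_index/mm10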
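
The junction filters above (`--outSJfilterCountUniqueMin -1 2 2 2` and `--outFilterIntronMotifs RemoveNoncanonical`) affect which junctions appear in STAR's `SJ.out.tab`. As a rough check, the sketch below tallies junctions per intron motif code (column 5: 0 = non-canonical, 1 = GT/AG, 2 = CT/AC, 3 = GC/AG, 4 = CT/GC, 5 = AT/AC, 6 = GT/AT). The file `toy_SJ.out.tab` and its junctions are made up for illustration; with real output, point the `awk` command at `output_aligned_prefix_SJ.out.tab` instead. With these filters, code 0 would not be expected in the real file.

```shell
# Toy SJ.out.tab with made-up junctions. Columns follow STAR's documented layout:
# chrom, start, end, strand, motif, annotated, unique reads, multi reads, max overhang
printf 'chr1\t100\t200\t1\t1\t1\t12\t0\t30\n' > toy_SJ.out.tab
printf 'chr1\t300\t400\t1\t3\t0\t2\t1\t25\n' >> toy_SJ.out.tab
printf 'chr2\t500\t900\t2\t2\t1\t7\t0\t40\n' >> toy_SJ.out.tab

# Per motif code: number of junctions and total uniquely mapped reads (column 7)
awk -F'\t' '{ n[$5]++; u[$5] += $7 }
     END { for (m in n) printf "%s\t%d\t%d\n", m, n[m], u[m] }' toy_SJ.out.tab \
  | sort -n > sj_motif_summary.tsv

cat sj_motif_summary.tsv
```

The summary has three tab-separated columns: motif code, junction count, and summed unique reads.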

Gene expression levels were computed by a proprietary algorithm in BaseSpace

Proprietary algorithm in BaseSpace (tool and version not disclosed; inferred with models/gemini-2.5-flash)

$ Bash example

# Gene expression levels were computed using a proprietary algorithm within Illumina BaseSpace.
# The specific tool and parameters are not disclosed.
# Placeholder for a generic quantification step against the mm10 reference named in the source.
# The tool name and every flag below are hypothetical, not a real command:
# proprietary_expression_tool --input_reads reads.fastq.gz --reference_genome mm10.fa --annotation_gtf mm10.gtf --output_expression_matrix expression_levels.tsv

Differential expression was performed with DESeq2 using default parameters

DESeq2 v1.44.0 (version inferred with models/gemini-2.5-flash, not stated in the source)

$ Bash example

# Install R and Bioconductor if not already present
# sudo apt-get update && sudo apt-get install -y r-base
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
# R -e 'BiocManager::install("DESeq2")'

# Create dummy count data (replace with your actual counts.tsv).
# printf expands the \t escapes into real tabs. This four-gene table only illustrates
# the file layout and is likely too small for DESeq2 to fit dispersions.
printf 'gene_id\tsampleA\tsampleB\tsampleC\tsampleD
gene1\t100\t120\t50\t60
gene2\t50\t60\t100\t110
gene3\t200\t210\t180\t190
gene4\t10\t12\t20\t22\n' > counts.tsv

# Create dummy sample information (replace with your actual sample_info.tsv)
printf 'sample\tcondition
sampleA\tcontrol
sampleB\tcontrol
sampleC\ttreated
sampleD\ttreated\n' > sample_info.tsv

# Create an R script for DESeq2 analysis
cat << 'EOF' > run_deseq2.R
library(DESeq2)

# Load count data
count_data <- read.table("counts.tsv", header = TRUE, row.names = 1, sep = "\t")

# Load sample information
sample_info <- read.table("sample_info.tsv", header = TRUE, row.names = 1, sep = "\t")

# Ensure sample order matches count data columns
sample_info <- sample_info[colnames(count_data), , drop = FALSE]

# Create DESeqDataSet object
dds <- DESeqDataSetFromMatrix(countData = count_data,
                              colData = sample_info,
                              design = ~ condition)

# Run DESeq2 analysis with default parameters
dds <- DESeq(dds)

# Get results (e.g., comparing 'treated' vs 'control')
res <- results(dds, contrast = c("condition", "treated", "control"))

# Order results by adjusted p-value
res <- res[order(res$padj), ]

# Write results to a CSV file
write.csv(as.data.frame(res), file = "deseq2_results.csv")

message("DESeq2 analysis complete. Results saved to deseq2_results.csv")
EOF

# Execute the R script
Rscript run_deseq2.R
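
Once a results table exists, a common follow-up is to keep only genes below an adjusted p-value cutoff. The sketch below does this with `awk` on a toy CSV laid out like the output of `write.csv(as.data.frame(res))`. The file `toy_deseq2_results.csv`, its values, and the 0.05 cutoff are assumptions for illustration; the source does not state which threshold was used.

```shell
# Toy results table with made-up values, mimicking the column order written by write.csv
cat > toy_deseq2_results.csv << 'EOF'
"","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"
"geneA",150.2,1.8,0.3,6.0,1.9e-09,3.8e-08
"geneB",80.5,-0.2,0.4,-0.5,0.62,0.75
"geneC",5.1,2.5,1.9,1.3,0.19,NA
EOF

# Keep the header plus rows whose padj (column 7) is numeric and below 0.05.
# Rows with padj = NA (for example, genes removed by independent filtering) are skipped.
awk -F',' 'NR == 1 || ($7 != "NA" && $7 + 0 < 0.05)' toy_deseq2_results.csv \
  > deseq2_significant.csv

cat deseq2_significant.csv
```

For real output, replace `toy_deseq2_results.csv` with `deseq2_results.csv`. Splitting on commas is only safe here because gene identifiers contain no embedded commas.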

Gene cycling data was computed with eJTK_Cycle

eJTK_Cycle (version not specified)

$ Bash example

# Clone the eJTK_Cycle repository if not already present
# (the URL below is unverified; confirm the repository location before use)
# git clone https://github.com/alan-yeo/eJTK_Cycle.git
# cd eJTK_Cycle

# Run eJTK_Cycle to compute gene cycling data.
# The source gives no version, script name, or parameters, so the command below is an
# illustrative placeholder: the script name and every flag are assumptions and should be
# checked against the tool's own documentation.
# Assumed input: 'gene_expression_data.tsv' (genes as rows, time points as columns).
# Assumed output: 'cycling_results.tsv' with the cycling analysis results.
# -p: assumed to set the period to test (24 h for circadian rhythms).
# -a: assumed to set a significance level (0.05); unverified.
python eJTK_Cycle.py -i gene_expression_data.tsv -o cycling_results.tsv -p 24 -a 0.05

Tools Used

STAR, DESeq2

Raw Source Text

The data was analyzed by RNA Express v1.0 in BaseSpace
Reads were aligned to mm10 using STAR with the following parameters: readFilesCommand zcat outFilterType BySJout outSJfilterCountUniqueMin -1 2 2 2 outSJfilterCountTotalMin -1 2 2 2 outFilterIntronMotifs RemoveNoncanonical genomeLoad LoadAndKeep
Gene expression levels were computed by a proprietary alogorithm in BaseSpace
Differential expression was performed with DESeq2 using default parameters
Gene cycling data was computed with eJTK_Cycle
Genome_build: mm10
Supplementary_files_format_and_content: tab-delimited text files showing gene cycling data
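
Because the source describes the supplementary files only as tab-delimited gene cycling tables, a reasonable first step is to list the columns before any filtering. The sketch below does this on a toy file; `toy_cycling.tsv`, its column names, and its values are hypothetical stand-ins, not the actual layout of the deposited files.

```shell
# Toy stand-in for a supplementary cycling table; column names and values are hypothetical
printf 'gene\tperiod\tphase\tp_value\n' > toy_cycling.tsv
printf 'GeneA\t24\t6\t0.001\n' >> toy_cycling.tsv
printf 'GeneB\t24\t18\t0.40\n' >> toy_cycling.tsv

# List each header field with its 1-based column index
head -n 1 toy_cycling.tsv | tr '\t' '\n' | awk '{ print NR "\t" $0 }' > cycling_columns.tsv
cat cycling_columns.tsv

# Count data rows (everything after the header line)
n_rows=$(tail -n +2 toy_cycling.tsv | wc -l | tr -d ' ')
echo "data rows: $n_rows"
```

With the column indices known, the same `awk -F'\t'` pattern can select rows on whichever column holds the cycling p-value in the real file.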
