GSE157049 Processing Pipeline

RNA-Seq code_examples 14 steps

Publication

AMPK regulation of Raptor and TSC2 mediate metformin effects on transcriptional control of anabolism and inflammation.

Genes & development (2020) — PMID 32912901

Dataset

RNA-seq of metformin treatment in liver in WT, Raptor Ser-Ala mutant, Tsc2-null, Raptor mutant;Tsc2-null, and Ampk-null

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Generate Jupyter Notebook

Raw RNA sequencing reads in FASTQ files were quality-tested using FASTQC (v0.11.8) (Andrews 2010) and mapped to the mouse reference genome (mm10) with STAR (v2.5.3a) aligner with default parameters (Dobin et al.

STAR v2.5.3a GitHub

$ Bash example

# Install STAR if not already installed
# conda install -c bioconda star=2.5.3a

# Placeholder for input FASTQ file(s)
# input_reads.fastq (for single-end reads)
# input_reads_R1.fastq input_reads_R2.fastq (for paired-end reads)

# Placeholder for STAR genome index directory for mm10
# Ensure the mm10 genome index is pre-built using STAR --runMode genomeGenerate ...
# Example: /path/to/STAR_genome_index/mm10

# Align RNA-seq reads to the mouse reference genome (mm10) using STAR with default parameters
# This command assumes single-end reads and outputs a sorted BAM file.
# Adjust --readFilesIn for paired-end reads (e.g., --readFilesIn input_reads_R1.fastq input_reads_R2.fastq)
STAR --genomeDir /path/to/STAR_genome_index/mm10 \
     --readFilesIn input_reads.fastq \
     --outFileNamePrefix aligned_reads_ \
     --runThreadN 8 \
     --outSAMtype BAM SortedByCoordinate

View on GitHub

2012).

Not specified (Inferred with models/gemini-2.5-flash) vNot specified

$ Bash example

# No specific tool or command can be inferred from the description '2012).'
# Please provide more context or a tool name to generate a relevant command.

Raw or normalized gene expression levels were quantified across all the exons of RefSeq genes with analyzeRepeats.pl in HOMER (v4.11.1) (Heinz et al.

HOMER v4.11.1 GitHub

$ Bash example

# Install HOMER (if not already installed)
# For example, using conda:
# conda install -c bioconda homer=4.11.1
# Or manual installation:
# wget http://homer.ucsd.edu/homer/configureHomer.pl
# perl configureHomer.pl -install
# perl configureHomer.pl -install hg38 # Install human genome (replace hg38 with your desired genome, e.g., mm10)

# Quantify raw gene expression levels across RefSeq exons using analyzeRepeats.pl
# -gene: Quantify expression across RefSeq genes (exons).
# -raw: Output raw read counts. For RPKM normalized values, use -rpkm instead of -raw.
# -count: Specify input BAM file(s) or a HOMER tag directory.
# Replace 'hg38' with the genome you have installed in HOMER (e.g., hg38, mm10).
# Replace 'gene_expression.txt' with your desired output file name.
# Replace 'input.bam' with your actual aligned BAM file(s) (e.g., from STAR or HISAT2 alignment).
# If you have multiple BAM files, list them separated by spaces (e.g., input1.bam input2.bam).
# Alternatively, you can provide a HOMER tag directory created with makeTagDirectory.
analyzeRepeats.pl hg38 gene_expression.txt -gene -raw -count input.bam

View on GitHub

2010).

No specific tool inferred (Inferred with models/gemini-2.5-flash)

$ Bash example

# The description '2010)' is insufficient to infer a specific bioinformatics tool, version, or command.
# Please provide a more detailed step description.

Differential expression analysis was performed with raw gene counts using R package, edgeR (v3.26.7) (Robinson et al.

edgeR v3.26.7 GitHub

$ Bash example

# Install R if not already installed (example for Debian/Ubuntu)
# sudo apt-get update && sudo apt-get install -y r-base
#
# Install Bioconductor and edgeR package
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
# R -e 'BiocManager::install("edgeR")'

# Example raw gene counts file (replace with actual path)
# counts.tsv should have genes as rows and samples as columns, with the first column being gene IDs.
# Example sample_info.csv should have sample IDs and a 'group' column for differential expression.
#
# Example counts.tsv:
# GeneID\tSample1\tSample2\tSample3\tSample4
# GeneA\t100\t120\t50\t60
# GeneB\t20\t25\t100\t110
# ...
#
# Example sample_info.csv:
# SampleID,Group
# Sample1,Control
# Sample2,Control
# Sample3,Treated
# Sample4,Treated

Rscript -e '
library(edgeR)

# --- Configuration ---
counts_file <- "counts.tsv" # Path to raw gene counts file
sample_info_file <- "sample_info.csv" # Path to sample information file
output_file <- "de_results.csv" # Output file for differential expression results
group_column <- "Group" # Column in sample_info_file indicating experimental groups
control_group <- "Control" # Name of the control group
treated_group <- "Treated" # Name of the treated group
# --- End Configuration ---

# Load data
counts_data <- read.delim(counts_file, row.names = 1, check.names = FALSE)
sample_info <- read.csv(sample_info_file, row.names = 1)

# Ensure sample order matches between counts and sample info
sample_info <- sample_info[colnames(counts_data), , drop = FALSE]

# Create DGEList object
dge <- DGEList(counts = counts_data, group = factor(sample_info[[group_column]]))

# Filter out lowly expressed genes (optional but recommended)
# Keeping genes with at least 10 counts per million (CPM) in at least 2 samples
keep <- filterByExpr(dge)
dge <- dge[keep, , keep.lib.sizes = FALSE]

# Normalize library sizes (TMM normalization is default)
dge <- calcNormFactors(dge)

# Create design matrix
design <- model.matrix(~0 + group, data = dge$samples)
colnames(design) <- levels(dge$samples$group)

# Estimate dispersions
dge <- estimateDisp(dge, design)

# Fit GLM
fit <- glmFit(dge, design)

# Define contrast for differential expression (e.g., Treated vs Control)
contrast_matrix <- makeContrasts(
  paste0(treated_group, "-", control_group),
  levels = design
)

# Perform likelihood ratio test
lrt <- glmLRT(fit, contrast = contrast_matrix)

# Get differential expression results
results <- topTags(lrt, n = Inf, sort.by = "PValue")$table

# Add adjusted p-value (FDR) if not already present
if (!"FDR" %in% colnames(results)) {
  results$FDR <- p.adjust(results$PValue, method = "BH")
}

# Write results to file
write.csv(results, file = output_file, row.names = TRUE)

cat(paste0("Differential expression analysis completed. Results saved to ", output_file, "\n"))
'

View on GitHub

2010), using replicates to compute within-group dispersion and correcting for batch effects.

DESeq (Inferred with models/gemini-2.5-flash) vBioconductor 2.6 (circa 2010) GitHub

$ Bash example

# Install R and Bioconductor if not already present
# sudo apt-get update
# sudo apt-get install r-base
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
# R -e 'BiocManager::install("DESeq2")' # DESeq2 is the modern successor to DESeq (2010) and implements these concepts.

# Placeholder for count data and sample information
# Replace 'counts.tsv' with your actual count matrix (rows: genes, columns: samples)
# Replace 'sample_info.tsv' with your actual sample metadata (columns: sample_id, condition, batch, etc.)
# Example:
# counts.tsv:
# gene_id    sample1    sample2    sample3    sample4
# geneA      100        120        90         110
# geneB      50         60         45         55
# ...
#
# sample_info.tsv:
# sample_id    condition    batch
# sample1      control      batch1
# sample2      treated      batch1
# sample3      control      batch2
# sample4      treated      batch2

Rscript -e '
library(DESeq2)

# --- Placeholder: Load your actual count data and sample information ---
# For demonstration, creating dummy data
set.seed(123)
num_genes <- 100
num_samples <- 6
counts <- matrix(sample(1:1000, num_genes * num_samples, replace = TRUE), ncol = num_samples)
colnames(counts) <- paste0("sample", 1:num_samples)
rownames(counts) <- paste0("gene", 1:num_genes)

sample_info <- data.frame(
  sample_id = colnames(counts),
  condition = factor(rep(c("control", "treated"), num_samples / 2)),
  batch = factor(c("batch1", "batch1", "batch2", "batch2", "batch1", "batch2")) # Example batch effect
)
rownames(sample_info) <- sample_info$sample_id

# Ensure sample order matches
sample_info <- sample_info[colnames(counts), ]
# --------------------------------------------------------------------

# Create DESeqDataSet object
# The design formula specifies factors to consider.
# Here, "batch" is included to correct for batch effects.
# "condition" is included as the primary biological variable.
# DESeq2 internally uses replicates to estimate within-group dispersion.
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = sample_info,
                              design = ~ batch + condition)

# Run the DESeq pipeline to estimate size factors, dispersions, and fit the model.
# This step computes within-group dispersion and incorporates the design formula
# to account for batch effects.
dds <- DESeq(dds)

# The estimated dispersions can be accessed and plotted:
# plotDispEsts(dds)

message("DESeq2 analysis completed, including dispersion estimation and batch effect correction.")
message("The dds object now contains normalized counts, dispersion estimates, and model fits.")
'

View on GitHub

Genes with false discovery rate (FDR) < 0.05 and an absolute fold-change (FC) >= 1.5 were identified as significantly different when comparing two conditions.

DESeq2 (Inferred with models/gemini-2.5-flash) v1.42.0 GitHub

$ Bash example

# Install R and DESeq2 if not already installed
# sudo apt-get update && sudo apt-get install -y r-base
# R -e "if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager'); BiocManager::install('DESeq2')"

# Create dummy input files for demonstration purposes.
# In a real pipeline, 'gene_counts.tsv' would be the output from a quantification tool (e.g., featureCounts, Salmon, Kallisto)
# and 'sample_info.tsv' would be a metadata file describing your experimental design.

# Dummy gene count matrix (rows: genes, columns: samples)
echo -e "gene\tsampleA1\tsampleA2\tsampleB1\tsampleB2" > gene_counts.tsv
echo -e "gene1\t100\t120\t200\t210" >> gene_counts.tsv
echo -e "gene2\t50\t60\t30\t35" >> gene_counts.tsv
echo -e "gene3\t10\t12\t15\t18" >> gene_counts.tsv
echo -e "gene4\t500\t550\t200\t220" >> gene_counts.tsv
echo -e "gene5\t20\t25\t40\t45" >> gene_counts.tsv
echo -e "gene6\t300\t310\t305\t315" >> gene_counts.tsv
echo -e "gene7\t80\t85\t150\t160" >> gene_counts.tsv

# Dummy sample metadata (rows: samples, columns: experimental factors)
echo -e "sample\tcondition" > sample_info.tsv
echo -e "sampleA1\tA" >> sample_info.tsv
echo -e "sampleA2\tA" >> sample_info.tsv
echo -e "sampleB1\tB" >> sample_info.tsv
echo -e "sampleB2\tB" >> sample_info.tsv

# Execute R script for DESeq2 analysis
Rscript -e '
library(DESeq2)

# Define input and output file paths
count_matrix_file <- "gene_counts.tsv"
sample_metadata_file <- "sample_info.tsv"
output_file <- "significant_genes.tsv"

# Read count data and sample metadata
count_data <- read.table(count_matrix_file, header=TRUE, row.names=1, sep="\t")
sample_info <- read.table(sample_metadata_file, header=TRUE, row.names=1, sep="\t")

# Ensure sample order in count_data matches sample_info
count_data <- count_data[, rownames(sample_info)]

# DESeq2 expects integer counts, so round if necessary (e.g., from Salmon/Kallisto)
dds <- DESeqDataSetFromMatrix(countData = round(count_data),
                              colData = sample_info,
                              design = ~ condition) # Assuming "condition" is the primary experimental variable

# Run DESeq2 differential expression analysis
dds <- DESeq(dds)

# Extract results. The description specifies FDR < 0.05, which corresponds to alpha=0.05.
res <- results(dds, alpha=0.05)

# Filter for significant genes based on the criteria:
# 1. False Discovery Rate (FDR) < 0.05 (padj < 0.05)
# 2. Absolute Fold-Change (FC) >= 1.5 (abs(log2FoldChange) >= log2(1.5))
significant_genes <- subset(res, padj < 0.05 & abs(log2FoldChange) >= log2(1.5))

# Order results by adjusted p-value for easier interpretation
significant_genes <- significant_genes[order(significant_genes$padj), ]

# Save the table of significantly different genes
write.table(as.data.frame(significant_genes), file=output_file, sep="\t", quote=FALSE, row.names=TRUE)

message(paste("Identified", nrow(significant_genes), "significantly different genes. Results saved to", output_file))
'

View on GitHub

Normalized counts were log transformed, filtered, scaled, and centered prior to clustering and heatmap generation with Cluster3 (de Hoon et al.

Cluster3 v3.0

$ Bash example

# Install C Clustering Library (which provides the 'cluster' executable, the core algorithm for Cluster 3.0)
# For Debian/Ubuntu:
# sudo apt-get update
# sudo apt-get install c-cluster-programs

# Input file: normalized_counts.tsv (assuming a tab-separated matrix with gene IDs as first column and sample IDs as header)
INPUT_FILE="normalized_counts.tsv"
OUTPUT_PREFIX="clustered_data"

# Perform log transformation, filtering, scaling, centering, clustering, and generate files for heatmap visualization.
# The 'cluster' executable from the C Clustering Library directly supports these operations via command-line parameters.
# -l: Log transform data (e.g., log2 transformation)
# -g: Normalize genes (adjusts genes to have mean 0 and variance 1, effectively scaling and centering gene-wise)
# -a: Normalize arrays (adjusts arrays/samples to have mean 0 and variance 1, effectively scaling and centering sample-wise)
# -f 0 0: No filtering (adjust parameters if specific filtering criteria are needed, e.g., -f 1 10 for 10% missing values)
# -m a: Hierarchical clustering (agglomerative method)
# -s c: Similarity measure: Pearson correlation (a common choice for gene expression data)
# -u: Output gene tree (.gtr file)
# -e: Output array tree (.atr file)
# -o: Output prefix for the generated .cdt, .gtr, and .atr files

cluster -l -g -a -f 0 0 -m a -s c -u -e -o "${OUTPUT_PREFIX}" "${INPUT_FILE}"

# Heatmap visualization is typically performed using a companion viewer like Java TreeView or similar,
# which reads the .cdt, .gtr, and .atr files generated by the 'cluster' command.
# For example, to open the generated data with Java TreeView (if installed and associated with .cdt files):
# java -jar javatreeview.jar "${OUTPUT_PREFIX}.cdt"

2004), and visualized with JavaTreeView (Saldanha 2004).

JavaTreeView v2004

$ Bash example

No specific command-line execution details are provided in the description for JavaTreeView, as it is primarily a graphical visualization tool.

GSEA was carried out with the GenePattern interface, https://genepattern.broadinstitute.org) using preranked lists generated from FDR values, setting gene set permutations to 1000 and using the Hallmark collection in MSigDB version5.0.

GSEA vv2.2.4 (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Download GSEA software (e.g., GSEA_2.2.4.jar for command-line execution)
# The GSEA software can be downloaded from the Broad Institute's GSEA website.
# For example, assuming GSEA_2.2.4.jar is downloaded and placed in a 'gsea_home' directory:
# mkdir -p gsea_home
# # Place GSEA_2.2.4.jar here
# export GSEA_HOME=$(pwd)/gsea_home

# Download MSigDB Hallmark collection v5.0
# mkdir -p msigdb_v5.0
# wget -O msigdb_v5.0/h.all.v5.0.symbols.gmt https://data.broadinstitute.org/gsea-msigdb/msigdb/release/5.0/h.all.v5.0.symbols.gmt

# Prepare the preranked list file (e.g., 'input_preranked.rnk')
# This file should contain two columns: Gene Symbol and Rank Value (e.g., FDR values or a derived score).
# Example content of input_preranked.rnk:
# GENE_SYMBOL	RANK_VALUE
# TP53	0.95
# MYC	0.88
# ...

# Run GSEA with the preranked list, Hallmark gene sets, and specified permutations
# Note: The GenePattern interface provides a GUI for GSEA. This command reflects the underlying command-line execution.
# The '-Xmx' parameter sets the maximum Java heap size, adjust as needed.
java -Xmx4000m -cp "$GSEA_HOME/gsea-cli.jar" xtools.gsea.GseaPreranked \
    -rnk input_preranked.rnk \
    -gmx msigdb_v5.0/h.all.v5.0.symbols.gmt \
    -out gsea_output_preranked \
    -perm 1000 \
    -plot_top_x 20 \
    -collapse false \
    -mode gene_set \
    -norm meandiv \
    -rnd_seed timestamp \
    -set_max 500 \
    -set_min 15

View on GitHub

GO Term and KEGG Pathway analysis were performed with David: http://david.ncifcrf.gov/.

DAVID v6.8

$ Bash example

# Create a placeholder gene list file (e.g., Entrez Gene IDs)
echo "10000" > my_gene_list.txt
echo "10001" >> my_gene_list.txt
echo "10002" >> my_gene_list.txt
echo "10003" >> my_gene_list.txt
echo "10004" >> my_gene_list.txt

# Install the suds-py3 library for SOAP client interaction if not already installed
# pip install suds-py3

# Python script to interact with DAVID Web Service API
# This script performs GO Term and KEGG Pathway analysis.
python3 - <<EOF
from suds.client import Client
import os

# DAVID Web Service URL
DAVID_WS_URL = "https://david.ncifcrf.gov/webservice/services/DAVIDWebService?wsdl"

# Your DAVID email (required for some API calls, though often optional for basic queries)
# Replace with your actual email if needed for authentication
# EMAIL = "your_email@example.com"

# Input gene list file
GENE_LIST_FILE = "my_gene_list.txt"
LIST_NAME = "My_Experiment_Genes"
LIST_TYPE = "ENTREZ_GENE_ID" # Or "GENE_SYMBOL", "ENSEMBL_GENE_ID", etc.

# Annotation categories for enrichment analysis
ANNOTATION_CATEGORIES = ["GOTERM_BP_ALL", "KEGG_PATHWAY"] # GO Biological Process and KEGG Pathways

# Output file names
GO_KEGG_CHART_FILE = "david_go_kegg_chart.txt"
GO_KEGG_TABLE_FILE = "david_go_kegg_table.txt"

try:
    # Initialize SOAP client
    client = Client(DAVID_WS_URL)

    # Login (optional, depending on DAVID server configuration and specific API calls)
    # if EMAIL:
    #     client.service.authenticate(EMAIL)
    #     print(f"Authenticated with DAVID Web Service (email: {EMAIL}).")

    # Read gene list from file
    with open(GENE_LIST_FILE, 'r') as f:
        gene_list_content = f.read().strip()

    # Add gene list to DAVID
    print(f"Adding gene list from {GENE_LIST_FILE} to DAVID...")
    # The '0' indicates it's a gene list, '1' would be for a background list
    client.service.addList(gene_list_content, LIST_TYPE, LIST_NAME, 0)
    print(f"Gene list '{LIST_NAME}' added successfully.")

    # Set annotation categories
    print(f"Setting annotation categories: {', '.join(ANNOTATION_CATEGORIES)}...")
    client.service.setCategories(ANNOTATION_CATEGORIES)
    print("Categories set.")

    # Get Functional Annotation Chart (summary of enrichment)
    print(f"Retrieving Functional Annotation Chart and saving to {GO_KEGG_CHART_FILE}...")
    chart_results = client.service.getFunctionalAnnotationChart()
    with open(GO_KEGG_CHART_FILE, 'w') as f:
        # Write header
        f.write("Term\tCount\t%\tPValue\tBenjamini\tFDR\tGenes\tListTotal\tPopHits\tPopTotal\tFoldEnrichment\tBonferroni\n")
        # Write results
        for row in chart_results:
            f.write(f"{row.Term}\t{row.Count}\t{row.Percentage}\t{row.PValue}\t{row.Benjamini}\t{row.FDR}\t{row.Genes}\t{row.ListHits}\t{row.PopHits}\t{row.PopTotal}\t{row.FoldEnrichment}\t{row.Bonferroni}\n")
    print("Functional Annotation Chart saved.")

    # Get Functional Annotation Table (detailed gene-term mapping)
    print(f"Retrieving Functional Annotation Table and saving to {GO_KEGG_TABLE_FILE}...")
    table_results = client.service.getFunctionalAnnotationTable()
    with open(GO_KEGG_TABLE_FILE, 'w') as f:
        # Write header
        f.write("GeneID\tGeneName\tTerm\tCategory\tPValue\tBenjamini\tFDR\n")
        # Write results
        for row in table_results:
            f.write(f"{row.GeneID}\t{row.GeneName}\t{row.Term}\t{row.Category}\t{row.PValue}\t{row.Benjamini}\t{row.FDR}\n")
    print("Functional Annotation Table saved.")

except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure 'suds-py3' is installed (`pip install suds-py3`) and your gene list file is correctly formatted.")
EOF

Area-proportional Venn diagrams were plotted using BioVenn (Hulsen et al.

BioVenn v1.1.0 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install biovenn Python package and matplotlib for plotting
# pip install biovenn matplotlib

# Example usage:
# Assuming set1.txt, set2.txt, set3.txt contain lists of items (one per line)
# and venn_diagram.png is the desired output file.
# Create dummy input files for demonstration if needed:
# echo -e "geneA\ngeneB\ngeneC\ngeneD" > set1.txt
# echo -e "geneC\ngeneD\ngeneE\ngeneF" > set2.txt
# echo -e "geneD\ngeneF\ngeneG\ngeneH" > set3.txt

python -c '
import biovenn as bv
import matplotlib.pyplot as plt
import sys
import os

def read_list_from_file(filepath):
    with open(filepath, "r") as f:
        return [line.strip() for line in f if line.strip()]

if __name__ == "__main__":
    # Placeholder for actual input files and output image.
    # In a real pipeline, these would be passed as arguments or configured.
    list_files = ["set1.txt", "set2.txt", "set3.txt"]
    output_image = "venn_diagram.png"
    
    # Check if input files exist
    for f in list_files:
        if not os.path.exists(f):
            print(f"Error: Input file not found: {f}", file=sys.stderr)
            sys.exit(1)

    lists_data = [read_list_from_file(f) for f in list_files]
    names = [f"Set {i+1}" for i in range(len(lists_data))]

    # BioVenn Python package supports 2, 3, or 4 sets for area-proportional diagrams
    if len(lists_data) == 2:
        venn = bv.venn2(lists_data[0], lists_data[1], names=names)
    elif len(lists_data) == 3:
        venn = bv.venn3(lists_data[0], lists_data[1], lists_data[2], names=names)
    elif len(lists_data) == 4:
        venn = bv.venn4(lists_data[0], lists_data[1], lists_data[2], lists_data[3], names=names)
    else:
        print("Error: BioVenn Python package currently supports 2, 3, or 4 sets for area-proportional diagrams.", file=sys.stderr)
        sys.exit(1)

    plt.title("Area-Proportional Venn Diagram")
    plt.savefig(output_image, dpi=300, bbox_inches="tight")
    plt.close()
'

13

2008).

Unknown (Inferred with models/gemini-2.5-flash) vUnknown

Enrichr: http://amp.pharm.mssm.edu/Enrichr.

Enrichr vWeb Service/API GitHub

$ Bash example

# Enrichr is a web-based tool for gene set enrichment analysis.
# In automated pipelines, it's typically accessed via its API using a client library (e.g., in Python or R).
# This step assumes a gene list (e.g., differentially expressed genes) has been generated upstream.

# Example: Using a Python script (e.g., 'run_enrichr.py') that leverages the gseapy library
# to interact with the Enrichr API.
# The 'run_enrichr.py' script would take a gene list file and an output prefix as arguments.

# Installation of gseapy (if using Python API client):
# pip install gseapy

# Example execution command:
# Replace 'your_gene_list.txt' with the path to your gene list file (one gene symbol per line).
# Replace 'enrichr_results' with your desired output file prefix.
# The 'run_enrichr.py' script would need to be created separately to handle the API calls.
python run_enrichr.py your_gene_list.txt enrichr_results

# Enrichr uses its own internal databases of gene sets (e.g., GO, KEGG, WikiPathways, etc.).
# The specific gene set libraries and organism (e.g., 'human', 'mouse') would be configured
# within the 'run_enrichr.py' script based on the experimental context.

View on GitHub

Tools Used

STAR HOMER

Raw Source Text

Raw RNA sequencing reads in FASTQ files were quality-tested using FASTQC (v0.11.8) (Andrews 2010) and mapped to the mouse reference genome (mm10) with STAR (v2.5.3a) aligner with default parameters (Dobin et al. 2012).
Raw or normalized gene expression levels were quantified across all the exons of RefSeq genes with analyzeRepeats.pl in HOMER (v4.11.1) (Heinz et al. 2010).
Differential expression analysis was performed with raw gene counts using R package, edgeR (v3.26.7) (Robinson et al. 2010), using replicates to compute within-group dispersion and correcting for batch effects. Genes with false discovery rate (FDR) < 0.05 and an absolute fold-change (FC) >= 1.5 were identified as significantly different when comparing two conditions.
Normalized counts were log transformed, filtered, scaled, and centered prior to clustering and heatmap generation with Cluster3 (de Hoon et al. 2004), and visualized with JavaTreeView (Saldanha 2004).
GSEA was carried out with the GenePattern interface, https://genepattern.broadinstitute.org) using preranked lists generated from FDR values, setting gene set permutations to 1000 and using the Hallmark collection in MSigDB version5.0. GO Term and KEGG Pathway analysis were performed with David: http://david.ncifcrf.gov/.
Area-proportional Venn diagrams were plotted using BioVenn (Hulsen et al. 2008). Enrichr: http://amp.pharm.mssm.edu/Enrichr.
Genome_build: mm10
Supplementary_files_format_and_content: Normalized expression table generated with analyzeRepeats.pl. Please see header for specific arguments used.

← Back to Analysis