GSE295421 Processing Pipeline
RNA-Seq
code_examples
3 steps
Publication
Enhancing RNA base editing on mammalian transcripts with small nuclear RNAs.Nature chemical biology (2025) — PMID 40968292
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
Generate Jupyter Notebook-
1
Reads were mapped using STAR 2.7.6a
$ Bash example
# Install STAR (if not already installed) # conda install -c bioconda star # Define variables READS_R1="reads_R1.fastq.gz" # Replace with your actual R1 fastq file READS_R2="reads_R2.fastq.gz" # Replace with your actual R2 fastq file (remove if single-end) GENOME_DIR="path/to/STAR_genome_index/GRCh38" # Replace with the path to your STAR genome index for GRCh38 (e.g., built from Gencode v44 annotation) OUTPUT_PREFIX="mapped_reads_" # Prefix for output files THREADS=8 # Number of threads to use # Run STAR alignment with common RNA-seq parameters STAR --genomeDir "${GENOME_DIR}" \ --readFilesIn "${READS_R1}" "${READS_R2}" \ --runThreadN "${THREADS}" \ --outFileNamePrefix "${OUTPUT_PREFIX}" \ --outSAMtype BAM SortedByCoordinate \ --outSAMattributes Standard \ --outFilterMultimapNmax 20 \ --outFilterMismatchNmax 999 \ --alignIntronMin 20 \ --alignIntronMax 1000000 \ --alignSJDBoverhangMin 1 \ --outReadsUnmapped Fastx \ --quantMode GeneCounts # Optional: for gene quantification (e.g., for RNA-seq) -
2
Read count extraction was performed using featureCounts from the Subread package, with results sorted into counts matrices
featureCounts v2.0.3$ Bash example
# Install featureCounts (part of Subread package) # conda install -c bioconda subread # Define input and output files INPUT_BAM="aligned_reads.bam" # Replace with your actual aligned BAM file GENOME_GTF="gencode.v38.annotation.gtf" # Replace with your actual GTF annotation file (e.g., from GENCODE for GRCh38) OUTPUT_COUNTS="gene_counts.txt" # Run featureCounts to extract read counts into a matrix # -a: Annotation file (GTF/GFF) # -o: Output file for counts # -F GTF: Specify GTF format for annotation # -t exon: Count features of type 'exon' # -g gene_id: Group features by 'gene_id' # -s 0: Unstranded (0), stranded (1), reverse stranded (2) - adjust as needed # -T 8: Use 8 threads for parallel processing # --primary: Only count primary alignments # --ignore-multimapping: Ignore multi-mapping reads featureCounts -a "${GENOME_GTF}" \ -o "${OUTPUT_COUNTS}" \ -F GTF \ -t exon \ -g gene_id \ -s 0 \ -T 8 \ --primary \ --ignore-multimapping \ "${INPUT_BAM}" -
3
Differential gene expression analysis was performed on the counts matrices using DESeq2
$ Bash example
# Install R and Bioconductor if not already present # For Ubuntu/Debian: # sudo apt-get update # sudo apt-get install r-base # Open R and install Bioconductor packages # R # if (!require("BiocManager", quietly = TRUE)) # install.packages("BiocManager") # BiocManager::install("DESeq2") # quit() # Create dummy input files for demonstration purposes # In a real scenario, these files would be generated by upstream steps (e.g., featureCounts, Salmon, Kallisto) # Example counts.tsv (replace with your actual counts matrix) cat <<EOF > counts.tsv GeneID Sample1 Sample2 Sample3 Sample4 GeneA 100 120 50 60 GeneB 500 550 200 220 GeneC 10 15 5 8 GeneD 2000 2100 1000 1100 GeneE 5 7 20 25 EOF # Example sample_info.tsv (replace with your actual sample metadata) cat <<EOF > sample_info.tsv Sample condition Sample1 control Sample2 control Sample3 treated Sample4 treated EOF # R script for DESeq2 analysis cat <<'EOF' > run_deseq2.R # Load DESeq2 package library(DESeq2) # --- Configuration --- # Input files counts_file <- "counts.tsv" # Path to your raw count matrix sample_info_file <- "sample_info.tsv" # Path to your sample metadata file # Output file results_file <- "deseq2_results.tsv" # Design formula (e.g., ~ condition, ~ treatment + condition, etc.) # This should match a column name in your sample_info_file design_formula_str <- "~ condition" # --- Read input data --- # Read count matrix # Assuming counts.tsv has genes/features as rows and samples as columns # First column is gene ID, subsequent columns are counts counts_data <- read.delim(counts_file, row.names = 1, sep = "\t") # Read sample information # Assuming sample_info.tsv has sample IDs as rows and metadata as columns # First column is sample ID, subsequent columns are metadata (e.g., condition, treatment) sample_info <- read.delim(sample_info_file, row.names = 1, sep = "\t") # Ensure sample names match and are in the same order # It's crucial that the column names of counts_data match the row names of sample_info # and are in the same order. Reorder counts_data columns to match sample_info rows. counts_data <- counts_data[, rownames(sample_info)] # Convert counts to integer matrix (DESeq2 requirement) counts_data <- round(counts_data) counts_data <- as.matrix(counts_data) # --- Create DESeqDataSet object --- dds <- DESeqDataSetFromMatrix(countData = counts_data, colData = sample_info, design = as.formula(design_formula_str)) # --- Pre-filtering (optional but recommended) --- # Remove genes with very low counts across all samples # For example, keep genes that have at least 10 counts in total across all samples keep <- rowSums(counts(dds)) >= 10 dds <- dds[keep,] # --- Run DESeq2 analysis --- dds <- DESeq(dds) # --- Extract results --- # Get results for the primary comparison (e.g., "condition_treated_vs_control") # To see available comparisons: resultsNames(dds) # Assuming 'condition' has levels 'control' and 'treated' res <- results(dds, contrast=c("condition", "treated", "control")) # Order results by adjusted p-value res <- res[order(res$padj),] # --- Write results to file --- write.table(as.data.frame(res), file = results_file, sep = "\t", quote = FALSE, row.names = TRUE) message(paste("DESeq2 analysis complete. Results saved to:", results_file)) EOF # Execute the R script Rscript run_deseq2.R
Raw Source Text
Reads were mapped using STAR 2.7.6a Read count extraction was performed using featureCounts from the Subread package, with results sorted into counts matrices Differential gene expression analysis was performed on the counts matrices using DESeq2 Assembly: GRCh38 Supplementary files format and content: Supplementary files format and content: tab-delimited text files include gene expression analysis for each transfection condition relative to pUC19 (empty vector)