GSE123881 Processing Pipeline
RNA-Seq
code_examples
3 steps
Publication
DDX5 promotes oncogene C3 and FABP1 expressions and drives intestinal inflammation and tumorigenesis.Life science alliance (2020) — PMID 32817263
Dataset
GSE123881DDX5 promotes oncogene C3 and FABP1 expressions and drives intestinal inflammation and tumorigenesis
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
Generate Jupyter Notebook-
1
Sequenced reads were mapped to mm10 using STAR v2.6.1a with parameters "--outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000".
$ Bash example
# Install STAR (if not already installed) # conda install -c bioconda star=2.6.1a # Create STAR genome index (if not already created) # STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /path/to/STAR_index/mm10 --genomeFastaFiles /path/to/mm10.fa --sjdbGTFfile /path/to/mm10.gtf # Align sequenced reads to mm10 using STAR STAR --genomeDir /path/to/STAR_index/mm10 \ --readFilesIn input_R1.fastq.gz input_R2.fastq.gz \ --outFileNamePrefix aligned_reads_ \ --outSAMtype BAM SortedByCoordinate \ --runThreadN 8 \ --outFilterMultimapNmax 20 \ --alignSJoverhangMin 8 \ --alignSJDBoverhangMin 1 \ --outFilterMismatchNmax 999 \ --outFilterMismatchNoverReadLmax 0.04 \ --alignIntronMin 20 \ --alignIntronMax 1000000 \ --alignMatesGapMax 1000000 -
2
Uniquely mapped reads overlapping with the exons were counted using featureCounts for each gene in GENCODE.vM19 annotation.
featureCounts v2.0.6$ Bash example
# Install featureCounts (part of Subread package) # conda install -c bioconda subread # Download GENCODE vM19 annotation GTF # wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M19/gencode.vM19.annotation.gtf.gz # gunzip gencode.vM19.annotation.gtf.gz # Define variables GTF_FILE="gencode.vM19.annotation.gtf" INPUT_BAM="aligned_reads.bam" # Placeholder for uniquely mapped and sorted BAM file OUTPUT_FILE="gene_counts.txt" NUM_THREADS=8 # Adjust as needed # Run featureCounts featureCounts -a "${GTF_FILE}" \ -o "${OUTPUT_FILE}" \ -F GTF \ -t exon \ -g gene_id \ --primary \ -s 0 \ -T "${NUM_THREADS}" \ "${INPUT_BAM}" -
3
Differential expression analysis were performed by DESeq2(v1.18.1) package
$ Bash example
#!/bin/bash # Install R and Bioconductor if not already installed # sudo apt-get update # sudo apt-get install -y r-base # R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")' # R -e 'BiocManager::install("DESeq2", version = "3.8")' # DESeq2 v1.18.1 corresponds to Bioconductor 3.8 # Create dummy count data (replace with your actual count matrix) echo -e "gene\tsample1\tsample2\tsample3\tsample4" > counts.tsv echo -e "geneA\t100\t120\t50\t60" >> counts.tsv echo -e "geneB\t20\t25\t80\t90" >> counts.tsv echo -e "geneC\t500\t550\t200\t220" >> counts.tsv echo -e "geneD\t10\t15\t5\t8" >> counts.tsv # Create dummy sample information (replace with your actual sample metadata) echo -e "sample\tcondition" > sample_info.tsv echo -e "sample1\tcontrol" >> sample_info.tsv echo -e "sample2\tcontrol" >> sample_info.tsv echo -e "sample3\ttreated" >> sample_info.tsv echo -e "sample4\ttreated" >> sample_info.tsv # Run DESeq2 analysis using R Rscript -e ' library(DESeq2) # Load count data # Ensure the first column is gene names and subsequent columns are sample counts count_data <- read.table("counts.tsv", header = TRUE, row.names = 1) # Load sample information # Ensure the first column is sample names and subsequent columns are metadata sample_info <- read.table("sample_info.tsv", header = TRUE, row.names = 1) # Ensure sample order in colData matches countData columns sample_info <- sample_info[colnames(count_data), , drop = FALSE] # Create DESeqDataSet object # Replace `~ condition` with your actual design formula based on `sample_info` columns dds <- DESeqDataSetFromMatrix(countData = count_data, colData = sample_info, design = ~ condition) # Run DESeq2 analysis dds <- DESeq(dds) # Get results # You can specify contrast, alpha, etc. here, e.g., results(dds, contrast=c("condition","treated","control")) res <- results(dds) # Save results to a TSV file write.table(as.data.frame(res), file = "deseq2_results.tsv", sep = "\t", quote = FALSE, row.names = TRUE) message("DESeq2 analysis complete. Results saved to deseq2_results.tsv") '
Raw Source Text
Sequenced reads were mapped to mm10 using STAR v2.6.1a with parameters "--outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000". Uniquely mapped reads overlapping with the exons were counted using featureCounts for each gene in GENCODE.vM19 annotation. Differential expression analysis were performed by DESeq2(v1.18.1) package Genome_build: MM10 Supplementary_files_format_and_content: tab-delimited text files include raw counts, normalized counts values log2FoldChange and pvalue for each gene