GSE146014 Processing Pipeline

RNA-Seq code_examples 3 steps

Publication

RNA binding protein DDX5 directs tuft cell specification and function to regulate microbial repertoire and disease susceptibility in the intestine.

Gut (2022) — PMID 34853057

Dataset

GSE146014

DDX5 in colonic tumors

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Sequenced reads were mapped to mm10 using STAR v2.6.1a with parameters "--outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000".

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.6.1a
    
    # Create STAR genome index (if not already created)
    # STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /path/to/STAR_index/mm10 --genomeFastaFiles /path/to/mm10.fa --sjdbGTFfile /path/to/mm10.gtf
    
    # Align sequenced reads to mm10 using STAR
    STAR --genomeDir /path/to/STAR_index/mm10 \
         --readFilesIn input_R1.fastq.gz input_R2.fastq.gz \
         --outFileNamePrefix aligned_reads_ \
         --outSAMtype BAM SortedByCoordinate \
         --runThreadN 8 \
         --outFilterMultimapNmax 20 \
         --alignSJoverhangMin 8 \
         --alignSJDBoverhangMin 1 \
         --outFilterMismatchNmax 999 \
         --outFilterMismatchNoverReadLmax 0.04 \
         --alignIntronMin 20 \
         --alignIntronMax 1000000 \
         --alignMatesGapMax 1000000
  2. 2

    Uniquely mapped reads overlapping with the exons were counted using featureCounts for each gene in GENCODE.vM19 annotation.

    featureCounts v2.0.6
    $ Bash example
    # Install featureCounts (part of Subread package)
    # conda install -c bioconda subread
    
    # Download GENCODE vM19 annotation GTF
    # wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M19/gencode.vM19.annotation.gtf.gz
    # gunzip gencode.vM19.annotation.gtf.gz
    
    # Define variables
    GTF_FILE="gencode.vM19.annotation.gtf"
    INPUT_BAM="aligned_reads.bam" # Placeholder for uniquely mapped and sorted BAM file
    OUTPUT_FILE="gene_counts.txt"
    NUM_THREADS=8 # Adjust as needed
    
    # Run featureCounts
    featureCounts -a "${GTF_FILE}" \
                  -o "${OUTPUT_FILE}" \
                  -F GTF \
                  -t exon \
                  -g gene_id \
                  --primary \
                  -s 0 \
                  -T "${NUM_THREADS}" \
                  "${INPUT_BAM}"
  3. 3

    Differential expression analysis were performed by DESeq2(v1.18.1) package

    $ Bash example
    #!/bin/bash
    
    # Install R and Bioconductor if not already installed
    # sudo apt-get update
    # sudo apt-get install -y r-base
    # R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
    # R -e 'BiocManager::install("DESeq2", version = "3.8")' # DESeq2 v1.18.1 corresponds to Bioconductor 3.8
    
    # Create dummy count data (replace with your actual count matrix)
    echo -e "gene\tsample1\tsample2\tsample3\tsample4" > counts.tsv
    echo -e "geneA\t100\t120\t50\t60" >> counts.tsv
    echo -e "geneB\t20\t25\t80\t90" >> counts.tsv
    echo -e "geneC\t500\t550\t200\t220" >> counts.tsv
    echo -e "geneD\t10\t15\t5\t8" >> counts.tsv
    
    # Create dummy sample information (replace with your actual sample metadata)
    echo -e "sample\tcondition" > sample_info.tsv
    echo -e "sample1\tcontrol" >> sample_info.tsv
    echo -e "sample2\tcontrol" >> sample_info.tsv
    echo -e "sample3\ttreated" >> sample_info.tsv
    echo -e "sample4\ttreated" >> sample_info.tsv
    
    # Run DESeq2 analysis using R
    Rscript -e '
      library(DESeq2)
    
      # Load count data
      # Ensure the first column is gene names and subsequent columns are sample counts
      count_data <- read.table("counts.tsv", header = TRUE, row.names = 1)
    
      # Load sample information
      # Ensure the first column is sample names and subsequent columns are metadata
      sample_info <- read.table("sample_info.tsv", header = TRUE, row.names = 1)
    
      # Ensure sample order in colData matches countData columns
      sample_info <- sample_info[colnames(count_data), , drop = FALSE]
    
      # Create DESeqDataSet object
      # Replace `~ condition` with your actual design formula based on `sample_info` columns
      dds <- DESeqDataSetFromMatrix(countData = count_data,
                                    colData = sample_info,
                                    design = ~ condition)
    
      # Run DESeq2 analysis
      dds <- DESeq(dds)
    
      # Get results
      # You can specify contrast, alpha, etc. here, e.g., results(dds, contrast=c("condition","treated","control"))
      res <- results(dds)
    
      # Save results to a TSV file
      write.table(as.data.frame(res), file = "deseq2_results.tsv", sep = "\t", quote = FALSE, row.names = TRUE)
    
      message("DESeq2 analysis complete. Results saved to deseq2_results.tsv")
    '

Tools Used

Raw Source Text
Sequenced reads were mapped to mm10 using STAR v2.6.1a with parameters "--outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000".
Uniquely mapped reads overlapping with the exons were counted using featureCounts for each gene in GENCODE.vM19 annotation.
Differential expression analysis were performed by DESeq2(v1.18.1) package
Genome_build: MM10
Supplementary_files_format_and_content: tab-delimited text files include raw counts, normalized counts values log2FoldChange and pvalue for each gene
← Back to Analysis