GSE136911 Processing Pipeline

miRNA-Seq, 2 steps

Publication

Suppression of Endothelial AGO1 Promotes Adipose Tissue Browning and Improves Metabolic Dysfunction.

Circulation (2020) — PMID 32393053

Dataset

GSE136911

Small RNA-seq of subcutaneous adipose tissue from endothelial-AGO1-knockout (EC-AGO1-KO) mice and wild-type (WT) littermates fed 16 weeks of high fat …

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. miRDeep2 0.0.8 was used to count the number of microRNAs

    miRDeep2 v0.0.8
    $ Bash example
    # Install miRDeep2 (example using conda).
    # Assumption: "0.0.8" in the source refers to miRDeep2 release 2.0.0.8; check which versions bioconda offers.
    # conda create -n mirdeep2_env -c conda-forge -c bioconda mirdeep2=2.0.0.8
    # conda activate mirdeep2_env
    
    # --- Placeholder inputs (all file names below are examples) ---
    # Raw small RNA-seq reads for one sample; replace with your actual FASTQ file.
    reads_fastq="reads.fastq"
    
    # Mouse reference genome. The source states Genome_build: mm10.
    # Example: wget https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.fa.gz && gunzip mm10.fa.gz
    # miRDeep2 expects FASTA headers without whitespace.
    genome_fa="mm10.fa"
    
    # Bowtie (version 1) index prefix for the genome; mapper.pl maps reads with bowtie.
    # Build once: bowtie-build "$genome_fa" mm10
    genome_index="mm10"
    
    # Known miRNAs from miRBase (https://www.mirbase.org/): download mature.fa and hairpin.fa,
    # then extract the mouse (mmu) entries with the helper script shipped with miRDeep2:
    # extract_miRNAs.pl mature.fa mmu > mature_mmu.fa
    # extract_miRNAs.pl hairpin.fa mmu > hairpin_mmu.fa
    mature_mirnas_fa="mature_mmu.fa"
    precursor_mirnas_fa="hairpin_mmu.fa"
    
    # 3' adapter used in library preparation. The source does not state it; this value is an assumption.
    # A common Illumina TruSeq Small RNA 3' adapter is: TGGAATTCTCGGGTGCCAAGGAACTCC
    adapter="TGGAATTCTCGGGTGCCAAGGAACTCC"
    
    # --- Step A: preprocess and map reads with mapper.pl ---
    # miRDeep2.pl does not read FASTQ or trim adapters itself; mapper.pl does this first.
    # -e: input is FASTQ                 -h: parse to FASTA
    # -i: convert RNA to DNA alphabet    -j: drop reads with non-canonical letters
    # -k: clip the 3' adapter            -l: discard reads shorter than 18 nt
    # -m: collapse identical reads       -p: bowtie index prefix
    # -s: processed reads output         -t: read mappings output (ARF)
    # -o: bowtie threads                 -v: progress report
    mapper.pl "$reads_fastq" -e -h -i -j -k "$adapter" -l 18 -m \
        -p "$genome_index" -s reads_collapsed.fa -t reads_vs_genome.arf -o 8 -v
    
    # --- Step B: quantify known miRNAs and predict novel ones with miRDeep2.pl ---
    # Positional arguments: reads, genome, mappings, mature miRNAs (this species),
    # mature miRNAs (related species, or "none"), precursors (this species).
    # -t: species, used for UCSC browser links (run "miRDeep2.pl -u" to list supported names)
    # Output goes to the working directory: a result_<timestamp> report (.csv/.html) and
    # miRNAs_expressed_all_samples_<timestamp>.csv with read counts for known miRNAs.
    miRDeep2.pl reads_collapsed.fa "$genome_fa" reads_vs_genome.arf \
        "$mature_mirnas_fa" none "$precursor_mirnas_fa" -t Mouse 2> report.log
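The counts from this step usually need to be reduced to one value per mature miRNA before normalization. The sketch below shows one way to do that in Python. It assumes the miRDeep2 expression table is tab-delimited with a header line starting with `#miRNA` and a `read_count` column; the function name `collapse_mirna_counts` and the example rows are hypothetical, and keeping the maximum per miRNA is an assumption, not the authors' documented choice.

```python
import csv
import io


def collapse_mirna_counts(handle):
    """Return {mature miRNA ID: read count} from a quantifier-style table.

    Assumed layout (tab-delimited): a header line beginning with '#miRNA',
    then one row per mature/precursor pair with the count in 'read_count'.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = {}
    for row in reader:
        mirna = row["#miRNA"]
        count = float(row["read_count"])
        # The same mature miRNA can be listed once per precursor; keep the
        # largest value rather than summing, to avoid double counting.
        counts[mirna] = max(counts.get(mirna, 0.0), count)
    return counts


# Hypothetical example rows, not data from GSE136911.
example = (
    "#miRNA\tread_count\tprecursor\n"
    "mmu-let-7a-5p\t120.00\tmmu-let-7a-1\n"
    "mmu-let-7a-5p\t118.00\tmmu-let-7a-2\n"
    "mmu-miR-21a-5p\t300.00\tmmu-mir-21a\n"
)
counts = collapse_mirna_counts(io.StringIO(example))
for mirna, count in sorted(counts.items()):
    print(f"{mirna}\t{count:g}")
```

With a real file, pass an open file handle instead of the `io.StringIO` example and check the header against your miRDeep2 version first.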
  2. The numbers are then manually normalized into FPKM values

    Custom Python script for FPKM calculation (inferred with models/gemini-2.5-flash; no version available)
    $ Bash example
    # Sketch of a custom script for the "manual normalization" step. The authors' actual script is
    # not available; the file names, column names, and input layout below are assumptions.
    # FPKM = (read_count * 10^9) / (feature_length_bp * total_mapped_reads)
    
    # Example input files (these would be prepared from the previous step):
    # raw_counts.tsv: Tab-separated file with miRNA IDs and raw read counts per sample.
    #                 e.g., miRNA_id\tsample1\tsample2\nmmu-miR-21a-5p\t100\t150
    # mirna_lengths.tsv: Tab-separated file with miRNA IDs and their lengths in bases
    #                 (e.g., derived from the miRBase mature FASTA).
    #                 e.g., miRNA_id\tlength_bp\nmmu-miR-21a-5p\t22
    # total_mapped_reads.tsv: One line per sample with its total mapped reads, no header.
    #                 e.g., sample1\t50000000
    
    # Write the script to calculate_fpkm.py
    cat > calculate_fpkm.py <<'EOF'
    #!/usr/bin/env python3
    import sys
    
    import pandas as pd
    
    
    def calculate_fpkm(counts_file, lengths_file, totals_file, output_file):
        # Load counts (rows: miRNAs, columns: samples)
        counts = pd.read_csv(counts_file, sep="\t", index_col=0)
        # Load feature lengths; 'length_bp' is the assumed column name
        lengths = pd.read_csv(lengths_file, sep="\t", index_col=0)["length_bp"]
        # Load per-sample totals (two columns: sample name, total mapped reads)
        totals = pd.read_csv(totals_file, sep="\t", header=None, index_col=0)[1]
    
        # Keep only features present in both tables
        common = counts.index.intersection(lengths.index)
        counts = counts.loc[common]
        lengths = lengths.loc[common]
        # Zero-length entries would divide by zero; treat them as missing instead
        lengths = lengths.where(lengths > 0)
    
        # FPKM = counts * 1e9 / (length * total mapped reads of that sample)
        per_length = counts.mul(1e9).div(lengths, axis=0)
        fpkm = per_length.div(totals.reindex(counts.columns), axis=1)
        fpkm.to_csv(output_file, sep="\t")
    
    
    if __name__ == "__main__":
        if len(sys.argv) != 5:
            sys.exit("Usage: calculate_fpkm.py <counts_file> <lengths_file> <totals_file> <output_file>")
        calculate_fpkm(*sys.argv[1:5])
    EOF
    
    # Execute the FPKM calculation script
    python3 calculate_fpkm.py raw_counts.tsv mirna_lengths.tsv total_mapped_reads.tsv fpkm_normalized_counts.tsv
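As a quick check of the FPKM arithmetic used in this step, here is a minimal worked example in Python. The helper names `fpkm` and `rpm` and all numbers are hypothetical and chosen only for illustration; they are not values from GSE136911. It also shows that for a feature of fixed length, such as a 22-nt mature miRNA, FPKM is reads per million scaled by 1000 / length.

```python
def fpkm(read_count, length_bp, total_mapped_reads):
    """FPKM = read_count * 1e9 / (length_bp * total_mapped_reads)."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def rpm(read_count, total_mapped_reads):
    """Reads per million mapped reads (no length normalization)."""
    return read_count * 1e6 / total_mapped_reads


# Hypothetical numbers for illustration only.
read_count = 220
length_bp = 22
total_mapped_reads = 5_000_000

fpkm_value = fpkm(read_count, length_bp, total_mapped_reads)
rpm_value = rpm(read_count, total_mapped_reads)

print(f"FPKM = {fpkm_value:.1f}")
print(f"RPM  = {rpm_value:.1f}")
print(f"RPM * 1000 / length = {rpm_value * 1000 / length_bp:.1f}")
```

Because mature miRNAs are all close to the same length, ranking by FPKM and ranking by RPM give nearly the same ordering; the length term mostly rescales the values.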
Raw Source Text
miRDeep2 0.0.8 was used to count the number of  microRNAs
The numbers are then manually normalized into FPKM values
Genome_build: mm10
Supplementary_files_format_and_content: tab-delimited text files include FPKM values