GSE136911 Processing Pipeline — Yeo Lab Publications

Publication

Suppression of Endothelial AGO1 Promotes Adipose Tissue Browning and Improves Metabolic Dysfunction.

Circulation (2020) — PMID 32393053

Dataset

Small RNA-seq of subcutaneous adipose tissue from endothelial-AGO1-knockout (EC-AGO1-KO) mice and wild-type (WT) littermates fed 16 weeks of high fat …

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


1

miRDeep2 0.0.8 was used to count the reads assigned to each microRNA.

miRDeep2 v0.0.8 GitHub

$ Bash example

# Install miRDeep2 (example using conda).
# Assumption: the "0.0.8" cited above most likely refers to the miRDeep2
# 2.0.0.8 release, which is the version string Bioconda uses.
# conda create -n mirdeep2_env -c conda-forge -c bioconda mirdeep2=2.0.0.8
# conda activate mirdeep2_env

# --- Input files (placeholders; replace with your own) ---
# Raw small RNA-seq reads in FASTQ format
reads_fastq="reads.fastq"

# Mouse reference genome (GSE136911 is a mouse dataset). The assembly used in
# the original study is not stated here; Ensembl GRCm39 is only an example.
# wget -O GRCm39.fa.gz "https://ftp.ensembl.org/pub/release-110/fasta/mus_musculus/dna/Mus_musculus.GRCm39.dna.primary_assembly.fa.gz"
# miRDeep2 requires FASTA headers without whitespace, so keep only the first token:
# gunzip -c GRCm39.fa.gz | cut -d ' ' -f1 > GRCm39.fa
genome_fa="GRCm39.fa"
# mapper.pl maps with Bowtie (v1) and needs the index prefix:
# bowtie-build "$genome_fa" GRCm39
genome_index="GRCm39"

# Known mature and precursor (hairpin) miRNAs from miRBase
# Download from miRBase: https://www.mirbase.org/ftp/CURRENT/
# wget "ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz" && gunzip mature.fa.gz
# wget "ftp://mirbase.org/pub/mirbase/CURRENT/hairpin.fa.gz" && gunzip hairpin.fa.gz
# These files cover all species; keep only mouse (mmu) entries with the
# extract_miRNAs.pl helper shipped with miRDeep2:
# extract_miRNAs.pl mature.fa mmu > mature_mmu.fa
# extract_miRNAs.pl hairpin.fa mmu > hairpin_mmu.fa
mature_mirnas_fa="mature_mmu.fa"
precursor_mirnas_fa="hairpin_mmu.fa"

# 3' adapter: the adapter used for GSE136911 is not given here. This is the
# Illumina TruSeq Small RNA 3' adapter (RA3); confirm it for your library prep.
adapter="TGGAATTCTCGGGTGCCAAGG"

# --- Step 1a: trim, collapse, and map reads with mapper.pl ---
# (miRDeep2.pl does not read FASTQ or trim adapters itself.)
# -e: input is FASTQ              -h: convert reads to FASTA
# -i: convert RNA to DNA alphabet -j: drop reads with non-canonical letters
# -k: clip this 3' adapter        -l 18: discard reads shorter than 18 nt
# -m: collapse identical reads    -p: map to this Bowtie index
# -s: write collapsed reads       -t: write read-to-genome mappings (ARF)
# -v: print a progress report
mapper.pl "$reads_fastq" -e -h -i -j -k "$adapter" -l 18 -m \
    -p "$genome_index" \
    -s reads_collapsed.fa \
    -t reads_collapsed_vs_genome.arf \
    -v

# --- Step 1b: quantify known miRNAs and predict novel ones with miRDeep2.pl ---
# Positional arguments: collapsed reads, genome, read mappings, mature miRNAs of
# this species, mature miRNAs of related species ("none" here), precursors of
# this species.
# -t: species being analyzed
# -P: mature IDs use miRBase v18+ naming (-5p/-3p)
# Known-miRNA counts are written to miRNAs_expressed_all_samples_*.csv;
# novel predictions go to result_*.csv and result_*.html.
miRDeep2.pl reads_collapsed.fa "$genome_fa" reads_collapsed_vs_genome.arf \
    "$mature_mirnas_fa" none "$precursor_mirnas_fa" \
    -t Mouse -P \
    2> report.log

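Before normalization, the miRDeep2 counts need to become one value per mature miRNA. The quantification table appears to list one row per mature miRNA and precursor pair, so a miRNA encoded at several loci (e.g. let-7a) can appear more than once. The Python sketch below collapses such a table. It assumes a tab-separated layout with '#miRNA' and 'read_count' columns (check this against your own miRNAs_expressed_all_samples_*.csv), and it keeps the maximum count per miRNA on the assumption that the per-precursor rows report the same reads. The function name and example values are hypothetical.

```python
import csv
import io
from collections import defaultdict


def collapse_mirdeep2_counts(tsv_text):
    """Return {mature_miRNA: read_count} from a miRDeep2-style table.

    Assumption: tab-separated, header contains '#miRNA' and 'read_count',
    one row per mature-miRNA/precursor pair.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = defaultdict(float)
    for row in reader:
        mirna = row["#miRNA"]
        count = float(row["read_count"])
        # Assumption: rows for the same mature miRNA on different precursors
        # share the same reads, so take the maximum instead of summing.
        counts[mirna] = max(counts[mirna], count)
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical example table
    example = (
        "#miRNA\tread_count\tprecursor\n"
        "mmu-let-7a-5p\t120\tmmu-let-7a-1\n"
        "mmu-let-7a-5p\t120\tmmu-let-7a-2\n"
        "mmu-miR-21a-5p\t300\tmmu-mir-21a\n"
    )
    print(collapse_mirdeep2_counts(example))
    # {'mmu-let-7a-5p': 120.0, 'mmu-miR-21a-5p': 300.0}
```

Summing instead of taking the maximum would be the right choice only if your version of miRDeep2 assigns each read to a single precursor.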

2

The read counts were then manually normalized to FPKM values.
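FPKM scales each raw count by feature length and sequencing depth. For feature i in sample s, with C reads, length L in bp, and N total mapped reads in that sample, the formula and a worked example with hypothetical numbers (a 22 nt mature miRNA with 150 reads in a library of 5,000,000 mapped reads) are below. Which total the original authors used for N (all mapped reads or only miRNA reads) is not stated here.

```latex
\mathrm{FPKM}_{i,s} = \frac{C_{i,s} \times 10^{9}}{L_i \times N_s}
\qquad
\mathrm{FPKM} = \frac{150 \times 10^{9}}{22 \times 5\,000\,000}
             = \frac{1.5 \times 10^{11}}{1.1 \times 10^{8}} \approx 1363.6
```

Because mature miRNAs are all roughly 20 to 24 nt long, the length term changes values only modestly compared with a plain per-million scaling.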

Custom Python script for FPKM calculation (inferred with models/gemini-2.5-flash)

$ Bash example

# This script calculates FPKM values from raw read counts, feature lengths, and
# per-sample totals of mapped reads. It is a placeholder for the custom script
# used for the "manual normalization"; the original script is not available.

# --- Write calculate_fpkm.py (placeholder implementation) ---
cat > calculate_fpkm.py << 'EOF'
#!/usr/bin/env python
"""Convert a raw count matrix to FPKM: count * 1e9 / (length_bp * total_mapped_reads)."""
import sys

import pandas as pd


def calculate_fpkm(counts_file, lengths_file, totals_file, output_file):
    # Raw counts: rows = features (e.g., miRNA IDs), columns = samples
    counts_df = pd.read_csv(counts_file, sep="\t", index_col=0)
    # Feature lengths in bp; expects a column named 'length_bp'
    lengths = pd.read_csv(lengths_file, sep="\t", index_col=0)["length_bp"]
    # Total mapped reads per sample; expects a column named 'total_mapped_reads'
    totals = pd.read_csv(totals_file, sep="\t", index_col=0)["total_mapped_reads"]

    # Keep features present in both tables and drop zero-length entries
    # (substituting a length of 1 would silently inflate their FPKM)
    common = counts_df.index.intersection(lengths.index)
    lengths = lengths.loc[common]
    lengths = lengths[lengths > 0]
    counts_df = counts_df.loc[lengths.index]

    missing = set(counts_df.columns) - set(totals.index)
    if missing:
        sys.exit(f"No total mapped reads for samples: {sorted(missing)}")

    # Divide each row by its feature length and each sample column by its own library size
    fpkm_df = (
        counts_df.mul(1e9)
        .div(lengths, axis=0)
        .div(totals[counts_df.columns], axis=1)
    )
    fpkm_df.to_csv(output_file, sep="\t")


if __name__ == "__main__":
    if len(sys.argv) != 5:
        sys.exit("Usage: python calculate_fpkm.py <counts_file> <lengths_file> <totals_file> <output_file>")
    calculate_fpkm(*sys.argv[1:5])
EOF

# Example input files (placeholders; generate them from the previous step):
# raw_counts.tsv: miRNA IDs and raw read counts per sample, e.g. assembled from
#   the miRDeep2 quantification output.
#   e.g., mirna_id\tWT_1\tKO_1\nmmu-miR-21a-5p\t100\t150
# mirna_lengths.tsv: feature lengths in bp (assumption: mature miRNA lengths
#   taken from the miRBase mature sequences).
#   e.g., mirna_id\tlength_bp\nmmu-miR-21a-5p\t22
# total_mapped_reads.tsv: total mapped reads for each sample.
#   e.g., sample\ttotal_mapped_reads\nWT_1\t5000000\nKO_1\t4800000

# Run the FPKM calculation
python calculate_fpkm.py raw_counts.tsv mirna_lengths.tsv total_mapped_reads.tsv fpkm_normalized_counts.tsv

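The FPKM step needs a length for every miRNA, which the source does not specify. A minimal Python sketch, assuming the length is the mature sequence length from a miRBase-style FASTA, is shown below; it reads the first token of each header as the ID and keeps only IDs with a given species prefix ('mmu-' by default). The function names are hypothetical, and writing the result as a 'mirna_id' / 'length_bp' table is one possible way to produce the lengths file used above.

```python
def mature_lengths_from_fasta(fasta_text, prefix="mmu-"):
    """Return {miRNA_id: length_nt} for FASTA records whose ID starts with prefix."""
    lengths = {}
    name, seq_parts = None, []
    # A trailing ">" acts as a sentinel header that flushes the last record
    for line in fasta_text.splitlines() + [">"]:
        line = line.strip()
        if line.startswith(">"):
            if name is not None and name.startswith(prefix):
                lengths[name] = len("".join(seq_parts))
            header = line[1:].split()
            name = header[0] if header else None
            seq_parts = []
        elif line:
            seq_parts.append(line)
    return lengths


def write_lengths_tsv(lengths, path):
    """Write a two-column table with 'mirna_id' and 'length_bp' headers."""
    with open(path, "w") as handle:
        handle.write("mirna_id\tlength_bp\n")
        for mirna, length in sorted(lengths.items()):
            handle.write(f"{mirna}\t{length}\n")


if __name__ == "__main__":
    # Hypothetical two-record FASTA; only the mouse entry is kept
    example = (
        ">mmu-let-7a-5p\n"
        "UGAGGUAGUAGGUUGUAUAGUU\n"
        ">hsa-let-7a-5p\n"
        "UGAGGUAGUAGGUUGUAUAGUU\n"
    )
    print(mature_lengths_from_fasta(example))
    # {'mmu-let-7a-5p': 22}
```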