GSE136911 Processing Pipeline
miRNA-Seq
2 steps
Publication
Suppression of Endothelial AGO1 Promotes Adipose Tissue Browning and Improves Metabolic Dysfunction. Circulation (2020), PMID 32393053
Dataset
GSE136911: small RNA-seq of subcutaneous adipose tissue from endothelial-AGO1-knockout (EC-AGO1-KO) mice and wild-type (WT) littermates fed 16 weeks of high fat …
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
miRDeep2 0.0.8 was used to count the number of microRNAs
$ Bash example
# Install miRDeep2 (example using conda; assumes the bioconda channel is available)
# conda create -n mirdeep2_env -c conda-forge -c bioconda mirdeep2
# conda activate mirdeep2_env
# Note: the source reports version "0.0.8". Check which package version this
# corresponds to (e.g., `conda search -c bioconda mirdeep2`) before pinning one.

# --- Placeholder input files (replace with your own) ---
# Raw small RNA-seq reads in FASTQ format
reads_fastq="reads.fastq"

# Reference genome in FASTA format. The source lists Genome_build: mm10 (mouse).
# miRDeep2 expects FASTA identifiers without whitespace.
genome_fa="mm10.fa"
genome_index="mm10"   # Bowtie (version 1) index prefix

# Known mature and precursor (hairpin) miRNAs from miRBase (https://www.mirbase.org/),
# restricted to mouse (mmu). extract_miRNAs.pl ships with miRDeep2.
# extract_miRNAs.pl mature.fa mmu > mmu_mature.fa
# extract_miRNAs.pl hairpin.fa mmu > mmu_precursor.fa
mature_mirnas_fa="mmu_mature.fa"
precursor_mirnas_fa="mmu_precursor.fa"

# 3' adapter sequence. The source does not state the adapter that was used;
# this Illumina TruSeq Small RNA 3' adapter is only a common example.
adapter="TGGAATTCTCGGGTGCCAAGGAACTCC"

# --- Build the Bowtie index for the genome (needed once) ---
bowtie-build "$genome_fa" "$genome_index"

# --- Preprocess and map reads with mapper.pl ---
# miRDeep2.pl does not trim or map reads itself; mapper.pl does this first.
# -e: input is FASTQ            -h: convert to FASTA
# -i: convert RNA to DNA letters -j: drop reads with non-canonical letters
# -k: clip the 3' adapter       -l: discard reads shorter than 18 nt
# -m: collapse identical reads  -p: Bowtie index to map against
# -s: processed reads output    -t: mapping output (.arf)
# -v: progress report
mapper.pl "$reads_fastq" -e -h -i -j \
  -k "$adapter" -l 18 -m \
  -p "$genome_index" \
  -s reads_collapsed.fa \
  -t reads_vs_genome.arf \
  -v

# --- Run miRDeep2.pl to quantify known miRNAs and predict novel ones ---
# Positional arguments: collapsed reads, genome, mappings, mature miRNAs of this
# species, mature miRNAs of related species ("none" if not used), precursors.
# -t: species name, used for UCSC browser links in the report
# Results are written to the working directory as timestamped files, including
# result_*.csv / result_*.html and miRNAs_expressed_all_samples_*.csv (read counts).
miRDeep2.pl reads_collapsed.fa "$genome_fa" reads_vs_genome.arf \
  "$mature_mirnas_fa" none "$precursor_mirnas_fa" \
  -t Mouse 2> report.log
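The counts produced in this step typically need to be pulled out of miRDeep2's tab-delimited expression table before any normalization. The sketch below shows one way to do that in Python. It assumes the table has a header line with `#miRNA`, `read_count`, and `precursor` columns, as in `miRNAs_expressed_all_samples_*.csv`; verify this against your own output. The rows in `EXAMPLE_TABLE` are made up for illustration and are not data from GSE136911. Keeping the maximum count when a mature miRNA is listed under several precursors is a choice made here, not something the source specifies.

```python
import csv
import io

# Made-up rows in the assumed column layout; not real data from GSE136911.
EXAMPLE_TABLE = (
    "#miRNA\tread_count\tprecursor\ttotal\tseq\tseq(norm)\n"
    "mmu-let-7a-5p\t1200.00\tmmu-let-7a-1\t1200.00\t1200.00\t5000.00\n"
    "mmu-let-7a-5p\t1180.00\tmmu-let-7a-2\t1180.00\t1180.00\t4916.67\n"
    "mmu-miR-21a-5p\t300.00\tmmu-mir-21a\t300.00\t300.00\t1250.00\n"
)


def read_mature_counts(handle):
    """Return {mature miRNA name: read count} from a tab-delimited table.

    A mature miRNA may appear once per precursor it maps to; this sketch
    keeps the largest count for each name (an assumption, see lead-in).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = {}
    for row in reader:
        name = row["#miRNA"]
        count = float(row["read_count"])
        counts[name] = max(counts.get(name, 0.0), count)
    return counts


counts = read_mature_counts(io.StringIO(EXAMPLE_TABLE))
for name, count in sorted(counts.items()):
    print(name, count)
```

For a real run, replace `io.StringIO(EXAMPLE_TABLE)` with an open file handle on the miRDeep2 output file.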
2
The numbers are then manually normalized into FPKM values
$ Bash example
# This example computes FPKM values from raw read counts, feature lengths, and total mapped reads.
# It is a placeholder for the "manual normalization": the source does not describe the
# script, the length definition, or the read totals that were actually used.

# --- Placeholder for calculate_fpkm.py content ---
# #!/usr/bin/env python
# import sys
# import pandas as pd
#
# def calculate_fpkm(counts_file, lengths_file, total_mapped_reads_file, output_file):
#     # Load counts (rows: miRNAs, columns: samples), e.g., assembled from miRDeep2 output
#     counts_df = pd.read_csv(counts_file, sep='\t', index_col=0)
#     # Load feature lengths in bp (column 'length_bp')
#     lengths_df = pd.read_csv(lengths_file, sep='\t', index_col=0)
#     # Load total mapped reads per sample (rows: samples, column 'total_mapped_reads')
#     totals_df = pd.read_csv(total_mapped_reads_file, sep='\t', index_col=0)
#
#     # Keep only features present in both the counts and lengths tables
#     common = counts_df.index.intersection(lengths_df.index)
#     counts_df = counts_df.loc[common]
#     lengths = lengths_df.loc[common, 'length_bp']
#     # Zero or negative lengths become NaN instead of causing division by zero
#     lengths = lengths.where(lengths > 0)
#
#     fpkm_df = pd.DataFrame(index=counts_df.index)
#     for sample in counts_df.columns:
#         total = float(totals_df.loc[sample, 'total_mapped_reads'])
#         # FPKM = (reads * 10^9) / (length_bp * total_mapped_reads)
#         fpkm_df[sample] = (counts_df[sample] * 1e9) / (lengths * total)
#     fpkm_df.to_csv(output_file, sep='\t')
#
# if __name__ == "__main__":
#     if len(sys.argv) != 5:
#         print("Usage: python calculate_fpkm.py <counts_file> <lengths_file> "
#               "<total_mapped_reads_file> <output_file>")
#         sys.exit(1)
#     calculate_fpkm(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4])
# ---------------------------------------------------

# Example input files (all names and contents are placeholders):
# raw_counts.tsv: tab-separated miRNA IDs and raw read counts per sample.
#   e.g., mirna_id\tsample1\tsample2\nmmu-miR-...\t100\t150
# lengths.tsv: tab-separated miRNA IDs and their lengths in base pairs.
#   e.g., mirna_id\tlength_bp\nmmu-miR-...\t22
# total_mapped_reads.tsv: tab-separated sample names and total mapped reads.
#   e.g., sample\ttotal_mapped_reads\nsample1\t50000000

# Make the custom FPKM script executable (assuming it is saved as calculate_fpkm.py)
# chmod +x calculate_fpkm.py

# Run the FPKM calculation
./calculate_fpkm.py raw_counts.tsv lengths.tsv total_mapped_reads.tsv fpkm_normalized_counts.tsv
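As a quick arithmetic check of the FPKM formula used in this step, the following minimal Python sketch computes a single value with the standard library only. The function name `fpkm` and the input numbers are hypothetical and chosen for illustration; the source does not say which length or which read total the authors used.

```python
def fpkm(read_count, length_bp, total_mapped_reads):
    """Fragments per kilobase of feature per million mapped reads.

    FPKM = read_count * 1e9 / (length_bp * total_mapped_reads)
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length_bp and total_mapped_reads must be positive")
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical numbers, not taken from GSE136911:
# 100 reads on a 22-nt mature miRNA in a library of 2,000,000 mapped reads.
value = fpkm(read_count=100, length_bp=22, total_mapped_reads=2_000_000)
print(round(value, 2))  # 2272.73
```

Because mature miRNAs are all roughly the same length, FPKM here is close to a constant multiple of reads per million (RPM): FPKM = RPM * 1000 / length_bp.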
Raw Source Text
miRDeep2 0.0.8 was used to count the number of microRNAs
The numbers are then manually normalized into FPKM values
Genome_build: mm10
Supplementary_files_format_and_content: tab-delimited text files include FPKM values