GSE145480 Processing Pipeline

RNA-Seq, 11 steps

Publication

Zfp697 is an RNA-binding protein that regulates skeletal muscle inflammation and remodeling.

Proceedings of the National Academy of Sciences of the United States of America (2024) — PMID 39141348

Dataset

GSE145480

Rodents as models of human sarcopenia: a comparative analysis reveals conserved modulators of aging-dependent muscle loss

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    3’ adapters and poly(A)/poly(T) tails were trimmed from the RNA-seq reads with Cutadapt v1.9.1 (Martin, EMBnet.journal, 2011).

    cutadapt v1.9.1
  2.

    The 3’ adapter sequences used to generate the RNA-seq libraries, and subsequently trimmed, were: mate 1 5’-AGATCGGAAGAGCACACGTC-3’, mate 2 5’-AGATCGGAAGAGCGTCGTGT-3’.

  3.

    Reads shorter than 30 nucleotides after trimming were discarded.

  4.

    kallisto v0.43.1 (Bray et al., Nature Biotechnology, 2016) was used to build the transcriptome index and to align (pseudoalign) the filtered reads.

    kallisto v0.43.1
  5.

    The transcriptome index was built with the default kallisto options.

    kallisto
  6.

    Filtered reads were aligned with the kallisto options “--rf-stranded” and “--pseudobam”.

  7.

    Mapped reads were assigned to transcripts in a weighted manner: if a read was uniquely mapped to a transcript, then the transcript’s read count was incremented by 1; if a read was mapped to n different transcripts, each transcript’s read count was incremented by 1/n.

  8.

    The expression of each transcript was estimated in transcripts per million (TPM) units by dividing its read count by the transcript length and normalizing to the library size.

  9.

    The expression of a gene was obtained by summing up the normalized expression of the transcripts associated with it.

  10.

    Transcript read counts were likewise summed per gene, and these gene-level counts were used for the differential expression analysis.

  11.

    The reference mouse transcriptome consisted of the sequences of protein-coding transcripts with transcript support level 1-3, based on genome assembly GRCm38 (release 92) and transcript annotations from the Ensembl database (Hubbard et al., Nucleic Acids Research, 2002).

    Ensembl
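
Steps 1 to 3 could be run as one paired-end Cutadapt call. The sketch below only builds the argument list; it is not the authors' command. The file names are hypothetical, and the choice of `-a`/`-A` for the two mates, `-m` for the 30 nt minimum length, and `-o`/`-p` for the outputs is an assumption about how the described trimming maps onto Cutadapt options. The poly(A)/poly(T) trimming options are not given in the source, so they are left out rather than guessed.

```python
import shlex

# 3' adapter sequences quoted in step 2.
ADAPTER_MATE1 = "AGATCGGAAGAGCACACGTC"
ADAPTER_MATE2 = "AGATCGGAAGAGCGTCGTGT"


def cutadapt_command(in_r1, in_r2, out_r1, out_r2, min_length=30):
    """Build a paired-end cutadapt argument list (file names are hypothetical)."""
    return [
        "cutadapt",
        "-a", ADAPTER_MATE1,    # 3' adapter to remove from mate 1
        "-A", ADAPTER_MATE2,    # 3' adapter to remove from mate 2
        "-m", str(min_length),  # minimum read length kept after trimming
        "-o", out_r1,           # trimmed mate 1 output
        "-p", out_r2,           # trimmed mate 2 output
        in_r1, in_r2,
    ]


cmd = cutadapt_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

On a machine with Cutadapt installed, the list can be passed to `subprocess.run(cmd, check=True)`.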
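
Steps 4 to 6 correspond to a `kallisto index` call with default options, followed by a `kallisto quant` call with the two options named in step 6. The sketch below assumes paired-end input; all paths and the helper names are hypothetical. Only `--rf-stranded` and `--pseudobam` come from the source.

```python
def kallisto_index_command(transcripts_fasta, index_path):
    """Build the index with default options (step 5)."""
    return ["kallisto", "index", "-i", index_path, transcripts_fasta]


def kallisto_quant_command(index_path, out_dir, reads_r1, reads_r2):
    """Quantify trimmed read pairs with the options named in step 6."""
    return [
        "kallisto", "quant",
        "-i", index_path,   # index produced by kallisto index
        "-o", out_dir,      # output directory for abundance files
        "--rf-stranded",    # strand-specific option quoted in the source
        "--pseudobam",      # pseudoalignment output option quoted in the source
        reads_r1, reads_r2,
    ]


index_cmd = kallisto_index_command("transcripts.fa", "transcripts.idx")
quant_cmd = kallisto_quant_command(
    "transcripts.idx", "kallisto_out",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
)
print(" ".join(index_cmd))
print(" ".join(quant_cmd))
```

How `--pseudobam` delivers its output depends on the kallisto version, so check the v0.43.1 help text before deciding whether to capture standard output or read a file.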
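
The arithmetic in steps 7 to 10 can be illustrated on toy data. This is a minimal sketch of the rules as worded above (1/n weighting, division by transcript length, scaling to one million, per-gene sums), not the authors' code and not kallisto's own abundance estimation. All identifiers, lengths, and function names are made up for illustration.

```python
from collections import defaultdict


def weighted_counts(read_hits):
    """Each read adds 1/n to every one of the n transcripts it maps to."""
    counts = defaultdict(float)
    for transcripts in read_hits:
        unique = set(transcripts)
        if not unique:
            continue  # unmapped read contributes nothing
        for tx in unique:
            counts[tx] += 1.0 / len(unique)
    return dict(counts)


def tpm(counts, lengths):
    """Divide counts by transcript length, then scale to sum to one million."""
    rates = {tx: counts[tx] / lengths[tx] for tx in counts}
    total = sum(rates.values())
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}


def sum_by_gene(values, tx_to_gene):
    """Sum transcript-level values (TPM or counts) over each gene."""
    per_gene = defaultdict(float)
    for tx, value in values.items():
        per_gene[tx_to_gene[tx]] += value
    return dict(per_gene)


# Toy data: three reads, three transcripts, two genes.
read_hits = [["tx1"], ["tx1", "tx2"], ["tx3"]]
lengths = {"tx1": 1000, "tx2": 500, "tx3": 2000}
tx_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

counts = weighted_counts(read_hits)                       # step 7
gene_counts = sum_by_gene(counts, tx_to_gene)             # step 10
gene_tpm = sum_by_gene(tpm(counts, lengths), tx_to_gene)  # steps 8 and 9
print(gene_counts)
print(gene_tpm)
```

Here the read shared by tx1 and tx2 contributes 0.5 to each, so geneA receives a count of 2 and geneB a count of 1, while the gene-level TPM values still sum to one million.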
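
The transcript selection in step 11 can be sketched as a filter on annotation attributes. The source does not say which file the authors filtered, so this example assumes an Ensembl-style GTF whose attribute column carries `transcript_biotype` and `transcript_support_level` keys. The function names and example identifiers are hypothetical.

```python
import re

# Matches key "value" pairs in a GTF attribute column.
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(attribute_field):
    """Turn a GTF attribute column into a dict of key -> value."""
    return dict(ATTRIBUTE.findall(attribute_field))


def keep_transcript(attribute_field, max_tsl=3):
    """True for protein-coding transcripts with support level 1..max_tsl."""
    attrs = parse_attributes(attribute_field)
    if attrs.get("transcript_biotype") != "protein_coding":
        return False
    # Keep only the leading number; values may carry a trailing note.
    tsl = attrs.get("transcript_support_level", "").split(" ")[0]
    return tsl.isdigit() and 1 <= int(tsl) <= max_tsl


example = (
    'gene_id "G1"; transcript_id "T1"; '
    'transcript_biotype "protein_coding"; transcript_support_level "2";'
)
print(keep_transcript(example))
```

Transcripts with a missing or non-numeric support level are excluded by this sketch; whether the authors treated them the same way is not stated.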

Tools Used

cutadapt v1.9.1, kallisto v0.43.1, Ensembl

Raw Source Text
RNA-seq reads were subjected to 3’ adapter and poly(A)/poly(T) tail trimming using Cutadapt v1.9.1 from Martin et al., EMBnet.journal, 2011. The following 3’ adapter sequences were utilized for generating RNA-Seq libraries and subsequently trimmed: mate1 5’-AGATCGGAAGAGCACACGTC-3’, mate2 5’-AGATCGGAAGAGCGTCGTGT-3’.
Reads shorter than 30 nucleotides were discarded.
The kallisto v0.43.1 software from Bray et al., Nature Biotechnology, 2016 was used for building transcriptome index and aligning filtered reads. The default options of kallisto were utilized for building transcriptome index. For aligning filtered reads we used options “--rf-stranded” and “--pseudobam”.
Mapped reads were assigned to transcripts in a weighted manner: if a read was uniquely mapped to a transcript, then the transcript’s read count was incremented by 1; if a read was mapped to n different transcripts, each transcript’s read count was incremented by 1/n.
The expression of each transcript was estimated in transcripts per million (TPM) units by dividing its read count by the transcript length and normalizing to the library size. The expression of a gene was obtained by summing up the normalized expression of the transcripts associated with it. For every gene, read counts of transcripts associated with this gene were also summed up and further used for the differential expression analysis.
As the reference mouse transcriptome, we considered sequences of protein coding transcripts with the support level 1-3 based on genome assembly GRCm38 (release 92) and transcript annotations from Ensembl database (see Hubbard et al., Nucleic Acids Research, 2002).
Genome_build: GRCm38
Supplementary_files_format_and_content: counts_mouse_aging_timecourse.txt: Tab-delimited text file includes raw counts at the gene level for each sample.
Supplementary_files_format_and_content: tpms_mouse_aging_timecourse.txt: Tab-delimited text file includes TPM values at the gene level for each sample.
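
The two supplementary tables are described only as tab-delimited, gene-level files, so their exact column layout is an assumption here: a header row of sample names and the gene identifier in the first column. Under that assumption, a minimal reader using only the standard library might look like this; the demo content is made up and stands in for counts_mouse_aging_timecourse.txt.

```python
import csv
import io


def read_gene_table(handle):
    """Read a tab-delimited table: header row of samples, first column = gene."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = dict(zip(samples, map(float, row[1:])))
    return samples, table


# Made-up two-sample example with the assumed layout.
demo = io.StringIO("gene\tsample1\tsample2\ngeneA\t10\t12.5\ngeneB\t0\t3\n")
samples, table = read_gene_table(demo)
print(samples, table["geneA"])
```

For the real files, replace the `StringIO` object with `open(path, newline="")` and confirm the header layout first.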