GSE145480 Processing Pipeline

RNA-Seq, 11 steps

Publication

Zfp697 is an RNA-binding protein that regulates skeletal muscle inflammation and remodeling.

Proceedings of the National Academy of Sciences of the United States of America (2024) — PMID 39141348

Dataset

Rodents as models of human sarcopenia: a comparative analysis reveals conserved modulators of aging-dependent muscle loss

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

1

RNA-seq reads were subjected to 3′ adapter and poly(A)/poly(T) tail trimming using Cutadapt v1.9.1 (Martin, EMBnet.journal, 2011).

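The poly(A)/poly(T) part of this step can be illustrated with a minimal Python sketch. It is not Cutadapt's algorithm: it assumes exact homopolymer matches at the 3′ end only, and the 10-nt minimum run length (`min_run`) is a hypothetical threshold, since the source does not give the settings that were used.

```python
def trim_poly_tail(seq: str, base: str = "A", min_run: int = 10) -> str:
    """Strip a trailing run of a single nucleotide `base` (e.g. a poly(A) tail).

    Simplified illustration: exact matches only, 3' end only. The run is
    removed only if it is at least `min_run` bases long (hypothetical default).
    """
    stripped = seq.rstrip(base)
    run_length = len(seq) - len(stripped)
    return stripped if run_length >= min_run else seq


if __name__ == "__main__":
    print(trim_poly_tail("ACGTTGCC" + "A" * 15))           # ACGTTGCC
    print(trim_poly_tail("GGCATCC" + "T" * 12, base="T"))  # GGCATCC
    print(trim_poly_tail("ACGTAAA"))                       # ACGTAAA (run too short)
```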
2

The following 3′ adapter sequences were used to generate the RNA-seq libraries and were subsequently trimmed: mate 1, 5′-AGATCGGAAGAGCACACGTC-3′; mate 2, 5′-AGATCGGAAGAGCGTCGTGT-3′.

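The 3′ adapter-trimming rule can be sketched in Python using the two adapter sequences listed above. This is a simplification of what Cutadapt does: only exact matches are accepted (Cutadapt tolerates a 10% error rate by default), and a partial adapter at the 3′ end must overlap the read by at least 3 bases, mirroring Cutadapt's default minimum overlap. The function and constant names are illustrative.

```python
MATE1_ADAPTER = "AGATCGGAAGAGCACACGTC"
MATE2_ADAPTER = "AGATCGGAAGAGCGTCGTGT"


def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Cut `read` at the leftmost exact occurrence of a 3' adapter.

    A full adapter may occur anywhere in the read; a partial adapter (a prefix
    of at least `min_overlap` bases) is only recognized at the 3' end.
    Unlike Cutadapt, mismatches and indels are not tolerated.
    """
    full = read.find(adapter)
    if full != -1:
        return read[:full]  # drop the adapter and everything after it
    # Try the longest partial overlap first, so the cut lands as far left as possible.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[: len(read) - overlap]
    return read


if __name__ == "__main__":
    insert = "TTGACCGATG"
    print(trim_3prime_adapter(insert + MATE1_ADAPTER + "GGGG", MATE1_ADAPTER))  # TTGACCGATG
    print(trim_3prime_adapter(insert + "AGATCGG", MATE1_ADAPTER))               # TTGACCGATG
    print(trim_3prime_adapter(insert + MATE2_ADAPTER[:5], MATE2_ADAPTER))       # TTGACCGATG
```

As with Cutadapt's default, a short overlap means a read that happens to end in the first three adapter bases loses them; raising `min_overlap` trades that for missing genuinely short adapter remnants.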
3

Reads shorter than 30 nucleotides were discarded.
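Steps 1 to 3 together correspond roughly to a single paired-end Cutadapt invocation. The Python sketch below only assembles the argument list: `-a`/`-A` for the mate 1/mate 2 3′ adapters, `-m` for the 30-nt minimum length, and `-o`/`-p` for the two output files. File names are hypothetical, and poly(A)/poly(T) options are omitted because the source does not state how they were specified.

```python
import shlex

MATE1_ADAPTER = "AGATCGGAAGAGCACACGTC"
MATE2_ADAPTER = "AGATCGGAAGAGCGTCGTGT"


def cutadapt_command(r1_in: str, r2_in: str, r1_out: str, r2_out: str,
                     min_length: int = 30) -> list:
    """Assemble a paired-end Cutadapt call for adapter trimming plus length filtering."""
    return [
        "cutadapt",
        "-a", MATE1_ADAPTER,    # 3' adapter on mate 1
        "-A", MATE2_ADAPTER,    # 3' adapter on mate 2
        "-m", str(min_length),  # minimum read length after trimming
        "-o", r1_out,           # trimmed mate 1 output
        "-p", r2_out,           # trimmed mate 2 output
        r1_in, r2_in,
    ]


if __name__ == "__main__":
    cmd = cutadapt_command("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                           "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz")
    print(shlex.join(cmd))  # a shell-ready command line; nothing is executed here
```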
4

kallisto v0.43.1 (Bray et al., Nature Biotechnology, 2016) was used to build the transcriptome index and to pseudoalign the filtered reads.

5

The transcriptome index was built with kallisto's default options.

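Building the index with default options amounts to a single `kallisto index` call, sketched here as a Python argument list. The FASTA and index file names are hypothetical; with no `-k` flag, kallisto uses its default k-mer length of 31.

```python
def kallisto_index_command(fasta_path: str, index_path: str) -> list:
    """Build a `kallisto index` call that relies on default options (k = 31)."""
    return ["kallisto", "index", "-i", index_path, fasta_path]


if __name__ == "__main__":
    import shlex
    print(shlex.join(kallisto_index_command("reference_transcripts.fa", "transcripts.idx")))
    # kallisto index -i transcripts.idx reference_transcripts.fa
```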
6

To pseudoalign the filtered reads, we used the options "--rf-stranded" and "--pseudobam".
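The quantification step can be sketched as a Python argument list with hypothetical paths. Per kallisto's help text, `--rf-stranded` declares strand-specific reads in which the first read comes from the reverse strand. In the 0.43.x series, `--pseudobam` wrote SAM records to standard output (later releases changed this behavior), so the sketch redirects stdout to a file; this should be checked against the installed version.

```python
import subprocess


def kallisto_quant_command(index_path: str, out_dir: str, r1: str, r2: str) -> list:
    """Build a paired-end `kallisto quant` call with the options named in the text."""
    return [
        "kallisto", "quant",
        "-i", index_path,
        "-o", out_dir,
        "--rf-stranded",  # strand-specific library, first read reverse
        "--pseudobam",    # also emit pseudoalignments (SAM on stdout in v0.43.x)
        r1, r2,
    ]


def run_quant(index_path: str, out_dir: str, r1: str, r2: str, sam_path: str) -> None:
    """Run kallisto, capturing the --pseudobam SAM stream in `sam_path` (assumes v0.43.x)."""
    cmd = kallisto_quant_command(index_path, out_dir, r1, r2)
    with open(sam_path, "w") as sam:
        subprocess.run(cmd, stdout=sam, check=True)
```

For example, `run_quant("transcripts.idx", "quant_sample1", "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz", "sample1.sam")`, where every name is a placeholder.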
7

Mapped reads were assigned to transcripts with fractional weights: if a read mapped uniquely to one transcript, that transcript's read count was incremented by 1; if a read mapped to n different transcripts, the read count of each of those transcripts was incremented by 1/n.
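The 1/n weighting rule can be written as a short Python function. It takes, for each read, the list of transcripts the read mapped to (for example, as parsed from the pseudoalignment output; that parsing is not shown and is an assumption). The sketch implements the rule exactly as stated; note that kallisto's own `quant` estimates abundances with an EM algorithm rather than this fixed 1/n split.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def fractional_counts(read_hits: Iterable[List[str]]) -> Dict[str, float]:
    """Assign each read to its transcripts with weight 1/n (n = number of distinct hits)."""
    counts: Dict[str, float] = defaultdict(float)
    for transcripts in read_hits:
        targets = set(transcripts)  # count each transcript once per read
        if not targets:
            continue  # unmapped read: contributes nothing
        weight = 1.0 / len(targets)
        for tx in targets:
            counts[tx] += weight
    return dict(counts)


if __name__ == "__main__":
    hits = [["T1"], ["T1", "T2"], ["T2", "T3"], []]
    print(sorted(fractional_counts(hits).items()))
    # [('T1', 1.5), ('T2', 1.0), ('T3', 0.5)]
```

Each mapped read contributes a total weight of exactly 1, so the counts sum to the number of mapped reads (3 in the example).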
8

The expression of each transcript was estimated in transcripts per million (TPM): its read count was divided by the transcript length, and these length-normalized values were scaled so that they sum to one million across all transcripts in the sample.
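The TPM calculation can be sketched as follows: each count is divided by its transcript length, and the resulting rates are rescaled to sum to 10^6 within the sample. Whether the authors used raw or effective (fragment-length-corrected) transcript lengths is not stated, so the `lengths` input is an assumption; kallisto's own TPM output is based on effective lengths.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million from per-transcript read counts and lengths."""
    rates = {tx: counts[tx] / lengths[tx] for tx in counts}  # reads per base
    total = sum(rates.values())
    if total == 0:
        return {tx: 0.0 for tx in rates}  # empty library: avoid dividing by zero
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}


if __name__ == "__main__":
    values = tpm({"A": 100, "B": 100}, {"A": 500, "B": 1500})
    print(values)  # A is approximately 750000, B approximately 250000
```

With equal counts, the shorter transcript receives three times the TPM of a transcript three times its length, which is the point of the length normalization.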
9

The expression of a gene was obtained by summing the TPM values of the transcripts associated with it.
10

For every gene, the read counts of its associated transcripts were also summed, and these gene-level counts were used for the differential expression analysis.
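Steps 9 and 10 apply the same transcript-to-gene summation to two quantities, TPM values and read counts. A minimal Python sketch follows; the transcript-to-gene mapping (`tx2gene`) would come from the Ensembl annotation, and the identifiers shown are placeholders. Because of the 1/n weighting, summed gene counts can be fractional; the source does not say whether they were rounded before differential expression analysis.

```python
from collections import defaultdict
from typing import Dict


def sum_by_gene(values: Dict[str, float], tx2gene: Dict[str, str]) -> Dict[str, float]:
    """Sum per-transcript values (TPM or counts) into per-gene totals."""
    totals: Dict[str, float] = defaultdict(float)
    for tx, value in values.items():
        totals[tx2gene[tx]] += value  # KeyError flags transcripts missing from the mapping
    return dict(totals)


if __name__ == "__main__":
    tx2gene = {"T1": "G1", "T2": "G1", "T3": "G2"}
    counts = {"T1": 1.5, "T2": 1.0, "T3": 0.5}
    print(sum_by_gene(counts, tx2gene))  # {'G1': 2.5, 'G2': 0.5}
```

Since TPM values sum to one million per sample, gene-level TPMs obtained this way still sum to one million.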
11

As the reference mouse transcriptome, we used the sequences of protein-coding transcripts with transcript support level 1-3, based on genome assembly GRCm38 (Ensembl release 92) and transcript annotations from the Ensembl database (Hubbard et al., Nucleic Acids Research, 2002).

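The source does not say how the protein-coding, support-level 1-3 transcripts were selected. One plausible approach, sketched below as an assumption, reads the Ensembl GTF annotation and keeps `transcript` records whose `transcript_biotype` is `protein_coding` and whose `transcript_support_level` is 1, 2 or 3; the selected IDs could then be used to subset the transcript FASTA before indexing. Ensembl support-level values sometimes carry a note in parentheses, which the sketch strips.

```python
import re
from typing import AbstractSet, Iterable, Set

# Matches GTF attribute pairs such as: transcript_id "T1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def select_reference_transcripts(
    gtf_lines: Iterable[str],
    allowed_tsl: AbstractSet[str] = frozenset({"1", "2", "3"}),
) -> Set[str]:
    """Return IDs of protein-coding transcripts with an allowed support level."""
    selected = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # one record per transcript; skip exon, CDS, etc.
        attrs = dict(ATTR_RE.findall(fields[8]))
        # A level may read "1 (assigned to previous version 5)"; keep the leading value.
        tsl = attrs.get("transcript_support_level", "NA").split(" ")[0]
        if attrs.get("transcript_biotype") == "protein_coding" and tsl in allowed_tsl:
            selected.add(attrs["transcript_id"])
    return selected

# Usage (file name is hypothetical):
#   import gzip
#   with gzip.open("annotation.gtf.gz", "rt") as fh:
#       keep = select_reference_transcripts(fh)
```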

Tools Used

Cutadapt v1.9.1, kallisto v0.43.1

Raw Source Text

RNA-seq reads were subjected to 3′ adapter and poly(A)/poly(T) tail trimming using Cutadapt v1.9.1 from Martin et al., EMBnet.journal, 2011. The following 3′ adapter sequences were utilized for generating RNA-Seq libraries and subsequently trimmed: mate1 5′-AGATCGGAAGAGCACACGTC-3′, mate2 5′-AGATCGGAAGAGCGTCGTGT-3′.
Reads shorter than 30 nucleotides were discarded.
The kallisto v0.43.1 software from Bray et al., Nature Biotechnology, 2016 was used for building transcriptome index and aligning filtered reads. The default options of kallisto were utilized for building transcriptome index. For aligning filtered reads we used options "--rf-stranded" and "--pseudobam".
Mapped reads were assigned to transcripts in a weighted manner: if a read was uniquely mapped to a transcript, then the transcript's read count was incremented by 1; if a read was mapped to n different transcripts, each transcript's read count was incremented by 1/n.
The expression of each transcript was estimated in transcripts per million (TPM) units by dividing its read count by the transcript length and normalizing to the library size. The expression of a gene was obtained by summing up the normalized expression of the transcripts associated with it. For every gene, read counts of transcripts associated with this gene were also summed up and further used for the differential expression analysis.
As the reference mouse transcriptome, we considered sequences of protein coding transcripts with the support level 1-3 based on genome assembly GRCm38 (release 92) and transcript annotations from Ensembl database (see Hubbard et al., Nucleic Acids Research, 2002).
Genome_build: GRCm38
Supplementary_files_format_and_content: counts_mouse_aging_timecourse.txt: Tab-delimited text file includes raw counts at the gene level for each sample.
Supplementary_files_format_and_content: tpms_mouse_aging_timecourse.txt: Tab-delimited text file includes TPM values at the gene level for each sample.
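A minimal sketch for loading either supplementary table with Python's standard `csv` module follows. The layout is an assumption not confirmed by the source: a header row whose first field labels the gene column and whose remaining fields are sample names, followed by one row per gene. If the files were written without a label for the gene column, the header handling would need adjusting.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def parse_gene_table(lines: Iterable[str]) -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Parse a tab-delimited gene-by-sample table into (samples, {gene: {sample: value}})."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # assumes the first header field labels the gene column
    table: Dict[str, Dict[str, float]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = {s: float(v) for s, v in zip(samples, row[1:])}
    return samples, table


def read_gene_table(path: str) -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Open a table file such as counts_mouse_aging_timecourse.txt and parse it."""
    with open(path, newline="") as fh:
        return parse_gene_table(fh)
```

For example, `samples, counts = read_gene_table("counts_mouse_aging_timecourse.txt")` would return the sample names and a per-gene dictionary of counts, provided the assumed layout holds.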
