GSE125490 Processing Pipeline
RNA-Seq
2 steps
Publication
Glial cells maintain synapses by inhibiting an activity-dependent retrograde protease signal. PLoS Genetics (2019), PMID 30870413
Dataset
GSE125490: Glial cells maintain synapses by inhibiting an activity-dependent retrograde protease signal
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Sequenced reads were trimmed for adaptor sequence and masked for low-complexity or low-quality sequence, then mapped to the mm9 whole genome using the STAR aligner with default settings.
$ Bash example
# Install STAR (e.g., using conda)
# conda install -c bioconda star

# --- Reference Genome Setup (mm9) ---
# Download mm9 genome fasta (example from UCSC)
# wget https://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/mm9.fa.gz
# gunzip mm9.fa.gz
#
# Define directory for STAR index
# GENOME_DIR="mm9_star_index"
#
# Create STAR index for mm9 (adjust --runThreadN based on available cores)
# mkdir -p "${GENOME_DIR}"
# STAR --runMode genomeGenerate \
#      --genomeDir "${GENOME_DIR}" \
#      --genomeFastaFiles mm9.fa \
#      --runThreadN 8

# --- Pre-processing Step (placeholder) ---
# The source does not name the tool used to trim adaptors and mask
# low-complexity/low-quality sequence. fastp is shown only as one possible
# substitute; note that it trims and filters reads rather than masking bases.
# RAW_READS="raw_reads.fastq.gz"        # Hypothetical raw input
# TRIMMED_READS="input_reads.fastq.gz"  # Hypothetical trimmed output
# fastp -i "${RAW_READS}" -o "${TRIMMED_READS}" \
#       --trim_poly_g --trim_poly_x \
#       --low_complexity_filter \
#       --qualified_quality_phred 15 \
#       --length_required 30

# --- Alignment Step ---
# Define input and output variables
READS="input_reads.fastq.gz"    # Replace with your actual (trimmed) FASTQ file
OUTPUT_PREFIX="aligned_mm9_"    # Prefix for output files
GENOME_DIR="mm9_star_index"     # Path to your pre-built STAR index for mm9

# Run STAR alignment to the mm9 whole genome.
# Alignment parameters are left at their defaults, as the source states.
# --readFilesCommand zcat is needed because the input is gzip-compressed.
# --outSAMtype and --runThreadN are convenience choices, not from the source;
# adjust --runThreadN based on available cores.
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READS}" \
     --readFilesCommand zcat \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outSAMtype BAM SortedByCoordinate \
     --runThreadN 8
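The "masked for low-quality sequence" part of step 1 can be illustrated on a toy file. The following is only a minimal sketch of what base-level quality masking means, assuming Phred+33 encoding; the threshold (MIN_Q=15), the file names, and the two reads are hypothetical, and the source does not say which tool or cutoff was actually used. For real data, a dedicated pre-processing tool is the safer choice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical two-read FASTQ (Phred+33): '#' encodes Q2, 'I' encodes Q40.
printf '%s\n' \
    '@read1' 'ACGTACGT' '+' 'IIII##II' \
    '@read2' 'TTTTGGGG' '+' '#IIIIII#' > toy_reads.fastq

MIN_Q=15   # Assumed threshold; the source does not state one

# Replace every base whose quality is below MIN_Q with 'N'.
awk -v min_q="$MIN_Q" '
    # Lookup table from printable character to its ASCII code
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 1 { print; next }        # header line: pass through
    NR % 4 == 2 { seq = $0; next }     # sequence line: hold until quality is read
    NR % 4 == 0 {                      # quality line: mask, then emit the record
        masked = ""
        for (i = 1; i <= length(seq); i++) {
            q = ord[substr($0, i, 1)] - 33
            masked = masked (q < min_q ? "N" : substr(seq, i, 1))
        }
        print masked
        print plus
        print $0
        next
    }
    { plus = $0 }                      # separator line ("+")
' toy_reads.fastq > toy_reads.masked.fastq

cat toy_reads.masked.fastq
```

Masking keeps read length and coordinates intact, whereas trimming shortens the read; which of the two applies to a given base depends on the tool, which is not named in the source.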
2
Read counts for each gene were generated and then used to compute RPKM values.
$ Bash example
# The source does not name the quantification tool; RSEM is shown here as
# one possible choice, not as the authors' method.
# Install RSEM (if not already installed)
# conda install -c bioconda rsem

# --- Placeholder for reference data ---
# Replace with actual paths to your mm9 genome FASTA and an mm9-compatible
# gene annotation GTF (chromosome names must match the FASTA).
# Example:
# GENOME_FASTA="/path/to/mouse/mm9.fa"
# GENE_GTF="/path/to/mouse/mm9_genes.gtf"
# RSEM_INDEX_PREFIX="/path/to/rsem_index/mm9"

# --- Build RSEM index (run once per reference genome/annotation combination) ---
# This step prepares the reference files for RSEM quantification.
# rsem-prepare-reference --gtf "${GENE_GTF}" "${GENOME_FASTA}" "${RSEM_INDEX_PREFIX}"

# --- RSEM calculation for gene counts and FPKM/RPKM values ---
# INPUT_BAM: alignments in transcriptome coordinates for a given sample.
#   RSEM cannot use a genome-coordinate BAM. With STAR, add
#   --quantMode TranscriptomeSAM (and an annotation via --sjdbGTFfile) to the
#   alignment command to obtain <prefix>Aligned.toTranscriptome.out.bam.
# RSEM_INDEX_PREFIX: Path to the pre-built RSEM index.
# OUTPUT_PREFIX: Base name for output files (e.g., sample_name).
# RSEM will generate files like:
# - ${OUTPUT_PREFIX}.genes.results (gene-level expected counts, TPM, FPKM)
# - ${OUTPUT_PREFIX}.isoforms.results (isoform-level expected counts, TPM, FPKM)
# - ${OUTPUT_PREFIX}.stat (directory with model statistics)
INPUT_BAM="path/to/your_sample.Aligned.toTranscriptome.out.bam"
RSEM_INDEX_PREFIX="path/to/your_rsem_index"   # e.g., /path/to/rsem_index/mm9
OUTPUT_PREFIX="your_sample_quantification"

# --alignments is the RSEM 1.3+ flag for BAM/SAM input (older releases used --bam).
# Add --paired-end only if the library is paired-end; the source does not say.
rsem-calculate-expression \
    --alignments \
    --num-threads 8 \
    "${INPUT_BAM}" \
    "${RSEM_INDEX_PREFIX}" \
    "${OUTPUT_PREFIX}"

# The RPKM values will be found in the 'FPKM' column (column 7) of the
# ${OUTPUT_PREFIX}.genes.results file.
# For single-end data, FPKM is equivalent to RPKM. For paired-end data, FPKM
# refers to Fragments Per Kilobase per Million mapped fragments.
# Example of how to extract RPKM for all genes:
# awk -F'\t' -v OFS='\t' 'NR > 1 {print $1, $7}' "${OUTPUT_PREFIX}.genes.results" > "${OUTPUT_PREFIX}.genes.rpkm.tsv"
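The count-to-RPKM conversion in step 2 can also be written out directly: RPKM = count x 10^9 / (gene length in bp x total reads). The following is a minimal sketch under stated assumptions, not the authors' procedure. The file names, gene lengths, and counts are hypothetical toy values, and the denominator is taken to be the sum of counted reads; some pipelines use the total number of mapped reads instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy table: gene ID, gene length in bp, raw read count.
# These numbers are made up for illustration; they are not from GSE125490.
printf '%s\t%s\t%s\n' \
    gene_id length_bp count \
    geneA   2000      100 \
    geneB   1000      300 \
    geneC   4000      600 > toy_counts.tsv

# RPKM = count * 1e9 / (length_bp * total_counted_reads)
# The file is read twice: pass 1 sums the counts, pass 2 applies the formula.
awk -F'\t' -v OFS='\t' '
    NR == FNR { if (FNR > 1) total += $3; next }    # pass 1: total reads
    FNR == 1  { print "gene_id", "rpkm"; next }     # pass 2: header
    { printf "%s\t%.2f\n", $1, ($3 * 1e9) / ($2 * total) }
' toy_counts.tsv toy_counts.tsv > toy_rpkm.tsv

cat toy_rpkm.tsv
```

With only three genes and 1,000 reads in total, the toy values come out far larger than typical RPKMs. The point is the arithmetic: geneB has fewer reads than geneC but a higher RPKM because it is four times shorter.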
Raw Source Text
Sequenced reads were trimmed for adaptor sequence, and masked for low-complexity or low-quality sequence, then mapped to mm9 whole genome using STAR aligner with the default settings
Reads counts for each gene were generated and were further used to compute the rpkm values
Genome_build: mm9
Supplementary_files_format_and_content: tab-delimited text files include RPKM values for each Sample ...