GSE125490 Processing Pipeline

RNA-Seq, 2 steps

Publication

Glial cells maintain synapses by inhibiting an activity-dependent retrograde protease signal.

PLoS Genetics (2019), PMID 30870413

Dataset

GSE125490

Glial cells maintain synapses by inhibiting an activity-dependent retrograde protease signal

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Sequenced reads were trimmed for adaptor sequence and masked for low-complexity or low-quality sequence, then mapped to the mm9 whole genome using the STAR aligner with default settings.

    $ Bash example
    # Install STAR (e.g., using conda)
    # conda install -c bioconda star
    
    # --- Reference Genome Setup (mm9) ---
    # Download mm9 genome fasta (example from UCSC)
    # wget https://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/mm9.fa.gz
    # gunzip mm9.fa.gz
    #
    # Define directory for STAR index
    # GENOME_DIR="mm9_star_index"
    #
    # Create STAR index for mm9 (adjust --runThreadN based on available cores)
    # STAR --runMode genomeGenerate \
    #      --genomeDir ${GENOME_DIR} \
    #      --genomeFastaFiles mm9.fa \
    #      --runThreadN 8
    
    # --- Pre-processing (optional placeholder) ---
    # The source does not name the tool used to trim adaptors and mask
    # low-complexity or low-quality sequence; fastp is shown only as one option.
    # Note that fastp trims or filters such reads rather than masking them.
    # Single-end example (fastp auto-detects adaptors for single-end input):
    # fastp -i raw_reads.fastq.gz -o input_reads.fastq.gz \
    #       --low_complexity_filter --qualified_quality_phred 15 --length_required 30
    
    # --- Alignment Step ---
    # Define input and output variables
    READS="input_reads.fastq.gz" # Replace with your trimmed FASTQ file
    OUTPUT_PREFIX="aligned_mm9_" # Prefix for output files (e.g., aligned_mm9_Aligned.sortedByCoord.out.bam)
    GENOME_DIR="mm9_star_index" # Path to your pre-built STAR index for mm9
    
    # Run STAR alignment to the mm9 whole genome with otherwise default settings
    # --readFilesCommand zcat is needed only because the input is gzip-compressed
    # --runThreadN is an example; adjust based on available cores
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READS}" \
         --readFilesCommand zcat \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outSAMtype BAM SortedByCoordinate \
         --runThreadN 8
  2. Read counts for each gene were generated and then used to compute RPKM values.

    RSEM v1.3.3 (inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install RSEM (if not already installed)
    # conda install -c bioconda rsem
    
    # --- Placeholder for reference data ---
    # Replace with actual paths to your genome FASTA and gene annotation GTF.
    # This dataset uses the mouse mm9 genome build; the gene annotation is not stated in the source.
    # Example:
    # GENOME_FASTA="/path/to/mouse/mm9.fa"
    # GENE_GTF="/path/to/mouse/mm9_genes.gtf"
    # RSEM_INDEX_PREFIX="/path/to/rsem_index/mm9"
    
    # --- Build RSEM index (run once per reference genome/annotation combination) ---
    # This step prepares the reference files for RSEM quantification.
    # rsem-prepare-reference --gtf "${GENE_GTF}" "${GENOME_FASTA}" "${RSEM_INDEX_PREFIX}"
    
    # --- Actual RSEM calculation for gene counts and FPKM/RPKM values ---
    # INPUT_BAM: Alignment file (BAM) for a given sample. RSEM expects alignments in
    #   transcriptome coordinates (e.g., STAR run with --quantMode TranscriptomeSAM on
    #   an index built with annotation), not a genome-coordinate sorted BAM.
    # RSEM_INDEX_PREFIX: Path to the pre-built RSEM index.
    # OUTPUT_PREFIX: Base name for output files (e.g., sample_name).
    # RSEM will generate files like:
    #   - ${OUTPUT_PREFIX}.genes.results (contains gene-level counts, FPKM, TPM)
    #   - ${OUTPUT_PREFIX}.isoforms.results (contains isoform-level counts, FPKM, TPM)
    #   - ${OUTPUT_PREFIX}.stat/ (directory of model statistics)
    
    INPUT_BAM="path/to/your_sample_Aligned.toTranscriptome.out.bam"
    RSEM_INDEX_PREFIX="path/to/your_rsem_index" # e.g., /path/to/rsem_index/mm9
    OUTPUT_PREFIX="your_sample_quantification"
    
    # Library layout and strandedness are not stated in the source. This call assumes
    # single-end, unstranded reads (add --paired-end for paired-end data).
    rsem-calculate-expression \
        --alignments \
        --num-threads 8 \
        "${INPUT_BAM}" \
        "${RSEM_INDEX_PREFIX}" \
        "${OUTPUT_PREFIX}"
    
    # The RPKM values will be found in the 'FPKM' column (column 7) of the ${OUTPUT_PREFIX}.genes.results file.
    # For single-end data, FPKM is equivalent to RPKM. For paired-end data, FPKM counts fragments (read pairs) rather than individual reads.
    # Example of how to extract gene IDs and RPKM for all genes as tab-delimited text:
    # awk 'BEGIN {OFS="\t"} NR > 1 {print $1, $7}' "${OUTPUT_PREFIX}.genes.results" > "${OUTPUT_PREFIX}.genes.rpkm.tsv"
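
Step 2 says only that per-gene read counts were used to compute RPKM; the counting tool and the exact normalization are not given. As a rough sketch of that arithmetic, the block below applies the standard definition RPKM = 10^9 × C / (N × L), where C is the read count for a gene, L is its length in bp, and N is the total number of mapped reads. The table `toy_counts.tsv`, its three genes, the function name `counts_to_rpkm`, and the choice to take N as the sum of the count column are all assumptions made for illustration, not values or methods from the dataset.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Made-up table: gene ID, gene length in bp, read count (tab-delimited, with header).
printf 'gene_id\tlength\tcount\n'  > toy_counts.tsv
printf 'geneA\t1000\t100\n'       >> toy_counts.tsv
printf 'geneB\t2000\t300\n'       >> toy_counts.tsv
printf 'geneC\t500\t600\n'        >> toy_counts.tsv

# RPKM = 1e9 * count / (total_counts * length).
# The file is read twice: pass 1 sums the counts, pass 2 prints one RPKM per gene.
# Assumption: total mapped reads N is approximated by the sum of the count column.
counts_to_rpkm() {
  awk 'BEGIN { FS = "\t" }
       NR == FNR { if (FNR > 1) total += $3; next }
       FNR == 1  { printf "%s\trpkm\n", $1; next }
       { printf "%s\t%.2f\n", $1, ($3 * 1e9) / (total * $2) }' "$1" "$1"
}

counts_to_rpkm toy_counts.tsv
# gene_id  rpkm
# geneA    100000.00
# geneB    150000.00
# geneC    1200000.00
```

In a real analysis N would more likely come from the aligner's mapped-read total than from the count table, which changes every value by the same factor.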

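Step 1 mentions masking low-quality sequence without naming a tool. One plausible reading is that bases whose Phred score falls below a cutoff are replaced with N, so each read keeps its length. The sketch below shows that idea with awk on a made-up one-read FASTQ; the file name `toy.fastq`, the function `mask_low_quality`, the Phred+33 encoding, and the cutoff of 20 are assumptions, and this is not the authors' actual pre-processing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Made-up single read (Phred+33): 'I' encodes Q40, '#' encodes Q2.
printf '@read1\nACGTACGT\n+\nIIII##II\n' > toy.fastq

# Replace every base whose quality is below MIN_Q with N.
# Usage: mask_low_quality MIN_Q FASTQ
mask_low_quality() {
  awk -v min_q="$1" '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
    NR % 4 == 1 { print; next }        # header line
    NR % 4 == 2 { seq = $0; next }     # hold the sequence until its qualities arrive
    NR % 4 == 3 { next }               # separator line, re-emitted below
    {
      masked = ""
      for (i = 1; i <= length(seq); i++) {
        q = ord[substr($0, i, 1)] - 33
        masked = masked ((q < min_q) ? "N" : substr(seq, i, 1))
      }
      print masked; print "+"; print $0
    }
  ' "$2"
}

mask_low_quality 20 toy.fastq
# @read1
# ACGTNNGT
# +
# IIII##II
```

Masking keeps read length and coordinates intact, whereas trimming shortens the read; which of the two the original pipeline applied to low-quality bases is not recoverable from the source text.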
Raw Source Text
Sequenced reads were trimmed for adaptor sequence, and masked for low-complexity or low-quality sequence, then mapped to mm9 whole genome using STAR aligner with the default settings
Reads counts for each gene were generated and were further used to compute the rpkm values
Genome_build: mm9
Supplementary_files_format_and_content: tab-delimited text files include RPKM values for each Sample ...