GSE277082 Processing Pipeline — Yeo Lab Publications

Publication

Neuronal aging causes mislocalization of splicing proteins and unchecked cellular stress.

Nature neuroscience (2025) — PMID 40456907

Dataset

Aging-linked deterioration of RNA metabolism destabilizes the stress response of neurons [RNASeq, RiboSeq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


1

Riboseq data was analyzed as described previously; see Tirosh et al.

Bowtie v1.1.1 (inferred with models/gemini-2.5-flash)

$ Bash example

# Install Bowtie
# conda install -c bioconda bowtie=1.1.1

# Install samtools (required for BAM conversion and sorting)
# conda install -c bioconda samtools

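# Define paths and files
# NOTE: sacCer3 (yeast) is a model-inferred placeholder; the dataset is neuronal,
# so substitute the reference genome and rRNA sequences that match your samples.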
GENOME_DIR="genome/sacCer3"
GENOME_FA="${GENOME_DIR}/sacCer3.fa"
GENOME_INDEX_PREFIX="${GENOME_DIR}/sacCer3_index"
RRNA_FA="${GENOME_DIR}/sacCer3_rRNA.fa" # Placeholder for rRNA sequences
RRNA_INDEX_PREFIX="${GENOME_DIR}/sacCer3_rRNA_index"

INPUT_FASTQ="riboseq_reads.fastq" # Replace with actual input file
OUTPUT_FASTQ_GENOME="riboseq_rRNA_unmapped.fastq"
OUTPUT_SAM_GENOME="riboseq_genome_mapped.sam"
OUTPUT_BAM_GENOME="riboseq_genome_mapped.bam"
OUTPUT_SORTED_BAM="riboseq_genome_mapped.sorted.bam"
OUTPUT_P_SITE_COUNTS="riboseq_p_site_counts.tsv" # Example output for custom script

# --- Reference Data Preparation ---
# Download S. cerevisiae genome (sacCer3) from UCSC
# mkdir -p ${GENOME_DIR}
# wget -P ${GENOME_DIR} http://hgdownload.soe.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.fa.gz
# gunzip ${GENOME_FA}.gz

# Obtain S. cerevisiae rRNA sequences (e.g., from NCBI or specific genome annotations)
# For example, extract rRNA from sacCer3 GFF/GTF or download separately.
# Example: echo ">rRNA_sequence" > ${RRNA_FA}
# Example: echo "AGCTAGCT..." >> ${RRNA_FA}

# Build Bowtie indices for genome and rRNA
# bowtie-build ${GENOME_FA} ${GENOME_INDEX_PREFIX}
# bowtie-build ${RRNA_FA} ${RRNA_INDEX_PREFIX}

# --- Riboseq Data Analysis Pipeline (as per Tirosh et al. 2015, doi:10.1371/journal.ppat.1005288) ---

# 1. Align reads to rRNA and extract unmapped reads
# Reads mapping to rRNA are discarded.
# -S: output SAM format
# -v 2: allow up to 2 mismatches (as per Tirosh et al. for genome, applied here for rRNA too)
# --un: write reads that did not align to a file
bowtie -S -v 2 --un ${OUTPUT_FASTQ_GENOME} ${RRNA_INDEX_PREFIX} ${INPUT_FASTQ} > /dev/null # Discard rRNA mapped SAM

# 2. Align rRNA-unmapped reads to the genome
# -S: output SAM format
# -v 2: allow up to 2 mismatches
bowtie -S -v 2 ${GENOME_INDEX_PREFIX} ${OUTPUT_FASTQ_GENOME} ${OUTPUT_SAM_GENOME}

# 3. Convert SAM to BAM, sort, and index
samtools view -bS ${OUTPUT_SAM_GENOME} > ${OUTPUT_BAM_GENOME}
samtools sort ${OUTPUT_BAM_GENOME} -o ${OUTPUT_SORTED_BAM}
samtools index ${OUTPUT_SORTED_BAM}

# 4. P-site inference and ORF assignment (Custom script based on Tirosh et al. methods)
# Tirosh et al. (2015) methods, as inferred by the model (verify against the paper):
# "The 5′ ends of reads were mapped to the genome, and the P-site was inferred by adding 15 nucleotides to the 5′ end of reads 28–30 nucleotides long, and 16 nucleotides to the 5′ end of reads 31–33 nucleotides long. Reads were then assigned to ORFs based on their P-site position."
# This step requires custom scripting (e.g., Python, R, Perl) to parse the BAM file,
# filter reads by length, adjust 5' end coordinates to infer P-sites, and then
# intersect these P-sites with a gene/ORF annotation file (e.g., GTF/GFF).
#
# Example placeholder for a custom script execution:
# python custom_riboseq_p_site_analysis.py \
#   --input_bam ${OUTPUT_SORTED_BAM} \
#   --orf_annotation /path/to/saccharomyces_cerevisiae.gtf \
#   --output_file ${OUTPUT_P_SITE_COUNTS}
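
The P-site rule quoted in step 4 can be sketched directly in shell, without a separate script. The function below is a minimal illustration, not the authors' code: the name `psite_from_sam` is hypothetical, the offsets (15 nt for 28-30 nt reads, 16 nt for 31-33 nt reads) are taken from the inferred description above, and it assumes ungapped alignments (as produced by `bowtie -v`), so the read length equals the span on the reference.

```shell
# Hypothetical helper: infer P-site positions from SAM records read on stdin.
# Assumes ungapped alignments, so reference span == length of the SEQ field.
# Output columns: chromosome, 1-based P-site position, strand.
psite_from_sam() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^@/ { next }                           # skip SAM header lines
    {
      flag = $2 + 0
      if (int(flag / 4) % 2 == 1) next      # FLAG 0x4: read is unmapped
      len = length($10)                     # read length from SEQ
      if (len >= 28 && len <= 30)      offset = 15
      else if (len >= 31 && len <= 33) offset = 16
      else next                             # other lengths are not used
      if (int(flag / 16) % 2 == 1) {
        # FLAG 0x10: reverse strand, so the 5-prime end is the rightmost base
        print $3, $4 + len - 1 - offset, "-"
      } else {
        # forward strand: the 5-prime end is POS
        print $3, $4 + offset, "+"
      }
    }'
}
```

A possible invocation is `samtools view ${OUTPUT_SORTED_BAM} | psite_from_sam > psites.tsv`; the resulting positions could then be intersected with an ORF annotation of your choice to obtain per-ORF counts.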


2

Reference for the previously described Riboseq analysis (continued from step 1): Tirosh et al. 2015, doi:10.1371/journal.ppat.1005288

STAR v2.5.2b (inferred with models/gemini-2.5-flash)

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star=2.5.2b

# Define variables
FASTQ_FILE="sample_R1.fastq.gz" # Placeholder for input FASTQ file
OUTPUT_PREFIX="sample_aligned_" # Prefix for output files; STAR appends names such as Aligned.sortedByCoord.out.bam directly
NUM_THREADS=8 # Number of threads to use

# Reference genome directory (hg38 as a placeholder)
# This directory should contain the STAR genome index files (SA, SAindex, genome, etc.)
# To generate a STAR genome index:
# STAR --runThreadN ${NUM_THREADS} --runMode genomeGenerate \
#      --genomeDir /path/to/STAR_INDEX_DIR/hg38 \
#      --genomeFastaFiles /path/to/hg38.fa \
#      --sjdbGTFfile /path/to/hg38.gtf \
#      --sjdbOverhang 100 # Ideally (maximum read length - 1); adjust for your reads

STAR_INDEX_DIR="/path/to/STAR_INDEX_DIR/hg38" # Placeholder for STAR genome index directory

# Run STAR alignment on single-end reads
# (parameters are model-inferred; the dataset above is RNA-seq/Ribo-seq, not eCLIP)
STAR --runThreadN ${NUM_THREADS} \
     --genomeDir ${STAR_INDEX_DIR} \
     --readFilesIn ${FASTQ_FILE} \
     --outFileNamePrefix ${OUTPUT_PREFIX} \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes All \
     --outSAMunmapped Within \
     --outFilterMultimapNmax 1 \
     --outFilterMismatchNmax 3 \
     --outFilterMismatchNoverLmax 0.1 \
     --alignIntronMin 20 \
     --alignIntronMax 1000000 \
     --alignMatesGapMax 1000000 \
     --alignSJoverhangMin 8 \
     --alignSJDBoverhangMin 1 \
     --sjdbScore 1 \
     --readFilesCommand zcat
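
The index-generation comment above ties `--sjdbOverhang` to the read length minus one. As a rough sketch of how that value could be derived from the data, the hypothetical helper below (`sjdb_overhang_from_fastq` is not part of STAR) scans FASTQ text on stdin and prints the maximum read length minus one. It assumes standard four-line FASTQ records.

```shell
# Hypothetical helper: print (maximum read length - 1) for FASTQ text on stdin.
# Assumes each record occupies exactly four lines, with the sequence on line 2.
sjdb_overhang_from_fastq() {
  awk 'NR % 4 == 2 { n = length($0); if (n > max) max = n }
       END { if (max == 0) exit 1; print max - 1 }'
}
```

For example, `gzip -dc "${FASTQ_FILE}" | head -n 400000 | sjdb_overhang_from_fastq` samples the first 100,000 reads. Sampling is a shortcut; scan the whole file if read lengths vary after trimming.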
