GSE276985 Processing Pipeline

RIP-Seq | code examples | 2 steps

Publication

Neuronal aging causes mislocalization of splicing proteins and unchecked cellular stress.

Nature Neuroscience (2025), PMID 40456907

Dataset

Aging-linked deterioration of RNA metabolism destabilizes the stress response of neurons [eCLIP-seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

eCLIP data was analyzed using Skipper with default settings.

Skipper v0.1.0 (GitHub)

$ Bash example

# Skipper is distributed on GitHub as a Snakemake workflow, not as a standalone
# "skipper" command. Clone the repository and install Snakemake first.
# Check the Skipper README for the exact install and run instructions.

# Skipper needs a configuration that specifies input files, genome references,
# and other parameters. For "default settings", change only paths and sample
# information and leave all other parameters at their Skipper defaults.
# Hypothetical minimal config.yaml (the key names below are placeholders, not
# the documented Skipper schema; start from the example config shipped with Skipper):
# wildcards:
#   rbp: ["RBP_NAME"]
#   replicate: ["rep1"]
# # Genome FASTA and GTF for the sample species: hg38 (human) or mm10 (mouse)
# genome_fasta: "/path/to/genome.fa"
# gtf: "/path/to/annotation.gtf"
# # ... other default Skipper parameters ...

# Run the workflow with Snakemake. --cores caps the number of parallel jobs;
# add --use-conda if the workflow rules declare conda environments.
# The Snakefile name (Skipper.py) is an assumption; use the one in the repository.
snakemake --snakefile Skipper.py --configfile config.yaml --cores 8
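
This step produces a set of reproducible enriched peaks (see the supplementary file description under Raw Source Text). Skipper itself decides which peaks are reproducible. The sketch below is not its method, only a simple downstream spot-check of how well two replicate peak sets agree. It assumes the peaks are available as BED files (chrom, start, end in the first three columns, 0-based half-open). The function names `read_bed` and `overlapping_peaks` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def read_bed(path: str) -> List[Interval]:
    """Read chrom/start/end from the first three columns of a BED file."""
    peaks: List[Interval] = []
    with open(path) as f:
        for line in f:
            # Skip blank, comment, and UCSC track/browser header lines
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.rstrip("\n").split("\t")[:3]
            peaks.append((chrom, int(start), int(end)))
    return peaks


def overlapping_peaks(rep1: Iterable[Interval], rep2: Iterable[Interval]) -> List[Interval]:
    """Return the rep1 peaks that overlap at least one rep2 peak on the same chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in rep2:
        by_chrom[chrom].append((start, end))
    shared: List[Interval] = []
    for chrom, start, end in rep1:
        # Half-open [start, end) and [s, e) overlap when start < e and s < end
        if any(start < e and s < end for s, e in by_chrom.get(chrom, [])):
            shared.append((chrom, start, end))
    return shared


if __name__ == "__main__":
    # Toy intervals; in practice, call read_bed() on the two replicate peak files
    rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
    rep2 = [("chr1", 150, 250), ("chr2", 20, 30)]
    print(overlapping_peaks(rep1, rep2))
```

On the toy input only ("chr1", 100, 200) is kept. The chr2 peaks at 10-20 and 20-30 touch but do not overlap under half-open coordinates. The per-peak linear scan is fine for modest peak counts. For genome-wide sets, a sorted sweep or `bedtools intersect -u -a rep1.bed -b rep2.bed` scales better.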

The molecular weight (kDa) of each ORF was matched to the input sample whose size range begins at, or closest below, that weight. For example, a 78 kDa ORF would be paired with the input sample covering 75 to 150 kDa.

Custom script (inferred with models/gemini-2.5-flash)

$ Bash example

# Create small example inputs (tab-separated, no header) for demonstration
printf 'ORF1\t78.0\nORF2\t120.5\nORF3\t60.0\nORF4\t160.0\n' > orf_data.tsv
printf 'SampleA\t75.0\t150.0\nSampleB\t50.0\t74.9\nSampleC\t150.1\t200.0\n' > sample_ranges.tsv

# Python script implementing the matching rule: assign each ORF to the input
# sample whose range starts at, or closest below, its kDa
python3 -c '
import sys


def read_tsv(path, n_fields):
    """Yield rows of a tab-separated file that have exactly n_fields columns."""
    try:
        with open(path) as f:
            for line in f:
                parts = line.rstrip("\n").split("\t")
                if len(parts) == n_fields:
                    yield parts
                elif line.strip():
                    sys.stderr.write(f"Warning: skipping malformed line in {path}: {line.strip()}\n")
    except FileNotFoundError:
        sys.exit(f"Error: file not found: {path}")


def match_orf_to_sample_range(orf_data_file, sample_ranges_file, output_file):
    """
    Match ORFs by kDa to input sample size ranges.
    - orf_data_file: ORF_ID<TAB>kDa
    - sample_ranges_file: Sample_ID<TAB>Min_kDa<TAB>Max_kDa
    - output_file: ORF_ID<TAB>kDa<TAB>Matched_Sample_ID (N/A if no range starts at or below the ORF)
    """
    try:
        # Read and validate all input before writing, so bad input never leaves a partial output file
        ranges = [(float(lo), float(hi), sid) for sid, lo, hi in read_tsv(sample_ranges_file, 3)]
        orfs = [(orf_id, float(kda)) for orf_id, kda in read_tsv(orf_data_file, 2)]
    except ValueError as err:
        sys.exit(f"Error: invalid number format: {err}")

    with open(output_file, "w") as out_f:
        out_f.write("ORF_ID\tkDa\tMatched_Sample_ID\n")
        for orf_id, kda in orfs:
            # Ranges starting at or below the ORF weight; the highest start is the closest
            candidates = [r for r in ranges if r[0] <= kda]
            matched = "N/A"
            if candidates:
                _, range_max, matched = max(candidates)
                if kda > range_max:
                    sys.stderr.write(f"Warning: {orf_id} ({kda} kDa) exceeds the upper bound of {matched}\n")
            out_f.write(f"{orf_id}\t{kda}\t{matched}\n")


if __name__ == "__main__":
    match_orf_to_sample_range("orf_data.tsv", "sample_ranges.tsv", "matched_orfs.tsv")
'

# Display the output
cat matched_orfs.tsv
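
The matching above assumes each ORF's kDa is already known. The source does not say how it was obtained. If it was predicted from the ORF's protein sequence, a minimal sketch is to sum commonly tabulated average residue masses and add one water for the termini. The function name `orf_kda` is hypothetical. Any epitope tag or fusion partner on the expressed construct would add mass that this sketch does not count.

```python
# Average residue masses in daltons (amino acid minus one water), as commonly
# tabulated for molecular-weight calculators.
AVERAGE_RESIDUE_MASS_DA = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_DA = 18.0153  # one water for the free N- and C-termini


def orf_kda(protein_seq: str) -> float:
    """Estimate molecular weight in kDa from a one-letter protein sequence."""
    seq = protein_seq.strip().upper().rstrip("*")  # drop a trailing stop symbol
    if not seq:
        raise ValueError("empty protein sequence")
    try:
        mass_da = sum(AVERAGE_RESIDUE_MASS_DA[aa] for aa in seq) + WATER_DA
    except KeyError as err:
        raise ValueError(f"unsupported residue {err.args[0]!r}") from None
    return mass_da / 1000.0


if __name__ == "__main__":
    # Dipeptide Gly-Gly as a quick sanity check
    print(round(orf_kda("GG*"), 4))
```

The resulting weight would then go through the same range matching. A sequence predicted at 78 kDa, for instance, would land in the 75 to 150 kDa input.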

Tools Used

Skipper

Raw Source Text

eCLIP data was analyzed using Skipper with default settings. The kDa of each ORF was matched to the closest input sample beginning at that range; for example, a 78 kDa ORF would be paired with the input sample covering the range of 75 to 150 kDa.
Assembly: hg38/mm10
Supplementary files format and content: Reproducible enriched peaks output by Skipper
