GSE276985 Processing Pipeline

RIP-Seq code_examples 2 steps

Publication

Neuronal aging causes mislocalization of splicing proteins and unchecked cellular stress.

Nature Neuroscience (2025), PMID 40456907

Dataset

GSE276985

Aging-linked deterioration of RNA metabolism destabilizes the stress response of neurons [eCLIP-seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    eCLIP data was analyzed using Skipper with default settings.

    $ Bash example
    # Assumption: Skipper is distributed as a Snakemake workflow (YeoLab/skipper on GitHub)
    # rather than as a standalone `skipper` command. The file names below follow that
    # repository layout; check its README before running.
    git clone https://github.com/YeoLab/skipper.git
    cd skipper
    
    # Edit the workflow's configuration file (assumed here to be Skipper_config.py) so it
    # points to your sample manifest, genome FASTA, and annotation files (hg38 or mm10
    # for this dataset). For "default settings", leave all other parameters unchanged.
    
    # Preview the planned jobs with a dry run, then execute the workflow.
    # Adjust --cores to the available computational resources.
    snakemake -s Skipper.py -n
    snakemake -s Skipper.py --cores 8
  2.

    The kDa of each ORF was matched to the closest input sample beginning at that range; for example, a 78 kDa ORF would be paired with the input sample covering the range of 75 to 150 kDa.

    Custom script (Inferred with models/gemini-2.5-flash) vN/A
    $ Bash example
    # Create dummy input files for demonstration
    echo -e "ORF1\t78.0\nORF2\t120.5\nORF3\t60.0\nORF4\t160.0" > orf_data.tsv
    echo -e "SampleA\t75.0\t150.0\nSampleB\t50.0\t74.9\nSampleC\t150.1\t200.0" > sample_ranges.tsv
    
    # Python script to perform the matching logic
    python3 -c '
    import sys
    
    def match_orf_to_sample_range(orf_data_file, sample_ranges_file, output_file):
        """
        Matches ORFs by kDa to the closest input sample range.
        Input files are assumed to be tab-separated:
        - orf_data_file: ORF_ID\tkDa
        - sample_ranges_file: Sample_ID\tMin_kDa\tMax_kDa
        Output file: ORF_ID\tkDa\tMatched_Sample_ID
        """
        sample_ranges = [] # List of (min_kDa, max_kDa, sample_id)
        try:
            with open(sample_ranges_file, "r") as f:
                for line in f:
                    parts = line.strip().split("\t")
                    if len(parts) == 3:
                        sample_id, min_str, max_str = parts
                        min_kDa = float(min_str)
                        max_kDa = float(max_str)
                        sample_ranges.append((min_kDa, max_kDa, sample_id))
                    else:
                        sys.stderr.write(f"Warning: Skipping malformed line in {sample_ranges_file}: {line.strip()}\n")
        except FileNotFoundError:
            sys.stderr.write(f"Error: Sample ranges file not found: {sample_ranges_file}\n")
            sys.exit(1)
        except ValueError:
            sys.stderr.write(f"Error: Invalid number format in {sample_ranges_file}\n")
            sys.exit(1)
    
        try:
            with open(orf_data_file, "r") as orf_f, open(output_file, "w") as out_f:
                out_f.write("ORF_ID\tkDa\tMatched_Sample_ID\n")
                for line in orf_f:
                    parts = line.strip().split("\t")
                    if len(parts) == 2:
                        orf_id, kDa_str = parts
                        orf_kDa = float(kDa_str)
                        
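                        matched_sample_id = "N/A"
                        # Interpretation of the source text: pick the sample whose range
                        # begins closest below the ORF weight (largest lower bound <= orf_kDa).
                        # Unlike a strict min/max check, this leaves no gaps between ranges.
                        best_min = None
                        for min_r, max_r, sample_id_r in sample_ranges:
                            if min_r <= orf_kDa and (best_min is None or min_r > best_min):
                                best_min = min_r
                                matched_sample_id = sample_id_r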
                        
                        out_f.write(f"{orf_id}\t{orf_kDa}\t{matched_sample_id}\n")
                    else:
                        sys.stderr.write(f"Warning: Skipping malformed line in {orf_data_file}: {line.strip()}\n")
        except FileNotFoundError:
            sys.stderr.write(f"Error: ORF data file not found: {orf_data_file}\n")
            sys.exit(1)
        except ValueError:
            sys.stderr.write(f"Error: Invalid number format in {orf_data_file}\n")
            sys.exit(1)
    
    if __name__ == "__main__":
        # Define input and output filenames
        ORF_DATA_FILE = "orf_data.tsv"
        SAMPLE_RANGES_FILE = "sample_ranges.tsv"
        OUTPUT_FILE = "matched_orfs.tsv"
        
        match_orf_to_sample_range(ORF_DATA_FILE, SAMPLE_RANGES_FILE, OUTPUT_FILE)
    '
    
    # Display the output
    cat matched_orfs.tsv
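
The pairing rule in this step can also be written as a small in-memory function. The sketch below is one reading of "the closest input sample beginning at that range", not the authors' script: it assumes each input sample is identified by the kDa value at which its range begins, and it returns the sample with the largest start not exceeding the ORF weight. The function name `match_input_sample`, the tuple layout, and the sample labels are hypothetical; only the 75 to 150 kDa range comes from the source text.

```python
from bisect import bisect_right


def match_input_sample(orf_kda, range_starts):
    """Return the input sample whose size range begins closest below orf_kda.

    range_starts: list of (start_kDa, sample_id) tuples (hypothetical layout).
    Returns None if the ORF is lighter than every range start.
    """
    ordered = sorted(range_starts)
    starts = [start for start, _ in ordered]
    # Index of the last range start that is <= orf_kda
    idx = bisect_right(starts, orf_kda) - 1
    return ordered[idx][1] if idx >= 0 else None


# Hypothetical range starts; only the 75 to 150 kDa range is from the source text.
ranges = [(75.0, "input_75_150"), (150.0, "input_150_up"), (50.0, "input_50_75")]
print(match_input_sample(78.0, ranges))  # input_75_150
```

Matching on range starts alone means an ORF that falls between two nominal ranges still gets a partner, and an ORF at exactly 150 kDa goes to the range that begins there.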

Raw Source Text
eCLIP data was analyzed using Skipper with default settings. The kDa of each ORF was matched to the closest input sample beginning at that range; for example, a 78 kDa ORF would be paired with the input sample covering the range of 75 to 150 kDa.
Assembly: hg38/mm10
Supplementary files format and content: Reproducible enriched peaks output by Skipper