GSE232518 Processing Pipeline

RNA-Seq, 3 steps

Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature Communications (2024), PMID 39152130

Dataset

GSE232518

Expanded repertoire of RNA-editing-based detection for RNA binding protein interactions (4)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Remove adapters with Cutadapt

    cutadapt v4.0
    $ Bash example
    # Installation (example using conda)
    # conda install -c bioconda cutadapt=4.0
    
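    # Command to remove adapters (single-end; for paired-end data, add -A <adapter> for read 2 and -p <read-2 output>)
    # AGATCGGAAGAGCACACGTCTGAACTCCAGTCA is the standard Illumina TruSeq 3' adapter. It is assumed here; confirm it against the library kit used for this dataset.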
    # --minimum-length 18 is a common setting to discard very short reads after trimming.
    # --cores 8 is a placeholder for parallel processing.
    cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -o output.trimmed.fastq.gz --minimum-length 18 --cores 8 input.fastq.gz
  2. Align to the genome using STAR 2.7.6

    $ Bash example
    # Install STAR (example using conda)
    # conda install -c bioconda star=2.7.6a  # STAR release tags carry a letter suffix (2.7.6a)
    
    # Define variables
    # Replace with actual paths and filenames
    GENOME_DIR="/path/to/STAR_genome_index/hg38" # Path to the STAR genome index (built for GRCh38/hg38, the assembly listed for this dataset)
    READ1_FASTQ="sample_R1.fastq.gz" # Input FASTQ file for Read 1
    READ2_FASTQ="sample_R2.fastq.gz" # Input FASTQ file for Read 2 (remove if single-end)
    OUTPUT_PREFIX="sample_aligned" # Prefix for output files
    THREADS=8 # Number of threads to use
    
    # Run STAR alignment
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READ1_FASTQ}" "${READ2_FASTQ}" \
         --runThreadN "${THREADS}" \
         --outFileNamePrefix "${OUTPUT_PREFIX}_" \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes Standard \
         --quantMode GeneCounts \
         --readFilesCommand zcat
  3. Calculate edits per read (EPR)

    Custom script (inferred with gemini-2.5-flash; version not available)
    $ Bash example
    # Placeholder for a custom script to calculate edits per read (EPR).
    # This step would typically follow detection of candidate editing sites (e.g., from a samtools mpileup pileup, or from GATK or bcftools variant calls).
    # The script would read a VCF of identified RNA editing sites and, optionally, the corresponding BAM file
    # to count reads supporting each edit and compute the EPR metric.
    
    # Define input and output files (example paths)
    INPUT_VCF="path/to/sample_variants.vcf"
    INPUT_BAM="path/to/sample_aligned_Aligned.sortedByCoord.out.bam" # Coordinate-sorted BAM written by STAR in step 2
    OUTPUT_EPR_FILE="path/to/sample_epr_results.tsv"
    REFERENCE_GENOME="path/to/hg38.fa" # GRCh38 (hg38), the assembly listed for this dataset
    
    # Example command for a hypothetical custom script
    # This script would iterate through identified editing sites in the VCF,
    # extract reads covering those sites from the BAM, and calculate the proportion
    # of reads containing the edit, or the average number of edits per read.
    python calculate_epr.py \
        --vcf "${INPUT_VCF}" \
        --bam "${INPUT_BAM}" \
        --reference "${REFERENCE_GENOME}" \
        --output "${OUTPUT_EPR_FILE}" \
        --min-coverage 10 \
        --min-edit-frequency 0.05 \
        --output-format tsv
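The source does not define how EPR is computed, and `calculate_epr.py` is hypothetical. As a hedged illustration only, the sketch below assumes one possible definition: for each region, the total number of edited bases observed in reads overlapping that region divided by the number of those reads. It also assumes a hypothetical intermediate TSV (region, read ID, edit count per read) that an upstream step would have to produce, for example by counting the relevant base mismatches per read from the BAM. The function name `epr_by_region` and the input layout are illustrative assumptions, not the authors' method.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: aggregate per-read edit counts into edits per read (EPR).
# ASSUMED input (tab-separated, no header, one row per read per region):
#   region<TAB>read_id<TAB>n_edits
# Output (tab-separated, sorted by region):
#   region<TAB>n_reads<TAB>total_edits<TAB>EPR
# EPR is ASSUMED here to be total_edits / n_reads within each region.

epr_by_region() {
    # Reads the files named in the arguments, or stdin when none are given.
    awk -F '\t' -v OFS='\t' '
        NF >= 3 {
            reads[$1] += 1      # one row = one read overlapping the region
            edits[$1] += $3     # edited bases observed in that read
        }
        END {
            for (r in reads) {
                print r, reads[r], edits[r], sprintf("%.4f", edits[r] / reads[r])
            }
        }
    ' "$@" | sort -t $'\t' -k1,1
}
```

For example, `epr_by_region per_read_edits.tsv > epr_by_region.tsv` (both file names hypothetical). Aggregating per region (for example per gene, CDS or 3'UTR) is one way to obtain region-level values; how the authors defined regions is not stated in the source.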

Raw Source Text
remove adapter with Cutadapt
align to genome using STAR 2.7.6
calculate edits per read (EPR)
Assembly: GRCh38
Supplementary files format and content: epr_volcano_plot_values.tsv
Supplementary files format and content: epr_cds_3_fusion_joined.tsv
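The alignment step listed above (STAR 2.7.6 against GRCh38) reports no alignment metrics in the source. As a hedged example of a quick post-alignment check, the sketch below reads the `Uniquely mapped reads %` line that STAR writes to `<outFileNamePrefix>Log.final.out`. The function names `star_unique_pct` and `check_unique_rate` and the default 70% cutoff are illustrative assumptions, not values from the study.

```shell
#!/usr/bin/env bash
# Hypothetical QC helpers for STAR's Log.final.out summary file.
# STAR writes lines such as "   Uniquely mapped reads % |<TAB>91.37%".

# Print the unique-mapping percentage (number only, no % sign) from the log file $1.
star_unique_pct() {
    awk -F '|' '
        $1 ~ /Uniquely mapped reads %/ {
            value = $2
            gsub(/[ \t%]/, "", value)   # drop the tab, spaces and the % sign
            print value
        }
    ' "$1"
}

# Return status 0 if the unique-mapping rate in $1 is at least $2 percent.
# The default cutoff of 70 is an ASSUMED, illustrative value.
# A missing line yields an empty value, which compares as 0 and fails the check.
check_unique_rate() {
    local pct
    pct="$(star_unique_pct "$1")"
    awk -v p="$pct" -v t="${2:-70}" 'BEGIN { exit !((p + 0) >= (t + 0)) }'
}
```

With the STAR example's `--outFileNamePrefix "${OUTPUT_PREFIX}_"`, the log would be `sample_aligned_Log.final.out`, so a check could read `check_unique_rate sample_aligned_Log.final.out || echo "low unique-mapping rate" >&2`.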