GSE232518 Processing Pipeline

RNA-Seq, 3 steps

Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature Communications (2024), PMID 39152130

Dataset

Expanded repertoire of RNA-editing-based detection for RNA binding protein interactions (4)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

remove adapter with Cutadapt

cutadapt v4.0

$ Bash example

# Installation (example using conda)
# conda install -c bioconda cutadapt=4.0

# Command to remove adapters
# The adapter below is the Illumina TruSeq Read 1 adapter, inferred from eCLIP workflows.
# The source text does not name the adapter, so check it against the library prep.
# --minimum-length 18 discards reads shorter than 18 nt after trimming (an assumed threshold).
# --cores 8 is a placeholder; set it to the number of available CPU cores.
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -o output.trimmed.fastq.gz --minimum-length 18 --cores 8 input.fastq.gz
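
The command above trims a single FASTQ file, while the alignment step further down assumes paired-end input. As a minimal sketch of how the same settings could be assembled for either layout, the Python helper below builds the Cutadapt argument list. The function name `build_cutadapt_cmd` and its parameters are hypothetical; `-a`, `-A`, `-o`, `-p`, `--minimum-length` and `--cores` are standard Cutadapt options. The Read 2 adapter is not given in the source and has to be supplied by the caller.

```python
import shlex
from typing import List, Optional


def build_cutadapt_cmd(
    adapter_r1: str,
    in_r1: str,
    out_r1: str,
    in_r2: Optional[str] = None,
    out_r2: Optional[str] = None,
    adapter_r2: Optional[str] = None,
    min_length: int = 18,  # assumed threshold, same as the Bash example
    cores: int = 8,        # placeholder, same as the Bash example
) -> List[str]:
    """Return a Cutadapt argument list for single-end or paired-end input."""
    cmd = [
        "cutadapt",
        "-a", adapter_r1,
        "-o", out_r1,
        "--minimum-length", str(min_length),
        "--cores", str(cores),
    ]
    if in_r2 is not None:
        # Paired-end mode needs a Read 2 adapter (-A) and a second output (-p).
        if out_r2 is None or adapter_r2 is None:
            raise ValueError("paired-end input requires out_r2 and adapter_r2")
        cmd += ["-A", adapter_r2, "-p", out_r2]
    # Input files go last: Read 1, then Read 2 if present.
    cmd.append(in_r1)
    if in_r2 is not None:
        cmd.append(in_r2)
    return cmd


if __name__ == "__main__":
    # Reproduces the single-end command from the Bash example.
    single = build_cutadapt_cmd(
        "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
        "input.fastq.gz",
        "output.trimmed.fastq.gz",
    )
    print(shlex.join(single))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once Cutadapt is installed. Building a list rather than a shell string avoids quoting problems with unusual file names.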

align to genome using STAR 2.7.6

STAR v2.7.6

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star=2.7.6

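# Define variables
# Replace with actual paths and filenames
GENOME_DIR="/path/to/STAR_genome_index/hg38" # Path to the STAR genome index (GRCh38/hg38, the assembly listed for this dataset)
READ1_FASTQ="sample_R1.fastq.gz" # Adapter-trimmed FASTQ file for Read 1 (the Cutadapt output)
READ2_FASTQ="sample_R2.fastq.gz" # Adapter-trimmed FASTQ file for Read 2 (for single-end data, remove this and its use below)
OUTPUT_PREFIX="sample_aligned" # Prefix for output files
THREADS=8 # Number of threads to use

# Run STAR alignment
# Note: --quantMode GeneCounts needs a gene annotation (GTF), supplied either when the
# index was built or via --sjdbGTFfile; drop the option if only the sorted BAM is needed.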
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READ1_FASTQ}" "${READ2_FASTQ}" \
     --runThreadN "${THREADS}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}_" \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes Standard \
     --quantMode GeneCounts \
     --readFilesCommand zcat
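
With `--quantMode GeneCounts`, STAR writes a four-column, tab-separated `ReadsPerGene.out.tab` file next to the BAM (here `sample_aligned_ReadsPerGene.out.tab`). The columns are gene ID, unstranded counts, counts for reads on the gene strand, and counts for reads on the opposite strand. The file starts with four summary rows (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`). The sketch below shows one way to read that table in Python. The function name `parse_star_gene_counts` is hypothetical, and the right column depends on the library strandedness, which the source does not state.

```python
from typing import Dict, Iterable, Tuple

# Summary rows that STAR places at the top of ReadsPerGene.out.tab.
SUMMARY_ROWS = ("N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous")


def parse_star_gene_counts(
    lines: Iterable[str], column: int = 1
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split a ReadsPerGene.out.tab table into summary rows and gene counts.

    column: 1 = unstranded, 2 = read strand matches gene, 3 = reverse strand.
    """
    if column not in (1, 2, 3):
        raise ValueError("column must be 1, 2, or 3")
    summary: Dict[str, int] = {}
    genes: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, value = fields[0], int(fields[column])
        if name in SUMMARY_ROWS:
            summary[name] = value
        else:
            genes[name] = value
    return summary, genes


# Tiny made-up table, for illustration only (not data from GSE232518).
example = [
    "N_unmapped\t10\t10\t10\n",
    "N_multimapping\t5\t5\t5\n",
    "N_noFeature\t3\t20\t4\n",
    "N_ambiguous\t1\t0\t1\n",
    "GENE_A\t100\t2\t98\n",
    "GENE_B\t40\t1\t39\n",
]
summary, genes = parse_star_gene_counts(example, column=1)
```

For a real run, pass an open file handle instead of the example list, for instance `parse_star_gene_counts(open("sample_aligned_ReadsPerGene.out.tab"))`.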

calculate edits per read (EPR)

Custom script (inferred with models/gemini-2.5-flash), version N/A

$ Bash example

# Placeholder for a custom script to calculate edits per read (EPR).
# This step typically follows RNA variant calling (e.g., using GATK or samtools mpileup).
# The script would process a VCF file (containing identified RNA editing sites) and
# potentially the corresponding BAM file to count reads supporting edits and calculate the EPR metric.

# Define input and output files (example paths)
INPUT_VCF="path/to/sample_variants.vcf"
INPUT_BAM="path/to/sample_aligned.bam"
OUTPUT_EPR_FILE="path/to/sample_epr_results.tsv"
REFERENCE_GENOME="path/to/hg38.fa" # GRCh38 (hg38), the assembly listed for this dataset

# Example command for a hypothetical custom script
# This script would iterate through identified editing sites in the VCF,
# extract reads covering those sites from the BAM, and calculate the proportion
# of reads containing the edit, or the average number of edits per read.
python calculate_epr.py \
    --vcf "${INPUT_VCF}" \
    --bam "${INPUT_BAM}" \
    --reference "${REFERENCE_GENOME}" \
    --output "${OUTPUT_EPR_FILE}" \
    --min-coverage 10 \
    --min-edit-frequency 0.05 \
    --output-format tsv
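
The source text does not say how EPR is computed, and `calculate_epr.py` above is a hypothetical interface. As a rough illustration only, the Python sketch below assumes that EPR for a region (for example a gene) is the number of edit events observed on reads overlapping that region divided by the number of those reads. The function name `edits_per_read`, the tuple layout and the `min_reads` cutoff are all assumptions; the cutoff of 10 simply mirrors the placeholder `--min-coverage 10`.

```python
from typing import Dict, Iterable, Tuple


def edits_per_read(
    records: Iterable[Tuple[str, int, int]], min_reads: int = 10
) -> Dict[str, float]:
    """Compute edits per read for each region.

    records: (region, n_edits, n_reads) tuples, where n_edits is the total
    number of edit events seen on the n_reads reads overlapping the region.
    Regions with fewer than min_reads reads are skipped (assumed cutoff).
    """
    epr: Dict[str, float] = {}
    for region, n_edits, n_reads in records:
        if n_edits < 0 or n_reads < 0:
            raise ValueError(f"negative count for region {region!r}")
        if n_reads < min_reads:
            continue  # too few reads for a stable ratio
        epr[region] = n_edits / n_reads
    return epr


# Made-up counts, for illustration only (not data from GSE232518).
example_counts = [
    ("GENE_A", 25, 100),  # kept: 25 edits over 100 reads
    ("GENE_B", 2, 5),     # skipped: fewer than 10 reads
    ("GENE_C", 0, 40),    # kept: no edits observed
]
epr_by_region = edits_per_read(example_counts)
```

How edits and reads are counted (which substitution types, which sites, which filters) has to come from the publication's methods. This sketch only covers the final ratio.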

Tools Used

Cutadapt, STAR

Raw Source Text

remove adapter with Cutadapt
align to genome using STAR 2.7.6
calculate edits per read (EPR)
Assembly: GRCh38
Supplementary files format and content: epr_volcano_plot_values.tsv
Supplementary files format and content: epr_cds_3_fusion_joined.tsv
