GSE232518 Processing Pipeline
RNA-Seq
3 steps
Publication
High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues. Nature Communications (2024), PMID 39152130
Dataset
GSE232518: Expanded repertoire of RNA-editing-based detection for RNA binding protein interactions (4)
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
remove adapter with Cutadapt
$ Bash example
# Installation (example using conda)
# conda install -c bioconda cutadapt=4.0

# Command to remove adapters
# The adapter sequence is a common Illumina TruSeq adapter, inferred from eCLIP workflows.
# --minimum-length 18 is a common setting to discard very short reads after trimming.
# --cores 8 is a placeholder for parallel processing.
cutadapt \
  -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
  -o output.trimmed.fastq.gz \
  --minimum-length 18 \
  --cores 8 \
  input.fastq.gz
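To make the effect of `-a` and `--minimum-length` concrete, here is a minimal Python sketch of 3' adapter trimming followed by a length filter. It is a simplified illustration, not Cutadapt's algorithm: it uses exact matching only, whereas Cutadapt tolerates mismatches. The function names `trim_3prime_adapter` and `trim_and_filter` are hypothetical, and the adapter is the same inferred sequence used in the command above.

```python
# Simplified illustration of 3' adapter removal plus a minimum-length filter.
# Assumption: exact matching only; Cutadapt itself allows errors in the match.
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # inferred adapter, as in the command above
MIN_LENGTH = 18  # same threshold as --minimum-length 18


def trim_3prime_adapter(read, adapter=ADAPTER, min_overlap=3):
    """Return the read with a 3' adapter (full or partial) removed."""
    # Case 1: the complete adapter occurs inside the read; keep what precedes it.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with a prefix of the adapter (adapter runs off the end).
    # Try the longest possible overlap first.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    # No adapter found: return the read unchanged.
    return read


def trim_and_filter(reads, min_length=MIN_LENGTH):
    """Trim every read, then drop reads shorter than min_length."""
    trimmed = (trim_3prime_adapter(r) for r in reads)
    return [r for r in trimmed if len(r) >= min_length]


if __name__ == "__main__":
    insert = "ACGTACGTACGTACGTACGTAC"  # made-up 22-nt insert
    reads = [insert + ADAPTER, insert + ADAPTER[:10], "ACGTACGT" + ADAPTER, insert]
    for kept in trim_and_filter(reads):
        print(kept)
```

The third made-up read is discarded because only 8 nt remain after trimming, which is the situation `--minimum-length 18` is meant to handle.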
2
align to genome using STAR 2.7.6
$ Bash example
# Install STAR (example using conda)
# conda install -c bioconda star=2.7.6

# Define variables
# Replace with actual paths and filenames
GENOME_DIR="/path/to/STAR_genome_index/hg38"  # Path to the STAR genome index (e.g., for human hg38)
READ1_FASTQ="sample_R1.fastq.gz"              # Input FASTQ file for Read 1 (e.g., the adapter-trimmed output of step 1)
READ2_FASTQ="sample_R2.fastq.gz"              # Input FASTQ file for Read 2 (remove if single-end)
OUTPUT_PREFIX="sample_aligned"                # Prefix for output files
THREADS=8                                     # Number of threads to use

# Run STAR alignment
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READ1_FASTQ}" "${READ2_FASTQ}" \
     --runThreadN "${THREADS}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}_" \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes Standard \
     --quantMode GeneCounts \
     --readFilesCommand zcat
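With `--quantMode GeneCounts`, STAR writes a tab-separated `ReadsPerGene.out.tab` file next to the BAM (here it would be `sample_aligned_ReadsPerGene.out.tab`, given the placeholder prefix above). The following is a minimal sketch of reading that table in Python. The function name `read_star_gene_counts` is hypothetical, and the demo table is made-up data used only to show the layout.

```python
import csv
import io


def read_star_gene_counts(handle, column=1):
    """Parse STAR ReadsPerGene.out.tab content into (summary, counts) dicts.

    column selects which count column to read (0-based index into each row):
      1 = unstranded counts
      2 = counts for reads on the same strand as the gene
      3 = counts for reads on the opposite strand
    """
    summary, counts = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        name, value = row[0], int(row[column])
        # STAR's leading rows (N_unmapped, N_multimapping, N_noFeature,
        # N_ambiguous) are run-level tallies, not genes.
        # Assumption: no real gene ID starts with "N_".
        if name.startswith("N_"):
            summary[name] = value
        else:
            counts[name] = value
    return summary, counts


# Made-up example content, for illustration only.
DEMO_TABLE = (
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t20\t400\t30\n"
    "N_ambiguous\t5\t1\t2\n"
    "GENE_A\t12\t1\t11\n"
    "GENE_B\t7\t0\t7\n"
)

if __name__ == "__main__":
    summary, counts = read_star_gene_counts(io.StringIO(DEMO_TABLE))
    print(summary)
    print(counts)
```

For a real run, pass an open file handle instead of the `io.StringIO` demo, and pick the column that matches the strandedness of the library.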
3
calculate edits per read (EPR)
$ Bash example
# Placeholder for a custom script to calculate edits per read (EPR).
# This step typically follows RNA variant calling (e.g., using GATK or samtools mpileup).
# The script would process a VCF file (containing identified RNA editing sites) and
# potentially the corresponding BAM file to count reads supporting edits and calculate the EPR metric.

# Define input and output files (example paths)
INPUT_VCF="path/to/sample_variants.vcf"
INPUT_BAM="path/to/sample_aligned.bam"
OUTPUT_EPR_FILE="path/to/sample_epr_results.tsv"
REFERENCE_GENOME="path/to/hg38.fa"  # hg38 (GRCh38), the assembly listed for this dataset

# Example command for a hypothetical custom script (calculate_epr.py is not a published tool).
# This script would iterate through identified editing sites in the VCF,
# extract reads covering those sites from the BAM, and calculate the proportion
# of reads containing the edit, or the average number of edits per read.
python calculate_epr.py \
  --vcf "${INPUT_VCF}" \
  --bam "${INPUT_BAM}" \
  --reference "${REFERENCE_GENOME}" \
  --output "${OUTPUT_EPR_FILE}" \
  --min-coverage 10 \
  --min-edit-frequency 0.05 \
  --output-format tsv
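The source does not define EPR beyond its name, so the following is only a sketch under stated assumptions: EPR is taken to be the total number of edit-type mismatches divided by the number of reads, and the edit type is assumed to be C-to-T (pass `ref_base="A", alt_base="G"` for A-to-G instead). The functions `count_edits` and `edits_per_read` are hypothetical and operate on ungapped read/reference sequence pairs, which in practice would have to be extracted from the BAM and the reference FASTA.

```python
def count_edits(read_seq, ref_seq, ref_base="C", alt_base="T"):
    """Count positions where the reference has ref_base and the read has alt_base.

    Assumption: read_seq and ref_seq are the same length and already aligned
    base-for-base (no insertions, deletions, or soft clips).
    """
    if len(read_seq) != len(ref_seq):
        raise ValueError("read and reference segments must have equal length")
    return sum(
        1
        for ref, obs in zip(ref_seq.upper(), read_seq.upper())
        if ref == ref_base and obs == alt_base
    )


def edits_per_read(alignments, ref_base="C", alt_base="T"):
    """Assumed EPR definition: total edit-type mismatches / number of reads.

    alignments is an iterable of (read_seq, ref_seq) pairs for one region
    (for example, all reads overlapping one gene).
    """
    n_reads = 0
    n_edits = 0
    for read_seq, ref_seq in alignments:
        n_reads += 1
        n_edits += count_edits(read_seq, ref_seq, ref_base, alt_base)
    # Return 0.0 for regions with no reads rather than dividing by zero.
    return n_edits / n_reads if n_reads else 0.0


if __name__ == "__main__":
    # Made-up (read, reference) pairs for illustration only.
    demo = [
        ("ATTG", "ACTG"),  # one C-to-T mismatch
        ("ACTG", "ACTG"),  # no mismatch
        ("TTTT", "CCTT"),  # two C-to-T mismatches
        ("ACTG", "ACTG"),  # no mismatch
    ]
    print(edits_per_read(demo))
```

This sketch ignores strand: for genes on the minus strand, a C-to-T edit appears as G-to-A relative to the reference, so a real implementation would need to account for gene orientation and would likely also filter known SNPs.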
Tools Used
Cutadapt, STAR 2.7.6
Raw Source Text
remove adapter with Cutadapt
align to genome using STAR 2.7.6
calculate edits per read (EPR)
Assembly: GRCh38
Supplementary files format and content: epr_volcano_plot_values.tsv
Supplementary files format and content: epr_cds_3_fusion_joined.tsv