GSE240326 Processing Pipeline

GSE code_examples 6 steps

Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature communications (2024) — PMID 39152130

Dataset

An in situ method for identification of transcriptome-wide protein-RNA interactions in cells

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

remove adapter with Cutadapt

cutadapt v4.1 GitHub

$ Bash example

# Install Cutadapt (if not already installed)
# conda install -c bioconda cutadapt=4.1

# Define input and output files
# Replace with your actual input FASTQ files
INPUT_R1="path/to/your/input_read1.fastq.gz"
INPUT_R2="path/to/your/input_read2.fastq.gz" # For paired-end reads. Remove if single-end.

# Replace with your desired output FASTQ files
OUTPUT_R1_TRIMMED="path/to/your/output_read1_trimmed.fastq.gz"
OUTPUT_R2_TRIMMED="path/to/your/output_read2_trimmed.fastq.gz" # For paired-end reads. Remove if single-end.

# Define a report file for Cutadapt's summary
REPORT_FILE="cutadapt_trimming_report.txt"

# Define adapter sequences
# These are common Illumina TruSeq adapters. You MUST replace these with the actual adapter sequences
# used in your library preparation. If you don't know them, you might need to auto-detect or consult
# your sequencing provider/library prep kit documentation.
# For single-end reads, typically only -a ADAPTER_R1 is needed.
# For paired-end reads, -a ADAPTER_R1 for read 1 and -A ADAPTER_R2 for read 2.
ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" # Example: TruSeq adapter read through at the 3' end of read 1 (start of the indexed adapter)
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" # Example: TruSeq adapter read through at the 3' end of read 2 (reverse complement of the universal adapter)

# Run Cutadapt for paired-end reads
# Adjust parameters like --minimum-length, --quality-cutoff, --cores as needed.
# If processing single-end reads, remove -A, -p, and INPUT_R2.
cutadapt \
  -a "${ADAPTER_R1}" \
  -A "${ADAPTER_R2}" \
  -o "${OUTPUT_R1_TRIMMED}" \
  -p "${OUTPUT_R2_TRIMMED}" \
  --minimum-length 18 \
  --quality-cutoff 20 \
  --cores 8 \
  "${INPUT_R1}" "${INPUT_R2}" > "${REPORT_FILE}" 2>&1

# For single-end reads, the command would look like this:
# cutadapt \
#   -a "${ADAPTER_R1}" \
#   -o "${OUTPUT_R1_TRIMMED}" \
#   --minimum-length 18 \
#   --quality-cutoff 20 \
#   --cores 8 \
#   "${INPUT_R1}" > "${REPORT_FILE}" 2>&1
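
After trimming, it can help to confirm how many reads survived the filters before moving on to alignment. The sketch below pulls the pass rate out of the report file written above. It assumes the report contains Cutadapt's usual "written (passing filters)" summary line, and the function name cutadapt_pass_pct is hypothetical rather than anything Cutadapt provides.

```shell
# Hypothetical helper: print the percentage of reads (or read pairs) that
# passed Cutadapt's filters, taken from the report's summary section.
# Assumes a line such as:
#   Pairs written (passing filters):     9,900,000 (99.0%)
cutadapt_pass_pct() {
  local report="$1"
  awk '/written \(passing filters\):/ {
         pct = $NF                 # last field, e.g. "(99.0%)"
         gsub(/[()%]/, "", pct)    # strip parentheses and the percent sign
         print pct
         exit
       }' "$report"
}
```

Calling cutadapt_pass_pct "${REPORT_FILE}" prints a bare number, which makes it easy to compare against whatever minimum you consider acceptable for your libraries.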

align to hg38 using STAR 2.4.0 (Homo sapiens) or mm10 using STAR 2.5.2 (Mus musculus)

STAR v2.4.0 GitHub

$ Bash example

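# Install STAR (if not already installed)
# The source reports STAR 2.4.0 for hg38 and STAR 2.5.2 for mm10. The exact builds
# available on bioconda may differ, so check with `conda search star` first.
# conda install -c bioconda star=2.4.0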

# Define input and output variables
# Replace with actual paths and filenames
READ1="input_R1.fastq.gz"
READ2="input_R2.fastq.gz" # Remove if single-end
OUTPUT_PREFIX="aligned_output"
NUM_THREADS=8 # Adjust as needed

# Define genome index paths
# Replace with actual paths to your STAR indices
# For Homo sapiens (hg38)
HG38_STAR_INDEX="/path/to/STAR_index/hg38"
# For Mus musculus (mm10)
MM10_STAR_INDEX="/path/to/STAR_index/mm10"

# --- Choose the appropriate genome index based on species ---
# For Homo sapiens (hg38):
GENOME_DIR="${HG38_STAR_INDEX}"
# For Mus musculus (mm10):
# GENOME_DIR="${MM10_STAR_INDEX}"

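# Run STAR alignment
# The source states only the aligner and genome builds. The parameters below are
# ENCODE-style settings chosen for illustration, not the authors' confirmed settings.
# --readFilesCommand zcat is needed because the inputs are gzip-compressed
# (use "gunzip -c" where zcat does not handle .gz files).
# For single-end reads, pass only "${READ1}" to --readFilesIn.
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READ1}" "${READ2}" \
     --readFilesCommand zcat \
     --runThreadN "${NUM_THREADS}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}_" \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMstrandField intronMotif \
     --outFilterMultimapNmax 20 \
     --alignSJDBoverhangMin 1 \
     --alignSJoverhangMin 8 \
     --alignIntronMin 20 \
     --alignIntronMax 1000000 \
     --alignMatesGapMax 1000000 \
     --outReadsUnmapped Fastx
# Optional: append --quantMode GeneCounts if per-gene counts are desired. This needs
# an index built with a GTF annotation and a STAR release that supports the option.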
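
STAR writes a mapping summary to a Log.final.out file under the chosen prefix, which with the prefix above would be aligned_output_Log.final.out. A minimal sketch for reading the uniquely mapped rate from it follows. The helper names star_unique_pct and star_unique_ok and the default threshold of 50 are illustrative assumptions, not values from the source.

```shell
# Hypothetical helper: print the "Uniquely mapped reads %" value from a
# STAR Log.final.out file as a bare number (e.g. 89.50).
star_unique_pct() {
  local log="$1"
  awk -F'|' '/Uniquely mapped reads %/ {
               val = $2
               gsub(/[ \t%]/, "", val)   # drop padding and the percent sign
               print val
               exit
             }' "$log"
}

# Hypothetical check: succeed only when the rate reaches a minimum percentage.
star_unique_ok() {
  local log="$1" min_pct="${2:-50}"   # 50 is an arbitrary illustrative default
  local pct
  pct="$(star_unique_pct "$log")"
  awk -v p="$pct" -v m="$min_pct" 'BEGIN { exit !((p + 0) >= (m + 0)) }'
}
```

For example, star_unique_ok "${OUTPUT_PREFIX}_Log.final.out" 70 returns a non-zero status when fewer than 70% of reads mapped uniquely, so it can gate the downstream edit-calling steps in a script.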

SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%

SAILOR v0.1.0

$ Bash example

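# Install SAILOR
# SAILOR is distributed by the Yeo lab; confirm the install route and version in its
# repository. The conda package name and version below are unverified placeholders.
# conda create -n sailor_env python=3.8
# conda activate sailor_env
# conda install -c bioconda sailor=0.1.0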

# Define input and output files
# Replace 'aligned_reads.bam' with your actual input BAM file containing aligned RNA-seq reads.
INPUT_BAM="aligned_reads.bam"
# Replace with the path to your reference genome FASTA file (e.g., GRCh38).
REFERENCE_FASTA="path/to/human_genome/GRCh38.p13.genome.fa"
# Replace with the path to a VCF file of known SNPs for the reference genome (e.g., dbSNP for GRCh38).
KNOWN_SNPS_VCF="path/to/known_snps/dbSNP_153_GRCh38.vcf.gz"
# Define the output file for the filtered C-to-U editing sites.
OUTPUT_TSV="c_to_u_edits_filtered.tsv"

# Call C-to-U edits and apply the thresholds stated in the source
# (score > 0.5, edit fraction < 80%).
# NOTE: the `sailor call` subcommand and the flags below are illustrative and have not
# been checked against SAILOR's documentation; map them onto the tool's real interface.
# --min-score 0.5: keep sites with a score above 0.5.
# --max-edit-fraction 0.8: keep sites with an edit fraction below 0.8 (80%).
# --fasta: reference genome FASTA file.
# --vcf: VCF file of known SNPs to exclude from editing calls.
sailor call \
    --min-score 0.5 \
    --max-edit-fraction 0.8 \
    --fasta "${REFERENCE_FASTA}" \
    --vcf "${KNOWN_SNPS_VCF}" \
    "${INPUT_BAM}" \
    > "${OUTPUT_TSV}"

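The two thresholds in this step (score > 0.5, edit fraction < 80%) amount to a row filter over the table of called sites. If the calls are available as a tab-separated file, the filter can be sketched with awk as below. The column positions are assumptions supplied by the caller, since the layout of SAILOR's output is not described in the source, and filter_edit_sites is a hypothetical name.

```shell
# Hypothetical helper: keep rows whose score is > 0.5 and whose edit
# fraction is < 0.8. Column numbers are 1-based and supplied by the caller.
# Usage: filter_edit_sites sites.tsv SCORE_COLUMN FRACTION_COLUMN
filter_edit_sites() {
  local in_file="$1" score_col="$2" frac_col="$3"
  awk -F'\t' -v s="$score_col" -v f="$frac_col" \
      -v min_score=0.5 -v max_frac=0.8 \
      '($s + 0) > min_score && ($f + 0) < max_frac' "$in_file"
}
```

Both comparisons are strict, matching the wording of the step, and a text header row is dropped because its fields evaluate to zero. Inspect a few rows of your own file before choosing the two column numbers.
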
FLARE analysis to call C-to-U edit clusters

FLARE v0.1.0 GitHub

$ Bash example

# Clone the FLARE repository
# git clone https://github.com/yeolab/FLARE.git
# cd FLARE

# Install dependencies (if not already installed)
# pip install -r requirements.txt

# Example usage of FLARE to call C-to-U edit clusters
# Replace <input_bam>, <output_directory>, <reference_fasta>, and <gene_annotation> with actual paths.
# Reference datasets: GRCh38 is used as a placeholder for human genome.
# Gene annotation: A GTF file for GRCh38 is used as a placeholder.

# Define placeholder variables
INPUT_BAM="path/to/your/aligned.bam"
OUTPUT_DIR="flare_c_to_u_edits"
REFERENCE_FASTA="path/to/GRCh38.fa" # e.g., from Gencode or Ensembl
GENE_ANNOTATION="path/to/GRCh38.gtf" # e.g., from Gencode or Ensembl

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

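# Execute FLARE analysis
# NOTE: the FLARE.py entry point, the -i/-o/-r/-g flags, and the options listed below
# are illustrative and unverified. Consult the FLARE repository for the actual
# invocation and configuration before running.
# Options of this kind may exist under different names:
# --min_reads 5 (minimum reads supporting an edit)
# --min_edit_frac 0.1 (minimum fraction of reads supporting an edit)
# --min_coverage 10 (minimum coverage at a site)
# --min_base_qual 20 (minimum base quality)
# --min_map_qual 20 (minimum mapping quality)
# --blacklist (path to a blacklist BED file)
# --known_edits (path to a VCF of known edits for filtering)
# The output name given in the source (*merged_sorted_peaks.fdr_0.1.d_15.scored.tsv)
# suggests an FDR threshold of 0.1 and a merge distance of 15.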

python FLARE.py \
    -i "${INPUT_BAM}" \
    -o "${OUTPUT_DIR}" \
    -r "${REFERENCE_FASTA}" \
    -g "${GENE_ANNOTATION}"

Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"

merge_peaks (Inferred with models/gemini-2.5-flash) vN/A GitHub

$ Bash example

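# Install bedtools if not already installed. The source does not name the tool used for
# this step; bedtools is assumed here because it handles BED interval intersection.
# conda install -c bioconda bedtools

# Assume one BED file of edit clusters per replicate. FLARE reports clusters as .tsv
# (see the supplementary file list), so convert them to BED first if needed:
# replicate1_edit_clusters.bed
# replicate2_edit_clusters.bed
# replicate3_edit_clusters.bed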

# Intersect the edit clusters from the first two replicates
bedtools intersect -a replicate1_edit_clusters.bed -b replicate2_edit_clusters.bed > temp_intersect_1_2.bed

# Intersect the result with the third replicate to find regions common to all three
bedtools intersect -a temp_intersect_1_2.bed -b replicate3_edit_clusters.bed > confident_peaks.bed

# Clean up temporary file
rm temp_intersect_1_2.bed
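
The pairwise chaining above extends to any number of replicates by folding bedtools intersect over the list of files. A minimal sketch follows, assuming bedtools is on the PATH. The function name intersect_replicates is hypothetical, and the overlap rule (any shared base, the bedtools default) is an assumption because the source does not state one.

```shell
# Hypothetical helper: intersect an arbitrary number of BED files.
# Usage: intersect_replicates OUTPUT.bed REP1.bed REP2.bed [REP3.bed ...]
intersect_replicates() {
  local out="$1"; shift
  local acc next rep
  acc="$(mktemp)"
  cp "$1" "$acc"; shift          # start from the first replicate
  for rep in "$@"; do
    next="$(mktemp)"
    # Keep only the portions of the running result that overlap this replicate
    bedtools intersect -a "$acc" -b "$rep" > "$next"
    mv "$next" "$acc"
  done
  mv "$acc" "$out"
}
```

With three replicates this gives the same result as the two explicit commands, for example: intersect_replicates confident_peaks.bed replicate1_edit_clusters.bed replicate2_edit_clusters.bed replicate3_edit_clusters.bed.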

Subtract Buffer-only control clusters from STAMP confident clusters, which yields "*cleaned_confident_peaks.bed"

bedtools (Inferred with models/gemini-2.5-flash) v2.30.0 (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install bedtools (if not already installed)
# conda install -c bioconda bedtools

# Subtract Buffer only control regions from STAMP confident clusters
# This yields regions that are present in STAMP clusters but not in the control.
# Assuming 'stamp_confident_clusters.bed' contains the STAMP confident clusters
# and 'buffer_only_control.bed' contains the Buffer only control regions.
bedtools subtract -a stamp_confident_clusters.bed -b buffer_only_control.bed > cleaned_confident_peaks.bed
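
A quick way to see how much the control subtraction removed is to compare interval counts and covered bases before and after. The awk sketch below does this for any BED file. The helper bed_summary is hypothetical, and it assumes standard BED columns (chrom, 0-based start, exclusive end) with non-overlapping intervals, since overlapping intervals would be counted twice.

```shell
# Hypothetical helper: print "<interval count><TAB><total bases>" for a BED file,
# skipping comment, track, and browser lines.
bed_summary() {
  local bed="$1"
  awk -F'\t' '!/^(#|track|browser)/ && NF >= 3 {
                n++
                bp += $3 - $2     # BED ends are exclusive, so length = end - start
              }
              END { printf "%d\t%d\n", n, bp }' "$bed"
}
```

Running bed_summary on stamp_confident_clusters.bed and then on cleaned_confident_peaks.bed shows how many clusters and bases the Buffer-only control accounted for.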

Tools Used

STAR SAILOR

Raw Source Text

remove adapter with Cutadapt
align to hg38 using STAR 2.4.0 (Homo sapiens) or mm10 using STAR 2.5.2 (Mus musculus)
SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%
FLARE analysis to call C-to-U edit clusters
Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"
Subtract STAMP confident clusters to Buffer only control, which yields "*cleaned_confident_peaks.bed"
Assembly: hg38
Assembly: mm10
Supplementary files format and content: SAILOR step yields bed file: *0.5Score0.8Fraction.fastqTr.sorted.STARUnmapped.out.sorted.STARAligned.out.sorted.bam.combined.readfiltered.formatted.varfiltered.snpfiltered.ranked.bed
Supplementary files format and content: FLARE step yields .tsv file: "*merged_sorted_peaks.fdr_0.1.d_15.scored.tsv"
Supplementary files format and content: Intersection step yields .bed file:  "*confident_peaks.bed"
Supplementary files format and content: Subtraction step yields .bed file:  "*cleaned_confident_peaks.bed"
