GSE263371 Processing Pipeline


Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature Communications (2024), PMID 39152130

Dataset

An in situ method for identification of transcriptome-wide protein-RNA interactions in cells [isSTAMP]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


remove adapter with Cutadapt

cutadapt v4.4

$ Bash example

# Install Cutadapt (if not already installed)
# conda install -c bioconda cutadapt

# Define input and output file paths
INPUT_R1="input_R1.fastq.gz" # Placeholder for your forward read input file
INPUT_R2="input_R2.fastq.gz" # Placeholder for your reverse read input file (if paired-end)
OUTPUT_R1="output_R1_trimmed.fastq.gz" # Placeholder for your forward read output file
OUTPUT_R2="output_R2_trimmed.fastq.gz" # Placeholder for your reverse read output file (if paired-end)

# Define adapter sequences. Replace with the actual adapter sequences used in your library preparation.
# Full-length Illumina TruSeq adapters:
# ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
# ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
# Cutadapt does not auto-detect adapters, so a sequence must always be supplied.
# The first 13 bases are shared by both TruSeq adapters and are usually sufficient for 3' trimming:
ADAPTER_FWD="AGATCGGAAGAGC"
ADAPTER_REV="AGATCGGAAGAGC"

# Run Cutadapt to remove adapter sequences, perform quality trimming, and filter by length.
# -a ADAPTER_FWD: 3' adapter for forward reads
# -A ADAPTER_REV: 3' adapter for reverse reads (for paired-end data)
# -o: Output file for forward reads
# -p: Output file for reverse reads (for paired-end data)
# -q 20,20: Trim low-quality bases from 5' and 3' ends (quality cutoff 20)
# --minimum-length 25: Discard reads shorter than 25 bp after trimming

cutadapt -a "${ADAPTER_FWD}" -A "${ADAPTER_REV}" \
         -o "${OUTPUT_R1}" -p "${OUTPUT_R2}" \
         -q 20,20 --minimum-length 25 \
         "${INPUT_R1}" "${INPUT_R2}"

# For single-end reads, the command would be simpler:
# cutadapt -a ${ADAPTER_FWD} -o ${OUTPUT_R1} -q 20 --minimum-length 25 ${INPUT_R1}
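A quick sanity check after trimming is to compare read counts before and after. The sketch below assumes gzip-compressed FASTQ files with exactly four lines per record. The helper name `count_reads` is hypothetical and is not part of Cutadapt.

```bash
# count_reads: hypothetical helper, prints the number of records in a gzipped FASTQ.
# Assumes exactly four lines per record (no wrapped sequence lines).
count_reads() {
    gzip -cd "$1" | awk 'END { print int(NR / 4) }'
}

# Usage (file names are the placeholders from the Cutadapt example):
# echo "before: $(count_reads input_R1.fastq.gz)"
# echo "after:  $(count_reads output_R1_trimmed.fastq.gz)"
```

A large drop in read count may indicate that the adapter sequence or the `--minimum-length` cutoff does not suit the library.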


align to hg38 using STAR 2.4.0 (Homo sapiens) or mm10 using STAR 2.5.2 (Mus musculus)

STAR v2.4.0

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# --- Setup Reference Genome (hg38 example) ---
# Replace with the actual path to your STAR genome index directory for hg38.
# This directory should contain files like Genome, SA, SAindex, etc.
# If you need to build the index, use a command like:
# STAR --runMode genomeGenerate --genomeDir /path/to/hg38_star_index --genomeFastaFiles /path/to/hg38.fa --sjdbGTFfile /path/to/gencode.vXX.annotation.gtf --runThreadN <num_threads>
GENOME_DIR="/path/to/star_index/hg38"

# --- Input Files ---
# Use the adapter-trimmed FASTQ files produced by the Cutadapt step
READ1="output_R1_trimmed.fastq.gz"
READ2="output_R2_trimmed.fastq.gz" # For paired-end reads. If single-end, remove READ2 and adjust --readFilesIn

# --- Output Prefix ---
OUTPUT_PREFIX="aligned_sample"

# --- Alignment Command ---
# This command aligns reads to hg38 using STAR 2.4.0
# For Mus musculus (mm10) alignment, you would typically use STAR 2.5.2 or later
# and point to an mm10 genome index.
STAR \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ1}" "${READ2}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}_" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outSAMattributes Standard \
  --outFilterType BySJout \
  --outFilterMultimapNmax 20 \
  --outFilterMismatchNmax 999 \
  --outFilterMismatchNoverLmax 0.04 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --alignSJoverhangMin 8 \
  --alignSJDBoverhangMin 1 \
  --sjdbScore 1 \
  --runThreadN 8 # Adjust number of threads as needed

# Rename the output BAM file for clarity
mv "${OUTPUT_PREFIX}_Aligned.sortedByCoord.out.bam" "${OUTPUT_PREFIX}.bam"

# Index the BAM file (requires samtools)
samtools index "${OUTPUT_PREFIX}.bam"
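Because the source uses a different assembly and STAR version per species, a small lookup can keep the two configurations apart. This is a minimal sketch: the function name `assembly_for_species` and the index directory layout are hypothetical, and only the species, assembly, and version pairs come from the source description.

```bash
# assembly_for_species: hypothetical helper mapping a species name to the
# assembly and STAR version listed in the source description.
assembly_for_species() {
    case "$1" in
        "Homo sapiens") echo "hg38 2.4.0" ;;
        "Mus musculus") echo "mm10 2.5.2" ;;
        *) echo "unsupported species: $1" >&2; return 1 ;;
    esac
}

# Usage: pick the index directory for a sample (the index root path is a placeholder).
# read -r ASSEMBLY STAR_VERSION < <(assembly_for_species "Mus musculus")
# GENOME_DIR="/path/to/star_index/${ASSEMBLY}"
```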


SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%

SAILOR (version not specified)

$ Bash example

# Install SAILOR (if not already installed)
# git clone https://github.com/yeolab/SAILOR.git
# cd SAILOR
#
# NOTE: The script name and all flags below are inferred placeholders and have not
# been checked against the SAILOR documentation. Consult the repository README for
# the actual entry point, installation steps, and option names before running.

# Placeholder variables for input and output files
INPUT_BAM="input.bam" # e.g., the sorted BAM from the STAR step
REFERENCE_FASTA="reference.fasta" # e.g., hg38.fa
OUTPUT_BED="output_c_to_u_edits.bed" # The source reports a BED file from this step

# Call C-to-U edits, keeping sites with score > 0.5 and edit fraction < 80%
# as stated in the source description. Whether these thresholds match SAILOR's
# defaults was not verified, so they are set explicitly.
python SAILOR.py \
    --bam "$INPUT_BAM" \
    --ref "$REFERENCE_FASTA" \
    --output "$OUTPUT_BED" \
    --min_score 0.5 \
    --max_edit_fraction 0.8 \
    --edit_type C_to_U
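If the edit caller emits unfiltered sites, the two thresholds from the source (score > 0.5, edit fraction < 80%) can also be applied afterwards with `awk`. The sketch below is not part of SAILOR. The helper name `filter_edit_sites` is hypothetical, and the column positions are passed as arguments because the actual output layout is not given in the source.

```bash
# filter_edit_sites: hypothetical helper, not part of SAILOR.
# Keeps rows with score > 0.5 and edit fraction < 0.8 from a tab-separated file.
# Arguments: FILE SCORE_COLUMN FRACTION_COLUMN (1-based column numbers).
# The column positions are assumptions; inspect your own output to set them.
filter_edit_sites() {
    awk -F'\t' -v s="$2" -v f="$3" \
        '($s + 0) > 0.5 && ($f + 0) < 0.8' "$1"
}

# Usage (file name and column numbers are placeholders):
# filter_edit_sites sites.tsv 5 6 > sites.filtered.tsv
```

Both comparisons are strict, so a site with a score of exactly 0.5 or a fraction of exactly 0.8 is dropped.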


FLARE analysis to call C-to-U edit clusters

FLARE (version not specified; inferred with models/gemini-2.5-flash)

$ Bash example

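# Clone FLARE repository
# git clone https://github.com/yeolab/FLARE.git
# cd FLARE
#
# NOTE: This step was inferred. The script name, flags, and dependency list below are
# unverified placeholders; check the FLARE repository README for the actual
# interface and required inputs before running.

# Install dependencies (if not already installed in environment)
# pip install pysam numpy

# Define variables
INPUT_BAM="input.bam" # Replace with your aligned STAMP reads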
REFERENCE_GENOME="path/to/GRCh38.fa" # Placeholder for human hg38 reference genome (e.g., from UCSC or Ensembl)
KNOWN_SNPS_VCF="path/to/dbSNP_GRCh38.vcf.gz" # Placeholder for known SNPs VCF for GRCh38 (e.g., from NCBI dbSNP)
OUTPUT_PREFIX="flare_output"
THREADS=8 # Number of threads

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_PREFIX}_results"

# Run FLARE analysis to call C-to-U edit clusters
python FLARE.py \
    -i "${INPUT_BAM}" \
    -g "${REFERENCE_GENOME}" \
    -s "${KNOWN_SNPS_VCF}" \
    -o "${OUTPUT_PREFIX}_results/${OUTPUT_PREFIX}" \
    -t "${THREADS}" \
    --min_coverage 10 \
    --min_base_quality 20 \
    --min_mapping_quality 20 \
    --min_edit_ratio 0.1 # Illustrative thresholds, not taken from the source; adjust as needed

Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"

merge_peaks (version not available; inferred with models/gemini-2.5-flash)

$ Bash example

# Clone the merge_peaks repository if not already available
# git clone https://github.com/yeolab/merge_peaks.git
# cd merge_peaks

# NOTE: The tool choice, script name, and flags here are inferred and unverified;
# the source only states that edit clusters from 3 replicates are intersected.
python merge_peaks.py -i replicate1_edit_clusters.bed replicate2_edit_clusters.bed replicate3_edit_clusters.bed -o confident_peaks
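As an alternative sketch, a three-way intersection can be written with `bedtools intersect`, which is a different tool from the inferred `merge_peaks` step. The helper name `intersect_three` and the file names are hypothetical. "Intersection" is taken here to mean intervals from the first replicate that overlap at least one interval in each of the other two, which is only one possible reading, since the source does not define it.

```bash
# intersect_three: hypothetical helper built on bedtools intersect.
# Prints intervals from the first replicate that overlap at least one interval
# in each of the other two replicates (-u reports each A interval once).
# This is one possible definition of "intersection"; the source does not say
# which one was used.
intersect_three() {
    bedtools intersect -u -a "$1" -b "$2" \
        | bedtools intersect -u -a stdin -b "$3"
}

# Usage (file names are placeholders):
# intersect_three rep1_clusters.bed rep2_clusters.bed rep3_clusters.bed > confident_peaks.bed
```

Note that the result depends on which replicate is passed first, because the reported coordinates are always those of the first file.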


Subtract the Buffer-only control from the STAMP confident clusters, which yields "*cleaned_confident_peaks.bed"

bedtools v2.29.2 (inferred with models/gemini-2.5-flash)

$ Bash example

# Install bedtools if not already available
# conda install -c bioconda bedtools

# Subtract regions in 'buffer_only_control.bed' from 'stamp_confident_clusters.bed'.
# By default, bedtools subtract removes only the overlapping portion of each STAMP cluster
# and keeps the remainder. Add -A to drop any cluster that overlaps a control region at all.
# The source does not state which behavior was used.
bedtools subtract -a stamp_confident_clusters.bed -b buffer_only_control.bed > cleaned_confident_peaks.bed
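To see how much the control subtraction removed, it can help to compare interval counts and covered bases before and after. The sketch below uses a hypothetical helper, `bed_stats`, and assumes standard tab-separated BED coordinates (0-based start, end-exclusive end).

```bash
# bed_stats: hypothetical helper, prints "<interval count>\t<total bases covered>"
# for a BED file (columns 2 and 3 are the 0-based start and end-exclusive end).
# Overlapping intervals are counted separately, so the base total is an upper bound.
bed_stats() {
    awk -F'\t' 'NF >= 3 && $1 !~ /^(#|track|browser)/ { n++; bp += $3 - $2 }
                END { printf "%d\t%d\n", n, bp }' "$1"
}

# Usage (file names are the placeholders from the bedtools example):
# bed_stats stamp_confident_clusters.bed
# bed_stats cleaned_confident_peaks.bed
```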


Tools Used

STAR SAILOR

Raw Source Text

remove adapter with Cutadapt
align to hg38 using STAR 2.4.0 (Homo sapiens) or mm10 using STAR 2.5.2 (Mus musculus)
SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%
FLARE analysis to call C-to-U edit clusters
Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"
Subtract STAMP confident clusters to Buffer only control, which yields "*cleaned_confident_peaks.bed"
Assembly: hg38/mm10
Supplementary files format and content: SAILOR step yields bed file: *0.5Score0.8Fraction.fastqTr.sorted.STARUnmapped.out.sorted.STARAligned.out.sorted.bam.combined.readfiltered.formatted.varfiltered.snpfiltered.ranked.bed
Supplementary files format and content: FLARE step yields .tsv file: "*merged_sorted_peaks.fdr_0.1.d_15.scored.tsv"
Supplementary files format and content: Intersection step yields .bed file:  "*confident_peaks.bed"
Supplementary files format and content: Subtraction step yields .bed file:  "*cleaned_confident_peaks.bed"
