GSE248461 Processing Pipeline

RNA-Seq, 6 steps, with code examples

Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature Communications (2024), PMID 39152130

Dataset

GSE248461

An in situ method for identification of transcriptome-wide protein-RNA interactions in cells [in_situ_STAMP II]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Remove adapters with Cutadapt

    Cutadapt v4.0
    $ Bash example
    # Install Cutadapt (e.g., via conda)
    # conda install -c bioconda cutadapt=4.0
    
    # Define input and output files
    INPUT_FASTQ="reads.fastq.gz"
    OUTPUT_FASTQ="reads_trimmed.fastq.gz"
    
    # Define the 3' adapter sequence. The value below is the standard Illumina TruSeq Read 1 3' adapter
    # and is only a placeholder: replace it with the adapter used in your library preparation.
    ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
    
    # Define trimming parameters
    MINIMUM_LENGTH=18  # Minimum read length after trimming
    QUALITY_CUTOFF=20  # Quality cutoff for trimming low-quality bases from the 3' end
    THREADS=8          # Number of CPU threads to use
    
    # Execute Cutadapt command
    cutadapt \
      -a "${ADAPTER_SEQUENCE}" \
      -o "${OUTPUT_FASTQ}" \
      -m ${MINIMUM_LENGTH} \
      -q ${QUALITY_CUTOFF} \
      --cores ${THREADS} \
      "${INPUT_FASTQ}"
  2. Align reads to hg38 (Homo sapiens) with STAR 2.4.0

    STAR v2.4.0
    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.4.0
    
    # Define variables
    GENOME_DIR="/path/to/STAR_index/hg38" # Replace with the actual path to your hg38 STAR index
    READ1="read1.fastq.gz" # Replace with your R1 FASTQ file
    READ2="read2.fastq.gz" # Replace with your R2 FASTQ file (if paired-end)
    OUTPUT_PREFIX="aligned_reads_" # Prefix for output files (e.g., aligned_reads_Aligned.sortedByCoord.out.bam)
    THREADS=8 # Adjust as needed
    
    # Build the STAR index once per genome if it does not exist yet. A pre-built index must have been
    # generated by a STAR version compatible with the one used for alignment. Example (requires genome FASTA and GTF):
    # STAR --runMode genomeGenerate \
    #      --genomeDir ${GENOME_DIR} \
    #      --genomeFastaFiles /path/to/hg38.fa \
    #      --sjdbGTFfile /path/to/gencode.vXX.annotation.gtf \
    #      --runThreadN ${THREADS}
    
    # Align reads to hg38 using STAR
    STAR --genomeDir ${GENOME_DIR} \
         --readFilesIn ${READ1} ${READ2} \
         --runThreadN ${THREADS} \
         --outFileNamePrefix ${OUTPUT_PREFIX} \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outSAMattributes Standard \
         --readFilesCommand zcat \
         --outFilterMultimapNmax 20 \
         --outFilterMismatchNmax 999 \
         --outFilterMismatchNoverLmax 0.04 \
         --alignIntronMin 20 \
         --alignIntronMax 1000000 \
         --alignMatesGapMax 1000000 \
         --sjdbScore 1 \
         --limitBAMsortRAM 30000000000
  3. Call C-to-U edits with SAILOR, keeping only sites with score > 0.5 and edit fraction < 80%

    $ Bash example
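    # Install SAILOR (if not already installed): SAILOR is distributed through the Yeo lab's
    # GitHub repository, so follow its README. The package names previously suggested here
    # ('pip install sailor', 'conda install -c bioconda sailor') are unverified.
    
    # NOTE: the 'sailor call_edits' command and its flags are a hypothetical placeholder that
    # only illustrates this step's inputs and thresholds. They are not SAILOR's documented interface.
    # - input.bam: aligned BAM from the STAR step
    # - reference.fasta: reference genome FASTA (e.g., hg38)
    # C-to-U edits appear as C>T mismatches in the aligned reads.
    # Thresholds from the source: score > 0.5 and edit fraction < 80%.
    # The source's SAILOR step produces a BED file, hence the .bed output name.
    
    sailor call_edits \
        --bam input.bam \
        --fasta reference.fasta \
        --output output_edits.bed \
        --min_score 0.5 \
        --max_edit_fraction 0.8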
  4. Call C-to-U edit clusters with FLARE

    FLARE (version not specified; tool details inferred with models/gemini-2.5-flash)
    $ Bash example
    # Clone the FLARE repository
    # git clone https://github.com/yeolab/FLARE.git
    # cd FLARE
    #
    # Create a conda environment (example, check FLARE's dependencies like pysam, pybedtools)
    # conda create -n flare_env python=3.8 samtools bedtools -y
    # conda activate flare_env
    #
    # If there are specific Python dependencies, they might be in a requirements.txt or need manual installation
    # pip install pysam pybedtools
    
    # Define input and output paths
    INPUT_BAM="aligned_reads.bam" # Replace with your input BAM file (e.g., from STAR alignment)
    REFERENCE_GENOME_FASTA="path/to/human_hg38.fa" # Replace with your reference genome FASTA (e.g., from UCSC, Ensembl)
    OUTPUT_DIR="flare_c_to_u_output"
    SAMPLE_NAME="sample1"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Run FLARE to group C-to-U edit sites into edit clusters.
    # NOTE: the 'flare.py' entry point and all flags below are hypothetical placeholders
    # (AI-inferred), not FLARE's documented interface. Take the real invocation and
    # configuration from the FLARE repository's README.
    # Because the source runs FLARE after SAILOR, check whether FLARE expects SAILOR's
    # edit-site output rather than (or in addition to) the BAM.
    # Assumes the FLARE repository was cloned to './FLARE'.
    
    python FLARE/src/flare.py \
        -i "${INPUT_BAM}" \
        -r "${REFERENCE_GENOME_FASTA}" \
        -o "${OUTPUT_DIR}" \
        -s "${SAMPLE_NAME}" \
        --min_coverage 10 \
        --min_edit_freq 0.05 \
        --strand_specific  # hypothetical flag for strand-specific libraries
    
    # Post-process FLARE's output to keep only C-to-U events. Sequencing reads carry T where the
    # RNA carried U, so C-to-U edits appear as C>T. In genome (plus-strand) coordinates, edits on
    # minus-strand transcripts appear as G>A.
    # Conceptual filter; the file name and the ref/alt columns ($4, $5) are hypothetical:
    # awk -F'\t' '$4 == "C" && $5 == "T"' "${OUTPUT_DIR}/${SAMPLE_NAME}_editing_sites.tsv" > "${OUTPUT_DIR}/${SAMPLE_NAME}_c_to_u_editing_sites.tsv"
    
  5. Intersect the edit clusters from the 3 replicates, yielding "*confident_peaks.bed"

    merge_peaks.py (version not specified; inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install dependencies and clone the repository
    # conda create -n merge_peaks_env python=3.8
    # conda activate merge_peaks_env
    # pip install numpy scipy pybedtools matplotlib seaborn
    # git clone https://github.com/yeolab/merge_peaks.git
    # cd merge_peaks
    
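    # NOTE: merge_peaks.py and the '-i/-o/-m' flags below were inferred by a model and are
    # unverified. Treat this call as a hypothetical placeholder and check the repository's
    # README for its real interface.
    # python merge_peaks.py -i rep1_edit_clusters.bed rep2_edit_clusters.bed rep3_edit_clusters.bed -o confident_peaks.bed -m 3
    
    # Plain bedtools alternative (conda install -c bioconda bedtools): keep each replicate-1
    # cluster that overlaps a cluster in replicate 2 AND a cluster in replicate 3.
    # -u reports each -a feature once; -s requires the same strand (BED6 input).
    bedtools intersect -u -s -a rep1_edit_clusters.bed -b rep2_edit_clusters.bed \
      | bedtools intersect -u -s -a stdin -b rep3_edit_clusters.bed \
      > confident_peaks.bed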
  6. Subtract the Buffer-only control clusters from the STAMP confident clusters, yielding "*cleaned_confident_peaks.bed"

    bedtools subtract v2.30.0 (inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install bedtools if not already installed
    # conda install -c bioconda bedtools
    
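    # Remove from the STAMP confident clusters any region that also appears in the buffer-only
    # control clusters, yielding 'cleaned_confident_peaks.bed'.
    # By default bedtools subtract trims only the overlapping bases of each -a feature. Add -A to
    # drop a feature entirely if it overlaps any -b feature, and -s to require the same strand (BED6 input).
    bedtools subtract -a stamp_confident_clusters.bed -b buffer_control_peaks.bed > cleaned_confident_peaks.bed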
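
For step 1, the core idea of 3' adapter trimming can be shown with a toy, exact-match trimmer. This is a minimal sketch, not Cutadapt's algorithm. Cutadapt also tolerates mismatches and indels, quality-trims read ends (-q) and discards short reads (-m). The helper name trim_3prime_adapter is hypothetical. The 3-nt minimum overlap mirrors Cutadapt's default minimum overlap for partial adapters at the read end.

```shell
#!/usr/bin/env bash
# trim_3prime_adapter SEQ ADAPTER [MIN_OVERLAP]
# Toy, exact-match version of 3' adapter trimming (hypothetical helper):
#  - if ADAPTER occurs in SEQ, print everything before its first occurrence;
#  - else, if SEQ ends with a prefix of ADAPTER at least MIN_OVERLAP (default 3)
#    long, remove that partial adapter;
#  - else, print SEQ unchanged.
trim_3prime_adapter() {
  printf '%s\n' "$1" | awk -v ad="$2" -v min="${3:-3}" '
    {
      seq = $0
      p = index(seq, ad)                    # full adapter anywhere in the read?
      if (p > 0) { print substr(seq, 1, p - 1); next }
      # Partial adapter at the 3-prime end: try the longest adapter prefix first.
      for (len = length(ad) - 1; len >= min + 0; len--) {
        if (len <= length(seq) && substr(seq, length(seq) - len + 1) == substr(ad, 1, len)) {
          print substr(seq, 1, length(seq) - len)
          next
        }
      }
      print seq
    }'
}
```

For example, trim_3prime_adapter ACGTACGTAGATC AGATCGGAAGAGC removes the partial adapter AGATC from the read end and prints ACGTACGT.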
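
After the step 2 alignment, STAR writes summary statistics to <prefix>Log.final.out, which is aligned_reads_Log.final.out with the prefix used above. The sketch below reads one value from that file and flags a low unique-mapping rate. It assumes the file's usual "label |<TAB>value" layout. The helper names star_log_value and check_unique_rate are hypothetical, and so is any threshold you pass, because the source does not specify a QC cutoff.

```shell
#!/usr/bin/env bash
# star_log_value LOG LABEL
# Prints the value for LABEL from a STAR Log.final.out file, assuming the
# "<padded label> |<TAB><value>" layout; padding around both parts is trimmed.
star_log_value() {
  awk -F'|' -v want="$2" '
    {
      key = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim label padding
      gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim the tab after the bar
      if (key == want) { print val; exit }
    }' "$1"
}

# check_unique_rate LOG MIN_PCT
# Returns 0 if "Uniquely mapped reads %" >= MIN_PCT, else warns on stderr and returns 1.
check_unique_rate() {
  local pct
  pct=$(star_log_value "$1" "Uniquely mapped reads %")
  pct=${pct%\%}                           # drop the trailing percent sign
  if awk -v p="$pct" -v m="$2" 'BEGIN { exit !(p + 0 >= m + 0) }'; then
    return 0
  fi
  echo "warning: unique mapping rate ${pct}% is below $2%" >&2
  return 1
}
```

For example, check_unique_rate aligned_reads_Log.final.out 70 returns non-zero and prints a warning if fewer than 70% of reads mapped uniquely.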
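
For step 3, the score and edit-fraction thresholds amount to a simple row filter on SAILOR's BED output. The sketch below assumes a hypothetical column layout: the edit fraction is the last "|"-separated field of column 4, and the confidence score is column 5. Check the real SAILOR output before relying on it. Both comparisons are strict, matching "score >0.5" and "edit fraction <80%" in the source. The helper name filter_edit_sites is hypothetical.

```shell
#!/usr/bin/env bash
# filter_edit_sites BED [MIN_SCORE] [MAX_FRACTION]
# Hypothetical helper: keeps edit sites with score > MIN_SCORE (default 0.5)
# and edit fraction < MAX_FRACTION (default 0.8), the thresholds of this step.
# ASSUMED column layout (adjust to the real SAILOR BED):
#   column 4 = "coverage|ref>alt|edit_fraction", column 5 = confidence score.
filter_edit_sites() {
  awk -F'\t' -v s="${2:-0.5}" -v f="${3:-0.8}" '
    {
      n = split($4, parts, "|")           # the last bar-separated field is the fraction
      if ($5 + 0 > s + 0 && parts[n] + 0 < f + 0) print
    }' "$1"
}
```

For example, filter_edit_sites sites.ranked.bed > sites.filtered.bed keeps only the rows that pass both thresholds. The file names in this example are placeholders.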
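
For step 4, FLARE's cluster calling is statistical (the FLARE output name in the supplementary files includes fdr_0.1), so a few lines cannot reproduce it. The sketch below only illustrates the simpler notion of an edit cluster: nearby edit sites on the same strand, grouped by distance. This is similar in spirit to bedtools merge -s -d 15. The 15-bp default gap is an assumption loosely suggested by the d_15 token in that file name, not a documented FLARE parameter. cluster_edit_sites is a hypothetical helper.

```shell
#!/usr/bin/env bash
# cluster_edit_sites BED [MAX_GAP]
# Hypothetical helper: groups BED6 edit sites into clusters. A site joins the
# current cluster when it is on the same chrom and strand and starts within
# MAX_GAP bp of the cluster end; otherwise it starts a new cluster.
# Output (BED6): chrom, start, end, cluster_N, number_of_sites, strand.
cluster_edit_sites() {
  local max_gap=${2:-15}
  # Sort by chrom, then strand, then start so same-strand neighbours are adjacent.
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k6,6 -k2,2n "$1" |
    awk -F'\t' -v OFS='\t' -v gap="$max_gap" '
      function flush() {
        if (n > 0) { k++; print chr, cs, ce, "cluster_" k, n, str }
      }
      {
        if (n > 0 && $1 == chr && $6 == str && $2 - ce <= gap + 0) {
          if ($3 + 0 > ce) ce = $3 + 0    # extend the current cluster
          n++
        } else {
          flush()                         # close the previous cluster
          chr = $1; str = $6; cs = $2 + 0; ce = $3 + 0; n = 1
        }
      }
      END { flush() }'
}
```

For example, sites at chr1:100-101(+) and chr1:110-111(+) fall into one cluster, chr1:100-111 with 2 sites. A site at chr1:105-106(-) forms its own cluster because it is on the other strand.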
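
For step 5, "intersect the edit clusters from 3 replicates" can be read as follows: keep a cluster from replicate 1 only if it overlaps a cluster in replicate 2 and a cluster in replicate 3. The brute-force sketch below implements that reading with half-open BED coordinates and strand matching, roughly what chaining bedtools intersect -u -s does. intersect_replicates is a hypothetical helper meant for small files. The source does not say which replicate's coordinates are reported.

```shell
#!/usr/bin/env bash
# intersect_replicates REP1 REP2 REP3
# Hypothetical helper: prints each REP1 interval (BED6, half-open coordinates)
# that overlaps at least one REP2 interval AND at least one REP3 interval on the
# same chrom and strand. Brute force O(n*m): for illustration on small files.
intersect_replicates() {
  awk -F'\t' '
    # Which input is being read: 1 = REP2, 2 = REP3, 3 = REP1 (see argument order).
    { f = (FILENAME == ARGV[1]) ? 1 : (FILENAME == ARGV[2]) ? 2 : 3 }
    f < 3 {                                      # store REP2 / REP3 intervals
      k = ++cnt[f]
      c[f, k] = $1; s[f, k] = $2 + 0; e[f, k] = $3 + 0; st[f, k] = $6
      next
    }
    {                                            # REP1: test against both stores
      ok = 1
      for (r = 1; r <= 2 && ok; r++) {
        hit = 0
        for (i = 1; i <= cnt[r] && !hit; i++)
          if (c[r, i] == $1 && st[r, i] == $6 && s[r, i] < $3 + 0 && $2 + 0 < e[r, i])
            hit = 1
        ok = hit
      }
      if (ok) print
    }' "$2" "$3" "$1"
}
```

For example, intersect_replicates rep1_edit_clusters.bed rep2_edit_clusters.bed rep3_edit_clusters.bed > confident_peaks.bed reuses the file names from the step 5 command.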
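
For step 6, the sketch below shows whole-feature removal: a STAMP cluster is dropped if it overlaps any buffer-only control interval on the same strand, as with bedtools subtract -A -s. Plain bedtools subtract, as in the step 6 command, instead trims only the overlapping bases. The source does not say which behavior was used. remove_control_overlaps is a hypothetical helper.

```shell
#!/usr/bin/env bash
# remove_control_overlaps STAMP_BED CONTROL_BED
# Hypothetical helper: prints each STAMP cluster (BED6, half-open coordinates)
# that overlaps NO control interval on the same chrom and strand, i.e.
# whole-feature removal in the spirit of 'bedtools subtract -A -s'.
remove_control_overlaps() {
  awk -F'\t' '
    FILENAME == ARGV[1] {                        # control file: store its intervals
      n++; c[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; st[n] = $6
      next
    }
    {                                            # STAMP file: skip any overlapping cluster
      for (i = 1; i <= n; i++)
        if (c[i] == $1 && st[i] == $6 && s[i] < $3 + 0 && $2 + 0 < e[i]) next
      print
    }' "$2" "$1"
}
```

For example, remove_control_overlaps stamp_confident_clusters.bed buffer_control_peaks.bed > cleaned_confident_peaks.bed reuses the file names from the step 6 command.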

Raw Source Text
remove adapter with Cutadapt
align to hg38 using STAR 2.4.0 (Homo sapiens)
SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%
FLARE analysis to call C-to-U edit clusters
Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"
Subtract STAMP confident clusters to Buffer only control, which yields "*cleaned_confident_peaks.bed"
Assembly: hg38
Supplementary files format and content: SAILOR step yields bed file: *0.5Score0.8Fraction.fastqTr.sorted.STARUnmapped.out.sorted.STARAligned.out.sorted.bam.combined.readfiltered.formatted.varfiltered.snpfiltered.ranked.bed
Supplementary files format and content: FLARE step yields .tsv file: "*merged_sorted_peaks.fdr_0.1.d_15.scored.tsv"
Supplementary files format and content: Intersection step yields .bed file:  "*confident_peaks.bed"
Supplementary files format and content: Subtraction step yields .bed file:  "*cleaned_confident_peaks.bed"