GSE215252 Processing Pipeline

RNA-Seq, 5 code examples

Publication

FLARE: a fast and flexible workflow for identifying RNA editing foci.

BMC bioinformatics (2023) — PMID 37784060

Dataset

FLARE: A fast and flexible peak-calling pipeline for identifying RNA editing foci

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


STAR aligned the original FASTQ files to the hg19 reference build.

STAR v2.7.10a

$ Bash example

# Install STAR (example using conda)
# conda create -n star_env star=2.7.10a -c bioconda -c conda-forge
# conda activate star_env

# --- Reference Genome Preparation (if not already done) ---
# This step generates the STAR genome index for hg19.
# Replace /path/to/hg19.fa and /path/to/hg19.gtf with actual paths.
# --sjdbOverhang should ideally be (max read length - 1); the default of 100 works well in most cases.
# GENOME_DIR="/path/to/STAR_index/hg19"
# STAR --runThreadN 8 --runMode genomeGenerate \
#      --genomeDir ${GENOME_DIR} \
#      --genomeFastaFiles /path/to/hg19.fa \
#      --sjdbGTFfile /path/to/hg19.gtf \
#      --sjdbOverhang 100

# --- STAR Alignment Command ---
# Define variables
# Replace with actual paths and filenames
GENOME_DIR="/path/to/STAR_index/hg19" # Path to the pre-built STAR genome index for hg19
READ1="sample_R1.fastq.gz" # Input FASTQ file for Read 1
READ2="sample_R2.fastq.gz" # Input FASTQ file for Read 2 (remove if single-end)
OUTPUT_PREFIX="sample_aligned" # Prefix for output files
THREADS=8 # Number of threads to use

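# STAR alignment command (example parameters; the exact settings used for this dataset are not stated in the source)
# eCLIP-style settings such as --alignIntronMax 1 disable spliced alignment, which is
# unsuitable for RNA-seq reads spanning exon junctions, so that option is omitted here.
STAR --runThreadN ${THREADS} \
     --genomeDir ${GENOME_DIR} \
     --readFilesIn ${READ1} ${READ2} \
     --readFilesCommand zcat \
     --outFileNamePrefix ${OUTPUT_PREFIX}_ \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes All \
     --outFilterMultimapNmax 1 \
     --outFilterMismatchNmax 3 \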
     --alignSJDBoverhangMin 1 \
     --alignSJoverhangMin 8 \
     --outFilterScoreMinOverLread 0.66 \
     --outFilterMatchNminOverLread 0.66
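Two small checks can make the alignment step easier to tune and verify. The minimal Bash sketch below defines two hypothetical helpers (sjdb_overhang_from_fastq and unique_mapping_rate, not part of STAR): the first derives the (read length - 1) value suggested for --sjdbOverhang from the first read of a gzipped FASTQ, and the second extracts the "Uniquely mapped reads %" figure from STAR's Log.final.out summary. It assumes all reads share one length and that the log uses STAR's usual "label | value" layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper functions for the STAR step (not part of STAR itself).

# sjdb_overhang_from_fastq FILE
# Prints (read length - 1) for the first read of a gzip-compressed FASTQ.
# Assumes every read has the same length as the first one.
sjdb_overhang_from_fastq() {
    local fastq="$1"
    # Line 2 of a FASTQ file holds the sequence of the first read.
    gzip -cd "$fastq" | awk 'NR == 2 { print length($0) - 1; exit }'
}

# unique_mapping_rate LOG
# Prints the "Uniquely mapped reads %" value (without the % sign)
# from a STAR Log.final.out file.
unique_mapping_rate() {
    local log="$1"
    # Lines look like "   Uniquely mapped reads % |<TAB>91.23%".
    awk -F'|' '/Uniquely mapped reads %/ {
        gsub(/[[:space:]%]/, "", $2)   # strip padding and the percent sign
        print $2
        exit
    }' "$log"
}
```

For example, run sjdb_overhang_from_fastq sample_R1.fastq.gz before genome generation, and unique_mapping_rate sample_aligned_Log.final.out after alignment (STAR names the log <prefix>Log.final.out, so the prefix used above yields sample_aligned_Log.final.out).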

Run SAILOR to identify editing sites.

SAILOR (version: latest; inferred with models/gemini-2.5-flash)

$ Bash example

# It's recommended to install SAILOR in a conda environment.
# The install command below is an unverified assumption; check the SAILOR
# README for the supported installation method.
# conda create -n sailor python=3.8
# conda activate sailor
# pip install git+https://github.com/yeolab/SAILOR.git

# Define input and output paths (the source processed data against hg19)
INPUT_BAM="sample_aligned_Aligned.sortedByCoord.out.bam" # Coordinate-sorted BAM written by the STAR step
REFERENCE_GENOME="path/to/your/hg19.fa" # Placeholder: path to the hg19 reference genome FASTA
KNOWN_SNPS_VCF="path/to/your/common_snps_hg19.vcf.gz" # Placeholder: VCF of known common SNPs (e.g., from dbSNP) used to filter out genomic variants
OUTPUT_PREFIX="sailor_editing_sites"

# Create output directory if it doesn't exist
mkdir -p sailor_output

# Run SAILOR to identify editing sites.
# The "SAILOR run" subcommand and its flag names are inferred and unverified;
# confirm them against the SAILOR documentation. Adjust thresholds such as
# --min-coverage, --min-edit-fraction, and --threads as needed.
SAILOR run \
    --bam "${INPUT_BAM}" \
    --genome "${REFERENCE_GENOME}" \
    --vcf "${KNOWN_SNPS_VCF}" \
    --output "sailor_output/${OUTPUT_PREFIX}" \
    --min-coverage 10 \
    --min-base-quality 20 \
    --min-map-quality 20 \
    --min-edit-fraction 0.1 \
    --threads 8
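The core quantity behind an editing-site call is the edit fraction: edited reads divided by all reads covering the site, trusted only when coverage is high enough. The Bash/awk sketch below illustrates that filter; it is not SAILOR's implementation. The function name filter_edit_sites and its input format (tab-delimited chrom, position, reference-base count, edited-base count) are hypothetical, and the default thresholds simply mirror the --min-coverage 10 and --min-edit-fraction 0.1 values in the example above.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of an edit-fraction filter (not SAILOR's code).

# filter_edit_sites FILE [MIN_COV] [MIN_FRAC]
# FILE (assumed format): tab-delimited chrom, pos, ref_count, alt_count.
# Prints chrom, pos, coverage, edit fraction for sites passing both cutoffs.
filter_edit_sites() {
    local file="$1" min_cov="${2:-10}" min_frac="${3:-0.1}"
    awk -F'\t' -v min_cov="$min_cov" -v min_frac="$min_frac" '
        {
            cov = $3 + $4                          # total reads covering the site
            if (cov == 0 || cov < min_cov) next    # too little coverage to estimate
            frac = $4 / cov                        # fraction of reads carrying the edit
            if (frac >= min_frac)
                printf "%s\t%s\t%d\t%.3f\n", $1, $2, cov, frac
        }' "$file"
}
```

Passing different cutoffs (for example filter_edit_sites sites.tsv 20 0.05) shows how strongly the number of retained sites depends on these two thresholds.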


Code available at: https://github.com/YeoLab/FLARE

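FLARE (version not specified; inferred with models/gemini-2.5-flash). Caution: the flare build and flare quant commands below describe full-length isoform quantification, which does not match FLARE's published purpose of identifying RNA editing foci; treat this example as unverified.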

$ Bash example

# Install FLARE using conda (uncomment to run; package availability is unverified)
# conda create -n flare_env python=3.8
# conda activate flare_env
# conda install -c bioconda flare

# Define reference files (the source used hg19; GENCODE release 19 is the GRCh37/hg19 annotation)
# Replace with actual paths to your reference files
GENOME_FASTA="path/to/hg19.fa" # e.g., from UCSC
GTF_ANNOTATION="path/to/gencode.v19.annotation.gtf" # e.g., from GENCODE

# Define output directories
FLARE_INDEX_DIR="flare_index_hg19_gencode_v19"
QUANT_OUTPUT_DIR="flare_quant_output"

# Input BAM file (coordinate-sorted BAM written by the STAR step)
INPUT_BAM="sample_aligned_Aligned.sortedByCoord.out.bam"

# Create output directories if they don't exist
mkdir -p "${FLARE_INDEX_DIR}"
mkdir -p "${QUANT_OUTPUT_DIR}"

# 1. Build the FLARE index
# This step needs to be run once for a given genome and annotation.
# Example: use 8 threads
echo "Building FLARE index..."
if ! flare build \
    -g "${GENOME_FASTA}" \
    -a "${GTF_ANNOTATION}" \
    -o "${FLARE_INDEX_DIR}" \
    --threads 8; then
    echo "FLARE index build failed. Exiting." >&2
    exit 1
fi

# 2. Quantify full-length isoforms
# Example: use 8 threads
echo "Quantifying full-length isoforms with FLARE..."
if ! flare quant \
    -i "${FLARE_INDEX_DIR}" \
    -b "${INPUT_BAM}" \
    -o "${QUANT_OUTPUT_DIR}" \
    --threads 8; then
    echo "FLARE quantification failed. Exiting." >&2
    exit 1
fi

echo "FLARE analysis complete."

Run FLARE to identify regions of enriched editing.

FLARE v0.1.0 (tool inferred with models/gemini-2.5-flash; version inferred from setup.py in yeolab/FLARE)

$ Bash example

# Clone the FLARE repository if not already available
# git clone https://github.com/yeolab/FLARE.git
# cd FLARE

# Define input and output files
# INPUT_BAM: Aligned RNA-seq reads in BAM format (e.g., from STAR alignment)
INPUT_BAM="path/to/your/aligned_reads.bam"
# OUTPUT_PREFIX: Prefix for all output files generated by FLARE
OUTPUT_PREFIX="flare_output"

# Define reference datasets (the source processed data against hg19)
# GENOME_FASTA: Reference genome in FASTA format (e.g., hg19.fa)
# Source: UCSC Genome Browser (e.g., http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz)
GENOME_FASTA="path/to/reference/hg19.fa"
# REPEATS_BED: Repeat regions in BED format (e.g., from RepeatMasker)
# This file is often provided with genome builds or can be generated.
REPEATS_BED="path/to/reference/repeats.bed"
# KNOWN_SITES_VCF: Known RNA editing sites in VCF format (optional, but recommended for filtering/annotation)
# Source: e.g., REDIportal or DARNED (use files with hg19 coordinates)
KNOWN_SITES_VCF="path/to/reference/known_editing_sites.vcf"

# Run FLARE to identify regions of enriched editing.
# The flare.py entry point and its flag names are inferred and unverified; check
# the FLARE README. Adjust parameters like --min_coverage and --min_editing_ratio as needed.
python flare.py \
    -i "${INPUT_BAM}" \
    -o "${OUTPUT_PREFIX}" \
    -g "${GENOME_FASTA}" \
    -r "${REPEATS_BED}" \
    -s "${KNOWN_SITES_VCF}" \
    --min_coverage 10 \
    --min_editing_ratio 0.05 \
    --min_base_quality 20 \
    --min_mapping_quality 20 \
    --threads $(nproc)
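FLARE's goal is to turn individual edit sites into regions of enriched editing. As a rough illustration of that idea only, and not a description of FLARE's algorithm, the sketch below greedily merges sorted sites on the same chromosome whose spacing is at most MAX_GAP bases and reports regions containing at least MIN_SITES sites along with their summed edit fraction. The function name merge_edit_sites, the input format (tab-delimited chrom, position, edit fraction, sorted by chrom then position), and the defaults (50 bp, 2 sites) are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical window-merge of edit sites into regions (not FLARE's method).

# merge_edit_sites FILE [MAX_GAP] [MIN_SITES]
# FILE (assumed format): tab-delimited chrom, pos, edit_fraction, sorted by chrom, pos.
# Output: chrom, start, end, number of sites, summed edit fraction.
merge_edit_sites() {
    local file="$1" max_gap="${2:-50}" min_sites="${3:-2}"
    awk -F'\t' -v max_gap="$max_gap" -v min_sites="$min_sites" '
        # Print the open region if it contains enough sites.
        function emit_region() {
            if (n >= min_sites)
                printf "%s\t%d\t%d\t%d\t%.3f\n", chrom, start, end, n, sum
        }
        {
            if (n > 0 && $1 == chrom && $2 - end <= max_gap) {
                end = $2; n++; sum += $3          # extend the open region
            } else {
                if (n > 0) emit_region()          # close the previous region
                chrom = $1; start = $2; end = $2; n = 1; sum = $3
            }
        }
        END { if (n > 0) emit_region() }' "$file"
}
```

A greedy single pass keeps memory constant and works on arbitrarily large sorted inputs; a real peak caller would also model background editing and score each region, which this sketch does not attempt.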


Code available at: https://github.com/YeoLab/FLARE

FLARE v0.1.0

$ Bash example

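# Installation (example, uncomment if needed; the PyPI package name is unverified)
# pip install flare

# Example config.yaml for FLARE
# This configuration file specifies input BAM files, a GTF annotation,
# and an output directory. The keys below are inferred and unverified;
# check the FLARE repository for its actual configuration schema.
# Replace placeholders with actual paths.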
cat << EOF > config.yaml
# Path to the GTF annotation file
gtf: /path/to/your/reference/genome.gtf

# Directory containing input BAM files
bam_dir: /path/to/your/bam_files

# List of sample names (corresponding to BAM files in bam_dir, e.g., sample1.bam)
samples:
  - sample1
  - sample2

# Output directory for FLARE results
output_dir: ./flare_output

# Optional: Number of threads to use
threads: 8

# Optional: Other FLARE specific parameters can be added here
# For example, minimum read count, minimum junction count, etc.
# min_read_count: 5
# min_junction_count: 3
EOF

# Execute FLARE with the configuration file
# Ensure 'flare' is in your PATH or specify the full path to the executable.
flare run --config config.yaml
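Because the config comments tie each entry under samples to a file named <sample>.bam in bam_dir, a quick pre-flight check can catch missing inputs before a long run. The function below, check_sample_bams, is a hypothetical helper that is not part of FLARE; it assumes that naming convention holds and that a zero-byte BAM counts as missing.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the config above (not part of FLARE).

# check_sample_bams BAM_DIR SAMPLE...
# Assumes each sample maps to BAM_DIR/<sample>.bam. Reports missing or empty
# files on stderr and returns 1 if any are found, 0 otherwise.
check_sample_bams() {
    local bam_dir="$1"; shift
    local missing=0 sample
    for sample in "$@"; do
        if [ ! -s "${bam_dir}/${sample}.bam" ]; then
            echo "missing or empty: ${bam_dir}/${sample}.bam" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For the example config, check_sample_bams /path/to/your/bam_files sample1 sample2 || exit 1 would stop the script before flare run if either BAM is absent.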


Tools Used

STAR, SAILOR, FLARE

Raw Source Text

STAR aligned original fastqs to hg19 reference build
Run SAILOR to identify editing sites. Code available at: https://github.com/YeoLab/FLARE
Run FLARE to identify regions of enriched editing. Code available at: https://github.com/YeoLab/FLARE
Assembly: hg19
Supplementary files format and content: Peak files are tab delimited and include peak coordinates, edit fraction, fraction of reads edited, and score
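The supplementary peak files are described as tab-delimited with peak coordinates, edit fraction, fraction of reads edited, and score. A minimal sketch for pulling out high-scoring peaks follows. It assumes the column order chrom, start, end, edit fraction, fraction of reads edited, score; that order is a guess and should be checked against the actual files. The function name filter_peaks and the default cutoff of 0 are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical filter for the supplementary peak files.

# filter_peaks FILE [MIN_SCORE]
# Assumed columns: chrom, start, end, edit_fraction, frac_reads_edited, score.
# Prints peaks with score >= MIN_SCORE, highest score first.
filter_peaks() {
    local file="$1" min_score="${2:-0}"
    local tab
    tab="$(printf '\t')"
    awk -F'\t' -v min_score="$min_score" '
        $2 !~ /^[0-9]+$/ { next }           # skip header or comment lines
        $6 + 0 >= min_score + 0 { print }   # keep peaks at or above the cutoff
    ' "$file" | sort -t "$tab" -k6,6nr
}
```

For instance, filter_peaks peaks.tsv 5 | head would list the ten highest-scoring peaks with a score of at least 5 (peaks.tsv is a placeholder name).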
