GSE215252 Processing Pipeline

RNA-Seq, 5 steps with code examples

Publication

FLARE: a fast and flexible workflow for identifying RNA editing foci.

BMC Bioinformatics (2023), PMID 37784060

Dataset

GSE215252

FLARE: A fast and flexible peak-calling pipeline for identifying RNA editing foci

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. STAR aligned the original FASTQ files to the hg19 reference build.

    STAR v2.7.10a
    $ Bash example
    # Install STAR (example using conda)
    # conda create -n star_env star=2.7.10a -c bioconda -c conda-forge
    # conda activate star_env
    
    # --- Reference Genome Preparation (if not already done) ---
    # This step generates the STAR genome index for hg19.
    # Replace /path/to/hg19.fa and /path/to/hg19.gtf with actual paths.
    # The --sjdbOverhang parameter should be set to (ReadLength - 1) or 100 for typical RNA-seq.
    # For eCLIP, where reads are often short and unspliced, --sjdbOverhang might be less critical or set to a small value.
    # GENOME_DIR="/path/to/STAR_index/hg19"
    # STAR --runThreadN 8 --runMode genomeGenerate \
    #      --genomeDir ${GENOME_DIR} \
    #      --genomeFastaFiles /path/to/hg19.fa \
    #      --sjdbGTFfile /path/to/hg19.gtf \
    #      --sjdbOverhang 100
    
    # --- STAR Alignment Command ---
    # Define variables
    # Replace with actual paths and filenames
    GENOME_DIR="/path/to/STAR_index/hg19" # Path to the pre-built STAR genome index for hg19
    READ1="sample_R1.fastq.gz" # Input FASTQ file for Read 1
    READ2="sample_R2.fastq.gz" # Input FASTQ file for Read 2 (remove if single-end)
    OUTPUT_PREFIX="sample_aligned" # Prefix for output files
    THREADS=8 # Number of threads to use
    
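    # STAR alignment command. The filtering parameters below are inferred (eCLIP-style settings), not taken from the source.
    # Note: --alignIntronMax 1 is normally used to suppress spliced alignments; reconsider it for RNA-seq data.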
    STAR --runThreadN ${THREADS} \
         --genomeDir ${GENOME_DIR} \
         --readFilesIn ${READ1} ${READ2} \
         --readFilesCommand zcat \
         --outFileNamePrefix ${OUTPUT_PREFIX}_ \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes All \
         --outFilterMultimapNmax 1 \
         --outFilterMismatchNmax 3 \
         --alignIntronMax 1 \
         --alignSJDBoverhangMin 1 \
         --alignSJoverhangMin 8 \
         --outFilterScoreMinOverLread 0.66 \
         --outFilterMatchNminOverLread 0.66
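
The index-preparation comments in step 1 suggest setting --sjdbOverhang to (ReadLength - 1). A minimal sketch of deriving that value from a FASTQ file follows. The helper name sjdb_overhang and the default sample of 10,000 reads are illustrative assumptions, not part of the original pipeline.

```shell
# Estimate --sjdbOverhang as (longest sampled read length - 1).
# Usage: sjdb_overhang <reads.fastq or reads.fastq.gz> [reads_to_sample]
# sjdb_overhang is a hypothetical helper name, not a STAR utility.
sjdb_overhang() {
    local fastq="$1"
    local n_reads="${2:-10000}"

    # Decompress on the fly when the file name ends in .gz.
    if [ "${fastq%.gz}" != "$fastq" ]; then
        gzip -dc "$fastq"
    else
        cat "$fastq"
    fi |
        head -n "$((n_reads * 4))" |
        # The sequence is the 2nd line of every 4-line FASTQ record.
        awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
             END { if (max > 0) print max - 1; else exit 1 }'
}
```

The result could then be passed to the genomeGenerate command, for example as --sjdbOverhang "$(sjdb_overhang sample_R1.fastq.gz)".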
  2. Run SAILOR to identify editing sites.

    SAILOR (latest version; inferred with models/gemini-2.5-flash)
    $ Bash example
    # Suggested installation in a conda environment (unverified; see the SAILOR repository for supported install methods)
    # conda create -n sailor python=3.8
    # conda activate sailor
    # pip install git+https://github.com/yeolab/SAILOR.git
    
    # Define input and output paths
    INPUT_BAM="path/to/your/aligned_reads.bam" # Replace with your input BAM file
    REFERENCE_GENOME="path/to/your/hg19.fa" # Placeholder: reference genome FASTA; must match the alignment build (hg19)
    KNOWN_SNPS_VCF="path/to/your/common_snps_hg19.vcf.gz" # Placeholder: VCF of known common SNPs (e.g., from dbSNP, hg19 coordinates) used to filter out genomic variants
    OUTPUT_PREFIX="sailor_editing_sites"
    
    # Create output directory if it doesn't exist
    mkdir -p sailor_output
    
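    # Run SAILOR to identify editing sites
    # NOTE: the "SAILOR run" subcommand and the flags below are inferred and unverified;
    # check the SAILOR documentation for the actual invocation before running.
    # Adjust thresholds such as minimum coverage and minimum edit fraction as needed.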
    SAILOR run \
        --bam "${INPUT_BAM}" \
        --genome "${REFERENCE_GENOME}" \
        --vcf "${KNOWN_SNPS_VCF}" \
        --output "sailor_output/${OUTPUT_PREFIX}" \
        --min-coverage 10 \
        --min-base-quality 20 \
        --min-map-quality 20 \
        --min-edit-fraction 0.1 \
        --threads 8
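
The coverage and edit-fraction thresholds used in the example above (10 reads, 0.1) amount to a simple per-site filter. The sketch below applies such a filter with awk to a hypothetical tab-delimited table of chromosome, position, coverage, and edited-read count. That layout and the function name filter_sites are assumptions for illustration; they do not describe SAILOR's actual output format or scoring.

```shell
# Keep sites with enough coverage and a high enough edit fraction.
# Usage: filter_sites <sites.tsv> [min_coverage] [min_edit_fraction]
# Assumed input columns (hypothetical): chrom, position, coverage, edited_reads
filter_sites() {
    local sites="$1"
    local min_cov="${2:-10}"
    local min_frac="${3:-0.1}"

    awk -F'\t' -v OFS='\t' -v c="$min_cov" -v f="$min_frac" '
        # Skip sites with zero or insufficient coverage.
        $3 > 0 && $3 >= c {
            frac = $4 / $3               # edit fraction = edited reads / coverage
            if (frac >= f) print $1, $2, $3, $4, frac
        }
    ' "$sites"
}
```

Each retained row is printed with the computed edit fraction appended as a fifth column.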
    
  3. Code available at: https://github.com/YeoLab/FLARE

    FLARE (version not specified; inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install FLARE (unverified example; the bioconda package name below is an assumption, see the repository README)
    # conda create -n flare_env python=3.8
    # conda activate flare_env
    # conda install -c bioconda flare
    
    # Define reference files (hg19, matching the assembly used for alignment)
    # Replace with actual paths to your reference files
    GENOME_FASTA="path/to/hg19.fa" # e.g., from UCSC
    GTF_ANNOTATION="path/to/hg19.annotation.gtf" # e.g., a GENCODE release on hg19/GRCh37
    
    # Define output directories
    FLARE_INDEX_DIR="flare_index_hg19"
    QUANT_OUTPUT_DIR="flare_quant_output"
    
    # Input BAM file (example: replace with your aligned RNA-seq BAM file)
    INPUT_BAM="path/to/your_aligned_rna_seq.bam"
    
    # Create output directories if they don't exist
    mkdir -p "${FLARE_INDEX_DIR}"
    mkdir -p "${QUANT_OUTPUT_DIR}"
    
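    # 1. Build the FLARE index
    # NOTE: the "flare build" and "flare quant" subcommands and their flags are inferred, not documented in the source;
    # confirm the real entry points in https://github.com/YeoLab/FLARE before running.
    # This step needs to be run once for a given genome and annotation.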
    echo "Building FLARE index..."
    flare build \
        -g "${GENOME_FASTA}" \
        -a "${GTF_ANNOTATION}" \
        -o "${FLARE_INDEX_DIR}" \
        --threads 8 # Example: use 8 threads
    
    # Check if index build was successful
    if [ $? -ne 0 ]; then
        echo "FLARE index build failed. Exiting."
        exit 1
    fi
    
    # 2. Run the analysis step
    # NOTE: FLARE identifies RNA editing foci, not isoforms; the "flare quant" subcommand is inferred and unverified.
    echo "Running FLARE analysis step..."
    if ! flare quant \
        -i "${FLARE_INDEX_DIR}" \
        -b "${INPUT_BAM}" \
        -o "${QUANT_OUTPUT_DIR}" \
        --threads 8; then # Example: use 8 threads
        echo "FLARE analysis step failed. Exiting." >&2
        exit 1
    fi
    
    echo "FLARE analysis complete."
  4. Run FLARE to identify regions of enriched editing.

    FLARE v0.1.0 (inferred with models/gemini-2.5-flash; version inferred from setup.py in yeolab/FLARE)
    $ Bash example
    # Clone the FLARE repository if not already available
    # git clone https://github.com/yeolab/FLARE.git
    # cd FLARE
    
    # Define input and output files
    # INPUT_BAM: Aligned RNA-seq reads in BAM format (e.g., from STAR alignment)
    INPUT_BAM="path/to/your/aligned_reads.bam"
    # OUTPUT_PREFIX: Prefix for all output files generated by FLARE
    OUTPUT_PREFIX="flare_output"
    
    # Define reference datasets
    # GENOME_FASTA: Reference genome in FASTA format; must match the alignment build (hg19)
    # Source: UCSC Genome Browser (e.g., http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/)
    GENOME_FASTA="path/to/reference/hg19.fa"
    # REPEATS_BED: Repeat regions in BED format (e.g., from RepeatMasker)
    # This file is often provided with genome builds or can be generated.
    REPEATS_BED="path/to/reference/repeats.bed"
    # KNOWN_SITES_VCF: Known RNA editing sites in VCF format (optional, but recommended for filtering/annotation)
    # Source: e.g., REDIportal or DARNED (use hg19 coordinates)
    KNOWN_SITES_VCF="path/to/reference/known_editing_sites.vcf"
    
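    # Run FLARE to identify regions of enriched editing
    # NOTE: the "flare.py" entry point and the flags below are inferred and unverified; check the repository README.
    # Adjust thresholds such as minimum coverage and minimum editing ratio as needed.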
    python flare.py \
        -i "${INPUT_BAM}" \
        -o "${OUTPUT_PREFIX}" \
        -g "${GENOME_FASTA}" \
        -r "${REPEATS_BED}" \
        -s "${KNOWN_SITES_VCF}" \
        --min_coverage 10 \
        --min_editing_ratio 0.05 \
        --min_base_quality 20 \
        --min_mapping_quality 20 \
        --threads $(nproc)
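
To illustrate what "regions of enriched editing" can mean in practice, the sketch below pools per-site counts into fixed-width windows and reports a pooled edit fraction per window. It is a simplified stand-in and not FLARE's algorithm. The input layout (chromosome, 1-based position, coverage, edited reads), the 50-nt default window, and the function name pool_windows are all assumptions.

```shell
# Pool per-site edit counts into fixed-width windows.
# Usage: pool_windows <sites.tsv> [window_size]
# Assumed input columns (hypothetical): chrom, position (1-based), coverage, edited_reads
# Output columns: chrom, window_start (0-based), window_end, edited_reads, coverage, pooled_fraction
pool_windows() {
    local sites="$1"
    local win="${2:-50}"
    local tab
    tab="$(printf '\t')"

    awk -F'\t' -v OFS='\t' -v w="$win" '
        {
            start = int(($2 - 1) / w) * w    # 0-based start of the window holding this site
            key = $1 OFS start
            cov[key] += $3
            edits[key] += $4
        }
        END {
            for (k in cov) {
                if (cov[k] <= 0) continue
                split(k, parts, OFS)
                print parts[1], parts[2], parts[2] + w, edits[k], cov[k], edits[k] / cov[k]
            }
        }
    ' "$sites" | sort -t "$tab" -k1,1 -k2,2n   # awk array order is unspecified, so sort explicitly
}
```

Windows with a high pooled fraction are candidates for closer inspection; a real analysis would also need a background model and a score, which this sketch omits.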
    
  5. Code available at: https://github.com/YeoLab/FLARE

    FLARE v0.1.0
    $ Bash example
    # Installation (unverified example; the PyPI name "flare" may refer to an unrelated package)
    # pip install flare
    
    # Example config.yaml for FLARE
    # This configuration file specifies input BAM files, GTF annotation,
    # and output directories for FLARE analysis.
    # Replace placeholders with actual paths.
    cat << EOF > config.yaml
    # Path to the GTF annotation file
    gtf: /path/to/your/reference/genome.gtf
    
    # Directory containing input BAM files
    bam_dir: /path/to/your/bam_files
    
    # List of sample names (corresponding to BAM files in bam_dir, e.g., sample1.bam)
    samples:
      - sample1
      - sample2
    
    # Output directory for FLARE results
    output_dir: ./flare_output
    
    # Optional: Number of threads to use
    threads: 8
    
    # Optional: other FLARE-specific parameters can be added here.
    # The keys below are hypothetical placeholders, not documented options.
    # min_read_count: 5
    # min_junction_count: 3
    EOF
    
    # Execute FLARE with the configuration file
    # Ensure 'flare' is in your PATH or specify the full path to the executable.
    flare run --config config.yaml


Raw Source Text
STAR aligned original fastqs to hg19 reference build
Run SAILOR to identify editing sites. Code available at: https://github.com/YeoLab/FLARE
Run FLARE to identify regions of enriched editing. Code available at: https://github.com/YeoLab/FLARE
Assembly: hg19
Supplementary files format and content: Peak files are tab delimited and include peak coordinates, edit fraction, fraction of reads edited, and score
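
A minimal sketch for ranking such a peak file by score follows. The column order (chrom, start, end, edit fraction, fraction of reads edited, score) is an assumption based on the description above, so check the deposited files before relying on it. The function name top_peaks is hypothetical.

```shell
# Keep peaks at or above a score cutoff and sort them by descending score.
# Usage: top_peaks <peaks.tsv> [min_score]
# Assumed columns (unverified): chrom, start, end, edit_fraction, fraction_reads_edited, score
top_peaks() {
    local peaks="$1"
    local min_score="${2:-0}"
    local tab
    tab="$(printf '\t')"

    # Skip header/comment lines starting with "#", then filter on column 6.
    awk -F'\t' -v s="$min_score" '!/^#/ && ($6 + 0) >= (s + 0)' "$peaks" |
        sort -t "$tab" -k6,6nr
}
```

For example, top_peaks sample_peaks.tsv 2 | head -n 20 would list the 20 highest-scoring peaks with a score of at least 2 (sample_peaks.tsv is a placeholder file name).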