GSE215252 Processing Pipeline
RNA-Seq
5 steps
Publication
FLARE: a fast and flexible workflow for identifying RNA editing foci. BMC Bioinformatics (2023), PMID 37784060
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
STAR aligned original fastqs to hg19 reference build
STAR v2.7.10a
$ Bash example
# Install STAR (example using conda)
# conda create -n star_env star=2.7.10a -c bioconda -c conda-forge
# conda activate star_env

# --- Reference genome preparation (if not already done) ---
# Generates the STAR genome index for hg19.
# Replace /path/to/hg19.fa and /path/to/hg19.gtf with actual paths.
# --sjdbOverhang is typically set to (read length - 1); 100 is the STAR default.
# GENOME_DIR="/path/to/STAR_index/hg19"
# STAR --runThreadN 8 --runMode genomeGenerate \
#   --genomeDir "${GENOME_DIR}" \
#   --genomeFastaFiles /path/to/hg19.fa \
#   --sjdbGTFfile /path/to/hg19.gtf \
#   --sjdbOverhang 100

# --- STAR alignment ---
# Replace with actual paths and filenames
GENOME_DIR="/path/to/STAR_index/hg19"  # Pre-built STAR genome index for hg19
READ1="sample_R1.fastq.gz"             # Input FASTQ file for read 1
READ2="sample_R2.fastq.gz"             # Input FASTQ file for read 2 (remove if single-end)
OUTPUT_PREFIX="sample_aligned"         # Prefix for output files
THREADS=8                              # Number of threads to use

# The source record gives no alignment parameters; the filter values below are
# inferred placeholders, not the authors' settings.
# Note: --alignIntronMax 1 effectively disables spliced alignment, which is
# unusual for RNA-seq; drop that flag to use STAR's default intron handling.
STAR --runThreadN "${THREADS}" \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ1}" "${READ2}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}_" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes All \
  --outFilterMultimapNmax 1 \
  --outFilterMismatchNmax 3 \
  --alignIntronMax 1 \
  --alignSJDBoverhangMin 1 \
  --alignSJoverhangMin 8 \
  --outFilterScoreMinOverLread 0.66 \
  --outFilterMatchNminOverLread 0.66
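The index-building comments in step 1 suggest setting --sjdbOverhang to the read length minus one. A minimal sketch of deriving that value from a FASTQ file follows. The helper name sjdb_overhang and the demo record are hypothetical, and the sketch inspects only the first read, so it assumes all reads share one length.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (length of the first read - 1) for a FASTQ file.
# Accepts plain or gzip-compressed input, chosen by file extension.
sjdb_overhang() {
    local fastq="$1"
    local reader="cat"
    case "$fastq" in
        *.gz) reader="gzip -dc" ;;
    esac
    # Line 2 of a FASTQ record holds the sequence.
    $reader "$fastq" | awk 'NR == 2 { print length($0) - 1; exit }'
}

# Demo with a made-up single-record FASTQ (10 bases).
demo_fastq="$(mktemp)"
printf '@read1\nACGTACGTAC\n+\nIIIIIIIIII\n' > "$demo_fastq"
overhang="$(sjdb_overhang "$demo_fastq")"
echo "Suggested --sjdbOverhang: ${overhang}"
rm -f "$demo_fastq"
```

For real data, pass the path to one of the input FASTQ files and use the printed value in the genomeGenerate command.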
2
Run SAILOR to identify editing sites.
$ Bash example
# It is recommended to install SAILOR in a dedicated conda environment.
# The install command below is inferred and unverified; check the SAILOR documentation.
# conda create -n sailor python=3.8
# conda activate sailor
# pip install git+https://github.com/yeolab/SAILOR.git

# Define input and output paths
INPUT_BAM="path/to/your/aligned_reads.bam"             # Replace with your input BAM file
REFERENCE_GENOME="path/to/your/hg19.fa"                # Reference FASTA; hg19 to match the alignment in step 1
KNOWN_SNPS_VCF="path/to/your/common_snps_hg19.vcf.gz"  # Known common SNPs (e.g., from dbSNP), used to filter out genomic variants
OUTPUT_PREFIX="sailor_editing_sites"

# Create output directory if it doesn't exist
mkdir -p sailor_output

# Run SAILOR to identify editing sites.
# NOTE: the subcommand and flag names below are inferred and have not been
# verified against SAILOR itself. Adjust thresholds such as --min-coverage,
# --min-edit-fraction, and --threads as needed.
SAILOR run \
  --bam "${INPUT_BAM}" \
  --genome "${REFERENCE_GENOME}" \
  --vcf "${KNOWN_SNPS_VCF}" \
  --output "sailor_output/${OUTPUT_PREFIX}" \
  --min-coverage 10 \
  --min-base-quality 20 \
  --min-map-quality 20 \
  --min-edit-fraction 0.1 \
  --threads 8
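To make the coverage and edit-fraction thresholds in step 2 concrete, here is a small sketch that computes a per-site edit fraction as edited reads divided by total reads and applies both cutoffs. The four-column input layout (chromosome, position, reference-base reads, edited reads) is made up for illustration and is not SAILOR's output format. The cutoffs of 10 and 0.1 simply mirror the example values above.

```shell
#!/usr/bin/env bash
# Made-up per-site counts: chrom, position, reference-base reads, edited reads.
sites="$(mktemp)"
printf 'chr1\t1000\t17\t3\n' >  "$sites"
printf 'chr1\t2000\t4\t4\n'  >> "$sites"
printf 'chr2\t500\t30\t1\n'  >> "$sites"

# Keep sites with coverage >= min_cov and edit fraction >= min_frac.
# Output columns: chrom, position, coverage, edit fraction.
passing="$(awk -F'\t' -v min_cov=10 -v min_frac=0.1 '
    {
        cov = $3 + $4
        if (cov == 0) next
        frac = $4 / cov
        if (cov >= min_cov && frac >= min_frac)
            printf "%s\t%d\t%d\t%.3f\n", $1, $2, cov, frac
    }' "$sites")"
echo "$passing"
rm -f "$sites"
```

Only the first site passes: the second has too little coverage (8 reads) and the third has too low an edit fraction (1 of 31 reads).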
3
Code available at: https://github.com/YeoLab/FLARE
FLARE (version not specified; inferred with models/gemini-2.5-flash)
$ Bash example
# NOTE: FLARE is a workflow for identifying RNA editing foci (see the publication
# above). The package name, subcommands, and flags in this example are inferred
# placeholders and have not been verified; consult the repository README at
# https://github.com/YeoLab/FLARE for the actual entry point.

# Install FLARE using conda (unverified; uncomment to run)
# conda create -n flare_env python=3.8
# conda activate flare_env
# conda install -c bioconda flare

# Define reference files (hg19, to match the assembly used in step 1)
# Replace with actual paths to your reference files
GENOME_FASTA="path/to/hg19.fa"
GTF_ANNOTATION="path/to/hg19.annotation.gtf"

# Define output directories
FLARE_INDEX_DIR="flare_index_hg19"
FLARE_OUTPUT_DIR="flare_output"

# Input BAM file (replace with your aligned RNA-seq BAM file)
INPUT_BAM="path/to/your_aligned_rna_seq.bam"

# Create output directories if they don't exist
mkdir -p "${FLARE_INDEX_DIR}" "${FLARE_OUTPUT_DIR}"

# 1. Prepare reference files (placeholder subcommand).
# This needs to run only once for a given genome and annotation.
echo "Preparing FLARE reference..."
if ! flare build \
    -g "${GENOME_FASTA}" \
    -a "${GTF_ANNOTATION}" \
    -o "${FLARE_INDEX_DIR}" \
    --threads 8; then
  echo "FLARE reference preparation failed. Exiting."
  exit 1
fi

# 2. Run FLARE on the aligned reads (placeholder subcommand).
echo "Running FLARE..."
if ! flare quant \
    -i "${FLARE_INDEX_DIR}" \
    -b "${INPUT_BAM}" \
    -o "${FLARE_OUTPUT_DIR}" \
    --threads 8; then
  echo "FLARE run failed. Exiting."
  exit 1
fi

echo "FLARE analysis complete."
4
Run FLARE to identify regions of enriched editing.
FLARE v0.1.0 (tool inferred with models/gemini-2.5-flash; version inferred from setup.py in yeolab/FLARE)
$ Bash example
# Clone the FLARE repository if not already available
# git clone https://github.com/yeolab/FLARE.git
# cd FLARE

# Define input and output files
# INPUT_BAM: aligned RNA-seq reads in BAM format (e.g., from the STAR alignment in step 1)
INPUT_BAM="path/to/your/aligned_reads.bam"
# OUTPUT_PREFIX: prefix for all output files generated by FLARE
OUTPUT_PREFIX="flare_output"

# Define reference datasets (hg19, to match the assembly used in step 1)
# GENOME_FASTA: reference genome in FASTA format
# Source: UCSC Genome Browser (e.g., http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz)
GENOME_FASTA="path/to/reference/hg19.fa"
# REPEATS_BED: repeat regions in BED format (e.g., from RepeatMasker)
REPEATS_BED="path/to/reference/repeats.bed"
# KNOWN_SITES_VCF: known RNA editing sites in VCF format (optional; useful for filtering or annotation)
# Source: e.g., REDIportal or DARNED
KNOWN_SITES_VCF="path/to/reference/known_editing_sites.vcf"

# Run FLARE to identify regions of enriched editing.
# NOTE: the script name and flags below are inferred and have not been verified
# against the repository. Adjust thresholds such as --min_coverage and
# --min_editing_ratio as needed.
python flare.py \
  -i "${INPUT_BAM}" \
  -o "${OUTPUT_PREFIX}" \
  -g "${GENOME_FASTA}" \
  -r "${REPEATS_BED}" \
  -s "${KNOWN_SITES_VCF}" \
  --min_coverage 10 \
  --min_editing_ratio 0.05 \
  --min_base_quality 20 \
  --min_mapping_quality 20 \
  --threads "$(nproc)"
5
Code available at: https://github.com/YeoLab/FLARE
$ Bash example
# Installation (example, unverified): the PyPI name "flare" may refer to an
# unrelated package, so prefer installing from https://github.com/YeoLab/FLARE.
# pip install flare

# Example config.yaml for FLARE.
# The keys below are inferred placeholders, not a confirmed FLARE schema.
# Replace placeholder paths with actual paths.
cat << 'EOF' > config.yaml
# Path to the GTF annotation file
gtf: /path/to/your/reference/genome.gtf

# Directory containing input BAM files
bam_dir: /path/to/your/bam_files

# Sample names (matching BAM files in bam_dir, e.g., sample1.bam)
samples:
  - sample1
  - sample2

# Output directory for FLARE results
output_dir: ./flare_output

# Optional: number of threads to use
threads: 8

# Optional: other FLARE-specific parameters can be added here,
# for example a minimum read count.
# min_read_count: 5
EOF

# Execute FLARE with the configuration file.
# Ensure 'flare' is in your PATH or specify the full path to the executable.
flare run --config config.yaml
Raw Source Text
STAR aligned original fastqs to hg19 reference build
Run SAILOR to identify editing sites.
Code available at: https://github.com/YeoLab/FLARE
Run FLARE to identify regions of enriched editing.
Code available at: https://github.com/YeoLab/FLARE
Assembly: hg19
Supplementary files format and content: Peak files are tab delimited and include peak coordinates, edit fraction, fraction of reads edited, and score
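The supplementary peak files are described as tab-delimited, with peak coordinates, edit fraction, fraction of reads edited, and score. A minimal sketch of filtering and ranking such a file by score follows. The column order (chrom, start, end, edit fraction, fraction of reads edited, score), the example rows, and the 0.5 cutoff are all assumptions for illustration; check the actual files before relying on column positions.

```shell
#!/usr/bin/env bash
# Assumed column order (not confirmed by the source record):
# chrom, start, end, edit_fraction, fraction_reads_edited, score
peaks="$(mktemp)"
printf 'chr1\t100\t150\t0.20\t0.35\t0.91\n' >  "$peaks"
printf 'chr1\t900\t950\t0.02\t0.05\t0.12\n' >> "$peaks"
printf 'chr7\t400\t460\t0.15\t0.30\t0.75\n' >> "$peaks"

min_score=0.5          # hypothetical cutoff
tab="$(printf '\t')"   # literal tab for sort's field separator

# Keep peaks whose score (column 6) meets the cutoff, highest score first.
top_peaks="$(awk -F'\t' -v s="$min_score" '$6 >= s' "$peaks" \
    | LC_ALL=C sort -t "$tab" -k6,6nr)"
echo "$top_peaks"
rm -f "$peaks"
```

With the made-up rows above, two peaks remain, ordered chr1 (0.91) then chr7 (0.75).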