GSE255844 Processing Pipeline

RNA-Seq, 5 steps with code examples

Publication

Long-read Ribo-STAMP simultaneously measures transcription and translation with isoform resolution.

Genome Research (2024), PMID 38906680

Dataset

GSE255844

Long-read Ribo-STAMP simultaneously measures transcription and translation at full length isoform resolution

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Sequencing data was processed using the Isoseq v4 pipeline with lima (parameter: --isoseq) to generate full-length non-concatemer reads and isoseq refine (parameter: --require-polya) to generate refined reads.

    IsoSeq v4
    $ Bash example
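    # Install lima and isoseq from Bioconda if not already installed
    # conda create -n isoseq_env -c conda-forge -c bioconda lima isoseq
    # conda activate isoseq_env
    
    # Assumes 'input.ccs.bam' holds circular consensus (CCS/HiFi) reads and
    # 'primers.fasta' holds the 5' and 3' cDNA primer sequences (placeholder names)
    # Remove primers and orient reads with lima in Iso-Seq mode
    lima --isoseq input.ccs.bam primers.fasta output.fl.bam
    
    # lima inserts the detected primer-pair name into the output file name,
    # so set this to the file it actually wrote (the name below is a placeholder)
    LIMA_OUT="output.fl.primer_5p--primer_3p.bam"
    
    # Trim poly(A) tails and remove concatemers, requiring a poly(A) tail;
    # isoseq refine also takes the primer FASTA as its second argument
    isoseq refine --require-polya "$LIMA_OUT" primers.fasta output.flnc.bam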
  2.

    HEK293T APOBEC1-only and Ribo-STAMP data were aligned to hg19 reference and MDA-MB-231 Ribo-STAMP data (NT and CoCl2) were aligned to hg38 reference using pbmm2 align (parameter: --preset ISOSEQ).

    pbmm2 (version not specified; inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install pbmm2 from Bioconda
    # conda install -c conda-forge -c bioconda pbmm2
    
    # Define reference genomes
    # Download hg19 reference FASTA from UCSC
    # wget -O hg19.fasta.gz http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
    # gunzip hg19.fasta.gz
    HG19_REF="hg19.fasta" # Path to hg19 reference FASTA
    
    # Download hg38 reference FASTA from UCSC
    # wget -O hg38.fasta.gz http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
    # gunzip hg38.fasta.gz
    HG38_REF="hg38.fasta" # Path to hg38 reference FASTA
    
    # Input data placeholders (FASTQ here; pbmm2 also accepts unaligned BAM, such as the refined reads from step 1)
    HEK293T_INPUT="hek293t_apobec1_ribostamp.fastq"
    MDAMB231_NT_INPUT="mdamb231_ribostamp_nt.fastq"
    MDAMB231_COCL2_INPUT="mdamb231_ribostamp_cocl2.fastq"
    
    # Align HEK293T APOBEC1-only and Ribo-STAMP data to hg19
    pbmm2 align "$HG19_REF" "$HEK293T_INPUT" "hek293t_apobec1_ribostamp_hg19.bam" --preset ISOSEQ
    
    # Align MDA-MB-231 Ribo-STAMP NT data to hg38
    pbmm2 align "$HG38_REF" "$MDAMB231_NT_INPUT" "mdamb231_ribostamp_nt_hg38.bam" --preset ISOSEQ
    
    # Align MDA-MB-231 Ribo-STAMP CoCl2 data to hg38
    pbmm2 align "$HG38_REF" "$MDAMB231_COCL2_INPUT" "mdamb231_ribostamp_cocl2_hg38.bam" --preset ISOSEQ
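
    Because two assemblies are involved, it can help to keep the sample-to-reference pairing in one table and loop over it, so a sample is less likely to be aligned to the wrong build. The sketch below is a dry run that only prints the pbmm2 commands. The sample names, the `.flnc.bam` input names, and the `build_align_cmd` helper are hypothetical placeholders, not names taken from the dataset.

    ```shell
    #!/usr/bin/env bash
    # Dry-run sketch: pair each sample with its reference and print the pbmm2 command.
    # Sample and file names are hypothetical placeholders.

    # Tab-separated table: sample <TAB> reference FASTA
    SAMPLE_TABLE=$(printf '%s\t%s\n' \
        "hek293t_apobec1_only"     "hg19.fasta" \
        "hek293t_ribostamp"        "hg19.fasta" \
        "mdamb231_ribostamp_nt"    "hg38.fasta" \
        "mdamb231_ribostamp_cocl2" "hg38.fasta")

    # Print the alignment command for one sample (does not run pbmm2).
    build_align_cmd() {
        local sample="$1" ref="$2"
        local build="${ref%.fasta}"   # e.g. hg19.fasta -> hg19
        printf 'pbmm2 align "%s" "%s.flnc.bam" "%s_%s.bam" --preset ISOSEQ\n' \
            "$ref" "$sample" "$sample" "$build"
    }

    # Emit one command per row of the table.
    while IFS=$'\t' read -r sample ref; do
        build_align_cmd "$sample" "$ref"
    done <<< "$SAMPLE_TABLE"
    ```

    Once the printed commands look right, they can be piped to `bash` or the `printf` can be replaced with a direct call.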
  3.

    QC was completed using NanoPlot (parameters: --raw and --tsv_stats).

    NanoPlot v1.41.0 (inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install NanoPlot (example using conda and Bioconda)
    # conda create -n nanoplot_env -c conda-forge -c bioconda nanoplot
    # conda activate nanoplot_env
    
    # Example NanoPlot command for long-read QC
    # Assumes 'long_reads.fastq' is the input FASTQ file (placeholder; NanoPlot also accepts BAM via --bam or --ubam)
    # Output is written to the 'nanoplot_output' directory
    
    mkdir -p nanoplot_output
    NanoPlot --raw --tsv_stats --fastq long_reads.fastq --outdir nanoplot_output
  4.

    Reads were filtered to retain uniquely mapped reads, and read counts were obtained using IsoQuant (parameters: --data_type pacbio, --transcript_quantification unique_only, and --gene_quantification unique_only).

    $ Bash example
    # Install IsoQuant (example using conda and Bioconda)
    # conda create -n isoquant_env -c conda-forge -c bioconda isoquant
    # conda activate isoquant_env
    
    # Placeholder for input PacBio aligned reads (BAM)
    INPUT_BAM="path/to/your/pacbio_aligned_reads.bam"
    # Placeholder for reference genome FASTA (e.g., hg38.fa)
    GENOME_FASTA="path/to/your/reference_genome.fasta"
    # Placeholder for gene annotation GTF/GFF3 (e.g., gencode.v38.annotation.gtf)
    ANNOTATION_GTF="path/to/your/annotation.gtf"
    # Output directory for IsoQuant results
    OUTPUT_DIR="isoquant_output"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Execute IsoQuant for read quantification
    # (the executable is isoquant.py; the genome is passed with --reference
    # and the annotation with --genedb)
    isoquant.py \
        --data_type pacbio \
        --transcript_quantification unique_only \
        --gene_quantification unique_only \
        --reference "${GENOME_FASTA}" \
        --genedb "${ANNOTATION_GTF}" \
        --bam "${INPUT_BAM}" \
        --output "${OUTPUT_DIR}"
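
    After quantification, a common next step is to drop features with very few supporting reads. The sketch below does this with awk on a two-column counts table. The layout (feature ID, tab, count, with `#`-prefixed header lines), the `filter_counts` helper, and the threshold of 10 reads are assumptions for illustration; check them against the counts files IsoQuant actually writes. The demo rows are made up.

    ```shell
    #!/usr/bin/env bash
    # Sketch: keep features with at least MIN_READS reads from a two-column counts table.
    # Assumed layout: feature_id <TAB> count, with '#'-prefixed header lines.
    filter_counts() {
        local counts_tsv="$1" min_reads="$2"
        awk -F'\t' -v min="$min_reads" '
            /^#/ { next }                          # skip header and comment lines
            $2 + 0 >= min { print $1 "\t" $2 }     # keep rows meeting the threshold
        ' "$counts_tsv" | sort -t$'\t' -k2,2nr -k1,1
    }

    # Tiny made-up table for demonstration (not real data).
    counts_demo=$(mktemp)
    printf '#feature_id\tcount\n' > "$counts_demo"
    printf 'TX_A\t12\nTX_B\t3\nTX_C\t40\n' >> "$counts_demo"

    # Prints TX_C and TX_A (highest count first); TX_B falls below the threshold.
    filter_counts "$counts_demo" 10
    ```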
  5.

    Edits were identified and filtered using custom scripts.

    Custom RNA edit filtering script (inferred with models/gemini-2.5-flash)
    $ Bash example
    # This step identifies and filters RNA editing sites using custom scripts.
    # The specific script and parameters would depend on the custom implementation.
    # Input is typically a VCF file containing potential RNA editing sites identified from aligned RNA-seq data.
    # Output is a filtered VCF file with high-confidence RNA editing sites.
    
    # Define input and output files (placeholders)
    INPUT_VCF="identified_rna_edits.vcf"
    OUTPUT_VCF="filtered_rna_edits.vcf"
    
    # Define reference genome (using hg38 as a common latest assembly placeholder)
    # The actual reference genome used should match the one used for alignment.
    REFERENCE_GENOME="/path/to/human_genome/hg38.fa"
    
    # Placeholder for custom script execution.
    # The script would typically take parameters for filtering criteria such as:
    # - Minimum read depth at the editing site
    # - Minimum allele frequency of the edited base
    # - Exclusion of known SNPs (e.g., from dbSNP)
    # - Exclusion of sites in repetitive regions or low-complexity regions
    
    # Example command for a hypothetical custom script:
    # 'custom_rna_edit_filter.sh' and every flag and threshold below are illustrative
    # placeholders, not the authors' actual script or parameters.
    custom_rna_edit_filter.sh \
        --input "${INPUT_VCF}" \
        --output "${OUTPUT_VCF}" \
        --reference "${REFERENCE_GENOME}" \
        --min_depth 10 \
        --min_allele_frequency 0.1 \
        --exclude_snps "/path/to/dbSNP/common_snps.vcf.gz" \
        --filter_criteria "custom_filter_settings.txt"
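
    The authors' custom scripts are not described in detail. As a rough illustration of depth and edit-fraction filtering, the sketch below applies both thresholds to a tab-delimited table of candidate sites with awk. The column layout (chrom, position, edited reads, total reads), the `filter_sites` helper, and the thresholds (10 reads, fraction 0.1) are assumptions, and the demo rows are made up.

    ```shell
    #!/usr/bin/env bash
    # Sketch: filter candidate edit sites by read depth and edit fraction.
    # Assumed columns: chrom, position, edited_reads, total_reads (tab-separated, no header).
    filter_sites() {
        local sites_tsv="$1" min_depth="$2" min_frac="$3"
        awk -F'\t' -v OFS='\t' -v d="$min_depth" -v f="$min_frac" '
            # Require non-zero coverage, enough depth, and a high enough edit fraction.
            $4 > 0 && $4 >= d && ($3 / $4) >= f {
                print $1, $2, $3, $4, sprintf("%.3f", $3 / $4)
            }
        ' "$sites_tsv"
    }

    # Tiny made-up table for demonstration (not real data).
    sites_demo=$(mktemp)
    printf 'chr1\t100\t10\t50\n'  > "$sites_demo"   # fraction 0.2, depth 50: kept
    printf 'chr1\t200\t1\t50\n'  >> "$sites_demo"   # fraction 0.02: dropped
    printf 'chr2\t300\t4\t8\n'   >> "$sites_demo"   # depth 8: dropped
    printf 'chr2\t400\t30\t60\n' >> "$sites_demo"   # fraction 0.5, depth 60: kept

    filter_sites "$sites_demo" 10 0.1
    ```

    A real filter would usually add further criteria, such as excluding known SNPs, which this sketch leaves out.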

Raw Source Text
Sequencing data was processed using the Isoseq v4 pipeline with lima (parameter: --isoseq) to generate full-length non-concatemer reads and isoseq refine (parameter: --require-polya) to generate refined reads.
HEK293T APOBEC1-only and Ribo-STAMP data were aligned to hg19 reference and MDA-MB-231 Ribo-STAMP data (NT and CoCl2) were aligned to hg38 reference using pbmm2 align (parameter: --preset ISOSEQ).
QC was completed using NanoPlot (parameters: --raw and --tsv_stats).
Reads were filtered for uniquely mapped reads and read counts obtained using IsoQuant (parameters: --data_type pacbio, --transcript_quantification unique_only, and --gene_quantification unique_only)
Edits were identified and filtered using custom scripts.
Assembly: hg19, hg38
Supplementary files format and content: tab-delimited file containing edited positions, the number of reads with C-to-U edits at each position (conversion), the total number of reads at each position, and each edit's assignment to a gene and isoform (BED)
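
A table like the one described above can be summarized per gene with a few lines of awk. The sketch below counts edited sites and sums edited and total reads for each gene. The exact column order is not given in the description, so the layout used here (chrom, start, end, conversion, total reads, gene, isoform) is an assumption, as is the `summarize_by_gene` helper; the demo rows and gene names are made up.

```shell
#!/usr/bin/env bash
# Sketch: per-gene summary of an edit-site table.
# Assumed columns: chrom, start, end, conversion, total_reads, gene, isoform (tab-separated).
summarize_by_gene() {
    awk -F'\t' -v OFS='\t' '
        { sites[$6]++; conv[$6] += $4; tot[$6] += $5 }
        END {
            # Output: gene, number of edited sites, edited reads, total reads
            for (g in sites)
                print g, sites[g], conv[g], tot[g]
        }
    ' "$1" | sort -t$'\t' -k1,1
}

# Tiny made-up table for demonstration (not real data).
bed_demo=$(mktemp)
printf 'chr1\t99\t100\t10\t50\tGENE1\tGENE1-201\n'   > "$bed_demo"
printf 'chr1\t149\t150\t5\t40\tGENE1\tGENE1-202\n'  >> "$bed_demo"
printf 'chr2\t299\t300\t30\t60\tGENE2\tGENE2-201\n' >> "$bed_demo"

summarize_by_gene "$bed_demo"
```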