GSE255844 Processing Pipeline

RNA-Seq, 5 steps

Publication

Long-read Ribo-STAMP simultaneously measures transcription and translation with isoform resolution.

Genome Research (2024), PMID 38906680

Dataset

Long-read Ribo-STAMP simultaneously measures transcription and translation at full length isoform resolution

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Sequencing data was processed using the Isoseq v4 pipeline with lima (parameter: --isoseq) to generate full-length non-concatemer reads and isoseq refine (parameter: --require-polya) to generate refined reads.

IsoSeq v4

$ Bash example

# Install lima and isoseq from Bioconda if not already installed
# conda create -n isoseq_env -c conda-forge -c bioconda lima isoseq
# conda activate isoseq_env

# Assumes 'input.ccs.bam' holds HiFi (CCS) reads and 'primers.fasta' holds the cDNA primer sequences
# Remove primers and orient reads with lima
lima --isoseq input.ccs.bam primers.fasta output.fl.bam

# lima names its output after the detected primer pair; check the actual file name and adjust this placeholder
FL_BAM="output.fl.primer_5p--primer_3p.bam"

# Refine reads (polyA trimming, concatemer removal), keeping only reads with a polyA tail
isoseq refine --require-polya "$FL_BAM" primers.fasta output.refined.bam
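
Before running any of the steps, it can help to confirm that the command-line tools are actually on the PATH. The following is a minimal sketch, not part of the published pipeline; the helper name missing_tools is hypothetical, and isoquant.py is an assumption about how the IsoQuant executable is named in a given installation.

```bash
# Report which of the named commands are missing from PATH.
# Prints one missing name per line; returns 1 if any are missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Tool names taken from the processing steps on this page
# (isoquant.py is an assumed executable name; adjust to your installation)
if missing=$(missing_tools lima isoseq pbmm2 NanoPlot isoquant.py); then
    echo "All pipeline tools found."
else
    # Unquoted on purpose so the names print on one line
    echo "Missing tools:" $missing
fi
```

The function only checks that a command exists; it does not check versions, so an IsoSeq v4 requirement still has to be verified separately.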

HEK293T APOBEC1-only and Ribo-STAMP data were aligned to hg19 reference and MDA-MB-231 Ribo-STAMP data (NT and CoCl2) were aligned to hg38 reference using pbmm2 align (parameter: --preset ISOSEQ).

pbmm2 (version not specified; inferred with models/gemini-2.5-flash)

$ Bash example

# Install pbmm2 from Bioconda
# conda install -c conda-forge -c bioconda pbmm2

# Define reference genomes
# Download hg19 reference FASTA from UCSC
# wget -O hg19.fasta.gz http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
# gunzip hg19.fasta.gz
HG19_REF="hg19.fasta" # Path to hg19 reference FASTA

# Download hg38 reference FASTA from UCSC
# wget -O hg38.fasta.gz http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
# gunzip hg38.fasta.gz
HG38_REF="hg38.fasta" # Path to hg38 reference FASTA

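# Input data placeholders (pbmm2 accepts BAM, FASTA, or FASTQ; the refined BAMs from isoseq refine are assumed here)
# The HEK293T placeholder stands for each HEK293T sample (APOBEC1-only and Ribo-STAMP); run the command once per sample
HEK293T_INPUT="hek293t_apobec1_ribostamp.refined.bam"
MDAMB231_NT_INPUT="mdamb231_ribostamp_nt.refined.bam"
MDAMB231_COCL2_INPUT="mdamb231_ribostamp_cocl2.refined.bam"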

# Align HEK293T APOBEC1-only and Ribo-STAMP data to hg19
pbmm2 align "$HG19_REF" "$HEK293T_INPUT" "hek293t_apobec1_ribostamp_hg19.bam" --preset ISOSEQ

# Align MDA-MB-231 Ribo-STAMP NT data to hg38
pbmm2 align "$HG38_REF" "$MDAMB231_NT_INPUT" "mdamb231_ribostamp_nt_hg38.bam" --preset ISOSEQ

# Align MDA-MB-231 Ribo-STAMP CoCl2 data to hg38
pbmm2 align "$HG38_REF" "$MDAMB231_COCL2_INPUT" "mdamb231_ribostamp_cocl2_hg38.bam" --preset ISOSEQ
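
Because the two cell lines use different assemblies, the sample-to-reference choice can be captured in one place. The sketch below only prints the pbmm2 commands instead of running them; the sample names, file-name patterns, and the function names reference_for_sample and alignment_command are hypothetical placeholders.

```bash
# Map a sample name to the assembly named in the step description:
# HEK293T samples use hg19, MDA-MB-231 samples use hg38.
reference_for_sample() {
    case "$1" in
        HEK293T*)    echo "hg19" ;;
        MDA-MB-231*) echo "hg38" ;;
        *)           echo "unknown sample: $1" >&2; return 1 ;;
    esac
}

# Print (rather than run) the pbmm2 command for one sample.
# Input and output file names are made-up placeholders.
alignment_command() {
    sample="$1"
    assembly=$(reference_for_sample "$sample") || return 1
    echo "pbmm2 align ${assembly}.fasta ${sample}.refined.bam ${sample}.${assembly}.bam --preset ISOSEQ"
}

for s in HEK293T_APOBEC1 HEK293T_RiboSTAMP MDA-MB-231_NT MDA-MB-231_CoCl2; do
    alignment_command "$s"
done
```

Printing the commands first makes it easy to confirm that no sample is paired with the wrong assembly before any alignment is started.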

QC was completed using NanoPlot (parameters: --raw and --tsv_stats).

NanoPlot v1.41.0 (inferred with models/gemini-2.5-flash)

$ Bash example

# Install NanoPlot from Bioconda (example using conda)
# conda create -n nanoplot_env -c conda-forge -c bioconda nanoplot
# conda activate nanoplot_env

# Example NanoPlot command for read QC
# Assuming 'reads.fastq' holds the PacBio reads to check (NanoPlot also accepts BAM input via --bam or --ubam)
# --raw stores the extracted per-read data as a tab-separated file; --tsv_stats writes the summary statistics as TSV
# Output will be generated in the 'nanoplot_output' directory

mkdir -p nanoplot_output
NanoPlot --raw --tsv_stats --fastq reads.fastq --outdir nanoplot_output
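
As an independent cross-check of the read-length figures NanoPlot reports, basic length statistics can be computed directly from a FASTQ file. This is a minimal sketch using awk and sort, not NanoPlot's own implementation; the function name fastq_length_stats and the demo file are hypothetical, and the code assumes a FASTQ with exactly four lines per record.

```bash
# Tiny demo FASTQ with read lengths 10, 6 and 4 (made-up sequences)
cat > demo_reads.fastq <<'EOF'
@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
ACGTAC
+
IIIIII
@read3
ACGT
+
IIII
EOF

# Print "reads total_bases mean_length N50" for a 4-line-per-record FASTQ.
# N50 is the length L at which reads of length >= L hold at least half of all bases.
fastq_length_stats() {
    awk 'NR % 4 == 2 { print length($0) }' "$1" |
        sort -rn |
        LC_ALL=C awk '{ len[NR] = $1; total += $1 }
             END {
                 if (NR == 0) { print "0 0 0 0"; exit }
                 running = 0
                 for (i = 1; i <= NR; i++) {
                     running += len[i]
                     if (running >= total / 2) { n50 = len[i]; break }
                 }
                 printf "%d %d %.1f %d\n", NR, total, total / NR, n50
             }'
}

fastq_length_stats demo_reads.fastq   # prints: 3 20 6.7 10
```

Large disagreements between these numbers and the NanoPlot summary would suggest that a different input file or a filtered subset was used.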

Reads were filtered for uniquely mapped reads, and read counts were obtained using IsoQuant (parameters: --data_type pacbio, --transcript_quantification unique_only, and --gene_quantification unique_only).

PacBio sequencing v2.0.0

$ Bash example

# Install IsoQuant from Bioconda (example using conda)
# conda create -n isoquant_env -c conda-forge -c bioconda isoquant
# conda activate isoquant_env

# Placeholder for input PacBio aligned reads (BAM; must be coordinate-sorted and indexed)
INPUT_BAM="path/to/your/pacbio_aligned_reads.bam"
# Placeholder for reference genome FASTA (e.g., hg38.fa)
GENOME_FASTA="path/to/your/reference_genome.fasta"
# Placeholder for gene annotation GTF/GFF3 (e.g., gencode.v38.annotation.gtf)
ANNOTATION_GTF="path/to/your/annotation.gtf"
# Output directory for IsoQuant results
OUTPUT_DIR="isoquant_output"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

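# Execute IsoQuant for read quantification
# (the executable is typically installed as isoquant.py; the reference and annotation
# are passed with --reference and --genedb)
isoquant.py \
    --data_type pacbio \
    --transcript_quantification unique_only \
    --gene_quantification unique_only \
    --reference "${GENOME_FASTA}" \
    --genedb "${ANNOTATION_GTF}" \
    --bam "${INPUT_BAM}" \
    --output "${OUTPUT_DIR}"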
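
Once counts are available, a quick summary helps confirm that quantification produced sensible numbers. The sketch below assumes a two-column, tab-separated counts table (feature ID, count) with a header line starting with '#'; that layout, the file name, the skipping of rows whose ID starts with '__', and the function name expressed_features are all assumptions to check against the files IsoQuant actually writes.

```bash
# Made-up demo counts table in the assumed layout
printf '#feature_id\tcount\nTX_A\t12\nTX_B\t0\nTX_C\t3\n' > demo.transcript_counts.tsv

# Print features with at least MIN_COUNT reads, followed by a summary line.
# Usage: expressed_features COUNTS_TSV MIN_COUNT
expressed_features() {
    awk -F '\t' -v min="$2" '
        # skip header lines and any summary rows whose ID starts with "__"
        /^#/ || $1 ~ /^__/ { next }
        $2 + 0 >= min { print $1 "\t" $2; kept++ }
        { total += $2 }
        END { printf "kept=%d total_reads=%d\n", kept, total }
    ' "$1"
}

expressed_features demo.transcript_counts.tsv 3
# prints TX_A and TX_C with their counts, then: kept=2 total_reads=15
```

The threshold of 3 is arbitrary and only serves the demo; the source does not state a minimum count.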

Edits were identified and filtered using custom scripts.

Custom RNA edit filter scripts (inferred with models/gemini-2.5-flash)

$ Bash example

# This step identifies and filters RNA editing sites (C-to-U) using custom scripts.
# The scripts themselves are not described in the source, so everything below is a hypothetical illustration.
# Input is assumed here to be a VCF file of candidate editing sites called from the aligned reads.
# Output is assumed to be a filtered file of high-confidence sites; the deposited supplementary files are tab-delimited (BED).

# Define input and output files (placeholders)
INPUT_VCF="identified_rna_edits.vcf"
OUTPUT_VCF="filtered_rna_edits.vcf"

# Define reference genome (placeholder path)
# Use the assembly the sample was aligned to: hg19 for HEK293T, hg38 for MDA-MB-231.
REFERENCE_GENOME="/path/to/human_genome/hg38.fa"

# Placeholder for custom script execution.
# The script would typically take parameters for filtering criteria such as:
# - Minimum read depth at the editing site
# - Minimum allele frequency of the edited base
# - Exclusion of known SNPs (e.g., from dbSNP)
# - Exclusion of sites in repetitive regions or low-complexity regions

# Example command for a hypothetical custom script:
# 'custom_rna_edit_filter.sh', its flags, and the threshold values are all invented for illustration.
# Replace them with the actual script and the parameters it uses.
custom_rna_edit_filter.sh \
    --input "${INPUT_VCF}" \
    --output "${OUTPUT_VCF}" \
    --reference "${REFERENCE_GENOME}" \
    --min_depth 10 \
    --min_allele_frequency 0.1 \
    --exclude_snps "/path/to/dbSNP/common_snps.vcf.gz" \
    --filter_criteria "custom_filter_settings.txt"
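
The kind of filtering listed above can be illustrated on a tab-delimited table of edit sites, such as the per-position conversion and total read counts described for the supplementary files. This is a sketch under stated assumptions, not the authors' script: the column order (chrom, start, end, conversion, total, gene, isoform), the thresholds, and the function name filter_edits are all hypothetical.

```bash
# Made-up demo table. Assumed columns:
# chrom, start, end, conversion (reads with C-to-U edit), total reads, gene, isoform
printf 'chr1\t100\t101\t6\t50\tGENE1\tTX1\n' >  demo_edits.bed
printf 'chr1\t200\t201\t1\t4\tGENE1\tTX1\n'  >> demo_edits.bed
printf 'chr2\t300\t301\t1\t40\tGENE2\tTX2\n' >> demo_edits.bed

# Keep sites with enough coverage and a high enough edit fraction,
# appending the edit fraction (conversion / total) as a final column.
# Usage: filter_edits TABLE MIN_DEPTH MIN_FRACTION
filter_edits() {
    LC_ALL=C awk -F '\t' -v OFS='\t' -v min_depth="$2" -v min_frac="$3" '
        $5 > 0 && $5 >= min_depth && $4 / $5 >= min_frac {
            print $0, sprintf("%.3f", $4 / $5)
        }
    ' "$1"
}

filter_edits demo_edits.bed 10 0.1
# keeps only the chr1:100 site (6/50 = 0.120); the other two fail depth or fraction
```

Excluding known SNPs or repetitive regions would need additional annotation files and is left out of this sketch.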

Tools Used

PacBio sequencing

Raw Source Text

Sequencing data was processed using the Isoseq v4 pipeline with lima (parameter: --isoseq) to generate full-length non-concatemer reads and isoseq refine (parameter: --require-polya) to generate refined reads.
HEK293T APOBEC1-only and Ribo-STAMP data were aligned to hg19 reference and MDA-MB-231 Ribo-STAMP data (NT and CoCl2) were aligned to hg38 reference using pbmm2 align (parameter: --preset ISOSEQ).
QC was completed using NanoPlot (parameters: --raw and --tsv_stats).
Reads were filtered for uniquely mapped reads and read counts obtained using IsoQuant (parameters: --data_type pacbio, --transcript_quantification unique_only, and --gene_quantification unique_only)
Edits were identified and filtered using custom scripts.
Assembly: hg19, hg38
Supplementary files format and content: tab-delimited file containing edited positions, the number of reads with C-to-U edits at each positions (conversion), the total number of reads at each position, and each edit's assignment to a gene and isoform (BED)
