GSE240325 Processing Pipeline

RNA-Seq, 7 steps

Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature Communications (2024), PMID 39152130

Dataset

An in situ method for identification of transcriptome-wide protein-RNA interactions in cells [in_situ_STAMP - Long-Read]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Demultiplex primers with lima (v2.6.0)

lima v2.6.0

$ Bash example

# Install lima (e.g., via Bioconda)
# conda install -c bioconda lima=2.6.0

# Example usage of lima for removing primers and demultiplexing
# Replace 'input.ccs.bam' with your PacBio CCS/HiFi reads (unaligned BAM).
# Replace 'primers.fasta' with the FASTA file of 5' and 3' cDNA primer sequences.
# Replace 'output.fl.bam' with the desired output file; lima derives per-primer-pair file names from it.
# --isoseq selects lima's Iso-Seq mode for cDNA primer removal (an assumption here, since refine and cluster steps follow).
# --peek-guess, --min-score, and --num-threads are illustrative settings; check `lima --help` and tune them for your data.
lima input.ccs.bam primers.fasta output.fl.bam --isoseq --peek-guess --min-score 20 --num-threads 8

Refine reads with Isoseq3 refine (v3.8.0)

IsoSeq v3.8.0 GitHub

$ Bash example

# Install IsoSeq3 via Bioconda
# conda create -n isoseq3_env python=3.8
# conda activate isoseq3_env
# conda install -c bioconda isoseq3=3.8.0

# Example command for refining reads with isoseq3 refine (v3.8.0)
# This step follows primer removal with lima and precedes 'isoseq3 cluster'.
# It trims poly(A) tails and removes concatemers to produce full-length non-chimeric (FLNC) reads.
# Replace the file names below with actual file paths.

# Input files (positional arguments):
# 1. Full-length reads BAM file produced by lima (the file name includes the detected primer pair)
# 2. FASTA file containing the primer sequences used for library preparation

# Output file (positional argument):
# 3. FLNC reads BAM file

# --require-polya keeps only reads with a poly(A) tail; drop it if your library lacks poly(A) tails.
isoseq3 refine \
    --require-polya \
    full_length_reads.bam \
    primer_sequences.fasta \
    flnc_reads.bam

Align reads using pbmm2 (v1.9.0)

pbmm2 v1.9.0 GitHub

$ Bash example

# Install pbmm2 if not already installed
# conda install -c bioconda pbmm2=1.9.0

# Define input/output files and reference genome
# Replace 'reads.fastq' with your actual input reads file (e.g., .fastq, .fasta, .bam)
READS_FILE="reads.fastq"

# Replace 'reference.fasta' with the path to your reference genome file (e.g., GRCh38.fasta)
# pbmm2 can also use an .mmi index file if pre-built (e.g., reference.mmi)
REFERENCE_GENOME="reference.fasta"

# Define the output BAM file name
OUTPUT_BAM="aligned_reads.bam"

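# Align reads using pbmm2
# --preset ISOSEQ applies pbmm2's settings for spliced transcript alignment (an assumption; the source does not state the preset).
# --sort is required for coordinate-sorted output; without it pbmm2 writes unsorted alignments.
pbmm2 align --preset ISOSEQ --sort "${REFERENCE_GENOME}" "${READS_FILE}" "${OUTPUT_BAM}"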

Cluster reads with isoseq3 cluster (v3.8.0)

IsoSeq v3.8.0 GitHub

$ Bash example

# Install isoseq3 via conda if not already installed
# conda install -c bioconda isoseq3=3.8.0

# Cluster reads using isoseq3 cluster
# Replace flnc_reads.bam with the FLNC reads BAM file produced by 'isoseq3 refine'
# Replace clustered.bam with your desired output BAM file name
# --use-qvs uses base quality values during clustering; --verbose prints progress
isoseq3 cluster flnc_reads.bam clustered.bam --verbose --use-qvs

Filter for primary mapped reads with custom script

samtools v1.10 (tool and version inferred with models/gemini-2.5-flash)

$ Bash example

# Install samtools if not available
# conda install -c bioconda samtools

# Define input and output file names
INPUT_BAM="input.bam"
OUTPUT_BAM="primary_mapped.bam"

# This is a custom script to filter for primary mapped reads.
# It uses samtools view to exclude unmapped reads and secondary and supplementary alignments.
# Additional filtering logic (e.g., mapping quality, read length, specific flags)
# could be added here based on specific assay requirements.

# Filter for primary mapped reads
# -F 2308 excludes unmapped reads (4), secondary alignments (256), and supplementary alignments (2048)
# -b: output in BAM format
samtools view -b -F 2308 "${INPUT_BAM}" > "${OUTPUT_BAM}"

# Index the filtered BAM file for downstream processing
samtools index "${OUTPUT_BAM}"
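
Which alignments count as "primary mapped" can be made explicit from the SAM flag bits. The following is a minimal sketch, assuming the goal is to drop unmapped reads (0x4), secondary alignments (0x100), and supplementary alignments (0x800); the helper name is_primary_mapped is hypothetical and is not part of samtools.

```shell
#!/usr/bin/env bash
# SAM flag bits, as defined in the SAM specification.
UNMAPPED=4          # 0x4
SECONDARY=256       # 0x100
SUPPLEMENTARY=2048  # 0x800

# Combined exclusion mask, suitable as the value for `samtools view -F`.
EXCLUDE_MASK=$(( UNMAPPED | SECONDARY | SUPPLEMENTARY ))

# Hypothetical helper: prints "keep" if none of the excluded bits is set, else "drop".
is_primary_mapped() {
    local flag=$1
    if (( (flag & EXCLUDE_MASK) == 0 )); then
        echo "keep"
    else
        echo "drop"
    fi
}

echo "mask=${EXCLUDE_MASK}"
# 0: forward primary, 16: reverse primary, 4: unmapped,
# 256: secondary, 2064: supplementary on the reverse strand
for flag in 0 16 4 256 2064; do
    echo "flag=${flag} -> $(is_primary_mapped "${flag}")"
done
```

Only flags 0 and 16 are kept in this sketch. A mask of 256 alone would let supplementary alignments and unmapped reads through.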

Identify gene editing using custom script

custom script (inferred with models/gemini-2.5-flash), version not available

$ Bash example

# Placeholder for a custom edit identification script; the source does not describe the actual script.
# Such a script would typically take aligned reads (BAM/CRAM) or variant calls (VCF)
# and a reference genome, and report the positions where reads carry edits relative to the reference.

# Define input and output files (placeholders)
INPUT_FILE="input_variants.vcf" # Or aligned_reads.bam
REFERENCE_GENOME="/path/to/reference/hg38.fa"
OUTPUT_REPORT="gene_editing_report.txt"

# Example execution command for a hypothetical custom Python script
# Replace 'custom_gene_editing_script.py' with the actual script name
# and adjust parameters as needed for the specific custom script.
python custom_gene_editing_script.py \
  --input "${INPUT_FILE}" \
  --reference "${REFERENCE_GENOME}" \
  --output "${OUTPUT_REPORT}"
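
The source does not say how the custom script calls edits. As an illustration only, the sketch below assumes the edits of interest are the C-to-U conversions that STAMP-style assays read out as C>T mismatches, and that per-site read counts have already been tabulated. The input layout, the thresholds, and the function name call_edits are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical thresholds; they are not taken from the source.
MIN_COVERAGE=10
MIN_EDITED_READS=2

# Hypothetical helper. Reads tab-separated per-site counts on stdin:
#   chrom, 1-based position, strand, reference base on the transcribed strand,
#   reads matching the reference, reads showing T
# Writes BED-like records (0-based start) for C sites that pass both thresholds:
#   chrom, start, end, edit type, edited fraction, strand
call_edits() {
    awk -F'\t' -v min_cov="${MIN_COVERAGE}" -v min_alt="${MIN_EDITED_READS}" '
        $4 == "C" {
            cov = $5 + $6
            if (cov >= min_cov && $6 >= min_alt) {
                printf "%s\t%d\t%d\tC>T\t%.3f\t%s\n", $1, $2 - 1, $2, $6 / cov, $3
            }
        }'
}

# Toy input: only the first and last sites pass
# (the second has low coverage, the third is not a C).
printf 'chr1\t1000\t+\tC\t18\t2\nchr1\t1010\t+\tC\t3\t1\nchr1\t1020\t+\tA\t20\t5\nchr2\t500\t-\tC\t10\t10\n' \
    | call_edits
```

A real caller would also need to derive these counts from the BAM file and model sequencing error, which this sketch leaves out.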

Remove edits found in annotated SNP positions using custom script

filter_snps.py (inferred with models/gemini-2.5-flash), version not available

$ Bash example

# Install dependencies if not already available
# conda install -c bioconda pysam pybedtools

# Assuming 'input_edits.bed' is the file containing detected edits (e.g., from variant calling or RNA editing detection)
# Assuming '/path/to/human_grch38_known_snps.vcf.gz' is a VCF file of known SNPs for the human GRCh38 reference genome (e.g., from dbSNP or gnomAD)

# Execute the custom script to remove edits found in annotated SNP positions
# (the script name and argument names are placeholders; adjust them to the actual script)
python filter_snps.py \
    --input-bed input_edits.bed \
    --output-bed filtered_edits.bed \
    --vcf /path/to/human_grch38_known_snps.vcf.gz
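
The filtering logic itself is simple enough to sketch without the custom script. The example below assumes single-base edit records in BED format (0-based start) and an uncompressed VCF of known SNPs (1-based POS), both using the same chromosome naming. The function name filter_snp_edits and the toy records are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print BED records whose position is absent from the VCF.
#   $1 = VCF of known SNPs (uncompressed), $2 = BED of single-base edits
filter_snp_edits() {
    awk -F'\t' '
        FILENAME == ARGV[1] {
            # VCF pass: remember chrom + 1-based POS, skipping header lines
            if ($0 !~ /^#/) known[$1 FS $2] = 1
            next
        }
        # BED pass: start is 0-based, so start + 1 is the 1-based position
        !(($1 FS ($2 + 1)) in known)
    ' "$1" "$2"
}

# Toy data: the chr1 edit coincides with a known SNP, the chr2 edit does not.
workdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "${workdir}/snps.vcf"
printf 'chr1\t1000\trs_demo\tC\tT\t.\t.\t.\n' >> "${workdir}/snps.vcf"
printf 'chr1\t999\t1000\tC>T\t0.100\t+\n' > "${workdir}/edits.bed"
printf 'chr2\t499\t500\tC>T\t0.500\t-\n' >> "${workdir}/edits.bed"

filter_snp_edits "${workdir}/snps.vcf" "${workdir}/edits.bed"
rm -rf "${workdir}"
```

For a compressed VCF, the file can be streamed in with process substitution, for example filter_snp_edits <(gzip -dc known_snps.vcf.gz) edits.bed. Differences in chromosome naming between the two files (chr1 versus 1) would need to be harmonized first.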

Raw Source Text

Demultiplex primers with lima (v2.6.0)
Refine reads with Isoseq3 refine (v3.8.0)
Align reads using pbmm2 (v1.9.0)
Cluster reads with isoseq3 cluster (v3.8.0)
Filter for primary mapped reads with custom script
Identify gene editing using custom script
remove edits found in annotated SNP positions using custom script
Assembly: hg38
Supplementary files format and content: isoforms from clustering (.gff)
Supplementary files format and content: gene edits (.bed)
