GSE240325 Processing Pipeline
RNA-Seq
7 steps
Publication
High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues. Nature Communications (2024), PMID 39152130
Dataset
GSE240325: An in situ method for identification of transcriptome-wide protein-RNA interactions in cells [in_situ_STAMP - Long-Read]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Demultiplex primers with lima (v2.6.0)
lima v2.6.0
$ Bash example
# Install lima (e.g., via Bioconda)
# conda install -c bioconda lima=2.6.0

# Demultiplex primers from PacBio CCS (HiFi) reads.
# Placeholder names (not from the source):
#   input_reads.ccs.bam   CCS reads (BAM, so the isoseq3 steps can consume lima's output)
#   primers.fasta         FASTA file with the primer sequences
#   demultiplexed.fl.bam  output file; lima derives its report file names from it
# --isoseq selects lima's Iso-Seq primer-removal mode. Using it here is an assumption
# based on the isoseq3 steps that follow; the source does not list the options used.
# --peek-guess infers which primer pairs are actually present in the data.
# Adjust options such as --min-score and --num-threads to your data and hardware.
lima input_reads.ccs.bam primers.fasta demultiplexed.fl.bam --isoseq --peek-guess --num-threads 8
2
Refine reads with isoseq3 refine (v3.8.0)
$ Bash example
# Install IsoSeq3 via Bioconda
# conda create -n isoseq3_env python=3.8
# conda activate isoseq3_env
# conda install -c bioconda isoseq3=3.8.0

# Refine reads with isoseq3 refine (v3.8.0).
# refine runs after lima and before 'isoseq3 cluster'. It takes the full-length (FL)
# reads written by lima, removes concatemers, and writes full-length non-chimeric
# (FLNC) reads. It takes positional arguments rather than --flnc/--primer/--output flags.
# Placeholder names (not from the source):
#   demultiplexed.fl.bam  FL reads BAM from lima (use the actual lima output file name)
#   primers.fasta         the same primer FASTA that was passed to lima
#   flnc.bam              output FLNC reads BAM
# Add --require-polya only if the library is expected to carry poly(A) tails.
isoseq3 refine demultiplexed.fl.bam primers.fasta flnc.bam
3
Align reads using pbmm2 (v1.9.0)
$ Bash example
# Install pbmm2 if not already installed
# conda install -c bioconda pbmm2=1.9.0

# Define input/output files and reference genome (placeholder names)
# Replace 'reads.fastq' with your actual input reads file (e.g., .fastq, .fasta, .bam)
READS_FILE="reads.fastq"
# Replace 'reference.fasta' with the path to your reference genome (the source lists assembly hg38)
# pbmm2 can also use a pre-built .mmi index file (e.g., reference.mmi)
REFERENCE_GENOME="reference.fasta"
# Define the output BAM file name
OUTPUT_BAM="aligned_reads.bam"

# Align reads using pbmm2.
# --sort is required to get a coordinate-sorted BAM; pbmm2 does not sort by default.
# --preset ISOSEQ is an assumption suited to Iso-Seq transcript reads; the source does not state the preset.
pbmm2 align --preset ISOSEQ --sort "${REFERENCE_GENOME}" "${READS_FILE}" "${OUTPUT_BAM}"
4
Cluster reads with isoseq3 cluster (v3.8.0)
$ Bash example
# Install isoseq3 via conda if not already installed
# conda install -c bioconda isoseq3=3.8.0

# Cluster reads using isoseq3 cluster.
# isoseq3 cluster takes positional arguments: the FLNC reads BAM written by
# 'isoseq3 refine', then the output BAM of clustered transcripts.
# 'flnc.bam' and 'clustered.bam' are placeholder names.
isoseq3 cluster flnc.bam clustered.bam --verbose
5
Filter for primary mapped reads with custom script
samtools v1.10 (tool and version inferred with models/gemini-2.5-flash)
$ Bash example
# Install samtools if not available
# conda install -c bioconda samtools

# Define input and output file names
INPUT_BAM="input.bam"
OUTPUT_BAM="primary_mapped.bam"

# Stand-in for the custom script, which the source does not describe.
# It uses samtools view to keep only primary alignments of mapped reads.
# Additional filtering logic (e.g., mapping quality, read length, specific flags)
# could be added here based on specific assay requirements.

# -F 2308 excludes unmapped reads (4), secondary alignments (256), and supplementary alignments (2048).
# -F 256 alone would remove only secondary alignments.
# -b: output in BAM format
samtools view -b -F 2308 "${INPUT_BAM}" > "${OUTPUT_BAM}"

# Index the filtered BAM file for downstream processing (requires coordinate-sorted input)
samtools index "${OUTPUT_BAM}"
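A primary-mapped filter comes down to a mask over SAM flag bits, which the Python sketch below illustrates. It assumes a read is kept only when it is mapped and its alignment is neither secondary nor supplementary. The bit values come from the SAM specification. The function and constant names are illustrative and are not taken from the authors' custom script.

```python
# SAM flag bits as defined by the SAM specification.
UNMAPPED = 0x4         # 4: read is unmapped
SECONDARY = 0x100      # 256: secondary alignment
SUPPLEMENTARY = 0x800  # 2048: supplementary alignment

# Combined mask; it equals the value passed to `samtools view -F 2308`.
EXCLUDE_MASK = UNMAPPED | SECONDARY | SUPPLEMENTARY


def is_primary_mapped(flag: int) -> bool:
    """Return True when none of the excluded bits is set in the SAM flag."""
    return (flag & EXCLUDE_MASK) == 0


def filter_sam_lines(lines):
    """Yield header lines and primary mapped records from SAM text lines."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if is_primary_mapped(flag):
            yield line
```

Clearing all three bits corresponds to `samtools view -F 2308`. A mask of 256 on its own removes only secondary alignments and leaves supplementary alignments and unmapped reads in the output.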
6
Identify gene editing using custom script
custom script (inferred with models/gemini-2.5-flash), version N/A
$ Bash example
# Placeholder for a custom gene editing identification script.
# This script would typically take aligned reads (BAM/CRAM) or variant calls (VCF)
# and a reference genome to identify specific gene edits.

# Define input and output files (placeholders)
INPUT_FILE="input_variants.vcf"  # Or aligned_reads.bam
REFERENCE_GENOME="/path/to/reference/hg38.fa"
OUTPUT_REPORT="gene_editing_report.txt"

# Example execution command for a hypothetical custom Python script
# Replace 'custom_gene_editing_script.py' with the actual script name
# and adjust parameters as needed for the specific custom script.
python custom_gene_editing_script.py \
  --input "${INPUT_FILE}" \
  --reference "${REFERENCE_GENOME}" \
  --output "${OUTPUT_REPORT}"
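The source does not describe how the custom script identifies edits, so the following Python sketch is only one plausible shape for that logic. It assumes that an edit shows up as a specific base substitution relative to the reference. The C-to-T default is an assumption based on how APOBEC-based STAMP assays are usually read out, and the thresholds are arbitrary illustrative values. The function `call_edits` and its parameters are hypothetical.

```python
from collections import Counter


def call_edits(reference, pileup, ref_base="C", alt_base="T",
               min_depth=10, min_fraction=0.05):
    """Return (position, alt_count, depth) tuples for candidate edit sites.

    reference: reference sequence as a string (0-based positions).
    pileup: dict mapping a 0-based position to a Counter of observed read bases.
    The substitution type and thresholds are illustrative, not from the source.
    """
    edits = []
    for pos in sorted(pileup):
        if reference[pos].upper() != ref_base:
            continue  # only sites whose reference base matches the edited base
        counts = pileup[pos]
        depth = sum(counts.values())
        alt = counts.get(alt_base, 0)
        if depth >= min_depth and alt / depth >= min_fraction:
            edits.append((pos, alt, depth))
    return edits


# Toy example: position 1 is a reference C where 3 of 20 reads show T.
reference = "ACGC"
pileup = {
    1: Counter({"C": 17, "T": 3}),
    3: Counter({"C": 20}),
}
print(call_edits(reference, pileup))  # [(1, 3, 20)]
```

In practice the per-position base counts would come from the aligned BAM (for example through a pileup), and reads on the reverse strand would need the complementary substitution. Both are left out of this sketch.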
7
Remove edits found in annotated SNP positions using custom script
$ Bash example
# Install dependencies if not already available
# conda install -c bioconda pysam pybedtools

# Assuming 'input_edits.bed' is the file containing detected edits
# (e.g., from variant calling or RNA editing detection)
# Assuming '/path/to/human_grch38_known_snps.vcf.gz' is a VCF file of known SNPs
# for the human GRCh38 reference genome (e.g., from dbSNP or gnomAD)

# Execute the custom script to remove edits found in annotated SNP positions
python filter_snps.py \
  --input-bed input_edits.bed \
  --output-bed filtered_edits.bed \
  --vcf /path/to/human_grch38_known_snps.vcf.gz
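The authors' SNP-filtering script is not shown in the source. The sketch below is a minimal stand-in, assuming edits are stored as BED intervals and known SNPs as VCF records. It uses only the Python standard library, and all function names are hypothetical. The main detail to get right is the coordinate system: VCF positions are 1-based and BED starts are 0-based.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def parse_snp_positions(vcf_lines):
    """Collect (chrom, 0-based position) for every record in VCF text lines."""
    snps = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta lines, the column header, and blank lines
        chrom, pos = line.split("\t")[:2]
        snps.add((chrom, int(pos) - 1))  # convert 1-based VCF POS to 0-based
    return snps


def filter_edits(bed_lines, snps):
    """Yield BED lines whose interval contains no known SNP position."""
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        if any((chrom, p) in snps for p in range(int(start), int(end))):
            continue  # the edit overlaps an annotated SNP, so drop it
        yield line
```

A possible use is `snps = parse_snp_positions(open_text(vcf_path))` followed by writing the output of `filter_edits` to a new BED file. Chromosome names must match between the two files (for example `chr1` versus `1`). For a full dbSNP or gnomAD VCF, a tabix-indexed lookup would be more memory-efficient than loading every position into a set.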
Raw Source Text
Demultiplex primers with lima (v2.6.0)
Refine reads with Isoseq3 refine (v3.8.0)
Align reads using pbmm2 (v1.9.0)
Cluster reads with isoseq3 cluster (v3.8.0)
Filter for primary mapped reads with custom script
Identify gene editing using custom script
remove edits found in annotated SNP positions using custom script
Assembly: hg38
Supplementary files format and content: isoforms from clustering (.gff)
Supplementary files format and content: gene edits (.bed)