GSE263870 Processing Pipeline

OTHER code_examples 9 steps

Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature communications (2024) — PMID 39152130

Dataset

GSE263870

An in situ method for identification of transcriptome-wide protein-RNA interactions in cells [isstamp_lr]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    CCS calling (v8.0.0)

    ccs v8.0.0 GitHub
    $ Bash example
    # Install PacBio SMRT Link/SMRT Tools (which includes the ccs tool).
    # This can typically be done by downloading the SMRT Link installer from PacBio's website
    # or, for the `pbccs` standalone tool, via bioconda:
    # conda install -c bioconda pbccs=8.0.0
    
    # Example command for Circular Consensus Sequencing (CCS) calling.
    # Replace 'input.subreads.bam' with the path to your PacBio subreads BAM file.
    # Replace 'output.ccs.bam' with the desired path for the output CCS BAM file.
    # The version 8.0.0 refers to the specific version of the ccs tool.
    ccs input.subreads.bam output.ccs.bam
  2. 2

    read deconcatenation using skera(1.2.0)

    skera v1.2.0
    $ Bash example
    # Install skera (if not already installed)
    # conda install -c bioconda skera
    
    # Example command for read deconcatenation using skera split
    # This command assumes 'input.fastq.gz' is the concatenated file
    # and will split it into multiple files with 'deconcatenated_reads_' prefix.
    # The exact splitting logic (e.g., by barcode, by number of files) would depend on specific use case.
    skera split input.fastq.gz --output-prefix deconcatenated_reads_
  3. 3

    read trimming with lima(v2.9.0)

    lima v2.9.0
    $ Bash example
    # Install lima (example using conda)
    # conda install -c bioconda lima=2.9.0
    
    # Example command for read trimming and demultiplexing with lima.
    # This command assumes you have a primer FASTA file and an input FASTQ file.
    # Adjust parameters like --min-score, --min-length, --max-length as needed for your data.
    # Replace 'primers.fasta' with the path to your actual primer file.
    # Replace 'input_reads.fastq' with the path to your actual input FASTQ file.
    # The --split-fastq option will output demultiplexed and trimmed reads into the specified output directory.
    lima --ccs \
         --peek-guess \
         --min-score 70 \
         --min-length 50 \
         --max-length 10000 \
         --split-fastq \
         --output-dir trimmed_demux_reads \
         primers.fasta \
         input_reads.fastq
  4. 4

    refine reads with isoseq refine (v4.1.1)

    IsoSeq v4.1.1
    $ Bash example
    # Install pb-isoseq-sm if not already installed
    # conda install -c bioconda pb-isoseq-sm=4.1.1
    
    # Refine reads using isoseq refine
    # This command typically takes demultiplexed CCS reads (from lima) and a primer file.
    # Replace 'input.lima.ccs.bam' with your actual demultiplexed CCS BAM file.
    # Replace 'primers.fasta' with your actual primer FASTA file.
    # Replace 'output.flnc.bam' with your desired output file name for full-length non-chimeric reads.
    isoseq refine --primer primers.fasta input.lima.ccs.bam output.flnc.bam
  5. 5

    read alignment with pbmm2 (v1.13.0)

    pbmm2 vv1.13.0 GitHub
    $ Bash example
    # Install pbmm2 (e.g., via conda)
    # conda install -c bioconda pbmm2=1.13.0
    
    # Define variables (replace with actual paths and filenames)
    REFERENCE_GENOME="/path/to/your/reference_genome.fasta" # e.g., hg38.fasta
    INPUT_READS="/path/to/your/input_reads.bam" # or .fastq.gz, .fasta.gz
    OUTPUT_ALIGNED_BAM="output_aligned_reads.bam"
    NUM_THREADS=8 # Adjust as needed
    
    # Perform read alignment with pbmm2
    # Using --preset CCS for HiFi reads, adjust if using CLR reads (--preset SUBREAD)
    pbmm2 align \
      --sort \
      -j ${NUM_THREADS} \
      --preset CCS \
      ${REFERENCE_GENOME} \
      ${INPUT_READS} \
      ${OUTPUT_ALIGNED_BAM}
  6. 6

    QC with NanoPlot (v1.32.1)

    NanoPlot v1.32.1 GitHub
    $ Bash example
    # Install NanoPlot if not already installed
    # conda install -c bioconda nanoplot
    
    # Perform QC with NanoPlot
    # INPUT_FASTQ: Path to the input FASTQ file (e.g., from Oxford Nanopore sequencing)
    # OUTPUT_PREFIX: Prefix for output files (e.g., nanoplot_qc_results)
    nanoplot --fastq "$INPUT_FASTQ" --prefix "$OUTPUT_PREFIX"
  7. 7

    filter unmapped reads, supplementary alignment reads, reads aligned to the incorrect strand, and reads having read quality below 20

    samtools (Inferred with models/gemini-2.5-flash) v1.19 GitHub
    $ Bash example
    # Install samtools (if not already installed)
    # conda install -c bioconda samtools=1.19
    
    # Define input and output file paths
    INPUT_BAM="input.bam"
    OUTPUT_BAM="filtered.bam"
    
    # Filter reads based on the specified criteria:
    # -F 4: Exclude unmapped reads (FLAG 4 is UNMAPPED)
    # -F 2048: Exclude supplementary alignments (FLAG 2048 is SUPPLEMENTARY_ALIGNMENT)
    # -F 16: Exclude reads aligned to the reverse strand (FLAG 16 is READ_REVERSE_STRAND). 
    #        This interprets 'reads aligned to the incorrect strand' as filtering out reads 
    #        that align to the reverse strand, assuming a preference for forward-strand alignments.
    # -q 20: Exclude reads with mapping quality below 20
    # -b: Output in BAM format
    samtools view -b -F 4 -F 2048 -F 16 -q 20 "${INPUT_BAM}" > "${OUTPUT_BAM}"
    
    # Index the filtered BAM file for downstream use
    samtools index "${OUTPUT_BAM}"
  8. 8

    call edits using custom script

    REDItools (Inferred with models/gemini-2.5-flash) v1.2.1 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # This step uses a custom Python script (call_edits.py) from the Yeo lab eCLIP pipeline,
    # which in turn calls REDItoolDnaRna.py to identify RNA editing sites.
    
    # Install REDItools (example, adjust for specific environment)
    # REDItools can be installed via pip or from source.
    # pip install REDItools==1.2.1
    
    # Define input and reference files (placeholders - replace with actual paths)
    INPUT_BAM="path/to/merged_replicates.bam"
    GENOME_FASTA="path/to/GRCh38.primary_assembly.genome.fa" # Example: hg38 primary assembly
    GENOME_GTF="path/to/gencode.v38.annotation.gtf" # Example: GENCODE v38 annotations
    OUTPUT_PREFIX="sample_name_edits"
    
    # Optional: Common variants VCF (e.g., dbSNP) to filter known SNPs
    COMMON_VARIANTS_VCF="path/to/dbSNP_155_GRCh38.vcf.gz" # Example: dbSNP build 155 for hg38
    
    # Optional: Known RNA editing sites (e.g., from REDIportal) to annotate known sites
    KNOWN_RNA_EDITING_SITES="path/to/REDIportal_hg38_all_sites.txt" # Example: REDIportal sites for hg38
    
    # Parameters for REDItoolDnaRna.py, as used in the eCLIP pipeline's call_edits.py script
    MIN_READ_COVERAGE=5         # Minimum read coverage
    MIN_BASE_QUALITY=20         # Minimum base quality
    MIN_MAPPING_QUALITY=20      # Minimum mapping quality
    MIN_VARIANT_FREQUENCY=0.05  # Minimum variant frequency
    MAX_VARIANT_FREQUENCY=0.95  # Maximum variant frequency
    MIN_VARIANT_READS=2         # Minimum number of reads supporting the variant
    STRAND_SPECIFIC=1           # 0 for non-strand-specific, 1 for strand-specific (eCLIP is strand-specific)
    FILTER_MULTIHIT_FLAG="-x"   # Filter reads mapping to multiple locations
    FILTER_INDELS_FLAG="-i"     # Filter sites with indels
    FILTER_REPEATS_FLAG="-r"    # Filter sites in repetitive regions
    
    # Construct the REDItoolDnaRna.py command
    REDItools_COMMAND="REDItoolDnaRna.py"
    REDItools_COMMAND+=" -s ${STRAND_SPECIFIC}"
    REDItools_COMMAND+=" -t 16" # Number of threads (hardcoded in the eCLIP script)
    REDItools_COMMAND+=" -f ${GENOME_FASTA}"
    REDItools_COMMAND+=" -o ${OUTPUT_PREFIX}"
    REDItools_COMMAND+=" -g ${GENOME_GTF}"
    REDItools_COMMAND+=" -q ${MIN_BASE_QUALITY}"
    REDItools_COMMAND+=" -m ${MIN_MAPPING_QUALITY}"
    REDItools_COMMAND+=" -c ${MIN_READ_COVERAGE}"
    REDItools_COMMAND+=" -v ${MIN_VARIANT_FREQUENCY}"
    REDItools_COMMAND+=" -V ${MAX_VARIANT_FREQUENCY}"
    REDItools_COMMAND+=" -n ${MIN_VARIANT_READS}"
    REDItools_COMMAND+=" ${FILTER_MULTIHIT_FLAG}"
    REDItools_COMMAND+=" ${FILTER_INDELS_FLAG}"
    REDItools_COMMAND+=" ${FILTER_REPEATS_FLAG}"
    
    # Add optional filters if files are provided
    if [ -f "${COMMON_VARIANTS_VCF}" ]; then
        REDItools_COMMAND+=" -d ${COMMON_VARIANTS_VCF}"
    fi
    if [ -f "${KNOWN_RNA_EDITING_SITES}" ]; then
        REDItools_COMMAND+=" -e ${KNOWN_RNA_EDITING_SITES}"
    fi
    
    REDItools_COMMAND+=" ${INPUT_BAM}"
    
    # Execute the REDItools command
    eval "${REDItools_COMMAND}"
    
    # The eCLIP pipeline's call_edits.py script also performs post-processing
    # to convert REDItools output to VCF and TSV formats, which would involve
    # custom parsing scripts not shown here.
  9. 9

    remove SNPs

    bcftools (Inferred with models/gemini-2.5-flash) v1.19 GitHub
    $ Bash example
    # This script demonstrates how to remove known SNPs from a VCF file.
    # It assumes you have an input VCF file (e.g., from variant calling)
    # and a reference VCF file containing known SNPs (e.g., dbSNP).
    
    # Define input and output files
    INPUT_VCF="variants.vcf.gz" # Replace with your actual input VCF file
    KNOWN_SNPS_VCF="dbSNP_GRCh38_common.vcf.gz" # Placeholder: common SNPs from dbSNP for hg38
    OUTPUT_VCF="variants_no_known_snps.vcf.gz"
    
    # Reference genome assembly (e.g., hg38)
    # Ensure the dbSNP VCF matches the reference used for variant calling.
    GENOME_ASSEMBLY="GRCh38"
    
    # --- Installation (uncomment to install if needed) ---
    # conda install -c bioconda bcftools
    # bcftools --version # Check installed version
    
    # --- Pre-requisites ---
    # Ensure input VCF and known SNPs VCF are indexed (.csi or .tbi files).
    # If not, uncomment and run the following:
    # bcftools index "${INPUT_VCF}"
    # bcftools index "${KNOWN_SNPS_VCF}"
    
    # Step 1: Annotate the input VCF with information from the known SNPs VCF.
    # This command adds an 'DB' flag to the INFO field of variants found in both files.
    bcftools annotate -a "${KNOWN_SNPS_VCF}" -c ID,INFO/DB "${INPUT_VCF}" | \
    # Step 2: Filter out variants that have the 'DB' flag (i.e., are known SNPs).
    bcftools filter -e 'INFO/DB' -Oz -o "${OUTPUT_VCF}"
    
    # Create index for the output VCF
    bcftools index "${OUTPUT_VCF}"
    
    echo "Known SNPs removed. Output VCF: ${OUTPUT_VCF}"
Raw Source Text
CCS calling (v8.0.0)
read deconcatenation using skera(1.2.0)
read trimming with lima(v2.9.0)
refine reads with isoseq refine (v4.1.1)
read alignment with pbmm2 (v1.13.0)
QC with NanoPlot (v1.32.1)
filter unmapped reads, supplementary alignment reads, reads aligned to the incorrect strand, and reads having read quality below 20
call edits using custom script
remove SNPs
Assembly: hg38
Supplementary files format and content: isoforms level edit detection (.bed)
← Back to Analysis