GSE277047 Processing Pipeline


Publication

Integrated multi-omics analysis of zinc-finger proteins uncovers roles in RNA regulation.

Molecular cell (2024) — PMID 39303722

Dataset

GSE277047

Integrated multi-omics analysis of zinc finger proteins uncovers roles in RNA regulation [RBNS]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Raw data provided is demultiplexed reads with adapters removed and only unique synthetic reads.

    umi_tools (dedup subcommand) v1.1.2 (inferred with models/gemini-2.5-flash)
    $ Bash example
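    # Install umi_tools if not already installed
    # conda install -c bioconda umi_tools
    
    # Assumption: 'input.bam' is a coordinate-sorted, indexed BAM whose read names end in
    # _<UMI> (e.g., after umi_tools extract and alignment).
    # The deposited raw data are already demultiplexed, adapter-trimmed, and reduced to
    # unique synthetic reads, so this step may be unnecessary for the deposited files.
    # --extract-umi-method=read_id reads the UMI from the read name
    # (--extract-method belongs to 'umi_tools extract', not 'dedup').
    # --method=unique treats every distinct UMI at a mapping position as a separate molecule.
    # --output-stats takes a filename prefix, not a single file name.
    # Drop --paired for single-end libraries.
    umi_tools dedup \
        --extract-umi-method=read_id \
        --method=unique \
        --output-stats=deduplication_stats \
        --log=deduplication.log \
        --paired \
        --stdin=input.bam \
        --stdout=output_unique.bam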
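
The umi_tools command above needs an aligned BAM, which reads from a synthetic RNA pool may not have. As a minimal sketch, assuming "only unique synthetic reads" means that reads with identical sequences were collapsed to a single copy (the source does not say how this was done), the shell function below keeps the first record for each distinct sequence in an uncompressed four-line FASTQ. The function name `collapse_identical_reads` and the file names are hypothetical.

```shell
# Hypothetical helper: keep the first FASTQ record for each distinct read sequence.
# $1 = input FASTQ (uncompressed, exactly 4 lines per record)
collapse_identical_reads() {
    awk '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 {
            keep = !($0 in seen)
            seen[$0] = 1
            if (keep) { print header; print $0 }
        }
        NR % 4 == 3 { if (keep) print }
        NR % 4 == 0 { if (keep) print }
    ' "$1"
}

# Example call (placeholder file names):
# collapse_identical_reads reads.fastq > reads.unique.fastq
```

This holds every distinct sequence in memory, which is usually acceptable for short synthetic reads but should be checked against the library size.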
  2.

    Library strategy: RBNS

    $ Bash example
    # RBNS (RNA Bind-n-Seq) is an in vitro method for measuring the binding specificity of RNA-binding proteins (RBPs).
    # Computational analysis of RBNS data typically involves several steps:
    # 1. Counting k-mers in the protein-bound (pulldown) and input libraries. The RNA pool is usually random,
    #    so reads are generally not aligned to a reference.
    # 2. Computing per-k-mer enrichment (pulldown frequency / input frequency) and identifying enriched sequences.
    # 3. Motif discovery on the enriched sequences to summarize the RBP's binding preferences.
    
    # This code block covers only the motif discovery step.
    # It assumes that a FASTA file of enriched RNA sequences has already been generated upstream.
    
    # Placeholder input and output paths.
    # Replace the FASTA path with the actual file of sequences identified as enriched for RBP binding.
    ENRICHED_SEQUENCES_FASTA="rbns_enriched_sequences.fasta"
    OUTPUT_DIR="rbns_motif_discovery_output"
    
    mkdir -p "${OUTPUT_DIR}"
    
    # Run MEME for motif discovery on the enriched sequences.
    # MEME (Multiple EM for Motif Elicitation) is a widely used tool for discovering novel, ungapped motifs.
    # Parameters:
    #   -dna: Use the DNA alphabet (reads are reported as A/C/G/T, so T stands for U in RNA motifs).
    #   -mod zoops: Use the ZOOPS (Zero or One Occurrence Per Sequence) model, suitable for motifs that may appear once or not at all in each sequence.
    #   -nmotifs 3: Search for up to 3 motifs.
    #   -minw 6 -maxw 15: Minimum and maximum motif width.
    #   -oc: Output directory; unlike -o, it may already exist (it is created by mkdir above).
    # conda install -c bioconda meme
    meme "${ENRICHED_SEQUENCES_FASTA}" -oc "${OUTPUT_DIR}" -dna -mod zoops -nmotifs 3 -minw 6 -maxw 15
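
The MEME example assumes that enriched sequences were already identified upstream. A minimal sketch of that quantification step, assuming enrichment is defined as the ratio of a k-mer's frequency in the pulldown library to its frequency in the input library, is shown below. The function name `kmer_enrichment`, the file names, and the choice of k are hypothetical; this is not confirmed to be the authors' method.

```shell
# Hypothetical helper: per-k-mer enrichment R = frequency in pulldown / frequency in input.
# $1 = pulldown FASTQ, $2 = input FASTQ (both uncompressed, 4 lines per record), $3 = k
kmer_enrichment() {
    LC_ALL=C awk -v k="$3" '
        FNR == 1 { lib++ }                     # lib 1 = pulldown, lib 2 = input
        FNR % 4 == 2 {                         # sequence line of each FASTQ record
            read = toupper($0)
            for (i = 1; i + k - 1 <= length(read); i++) {
                count[lib, substr(read, i, k)]++
                total[lib]++
            }
        }
        END {
            for (key in count) {
                split(key, part, SUBSEP)
                kmer = part[2]
                # Report only k-mers seen in both libraries (no pseudocounts in this sketch).
                if (part[1] != 1 || !((2, kmer) in count)) continue
                r = (count[1, kmer] / total[1]) / (count[2, kmer] / total[2])
                printf "%s\t%.3f\n", kmer, r
            }
        }
    ' "$1" "$2" | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Example call (placeholder file names): list the most enriched 6-mers.
# kmer_enrichment pulldown.fastq input.fastq 6 | head
```

The output is one tab-separated line per k-mer, sorted from most to least enriched. K-mers absent from either library are skipped rather than smoothed, so rare k-mers deserve a pseudocount or a minimum-count filter in real use.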


Raw Source Text
Raw data provided is demultiplexed reads with adapters removed and only unique synthetic reads.
Assembly: N/A
Supplementary files format and content: Raw data provided is demultiplexed reads with adapters removed and only unique synthetic reads.
Library strategy: RBNS