GSE277047 Processing Pipeline

Publication

Integrated multi-omics analysis of zinc-finger proteins uncovers roles in RNA regulation.

Molecular Cell (2024), PMID 39303722

Dataset

Integrated multi-omics analysis of zinc finger proteins uncovers roles in RNA regulation [RBNS]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

The raw data provided consist of demultiplexed reads with adapters removed, retaining only unique synthetic reads.
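Because the reads are described as already unique, a quick sanity check on a downloaded file can confirm that no sequence appears more than once. The sketch below is illustrative only: it assumes an uncompressed FASTQ with four lines per record, and the function name `count_duplicate_sequences` and the file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: count distinct read sequences that occur more than once.
# Assumes a standard FASTQ layout in which the sequence is line 2 of each 4-line record.
count_duplicate_sequences() {
    # $1 = path to an uncompressed FASTQ file (assumption)
    awk 'NR % 4 == 2' "$1" | sort | uniq -d | wc -l | tr -d ' '
}

# Example usage (file name is a placeholder):
# count_duplicate_sequences sample_reads.fastq
```

A result of 0 would be consistent with the statement that only unique reads were deposited; any other value suggests the file was processed differently than described.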

umi_tools v1.1.2, dedup subcommand (inferred with models/gemini-2.5-flash)

$ Bash example

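# Install umi_tools if not already installed
# conda install -c conda-forge -c bioconda umi_tools

# Note: the source states that the deposited reads are already unique, so this step is an
# inferred illustration of UMI-based deduplication, not a confirmed part of the pipeline.
# It assumes 'input.bam' is a coordinate-sorted, indexed BAM file whose read names carry
# Unique Molecular Identifiers (UMIs), e.g., added by 'umi_tools extract' before alignment.
# --extract-umi-method=read_id: read the UMI from the read name (the umi_tools default).
# --method=unique: collapse only reads whose UMIs are identical.
# --output-stats: takes a filename prefix, not a single file name.
# --paired: an assumption; drop it for single-end libraries.
umi_tools dedup \
    --extract-umi-method=read_id \
    --method=unique \
    --output-stats=deduplication_stats \
    --log=deduplication.log \
    --paired \
    --stdin=input.bam \
    --stdout=output_unique.bam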

Library strategy: RBNS

RBNS (version: N/A)

$ Bash example

# RBNS (RNA Bind-n-Seq) is an in vitro method for measuring the sequence specificity of RNA-binding proteins (RBPs).
# The RNA pool is typically a library of random synthetic oligonucleotides, so reads are usually analyzed
# without alignment to a reference genome (consistent with "Assembly: N/A" for this dataset).
# A typical computational analysis involves:
# 1. Counting k-mers in the protein-bound (pulldown) and input libraries.
# 2. Quantifying per-k-mer enrichment (pulldown frequency divided by input frequency) and selecting enriched sequences.
# 3. Motif discovery on the enriched sequences to summarize the RBP's binding preferences.

# This code block covers only the motif discovery step and is an inferred illustration, not the authors' confirmed method.
# It assumes that a FASTA file of enriched RNA sequences has already been generated from upstream processing.

# Placeholder input and output paths.
# Replace the FASTA path with the actual file of sequences identified as enriched for RBP binding.
ENRICHED_SEQUENCES_FASTA="rbns_enriched_sequences.fasta"
OUTPUT_DIR="rbns_motif_discovery_output"

# Install MEME if not already installed
# conda install -c conda-forge -c bioconda meme

# Run MEME (Multiple EM for Motif Elicitation) to discover ungapped motifs in the enriched sequences.
# Parameters:
# -dna: Use the DNA alphabet; this works when RNA-derived reads are written with T rather than U.
# -mod zoops: ZOOPS (Zero or One Occurrence Per Sequence) model, for motifs that appear at most once per sequence.
# -nmotifs 3: Search for up to 3 motifs.
# -minw 6 -maxw 15: Minimum and maximum motif width.
# -oc: Output directory, created by MEME and overwritten if it already exists
#      (-o would instead fail on an existing directory).
meme "${ENRICHED_SEQUENCES_FASTA}" -oc "${OUTPUT_DIR}" -dna -mod zoops -nmotifs 3 -minw 6 -maxw 15
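The comments above mention quantifying enrichment before motif discovery. As a rough illustration of what that step could look like, the sketch below computes a simple per-k-mer enrichment ratio (frequency in a pulldown library divided by frequency in an input library). This is an assumption about the analysis, not the authors' documented method; the function name `kmer_enrichment` and the file names are hypothetical, and inputs are assumed to be uncompressed four-line FASTQ files.

```shell
# Hypothetical helper: print "kmer<TAB>enrichment" sorted from most to least enriched.
# Enrichment = (k-mer count / total k-mers in pulldown) / (k-mer count / total k-mers in input).
# k-mers absent from the input library are skipped to avoid division by zero.
kmer_enrichment() {
    # $1 = pulldown FASTQ, $2 = input FASTQ, $3 = k-mer length
    awk -v k="$3" '
        FNR % 4 == 2 {
            in_pulldown = (FILENAME == ARGV[1])
            for (i = 1; i + k - 1 <= length($0); i++) {
                kmer = substr($0, i, k)
                if (in_pulldown) { pull[kmer]++; pull_total++ }
                else             { inp[kmer]++;  inp_total++ }
            }
        }
        END {
            for (kmer in pull)
                if (kmer in inp)
                    printf "%s\t%.3f\n", kmer, (pull[kmer] / pull_total) / (inp[kmer] / inp_total)
        }
    ' "$1" "$2" | sort -k2,2nr
}

# Example usage (file names are placeholders):
# kmer_enrichment pulldown.fastq input.fastq 6 | head
```

Reads containing the top-ranked k-mers could then be written to a FASTA file for motif discovery. Published RBNS analyses generally add statistical thresholds and corrections that this sketch omits.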

Tools Used

RBNS

Raw Source Text

Raw data provided is demultiplexed reads with adapters removed and only unique synthetic reads.
Assembly: N/A
Supplementary files format and content: Raw data provided is demultiplexed reads with adapters removed and only unique synthetic reads.
Library strategy: RBNS