GSE277047 Processing Pipeline
2 steps
Publication
Integrated multi-omics analysis of zinc-finger proteins uncovers roles in RNA regulation. Molecular Cell (2024), PMID 39303722
Dataset
GSE277047: Integrated multi-omics analysis of zinc finger proteins uncovers roles in RNA regulation [RBNS]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Raw data provided is demultiplexed reads with adapters removed and only unique synthetic reads.
$ Bash example
# Install umi_tools if not already installed:
# conda install -c bioconda umi_tools

# The deposited raw data are described as already containing only unique synthetic
# reads, so this step applies only when reprocessing reads that still carry duplicates.
# Assumes 'input.bam' is a coordinate-sorted, indexed BAM file whose read names carry
# Unique Molecular Identifiers (UMIs), e.g., after umi_tools extract and alignment
# of the demultiplexed, adapter-trimmed reads.
# --extract-umi-method=read_id reads the UMI from the read name (the default).
# --method=unique treats every distinct UMI at a position as a separate molecule.
# --output-stats takes a file-name prefix, not a single file name.
umi_tools dedup \
  --extract-umi-method=read_id \
  --method=unique \
  --output-stats=deduplication_stats \
  --log=deduplication.log \
  --paired \
  --stdin=input.bam \
  --stdout=output_unique.bam
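Because the deposited reads are described as already unique, a lighter check than full deduplication is to count repeated sequences directly in a FASTQ file. The following is a minimal sketch, assuming a standard four-line FASTQ layout; the file name `demo_reads.fastq` and its contents are hypothetical demo data, not part of the GEO record.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo input; replace with a FASTQ file downloaded for this dataset.
fastq="demo_reads.fastq"
cat > "$fastq" <<'EOF'
@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
TTGACCATGA
+
IIIIIIIIII
@read3
ACGTACGTAC
+
IIIIIIIIII
@read4
GGCATTCAGT
+
IIIIIIIIII
EOF

# In four-line FASTQ, the sequence is line 2 of every record.
total_reads=$(awk 'NR % 4 == 2' "$fastq" | wc -l | tr -d ' ')
unique_reads=$(awk 'NR % 4 == 2' "$fastq" | sort -u | wc -l | tr -d ' ')
duplicate_reads=$((total_reads - unique_reads))

echo "total=${total_reads} unique=${unique_reads} duplicates=${duplicate_reads}"
```

On the demo data this reports one duplicate (read3 repeats read1). A result of zero duplicates on a real file would be consistent with the "only unique synthetic reads" description.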
2
Library strategy: RBNS
$ Bash example
# RBNS (RNA Bind-n-Seq) is an experimental method for determining the binding
# specificities of RNA-binding proteins (RBPs).
# The computational analysis of RBNS data typically involves several steps:
#   1. Alignment of sequencing reads to a defined RNA library (e.g., using Bowtie2 or BWA).
#   2. Quantification of reads for each RNA sequence and identification of enriched sequences.
#   3. Motif discovery on the enriched sequences to find the RBP's binding preferences.
# This block covers only the motif discovery step and assumes that a FASTA file of
# enriched RNA sequences has already been generated by upstream processing.

# Placeholder input and output paths.
# Replace the FASTA path with the file of sequences identified as enriched for RBP binding.
ENRICHED_SEQUENCES_FASTA="rbns_enriched_sequences.fasta"
OUTPUT_DIR="rbns_motif_discovery_output"
mkdir -p "${OUTPUT_DIR}"

# Install MEME if not already installed:
# conda install -c bioconda meme

# Run MEME (Multiple EM for Motif Elicitation), a widely used tool for discovering
# novel, ungapped motifs.
# Parameters:
#   -dna: treat the input as DNA (RNA reads are reported in the DNA alphabet after sequencing).
#   -mod zoops: Zero or One Occurrence Per Sequence model, for motifs that appear
#               once or not at all in each sequence.
#   -nmotifs 3: search for up to 3 motifs.
#   -minw 6 -maxw 15: minimum and maximum motif width.
#   -oc: output directory, overwriting existing contents (-o refuses a directory
#        that already exists, which would conflict with the mkdir above).
meme "${ENRICHED_SEQUENCES_FASTA}" -oc "${OUTPUT_DIR}" -dna -mod zoops -nmotifs 3 -minw 6 -maxw 15
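The quantification step listed above is often expressed in RBNS analyses as a k-mer enrichment: the frequency of each k-mer in the protein-bound library divided by its frequency in the input library. The following is a minimal sketch of that calculation, not the method confirmed for this dataset. It assumes one read sequence per line; the file names, the demo sequences, and the choice of k=3 are all hypothetical and chosen only to keep the example small.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # keep sort order reproducible across systems

k=3
# Hypothetical demo libraries, one read sequence per line.
input_reads="demo_input.txt"
pulldown_reads="demo_pulldown.txt"
enrichment_table="kmer_enrichment.tsv"

printf '%s\n' ACGTAC TTGACC GGCATT CAGTCA > "$input_reads"
printf '%s\n' TGCATG GCATGA TTGCAT ACGTAC > "$pulldown_reads"

# Count every overlapping k-mer in both libraries, then report
# enrichment = pulldown frequency / input frequency for each k-mer.
awk -v k="$k" '
  {
    lib = (FILENAME == ARGV[1]) ? "input" : "pulldown"
    for (i = 1; i + k - 1 <= length($0); i++) {
      kmer = substr($0, i, k)
      count[lib, kmer]++
      total[lib]++
      seen[kmer] = 1
    }
  }
  END {
    for (kmer in seen) {
      in_freq = count["input", kmer] / total["input"]
      pd_freq = count["pulldown", kmer] / total["pulldown"]
      # Skip k-mers absent from the input library to avoid dividing by zero.
      if (in_freq > 0) printf "%s\t%.3f\n", kmer, pd_freq / in_freq
    }
  }
' "$input_reads" "$pulldown_reads" \
  | sort -t $'\t' -k2,2nr -k1,1 > "$enrichment_table"

# Most enriched k-mer (ties broken alphabetically).
top_kmer=$(head -n 1 "$enrichment_table" | cut -f1)
echo "top k-mer: ${top_kmer}"
head -n 5 "$enrichment_table"
```

Real analyses generally use longer k-mers and a pseudocount rather than dropping k-mers missing from the input. The most enriched k-mers, or the reads containing them, could then serve as the FASTA input assumed by the MEME command.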
Tools Used
Raw Source Text
Raw data provided is demultiplexed reads with adapters removed and only unique synthetic reads.
Assembly: N/A
Supplementary files format and content: Raw data provided is demultiplexed reads with adapters removed and only unique synthetic reads.
Library strategy: RBNS