GSE248461 Processing Pipeline
RNA-Seq
6 steps
Publication
High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues. Nature Communications (2024), PMID 39152130
Dataset
GSE248461: An in situ method for identification of transcriptome-wide protein-RNA interactions in cells [in_situ_STAMP II]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
remove adapter with Cutadapt
$ Bash example
# Install Cutadapt (e.g., via conda)
# conda install -c bioconda cutadapt=4.0

# Define input and output files
INPUT_FASTQ="reads.fastq.gz"
OUTPUT_FASTQ="reads_trimmed.fastq.gz"

# Adapter sequence: a common Illumina 3' adapter, used here as a placeholder.
# Replace it with the adapter actually used in your library preparation.
ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"

# Trimming parameters (illustrative values; the source does not state them)
MINIMUM_LENGTH=18   # Minimum read length after trimming
QUALITY_CUTOFF=20   # Quality cutoff for trimming low-quality bases from the 3' end
THREADS=8           # Number of CPU threads to use

# Run Cutadapt
cutadapt \
  -a "${ADAPTER_SEQUENCE}" \
  -o "${OUTPUT_FASTQ}" \
  -m "${MINIMUM_LENGTH}" \
  -q "${QUALITY_CUTOFF}" \
  --cores "${THREADS}" \
  "${INPUT_FASTQ}"
2
align to hg38 using STAR 2.4.0 (Homo sapiens)
STAR v2.4.0
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star=2.4.0

# Define variables
GENOME_DIR="/path/to/STAR_index/hg38"   # Replace with the actual path to your hg38 STAR index
READ1="read1.fastq.gz"                  # Replace with your R1 FASTQ file
READ2="read2.fastq.gz"                  # Replace with your R2 FASTQ file (paired-end only)
OUTPUT_PREFIX="aligned_reads_"          # Output prefix (e.g., aligned_reads_Aligned.sortedByCoord.out.bam)
THREADS=8                               # Adjust as needed

# Build the STAR index once per genome (or download a pre-built one).
# Requires the genome FASTA and a GTF annotation:
# STAR --runMode genomeGenerate \
#   --genomeDir "${GENOME_DIR}" \
#   --genomeFastaFiles /path/to/hg38.fa \
#   --sjdbGTFfile /path/to/gencode.vXX.annotation.gtf \
#   --runThreadN "${THREADS}"

# Align reads to hg38 using STAR.
# Only the genome (hg38) and the STAR version come from the source; the
# remaining options are illustrative and should be checked against the
# original analysis. For single-end data, remove "${READ2}" from --readFilesIn.
STAR --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ1}" "${READ2}" \
  --runThreadN "${THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outSAMattributes Standard \
  --readFilesCommand zcat \
  --outFilterMultimapNmax 20 \
  --outFilterMismatchNmax 999 \
  --outFilterMismatchNoverLmax 0.04 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --sjdbScore 1 \
  --limitBAMsortRAM 30000000000
3
SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%
$ Bash example
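# Install SAILOR (if not already installed).
# The package names and the command below are unverified placeholders, not
# SAILOR's documented interface; check the SAILOR documentation for the
# supported installation and invocation.
# pip install sailor
# or
# conda install -c bioconda sailor

# Replace 'input.bam' with your actual aligned BAM file.
# Replace 'reference.fasta' with the path to your reference genome FASTA file (e.g., hg38).
# The thresholds follow the step description: score > 0.5 and edit fraction < 80%.
# The SAILOR step in this dataset yields a BED file, so the output is named accordingly.
sailor call_edits \
  --bam input.bam \
  --fasta reference.fasta \
  --output output_edits.bed \
  --min_score 0.5 \
  --max_edit_fraction 0.8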
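The two thresholds in this step (score > 0.5, edit fraction < 80%) can also be applied as a plain post-filter on a table of called sites. The sketch below is only an illustration on toy data: the six-column layout (chrom, start, end, score, edit fraction, strand) and the function name filter_sites are assumptions, not SAILOR's documented output format, so the column numbers must be adjusted to the real file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy input with a HYPOTHETICAL column layout (not SAILOR's documented format):
# chrom, start, end, score, edit_fraction, strand
workdir="$(mktemp -d)"
printf 'chr1\t100\t101\t0.95\t0.10\t+\n' >  "${workdir}/sites.tsv"
printf 'chr1\t200\t201\t0.40\t0.20\t+\n' >> "${workdir}/sites.tsv"   # fails score > 0.5
printf 'chr2\t300\t301\t0.99\t0.85\t-\n' >> "${workdir}/sites.tsv"   # fails fraction < 0.8
printf 'chr2\t400\t401\t0.60\t0.79\t-\n' >> "${workdir}/sites.tsv"

# Keep rows with score strictly above 0.5 and edit fraction strictly below 0.8.
# Column 4 (score) and column 5 (fraction) are assumptions about the layout.
filter_sites() {
  awk -F'\t' -v min_score=0.5 -v max_frac=0.8 \
    '($4 + 0) > min_score && ($5 + 0) < max_frac' "$1"
}

filter_sites "${workdir}/sites.tsv" > "${workdir}/sites.filtered.tsv"
cat "${workdir}/sites.filtered.tsv"
```

Both comparisons are strict, matching the ">" and "<" in the step description; a site with a score of exactly 0.5 or a fraction of exactly 0.8 would be dropped.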
4
FLARE analysis to call C-to-U edit clusters
FLARE (not explicitly versioned, often used as a script; inferred with models/gemini-2.5-flash)
$ Bash example
# Clone the FLARE repository
# git clone https://github.com/yeolab/FLARE.git
# cd FLARE
#
# Create a conda environment (example; check FLARE's own dependency list)
# conda create -n flare_env python=3.8 samtools bedtools -y
# conda activate flare_env
#
# Additional Python dependencies may be listed in the repository
# pip install pysam pybedtools

# Define input and output paths
INPUT_BAM="aligned_reads.bam"                    # Replace with your input BAM file (e.g., from the STAR alignment)
REFERENCE_GENOME_FASTA="path/to/human_hg38.fa"   # Replace with your reference genome FASTA
OUTPUT_DIR="flare_c_to_u_output"
SAMPLE_NAME="sample1"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Run FLARE to group C-to-U edit sites into clusters.
# The script path and every flag below are unverified placeholders, not FLARE's
# documented interface; check the FLARE repository for the supported workflow
# and for how it consumes the SAILOR output.
python FLARE/src/flare.py \
  -i "${INPUT_BAM}" \
  -r "${REFERENCE_GENOME_FASTA}" \
  -o "${OUTPUT_DIR}" \
  -s "${SAMPLE_NAME}" \
  --min_coverage 10 \
  --min_edit_freq 0.05 \
  --strand_specific   # Use if your library preparation is strand-specific

# In this dataset the FLARE step yields a TSV named
# "*merged_sorted_peaks.fdr_0.1.d_15.scored.tsv".
# If a per-site table needs filtering for C-to-U edits (seen as C>T in the reads),
# a conceptual filter could look like this (column positions are hypothetical):
# awk -F'\t' '$4 == "C" && $5 == "T"' "${OUTPUT_DIR}/${SAMPLE_NAME}_editing_sites.tsv" \
#   > "${OUTPUT_DIR}/${SAMPLE_NAME}_c_to_u_editing_sites.tsv"
5
Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"
$ Bash example
# Install dependencies and clone the repository
# conda create -n merge_peaks_env python=3.8
# conda activate merge_peaks_env
# pip install numpy scipy pybedtools matplotlib seaborn
# git clone https://github.com/yeolab/merge_peaks.git
# cd merge_peaks

# Merge the edit clusters from the 3 replicates.
# 'rep1_edit_clusters.bed', 'rep2_edit_clusters.bed', and 'rep3_edit_clusters.bed'
# stand for the per-replicate edit cluster files.
# The script name and the -i/-o/-m flags are unverified placeholders; '-m 3' is
# meant to express "present in all 3 replicates", matching the 'intersect' description.
python merge_peaks.py \
  -i rep1_edit_clusters.bed rep2_edit_clusters.bed rep3_edit_clusters.bed \
  -o confident_peaks.bed \
  -m 3
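The source does not say which tool or overlap rule produced "*confident_peaks.bed". As a minimal sketch of one possible reading, the example below keeps a replicate 1 cluster only if it overlaps at least one cluster in replicate 2 and at least one in replicate 3. The file names, the toy coordinates, the helper name overlapping, and the choice to report replicate 1 coordinates are all assumptions for illustration; it uses only awk so it runs without extra tools.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
# Toy BED3 cluster files (hypothetical coordinates), one per replicate.
printf 'chr1\t100\t200\nchr1\t500\t600\nchr2\t50\t80\n' > "${workdir}/rep1.bed"
printf 'chr1\t150\t250\nchr2\t60\t90\n'                 > "${workdir}/rep2.bed"
printf 'chr1\t190\t300\nchr1\t550\t650\nchr2\t10\t55\n' > "${workdir}/rep3.bed"

# overlapping A B: print clusters from A that overlap at least one cluster in B
# (same chromosome, half-open BED intervals). B is loaded into memory first.
overlapping() {
  awk -F'\t' '
    FILENAME == ARGV[1] { n++; c[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; next }
    {
      for (i = 1; i <= n; i++) {
        if (c[i] == $1 && s[i] < ($3 + 0) && ($2 + 0) < e[i]) { print; break }
      }
    }
  ' "$2" "$1"
}

# Chain the filter: rep1 clusters seen in rep2, then those also seen in rep3.
overlapping "${workdir}/rep1.bed" "${workdir}/rep2.bed" > "${workdir}/rep1_in_rep2.bed"
overlapping "${workdir}/rep1_in_rep2.bed" "${workdir}/rep3.bed" > "${workdir}/confident_peaks.bed"
cat "${workdir}/confident_peaks.bed"
```

With the toy data, chr1:500-600 is dropped because replicate 2 has no overlapping cluster. The nested loop is quadratic, which is fine for a sketch; for real data an interval tool such as bedtools intersect would be the usual choice.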
6
Subtract the Buffer-only control clusters from the STAMP confident clusters, which yields "*cleaned_confident_peaks.bed"
$ Bash example
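# Install bedtools if not already installed
# conda install -c bioconda bedtools

# Subtract regions present in buffer_control_peaks.bed from stamp_confident_clusters.bed
# (the "*confident_peaks.bed" file from the previous step).
# This yields 'cleaned_confident_peaks.bed', which contains regions unique to the STAMP clusters.
# By default only the overlapping portion of each cluster is removed; add -A to
# drop any cluster that overlaps a control peak at all.
bedtools subtract \
  -a stamp_confident_clusters.bed \
  -b buffer_control_peaks.bed \
  > cleaned_confident_peaks.bed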
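Whether the control should trim a cluster or remove it entirely is a design choice the source does not settle. The sketch below illustrates the stricter option on toy data: any STAMP cluster that overlaps a buffer-only cluster is removed whole, which mirrors the behavior of the -A option of bedtools subtract. File contents and coordinates are hypothetical, and awk is used only so the example runs without bedtools.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
# Toy inputs (hypothetical coordinates).
printf 'chr1\t100\t200\nchr1\t500\t600\nchr2\t50\t80\n' > "${workdir}/stamp_confident_clusters.bed"
printf 'chr1\t180\t220\n'                               > "${workdir}/buffer_control_peaks.bed"

# Load the control clusters, then print only STAMP clusters with no overlap
# (same chromosome, half-open BED intervals).
awk -F'\t' '
  FILENAME == ARGV[1] { n++; c[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; next }
  {
    hit = 0
    for (i = 1; i <= n; i++) {
      if (c[i] == $1 && s[i] < ($3 + 0) && ($2 + 0) < e[i]) { hit = 1; break }
    }
    if (!hit) print
  }
' "${workdir}/buffer_control_peaks.bed" "${workdir}/stamp_confident_clusters.bed" \
  > "${workdir}/cleaned_confident_peaks.bed"

cat "${workdir}/cleaned_confident_peaks.bed"
```

Here chr1:100-200 is removed outright because it overlaps the control at chr1:180-220; the default bedtools subtract behavior would instead keep the trimmed remainder chr1:100-180.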
Raw Source Text
remove adapter with Cutadapt
align to hg38 using STAR 2.4.0 (Homo sapiens)
SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%
FLARE analysis to call C-to-U edit clusters
Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"
Subtract STAMP confident clusters to Buffer only control, which yields "*cleaned_confident_peaks.bed"
Assembly: hg38
Supplementary files format and content: SAILOR step yields bed file: *0.5Score0.8Fraction.fastqTr.sorted.STARUnmapped.out.sorted.STARAligned.out.sorted.bam.combined.readfiltered.formatted.varfiltered.snpfiltered.ranked.bed
Supplementary files format and content: FLARE step yields .tsv file: "*merged_sorted_peaks.fdr_0.1.d_15.scored.tsv"
Supplementary files format and content: Intersection step yields .bed file: "*confident_peaks.bed"
Supplementary files format and content: Subtraction step yields .bed file: "*cleaned_confident_peaks.bed"