GSE263371 Processing Pipeline
6 steps
Publication
High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues. Nature Communications (2024). PMID 39152130
Dataset
GSE263371: An in situ method for identification of transcriptome-wide protein-RNA interactions in cells [isSTAMP]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Remove adapters with Cutadapt
$ Bash example
# Install Cutadapt (if not already installed)
# conda install -c bioconda cutadapt

# Define input and output file paths (placeholders)
INPUT_R1="input_R1.fastq.gz"            # forward reads
INPUT_R2="input_R2.fastq.gz"            # reverse reads (paired-end only)
OUTPUT_R1="output_R1_trimmed.fastq.gz"  # trimmed forward reads
OUTPUT_R2="output_R2_trimmed.fastq.gz"  # trimmed reverse reads (paired-end only)

# Define adapter sequences. Replace with the adapters used in your library preparation.
# Full-length Illumina TruSeq adapters:
# ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
# ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
# Cutadapt does not detect adapters automatically, so a sequence must always be supplied.
# The 13-nt prefix shared by both TruSeq adapters is usually sufficient for 3' trimming:
ADAPTER_FWD="AGATCGGAAGAGC"
ADAPTER_REV="AGATCGGAAGAGC"

# Run Cutadapt to remove adapter sequences, perform quality trimming, and filter by length.
# -a ADAPTER_FWD: 3' adapter for forward reads
# -A ADAPTER_REV: 3' adapter for reverse reads (paired-end data)
# -o: output file for forward reads
# -p: output file for reverse reads (paired-end data)
# -q 20,20: trim low-quality bases from the 5' and 3' ends (quality cutoff 20)
# --minimum-length 25: discard reads shorter than 25 bp after trimming
cutadapt -a "${ADAPTER_FWD}" -A "${ADAPTER_REV}" \
  -o "${OUTPUT_R1}" -p "${OUTPUT_R2}" \
  -q 20,20 --minimum-length 25 \
  "${INPUT_R1}" "${INPUT_R2}"

# For single-end reads, the command is simpler:
# cutadapt -a "${ADAPTER_FWD}" -o "${OUTPUT_R1}" -q 20 --minimum-length 25 "${INPUT_R1}"
2
Align to hg38 using STAR 2.4.0 (Homo sapiens) or to mm10 using STAR 2.5.2 (Mus musculus)
$ Bash example
# Install STAR and samtools (if not already installed)
# conda install -c bioconda star samtools

# --- Setup Reference Genome (hg38 example) ---
# Replace with the actual path to your STAR genome index directory for hg38.
# This directory should contain files like Genome, SA, SAindex, etc.
# If you need to build the index, use a command like:
# STAR --runMode genomeGenerate --genomeDir /path/to/hg38_star_index --genomeFastaFiles /path/to/hg38.fa --sjdbGTFfile /path/to/gencode.vXX.annotation.gtf --runThreadN <num_threads>
GENOME_DIR="/path/to/star_index/hg38"

# --- Input Files ---
# Replace with your actual input FASTQ files
READ1="input_R1.fastq.gz"
READ2="input_R2.fastq.gz"  # For paired-end reads. If single-end, remove READ2 and adjust --readFilesIn

# --- Output Prefix ---
OUTPUT_PREFIX="aligned_sample"

# --- Alignment Command ---
# The source pipeline used STAR 2.4.0 for hg38 and STAR 2.5.2 for mm10.
# For Mus musculus data, point GENOME_DIR at an mm10 genome index instead.
STAR \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ1}" "${READ2}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}_" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outSAMattributes Standard \
  --outFilterType BySJout \
  --outFilterMultimapNmax 20 \
  --outFilterMismatchNmax 999 \
  --outFilterMismatchNoverLmax 0.04 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --alignSJoverhangMin 8 \
  --alignSJDBoverhangMin 1 \
  --sjdbScore 1 \
  --runThreadN 8  # Adjust number of threads as needed

# Rename the output BAM file for clarity
mv "${OUTPUT_PREFIX}_Aligned.sortedByCoord.out.bam" "${OUTPUT_PREFIX}.bam"

# Index the BAM file
samtools index "${OUTPUT_PREFIX}.bam"
3
SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%
$ Bash example
# Install SAILOR (if not already installed)
# git clone https://github.com/yeolab/SAILOR.git
# cd SAILOR
# pip install -r requirements.txt

# The script name and flags below are inferred and have not been checked against
# the SAILOR repository; confirm the actual entry point and options before running.

# Placeholder variables for input and output files
INPUT_BAM="input.bam"
REFERENCE_FASTA="reference.fasta"     # e.g., hg38.fa
OUTPUT_BED="output_c_to_u_edits.bed"  # the deposited SAILOR output is a BED file

# Call C-to-U edits, keeping only sites with score > 0.5 and edit fraction < 80%.
python SAILOR.py \
  --bam "$INPUT_BAM" \
  --ref "$REFERENCE_FASTA" \
  --output "$OUTPUT_BED" \
  --min_score 0.5 \
  --max_edit_fraction 0.8 \
  --edit_type C_to_U
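The two thresholds in this step (score > 0.5, edit fraction < 80%) can also be applied as a plain text filter once per-site scores and edit fractions are available. The following is a minimal sketch, assuming a tab-separated file with the score in column 5 and the edit fraction in column 7. That column layout, the function name `filter_sites`, and the toy file are hypothetical and are not taken from the SAILOR output format.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: chrom, start, end, name, score, strand, edit_fraction.
# The real SAILOR output may order its columns differently; adjust as needed.
SCORE_COL=5
FRAC_COL=7

filter_sites() {
  # Keep rows with score > 0.5 and edit fraction < 0.8 (both strict).
  awk -F'\t' -v s="$SCORE_COL" -v f="$FRAC_COL" \
    '($s + 0) > 0.5 && ($f + 0) < 0.8' "$1"
}

# Toy input: only siteA passes both thresholds.
WORKDIR="$(mktemp -d)"
printf 'chr1\t100\t101\tsiteA\t0.90\t+\t0.25\n' >  "$WORKDIR/sites.bed"
printf 'chr1\t200\t201\tsiteB\t0.40\t+\t0.25\n' >> "$WORKDIR/sites.bed"
printf 'chr1\t300\t301\tsiteC\t0.95\t-\t0.85\n' >> "$WORKDIR/sites.bed"

filter_sites "$WORKDIR/sites.bed" > "$WORKDIR/sites.filtered.bed"
cat "$WORKDIR/sites.filtered.bed"
```

Keeping the column indices in variables makes it easy to repoint the filter once the actual column order has been checked.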
4
FLARE analysis to call C-to-U edit clusters
FLARE (version not specified; inferred with models/gemini-2.5-flash)
$ Bash example
# Clone FLARE repository
# git clone https://github.com/yeolab/FLARE.git
# cd FLARE

# Install dependencies (if not already installed in environment)
# pip install pysam numpy

# Define variables
INPUT_BAM="input.bam"                         # Replace with your aligned input BAM file
REFERENCE_GENOME="path/to/GRCh38.fa"          # Placeholder for the hg38 reference FASTA (e.g., from UCSC or Ensembl)
KNOWN_SNPS_VCF="path/to/dbSNP_GRCh38.vcf.gz"  # Placeholder for known SNPs for GRCh38 (e.g., from NCBI dbSNP)
OUTPUT_PREFIX="flare_output"
THREADS=8                                     # Number of threads

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_PREFIX}_results"

# Run FLARE analysis to call C-to-U edit clusters.
# The script name, flags, and thresholds below are inferred; verify them against the FLARE repository.
# The deposited FLARE output is named "*merged_sorted_peaks.fdr_0.1.d_15.scored.tsv".
python FLARE.py \
  -i "${INPUT_BAM}" \
  -g "${REFERENCE_GENOME}" \
  -s "${KNOWN_SNPS_VCF}" \
  -o "${OUTPUT_PREFIX}_results/${OUTPUT_PREFIX}" \
  -t "${THREADS}" \
  --min_coverage 10 \
  --min_base_quality 20 \
  --min_mapping_quality 20 \
  --min_edit_ratio 0.1  # Example parameters for C-to-U editing, adjust as needed
5
Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"
$ Bash example
# Clone the merge_peaks repository if not already available
# git clone https://github.com/yeolab/merge_peaks.git
# cd merge_peaks

# The script name and flags below are inferred; check the repository for the actual interface.
# The step is expected to yield "*confident_peaks.bed".
python merge_peaks.py \
  -i replicate1_edit_clusters.bed replicate2_edit_clusters.bed replicate3_edit_clusters.bed \
  -o confident_peaks
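To make the intersection itself concrete, here is a minimal sketch that keeps only the regions covered by all three replicates, using plain awk on toy data. It is an illustration of interval intersection, not the authors' implementation: the function `intersect_bed`, the file names, and the choice to report the overlapping portion (rather than whole clusters) are assumptions. The pairwise comparison is quadratic, so for real data a dedicated tool such as `bedtools intersect` is the usual choice.

```shell
#!/usr/bin/env bash
set -euo pipefail

intersect_bed() {
  # Print the overlapping portion of every pair of intervals (file $1 x file $2)
  # on the same chromosome. Coordinates are BED-style half-open intervals.
  awk -F'\t' -v OFS='\t' '
    FILENAME == ARGV[1] { n++; chrom[n] = $1; start[n] = $2 + 0; stop[n] = $3 + 0; next }
    {
      for (i = 1; i <= n; i++) {
        if (chrom[i] != $1) continue
        s = (start[i] > $2 + 0) ? start[i] : $2 + 0
        e = (stop[i]  < $3 + 0) ? stop[i]  : $3 + 0
        if (s < e) print $1, s, e
      }
    }' "$1" "$2"
}

# Toy replicates (hypothetical coordinates).
WORKDIR="$(mktemp -d)"
printf 'chr1\t100\t200\nchr2\t50\t80\n'    > "$WORKDIR/rep1.bed"
printf 'chr1\t150\t250\nchr2\t90\t120\n'   > "$WORKDIR/rep2.bed"
printf 'chr1\t120\t180\nchr1\t400\t500\n'  > "$WORKDIR/rep3.bed"

# Intersect replicates 1 and 2, then intersect the result with replicate 3.
intersect_bed "$WORKDIR/rep1.bed" "$WORKDIR/rep2.bed" > "$WORKDIR/rep12.bed"
intersect_bed "$WORKDIR/rep12.bed" "$WORKDIR/rep3.bed" > "$WORKDIR/confident_peaks.bed"
cat "$WORKDIR/confident_peaks.bed"
```

In the toy data only chr1:150-180 is covered by all three replicates, so that is the single region written to the output.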
6
Subtract the buffer-only control clusters from the STAMP confident clusters, which yields "*cleaned_confident_peaks.bed"
$ Bash example
# Install bedtools if not already available
# conda install -c bioconda bedtools

# Subtract regions in 'buffer_only_control.bed' from 'stamp_confident_clusters.bed'.
# By default, bedtools subtract trims only the overlapping portion of each STAMP cluster;
# add -A to drop any cluster that overlaps a control region at all.
# The source does not state which behaviour was used.
bedtools subtract -a stamp_confident_clusters.bed -b buffer_only_control.bed > cleaned_confident_peaks.bed
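If the intent is to discard a STAMP cluster entirely whenever it touches a control region (the behaviour of `bedtools subtract -A`), the logic can be sketched in plain awk as below. This is an illustration on toy data under that assumption; the function `drop_overlapping` and the file contents are hypothetical, and the quadratic loop is only suitable for small files.

```shell
#!/usr/bin/env bash
set -euo pipefail

drop_overlapping() {
  # Print rows of $1 (STAMP clusters) that overlap no row of $2 (control).
  # The control file is loaded first, then the STAMP file is streamed.
  awk -F'\t' '
    FILENAME == ARGV[1] { n++; chrom[n] = $1; start[n] = $2 + 0; stop[n] = $3 + 0; next }
    {
      keep = 1
      for (i = 1; i <= n; i++) {
        if (chrom[i] == $1 && start[i] < $3 + 0 && $2 + 0 < stop[i]) { keep = 0; break }
      }
      if (keep) print
    }' "$2" "$1"
}

# Toy inputs (hypothetical coordinates): peakA overlaps ctrl1 and is removed.
WORKDIR="$(mktemp -d)"
printf 'chr1\t100\t200\tpeakA\nchr1\t300\t400\tpeakB\nchr2\t10\t20\tpeakC\n' \
  > "$WORKDIR/stamp_confident_clusters.bed"
printf 'chr1\t150\t160\tctrl1\nchr3\t10\t20\tctrl2\n' \
  > "$WORKDIR/buffer_only_control.bed"

drop_overlapping "$WORKDIR/stamp_confident_clusters.bed" "$WORKDIR/buffer_only_control.bed" \
  > "$WORKDIR/cleaned_confident_peaks.bed"
cat "$WORKDIR/cleaned_confident_peaks.bed"
```

Unlike the default trimming behaviour, this keeps surviving clusters at their original coordinates, which can be easier to trace back to the replicate-level calls.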
Raw Source Text
remove adapter with Cutadapt
align to hg38 using STAR 2.4.0 (Homo sapiens) or mm10 using STAR 2.5.2 (Mus musculus)
SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%
FLARE analysis to call C-to-U edit clusters
Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"
Subtract STAMP confident clusters to Buffer only control, which yields "*cleaned_confident_peaks.bed"
Assembly: hg38/mm10
Supplementary files format and content: SAILOR step yields bed file: *0.5Score0.8Fraction.fastqTr.sorted.STARUnmapped.out.sorted.STARAligned.out.sorted.bam.combined.readfiltered.formatted.varfiltered.snpfiltered.ranked.bed
Supplementary files format and content: FLARE step yields .tsv file: "*merged_sorted_peaks.fdr_0.1.d_15.scored.tsv"
Supplementary files format and content: Intersection step yields .bed file: "*confident_peaks.bed"
Supplementary files format and content: Subtraction step yields .bed file: "*cleaned_confident_peaks.bed"