GSE240326 Processing Pipeline
Publication
High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues. Nature Communications (2024), PMID 39152130
Dataset
GSE240326: An in situ method for identification of transcriptome-wide protein-RNA interactions in cells
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Remove adapters with Cutadapt
$ Bash example
# Install Cutadapt (if not already installed)
# conda install -c bioconda cutadapt=4.1

# Define input and output files
# Replace with your actual input FASTQ files
INPUT_R1="path/to/your/input_read1.fastq.gz"
INPUT_R2="path/to/your/input_read2.fastq.gz"  # For paired-end reads. Remove if single-end.

# Replace with your desired output FASTQ files
OUTPUT_R1_TRIMMED="path/to/your/output_read1_trimmed.fastq.gz"
OUTPUT_R2_TRIMMED="path/to/your/output_read2_trimmed.fastq.gz"  # For paired-end reads. Remove if single-end.

# Define a report file for Cutadapt's summary
REPORT_FILE="cutadapt_trimming_report.txt"

# Define adapter sequences
# These are the common Illumina TruSeq adapters, used here only as placeholders.
# Replace them with the adapter sequences actually used in your library preparation.
# If you don't know them, consult your sequencing provider or library prep kit documentation.
# For single-end reads, typically only -a ADAPTER_R1 is needed.
# For paired-end reads, use -a ADAPTER_R1 for read 1 and -A ADAPTER_R2 for read 2.
ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # Example: TruSeq adapter found at the 3' end of read 1
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"  # Example: TruSeq adapter found at the 3' end of read 2

# Run Cutadapt for paired-end reads
# Adjust parameters such as --minimum-length, --quality-cutoff, and --cores as needed.
# If processing single-end reads, remove -A, -p, and INPUT_R2.
cutadapt \
    -a "${ADAPTER_R1}" \
    -A "${ADAPTER_R2}" \
    -o "${OUTPUT_R1_TRIMMED}" \
    -p "${OUTPUT_R2_TRIMMED}" \
    --minimum-length 18 \
    --quality-cutoff 20 \
    --cores 8 \
    "${INPUT_R1}" "${INPUT_R2}" > "${REPORT_FILE}" 2>&1

# For single-end reads, the command would look like this:
# cutadapt \
#     -a "${ADAPTER_R1}" \
#     -o "${OUTPUT_R1_TRIMMED}" \
#     --minimum-length 18 \
#     --quality-cutoff 20 \
#     --cores 8 \
#     "${INPUT_R1}" > "${REPORT_FILE}" 2>&1
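As a quick sanity check on the trimming step, the sketch below counts reads that are still shorter than the --minimum-length threshold. The helper name count_short_reads is hypothetical, and the sketch assumes standard four-line FASTQ records on standard input.

```shell
# Hypothetical helper: count reads shorter than a minimum length.
# Assumes standard 4-line FASTQ records read from standard input.
# $1 = minimum read length (e.g. 18, matching --minimum-length above)
count_short_reads() {
    awk -v min="$1" 'NR % 4 == 2 && length($0) < min { n++ } END { print n + 0 }'
}

# Example usage (the file name is a placeholder):
# gzip -dc path/to/your/output_read1_trimmed.fastq.gz | count_short_reads 18
```

A result of 0 suggests the length filter was applied as intended; any other value is worth checking against the Cutadapt report.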
2
Align to hg38 using STAR 2.4.0 (Homo sapiens) or to mm10 using STAR 2.5.2 (Mus musculus)
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star=2.4.0

# Define input and output variables
# Replace with actual paths and filenames
READ1="input_R1.fastq.gz"
READ2="input_R2.fastq.gz"  # Remove if single-end
OUTPUT_PREFIX="aligned_output"
NUM_THREADS=8  # Adjust as needed

# Define genome index paths
# Replace with actual paths to your STAR indices
# For Homo sapiens (hg38)
HG38_STAR_INDEX="/path/to/STAR_index/hg38"
# For Mus musculus (mm10)
MM10_STAR_INDEX="/path/to/STAR_index/mm10"

# --- Choose the appropriate genome index based on species ---
# For Homo sapiens (hg38):
GENOME_DIR="${HG38_STAR_INDEX}"
# For Mus musculus (mm10):
# GENOME_DIR="${MM10_STAR_INDEX}"

# Run STAR alignment
# --readFilesCommand zcat is required because the inputs are gzip-compressed
# (use "gunzip -c" on systems where zcat does not handle .gz files).
# --quantMode GeneCounts is optional: it needs an index built with a GTF annotation
# and may not be supported by older STAR releases. Remove it if gene counts are not needed.
STAR --genomeDir "${GENOME_DIR}" \
    --readFilesIn "${READ1}" "${READ2}" \
    --readFilesCommand zcat \
    --runThreadN "${NUM_THREADS}" \
    --outFileNamePrefix "${OUTPUT_PREFIX}_" \
    --outSAMtype BAM SortedByCoordinate \
    --outSAMstrandField intronMotif \
    --outFilterMultimapNmax 20 \
    --alignSJDBoverhangMin 1 \
    --alignSJoverhangMin 8 \
    --alignIntronMin 20 \
    --alignIntronMax 1000000 \
    --alignMatesGapMax 1000000 \
    --outReadsUnmapped Fastx \
    --quantMode GeneCounts
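After alignment, it can help to check the mapping rate before moving on to edit calling. The sketch below pulls the uniquely mapped percentage out of STAR's Log.final.out summary. The helper name unique_map_rate is hypothetical, and the example file name assumes the output prefix used in the snippet above.

```shell
# Hypothetical helper: print the "Uniquely mapped reads %" value
# (without the percent sign) from a STAR Log.final.out file.
# $1 = path to the Log.final.out file
unique_map_rate() {
    awk -F'|' '/Uniquely mapped reads %/ { gsub(/[ \t%]/, "", $2); print $2 }' "$1"
}

# Example usage (the file name is a placeholder based on the prefix above):
# unique_map_rate aligned_output_Log.final.out
```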
3
SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%
SAILOR v0.1.0
$ Bash example
# NOTE: The installation command, the "sailor call" subcommand, and the flags below are
# illustrative placeholders that have not been verified against SAILOR v0.1.0.
# Check the SAILOR documentation for the actual invocation before running.

# Install SAILOR (e.g., via conda; package name and version unverified)
# conda create -n sailor_env python=3.8
# conda activate sailor_env
# conda install -c bioconda sailor=0.1.0

# Define input and output files
# Replace 'aligned_reads.bam' with your actual input BAM file containing aligned RNA-seq reads.
INPUT_BAM="aligned_reads.bam"
# Replace with the path to your reference genome FASTA file (e.g., GRCh38).
REFERENCE_FASTA="path/to/human_genome/GRCh38.p13.genome.fa"
# Replace with the path to a VCF file of known SNPs for the reference genome (e.g., dbSNP for GRCh38).
KNOWN_SNPS_VCF="path/to/known_snps/dbSNP_153_GRCh38.vcf.gz"
# Define the output file for the filtered C-to-U editing sites.
OUTPUT_TSV="c_to_u_edits_filtered.tsv"

# Run SAILOR to call C-to-U edits and apply the filtering criteria from the step description.
# --min-score 0.5: keep sites with an editing score greater than 0.5.
# --max-edit-fraction 0.8: keep sites where the edit fraction is less than 80% (0.8).
# --fasta: reference genome FASTA file.
# --vcf: VCF file of known SNPs to exclude from editing calls.
sailor call \
    --min-score 0.5 \
    --max-edit-fraction 0.8 \
    --fasta "${REFERENCE_FASTA}" \
    --vcf "${KNOWN_SNPS_VCF}" \
    "${INPUT_BAM}" \
    > "${OUTPUT_TSV}"
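The filtering rule in this step (score >0.5 and edit fraction <80%) can also be applied as a post-processing pass over a table of called sites. The sketch below is a minimal illustration under stated assumptions: the helper name filter_edit_sites is hypothetical, the input is assumed to be tab-separated, and the columns holding the score and the edit fraction (as a value between 0 and 1) are passed in because the actual SAILOR output layout is not described here.

```shell
# Hypothetical helper: keep rows with score > 0.5 and edit fraction < 0.8.
# Reads tab-separated rows from standard input.
# $1 = 1-based column holding the score (assumption: depends on your file layout)
# $2 = 1-based column holding the edit fraction, 0 to 1 (assumption)
filter_edit_sites() {
    awk -F'\t' -v s="$1" -v f="$2" '($s + 0) > 0.5 && ($f + 0) < 0.8'
}

# Example usage (file name and column numbers are placeholders):
# filter_edit_sites 4 5 < called_sites.tsv > called_sites.filtered.tsv
```

Both comparisons are strict, so a site with a score of exactly 0.5 or an edit fraction of exactly 0.8 is dropped.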
4
FLARE analysis to call C-to-U edit clusters
$ Bash example
# NOTE: The "python FLARE.py" entry point and the -i/-o/-r/-g flags below are illustrative
# placeholders that have not been verified against the FLARE repository.
# Check the FLARE README for the actual invocation before running.

# Clone the FLARE repository
# git clone https://github.com/yeolab/FLARE.git
# cd FLARE

# Install dependencies (if not already installed)
# pip install -r requirements.txt

# Example usage of FLARE to call C-to-U edit clusters
# Replace the input BAM, output directory, reference FASTA, and gene annotation with actual paths.
# Reference datasets: GRCh38 is used as a placeholder for the human genome.
# Gene annotation: a GTF file for GRCh38 is used as a placeholder.

# Define placeholder variables
INPUT_BAM="path/to/your/aligned.bam"
OUTPUT_DIR="flare_c_to_u_edits"
REFERENCE_FASTA="path/to/GRCh38.fa"    # e.g., from Gencode or Ensembl
GENE_ANNOTATION="path/to/GRCh38.gtf"  # e.g., from Gencode or Ensembl

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Execute FLARE analysis
# The step description calls C-to-U edit clusters, which is assumed here to be FLARE's default behavior.
# Common parameters might include:
#   --min_reads 5         (minimum reads supporting an edit)
#   --min_edit_frac 0.1   (minimum fraction of reads supporting an edit)
#   --min_coverage 10     (minimum coverage at a site)
#   --min_base_qual 20    (minimum base quality)
#   --min_map_qual 20     (minimum mapping quality)
#   --blacklist           (path to a blacklist BED file)
#   --known_edits         (path to a VCF of known edits for filtering)
python FLARE.py \
    -i "${INPUT_BAM}" \
    -o "${OUTPUT_DIR}" \
    -r "${REFERENCE_FASTA}" \
    -g "${GENE_ANNOTATION}"
5
Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"
$ Bash example
# Install bedtools (if not already installed)
# conda install -c bioconda bedtools

# Assume the input edit cluster BED files are:
#   replicate1_edit_clusters.bed
#   replicate2_edit_clusters.bed
#   replicate3_edit_clusters.bed

# Intersect the edit clusters from the first two replicates
bedtools intersect -a replicate1_edit_clusters.bed -b replicate2_edit_clusters.bed > temp_intersect_1_2.bed

# Intersect the result with the third replicate to find regions common to all three
bedtools intersect -a temp_intersect_1_2.bed -b replicate3_edit_clusters.bed > confident_peaks.bed

# Clean up temporary file
rm temp_intersect_1_2.bed
6
Subtract the Buffer-only control clusters from the STAMP confident clusters, which yields "*cleaned_confident_peaks.bed"
bedtools v2.30.0 (tool and version inferred with models/gemini-2.5-flash)
$ Bash example
# Install bedtools (if not already installed)
# conda install -c bioconda bedtools

# Subtract Buffer-only control regions from STAMP confident clusters.
# This yields regions that are present in STAMP clusters but not in the control.
# Assumes 'stamp_confident_clusters.bed' contains the STAMP confident clusters
# and 'buffer_only_control.bed' contains the Buffer-only control regions.
# By default, bedtools subtract removes only the overlapping portion of each cluster;
# add -A to drop any cluster that overlaps the control at all. Which behavior the
# original analysis used is not stated.
bedtools subtract -a stamp_confident_clusters.bed -b buffer_only_control.bed > cleaned_confident_peaks.bed
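To confirm that the subtraction behaved as expected, the sketch below counts how many intervals in one BED file still overlap any interval in another, using only awk. The helper name count_overlapping is hypothetical. It assumes tab-separated BED files with half-open coordinates, and it uses a brute-force comparison that is only practical for small files.

```shell
# Hypothetical helper: count intervals in a peaks BED file that overlap
# at least one interval in a control BED file (half-open BED coordinates).
# $1 = peaks BED, $2 = control BED
count_overlapping() {
    awk -F'\t' '
        # First file on the command line (the control): store every interval.
        FILENAME == ARGV[1] { n++; chrom[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; next }
        # Second file (the peaks): test each interval against all control intervals.
        {
            for (i = 1; i <= n; i++)
                if ($1 == chrom[i] && ($2 + 0) < e[i] && s[i] < ($3 + 0)) { hits++; break }
        }
        END { print hits + 0 }
    ' "$2" "$1"
}

# Example usage (file names are placeholders):
# count_overlapping cleaned_confident_peaks.bed buffer_only_control.bed
```

A count of 0 means no cleaned peak overlaps the control. Intervals that merely touch (one ends where the other starts) are not counted as overlapping.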
Raw Source Text
remove adapter with Cutadapt
align to hg38 using STAR 2.4.0 (Homo sapiens) or mm10 using STAR 2.5.2 (Mus musculus)
SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%
FLARE analysis to call C-to-U edit clusters
Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"
Subtract STAMP confident clusters to Buffer only control, which yields "*cleaned_confident_peaks.bed"
Assembly: hg38
Assembly: mm10
Supplementary files format and content: SAILOR step yields bed file: *0.5Score0.8Fraction.fastqTr.sorted.STARUnmapped.out.sorted.STARAligned.out.sorted.bam.combined.readfiltered.formatted.varfiltered.snpfiltered.ranked.bed
Supplementary files format and content: FLARE step yields .tsv file: "*merged_sorted_peaks.fdr_0.1.d_15.scored.tsv"
Supplementary files format and content: Intersection step yields .bed file: "*confident_peaks.bed"
Supplementary files format and content: Subtraction step yields .bed file: "*cleaned_confident_peaks.bed"