GSE263371 Processing Pipeline

Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature Communications (2024), PMID 39152130

Dataset

GSE263371

An in situ method for identification of transcriptome-wide protein-RNA interactions in cells [isSTAMP]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    remove adapter with Cutadapt

    cutadapt v4.4
    $ Bash example
    # Install Cutadapt (if not already installed)
    # conda install -c bioconda cutadapt
    
    # Define input and output file paths
    INPUT_R1="input_R1.fastq.gz" # Placeholder for your forward read input file
    INPUT_R2="input_R2.fastq.gz" # Placeholder for your reverse read input file (if paired-end)
    OUTPUT_R1="output_R1_trimmed.fastq.gz" # Placeholder for your forward read output file
    OUTPUT_R2="output_R2_trimmed.fastq.gz" # Placeholder for your reverse read output file (if paired-end)
    
    # Define adapter sequences. Replace with actual adapter sequences used in your library preparation.
    # Common Illumina adapters (partial sequences often sufficient):
    # ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
    # ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
    # Cutadapt does not auto-detect adapters. If the exact sequence is unknown,
    # supply a short, conserved prefix of the adapter instead.
    # For example, a generic Illumina adapter sequence for 3' trimming:
    ADAPTER_FWD="AGATCGGAAGAGC"
    ADAPTER_REV="AGATCGGAAGAGC"
    
    # Run Cutadapt to remove adapter sequences, perform quality trimming, and filter by length.
    # -a ADAPTER_FWD: 3' adapter for forward reads
    # -A ADAPTER_REV: 3' adapter for reverse reads (for paired-end data)
    # -o: Output file for forward reads
    # -p: Output file for reverse reads (for paired-end data)
    # -q 20,20: Trim low-quality bases from 5' and 3' ends (quality cutoff 20)
    # --minimum-length 25: Discard reads shorter than 25 bp after trimming
    
    cutadapt -a "${ADAPTER_FWD}" -A "${ADAPTER_REV}" \
             -o "${OUTPUT_R1}" -p "${OUTPUT_R2}" \
             -q 20,20 --minimum-length 25 \
             "${INPUT_R1}" "${INPUT_R2}"
    
    # For single-end reads, the command would be simpler:
    # cutadapt -a ${ADAPTER_FWD} -o ${OUTPUT_R1} -q 20 --minimum-length 25 ${INPUT_R1}
  2.

    align to hg38 using STAR 2.4.0 (Homo sapiens) or mm10 using STAR 2.5.2 (Mus musculus)

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # --- Setup Reference Genome (hg38 example) ---
    # Replace with the actual path to your STAR genome index directory for hg38.
    # This directory should contain files like Genome, SA, SAindex, etc.
    # If you need to build the index, use a command like:
    # STAR --runMode genomeGenerate --genomeDir /path/to/hg38_star_index --genomeFastaFiles /path/to/hg38.fa --sjdbGTFfile /path/to/gencode.vXX.annotation.gtf --runThreadN <num_threads>
    GENOME_DIR="/path/to/star_index/hg38"
    
    # --- Input Files ---
    # Replace with your actual input FASTQ files (typically the adapter-trimmed reads from step 1)
    READ1="output_R1_trimmed.fastq.gz"
    READ2="output_R2_trimmed.fastq.gz" # For paired-end reads. If single-end, remove READ2 and adjust --readFilesIn
    
    # --- Output Prefix ---
    OUTPUT_PREFIX="aligned_sample"
    
    # --- Alignment Command ---
    # This command aligns reads to hg38 using STAR 2.4.0
    # For Mus musculus (mm10) alignment, you would typically use STAR 2.5.2 or later
    # and point to an mm10 genome index.
    STAR \
      --genomeDir "${GENOME_DIR}" \
      --readFilesIn "${READ1}" "${READ2}" \
      --readFilesCommand zcat \
      --outFileNamePrefix "${OUTPUT_PREFIX}_" \
      --outSAMtype BAM SortedByCoordinate \
      --outSAMunmapped Within \
      --outSAMattributes Standard \
      --outFilterType BySJout \
      --outFilterMultimapNmax 20 \
      --outFilterMismatchNmax 999 \
      --outFilterMismatchNoverLmax 0.04 \
      --alignIntronMin 20 \
      --alignIntronMax 1000000 \
      --alignMatesGapMax 1000000 \
      --alignSJoverhangMin 8 \
      --alignSJDBoverhangMin 1 \
      --sjdbScore 1 \
      --runThreadN 8 # Adjust number of threads as needed
    
    # Rename the output BAM file for clarity
    mv "${OUTPUT_PREFIX}_Aligned.sortedByCoord.out.bam" "${OUTPUT_PREFIX}.bam"
    
    # Index the BAM file
    samtools index "${OUTPUT_PREFIX}.bam"
    
  3.

    SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%

    SAILOR (version not specified)
    $ Bash example
    # Install SAILOR (if not already installed)
    # git clone https://github.com/yeolab/SAILOR.git
    # cd SAILOR
    # pip install -r requirements.txt
    # # Ensure SAILOR.py is in your PATH or call it directly
    # # For example, if you are in the SAILOR directory:
    # # python SAILOR.py ...
    
    # Placeholder variables for input and output files
    INPUT_BAM="input.bam"
    REFERENCE_FASTA="reference.fasta" # e.g., hg38.fa
    OUTPUT_VCF="output_c_to_u_edits.vcf"
    
    # Run SAILOR to call C-to-U edits with the filters from the step description
    # (score > 0.5, edit fraction < 80%).
    # NOTE: the script name and flags below are illustrative and unverified;
    # check the SAILOR repository for the actual entry point and option names.
    python SAILOR.py \
        --bam "$INPUT_BAM" \
        --ref "$REFERENCE_FASTA" \
        --output "$OUTPUT_VCF" \
        --min_score 0.5 \
        --max_edit_fraction 0.8 \
        --edit_type C_to_U
  4.

    FLARE analysis to call C-to-U edit clusters

    FLARE (version not specified; inferred with models/gemini-2.5-flash)
    $ Bash example
    # Clone FLARE repository
    # git clone https://github.com/yeolab/FLARE.git
    # cd FLARE
    
    # Install dependencies (if not already installed in environment)
    # pip install pysam numpy
    
    # Define variables
    INPUT_BAM="input.bam" # Replace with your input BAM file, e.g., the aligned STAMP reads from step 2
    REFERENCE_GENOME="path/to/GRCh38.fa" # Placeholder for human hg38 reference genome (e.g., from UCSC or Ensembl)
    KNOWN_SNPS_VCF="path/to/dbSNP_GRCh38.vcf.gz" # Placeholder for known SNPs VCF for GRCh38 (e.g., from NCBI dbSNP)
    OUTPUT_PREFIX="flare_output"
    THREADS=8 # Number of threads
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_PREFIX}_results"
    
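    # Run FLARE analysis to call C-to-U edit clusters.
    # NOTE: the script name and flags below are illustrative and unverified;
    # check the FLARE repository for the actual entry point and option names.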
    python FLARE.py \
        -i "${INPUT_BAM}" \
        -g "${REFERENCE_GENOME}" \
        -s "${KNOWN_SNPS_VCF}" \
        -o "${OUTPUT_PREFIX}_results/${OUTPUT_PREFIX}" \
        -t "${THREADS}" \
        --min_coverage 10 \
        --min_base_quality 20 \
        --min_mapping_quality 20 \
        --min_edit_ratio 0.1 # Example parameters for C-to-U editing, adjust as needed
    
  5.

    Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"

    merge_peaks (version not available; inferred with models/gemini-2.5-flash)
    $ Bash example
    # Clone the merge_peaks repository if not already available
    # git clone https://github.com/yeolab/merge_peaks.git
    # cd merge_peaks
    
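    # Assuming the merge_peaks.py script is accessible in the current directory or PATH.
    # NOTE: the script name and arguments below are illustrative and unverified;
    # check the merge_peaks repository for the actual interface.
    python merge_peaks.py -i replicate1_edit_clusters.bed replicate2_edit_clusters.bed replicate3_edit_clusters.bed -o confident_peaks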
  6.

    Subtract the Buffer-only control clusters from the STAMP confident clusters, which yields "*cleaned_confident_peaks.bed"

    bedtools v2.29.2 (inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install bedtools if not already available
    # conda install -c bioconda bedtools
    
    # Subtract regions in 'buffer_only_control.bed' from 'stamp_confident_clusters.bed'.
    # By default, bedtools subtract removes only the overlapping portion of each STAMP cluster;
    # add -A to discard any cluster that overlaps a control region at all.
    bedtools subtract -a stamp_confident_clusters.bed -b buffer_only_control.bed > cleaned_confident_peaks.bed
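
The score and edit-fraction filter described in step 3 (score > 0.5, edit fraction < 80%) can be sketched as a small awk filter over a SAILOR BED file. This is a minimal sketch, not the authors' code. It assumes that column 4 holds the confidence score and column 5 holds "edited_reads,total_coverage"; that layout should be checked against your own files. The function name filter_sailor_sites is hypothetical.

```shell
# filter_sailor_sites BED [MIN_SCORE] [MAX_FRACTION]
# Print sites with score > MIN_SCORE (default 0.5) and
# edit fraction < MAX_FRACTION (default 0.8).
# ASSUMED column layout (verify against your own SAILOR output):
#   $4 = confidence score, $5 = "edited_reads,total_coverage"
filter_sailor_sites() {
  awk -v min_score="${2:-0.5}" -v max_frac="${3:-0.8}" '
    BEGIN { FS = OFS = "\t" }
    {
      split($5, counts, ",")
      if (counts[2] + 0 == 0) next          # skip sites without coverage
      frac = counts[1] / counts[2]          # edited reads / total reads
      if ($4 + 0 > min_score + 0 && frac < max_frac + 0) print
    }' "$1"
}

# Example call (file names are placeholders):
# filter_sailor_sites sample.ranked.bed > sample.filtered.bed
```

Both thresholds are strict inequalities, matching the wording of the step, so a site with a score of exactly 0.5 is dropped.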

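The replicate intersection in step 5 can be illustrated without any external tool. The sketch below keeps every cluster from replicate 1 that overlaps at least one cluster in replicate 2 and at least one in replicate 3. It is an assumption about what "intersect" means here, not the authors' implementation: it ignores strand, expects tab-separated BED input, and scales quadratically, so it suits small files only. The function names keep_overlapping and confident_peaks are hypothetical.

```shell
# keep_overlapping A B: print intervals of BED file A that overlap (by >= 1 bp)
# at least one interval on the same chromosome in BED file B. Strand is ignored.
keep_overlapping() {
  awk 'BEGIN { FS = OFS = "\t" }
       # First file on the command line (B): store intervals per chromosome.
       NR == FNR { n[$1]++; s[$1, n[$1]] = $2 + 0; e[$1, n[$1]] = $3 + 0; next }
       # Second file (A): print the interval on its first overlap with B.
       {
         for (i = 1; i <= n[$1]; i++)
           if ($2 + 0 < e[$1, i] && $3 + 0 > s[$1, i]) { print; break }
       }' "$2" "$1"
}

# confident_peaks REP1 REP2 REP3: clusters from REP1 supported by REP2 and REP3.
# The body runs in a subshell so the temporary variable does not leak.
confident_peaks() (
  tmp=$(mktemp) || exit 1
  keep_overlapping "$1" "$2" > "$tmp"
  keep_overlapping "$tmp" "$3"
  rm -f "$tmp"
)

# Example call (file names are placeholders):
# confident_peaks rep1_clusters.bed rep2_clusters.bed rep3_clusters.bed > confident_peaks.bed
```

The output keeps the coordinates of replicate 1, so the result depends on which replicate is passed first. For production-sized data, an interval tool such as bedtools is the more practical choice.
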
Raw Source Text
remove adapter with Cutadapt
align to hg38 using STAR 2.4.0 (Homo sapiens) or mm10 using STAR 2.5.2 (Mus musculus)
SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%
FLARE analysis to call C-to-U edit clusters
Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"
Subtract STAMP confident clusters to Buffer only control, which yields "*cleaned_confident_peaks.bed"
Assembly: hg38/mm10
Supplementary files format and content: SAILOR step yields bed file: *0.5Score0.8Fraction.fastqTr.sorted.STARUnmapped.out.sorted.STARAligned.out.sorted.bam.combined.readfiltered.formatted.varfiltered.snpfiltered.ranked.bed
Supplementary files format and content: FLARE step yields .tsv file: "*merged_sorted_peaks.fdr_0.1.d_15.scored.tsv"
Supplementary files format and content: Intersection step yields .bed file:  "*confident_peaks.bed"
Supplementary files format and content: Subtraction step yields .bed file:  "*cleaned_confident_peaks.bed"