GSE232515 Processing Pipeline

RNA-Seq, 4 steps with code examples

Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature communications (2024) — PMID 39152130

Dataset

GSE232515

Expanded repertoire for RNA-editing-based detection for RNA binding protein interactions (3)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    remove adapter with Cutadapt

    cutadapt v4.0 (inferred from the yeolab/skipper workflow on GitHub)
    $ Bash example
    # Install cutadapt (example using conda)
    # conda install -c bioconda cutadapt
    
    # Define input and output files
    INPUT_FASTQ="path/to/your/input_reads.fastq.gz"
    OUTPUT_FASTQ="path/to/your/trimmed_reads.fastq.gz"
    
    # Define adapter sequence(s)
    # The adapter sequence is specific to the library preparation kit; confirm it for your libraries.
    # Example: Illumina TruSeq RNA 3' Adapter
    ADAPTER_3PRIME="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
    # If a 5' adapter is also present, define it here:
    # ADAPTER_5PRIME="YOUR_5_PRIME_ADAPTER_SEQUENCE"
    
    # Define trimming parameters
    MIN_LENGTH=18 # Minimum read length after trimming
    QUALITY_CUTOFF=20 # Quality cutoff for trimming low-quality bases from 3' end
    
    # Run cutadapt to remove 3' adapter
    # -a: 3' adapter sequence (for single-end or 3' end of forward read in paired-end)
    # -o: output file for trimmed reads
    # --minimum-length: discard reads shorter than MIN_LENGTH after trimming
    # --quality-cutoff: trim low-quality bases from the 3' end based on QUALITY_CUTOFF
    cutadapt -a "${ADAPTER_3PRIME}" \
             -o "${OUTPUT_FASTQ}" \
             --minimum-length "${MIN_LENGTH}" \
             --quality-cutoff "${QUALITY_CUTOFF}" \
             "${INPUT_FASTQ}"
    
    # For paired-end reads, the command would look like this:
    # INPUT_R1="path/to/your/input_R1.fastq.gz"
    # INPUT_R2="path/to/your/input_R2.fastq.gz"
    # OUTPUT_R1="path/to/your/trimmed_R1.fastq.gz"
    # OUTPUT_R2="path/to/your/trimmed_R2.fastq.gz"
    # ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" # Adapter for R1
    # ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" # Adapter for R2 (often reverse complement or different)
    # cutadapt -a "${ADAPTER_FWD}" -A "${ADAPTER_REV}" \
    #          -o "${OUTPUT_R1}" -p "${OUTPUT_R2}" \
    #          --minimum-length "${MIN_LENGTH}" \
    #          --quality-cutoff "${QUALITY_CUTOFF}" \
    #          "${INPUT_R1}" "${INPUT_R2}"
  2.

    align to human genome using STAR 2.7.6a

    $ Bash example
    # Define variables
    FASTQ_R1="sample_R1.fastq.gz"
    FASTQ_R2="sample_R2.fastq.gz" # Remove if single-end
    GENOME_DIR="/path/to/STAR_human_GRCh38_index" # Replace with actual path to STAR genome index (e.g., GRCh38/hg38)
    OUTPUT_PREFIX="sample_aligned_"
    THREADS=8 # Adjust as needed
    
    # Installation (example, uncomment and modify if needed)
    # conda install -c bioconda star=2.7.6a
    
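    # Run STAR alignment (paired-end shown; pass only FASTQ_R1 to --readFilesIn for single-end)
    # Note: --quantMode GeneCounts needs gene annotations (index built with a GTF, or --sjdbGTFfile)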
    STAR \
      --genomeDir "${GENOME_DIR}" \
      --readFilesIn "${FASTQ_R1}" "${FASTQ_R2}" \
      --runThreadN "${THREADS}" \
      --outFileNamePrefix "${OUTPUT_PREFIX}" \
      --outSAMtype BAM SortedByCoordinate \
      --readFilesCommand zcat \
      --outSAMattributes Standard \
      --quantMode GeneCounts \
      --twopassMode Basic
  3.

    SAILOR analysis of data for C-to-U edits

    SAILOR (version not specified)
    $ Bash example
    # Install Miniconda or Anaconda if not already installed
    # conda create -n sailor_env python=2.7 pysam numpy scipy
    # conda activate sailor_env
    # SAILOR is maintained by the Yeo lab; verify the repository URL and install method before use
    # git clone https://github.com/YeoLab/sailor.git
    # cd sailor
    
    # Define input and output paths
    INPUT_BAM="input.bam" # Placeholder: Path to your aligned RNA-seq BAM file
    OUTPUT_PREFIX="sailor_output" # Prefix for output files
    SAMPLE_NAME="sample1" # A name for the sample
    
    # Define reference files (using latest human assembly as placeholder if not specified)
    REFERENCE_FASTA="GRCh38.p14.genome.fa" # Placeholder: Path to reference genome FASTA
    GENE_ANNOTATION_GTF="gencode.v45.annotation.gtf" # Placeholder: Path to gene annotation GTF
    
    # Execute SAILOR for C-to-U edit detection
    # Assuming SAILOR.py is in the current directory or in PATH
    # NOTE: the script name, flags, and values below are illustrative and were not
    # confirmed against SAILOR's documentation; the source specifies no parameters.
    # -c: Minimum coverage
    # -q: Minimum base quality
    # -m: Minimum mapping quality
    # -e: Minimum editing ratio
    # -d: Minimum depth for editing site
    python SAILOR.py \
        -i "${INPUT_BAM}" \
        -o "${OUTPUT_PREFIX}" \
        -r "${REFERENCE_FASTA}" \
        -g "${GENE_ANNOTATION_GTF}" \
        -s "${SAMPLE_NAME}" \
        -c 10 \
        -q 20 \
        -m 20 \
        -e 0.1 \
        -d 5
  4.

    SAILOR analysis of data for A-to-I edits

    SAILOR v1.0.0
    $ Bash example
    # Install SAILOR (if not already installed)
    # NOTE: the pip package name below is unverified; follow SAILOR's own installation instructions
    # pip install SAILOR
    
    # Define input and output paths
    INPUT_BAM="path/to/your/aligned_reads.bam" # Replace with your input BAM file
    REFERENCE_FASTA="path/to/your/reference_genome.fa" # e.g., hg38.fa, replace with your reference genome FASTA
    OUTPUT_DIR="sailor_analysis_output"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
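    # Run SAILOR for A-to-I RNA editing detection
    # NOTE: this command line is a placeholder; the executable name and flags
    # (including options such as --min-coverage, --min-base-quality, --min-mapping-quality)
    # are unverified, so consult SAILOR's documentation for the actual invocation.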
    SAILOR -i "${INPUT_BAM}" -r "${REFERENCE_FASTA}" -o "${OUTPUT_DIR}"
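
The four steps above pass files from one tool to the next, so it can help to check prerequisites and derive the STAR output name before starting the SAILOR steps. The following is a minimal sketch, not part of the published pipeline: the function names `missing_tools` and `star_bam_path` are hypothetical. The one fact it relies on is that STAR, when run with `--outSAMtype BAM SortedByCoordinate`, writes its alignments to `<prefix>Aligned.sortedByCoord.out.bam`.

```shell
# Sketch only: helper functions for wiring the steps together.
# Function names are hypothetical and not taken from the source pipeline.

# Print the name of every requested tool that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "${tool}" >/dev/null 2>&1 || printf '%s\n' "${tool}"
  done
}

# Print the coordinate-sorted BAM path that STAR produces for a given
# --outFileNamePrefix (with --outSAMtype BAM SortedByCoordinate).
star_bam_path() {
  printf '%s\n' "${1}Aligned.sortedByCoord.out.bam"
}
```

For example, `missing_tools cutadapt STAR` prints nothing when both programs are installed, and `INPUT_BAM="$(star_bam_path "${OUTPUT_PREFIX}")"` sets the SAILOR input to the BAM written in step 2.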

Raw Source Text
remove adapter with Cutadapt
align to human genome using STAR 2.7.6a
SAILOR analysis of data for C-to-U edits
SAILOR analysis of data for A-to-I edits
Assembly: GRCh38
Supplementary files format and content: cleaned = RBFOX2-rBE data from which the rBE-only edit clusters present in all three replicates were subtracted
Supplementary files format and content: ai = A-to-I edits
Supplementary files format and content: ct = C-to-U edits
Supplementary files format and content: both = both A-to-I and C-to-U edits considered simultaneously
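
The labels listed above can be kept in a small lookup when annotating downloaded supplementary files. This is a minimal sketch: the function name `describe_label` is hypothetical, and because the source does not state how the labels appear in the actual file names, extracting the label from a file name is left to the reader.

```shell
# Sketch only: map a supplementary-file label to the meaning given in the
# dataset description. The function name is hypothetical.
describe_label() {
  case "$1" in
    cleaned) printf '%s\n' "RBFOX2-rBE data with rBE-only edit clusters subtracted" ;;
    ai)      printf '%s\n' "A-to-I edits" ;;
    ct)      printf '%s\n' "C-to-U edits" ;;
    both)    printf '%s\n' "A-to-I and C-to-U edits considered simultaneously" ;;
    *)       printf 'unknown label: %s\n' "$1" >&2; return 1 ;;
  esac
}
```

Unrecognized labels return a non-zero status, so a calling script can stop instead of silently mislabeling a file.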