GSE232520 Processing Pipeline

Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature Communications (2024), PMID 39152130

Dataset

GSE232520

Expanded palette of RNA base editors for comprehensive RBP-RNA interactome studies

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Remove adapters with Cutadapt

    Cutadapt v4.0
    Bash example:
    # Install cutadapt (if not already installed)
    # conda install -c bioconda cutadapt
    
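    # Example for paired-end reads. The source record does not state the library layout;
    # for single-end data, drop -A, -p, and the second input file.
    # Replace 'input_R1.fastq.gz' and 'input_R2.fastq.gz' with your actual input files.
    # The adapters below are the standard Illumina TruSeq sequences; replace them if your library differs.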
    # -a: Adapter sequence for read 1
    # -A: Adapter sequence for read 2
    # -o: Output file for trimmed read 1
    # -p: Output file for trimmed read 2
    # -m: Minimum read length after trimming (e.g., 18 bp)
    # -q: Trim low-quality ends from reads (e.g., quality score 20)
    # --cores: Number of CPU cores to use
    # --report=full: Print a full report of the trimming process
    
    cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
             -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
             -o trimmed_R1.fastq.gz \
             -p trimmed_R2.fastq.gz \
             -m 18 \
             -q 20 \
             --cores=4 \
             --report=full \
             input_R1.fastq.gz \
             input_R2.fastq.gz
  2. Align to the genome using Bowtie2 or BWA-MEM

    Bowtie2 v2.5.2 (Inferred with models/gemini-2.5-flash)
    Bash example:
    # Install Bowtie2 and Samtools (if not already installed)
    # conda install -c bioconda bowtie2 samtools
    
    # Define variables
    # Replace 'path/to/GRCh38_index' with the basename of your pre-built Bowtie2 index.
    # GRCh38 is an assumption here: the source record lists the assembly as the 12X MS2 stem-loop
    # reporter mRNA, so an index built from the reporter sequence may be what you need.
    # You can build an index using `bowtie2-build <reference.fa> <index_basename>`.
    GENOME_INDEX="path/to/GRCh38_index"
    READ1="trimmed_R1.fastq.gz" # Trimmed R1 output of the Cutadapt step
    READ2="trimmed_R2.fastq.gz" # Trimmed R2 output of the Cutadapt step
    OUTPUT_SAM="sample_aligned.sam"
    OUTPUT_BAM="sample_aligned.bam"
    SORTED_BAM="sample_aligned.sorted.bam"
    
    # Align paired-end reads to the genome using Bowtie2
    # -x: specify the genome index basename
    # -1: specify the first mate reads file
    # -2: specify the second mate reads file
    # -S: specify the output SAM file
    # --threads: number of threads to use (adjust based on available CPU cores)
    bowtie2 -x "${GENOME_INDEX}" -1 "${READ1}" -2 "${READ2}" -S "${OUTPUT_SAM}" --threads 8
    
    # Convert SAM to BAM format and sort the BAM file by coordinate
    # samtools view: convert SAM to BAM (-b) and write to file (-o); the input format is auto-detected
    # samtools sort: sort BAM file by coordinate and write to file (-o)
    # -@: number of additional threads for samtools
    # Options are placed before the input file so the commands do not rely on argument reordering.
    samtools view -@ 8 -b -o "${OUTPUT_BAM}" "${OUTPUT_SAM}"
    samtools sort -@ 8 -o "${SORTED_BAM}" "${OUTPUT_BAM}"
    
    # Index the sorted BAM file (optional, but highly recommended for downstream tools like IGV or peak callers)
    samtools index "${SORTED_BAM}"
    
    # Clean up intermediate files (optional)
    # rm "${OUTPUT_SAM}" "${OUTPUT_BAM}"
  3. Generate count tables along the reporter sequence using Pysamstats

    Pysamstats v1.2.0 (Inferred with models/gemini-2.5-flash)
    Bash example:
    # Install Pysamstats (if not already installed)
    # pip install pysamstats
    # or
    # conda install -c bioconda pysamstats
    
    # Generate count tables along the reporter sequence
    # This command assumes a coordinate-sorted, indexed BAM file and the reference FASTA
    # that the reads were aligned to.
    # --type variation reports, for each position, the reference base and the number of reads
    # supporting A, C, T, and G (plus matches, mismatches, deletions, insertions); it requires --fasta.
    # Choosing "variation" is an assumption based on the description of the deposited tables
    # (read counts for RNA bases at each position); use --type coverage for plain read depth.
    # --chromosome restricts output to one reference sequence. 'reporter' is a placeholder and
    # must match the sequence name in the FASTA and BAM header.
    pysamstats --type variation \
               --chromosome reporter \
               --fasta reference.fa \
               sample_aligned.sorted.bam > output_reporter_counts.tsv
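
The three steps above can be chained so that each one consumes the previous step's output. The following is a minimal sketch, not the authors' script. The helper names `run` and `plan_pipeline`, the `DRY_RUN` switch, the sample name `sampleA`, the index basename `reporter_index`, the FASTA name `reporter.fa`, and all derived file names are hypothetical, and the use of `--type variation` is an assumption. By default the script only prints the commands, so the plan can be inspected without any of the tools installed.

```shell
#!/usr/bin/env bash
# Sketch: chain trimming, alignment, and per-position counting for one sample.
# All names below are illustrative assumptions, not taken from the GEO record.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"           # 1 = print commands only, 0 = also execute them
INDEX="${INDEX:-reporter_index}"  # hypothetical Bowtie2 index basename
FASTA="${FASTA:-reporter.fa}"     # hypothetical FASTA the index was built from
ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Print a command, then run it unless DRY_RUN=1.
run() {
    printf '%s\n' "$*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Emit (or execute) the commands for one sample, given its file-name prefix.
plan_pipeline() {
    local s="$1"
    run cutadapt -a "$ADAPTER_R1" -A "$ADAPTER_R2" -m 18 -q 20 \
        -o "${s}.trimmed_R1.fastq.gz" -p "${s}.trimmed_R2.fastq.gz" \
        "${s}_R1.fastq.gz" "${s}_R2.fastq.gz"
    run bowtie2 -x "$INDEX" -1 "${s}.trimmed_R1.fastq.gz" \
        -2 "${s}.trimmed_R2.fastq.gz" -S "${s}.sam"
    run samtools sort -o "${s}.sorted.bam" "${s}.sam"
    run samtools index "${s}.sorted.bam"
    run pysamstats --type variation --fasta "$FASTA" \
        --output "${s}.variation.tsv" "${s}.sorted.bam"
}

plan_pipeline sampleA
```

Running the script with `DRY_RUN=0` executes each command after printing it. Because `set -e` is active, a failing step stops the chain before later steps run on incomplete files.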

Raw Source Text
remove adapter with Cutadapt
align to genome using bowtie2 or bwa-mem
generate count tables along reporter sequence using Pysamstats
Assembly: 12X MS2 stem-loop reporter mRNA
Supplementary files format and content: pysamstats tables that contain the read counts for RNA bases at each position along the reporter mRNA sequence
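
To illustrate how a per-position table of this kind might be read, here is a small sketch that extracts the A, C, G, and T read counts by header name. It assumes a tab-separated table with pysamstats-style column names (`chrom`, `pos`, `ref`, `A`, `C`, `T`, `G`). The function name `base_counts`, the contig name `reporter`, and the counts in the toy table are made up for illustration, and the toy header is a simplified subset of what pysamstats writes.

```shell
#!/usr/bin/env bash
# Sketch: pull per-position base counts out of a pysamstats-style table.
# Columns are located by header name, so extra columns (e.g. *_pp) are ignored.
set -euo pipefail

# Print chrom, pos, ref and the A/C/G/T read counts from a tab-separated table.
base_counts() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 {
            # Map each header name to its column index.
            for (i = 1; i <= NF; i++) col[$i] = i
            print "chrom", "pos", "ref", "A", "C", "G", "T"
            next
        }
        {
            print $(col["chrom"]), $(col["pos"]), $(col["ref"]),
                  $(col["A"]), $(col["C"]), $(col["G"]), $(col["T"])
        }
    ' "$1"
}

# Toy table with made-up counts; the contig name "reporter" is hypothetical.
toy="$(mktemp)"
printf 'chrom\tpos\tref\treads_all\tA\tA_pp\tC\tC_pp\tT\tT_pp\tG\tG_pp\n' >  "$toy"
printf 'reporter\t1\tA\t100\t90\t90\t0\t0\t0\t0\t10\t10\n'               >> "$toy"
printf 'reporter\t2\tC\t120\t0\t0\t96\t96\t24\t24\t0\t0\n'               >> "$toy"

base_counts "$toy"
rm -f "$toy"
```

Looking columns up by name rather than by position keeps the sketch working if the table carries additional fields or a different column order.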