GSE232520 Processing Pipeline


Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature Communications (2024), PMID 39152130

Dataset

Expanded palette of RNA base editors for comprehensive RBP-RNA interactome studies

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


remove adapter with Cutadapt

cutadapt v4.0

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt

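# Example for paired-end reads. The paired-end layout is an assumption; for single-end
# data, drop -A, -p, and the second input file.
# Replace 'input_R1.fastq.gz' and 'input_R2.fastq.gz' with your actual input files.
# The adapters below are the standard Illumina TruSeq sequences; replace them if your library uses different adapters.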
# -a: Adapter sequence for read 1
# -A: Adapter sequence for read 2
# -o: Output file for trimmed read 1
# -p: Output file for trimmed read 2
# -m: Minimum read length after trimming (e.g., 18 bp)
# -q: Trim low-quality ends from reads (e.g., quality score 20)
# --cores: Number of CPU cores to use
# --report=full: Print a full report of the trimming process

cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
         -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
         -o trimmed_R1.fastq.gz \
         -p trimmed_R2.fastq.gz \
         -m 18 \
         -q 20 \
         --cores=4 \
         --report=full \
         input_R1.fastq.gz \
         input_R2.fastq.gz
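
As a quick sanity check after trimming, it can help to compare read counts before and after Cutadapt. The sketch below is illustrative only: count_reads and trimming_summary are hypothetical helper names, and it assumes gzipped FASTQ files with exactly four lines per record.

```shell
# Count reads in a gzipped FASTQ file (assumes 4 lines per record).
count_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Print how many reads survived trimming.
# $1: FASTQ before trimming, $2: FASTQ after trimming
trimming_summary() {
    reads_before=$(count_reads "$1")
    reads_after=$(count_reads "$2")
    awk -v b="$reads_before" -v a="$reads_after" \
        'BEGIN { printf "%d of %d reads kept (%.1f%%)\n", a, b, (b > 0 ? 100 * a / b : 0) }'
}
```

For example, `trimming_summary input_R1.fastq.gz trimmed_R1.fastq.gz` prints a single summary line for read 1. A large drop in read count may indicate that the adapter sequences or the -m threshold do not suit the library.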


align to genome using bowtie2 or bwa-mem

Bowtie2 v2.5.2 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install Bowtie2 and Samtools (if not already installed)
# conda install -c bioconda bowtie2 samtools

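# Define variables
# The source lists the assembly as the 12X MS2 stem-loop reporter mRNA, so the index is
# assumed to be built from the reporter FASTA rather than from a whole genome.
# Replace 'path/to/reporter_index' with the basename of your pre-built Bowtie2 index.
# You can build an index using `bowtie2-build <reporter.fa> reporter_index`.
GENOME_INDEX="path/to/reporter_index"
READ1="trimmed_R1.fastq.gz" # Trimmed R1 from the Cutadapt step; replace with your file
READ2="trimmed_R2.fastq.gz" # Trimmed R2 from the Cutadapt step; replace with your file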
OUTPUT_SAM="sample_aligned.sam"
OUTPUT_BAM="sample_aligned.bam"
SORTED_BAM="sample_aligned.sorted.bam"

# Align paired-end reads to the genome using Bowtie2
# -x: specify the genome index basename
# -1: specify the first mate reads file
# -2: specify the second mate reads file
# -S: specify the output SAM file
# --threads: number of threads to use (adjust based on available CPU cores)
bowtie2 -x "${GENOME_INDEX}" -1 "${READ1}" -2 "${READ2}" -S "${OUTPUT_SAM}" --threads 8

# Convert SAM to BAM format and sort the BAM file by coordinate
# samtools view: convert SAM to BAM (-b) and write to file (-o)
# samtools sort: sort BAM file by coordinate and write to file (-o)
# -@: number of additional threads for samtools
# Options are placed before the input file so the commands also work where option parsing stops at the first positional argument.
samtools view -@ 8 -b -o "${OUTPUT_BAM}" "${OUTPUT_SAM}"
samtools sort -@ 8 -o "${SORTED_BAM}" "${OUTPUT_BAM}"

# Index the sorted BAM file (optional, but highly recommended for downstream tools like IGV or peak callers)
samtools index "${SORTED_BAM}"

# Clean up intermediate files (optional)
# rm "${OUTPUT_SAM}" "${OUTPUT_BAM}"
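
The source names bwa-mem as an alternative aligner. A minimal sketch of that route follows. The function name align_bwa_mem and the DRY_RUN switch are hypothetical names introduced here, the reference is assumed to be the reporter FASTA, and bwa and samtools are assumed to be installed.

```shell
# Hypothetical wrapper: index the reference, align paired-end reads with bwa mem,
# then coordinate-sort and index the result with samtools.
# Arguments: reference FASTA, R1 FASTQ, R2 FASTQ, output BAM, threads (default 4).
# Set DRY_RUN=1 to print the commands instead of running them.
align_bwa_mem() {
    ref=$1; r1=$2; r2=$3; out_bam=$4; threads=${5:-4}

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "bwa index $ref"
        echo "bwa mem -t $threads $ref $r1 $r2 | samtools sort -@ $threads -o $out_bam -"
        echo "samtools index $out_bam"
        return 0
    fi

    bwa index "$ref" &&
    bwa mem -t "$threads" "$ref" "$r1" "$r2" |
        samtools sort -@ "$threads" -o "$out_bam" - &&
    samtools index "$out_bam"
}
```

Running `DRY_RUN=1 align_bwa_mem reporter.fa trimmed_R1.fastq.gz trimmed_R2.fastq.gz sample.sorted.bam 8` shows the commands without executing them. Piping bwa mem straight into samtools sort avoids writing an intermediate SAM file.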


generate count tables along reporter sequence using Pysamstats

Pysamstats v1.2.0 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install Pysamstats (if not already installed)
# pip install pysamstats
# or
# conda install -c bioconda pysamstats

# Generate count tables along the reporter sequence
# This command assumes a coordinate-sorted, indexed BAM file and the reporter FASTA used for alignment.
# --type variation reports, per position, the read depth plus counts of A, C, T, G, and N,
# which matches the per-base tables described in the source; it requires --fasta.
# Use --type coverage instead if only total depth (reads_all, reads_pp) is needed.
# Restrict the region with --chromosome/--start/--end and the columns with --fields if needed.
# Pileup depth is capped by --max-depth (default 8000 per the pysamstats help); the value
# below is an assumed placeholder for deeply covered reporters.
pysamstats --type variation \
           --fasta reporter.fa \
           --max-depth 1000000 \
           input.bam > output_reporter_counts.tsv
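
One possible downstream use of such a table is a per-position mismatch fraction along the reporter. The sketch below is not part of the source pipeline. The function name mismatch_fraction is hypothetical, and it assumes a tab-separated table with a header row containing the columns chrom, pos, reads_all, and mismatches, as expected from `pysamstats --type variation`.

```shell
# Hypothetical helper: per-position mismatch fraction from a pysamstats table.
# $1: tab-separated table with a header row (columns located by name).
mismatch_fraction() {
    awk -F'\t' '
        NR == 1 {
            # Map column names to their positions.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("chrom" in col) || !("pos" in col) ||
                !("reads_all" in col) || !("mismatches" in col)) {
                print "expected columns: chrom, pos, reads_all, mismatches" > "/dev/stderr"
                exit 1
            }
            c = col["chrom"]; p = col["pos"]; d = col["reads_all"]; m = col["mismatches"]
            print "chrom\tpos\treads_all\tmismatch_fraction"
            next
        }
        # Positions with coverage: mismatches divided by depth.
        $d > 0 { printf "%s\t%s\t%s\t%.4f\n", $c, $p, $d, $m / $d; next }
        # Positions without coverage: fraction is undefined.
        { printf "%s\t%s\t%s\tNA\n", $c, $p, $d }
    ' "$1"
}
```

For example, `mismatch_fraction output_reporter_counts.tsv > mismatch_fraction.tsv` writes one row per reporter position. Looking columns up by header name keeps the helper working if the set of output fields changes.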


Tools Used

Cutadapt

Bowtie2

Pysamstats

Raw Source Text

remove adapter with Cutadapt
align to genome using bowtie2 or bwa-mem
generate count tables along reporter sequence using Pysamstats
Assembly: 12X MS2 stem-loop reporter mRNA
Supplementary files format and content: pysamstats tables that contain the read counts for RNA bases at each position along the reporter mRNA sequence
