GSE232519 Processing Pipeline

RNA-Seq, 3 steps

Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature Communications (2024), PMID 39152130

Dataset

Expanded repertoire of RNA-editing-based detection for RNA binding protein interactions (5)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

remove adapter with Cutadapt

Cutadapt v4.0

$ Bash example

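# Install Cutadapt (if not already installed)
# Cutadapt is distributed through the bioconda channel, so both channels are listed
# conda create -n cutadapt_env -c conda-forge -c bioconda cutadapt=4.0 -y
# conda activate cutadapt_env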

# Define input and output file paths
INPUT_READ1="input_R1.fastq.gz"
INPUT_READ2="input_R2.fastq.gz" # Required for paired-end
OUTPUT_READ1="trimmed_R1.fastq.gz"
OUTPUT_READ2="trimmed_R2.fastq.gz" # Required for paired-end

# Define adapter sequences
# The values below are the standard Illumina TruSeq adapters, used here as placeholders.
# Replace them with the adapters from your library prep kit or sequencing facility.
ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" # 3' adapter expected in read 1
ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" # 3' adapter expected in read 2

# Run Cutadapt for paired-end reads
# The length and quality thresholds are illustrative values, not parameters stated in the source
cutadapt -a "${ADAPTER_FWD}" -A "${ADAPTER_REV}" \
         --minimum-length 15 \
         --quality-cutoff 20 \
         --trim-n \
         -o "${OUTPUT_READ1}" -p "${OUTPUT_READ2}" \
         "${INPUT_READ1}" "${INPUT_READ2}"
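
In paired-end mode Cutadapt discards reads as pairs, so the two trimmed files should still contain the same number of reads. A minimal sketch of that sanity check follows. `count_fastq_reads` is a hypothetical helper (not part of Cutadapt), the two small FASTQ files are mock data created only for illustration, and standard four-line FASTQ records are assumed.

```shell
# Hypothetical helper (not part of Cutadapt): count reads in a FASTQ file,
# assuming standard 4-line records. Handles plain and gzip-compressed input.
count_fastq_reads() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { print int(NR / 4) }'
}

# Mock data for illustration only: two reads per file
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' > mock_trimmed_R1.fastq
printf '@r1/2\nGGCA\n+\nIIII\n@r2/2\nCATG\n+\nIIII\n' > mock_trimmed_R2.fastq

N_R1=$(count_fastq_reads mock_trimmed_R1.fastq)
N_R2=$(count_fastq_reads mock_trimmed_R2.fastq)

# The two counts must match before the files are passed to a paired-end aligner
if [ "$N_R1" -eq "$N_R2" ]; then
    echo "Paired files in sync: ${N_R1} read pairs"
else
    echo "Read count mismatch: R1=${N_R1}, R2=${N_R2}" >&2
fi
```

To check real output, call the helper on `trimmed_R1.fastq.gz` and `trimmed_R2.fastq.gz` instead of the mock files.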

align to genome using bowtie2 or bwa-mem

Bowtie2 v2.4.5

$ Bash example

# Install Bowtie2 (if not already installed)
# conda install -c conda-forge -c bioconda bowtie2=2.4.5

# Example: Build Bowtie2 index (if not already available)
# Replace 'path/to/genome.fa' with your reference FASTA file
# The source lists reporter mRNA FASTA files as assemblies, so the reference may need to
# contain the reporter sequences (alone or alongside a genome such as hg38)
# bowtie2-build path/to/genome.fa path/to/genome_index

# Align the trimmed reads using Bowtie2 (the source also allows bwa-mem for this step)
# Replace 'path/to/genome_index' with the prefix of your Bowtie2 index
# The input files are the paired-end outputs of the Cutadapt step
# For single-end reads, use -U <reads.fastq.gz> instead of -1/-2
# Replace 'output_aligned.sam' with your desired output SAM file name
# 8 threads is an arbitrary choice; adjust to your machine

bowtie2 -x path/to/genome_index -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz -S output_aligned.sam --threads 8
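
Pysamstats reads a coordinate-sorted, indexed BAM file, whereas the Bowtie2 command above writes SAM. The source does not describe this conversion, so the following is only a sketch of one common way to do it, assuming samtools 1.x is installed. `sam_to_sorted_bam` is a hypothetical helper name and the file names in the usage comment are placeholders.

```shell
# Hypothetical helper: convert a SAM file to a coordinate-sorted, indexed BAM.
# Assumes samtools 1.x is on PATH; this step is not described in the source.
sam_to_sorted_bam() {
    sam_in="$1"
    bam_out="$2"
    threads="${3:-4}"   # additional sorting threads; 4 is an arbitrary default

    if [ ! -f "$sam_in" ]; then
        echo "Input SAM not found: $sam_in" >&2
        return 1
    fi
    if ! command -v samtools >/dev/null 2>&1; then
        echo "samtools not found on PATH" >&2
        return 1
    fi

    # samtools sort accepts SAM input and writes BAM by default
    samtools sort -@ "$threads" -o "$bam_out" "$sam_in" || return 1
    # The .bai index is required for region queries on the BAM file
    samtools index "$bam_out"
}

# Example usage (placeholders matching the surrounding steps):
# sam_to_sorted_bam output_aligned.sam aligned_reads.bam 8
```

The function returns a non-zero status when the input file or samtools is missing, so it can be chained safely with `&&` in a larger script.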

generate count tables along reporter sequence using Pysamstats

Pysamstats v1.1.2

$ Bash example

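# Install pysamstats if not already installed
# The bioconda package pulls in pysam; a plain pip install needs pysam and Cython present first
# conda install -c conda-forge -c bioconda pysamstats=1.1.2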

# Define input and output files
INPUT_BAM="aligned_reads.bam" # Placeholder; must be coordinate-sorted and indexed (.bai)
REFERENCE_FASTA="path/to/genome.fa" # Placeholder; must be the same FASTA the reads were aligned to
REPORTER_NAME="reporter_mRNA" # Hypothetical contig name; use the reporter's name from the FASTA header
OUTPUT_COUNTS="reporter_counts.tsv"

# Generate per-position count tables along the reporter sequence using pysamstats
# pysamstats does not take a BED file; restrict the region with --chromosome
# (and optionally --start/--end)
# 'variation' reports per-base counts (A, C, T, G) against the reference, which suits
# tables of called edits; this choice is an assumption, not stated in the source
# Use '--type coverage' (no --fasta needed) if only read depth is required
pysamstats --type variation --fasta "${REFERENCE_FASTA}" --chromosome "${REPORTER_NAME}" "${INPUT_BAM}" > "${OUTPUT_COUNTS}"
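
The source describes the supplementary files as count tables with called edits along reporter sequences. As a rough sketch of how a per-position edit fraction might be derived, the example below processes a table of the kind written by `pysamstats --type variation`. Several things here are assumptions: the edit type (A-to-G mismatches, set through the hypothetical `REF_BASE` and `ALT_BASE` variables), the contig name `reporter_mRNA`, and the mock table, which contains invented counts and only a subset of the real output columns. Columns are located by header name so the same script should work on the full table.

```shell
# Assumed edit type: A-to-G mismatches. Adjust both values for your editing enzyme.
REF_BASE="A"
ALT_BASE="G"

# Mock table for illustration only (invented counts), using a subset of the
# tab-delimited columns that 'pysamstats --type variation' writes
printf 'chrom\tpos\tref\treads_all\tA\tC\tT\tG\n' > mock_variation.tsv
printf 'reporter_mRNA\t101\tA\t100\t90\t0\t0\t10\n' >> mock_variation.tsv
printf 'reporter_mRNA\t102\tC\t100\t0\t100\t0\t0\n' >> mock_variation.tsv
printf 'reporter_mRNA\t103\tA\t50\t50\t0\t0\t0\n' >> mock_variation.tsv

# Locate columns by header name, then report the edited fraction at every
# position whose reference base equals REF_BASE
awk -F '\t' -v ref="$REF_BASE" -v alt="$ALT_BASE" '
    NR == 1 {
        for (i = 1; i <= NF; i++) col[$i] = i
        print "chrom\tpos\tdepth\tedited\tfraction"
        next
    }
    toupper($(col["ref"])) == ref {
        depth = $(col["reads_all"])
        edited = $(col[alt])
        frac = (depth > 0) ? edited / depth : 0
        printf "%s\t%s\t%d\t%d\t%.3f\n", $(col["chrom"]), $(col["pos"]), depth, edited, frac
    }
' mock_variation.tsv > mock_edit_fractions.tsv

cat mock_edit_fractions.tsv
```

The denominator here is `reads_all`, which counts every read covering the position. Whether a site counts as a called edit depends on depth and fraction thresholds that the source does not specify.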

Tools Used

Cutadapt

Bowtie2

Pysamstats

Raw Source Text

remove adapter with Cutadapt
align to genome using bowtie2 or bwa-mem
generate count tables along reporter sequence using Pysamstats
Assembly: 6X MS2 and 6X PP7 stem-loop-bearing mRNAs (alternating):  6X MS2 and 6X PP7 alternating and 50 bp apart mRNA.fa, 6X MS2 and 6X PP7 alternating and 50 bp apart mRNA features.txt
Assembly: 2X MS2 and 2X PP7 stem-loop-bearing mRNAs (alternating):  2X MS2 and 2X PP7 alternating and 50 bp apart mRNA.fa, 2X MS2 and 2X PP7 alternating and 50 bp apart mRNA features.txt
Assembly: 2X MS2 and 2X PP7 stem-loop-bearing mRNAs (50bp apart):  2X MS2 (50 bp apart)-350 bp-2X PP7 (50 bp apart) mRNA.fa, 2X MS2 (50 bp apart)-350 bp-2X PP7 (50 bp apart) mRNA features.txt
Supplementary files format and content: count tables with called edits along reporter sequences
