GSE232519 Processing Pipeline
RNA-Seq
3 steps
Publication
High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues. Nature Communications (2024), PMID 39152130
Dataset
GSE232519: Expanded repertoire of RNA-editing-based detection for RNA binding protein interactions (5)
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1. Remove adapters with Cutadapt
$ Bash example
# Install Cutadapt (if not already installed)
# conda create -n cutadapt_env -c conda-forge -c bioconda cutadapt=4.0 -y
# conda activate cutadapt_env

# Define input and output file paths
INPUT_READ1="input_R1.fastq.gz"
INPUT_READ2="input_R2.fastq.gz"      # Required for paired-end
OUTPUT_READ1="trimmed_R1.fastq.gz"
OUTPUT_READ2="trimmed_R2.fastq.gz"   # Required for paired-end

# Define adapter sequences.
# The values below are the common Illumina TruSeq adapters, used here only as placeholders.
# Replace them with the actual sequences from your library prep kit or sequencing facility.
ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"   # Read 1 (forward) adapter
ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"   # Read 2 (reverse) adapter

# Run Cutadapt for paired-end reads
cutadapt -a "${ADAPTER_FWD}" -A "${ADAPTER_REV}" \
    --minimum-length 15 \
    --quality-cutoff 20 \
    --trim-n \
    -o "${OUTPUT_READ1}" -p "${OUTPUT_READ2}" \
    "${INPUT_READ1}" "${INPUT_READ2}"
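After paired-end trimming it can be worth confirming that both output files still contain the same number of reads, since most aligners expect mates to stay in sync. The sketch below is one way to check this; the helper names `count_reads` and `check_pairs` are hypothetical, and it assumes standard four-line FASTQ records.

```shell
# Count reads in a FASTQ file, assuming 4 lines per record.
# Handles gzip-compressed (*.gz) and plain-text input.
count_reads() {
    local fq="$1" lines
    case "${fq}" in
        *.gz) lines=$(gzip -dc "${fq}" | wc -l) ;;
        *)    lines=$(wc -l < "${fq}") ;;
    esac
    echo $(( lines / 4 ))
}

# Print both read counts and succeed only if they match.
check_pairs() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1=${n1} R2=${n2}"
    [ "${n1}" -eq "${n2}" ]
}
```

For example, `check_pairs trimmed_R1.fastq.gz trimmed_R2.fastq.gz || echo "mate files are out of sync"` reports a mismatch without stopping the shell.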
2. Align to genome using Bowtie2 or BWA-MEM
$ Bash example
# Install Bowtie2 (if not already installed)
# conda install -c bioconda bowtie2

# Example: Build Bowtie2 index (if not already available)
# Replace 'path/to/genome.fa' with your reference FASTA file (e.g., hg38.fa).
# Note: the GEO record lists reporter mRNA FASTA files as the assembly, so the
# index may need to be built from the reporter FASTA rather than a whole genome.
# bowtie2-build path/to/genome.fa path/to/genome_index

# Align reads to the reference using Bowtie2
# Replace 'path/to/genome_index' with the path to your Bowtie2 index (e.g., hg38_index)
# Replace 'path/to/reads.fastq.gz' with your input FASTQ file (e.g., sample_R1.fastq.gz for single-end)
# Replace 'output_aligned.sam' with your desired output SAM file name
# Using common parameters for single-end reads and 8 threads.
# For paired-end reads, use -1 <reads_1.fastq.gz> -2 <reads_2.fastq.gz> instead of -U.
bowtie2 -x path/to/genome_index \
    -U path/to/reads.fastq.gz \
    -S output_aligned.sam \
    --threads 8
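The step description also names BWA-MEM as an alternative aligner. A minimal sketch of a paired-end BWA-MEM run is given here; the function name `align_bwa_mem`, its argument order, and the default of 8 threads are assumptions for illustration, not settings taken from the GEO record.

```shell
# Hypothetical helper: index the reference if needed, then align a read pair.
# Usage: align_bwa_mem <ref.fa> <R1.fastq.gz> <R2.fastq.gz> <out.sam> [threads]
align_bwa_mem() {
    local ref="$1" r1="$2" r2="$3" out_sam="$4" threads="${5:-8}"
    # 'bwa index' writes its index files (e.g. <ref>.bwt) next to the FASTA,
    # so the presence of <ref>.bwt is used as a sign that indexing was done.
    [ -e "${ref}.bwt" ] || bwa index "${ref}"
    # 'bwa mem' writes SAM to stdout; -t sets the number of threads.
    bwa mem -t "${threads}" "${ref}" "${r1}" "${r2}" > "${out_sam}"
}
```

A call such as `align_bwa_mem reference.fa trimmed_R1.fastq.gz trimmed_R2.fastq.gz output_aligned.sam` would follow on from the trimmed files of step 1; `reference.fa` is a placeholder.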
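The alignment example writes SAM, while the counting step reads a BAM file. One common way to bridge the two is to coordinate-sort and index the alignments with samtools. The sketch below assumes samtools is installed; the helper name `sam_to_sorted_bam` and the default thread count are hypothetical.

```shell
# Hypothetical helper: convert SAM to a coordinate-sorted, indexed BAM.
# Usage: sam_to_sorted_bam <in.sam> <out.bam> [threads]
sam_to_sorted_bam() {
    local sam="$1" bam="$2" threads="${3:-4}"
    # 'samtools sort' reads SAM directly; the .bam suffix of -o selects BAM output.
    samtools sort -@ "${threads}" -o "${bam}" "${sam}"
    # Creates <out.bam>.bai, which region queries rely on.
    samtools index "${bam}"
}
```

For example, `sam_to_sorted_bam output_aligned.sam aligned_reads.bam` produces the BAM name used as a placeholder in step 3.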
3. Generate count tables along reporter sequence using Pysamstats
Pysamstats v1.1.2
$ Bash example
# Install pysamstats if not already installed
# pip install pysamstats

# Define input and output files
INPUT_BAM="aligned_reads.bam"       # Placeholder; should be coordinate-sorted and indexed
REPORTER_NAME="reporter"            # Placeholder; sequence name as it appears in the FASTA header
OUTPUT_COUNTS="reporter_counts.tsv"
REFERENCE_FASTA="reference.fa"      # Placeholder; must be the same FASTA the reads were aligned to

# Generate count tables (e.g., coverage) along the reporter sequence using pysamstats.
# --type selects the statistic (e.g., coverage, variation, baseq, mapq, tlen).
# 'coverage' is a simple default; 'variation' reports per-base A/C/T/G counts,
# requires --fasta, and may be closer to "count tables with called edits".
# pysamstats does not take a BED file; restrict the output to one sequence with
# --chromosome (optionally with --start and --end).
pysamstats --type coverage \
    --fasta "${REFERENCE_FASTA}" \
    --chromosome "${REPORTER_NAME}" \
    "${INPUT_BAM}" > "${OUTPUT_COUNTS}"
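The dataset's supplementary files are described as count tables with called edits, so a per-position edit fraction is a natural follow-up to the count table. The sketch below assumes a tab-separated table with a header row containing `chrom`, `pos`, `ref`, `reads_all`, and per-base count columns, as produced by `pysamstats --type variation --format tsv`. The function name `edit_fraction`, the default A-to-G edit type, and the use of `reads_all` as the denominator are all assumptions rather than the authors' method.

```shell
# Hypothetical helper: per-position fraction of ALT reads at REF positions.
# Usage: edit_fraction <variation.tsv> [ref_base] [alt_base]
edit_fraction() {
    local table="$1" ref_base="${2:-A}" alt_base="${3:-G}"
    awk -F'\t' -v OFS='\t' -v ref="${ref_base}" -v alt="${alt_base}" '
        NR == 1 {
            # Map column names to indices so column order does not matter.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("chrom" in col) || !("pos" in col) || !("ref" in col) ||
                !("reads_all" in col) || !(alt in col)) {
                print "missing expected column in header" > "/dev/stderr"
                exit 1
            }
            print "chrom", "pos", "depth", "alt_count", "alt_fraction"
            next
        }
        # Keep covered positions whose reference base matches REF.
        toupper($(col["ref"])) == ref && $(col["reads_all"]) > 0 {
            printf "%s\t%s\t%d\t%d\t%.4f\n",
                $(col["chrom"]), $(col["pos"]), $(col["reads_all"]),
                $(col[alt]), $(col[alt]) / $(col["reads_all"])
        }
    ' "${table}"
}
```

For a C-to-T edit type, the call would be `edit_fraction reporter_counts.tsv C T`. Any threshold for calling a position edited is left to the analyst.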
Tools Used
Cutadapt, Bowtie2 or BWA-MEM, Pysamstats v1.1.2
Raw Source Text
remove adapter with Cutadapt
align to genome using bowtie2 or bwa-mem
generate count tables along reporter sequence using Pysamstats
Assembly: 6X MS2 and 6X PP7 stem-loop-bearing mRNAs (alternating): 6X MS2 and 6X PP7 alternating and 50 bp apart mRNA.fa, 6X MS2 and 6X PP7 alternating and 50 bp apart mRNA features.txt
Assembly: 2X MS2 and 2X PP7 stem-loop-bearing mRNAs (alternating): 2X MS2 and 2X PP7 alternating and 50 bp apart mRNA.fa, 2X MS2 and 2X PP7 alternating and 50 bp apart mRNA features.txt
Assembly: 2X MS2 and 2X PP7 stem-loop-bearing mRNAs (50bp apart): 2X MS2 (50 bp apart)-350 bp-2X PP7 (50 bp apart) mRNA.fa, 2X MS2 (50 bp apart)-350 bp-2X PP7 (50 bp apart) mRNA features.txt
Supplementary files format and content: count tables with called edits along reporter sequences
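Because the assemblies listed are reporter mRNA FASTA files, it helps to know each reporter's sequence name and length before indexing or restricting counts to it. The sketch below lists both for any FASTA file; the function name `fasta_lengths` is hypothetical, and the name is taken as the header text up to the first whitespace, which is how Bowtie2 and BWA name reference sequences.

```shell
# Hypothetical helper: print "<name><TAB><length>" for each FASTA record.
# Usage: fasta_lengths <reference.fa>
fasta_lengths() {
    awk '
        /^>/ {
            # Emit the previous record before starting a new one.
            if (name != "") print name "\t" len
            name = substr($1, 2)   # header up to first whitespace, without ">"
            len = 0
            next
        }
        { len += length($0) }      # sequence may be wrapped over several lines
        END { if (name != "") print name "\t" len }
    ' "$1"
}
```

The file names in the record contain spaces, so they need quoting, for example `fasta_lengths "6X MS2 and 6X PP7 alternating and 50 bp apart mRNA.fa"`.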