GSE232519 Processing Pipeline

RNA-Seq · 3 steps

Publication

High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.

Nature Communications (2024), PMID 39152130

Dataset

GSE232519

Expanded repertoire of RNA-editing-based detection for RNA binding protein interactions (5)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Remove adapters with Cutadapt

    Cutadapt v4.0
    $ Bash example
    # Install Cutadapt (if not already installed)
    # conda create -n cutadapt_env -c conda-forge -c bioconda cutadapt=4.0 -y
    # conda activate cutadapt_env
    
    # Define input and output file paths
    INPUT_READ1="input_R1.fastq.gz"
    INPUT_READ2="input_R2.fastq.gz" # Required for paired-end
    OUTPUT_READ1="trimmed_R1.fastq.gz"
    OUTPUT_READ2="trimmed_R2.fastq.gz" # Required for paired-end
    
    # Define adapter sequences. The values below are the standard Illumina TruSeq
    # adapters and serve only as placeholders; replace them with the adapters from
    # your library prep kit or sequencing facility.
    ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" # Read 1 3' adapter
    ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" # Read 2 3' adapter
    
    # Run Cutadapt for paired-end reads.
    # The length and quality thresholds are illustrative; the source does not state them.
    cutadapt -a "${ADAPTER_FWD}" -A "${ADAPTER_REV}" \
             --minimum-length 15 \
             --quality-cutoff 20 \
             --trim-n \
             -o "${OUTPUT_READ1}" -p "${OUTPUT_READ2}" \
             "${INPUT_READ1}" "${INPUT_READ2}"
    
  2. Align to the genome using Bowtie2 or BWA-MEM

    $ Bash example
    # Install Bowtie2 (if not already installed)
    # conda install -c bioconda bowtie2
    
    # Example: Build Bowtie2 index (if not already available)
    # Replace 'path/to/genome.fa' with your reference FASTA file. The assemblies listed
    # for this dataset are reporter mRNA FASTA files rather than a full genome.
    # bowtie2-build path/to/genome.fa path/to/genome_index
    
    # Align reads to the reference using Bowtie2
    # Replace 'path/to/genome_index' with the path (prefix) of your Bowtie2 index
    # Replace the FASTQ names with your trimmed reads (here, the paired-end output of step 1)
    # Replace 'output_aligned.sam' with your desired output SAM file name
    # For single-end reads, use -U <reads.fastq.gz> instead of -1/-2.
    # The thread count (8) is illustrative.
    # Note: step 3 expects a coordinate-sorted, indexed BAM, so sort and index this SAM
    # (e.g., with samtools sort and samtools index) before running pysamstats.
    
    bowtie2 -x path/to/genome_index -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz -S output_aligned.sam --threads 8
  3. Generate count tables along the reporter sequence using Pysamstats

    Pysamstats v1.1.2
    $ Bash example
    # Install pysamstats if not already installed
    # pip install pysamstats
    
    # Define input and output files
    INPUT_BAM="aligned_reads.bam" # Placeholder; must be coordinate-sorted and indexed
    REPORTER_NAME="reporter" # Placeholder for the reporter sequence name as it appears in the reference FASTA header
    OUTPUT_COUNTS="reporter_counts.tsv"
    REFERENCE_FASTA="reference.fa" # Placeholder; must be the same FASTA the reads were aligned to
    
    # Generate per-position count tables along the reporter sequence using pysamstats
    # pysamstats selects a region with --chromosome (optionally --start/--end); it does not read a BED file.
    # --type variation reports per-base counts (matches, mismatches, A/C/G/T) against the reference and
    # requires --fasta; --type coverage would give read depth only.
    # The statistics type the authors used is not stated; 'variation' is an assumption based on the
    # "called edits" described for the supplementary files.
    pysamstats --type variation --fasta "${REFERENCE_FASTA}" --chromosome "${REPORTER_NAME}" "${INPUT_BAM}" > "${OUTPUT_COUNTS}"
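
Step 2 above writes a SAM file, while step 3 reads a BAM file, and pysamstats needs that BAM to be coordinate-sorted and indexed. The sketch below shows one way to bridge the two steps. It assumes samtools is installed (the source does not name it) and reuses the placeholder file names from the examples above. The `run` helper and the `DRY_RUN` switch are hypothetical conveniences: by default the script only prints the commands it would run.

```shell
set -eu

# DRY_RUN=1 (the default here) prints each command instead of executing it.
# Set DRY_RUN=0 to actually run samtools (assumed to be on PATH).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Convert a SAM file to a coordinate-sorted BAM and index it.
# Arguments: input SAM, output BAM, optional thread count (default 4).
sam_to_sorted_bam() {
    local in_sam="$1"
    local out_bam="$2"
    local threads="${3:-4}"
    run samtools sort -@ "${threads}" -o "${out_bam}" "${in_sam}"
    run samtools index "${out_bam}"
}

sam_to_sorted_bam "output_aligned.sam" "aligned_reads.bam"
```

With `DRY_RUN=0` the same call produces `aligned_reads.bam` and its `.bai` index, which is the input step 3 expects.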

Raw Source Text
remove adapter with Cutadapt
align to genome using bowtie2 or bwa-mem
generate count tables along reporter sequence using Pysamstats
Assembly: 6X MS2 and 6X PP7 stem-loop-bearing mRNAs (alternating):  6X MS2 and 6X PP7 alternating and 50 bp apart mRNA.fa, 6X MS2 and 6X PP7 alternating and 50 bp apart mRNA features.txt
Assembly: 2X MS2 and 2X PP7 stem-loop-bearing mRNAs (alternating):  2X MS2 and 2X PP7 alternating and 50 bp apart mRNA.fa, 2X MS2 and 2X PP7 alternating and 50 bp apart mRNA features.txt
Assembly: 2X MS2 and 2X PP7 stem-loop-bearing mRNAs (50bp apart):  2X MS2 (50 bp apart)-350 bp-2X PP7 (50 bp apart) mRNA.fa, 2X MS2 (50 bp apart)-350 bp-2X PP7 (50 bp apart) mRNA features.txt
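
The reporter FASTA files listed above define the sequence names that downstream tools refer to, for example the value passed to pysamstats `--chromosome`. The sketch below lists each sequence name and its length, assuming a standard FASTA layout. The function name `fasta_lengths` is hypothetical, and the demo record is made up for illustration; it is not one of the real reporter sequences.

```shell
set -eu

# Print "name<TAB>length" for every record in a FASTA file.
# The name is the header text up to the first whitespace, which is how
# aligners such as Bowtie2 typically name reference sequences.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)   # drop the leading ">"
            len = 0
            next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Demo on a tiny made-up FASTA (not one of the real reporter files).
demo_fa="$(mktemp)"
printf '>demo_reporter example\nACGTACGT\nACGT\n' > "${demo_fa}"
fasta_lengths "${demo_fa}"
rm -f "${demo_fa}"
```

The demo prints `demo_reporter` and `12` separated by a tab. Quote the path when calling the function on the real files, since their names contain spaces.
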
Supplementary files format and content: count tables with called edits along reporter sequences
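
The source does not say how edits were called from the count tables. The sketch below shows one possible approach and is an assumption, not the authors' stated method. It expects a tab-delimited table with a header containing at least the columns `chrom`, `pos`, `ref`, `reads_all` and one count column per base, as written by pysamstats `--type variation`. The function name `edit_fractions` and the demo rows are illustrative, and the reference and edited bases are passed as arguments because the relevant edit type depends on the editing enzyme.

```shell
set -eu

# Usage: edit_fractions TABLE REF_BASE ALT_BASE
# Prints chrom, pos, reads_all, the ALT_BASE count and ALT_BASE/reads_all for
# every covered position whose reference base equals REF_BASE.
edit_fractions() {
    awk -F '\t' -v ref="$2" -v alt="$3" '
        NR == 1 {
            # Locate columns by header name rather than by fixed position.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("chrom" in col) || !("pos" in col) || !("ref" in col) ||
                !("reads_all" in col) || !(alt in col)) {
                print "missing expected column in header" > "/dev/stderr"
                exit 1
            }
            print "chrom\tpos\treads_all\t" alt "\tfraction"
            next
        }
        toupper($(col["ref"])) == ref && $(col["reads_all"]) > 0 {
            printf "%s\t%s\t%d\t%d\t%.4f\n", $(col["chrom"]), $(col["pos"]),
                   $(col["reads_all"]), $(col[alt]),
                   $(col[alt]) / $(col["reads_all"])
        }
    ' "$1"
}

# Demo on made-up counts: position 1 has reference C with 10 of 100 reads showing T.
demo_tsv="$(mktemp)"
printf 'chrom\tpos\tref\treads_all\tA\tC\tT\tG\n' > "${demo_tsv}"
printf 'demo\t1\tC\t100\t0\t90\t10\t0\n' >> "${demo_tsv}"
printf 'demo\t2\tA\t100\t100\t0\t0\t0\n' >> "${demo_tsv}"
edit_fractions "${demo_tsv}" C T
rm -f "${demo_tsv}"
```

A real analysis would typically add a minimum-coverage filter and a comparison against a control sample before treating a position as edited; neither threshold is given in the source.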