GSE232513 Processing Pipeline
RNA-Seq
code_examples
3 steps
Publication
High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues.Nature communications (2024) — PMID 39152130
Dataset
GSE232513Expanded repertoire of RNA-editing-based detection for RNA binding protein interactions (1)
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
Generate Jupyter Notebook-
1
remove adapter with Cutadapt
$ Bash example
# Install Cutadapt (if not already installed) # conda install -c bioconda cutadapt # Define input and output file paths INPUT_R1="reads_R1.fastq.gz" INPUT_R2="reads_R2.fastq.gz" OUTPUT_R1="trimmed_R1.fastq.gz" OUTPUT_R2="trimmed_R2.fastq.gz" # Define adapter sequences (common Illumina adapters, adjust as needed) # For 3' adapter on Read 1 ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" # For 3' adapter on Read 2 (often the reverse complement of R1 adapter or a different sequence) ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" # Define minimum read length after trimming MIN_LENGTH=18 # Define quality cutoff QUALITY_CUTOFF=20 # Define number of threads to use THREADS=8 # Execute Cutadapt for paired-end reads cutadapt \ -j ${THREADS} \ -q ${QUALITY_CUTOFF} \ --minimum-length ${MIN_LENGTH} \ -a ${ADAPTER_R1} \ -A ${ADAPTER_R2} \ -o ${OUTPUT_R1} \ -p ${OUTPUT_R2} \ ${INPUT_R1} ${INPUT_R2} # For single-end reads, use a command like this: # INPUT_SE="reads.fastq.gz" # OUTPUT_SE="trimmed.fastq.gz" # cutadapt \ # -j ${THREADS} \ # -q ${QUALITY_CUTOFF} \ # --minimum-length ${MIN_LENGTH} \ # -a ${ADAPTER_R1} \ # -o ${OUTPUT_SE} \ # ${INPUT_SE} -
2
align to genome using bowtie2 or bwa-mem
$ Bash example
# Install Bowtie2 (if not already installed) # conda install -c bioconda bowtie2 # Define variables # Placeholder for human genome GRCh38 index. Replace with your actual index path. # The index should be built using bowtie2-build from a FASTA file (e.g., GRCh38.fa). GENOME_INDEX="/path/to/your/genome/index/GRCh38" # Example: /data/indexes/bowtie2/GRCh38/GRCh38 # Placeholder for input FASTQ file(s). Replace with your actual file paths. # For single-end reads: INPUT_READS="input.fastq.gz" # For paired-end reads, uncomment and set these variables: # READ1="input_R1.fastq.gz" # READ2="input_R2.fastq.gz" OUTPUT_SAM="aligned.sam" NUM_THREADS=8 # Number of threads to use # Align single-end reads to the genome using Bowtie2 bowtie2 -x "${GENOME_INDEX}" \ -U "${INPUT_READS}" \ -S "${OUTPUT_SAM}" \ -p "${NUM_THREADS}" # To align paired-end reads, comment out the single-end command above and uncomment/use the following: # bowtie2 -x "${GENOME_INDEX}" \ # -1 "${READ1}" \ # -2 "${READ2}" \ # -S "${OUTPUT_SAM}" \ # -p "${NUM_THREADS}" # Optional: Convert SAM to BAM, sort, and index for downstream analysis (e.g., peak calling) # # Install samtools (if not already installed) # # conda install -c bioconda samtools # samtools view -bS "${OUTPUT_SAM}" > "aligned.bam" # samtools sort "aligned.bam" -o "aligned.sorted.bam" # samtools index "aligned.sorted.bam" -
3
generate count tables along reporter sequence using Pysamstats
Pysamstats v1.2.0 (Inferred with models/gemini-2.5-flash)$ Bash example
# Install Pysamstats (if not already installed) # conda install -c bioconda pysamstats # Define input and output files INPUT_BAM="aligned_reads.bam" REPORTER_BED="reporter_sequences.bed" GENOME_FASTA="hg38.fa" # Placeholder for reference genome FASTA OUTPUT_COUNTS="reporter_sequence_counts.tsv" # Generate count tables along reporter sequences using Pysamstats # --type coverage_all: Calculates coverage for all reads (forward and reverse) # --fasta: Provides the reference genome for sequence-based statistics (optional for simple coverage, but good practice) # --regions: Specifies a BED file containing the reporter sequences to analyze pysamstats --type coverage_all \ --fasta ${GENOME_FASTA} \ --regions ${REPORTER_BED} \ ${INPUT_BAM} > ${OUTPUT_COUNTS}
Tools Used
Raw Source Text
remove adapter with Cutadapt align to genome using bowtie2 or bwa-mem generate count tables along reporter sequence using Pysamstats Assembly: 12X MS2 stem-loop reporter mRNA Supplementary files format and content: pysamstats tables that contain the read counts for RNA bases at each position along the reporter mRNA sequence