GSE232515 Processing Pipeline
RNA-Seq
Publication
High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues. Nature Communications (2024), PMID 39152130
Dataset
GSE232515: Expanded repertoire for RNA-editing-based detection for RNA binding protein interactions (3)
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
remove adapter with Cutadapt
$ Bash example
# Install cutadapt (example using conda)
# conda install -c bioconda cutadapt

# Define input and output files
INPUT_FASTQ="path/to/your/input_reads.fastq.gz"
OUTPUT_FASTQ="path/to/your/trimmed_reads.fastq.gz"

# Define adapter sequence(s)
# The adapter depends on the library preparation kit; confirm it against your protocol.
# Example: Illumina TruSeq Read 1 adapter
ADAPTER_3PRIME="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
# If a 5' adapter is also present, define it here:
# ADAPTER_5PRIME="YOUR_5_PRIME_ADAPTER_SEQUENCE"

# Define trimming parameters (example values; the source description does not specify them)
MIN_LENGTH=18       # Minimum read length after trimming
QUALITY_CUTOFF=20   # Quality cutoff for trimming low-quality bases from the 3' end

# Run cutadapt to remove the 3' adapter
# -a: 3' adapter sequence (single-end reads, or the forward read of a pair)
# -o: output file for trimmed reads
# --minimum-length: discard reads shorter than MIN_LENGTH after trimming
# --quality-cutoff: trim low-quality bases from the 3' end based on QUALITY_CUTOFF
cutadapt -a "${ADAPTER_3PRIME}" \
    -o "${OUTPUT_FASTQ}" \
    --minimum-length "${MIN_LENGTH}" \
    --quality-cutoff "${QUALITY_CUTOFF}" \
    "${INPUT_FASTQ}"

# For paired-end reads, the command would look like this:
# INPUT_R1="path/to/your/input_R1.fastq.gz"
# INPUT_R2="path/to/your/input_R2.fastq.gz"
# OUTPUT_R1="path/to/your/trimmed_R1.fastq.gz"
# OUTPUT_R2="path/to/your/trimmed_R2.fastq.gz"
# ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"   # Adapter expected in R1
# ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"   # Adapter expected in R2 (Illumina TruSeq Read 2 adapter)
# cutadapt -a "${ADAPTER_FWD}" -A "${ADAPTER_REV}" \
#     -o "${OUTPUT_R1}" -p "${OUTPUT_R2}" \
#     --minimum-length "${MIN_LENGTH}" \
#     --quality-cutoff "${QUALITY_CUTOFF}" \
#     "${INPUT_R1}" "${INPUT_R2}"
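A quick sanity check after trimming is to confirm that no read in the output falls below the minimum length passed to cutadapt. The following is a minimal sketch, not part of the published pipeline: the function name fastq_length_summary is hypothetical, and it assumes standard four-line FASTQ records in a plain or gzip-compressed file.

```bash
# Hypothetical helper: report the number of reads and how many are shorter than a threshold.
# Assumes four-line FASTQ records (header, sequence, "+", qualities).
fastq_length_summary() {
    # $1: FASTQ path (plain text or .gz); $2: minimum expected read length
    case "$1" in
        *.gz) gzip -cd -- "$1" ;;
        *)    cat -- "$1" ;;
    esac | awk -v min="$2" '
        NR % 4 == 2 {                       # sequence line of each record
            total++
            if (length($0) < min) short++
        }
        END { printf "total=%d short=%d\n", total, short }
    '
}

# Example call (placeholder path):
# fastq_length_summary "path/to/your/trimmed_reads.fastq.gz" 18
```

If the file was produced with --minimum-length 18, the short count should be zero; a non-zero value suggests the check is being run on the wrong file or with a different threshold.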
2
align to human genome using STAR 2.7.6a
$ Bash example
# Define variables
FASTQ_R1="sample_R1.fastq.gz"
FASTQ_R2="sample_R2.fastq.gz"   # Remove if single-end
GENOME_DIR="/path/to/STAR_human_GRCh38_index"   # Replace with actual path to STAR genome index (e.g., GRCh38/hg38)
OUTPUT_PREFIX="sample_aligned_"
THREADS=8   # Adjust as needed

# Installation (example, uncomment and modify if needed)
# conda install -c bioconda star=2.7.6a

# Run STAR alignment
STAR \
    --genomeDir "${GENOME_DIR}" \
    --readFilesIn "${FASTQ_R1}" "${FASTQ_R2}" \
    --runThreadN "${THREADS}" \
    --outFileNamePrefix "${OUTPUT_PREFIX}" \
    --outSAMtype BAM SortedByCoordinate \
    --readFilesCommand zcat \
    --outSAMattributes Standard \
    --quantMode GeneCounts \
    --twopassMode Basic
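Because the STAR command above uses --quantMode GeneCounts, each run also writes a ReadsPerGene.out.tab file. Its first four rows are summary counters (N_unmapped, N_multimapping, N_noFeature, N_ambiguous), and the remaining rows hold per-gene counts in three columns: unstranded, read strand matching the gene, and read strand opposite to the gene. Summing the columns gives a rough indication of library strandedness, which matters when assigning edits to a transcript strand. This is a sketch only; the function name summarize_gene_counts is hypothetical.

```bash
# Hypothetical helper: sum the three count columns of a STAR ReadsPerGene.out.tab file.
summarize_gene_counts() {
    # $1: path to ReadsPerGene.out.tab
    awk -F '\t' '
        NR <= 4 { next }                    # skip the four N_* summary rows
        { uns += $2; fwd += $3; rev += $4 } # columns 2-4: unstranded, forward, reverse
        END { printf "unstranded=%d forward=%d reverse=%d\n", uns, fwd, rev }
    ' "$1"
}

# Example call (placeholder path, built from OUTPUT_PREFIX in the STAR example):
# summarize_gene_counts "sample_aligned_ReadsPerGene.out.tab"
```

A forward total far below the reverse total points to a reverse-stranded library, the opposite pattern to a forward-stranded one, and similar totals to an unstranded one.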
3
SAILOR analysis of data for C-to-U edits
SAILOR (version not specified)
$ Bash example
# Install Miniconda or Anaconda if not already installed
# conda create -n sailor_env python=2.7 pysam numpy scipy
# conda activate sailor_env
# SAILOR is distributed by the Yeo lab:
# git clone https://github.com/YeoLab/sailor.git
# cd sailor

# Define input and output paths
INPUT_BAM="input.bam"             # Placeholder: path to your aligned RNA-seq BAM file
OUTPUT_PREFIX="sailor_output"     # Prefix for output files
SAMPLE_NAME="sample1"             # A name for the sample

# Define reference files (the source specifies assembly GRCh38; file names are placeholders)
REFERENCE_FASTA="GRCh38.p14.genome.fa"             # Placeholder: path to reference genome FASTA
GENE_ANNOTATION_GTF="gencode.v45.annotation.gtf"   # Placeholder: path to gene annotation GTF

# Execute SAILOR for C-to-U edit detection
# NOTE: the script name and flags below are illustrative placeholders. They have not been
# verified against SAILOR's documented interface; check the SAILOR documentation for the
# actual invocation. The values are examples, as the source description gives no parameters.
# -c: minimum coverage
# -q: minimum base quality
# -m: minimum mapping quality
# -e: minimum editing ratio
# -d: minimum depth for an editing site
python SAILOR.py \
    -i "${INPUT_BAM}" \
    -o "${OUTPUT_PREFIX}" \
    -r "${REFERENCE_FASTA}" \
    -g "${GENE_ANNOTATION_GTF}" \
    -s "${SAMPLE_NAME}" \
    -c 10 \
    -q 20 \
    -m 20 \
    -e 0.1 \
    -d 5
4
SAILOR analysis of data for A-to-I edits
SAILOR v1.0.0
$ Bash example
# Install SAILOR (if not already installed); the pip package name below is an assumption
# pip install SAILOR

# Define input and output paths
INPUT_BAM="path/to/your/aligned_reads.bam"             # Replace with your input BAM file
REFERENCE_FASTA="path/to/your/reference_genome.fa"     # e.g., hg38.fa, replace with your reference genome FASTA
OUTPUT_DIR="sailor_analysis_output"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Run SAILOR for A-to-I RNA editing detection
# NOTE: the executable name and flags are illustrative and have not been verified against
# SAILOR's documented interface; coverage, base-quality, and mapping-quality thresholds
# should be set according to the SAILOR documentation.
SAILOR -i "${INPUT_BAM}" -r "${REFERENCE_FASTA}" -o "${OUTPUT_DIR}"
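Steps 3 and 4 differ only in the base change being called. In sequencing reads, uridine appears as T and inosine is read as G, so a C-to-U edit shows up as a C>T mismatch and an A-to-I edit as an A>G mismatch on plus-strand transcripts; on minus-strand transcripts the same edits appear against the reference as G>A and T>C. The lookup below is a small sketch of that mapping. The function name expected_mismatch is hypothetical, and the "ct" and "ai" labels are borrowed from the supplementary file naming in the source.

```bash
# Hypothetical helper: print the reference>read mismatch expected on the genome's plus strand
# for a given edit type and transcript strand.
expected_mismatch() {
    # $1: edit type ("ct" for C-to-U, "ai" for A-to-I); $2: transcript strand ("+" or "-")
    case "$1:$2" in
        ct:+) echo "C>T" ;;
        ct:-) echo "G>A" ;;   # complement of C>T
        ai:+) echo "A>G" ;;
        ai:-) echo "T>C" ;;   # complement of A>G
        *)    echo "unknown edit type or strand: $1 $2" >&2; return 1 ;;
    esac
}

# Example call:
# expected_mismatch ct -
```

This is why strand information (from the library type and the gene annotation) is needed before a mismatch can be attributed to one editing enzyme or the other.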
Raw Source Text
remove adapter with Cutadapt
align to human genome using STAR 2.7.6a
SAILOR analysis of data for C-to-U edits
SAILOR analysis of data for A-to-I edits
Assembly: GRCh38
Supplementary files format and content: cleaned = RBFOX2-rBE data from which the rBE-only edit clusters present in all three replicates were subtracted
Supplementary files format and content: ai = A-to-I edits
Supplementary files format and content: ct = C-to-U edits
Supplementary files format and content: both = both A-to-I and C-to-U edits considered simultaneously
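The "cleaned" files are described as RBFOX2-rBE data with the rBE-only clusters found in all three replicates removed. The source does not say how this was done, so the following is only one possible reading, sketched in portable awk: a background cluster counts as shared when a replicate 1 cluster overlaps at least one cluster in each of replicates 2 and 3, and any sample cluster overlapping a shared background cluster by at least one base is dropped. The function name, the BED-like input (chromosome, start, end in the first three tab-separated columns), the overlap criterion, and ignoring strand are all assumptions. In practice a dedicated interval tool would be the usual choice for large files.

```bash
# Hypothetical helper: remove sample clusters that overlap background clusters shared by
# three background replicates. Inputs are non-empty, tab-separated, BED-like files.
subtract_shared_background() {
    # $1-$3: background (rBE-only) replicate files; $4: sample (RBFOX2-rBE) file
    awk -F '\t' '
        # Half-open intervals on the same chromosome overlap when each starts before the other ends.
        function overlaps(ca, sa, ea, cb, sb, eb) {
            return ca == cb && sa < eb && sb < ea
        }
        FNR == 1 { file++ }                 # count input files (assumes none is empty)
        {
            n[file]++
            i = n[file]
            chrom[file, i] = $1 ""          # force string comparison for chromosome names
            start[file, i] = $2 + 0
            end_[file, i]  = $3 + 0
            if (file == 4) line[i] = $0
        }
        END {
            # Keep replicate 1 clusters that overlap something in replicate 2 and in replicate 3.
            for (i = 1; i <= n[1]; i++) {
                hit2 = 0; hit3 = 0
                for (j = 1; j <= n[2] && !hit2; j++)
                    hit2 = overlaps(chrom[1, i], start[1, i], end_[1, i], chrom[2, j], start[2, j], end_[2, j])
                for (j = 1; j <= n[3] && !hit3; j++)
                    hit3 = overlaps(chrom[1, i], start[1, i], end_[1, i], chrom[3, j], start[3, j], end_[3, j])
                if (hit2 && hit3) shared[++ns] = i
            }
            # Print sample clusters that do not overlap any shared background cluster.
            for (k = 1; k <= n[4]; k++) {
                hit = 0
                for (b = 1; b <= ns && !hit; b++) {
                    i = shared[b]
                    hit = overlaps(chrom[4, k], start[4, k], end_[4, k], chrom[1, i], start[1, i], end_[1, i])
                }
                if (!hit) print line[k]
            }
        }
    ' "$1" "$2" "$3" "$4"
}

# Example call (placeholder file names):
# subtract_shared_background rBE_rep1.bed rBE_rep2.bed rBE_rep3.bed RBFOX2_rBE.bed > RBFOX2_rBE.cleaned.bed
```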