GSE240014 Processing Pipeline
RNA-Seq
code_examples
6 steps
Publication
High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues. Nature Communications (2024), PMID 39152130
Dataset
GSE240014: An in situ method for identification of transcriptome-wide protein-RNA interactions in cells [in_situ_STAMP]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Remove adapters with Cutadapt
$ Bash example
# Install Cutadapt (if not already installed)
# conda install -c bioconda cutadapt
# Define input and output file paths (placeholders)
INPUT_FASTQ="input.fastq.gz"
OUTPUT_TRIMMED_FASTQ="output_trimmed.fastq.gz"
# Define a common Illumina adapter sequence (placeholder; adjust if the library's adapter is known).
# This example uses a common Illumina 3' adapter sequence.
ADAPTER_SEQUENCE="AGATCGGAAGAGC"
# Run Cutadapt to remove 3' adapters, quality-trim, and filter by minimum length.
# -a: 3' adapter sequence
# -q 20: trim low-quality bases from the 3' end with a quality cutoff of 20
# --minimum-length 20: discard reads shorter than 20 bases after trimming
# -o: output file for trimmed reads
cutadapt -a "${ADAPTER_SEQUENCE}" \
  -q 20 \
  --minimum-length 20 \
  -o "${OUTPUT_TRIMMED_FASTQ}" \
  "${INPUT_FASTQ}"
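You can sanity-check the trimming step by comparing read counts before and after Cutadapt. Cutadapt also prints its own summary report, and this is an independent check. The following is a minimal sketch that assumes standard four-line FASTQ records. `count_fastq_reads` and `trim_retention` are hypothetical helper names, not part of Cutadapt.

```shell
# Hypothetical helper: count reads in a plain or gzip-compressed FASTQ file.
# Assumes every record spans exactly four lines.
count_fastq_reads() {
  local fq="$1" lines
  if [[ "$fq" == *.gz ]]; then
    lines=$(gzip -dc -- "$fq" | wc -l)
  else
    lines=$(wc -l < "$fq")
  fi
  echo $(( lines / 4 ))
}

# Hypothetical helper: print "<reads_in> <reads_out> <percent_retained>".
trim_retention() {
  local before after
  before=$(count_fastq_reads "$1")
  after=$(count_fastq_reads "$2")
  if (( before == 0 )); then
    echo "$before $after 0.0"
    return
  fi
  awk -v b="$before" -v a="$after" 'BEGIN { printf "%d %d %.1f\n", b, a, 100 * a / b }'
}

# Usage:
# trim_retention input.fastq.gz output_trimmed.fastq.gz
```

A low retention percentage usually points to a wrong adapter sequence or an overly strict length or quality cutoff.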
2
Align reads to hg38 with STAR 2.4.0 (Homo sapiens) or to mm10 with STAR 2.5.2 (Mus musculus)
$ Bash example
# Install STAR (example using conda; use STAR 2.5.2 for mouse/mm10)
# conda install -c bioconda star=2.4.0
# Define variables
GENOME_DIR="/path/to/STAR_index/hg38"  # Placeholder for the hg38 STAR index (use an mm10 index for mouse)
READ1="sample_R1.fastq.gz"  # Placeholder for trimmed read 1 FASTQ
READ2="sample_R2.fastq.gz"  # Placeholder for trimmed read 2 FASTQ (remove if single-end)
OUTPUT_PREFIX="sample_aligned"
NUM_THREADS=8  # Number of threads to use
# Create the output directory if it doesn't exist
mkdir -p "${OUTPUT_PREFIX}_dir"
# Run STAR alignment for paired-end reads.
# --readFilesCommand zcat is needed because the input FASTQ files are gzip-compressed.
# --limitBAMsortRAM: 31 GB for BAM sorting in this example (adjust to available RAM).
STAR --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ1}" "${READ2}" \
  --readFilesCommand zcat \
  --runThreadN "${NUM_THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}_dir/${OUTPUT_PREFIX}_" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outSAMattributes Standard \
  --outFilterType BySJout \
  --outFilterMultimapNmax 20 \
  --alignSJDBoverhangMin 1 \
  --alignSJoverhangMin 8 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --limitBAMsortRAM 31000000000
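This step pairs each genome build with a specific STAR release. A small lookup keeps the two from being mixed up, and it is safest to build each index with the STAR release that aligns against it. The following is a minimal sketch. The function name and index paths are hypothetical placeholders; only the build/version pairs come from this step.

```shell
# Hypothetical helper: map a species to "<build> <STAR version> <index dir>".
# Build/version pairs follow this step; the index paths are placeholders.
star_config_for_species() {
  case "$1" in
    "Homo sapiens"|human)
      echo "hg38 2.4.0 /path/to/STAR_index/hg38" ;;
    "Mus musculus"|mouse)
      echo "mm10 2.5.2 /path/to/STAR_index/mm10" ;;
    *)
      echo "unsupported species: $1" >&2
      return 1 ;;
  esac
}

# Usage:
# read -r build star_version genome_dir < <(star_config_for_species "Mus musculus")
# echo "Align against ${build} (${genome_dir}) with STAR ${star_version}"
```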
3
Run SAILOR to call C-to-U edits, keeping only sites with score > 0.5 and edit fraction < 80%
SAILOR v0.1.0
$ Bash example
# Install SAILOR (if not already installed)
# git clone https://github.com/YeoLab/sailor.git
# Follow the repository README for installation and for SAILOR's actual entry point.
# SAILOR was developed for A-to-I editing; confirm the run is configured for C-to-T (C-to-U) changes.
# Placeholder inputs (replace with real paths)
INPUT_BAM="sample_aligned_dir/sample_aligned_Aligned.sortedByCoord.out.bam"
REFERENCE_FASTA="hg38.fa"  # e.g. hg38.fa for human, mm10.fa for mouse
OUTPUT_PREFIX="sample_sailor"
# Hypothetical invocation: the script name and the -i/-r/-o/-s/-f flags are placeholders,
# not SAILOR's documented interface. The thresholds mirror the criteria above
# (score > 0.5, edit fraction < 80%); if SAILOR does not apply them itself,
# filter its ranked BED output afterwards.
python SAILOR.py \
  -i "${INPUT_BAM}" \
  -r "${REFERENCE_FASTA}" \
  -o "${OUTPUT_PREFIX}" \
  -s 0.5 \
  -f 0.8
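If the SAILOR run does not apply the score and edit-fraction thresholds itself, you can apply them to its ranked BED output with `awk`. This sketch ASSUMES that column 4 holds `coverage|ref>alt|edit_fraction` and column 5 holds the confidence score. Check this layout against your own SAILOR output before relying on the sketch. `filter_sailor_sites` is a hypothetical helper name.

```shell
# Hypothetical helper: keep sites with score > min_score and edit fraction < max_fraction.
# ASSUMED layout: col 4 = "coverage|ref>alt|edit_fraction", col 5 = confidence score.
filter_sailor_sites() {
  local bed="$1" min_score="${2:-0.5}" max_fraction="${3:-0.8}"
  awk -v s="$min_score" -v f="$max_fraction" '
    BEGIN { FS = OFS = "\t" }
    {
      n = split($4, parts, "|")   # last element is taken as the edit fraction
      if (n >= 3 && $5 + 0 > s + 0 && parts[n] + 0 < f + 0) print
    }' "$bed"
}

# Usage:
# filter_sailor_sites sample.ranked.bed > sample.ranked.filtered.bed
```

Both comparisons are strict, matching "> 0.5" and "< 80%". A site with a score of exactly 0.5 or an edit fraction of exactly 0.8 is dropped.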
4
Run FLARE to call C-to-U edit clusters
$ Bash example
# Install FLARE (if not already available in the environment)
# git clone https://github.com/yeolab/flare.git
# Follow the repository README for installation and for FLARE's actual entry point.
# Define input and output paths (placeholders)
INPUT_BAM="aligned_reads.bam"  # Replace with your aligned BAM file
REFERENCE_GENOME="hg38.fa"  # Must match the build and chromosome naming used for alignment (hg38 or mm10)
OUTPUT_DIR="flare_output"
CHROM_SIZES="hg38.chrom.sizes"  # Chromosome sizes for the same build (e.g. from UCSC)
# Create the output directory
mkdir -p "${OUTPUT_DIR}"
# Hypothetical invocation: the script path and flags below are placeholders, not FLARE's
# documented interface; map them to the options in the FLARE documentation.
# -i: input BAM file
# -g: reference genome FASTA
# -o: output directory
# -c: chromosome sizes file
# -m: minimum coverage (e.g. 10 reads)
# -q: minimum base quality (e.g. 20)
# -e: minimum edit fraction (e.g. 0.1, i.e. at least 10% of reads show the edit)
# -s: strand-specific mode (use only for strand-specific libraries, e.g. dUTP)
# The supplementary file name "*merged_sorted_peaks.fdr_0.1.d_15.scored.tsv" suggests
# this run used an FDR of 0.1 and a parameter d of 15.
python flare/flare.py \
  -i "${INPUT_BAM}" \
  -g "${REFERENCE_GENOME}" \
  -o "${OUTPUT_DIR}" \
  -c "${CHROM_SIZES}" \
  -m 10 \
  -q 20 \
  -e 0.1 \
  -s
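The following sketch groups nearby edit sites on the same chromosome and strand, which shows concretely what an edit cluster is. It is a simplified proximity merge, NOT FLARE's algorithm. FLARE scores clusters statistically, as the `fdr_0.1` in its output name suggests. The function name, the BED6 input layout, and the 15 bp default gap are illustrative assumptions. On sorted input, `bedtools merge` with `-s` and `-d` performs a comparable strand-aware merge.

```shell
# Hypothetical helper: proximity clustering of edit sites (BED6, strand in col 6).
# A simplified stand-in for illustration only; it is NOT FLARE's method.
cluster_edit_sites() {
  local bed="$1" max_gap="${2:-15}"
  LC_ALL=C sort -t $'\t' -k1,1 -k6,6 -k2,2n "$bed" |
  awk -v d="$max_gap" '
    BEGIN { FS = OFS = "\t" }
    # Emit the current cluster: chrom, start, end, name, number of sites, strand.
    function flush() {
      if (n > 0) { id++; print chrom, start, end, "cluster_" id, n, strand }
    }
    {
      if (n > 0 && $1 == chrom && $6 == strand && $2 - end <= d + 0) {
        if ($3 + 0 > end) end = $3 + 0   # extend the open cluster
        n++
      } else {
        flush()                          # close the previous cluster, start a new one
        chrom = $1; strand = $6; start = $2 + 0; end = $3 + 0; n = 1
      }
    }
    END { flush() }'
}

# Usage:
# cluster_edit_sites sample.ranked.filtered.bed 15 > sample.clusters.bed
```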
5
Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"
$ Bash example
# Set up a Python environment (if needed)
# conda create -n merge_peaks_env python=3.8
# conda activate merge_peaks_env
# conda install -c bioconda pybedtools
# Clone the merge_peaks repository if not already present
# git clone https://github.com/yeolab/merge_peaks.git
# cd merge_peaks
# Hypothetical invocation: intersect_peaks.py and its --input_files/--min_replicates/--output_file
# options are placeholders; check the repository for the actual script and arguments.
# Assumes the per-replicate edit-cluster BED files are rep1_clusters.bed, rep2_clusters.bed
# and rep3_clusters.bed, and that the script is in the current directory or on PATH.
python intersect_peaks.py \
  --input_files rep1_clusters.bed rep2_clusters.bed rep3_clusters.bed \
  --min_replicates 3 \
  --output_file confident_peaks.bed
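The replicate-intersection logic keeps a cluster from the first replicate only if it overlaps a cluster in every other replicate. The sketch below does this with plain `awk`. The loop is quadratic in the number of clusters, so it is suitable only for illustration. On real data, chaining `bedtools intersect -u -s` over the replicates gives an equivalent filter. `intersect_replicates` is a hypothetical helper, and strand is assumed to be in column 6.

```shell
# Hypothetical helper: print clusters from the first BED file that overlap (same
# chromosome and strand, >= 1 bp) at least one cluster in EVERY other BED file.
# Quadratic awk loop: for illustration only.
intersect_replicates() {
  local first="$1" rep current next
  shift
  current=$(mktemp)
  cp -- "$first" "$current"
  for rep in "$@"; do
    next=$(mktemp)
    awk '
      BEGIN { FS = OFS = "\t" }
      # First file: load the replicate clusters into arrays.
      NR == FNR { n++; c[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; st[n] = $6; next }
      # Second file: keep a surviving cluster if it overlaps any loaded cluster.
      {
        for (i = 1; i <= n; i++)
          if (c[i] == $1 && st[i] == $6 && s[i] < $3 + 0 && $2 + 0 < e[i]) { print; break }
      }' "$rep" "$current" > "$next"
    mv -- "$next" "$current"
  done
  cat -- "$current"
  rm -f -- "$current"
}

# Usage:
# intersect_replicates rep1_clusters.bed rep2_clusters.bed rep3_clusters.bed > confident_peaks.bed
```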
6
Subtract the Buffer-only control from the STAMP confident clusters, which yields "*cleaned_confident_peaks.bed"
$ Bash example
# Install bedtools (if not already installed)
# conda install -c bioconda bedtools
# Placeholder inputs (replace with actual file paths from your pipeline)
STAMP_CONFIDENT_CLUSTERS_BED="stamp_confident_clusters.bed"
BUFFER_ONLY_CONTROL_BED="buffer_only_control.bed"
# Output file name as specified
CLEANED_CONFIDENT_PEAKS_BED="cleaned_confident_peaks.bed"
# Subtract the Buffer-only control regions from the STAMP confident clusters.
# -a: features to subtract from (STAMP clusters)
# -b: features to subtract (Buffer-only control)
# By default, bedtools subtract removes only the overlapping bases, so a partially
# overlapping cluster survives in shortened (or split) form; add -A to drop any
# cluster that overlaps the control at all.
bedtools subtract -a "${STAMP_CONFIDENT_CLUSTERS_BED}" -b "${BUFFER_ONLY_CONTROL_BED}" > "${CLEANED_CONFIDENT_PEAKS_BED}"
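To see how much the Buffer-only control removes, compare feature counts before and after subtraction. The following is a minimal sketch with hypothetical helper names. Without `-A`, `bedtools subtract` can split one cluster into two, so the "removed" percentage is most meaningful for output produced with `-A`.

```shell
# Hypothetical helper: count BED features, skipping track/browser/comment lines.
count_bed_features() {
  awk '!/^(track|browser|#)/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: report "<before> -> <after> features (<pct>% removed)".
subtraction_summary() {
  local before after
  before=$(count_bed_features "$1")
  after=$(count_bed_features "$2")
  awk -v b="$before" -v a="$after" 'BEGIN {
    pct = (b > 0) ? 100 * (b - a) / b : 0
    printf "%d -> %d features (%.1f%% removed)\n", b, a, pct
  }'
}

# Usage:
# subtraction_summary stamp_confident_clusters.bed cleaned_confident_peaks.bed
```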
Raw Source Text
remove adapter with Cutadapt
align to hg38 using STAR 2.4.0 (Homo sapiens) or mm10 using STAR 2.5.2 (Mus musculus)
SAILOR analysis to call C-to-U edits and keep only sites with score >0.5 and edit fraction <80%
FLARE analysis to call C-to-U edit clusters
Intersect the edit clusters from 3 replicates, which yields "*confident_peaks.bed"
Subtract STAMP confident clusters to Buffer only control, which yields "*cleaned_confident_peaks.bed"
Assembly: hg38
Assembly: mm10
Supplementary files format and content: SAILOR step yields bed file: *0.5Score0.8Fraction.fastqTr.sorted.STARUnmapped.out.sorted.STARAligned.out.sorted.bam.combined.readfiltered.formatted.varfiltered.snpfiltered.ranked.bed
Supplementary files format and content: FLARE step yields .tsv file: "*merged_sorted_peaks.fdr_0.1.d_15.scored.tsv"
Supplementary files format and content: Intersection step yields .bed file: "*confident_peaks.bed"
Supplementary files format and content: Subtraction step yields .bed file: "*cleaned_confident_peaks.bed"
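The supplementary file suffixes listed in the GEO record can be used to sort a results directory by step. One pitfall is that the glob `*confident_peaks.bed` also matches `*cleaned_confident_peaks.bed`, so the more specific pattern has to be tested first. The following is a minimal sketch. `classify_outputs` is a hypothetical helper, and the long SAILOR file name is shortened to the pattern `*0.5Score0.8Fraction*.ranked.bed`.

```shell
# Hypothetical helper: label files in a results directory by pipeline step,
# using the supplementary-file suffixes from the GEO record.
classify_outputs() {
  local f name
  for f in "$1"/*; do
    [[ -f "$f" ]] || continue
    name="${f##*/}"
    # Most specific pattern first: "*confident_peaks.bed" would also match cleaned files.
    case "$name" in
      *cleaned_confident_peaks.bed) printf 'subtraction\t%s\n' "$name" ;;
      *confident_peaks.bed) printf 'intersection\t%s\n' "$name" ;;
      *merged_sorted_peaks.fdr_0.1.d_15.scored.tsv) printf 'flare\t%s\n' "$name" ;;
      *0.5Score0.8Fraction*.ranked.bed) printf 'sailor\t%s\n' "$name" ;;
    esac
  done
}

# Usage:
# classify_outputs results | sort
```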