GSE232520 Processing Pipeline
3 steps
Publication
High-sensitivity in situ capture of endogenous RNA-protein interactions in fixed cells and primary tissues. Nature Communications (2024), PMID 39152130
Dataset
GSE232520: Expanded palette of RNA base editors for comprehensive RBP-RNA interactome studies
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
remove adapter with Cutadapt
$ Bash example
# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt

# Example for paired-end reads (an assumption; drop -A, -p and the R2 input for single-end data).
# Replace 'input_R1.fastq.gz' and 'input_R2.fastq.gz' with your actual input files.
# Replace the adapter sequences if your library does not use the standard Illumina adapters.
# -a: Adapter sequence for read 1
# -A: Adapter sequence for read 2
# -o: Output file for trimmed read 1
# -p: Output file for trimmed read 2
# -m: Minimum read length after trimming (e.g., 18 bp)
# -q: Trim low-quality ends from reads (e.g., quality score 20)
# --cores: Number of CPU cores to use
# --report=full: Print a full report of the trimming process
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    -o trimmed_R1.fastq.gz \
    -p trimmed_R2.fastq.gz \
    -m 18 \
    -q 20 \
    --cores=4 \
    --report=full \
    input_R1.fastq.gz \
    input_R2.fastq.gz
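A quick sanity check after trimming is to confirm that both output files still contain the same number of reads, since paired-end aligners expect the mates in matching order. The sketch below assumes gzip-compressed FASTQ with exactly four lines per read. The function names `count_fastq_reads` and `check_pairs_in_sync` are hypothetical helpers, not part of Cutadapt.

```shell
#!/usr/bin/env bash

# Count reads in a gzip-compressed FASTQ file.
# Assumes the standard four-lines-per-read layout (no wrapped sequence lines).
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Print both read counts and return success only when they are equal.
check_pairs_in_sync() {
    local n1 n2
    n1="$(count_fastq_reads "$1")"
    n2="$(count_fastq_reads "$2")"
    echo "R1 reads: ${n1}, R2 reads: ${n2}"
    [ "${n1}" = "${n2}" ]
}

# Run the check only if the trimmed files from the Cutadapt step are present.
if [ -f trimmed_R1.fastq.gz ] && [ -f trimmed_R2.fastq.gz ]; then
    check_pairs_in_sync trimmed_R1.fastq.gz trimmed_R2.fastq.gz \
        || echo "Warning: read counts differ between mates" >&2
fi
```

If the counts differ, the trimming step was probably run on each mate separately rather than in paired mode, and it is worth repeating before alignment.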
2
align to genome using bowtie2 or bwa-mem
$ Bash example
# Install Bowtie2 and Samtools (if not already installed)
# conda install -c bioconda bowtie2 samtools

# Define variables
# Replace 'path/to/reference_index' with the basename of your pre-built Bowtie2 index.
# The GEO record lists the assembly as the 12X MS2 stem-loop reporter mRNA, so the index
# is presumably built from the reporter FASTA rather than a whole genome such as GRCh38:
#   bowtie2-build <reporter.fa> path/to/reference_index
GENOME_INDEX="path/to/reference_index"
READ1="trimmed_R1.fastq.gz"  # Trimmed R1 from the Cutadapt step
READ2="trimmed_R2.fastq.gz"  # Trimmed R2 from the Cutadapt step
OUTPUT_SAM="sample_aligned.sam"
OUTPUT_BAM="sample_aligned.bam"
SORTED_BAM="sample_aligned.sorted.bam"

# Align paired-end reads to the reference using Bowtie2
# -x: specify the index basename
# -1: specify the first mate reads file
# -2: specify the second mate reads file
# -S: specify the output SAM file
# --threads: number of threads to use (adjust based on available CPU cores)
bowtie2 -x "${GENOME_INDEX}" -1 "${READ1}" -2 "${READ2}" -S "${OUTPUT_SAM}" --threads 8

# Convert SAM to BAM format and sort the BAM file by coordinate
# samtools view: convert SAM to BAM (-b) and write to file (-o)
# samtools sort: sort BAM file by coordinate and write to file (-o)
# -@: number of additional threads for samtools
samtools view -b -@ 8 -o "${OUTPUT_BAM}" "${OUTPUT_SAM}"
samtools sort -@ 8 -o "${SORTED_BAM}" "${OUTPUT_BAM}"

# Index the sorted BAM file (needed by pysamstats and by viewers such as IGV)
samtools index "${SORTED_BAM}"

# Clean up intermediate files (optional)
# rm "${OUTPUT_SAM}" "${OUTPUT_BAM}"
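The step description also allows bwa-mem as the aligner. The following is a minimal sketch of that alternative, not the authors' confirmed command line: the file names (`reporter.fa`, `trimmed_R1.fastq.gz`, `trimmed_R2.fastq.gz`, `sample_bwa.sorted.bam`) and the helper function `align_with_bwa` are assumptions made for illustration, and the thread count should be adjusted to your machine.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder inputs (assumptions); override via environment variables.
REFERENCE_FA="${REFERENCE_FA:-reporter.fa}"
READ1="${READ1:-trimmed_R1.fastq.gz}"
READ2="${READ2:-trimmed_R2.fastq.gz}"
SORTED_BAM="${SORTED_BAM:-sample_bwa.sorted.bam}"

# Index the reference, align paired-end reads with bwa mem,
# then coordinate-sort and index the result with samtools.
align_with_bwa() {
    local ref="$1" r1="$2" r2="$3" out_bam="$4" threads="${5:-4}"
    bwa index "${ref}"
    bwa mem -t "${threads}" "${ref}" "${r1}" "${r2}" \
        | samtools sort -@ "${threads}" -o "${out_bam}" -
    samtools index "${out_bam}"
}

# Run only when both tools and the reference file are available.
if command -v bwa >/dev/null 2>&1 \
    && command -v samtools >/dev/null 2>&1 \
    && [ -f "${REFERENCE_FA}" ]; then
    align_with_bwa "${REFERENCE_FA}" "${READ1}" "${READ2}" "${SORTED_BAM}" 8
else
    echo "bwa, samtools or ${REFERENCE_FA} not found; skipping alignment" >&2
fi
```

Piping `bwa mem` straight into `samtools sort` avoids writing an intermediate SAM file, which keeps disk usage down for deeply sequenced reporter libraries.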
3
generate count tables along reporter sequence using Pysamstats
$ Bash example
# Install Pysamstats (if not already installed)
# pip install pysamstats
# or
# conda install -c bioconda pysamstats

# Generate per-position count tables along the reporter sequence.
# This command assumes a coordinate-sorted, indexed BAM file and the reference FASTA
# that the reads were aligned to ('reference.fa' is a placeholder name).
# --type variation reports, for every position, the reference base, total depth,
# matches, mismatches and read counts for each of A, C, T and G. This choice is an
# assumption based on the GEO description ("read counts for RNA bases at each
# position"); the record does not state which type was used.
# --fasta is required for the variation type.
# --max-depth raises the pileup depth cap (default 8000), which deep reporter coverage can exceed.
# Use --chromosome, --start and --end to restrict the region, and --fields to select columns.
pysamstats --type variation \
    --fasta reference.fa \
    --max-depth 1000000 \
    input.bam > output_reporter_counts.tsv
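Because the dataset concerns RNA base editors, a natural follow-up is to turn the per-position counts into a mismatch fraction along the reporter. The sketch below is one way to do that, assuming a tab-separated pysamstats table with a header row that includes the columns `chrom`, `pos`, `ref`, `reads_all` and `mismatches` (as produced by the variation type). The function name `summarise_mismatches` and the output file name `mismatch_summary.tsv` are hypothetical.

```shell
#!/usr/bin/env bash

# Print chrom, pos, ref, depth, mismatches and mismatch fraction for each position.
summarise_mismatches() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 {
            # Map column names to indices so the column order does not matter.
            for (i = 1; i <= NF; i++) col[$i] = i
            n = split("chrom pos ref reads_all mismatches", need, " ")
            for (k = 1; k <= n; k++) {
                if (!(need[k] in col)) {
                    print "missing column: " need[k] > "/dev/stderr"
                    exit 1
                }
            }
            print "chrom", "pos", "ref", "reads_all", "mismatches", "mismatch_fraction"
            next
        }
        {
            depth = $(col["reads_all"]) + 0
            mm = $(col["mismatches"]) + 0
            # Report a fraction of zero for uncovered positions instead of dividing by zero.
            frac = (depth > 0) ? mm / depth : 0
            printf "%s\t%s\t%s\t%d\t%d\t%.4f\n", \
                $(col["chrom"]), $(col["pos"]), $(col["ref"]), depth, mm, frac
        }
    ' "$1"
}

# Summarise the table from the pysamstats step if it is present.
if [ -f output_reporter_counts.tsv ]; then
    summarise_mismatches output_reporter_counts.tsv > mismatch_summary.tsv
fi
```

The fraction counts every non-reference base as a mismatch. To isolate a specific edit type, such as C-to-T, one would instead divide the relevant base-count column by `reads_all` at positions with the matching reference base.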
Tools Used
Raw Source Text
remove adapter with Cutadapt
align to genome using bowtie2 or bwa-mem
generate count tables along reporter sequence using Pysamstats
Assembly: 12X MS2 stem-loop reporter mRNA
Supplementary files format and content: pysamstats tables that contain the read counts for RNA bases at each position along the reporter mRNA sequence