GSE202881 Processing Pipeline
Publication
Pyruvate Kinase M (PKM) binds ribosomes in a poly-ADP ribosylation dependent manner to induce translational stalling. Nucleic Acids Research (2023), PMID 37224531
Dataset
GSE202881: Pyruvate Kinase M (PKM) binds ribosomes in a poly-ADP ribosylation dependent manner to induce translational stalling
Processing Steps
1
Ribosome profiling data were processed using RiboFlow.
Ribo-seq v1.0.0
$ Bash example
# Install Nextflow (if not already installed)
# curl -s https://get.nextflow.io | bash
# mv nextflow /usr/local/bin/

# Example command for running the RiboFlow pipeline.
# NOTE: the pipeline identifier 'riboflow/riboflow' and the --input, --genome,
# --fasta, --gtf and --outdir options are illustrative. They have not been checked
# against RiboFlow itself, so consult the RiboFlow repository for its documented invocation.
# This command assumes you have a samplesheet (e.g., samples.csv or samples.tsv)
# and reference genome files (FASTA, GTF).
# Replace 'path/to/samples.csv', 'path/to/genome.fasta', 'path/to/genome.gtf'
# with the actual paths to your input files and reference data.
# 'GRCh38' is used as a placeholder for the genome assembly name.
# The '-profile docker' or '-profile singularity' is recommended for reproducibility
# and requires Docker or Singularity to be installed and running.
nextflow run riboflow/riboflow -r v1.0.0 \
  -profile docker \
  --input "path/to/samples.csv" \
  --genome "GRCh38" \
  --fasta "path/to/genome.fasta" \
  --gtf "path/to/genome.gtf" \
  --outdir "riboflow_output"
2
We extracted the first 12 nucleotides from the 5' end of the reads using UMI-tools with the following parameters: “umi_tools extract -p "^(?P<umi_1>.{12})(?P<discard_1>.{4}).+$" --extract-method=regex”.
$ Bash example
# Install UMI-tools (example using conda)
# conda install -c bioconda umi-tools

# Define input and output files (placeholders)
INPUT_FASTQ="input_read.fastq.gz"
OUTPUT_FASTQ="output_read.fastq.gz"

# Extract UMIs from the 5' end of reads using the specified regex pattern.
# The pattern captures the first 12 bases as the UMI (named 'umi_1') and
# the subsequent 4 bases as a discardable sequence (named 'discard_1').
# The UMI is then appended to the read header and removed from the read sequence.
# The pattern is single-quoted so the shell passes it to umi_tools unchanged.
umi_tools extract \
  --extract-method=regex \
  -p '^(?P<umi_1>.{12})(?P<discard_1>.{4}).+$' \
  -I "${INPUT_FASTQ}" \
  -S "${OUTPUT_FASTQ}"
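To make the extraction pattern concrete, the sketch below applies the same split to a made-up read using only POSIX shell and awk. Because the pattern is anchored at the read start and both groups have fixed widths, slicing at offsets 12 and 16 gives the same three pieces. The function name split_read and the toy sequence are assumptions for illustration and are not part of the published pipeline.

```shell
# Split a read into: UMI (first 12 nt), discarded bases (next 4 nt), remaining insert.
# Mirrors the regex ^(.{12})(.{4}).+$ which needs at least 17 characters to match.
split_read() {
  printf '%s\n' "$1" | awk '{
    if (length($0) < 17) exit 1          # too short: the regex would not match
    print substr($0, 1, 12) "\t" substr($0, 13, 4) "\t" substr($0, 17)
  }'
}

# Toy read (hypothetical): 12-nt UMI + 4 nt from reverse transcription + 21-nt insert
split_read "ACGTACGTACGTTTTTGGGCCCAAATTTGGGCCCAAA"
```

The three tab-separated fields correspond to what umi_tools moves into the read name, what it drops, and what stays as the read sequence.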
3
The four nucleotides downstream of the UMIs are discarded as they are incorporated during the reverse transcription step.
$ Bash example
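# Install cutadapt if not already installed
# conda install -c bioconda cutadapt

# NOTE: the 'discard_1' group in the umi_tools pattern of the previous step already
# removes these four nucleotides. Run this command only if the UMIs were extracted
# with a pattern that has no discard group; otherwise four footprint bases would be lost.
# This command assumes that the UMIs have already been extracted or handled in a preceding step,
# and the four nucleotides to be discarded are now at the 5' end of the input reads.
# It trims 4 bases from the 5' end of the reads.
cutadapt -u 4 -o trimmed_reads.fastq.gz input_reads.fastq.gz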
4
Next, we used cutadapt to clip the 3' adapter AAAAAAAAAACAAAAAAAAAA.
$ Bash example
# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt

# Clip the 3' adapter from a FASTQ file
# Replace 'input.fastq.gz' with your actual input file and 'output.fastq.gz' with your desired output file.
cutadapt -a AAAAAAAAAACAAAAAAAAAA -o output.fastq.gz input.fastq.gz
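As a rough illustration of what 3' adapter clipping does to a single read, the sketch below removes the adapter and everything after it using plain shell parameter expansion. This is a deliberate simplification: it only handles an exact, full-length adapter match, whereas cutadapt also tolerates mismatches and partial adapters at the read end. The function name clip_adapter and the toy read are hypothetical.

```shell
# Remove the first exact occurrence of the adapter and everything downstream of it.
# Reads without the adapter are returned unchanged.
clip_adapter() {
  seq="$1"
  adapter="${2:-AAAAAAAAAACAAAAAAAAAA}"   # adapter sequence from the source text
  clipped=${seq%%"$adapter"*}
  printf '%s\n' "$clipped"
}

# Toy read (hypothetical): footprint + adapter + trailing bases
clip_adapter "GGGCCCAAATTTGGGCCCAAAAAAAAAACAAAAAAAAAATTGG"
```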
5
After UMI extraction and adapter trimming, ribosomal and transfer RNAs were filtered by alignment using Bowtie2.
$ Bash example
# Install Bowtie2 (if not already installed)
# conda install -c bioconda bowtie2

# Define variables
INPUT_FASTQ="trimmed_reads.fastq.gz"               # Input FASTQ file after UMI extraction and adapter trimming
OUTPUT_FASTQ="filtered_rRNA_tRNA_reads.fastq.gz"   # Output FASTQ file containing reads with ribosomal and transfer RNAs removed
RRNA_TRNA_INDEX_BASE="rRNA_tRNA_index"             # Basename for the Bowtie2 index of ribosomal and transfer RNA sequences
NUM_THREADS=8                                      # Number of threads to use for alignment

# --- Reference Data Preparation (Example) ---
# For human (e.g., hg38), ribosomal and transfer RNA sequences can be obtained from various sources:
# - UCSC Genome Browser: Specific RNA files or extracted from repeatmasker tracks.
# - NCBI RefSeq: Individual rRNA (e.g., NR_003286.2 for 18S, NR_003287.2 for 28S) and tRNA sequences.
# - Rfam database: Comprehensive collection of RNA families (e.g., RF00001 for 5S rRNA, RF00005 for tRNA).
# - A custom combined FASTA file of known ribosomal and transfer RNAs relevant to the organism being studied.
#
# Example command to build the Bowtie2 index (uncomment and modify if needed):
# # Assuming you have a combined FASTA file named 'combined_rRNA_tRNA.fa'
# # cat human_rRNA.fa human_tRNA.fa > combined_rRNA_tRNA.fa
# bowtie2-build combined_rRNA_tRNA.fa ${RRNA_TRNA_INDEX_BASE}

# Run Bowtie2 to filter ribosomal and transfer RNAs
# Reads that align to the rRNA/tRNA index are considered ribosomal/transfer RNAs and are discarded.
# Reads that do NOT align to the rRNA/tRNA index are kept and written to the output file.
bowtie2 \
  -x "${RRNA_TRNA_INDEX_BASE}" \
  -U "${INPUT_FASTQ}" \
  --un-gz "${OUTPUT_FASTQ}" \
  -S /dev/null \
  -p "${NUM_THREADS}" \
  --very-fast
# Using a fast preset like --very-fast is common for filtering steps
# to quickly identify and remove obvious matches. Other presets like
# --fast, --sensitive, or --very-sensitive can be used depending on
# the desired stringency and computational resources.
6
The remaining reads were mapped to the human transcriptome, and alignments with mapping quality greater than two were retained.
$ Bash example
# Install STAR and Samtools if not already available
# conda install -c bioconda star samtools

# NOTE: the source text does not name the aligner used for this step.
# STAR and the filter settings below are assumptions of this example.

# Define variables
READS_FASTQ="remaining_reads.fastq.gz"   # Placeholder for input reads (e.g., output of the rRNA/tRNA filtering step)
STAR_INDEX_DIR="/path/to/STAR_human_genome_and_transcriptome_index"   # Placeholder for STAR index built from human genome FASTA and GTF
OUTPUT_PREFIX="aligned_reads"            # Prefix for output files
THREADS=8                                # Adjust as needed for available CPU cores

# 1. Map reads to the human transcriptome (genome with GTF-guided splicing)
# --readFilesCommand zcat: required because the input FASTQ is gzip-compressed.
# --outFilterMultimapNmax 1: Retain only uniquely mapping reads.
# --outFilterMismatchNmax 3: Allow up to 3 mismatches.
# --outFilterScoreMinOverLread 0.6 and --outFilterMatchNminOverLread 0.6: Ensure good alignment quality relative to read length.
STAR --genomeDir "${STAR_INDEX_DIR}" \
  --readFilesIn "${READS_FASTQ}" \
  --readFilesCommand zcat \
  --runThreadN "${THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outFilterMultimapNmax 1 \
  --outFilterMismatchNmax 3 \
  --outFilterScoreMinOverLread 0.6 \
  --outFilterMatchNminOverLread 0.6 \
  --outReadsUnmapped Fastx \
  --outSAMattributes All \
  --limitBAMsortRAM 30000000000   # Adjust RAM as needed (e.g., 30GB)

# 2. Retain alignments with mapping quality greater than two (MAPQ > 2, which means MAPQ >= 3)
samtools view -b -h -q 3 "${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam" > "${OUTPUT_PREFIX}.filtered.bam"

# 3. Index the filtered BAM file for downstream processing
samtools index "${OUTPUT_PREFIX}.filtered.bam"
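The "mapping quality greater than two" criterion can be checked on plain SAM text without any aligner installed. The sketch below is a minimal stand-in for the samtools filter, assuming tab-separated SAM records with MAPQ in column 5; the function name filter_mapq and the three toy alignments are made up for illustration.

```shell
# Keep SAM header lines and alignments whose MAPQ (column 5) is strictly
# greater than the given threshold (default 2, as stated in the source text).
filter_mapq() {
  awk -F'\t' -v q="${1:-2}" '
    /^@/ { print; next }                 # pass header lines through
    ($5 + 0) > (q + 0) { print }         # numeric comparison on MAPQ
  '
}

# Toy SAM input (hypothetical): r1 has MAPQ 42, r2 has MAPQ 1, r3 has MAPQ 2
printf '@SQ\tSN:tx1\tLN:1000\nr1\t0\ttx1\t10\t42\t30M\t*\t0\t0\t*\t*\nr2\t0\ttx1\t20\t1\t30M\t*\t0\t0\t*\t*\nr3\t0\ttx1\t30\t2\t30M\t*\t0\t0\t*\t*\n' \
  | filter_mapq 2
```

Only the header and r1 pass, since a MAPQ of exactly 2 does not satisfy "greater than two".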
7
UMIs were used for deduplication, and .ribo files were created using RiboPy.
RiboPy v0.1.1 (tool and version inferred with models/gemini-2.5-flash)
$ Bash example
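# Install RiboPy (if not already installed)
# conda create -n ribopy_env python=3.8
# conda activate ribopy_env
# pip install ribopy

# NOTE: the subcommand and options below are illustrative placeholders inferred for
# this example. They have not been checked against the RiboPy command-line interface,
# so consult the RiboPy documentation before running.
# Assuming 'aligned_reads_with_umis.bam' is the input BAM file with UMIs in the 'RX' tag.
# umi_tools extract itself appends the UMI to the read name rather than writing a tag,
# so an 'RX' tag would have to be added in a separate step.
# 'sample_name' is the prefix for the output .ribo file.

# Create .ribo file with UMI deduplication
ribopy count \
  --bam aligned_reads_with_umis.bam \
  --output sample_name.ribo \
  --umi-tag RX \
  --dedup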
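The idea behind UMI-based deduplication can be sketched on a small table without any sequencing tools. The example below assumes, as a simplification, that two reads are PCR duplicates when they share the same transcript, the same 5' position, and an identical UMI; it does not model error-tolerant UMI grouping, and it is not the deduplication code used by the published pipeline. The function name dedup_umi and the toy records are hypothetical.

```shell
# Input: tab-separated lines of transcript, 5' position, UMI.
# Output: the first record seen for each (transcript, position, UMI) combination.
dedup_umi() {
  awk -F'\t' '!seen[$1 FS $2 FS $3]++'
}

# Toy records (hypothetical): the second line duplicates the first;
# the third differs only by UMI and the fourth only by position, so both are kept.
printf 'tx1\t100\tACGTACGTACGT\ntx1\t100\tACGTACGTACGT\ntx1\t100\tTTTTACGTACGT\ntx1\t103\tACGTACGTACGT\n' \
  | dedup_umi
```

Reads at the same position with different UMIs are kept as independent molecules, which is the reason UMIs are used instead of position-only deduplication.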
8
Library strategy: Ribo-seq
$ Bash example
# NOTE: "Library strategy: Ribo-seq" is a metadata field rather than a processing step.
# The source pipeline used RiboFlow, not riboviz. The riboviz configuration keys and
# the command below are illustrative placeholders that have not been checked against
# the riboviz documentation.

# Install riboviz (example using conda)
# conda create -n riboviz_env python=3.8
# conda activate riboviz_env
# pip install riboviz

# Define input and output paths
READS="sample.fastq.gz"   # Placeholder for input Ribo-seq FASTQ file (often single-end)
OUTPUT_DIR="riboviz_output"
CONFIG_FILE="riboviz_config.yaml"

# Define reference datasets (placeholders - replace with actual paths)
# For human (Homo sapiens), common references include hg38.
REFERENCE_GENOME="/path/to/reference/genome/hg38.fa"
GENOME_ANNOTATION="/path/to/reference/annotation/gencode.v38.annotation.gtf"

# Create a dummy riboviz configuration file
# This file defines the analysis parameters, input/output, and reference files.
# For a real analysis, this file would be more detailed.
# Refer to riboviz documentation for comprehensive configuration options:
# https://riboviz.readthedocs.io/en/latest/user_guide/configuration.html
cat << EOF > "${CONFIG_FILE}"
# Example riboviz configuration
# This is a simplified example. A real config would be more extensive.
dir_in: .
dir_out: ${OUTPUT_DIR}
rpf_in:
  - ${READS}
riboviz_datadir: /path/to/riboviz_data # Directory containing pre-built indices or other data
fasta: ${REFERENCE_GENOME}
gtf: ${GENOME_ANNOTATION}
features: CDS
stop_codons: ["TAG", "TAA", "TGA"]
start_codons: ["ATG"]
min_read_length: 10
max_read_length: 50
# Add other parameters like adapter sequences, UMI handling, etc. as needed.
EOF

# Execute riboviz workflow using the generated configuration file
# This command runs the riboviz pipeline, performing trimming, alignment,
# and ribosome footprint analysis based on the configuration.
python -m riboviz.workflow --config "${CONFIG_FILE}"
Raw Source Text
Ribosome profiling data were processed using RiboFlow. We extracted the first 12 nucleotides from the 5' end of the reads using UMI-tools with the following parameters: “umi_tools extract -p "^(?P<umi_1>.{12})(?P<discard_1>.{4}).+$" --extract-method=regex”. The four nucleotides downstream of the UMIs are discarded as they are incorporated during the reverse transcription step. Next, we used cutadapt to clip the 3' adapter AAAAAAAAAACAAAAAAAAAA. After UMI extraction and adapter trimming, ribosomal and transfer RNAs were filtered by alignment using Bowtie2. The remaining reads were mapped to human transcriptome and alignments with mapping quality greater than two were retained. UMIs were used for deduplication and .ribo files are created using RiboPy.
Assembly: hg38
Supplementary files format and content: CSV file containing the RPKM value of detectable transcripts
Library strategy: Ribo-seq
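The supplementary CSV reports RPKM values per transcript. For orientation, the sketch below computes RPKM (reads per kilobase of transcript per million mapped reads) from its standard definition, count × 10^9 / (length in nt × total mapped reads). The source does not describe how the authors computed or filtered these values, so the function name rpkm and the example numbers are assumptions for illustration only.

```shell
# RPKM = read_count * 1e9 / (transcript_length_nt * total_mapped_reads)
# Arguments: read count, transcript length in nucleotides, total mapped reads.
rpkm() {
  awk -v c="$1" -v l="$2" -v n="$3" 'BEGIN { printf "%.2f\n", c * 1e9 / (l * n) }'
}

# Hypothetical transcript: 100 reads on a 1000-nt transcript, 1,000,000 mapped reads in total
rpkm 100 1000 1000000
```

With these toy numbers the transcript gets 100 reads per kilobase in a library of exactly one million reads, so the RPKM is 100.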