GSE277161 Processing Pipeline

RNA-Seq, 4 steps with code examples

Publication

Integrated multi-omics analysis of zinc-finger proteins uncovers roles in RNA regulation.

Molecular Cell (2024), PMID 39303722

Dataset

Integrated multi-omics analysis of zinc finger proteins uncovers roles in RNA regulation [Ribo-STAMP cell lines]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

We aligned reads to the human genome version GRCh38 with annotation version Gencode v40 using STAR (v2.7.1a).

STAR v2.7.1a GitHub

$ Bash example

# Install STAR (e.g., using Bioconda)
# conda install -c bioconda star=2.7.1a

# --- Reference Data Setup ---
# The human genome GRCh38 and Gencode v40 annotation are required to build the STAR index.
# Example commands to download and build the index (run once):
# mkdir -p /path/to/STAR_genome_index_GRCh38_Gencode_v40
# cd /path/to/STAR_genome_index_GRCh38_Gencode_v40
# wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_40/GRCh38.primary_assembly.genome.fa.gz
# wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_40/gencode.v40.annotation.gtf.gz
# gunzip GRCh38.primary_assembly.genome.fa.gz
# gunzip gencode.v40.annotation.gtf.gz
#
# STAR --runThreadN 8 --runMode genomeGenerate \
#      --genomeDir /path/to/STAR_genome_index_GRCh38_Gencode_v40 \
#      --genomeFastaFiles GRCh38.primary_assembly.genome.fa \
#      --sjdbGTFfile gencode.v40.annotation.gtf \
#      --sjdbOverhang 100 # Set to (maximum read length - 1); 100 suits 101-nt reads

# --- Alignment Step ---
# Define input files and genome directory
INPUT_READS_R1="sample_R1.fastq.gz" # Replace with your actual R1 FASTQ file
INPUT_READS_R2="sample_R2.fastq.gz" # Replace with your actual R2 FASTQ file (if paired-end)
GENOME_INDEX_DIR="/path/to/STAR_genome_index_GRCh38_Gencode_v40" # Path to your pre-built STAR index
OUTPUT_PREFIX="aligned_reads_"
NUM_THREADS=8 # Number of threads to use

# Align reads to the human genome (GRCh38 with Gencode v40 annotation)
STAR --runThreadN "${NUM_THREADS}" \
     --genomeDir "${GENOME_INDEX_DIR}" \
     --readFilesIn "${INPUT_READS_R1}" "${INPUT_READS_R2}" \
     --readFilesCommand zcat \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outSAMtype BAM SortedByCoordinate \
     --outBAMcompression 6 \
     --limitBAMsortRAM 30000000000 # Bytes of RAM for BAM sorting (about 30 GB); adjust to your machine
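
The index-building comments above suggest setting --sjdbOverhang to the read length minus 1. The following is a minimal sketch of how that value could be derived from a FASTQ file. It assumes a standard four-line FASTQ layout; the demo file and the variable names (WORKDIR, FASTQ, MAX_LEN, SJDB_OVERHANG) are hypothetical stand-ins, not part of the published pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo input: a tiny gzipped FASTQ with reads of length 10 and 8.
# Replace FASTQ with one of your own read files.
WORKDIR="$(mktemp -d)"
FASTQ="${WORKDIR}/demo_R1.fastq.gz"
printf '@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n' \
    | gzip -c > "${FASTQ}"

# Sequence lines are every 4th line starting at line 2; track the longest one.
MAX_LEN="$(gzip -dc "${FASTQ}" \
    | awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) } END { print max + 0 }')"

# Overhang is the maximum read length minus 1.
SJDB_OVERHANG=$(( MAX_LEN - 1 ))

echo "max read length: ${MAX_LEN}; suggested --sjdbOverhang: ${SJDB_OVERHANG}"
```

For large files, scanning only the first few thousand reads (for example by piping through head) is usually enough to estimate the read length.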

BAM files were then filtered to retain only read 1 (first-in-pair) alignments using samtools (v1.16) with the option "view -hbf 64".

samtools v1.16 GitHub

$ Bash example

# Input BAM: the coordinate-sorted alignments to be filtered (e.g., STAR output).
INPUT_BAM="input.bam"

# Output BAM: will contain only read 1 (first-in-pair) alignments.
OUTPUT_BAM="read1_filtered.bam"

# -h: include the header; -b: write BAM; -f 64: keep only alignments with
# SAM flag bit 0x40 (first read in pair) set.
samtools view -hbf 64 "${INPUT_BAM}" > "${OUTPUT_BAM}"
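
To make the effect of "-f 64" concrete, here is a small sketch of the bit test that the filter applies to each alignment's SAM FLAG field. The helper name is_read1 and the example flag values are illustrative assumptions; the snippet does not call samtools.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds when SAM flag bit 0x40 (64, "first in pair")
# is set, which is the condition "samtools view -f 64" requires.
is_read1() {
    local flag="$1"
    (( (flag & 64) != 0 ))
}

# Common paired-end flag values: 99 and 83 are read 1; 147 and 163 are read 2.
for flag in 99 147 83 163; do
    if is_read1 "${flag}"; then
        echo "flag ${flag}: kept (read 1)"
    else
        echo "flag ${flag}: dropped"
    fi
done
```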

C-to-U edit sites were obtained using SAILOR.

SAILOR v1.0

$ Bash example

# SAILOR is developed by the Yeo lab. Verify the repository URL before cloning.
# git clone https://github.com/YeoLab/sailor.git
# cd sailor

# Install and run SAILOR as described in its own documentation.
# NOTE: the command below is a hypothetical placeholder, not SAILOR's documented
# interface. The script name and the -i/-r/-o options are illustrative only, and
# the C-to-U settings used by the authors are not given in the source description.
# Replace 'read1_filtered.bam' with the read1-filtered BAM from the previous step.
# Replace 'GRCh38.primary_assembly.genome.fa' with the reference FASTA used for alignment.
# Replace 'c_to_u_edits' with your desired output prefix.
python sailor.py -i read1_filtered.bam -r GRCh38.primary_assembly.genome.fa -o c_to_u_edits

Edits were divided by the featureCounts (v1.5.2) output for each gene's exons to generate EPR values based on GENCODE v40 annotations.

featureCounts v1.5.2 GitHub

$ Bash example

# Install featureCounts (part of Subread package)
# conda install -c bioconda subread

# Define variables
# Placeholder for GENCODE v40 annotations. Replace with the actual path to your GTF file.
ANNOTATION_GTF="/path/to/gencode.v40.annotation.gtf"
# Placeholder for the input BAM file containing aligned reads.
INPUT_BAM="input_aligned_reads.bam"
# Output file for gene exon counts.
OUTPUT_FILE="gene_exon_counts.txt"

# Execute featureCounts to count reads over exons for each gene.
# -a: Specify the annotation file (GTF/GFF).
# -o: Specify the output file for counts.
# -F GTF: Specify that the annotation file is in GTF format.
# -t exon: Count features of type 'exon'.
# -g gene_id: Aggregate counts by 'gene_id' (i.e., sum exon counts for each gene).
# Note: Strandedness (-s 0/1/2) is not specified in the description. Adjust if your data is stranded.
featureCounts -a "${ANNOTATION_GTF}" -o "${OUTPUT_FILE}" -F GTF -t exon -g gene_id "${INPUT_BAM}"
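
The step description says edits were divided by the per-gene featureCounts output to give EPR (edits-per-read) values. The sketch below shows one way that division could be done with awk. It relies on the standard featureCounts output layout (a "#" comment line, a header row, and the count for the first BAM in column 7). The per-gene edit table (gene_id, tab, edit count), the demo values, and all file names are hypothetical; how the authors totalled edits per gene is not stated in the source.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo inputs standing in for real pipeline outputs.
WORKDIR="$(mktemp -d)"
COUNTS="${WORKDIR}/gene_exon_counts.txt"   # featureCounts-style table
EDITS="${WORKDIR}/edits_per_gene.tsv"      # assumed format: gene_id<TAB>n_edits
EPR_OUT="${WORKDIR}/epr_per_gene.tsv"

printf '# Program:featureCounts\n' > "${COUNTS}"
printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n' >> "${COUNTS}"
printf 'GENE_A\tchr1\t1\t1000\t+\t1000\t200\n' >> "${COUNTS}"
printf 'GENE_B\tchr2\t1\t500\t-\t500\t0\n' >> "${COUNTS}"
printf 'GENE_C\tchr3\t1\t800\t+\t800\t50\n' >> "${COUNTS}"
printf 'GENE_A\t10\nGENE_C\t5\n' > "${EDITS}"

# Output columns: gene_id, edits, reads, EPR (edits / reads).
# Genes with zero reads get "NA" rather than a division by zero.
awk -F'\t' -v OFS='\t' '
    NR == FNR { edits[$1] = $2; next }     # first file: load edit counts
    /^#/ || $1 == "Geneid" { next }         # skip comment line and header
    {
        n = ($1 in edits) ? edits[$1] : 0
        epr = ($7 > 0) ? sprintf("%.4f", n / $7) : "NA"
        print $1, n, $7, epr
    }
' "${EDITS}" "${COUNTS}" > "${EPR_OUT}"

cat "${EPR_OUT}"
```

With the demo values, GENE_A has 10 edits over 200 reads (EPR 0.0500), GENE_C has 5 over 50 (0.1000), and GENE_B is reported as NA because it has no reads.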

Tools Used

STAR samtools SAILOR featureCounts

Raw Source Text

We aligned reads to the human genome version GRCh38 with annotation version Gencode v40 using STAR (v2.7.1a).
Bam files were then filtered to include only read1 values using samtools (v1.16) with option "view -hbf 64."
C-to-U edit sites were obtained using SAILOR.
Edits were divided by the featurecounts (v1.5.2) output for each gene's exons to generate EPR values based on GENCODE v40 annotations.
Assembly: GRCh38
Supplementary files format and content: Bam files and EPR (edits-per-read) quantification