GSE277161 Processing Pipeline
RNA-Seq
4 steps
Publication
Integrated multi-omics analysis of zinc-finger proteins uncovers roles in RNA regulation. Molecular Cell (2024), PMID 39303722
Dataset
GSE277161: Integrated multi-omics analysis of zinc finger proteins uncovers roles in RNA regulation [Ribo-STAMP cell lines]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
We aligned reads to the human genome version GRCh38 with annotation version Gencode v40 using STAR (v2.7.1a).
$ Bash example
# Install STAR (e.g., using Bioconda)
# conda install -c bioconda star=2.7.1a

# --- Reference Data Setup ---
# The human genome GRCh38 and Gencode v40 annotation are required to build the STAR index.
# Example commands to download and build the index (run once):
# mkdir -p /path/to/STAR_genome_index_GRCh38_Gencode_v40
# cd /path/to/STAR_genome_index_GRCh38_Gencode_v40
# wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_40/GRCh38.primary_assembly.genome.fa.gz
# wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_40/gencode.v40.annotation.gtf.gz
# gunzip GRCh38.primary_assembly.genome.fa.gz
# gunzip gencode.v40.annotation.gtf.gz
#
# STAR --runThreadN 8 --runMode genomeGenerate \
#      --genomeDir /path/to/STAR_genome_index_GRCh38_Gencode_v40 \
#      --genomeFastaFiles GRCh38.primary_assembly.genome.fa \
#      --sjdbGTFfile gencode.v40.annotation.gtf \
#      --sjdbOverhang 100   # Adjust sjdbOverhang to (read length - 1)

# --- Alignment Step ---
# Define input files and genome directory
INPUT_READS_R1="sample_R1.fastq.gz"   # Replace with your actual R1 FASTQ file
INPUT_READS_R2="sample_R2.fastq.gz"   # Replace with your actual R2 FASTQ file (if paired-end)
GENOME_INDEX_DIR="/path/to/STAR_genome_index_GRCh38_Gencode_v40"   # Path to your pre-built STAR index
OUTPUT_PREFIX="aligned_reads_"
NUM_THREADS=8   # Number of threads to use

# Align reads to the human genome (GRCh38 with Gencode v40 annotation)
STAR --runThreadN ${NUM_THREADS} \
     --genomeDir ${GENOME_INDEX_DIR} \
     --readFilesIn ${INPUT_READS_R1} ${INPUT_READS_R2} \
     --readFilesCommand zcat \
     --outFileNamePrefix ${OUTPUT_PREFIX} \
     --outSAMtype BAM SortedByCoordinate \
     --outBAMcompression 6 \
     --limitBAMsortRAM 30000000000   # Adjust based on available RAM (e.g., 30GB)
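The index-building comments above suggest setting --sjdbOverhang to the read length minus one. A minimal sketch of how that value could be estimated from a FASTQ file follows. The helper name sjdb_overhang and the default of sampling 10,000 reads are illustrative assumptions, not part of the described pipeline.

```shell
# Hypothetical helper: estimate --sjdbOverhang as (longest sampled read - 1).
# $1: FASTQ file (plain or gzip-compressed); $2: number of reads to sample (default 10000).
sjdb_overhang() {
    fastq="$1"
    n_reads="${2:-10000}"
    reader="cat"
    case "$fastq" in
        *.gz) reader="gzip -dc" ;;   # decompress on the fly for .gz input
    esac
    # Each FASTQ record spans 4 lines; the sequence is the 2nd line of each record.
    $reader "$fastq" \
        | head -n "$(( n_reads * 4 ))" \
        | awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) } END { print max - 1 }'
}

# Example usage (file name is a placeholder):
# sjdb_overhang sample_R1.fastq.gz
```

If reads were trimmed to variable lengths, the longest sampled read is used here; whether that matches the value used for the published index is not stated in the source.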
2
BAM files were then filtered to retain only read 1 (SAM flag 64, first in pair) using samtools (v1.16) with the option "view -hbf 64".
$ Bash example
# Placeholder for input BAM file
# input.bam: The original BAM file to be filtered.
INPUT_BAM="input.bam"

# Placeholder for output BAM file
# read1_filtered.bam: The output BAM file containing only read 1 alignments.
OUTPUT_BAM="read1_filtered.bam"

# -h: include the header; -b: write BAM; -f 64: keep only reads with flag bit 64 (first in pair) set
samtools view -hbf 64 "${INPUT_BAM}" > "${OUTPUT_BAM}"
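To make the effect of -f 64 concrete, the sketch below tests the relevant bit on a few common paired-end SAM FLAG values using shell arithmetic. The function name is_read1 is a hypothetical helper for illustration; it does not call samtools.

```shell
# Return success (0) if a SAM FLAG value has bit 0x40 (decimal 64, "first in pair") set.
is_read1() {
    [ $(( $1 & 64 )) -ne 0 ]
}

# Typical flags for properly paired reads: 99 and 83 are read 1; 147 and 163 are read 2.
for flag in 99 147 83 163; do
    if is_read1 "$flag"; then
        echo "$flag: read 1 (kept by -f 64)"
    else
        echo "$flag: not read 1 (dropped by -f 64)"
    fi
done
```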
3
C-to-U edit sites were obtained using SAILOR.
SAILOR v1.0
$ Bash example
# NOTE: the source description does not give the SAILOR command line or parameters.
# The invocation below is a hypothetical placeholder, not a documented interface;
# consult the SAILOR repository's README for the actual workflow and options.

# Obtain SAILOR (maintained by the Yeo lab)
# git clone https://github.com/YeoLab/sailor.git
# cd sailor

# Placeholder inputs and outputs:
#   read1_filtered.bam : read 1 filtered BAM file from the previous step
#   GRCh38.primary_assembly.genome.fa : reference genome FASTA used for alignment
#   c_to_u_edits : desired output prefix
# Hypothetical invocation (script name and flags are NOT verified):
# python sailor.py -i read1_filtered.bam -r GRCh38.primary_assembly.genome.fa -o c_to_u_edits
4
Edits were divided by the featureCounts (v1.5.2) output for each gene's exons to generate EPR (edits-per-read) values based on GENCODE v40 annotations.
$ Bash example
# Install featureCounts (part of Subread package)
# conda install -c bioconda subread

# Define variables
# Placeholder for GENCODE v40 annotations. Replace with the actual path to your GTF file.
ANNOTATION_GTF="/path/to/gencode.v40.annotation.gtf"
# Placeholder for the input BAM file containing aligned reads.
INPUT_BAM="input_aligned_reads.bam"
# Output file for gene exon counts.
OUTPUT_FILE="gene_exon_counts.txt"

# Execute featureCounts to count reads over exons for each gene.
# -a: Specify the annotation file (GTF/GFF).
# -o: Specify the output file for counts.
# -F GTF: Specify that the annotation file is in GTF format.
# -t exon: Count features of type 'exon'.
# -g gene_id: Aggregate counts by 'gene_id' (i.e., sum exon counts for each gene).
# Note: Strandedness (-s 0/1/2) is not specified in the description. Adjust if your data is stranded.
featureCounts -a "${ANNOTATION_GTF}" -o "${OUTPUT_FILE}" -F GTF -t exon -g gene_id "${INPUT_BAM}"
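The division step itself (edits divided by exonic read counts per gene) can be sketched as below. This is an illustration under assumptions, not the authors' script: it assumes a hypothetical two-column, tab-separated table of per-gene edit totals (gene_id, edit count), which the source does not describe, and it reads the read count from column 7 of a single-sample featureCounts output. The function name compute_epr and the "NA" convention for genes with zero reads are likewise assumptions.

```shell
# Hypothetical helper: EPR = edits / exonic read count, per gene.
# $1: TSV of per-gene edits (gene_id <TAB> edit_count), assumed non-empty
# $2: featureCounts output for one BAM (read counts in column 7)
# Output columns: gene_id, edits, reads, EPR ("NA" when the gene has zero reads)
compute_epr() {
    edits_tsv="$1"
    counts_txt="$2"
    awk -F'\t' -v OFS='\t' '
        NR == FNR { edits[$1] = $2; next }        # first file: load edit totals
        /^#/ || $1 == "Geneid" { next }           # skip featureCounts comment and header lines
        {
            e = ($1 in edits) ? edits[$1] : 0     # genes without edits get 0
            epr = ($7 > 0) ? e / $7 : "NA"        # avoid division by zero
            print $1, e, $7, epr
        }
    ' "$edits_tsv" "$counts_txt"
}

# Example usage (file names are placeholders):
# compute_epr gene_edit_counts.tsv gene_exon_counts.txt > gene_epr.tsv
```

How edit sites were aggregated to genes before the division is not stated in the source, so the edits table would need to be produced in a way that matches the exon definitions used by featureCounts.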
Raw Source Text
We aligned reads to the human genome version GRCh38 with annotation version Gencode v40 using STAR (v2.7.1a). Bam files were then filtered to include only read1 values using samtools (v1.16) with option "view -hbf 64." C-to-U edit sites were obtained using SAILOR. Edits were divided by the featurecounts (v1.5.2) output for each gene's exons to generate EPR values based on GENCODE v40 annotations.
Assembly: GRCh38
Supplementary files format and content: Bam files and EPR (edits-per-read) quantification