GSE224548 Processing Pipeline

RNA-Seq, code examples, 3 steps

Publication

In Vivo Screening Unveils Pervasive RNA-Binding Protein Dependencies in Leukemic Stem Cells and Identifies ELAVL1 as a Therapeutic Target.

Blood Cancer Discovery (2023), PMID 36763002

Dataset

GSE224548

A two-step in vivo CRISPR screen unveils pervasive RNA binding protein dependencies for leukemic stem cells and identifies ELAVL1 as a therapeutic ta…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Reads were quality-checked using FastQC

FastQC v0.11.9 (version inferred with models/gemini-2.5-flash)

$ Bash example

# Install FastQC using Conda
# conda create -n fastqc_env fastqc -c bioconda -y
# conda activate fastqc_env

# Run FastQC on input reads
# Replace reads.fastq.gz with your actual input file(s)
# Replace output_dir with your desired output directory
mkdir -p output_dir
fastqc reads.fastq.gz -o output_dir

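FastQC reports each analysis module as PASS, WARN or FAIL. When FastQC is run with its --extract option, each sample gets a <sample>_fastqc/summary.txt file with one tab-separated line per module (status, module name, input file name). The function below (fastqc_flags, a hypothetical helper and not part of the published pipeline) is a minimal sketch that lists only the modules that did not pass, which makes it easier to review many samples at once.

```shell
# Minimal sketch: list FastQC modules flagged WARN or FAIL.
# Assumes FastQC was run with --extract, so each sample has
# output_dir/<sample>_fastqc/summary.txt with tab-separated lines:
#   STATUS <TAB> module name <TAB> input file name
fastqc_flags() {
    local summary
    for summary in "$@"; do
        # Print "file<TAB>STATUS<TAB>module" for every module that is not PASS
        awk -F'\t' '$1 != "PASS" { print $3 "\t" $1 "\t" $2 }' "$summary"
    done
}

# Usage (hypothetical paths):
# fastqc --extract reads.fastq.gz -o output_dir
# fastqc_flags output_dir/*_fastqc/summary.txt
```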

Sequencing reads were aligned to the hg38 reference genome using STAR (v2.7.2c)

STAR v2.7.2c

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# --- Reference Data Setup ---
# Download hg38 genome FASTA and GTF files (example from UCSC)
# wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
# wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz
# gunzip hg38.fa.gz
# gunzip hg38.ncbiRefSeq.gtf.gz

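# Create STAR genome index (if not already available)
# Set --sjdbOverhang to (max read length - 1); 100 suits 101-bp reads.
# mkdir -p /path/to/star_index/hg38
# STAR --runMode genomeGenerate \
#      --genomeDir /path/to/star_index/hg38 \
#      --genomeFastaFiles hg38.fa \
#      --sjdbGTFfile hg38.ncbiRefSeq.gtf \
#      --sjdbOverhang 100 \
#      --runThreadN 16

# --- Alignment Step ---
# Define input files and output prefix
INPUT_R1="input_R1.fastq.gz"
INPUT_R2="input_R2.fastq.gz"
# Paired-end input; for single-end data use READS=("${INPUT_R1}")
READS=("${INPUT_R1}" "${INPUT_R2}")
GENOME_DIR="/path/to/star_index/hg38"
OUTPUT_PREFIX="aligned_reads_"
THREADS=8

# Execute STAR alignment
#   --readFilesCommand zcat : decompress .fastq.gz on the fly ("gunzip -c" on macOS)
#   --quantMode TranscriptomeSAM : also write ${OUTPUT_PREFIX}Aligned.toTranscriptome.out.bam,
#       the transcript-coordinate BAM that RSEM's --bam mode requires
#   --quantMode GeneCounts : also write per-gene counts to ${OUTPUT_PREFIX}ReadsPerGene.out.tab
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READS[@]}" \
     --readFilesCommand zcat \
     --runThreadN "${THREADS}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMunmapped Within \
     --outSAMattributes Standard \
     --quantMode TranscriptomeSAM GeneCounts \
     --twopassMode Basic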

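With --quantMode GeneCounts, STAR writes aligned_reads_ReadsPerGene.out.tab: four summary rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) followed by one row per gene, where column 3 counts reads whose first mate matches the gene strand (htseq-count -s yes) and column 4 those whose second mate does (-s reverse). A common heuristic, sketched below as an assumption rather than the authors' procedure, compares these two columns to pick a value for RSEM's --strandedness option. The function name infer_strandedness and the 0.8 cut-off are hypothetical choices.

```shell
# Minimal sketch: guess library strandedness from STAR's ReadsPerGene.out.tab.
# Prints forward, reverse, none (unstranded) or unknown (no counted reads).
infer_strandedness() {
    # $1: path to <prefix>ReadsPerGene.out.tab
    # $2: fraction one strand must reach to call the library stranded
    #     (assumed default 0.8)
    awk -F'\t' -v thr="${2:-0.8}" '
        NR <= 4 { next }                  # skip the four N_* summary rows
        { first += $3; second += $4 }     # sum the two strand-specific columns
        END {
            total = first + second
            if (total == 0)                 print "unknown"
            else if (first / total >= thr)  print "forward"
            else if (second / total >= thr) print "reverse"
            else                            print "none"
        }' "$1"
}

# Usage (hypothetical file name from OUTPUT_PREFIX="aligned_reads_"):
# infer_strandedness aligned_reads_ReadsPerGene.out.tab
```

The printed word matches one of the values RSEM accepts for --strandedness (none, forward, reverse), so it can be passed straight to rsem-calculate-expression once checked by eye.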

Gene-level quantification was performed using RSEM (v1.3.1)

RSEM v1.3.1

$ Bash example

# Install RSEM (example using conda)
# conda install -c bioconda rsem

# RSEM reference index, built once with rsem-prepare-reference. Build it from
# the same FASTA and GTF used for the STAR index so transcript IDs match, e.g.:
# rsem-prepare-reference --gtf hg38.ncbiRefSeq.gtf hg38.fa /path/to/rsem_reference/hg38
# Replace '/path/to/rsem_reference/hg38' with the actual prefix of your RSEM reference.
RSEM_REFERENCE="/path/to/rsem_reference/hg38"

# Input: a BAM aligned to transcript (not genome) coordinates, e.g. the
# <prefix>Aligned.toTranscriptome.out.bam that STAR writes with --quantMode TranscriptomeSAM.
# A genome-sorted BAM (Aligned.sortedByCoord.out.bam) will not work here.
INPUT_BAM="aligned_reads_Aligned.toTranscriptome.out.bam"

# Output prefix; RSEM writes gene_quantification_results.genes.results and .isoforms.results
OUTPUT_PREFIX="gene_quantification_results"

# Perform gene-level quantification using rsem-calculate-expression.
# This command assumes:
# 1. Input is a BAM file (--bam).
# 2. Reads are paired-end (--paired-end); omit this flag for single-end data.
# Adjust parameters if your input is FASTQ or your library is stranded
# (--strandedness none|forward|reverse).
rsem-calculate-expression \
    --bam \
    --paired-end \
    --num-threads 8 \
    "${INPUT_BAM}" \
    "${RSEM_REFERENCE}" \
    "${OUTPUT_PREFIX}"

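Each rsem-calculate-expression run writes one <prefix>.genes.results file whose expected_count column holds the estimated number of reads per gene. The hypothetical function rsem_count_matrix below is a minimal sketch that merges several such files into one tab-separated gene x sample matrix, similar in spirit to the deposited count_matrix.tsv; the source does not say how that file was built, so using RSEM expected counts is an assumption. Sample names are taken from the file names, and the function stops with an error if the files do not list the same genes in the same order.

```shell
# Minimal sketch: merge RSEM <sample>.genes.results files into a
# gene x sample matrix of expected counts (tab-separated, header row).
rsem_count_matrix() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {
            # New file: sample name = file name without directory and suffix
            name = FILENAME
            sub(/^.*\//, "", name)
            sub(/\.genes\.results$/, "", name)
            sample[++n] = name
            # Locate the expected_count column by its header name
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "expected_count") col = i
            if (col == 0) { print "no expected_count column in " FILENAME > "/dev/stderr"; bad = 1; exit 1 }
            next
        }
        {
            if (n == 1) gene[FNR] = $1
            else if (gene[FNR] != $1) {
                # All files must list the same genes in the same order
                print "gene order differs in " FILENAME " at line " FNR > "/dev/stderr"
                bad = 1; exit 1
            }
            count[FNR, n] = $col
            rows[n] = FNR
        }
        END {
            if (bad) exit 1
            for (s = 2; s <= n; s++)
                if (rows[s] != rows[1]) { print "row count differs for " sample[s] > "/dev/stderr"; exit 1 }
            line = "gene_id"
            for (s = 1; s <= n; s++) line = line OFS sample[s]
            print line
            for (r = 2; r <= rows[1]; r++) {
                line = gene[r]
                for (s = 1; s <= n; s++) line = line OFS count[r, s]
                print line
            }
        }' "$@"
}
```

For example: rsem_count_matrix sampleA.genes.results sampleB.genes.results > count_matrix.tsv. RSEM also ships rsem-generate-data-matrix, which builds a comparable expected-count matrix; the awk version is shown only to make the column layout explicit.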

Tools Used

FastQC
STAR
RSEM

Raw Source Text

Reads were quality checked using fastQC
Sequencing reads were aligned to the hg38 reference genome using STAR (v2.7.2c)
Gene-level quantification was performed using RSEM (v1.3.1)
Assembly: hg38
Supplementary files format and content: count_matrix.tsv (matrix of gene counts)
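Before analyzing the deposited count_matrix.tsv, a quick sanity check is each sample's total count (library size), which flags failed or very shallow libraries. The layout assumed below (tab-separated, a header row whose first field labels the gene-ID column followed by sample names, then one row per gene) is a guess, since the source only calls the file a "matrix of gene counts"; the function name library_sizes is hypothetical.

```shell
# Minimal sketch: per-sample library sizes (column sums) of a gene x sample
# count matrix, printed as "sample<TAB>total" with two decimals.
# Assumed layout: tab-separated, header row, gene IDs in column 1.
library_sizes() {
    awk -F'\t' '
        NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
        { for (i = 2; i <= ncol; i++) total[i] += $i }   # sum each sample column
        END { for (i = 2; i <= ncol; i++) printf "%s\t%.2f\n", name[i], total[i] }
    ' "$1"
}

# Usage:
# library_sizes count_matrix.tsv
```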
