GSE224548 Processing Pipeline

RNA-Seq, 3 steps

Publication

In Vivo Screening Unveils Pervasive RNA-Binding Protein Dependencies in Leukemic Stem Cells and Identifies ELAVL1 as a Therapeutic Target.

Blood Cancer Discovery (2023), PMID 36763002

Dataset

GSE224548

A two-step in vivo CRISPR screen unveils pervasive RNA binding protein dependencies for leukemic stem cells and identifies ELAVL1 as a therapeutic ta…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Reads were quality checked using fastQC

    FastQC v0.11.9 (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install FastQC using Conda
    # conda create -n fastqc_env fastqc -c bioconda -y
    # conda activate fastqc_env
    
    # Run FastQC on input reads
    # Replace reads.fastq.gz with your actual input file(s)
    # Replace output_dir with your desired output directory
    mkdir -p output_dir
    fastqc reads.fastq.gz -o output_dir
  2.

    Sequencing reads were aligned to the hg38 reference genome using STAR (v2.7.2c)

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # --- Reference Data Setup ---
    # Download hg38 genome FASTA and GTF files (example from UCSC)
    # wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
    # wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz
    # gunzip hg38.fa.gz
    # gunzip hg38.ncbiRefSeq.gtf.gz
    
    # Create STAR genome index (if not already available)
    # mkdir -p /path/to/star_index/hg38
    # STAR --runMode genomeGenerate \
    #      --genomeDir /path/to/star_index/hg38 \
    #      --genomeFastaFiles hg38.fa \
    #      --sjdbGTFfile hg38.ncbiRefSeq.gtf \
    #      --sjdbOverhang 100 \
    #      --runThreadN 16
    
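    # --- Alignment Step ---
    # Define input files and output prefix
    INPUT_R1="input_R1.fastq.gz"
    INPUT_R2="input_R2.fastq.gz" # Remove here and in --readFilesIn if single-end
    GENOME_DIR="/path/to/star_index/hg38"
    OUTPUT_PREFIX="aligned_reads_"
    THREADS=8
    
    # Execute STAR alignment
    # --readFilesCommand zcat is required because the FASTQ inputs are gzip-compressed.
    # --quantMode TranscriptomeSAM (an assumption, not stated in the source) also writes
    # ${OUTPUT_PREFIX}Aligned.toTranscriptome.out.bam, the transcript-coordinate BAM
    # that RSEM can take as input; GeneCounts is optional.
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${INPUT_R1}" "${INPUT_R2}" \
         --readFilesCommand zcat \
         --runThreadN "${THREADS}" \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outSAMattributes Standard \
         --quantMode TranscriptomeSAM GeneCounts \
         --twopassMode Basic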
  3.

    Gene-level quantification was performed using RSEM (v1.3.1)

    RSEM v1.3.1
    $ Bash example
    # Install RSEM (example using conda)
    # conda install -c bioconda rsem
    
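    # Placeholder for RSEM reference index.
    # This index should have been built previously using 'rsem-prepare-reference'.
    # Replace '/path/to/rsem_reference/human_GRCh38' with the actual path to your RSEM reference.
    RSEM_REFERENCE="/path/to/rsem_reference/human_GRCh38"
    
    # Example reference build (assumption: the same hg38 FASTA and GTF used for the
    # aligner index; the source does not state which annotation was used):
    # rsem-prepare-reference --gtf hg38.ncbiRefSeq.gtf hg38.fa "${RSEM_REFERENCE}"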
    
    # Input aligned reads file (BAM).
    # RSEM needs alignments in transcript coordinates, not a genome-sorted BAM.
    # With STAR, this is the Aligned.toTranscriptome.out.bam file written by
    # --quantMode TranscriptomeSAM. Replace 'input.bam' with your actual input file.
    INPUT_BAM="input.bam"
    
    # Output prefix for RSEM results (e.g., gene_quantification_results.genes.results, .isoforms.results)
    OUTPUT_PREFIX="gene_quantification_results"
    
    # Perform gene-level quantification using rsem-calculate-expression.
    # This command assumes:
    # 1. Input is an alignment file (--alignments; older RSEM releases used --sam/--bam).
    # 2. Reads are paired-end (--paired-end).
    # Adjust parameters if your input is FASTQ, single-end, or requires specific options (e.g., --strandedness).
    rsem-calculate-expression \
        --alignments \
        --paired-end \
        "${INPUT_BAM}" \
        "${RSEM_REFERENCE}" \
        "${OUTPUT_PREFIX}"

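Between steps 2 and 3, a quick look at STAR's summary log can catch a failed alignment before quantification. The sketch below is not part of the published pipeline. The helper names star_unique_rate and check_star_rate are hypothetical, and it assumes the usual Log.final.out layout in which each metric is written as a "name | value" line.

```shell
# star_unique_rate LOG
# Print the "Uniquely mapped reads %" value from a STAR Log.final.out file,
# without the trailing percent sign.
star_unique_rate() {
    awk -F'|' '/Uniquely mapped reads %/ {
        gsub(/[ \t%]/, "", $2)   # strip padding, the tab and the % sign
        print $2
        exit
    }' "$1"
}

# check_star_rate LOG MIN
# Return 0 when the unique mapping rate is at least MIN, non-zero otherwise.
# A missing metric is treated as a rate of 0.
check_star_rate() {
    local rate
    rate=$(star_unique_rate "$1")
    awk -v r="${rate}" -v m="$2" 'BEGIN { exit !(r + 0 >= m + 0) }'
}
```

For example, `check_star_rate aligned_reads_Log.final.out 70 || echo "low unique mapping rate" >&2` would flag a sample. The 70 here is an arbitrary illustration, not a threshold from the source.
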
Raw Source Text
Reads were quality checked using fastQC
Sequencing reads were aligned to the hg38 reference genome using STAR (v2.7.2c)
Gene-level quantification was performed using RSEM (v1.3.1)
Assembly: hg38
Supplementary files format and content: count_matrix.tsv (matrix of gene counts)
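
The source does not say how count_matrix.tsv was assembled. As one possible approach, the sketch below joins the expected_count column (column 5) of several RSEM *.genes.results files into a single tab-separated matrix. The function name build_count_matrix is hypothetical, sample names are taken from the file names, and genes missing from a file are written as NA. RSEM also ships rsem-generate-data-matrix for the same purpose.

```shell
# build_count_matrix OUT FILE...
# Combine column 5 (expected_count) of each RSEM *.genes.results file into
# one matrix with a gene_id column followed by one column per sample.
build_count_matrix() {
    local out="$1"
    shift
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {                          # header line of each input file
            n++
            name = FILENAME
            sub(/.*\//, "", name)           # drop the directory part
            sub(/\.genes\.results$/, "", name)
            sample[n] = name
            next
        }
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++g] = $1 }
            count[$1 SUBSEP n] = $5         # expected_count for this sample
        }
        END {
            line = "gene_id"
            for (i = 1; i <= n; i++) line = line OFS sample[i]
            print line
            for (j = 1; j <= g; j++) {
                line = order[j]
                for (i = 1; i <= n; i++) {
                    key = order[j] SUBSEP i
                    line = line OFS ((key in count) ? count[key] : "NA")
                }
                print line
            }
        }
    ' "$@" > "$out"
}
```

A call such as `build_count_matrix count_matrix.tsv sampleA.genes.results sampleB.genes.results` writes one row per gene in the order genes are first seen. RSEM expected counts are fractional, so downstream tools that need integers may require rounding.
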