GSE224548 Processing Pipeline
RNA-Seq
3 steps
Publication
In Vivo Screening Unveils Pervasive RNA-Binding Protein Dependencies in Leukemic Stem Cells and Identifies ELAVL1 as a Therapeutic Target. Blood Cancer Discovery (2023), PMID 36763002
Dataset
GSE224548: A two-step in vivo CRISPR screen unveils pervasive RNA binding protein dependencies for leukemic stem cells and identifies ELAVL1 as a therapeutic ta…
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1. Reads were quality-checked using FastQC.
Bash example:
# Install FastQC using Conda (run once):
# conda create -n fastqc_env -c bioconda fastqc -y
# conda activate fastqc_env

# Run FastQC on the input reads.
# Replace reads.fastq.gz with your actual input file(s); FastQC accepts several files at once.
# Replace output_dir with your desired output directory (FastQC will not create it).
mkdir -p output_dir
fastqc reads.fastq.gz -o output_dir
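FastQC also writes a summary.txt inside each <sample>_fastqc.zip archive, with one tab-separated line per module (status, module name, file name). A minimal sketch for flagging the modules that did not pass, assuming that layout; the function name fastqc_flags is hypothetical and not part of the original pipeline:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FastQC modules whose status is WARN or FAIL.
# Input: a FastQC summary.txt (tab-separated: STATUS, module name, file name).
fastqc_flags() {
  local summary="$1"
  awk -F'\t' '$1 == "WARN" || $1 == "FAIL" { print $1 ": " $2 }' "$summary"
}

# Usage (assumes FastQC produced output_dir/reads_fastqc.zip):
#   unzip -p output_dir/reads_fastqc.zip reads_fastqc/summary.txt > summary.txt
#   fastqc_flags summary.txt
```

Modules such as per-base sequence content often warn on RNA-seq libraries for benign reasons, so the flagged list is a prompt for inspection rather than an automatic filter.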
2. Sequencing reads were aligned to the hg38 reference genome using STAR (v2.7.2c).
Bash example:
# Install STAR (if not already installed); the study reports v2.7.2c
# conda install -c bioconda star

# --- Reference Data Setup (run once) ---
# Download hg38 genome FASTA and GTF files (example from UCSC)
# wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
# wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz
# gunzip hg38.fa.gz
# gunzip hg38.ncbiRefSeq.gtf.gz

# Create STAR genome index (if not already available)
# mkdir -p /path/to/star_index/hg38
# STAR --runMode genomeGenerate \
#   --genomeDir /path/to/star_index/hg38 \
#   --genomeFastaFiles hg38.fa \
#   --sjdbGTFfile hg38.ncbiRefSeq.gtf \
#   --sjdbOverhang 100 \
#   --runThreadN 16

# --- Alignment Step ---
# Define input files and output prefix
INPUT_R1="input_R1.fastq.gz"
INPUT_R2="input_R2.fastq.gz"  # Remove this line and "${INPUT_R2}" below if single-end
GENOME_DIR="/path/to/star_index/hg38"
OUTPUT_PREFIX="aligned_reads_"
THREADS=8

# Execute STAR alignment.
# --readFilesCommand zcat is required because the inputs are gzip-compressed.
# TranscriptomeSAM additionally writes ${OUTPUT_PREFIX}Aligned.toTranscriptome.out.bam,
# the transcript-coordinate BAM that rsem-calculate-expression --bam expects.
STAR --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${INPUT_R1}" "${INPUT_R2}" \
  --readFilesCommand zcat \
  --runThreadN "${THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outSAMattributes Standard \
  --quantMode TranscriptomeSAM GeneCounts \
  --twopassMode Basic
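To spot samples that aligned poorly, the uniquely mapped percentage can be read from STAR's <prefix>Log.final.out summary. A minimal sketch, assuming the usual "Uniquely mapped reads % | 91.23%" line format; the function names and the 70% default cutoff are hypothetical choices, not values from the study:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the uniquely mapped read percentage from a STAR
# Log.final.out file (the line looks like "   Uniquely mapped reads % |	91.23%").
star_unique_pct() {
  awk -F'|' '/Uniquely mapped reads %/ { gsub(/[ \t%]/, "", $2); print $2 }' "$1"
}

# Hypothetical check: report WARN when the unique mapping rate is below a cutoff
# (default 70, an arbitrary assumption), otherwise OK.
check_mapping() {
  local log="$1" min="${2:-70}" pct
  pct=$(star_unique_pct "$log")
  # awk exits 0 (success) only when pct < min, using numeric comparison
  if awk -v p="$pct" -v m="$min" 'BEGIN { exit !(p + 0 < m + 0) }'; then
    echo "WARN: ${log}: unique mapping ${pct}% < ${min}%"
  else
    echo "OK: ${log}: unique mapping ${pct}%"
  fi
}

# Usage with the prefix from the alignment example:
#   check_mapping aligned_reads_Log.final.out 70
```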
3. Gene-level quantification was performed using RSEM (v1.3.1).
Bash example:
# Install RSEM (example using conda)
# conda install -c bioconda rsem

# Placeholder for the RSEM reference index, built beforehand with rsem-prepare-reference
# from the same genome FASTA and GTF used for the STAR index, e.g.:
# rsem-prepare-reference --gtf hg38.ncbiRefSeq.gtf hg38.fa /path/to/rsem_reference/human_GRCh38
# Replace '/path/to/rsem_reference/human_GRCh38' with the actual path to your RSEM reference.
RSEM_REFERENCE="/path/to/rsem_reference/human_GRCh38"

# Input aligned reads file. With --bam, RSEM expects alignments in transcript
# coordinates (e.g., STAR's Aligned.toTranscriptome.out.bam, written when STAR runs
# with --quantMode TranscriptomeSAM), not a genome-coordinate sorted BAM.
# Replace 'input.bam' with your actual input file.
INPUT_BAM="input.bam"

# Output prefix for RSEM results (writes <prefix>.genes.results and <prefix>.isoforms.results)
OUTPUT_PREFIX="gene_quantification_results"

# Perform gene-level quantification using rsem-calculate-expression.
# This command assumes:
#   1. Input is a BAM file (--bam).
#   2. Reads are paired-end (--paired-end).
# Adjust parameters if your input is FASTQ, single-end, or requires specific options (e.g., --strandedness).
rsem-calculate-expression \
  --bam \
  --paired-end \
  "${INPUT_BAM}" \
  "${RSEM_REFERENCE}" \
  "${OUTPUT_PREFIX}"
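The RSEM command above leaves --strandedness at its default (none), so it may help to check library orientation first. With --quantMode GeneCounts, STAR's <prefix>ReadsPerGene.out.tab starts with four N_* summary rows, followed by per-gene counts: column 2 unstranded, column 3 for reads matching the gene strand, and column 4 for the opposite orientation. A minimal sketch, assuming that layout; the helper name and the 0.8/0.2 cutoffs are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a value for RSEM's --strandedness option
# (none | forward | reverse) from STAR's ReadsPerGene.out.tab.
# Columns: 1 gene ID, 2 unstranded, 3 forward-strand counts, 4 reverse-strand counts.
# The 0.8 / 0.2 cutoffs are arbitrary assumptions, not values from the study.
suggest_strandedness() {
  awk -F'\t' '
    NR <= 4 { next }              # skip N_unmapped, N_multimapping, N_noFeature, N_ambiguous
    { fwd += $3; rev += $4 }      # sum per-gene counts for each orientation
    END {
      total = fwd + rev
      if (total == 0) { print "none"; exit }
      frac = fwd / total
      if (frac > 0.8)      print "forward"
      else if (frac < 0.2) print "reverse"
      else                 print "none"
    }' "$1"
}
```

For example, the suggestion could be passed as --strandedness "$(suggest_strandedness aligned_reads_ReadsPerGene.out.tab)" alongside the other rsem-calculate-expression arguments; dUTP-based stranded kits typically come out as "reverse".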
Tools Used
FastQC, STAR (v2.7.2c), RSEM (v1.3.1)
Raw Source Text
Reads were quality checked using fastQC
Sequencing reads were aligned to the hg38 reference genome using STAR (v2.7.2c)
Gene-level quantification was performed using RSEM (v1.3.1)
Assembly: hg38
Supplementary files format and content: count_matrix.tsv (matrix of gene counts)
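The source describes count_matrix.tsv only as a matrix of gene counts and does not say how it was assembled. One plausible route, offered purely as an assumption, is to join the expected_count column (column 5) of each sample's RSEM <prefix>.genes.results file; build_count_matrix below is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: combine RSEM *.genes.results files into one gene x sample
# matrix (TSV on stdout) using the expected_count column (column 5).
# Sample names come from the file names minus the .genes.results suffix;
# genes missing from a file are written as NA.
build_count_matrix() {
  awk -F'\t' -v OFS='\t' '
    BEGIN { header = "gene_id" }
    FNR == 1 {                            # header line of each input file
      nfile++
      name = FILENAME
      sub(/^.*\//, "", name)              # drop directory
      sub(/\.genes\.results$/, "", name)  # drop RSEM suffix
      header = header OFS name
      next
    }
    nfile == 1 { order[++n] = $1 }       # gene order taken from the first file
    { count[$1, nfile] = $5 }            # expected_count
    END {
      print header
      for (i = 1; i <= n; i++) {
        line = order[i]
        for (j = 1; j <= nfile; j++)
          line = line OFS (((order[i], j) in count) ? count[order[i], j] : "NA")
        print line
      }
    }' "$@"
}

# Usage:
#   build_count_matrix sample1.genes.results sample2.genes.results > count_matrix.tsv
```

RSEM expected counts can be fractional, so downstream count-based tools may require rounding; whether the deposited matrix used these values or STAR's GeneCounts output is not stated in the source.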