GSE135300 Processing Pipeline

Publication

An in vivo genome-wide CRISPR screen identifies the RNA-binding protein Staufen2 as a key regulator of myeloid leukemia.

Nature Cancer (2020), PMID 34109316

Dataset

GSE135300

Next Generation Sequencing: in vivo genome-wide CRISPR sgRNA screen in primary cancer-initiating and propagating bcCML stem cells

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Reads were counted by first searching the primary read file for the CACCG sequence, which appears in the vector 5′ to all sgRNA inserts.

grep (Inferred with models/gemini-2.5-flash) vN/A

$ Bash example

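# This command counts the reads in the primary read file whose sequence contains "CACCG".
# awk keeps only the sequence line of each record (assumes uncompressed, 4-line FASTQ records),
# so quality strings, which can also contain the letters A, C and G, are never matched.
# The '-c' option tells grep to output only a count of matching lines.
# Replace 'primary_read_file.fastq' with the actual path to your read file.
awk 'NR % 4 == 2' primary_read_file.fastq | grep -c "CACCG" > cac_sequence_count.txt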

The 20 nt immediately following CACCG are the sgRNA insert, which was then mapped to a reference file of all possible sgRNAs present in the library.
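
The insert extraction described above is not shown in the source, so the following is only a minimal sketch of one way to do it. It assumes uncompressed FASTQ with four lines per record, uses the first CACCG in each read as the anchor, and drops reads with fewer than 20 nt after it. The function name `extract_sgrna_inserts` and the demo reads are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: extract the 20 nt that follow the first CACCG anchor in each read.
# The function name and the demo data are hypothetical, not from the original pipeline.
extract_sgrna_inserts() {
  awk '
    NR % 4 == 1 { header = $0 }                 # FASTQ header line
    NR % 4 == 2 { seq = $0 }                    # sequence line
    NR % 4 == 0 {                               # quality line completes the record
      pos = index(seq, "CACCG")                 # 1-based position of the first anchor, 0 if absent
      if (pos > 0 && length(seq) >= pos + 24) { # require a full 20 nt after the anchor
        print header
        print substr(seq, pos + 5, 20)          # the 20 nt sgRNA insert
        print "+"
        print substr($0, pos + 5, 20)           # qualities trimmed to the same window
      }
    }
  ' "$1"
}

# Tiny demonstration on a made-up two-read FASTQ (read2 has no anchor and is dropped).
demo_fastq="$(mktemp)"
printf '%s\n' \
  '@read1' 'TTCACCGACGTACGTACGTACGTACGTGTTT' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII' \
  '@read2' 'TTTTTTTTTTGGGGGGGGGGAAAAAAAAAA' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIII' \
  > "$demo_fastq"
extract_sgrna_inserts "$demo_fastq"
rm -f "$demo_fastq"
```

On real data, a call such as `extract_sgrna_inserts primary_read_file.fastq > sgRNA_reads.fastq` would produce the kind of input file the mapping example assumes. Whether the original analysis used the first anchor occurrence or handled reverse-complement reads is not stated in the source.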

bowtie2 (Inferred with models/gemini-2.5-flash) v2.4.5

$ Bash example

# Install bowtie2 if not already installed
# conda install -c bioconda bowtie2

# Define input and output files
# Assuming 'sgRNA_reads.fastq' contains the extracted 20nt sgRNA inserts
SGRNA_READS="sgRNA_reads.fastq"
# 'sgRNA_library.fasta' is the reference file of all possible sgRNAs
SGRNA_LIBRARY_FASTA="sgRNA_library.fasta"
INDEX_PREFIX="sgRNA_library_index"
OUTPUT_SAM="sgRNA_mapped.sam"
THREADS=8 # Example number of threads

# 1. Build Bowtie2 index for the sgRNA library
# This step only needs to be run once for a given reference library
bowtie2-build "${SGRNA_LIBRARY_FASTA}" "${INDEX_PREFIX}"

# 2. Map the sgRNA inserts to the indexed library
# -U for single-end reads
# -S for SAM output
# -p for number of threads
bowtie2 -x "${INDEX_PREFIX}" -U "${SGRNA_READS}" -S "${OUTPUT_SAM}" -p "${THREADS}"

# Optional: Convert SAM to BAM, sort, and index
# samtools view -bS "${OUTPUT_SAM}" > "${OUTPUT_SAM%.sam}.bam"
# samtools sort "${OUTPUT_SAM%.sam}.bam" -o "${OUTPUT_SAM%.sam}.sorted.bam"
# samtools index "${OUTPUT_SAM%.sam}.sorted.bam"
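
The supplementary files are described as tab-delimited text with an sgRNA code and a count, but the source does not say how the counts were tallied from the alignments. As a rough sketch under that caveat, the aligned reads in the SAM output could be counted per reference name. The function name `count_sgrna_hits` and the demo records are hypothetical, and the reference names are assumed to be the sgRNA codes from the library FASTA.

```shell
#!/usr/bin/env bash
# Sketch only: tally aligned reads per reference name (assumed to be the sgRNA code).
# The function name and the demo data are hypothetical, not from the original pipeline.
count_sgrna_hits() {
  awk -F '\t' '
    /^@/ { next }            # skip SAM header lines
    $3 != "*" { n[$3]++ }    # field 3 is RNAME; it is "*" for unaligned reads
    END { for (id in n) printf "%s\t%d\n", id, n[id] }
  ' "$1" | sort
}

# Tiny demonstration on a made-up SAM file: two hits on sgA, one on sgB, one unaligned read.
demo_sam="$(mktemp)"
{
  printf '@SQ\tSN:sgA\tLN:20\n'
  printf 'r1\t0\tsgA\t1\t42\t20M\t*\t0\t0\t*\t*\n'
  printf 'r2\t0\tsgB\t1\t42\t20M\t*\t0\t0\t*\t*\n'
  printf 'r3\t0\tsgA\t1\t42\t20M\t*\t0\t0\t*\t*\n'
  printf 'r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n'
} > "$demo_sam"
count_sgrna_hits "$demo_sam"
rm -f "$demo_sam"
```

On real data this would be run as `count_sgrna_hits sgRNA_mapped.sam > sgRNA_counts.txt`. sgRNAs with no aligned reads do not appear in the output, and no filtering on mapping quality or strand is applied, so both choices should be checked against the published counts.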

Raw Source Text

Reads were counted by first searching for the CACCG sequence in the primary read file that appears in the vector 5′ to all sgRNA inserts. The next 20 nts are the sgRNA insert, which was then mapped to a reference file of all possible sgRNAs present in the library.
Supplementary_files_format_and_content: tab-delimited text file, includes sgRNA code and count
