GSE127743 Processing Pipeline
Publication
In Vivo Screening Unveils Pervasive RNA-Binding Protein Dependencies in Leukemic Stem Cells and Identifies ELAVL1 as a Therapeutic Target. Blood Cancer Discovery (2023), PMID 36763002
Dataset
GSE127743: In vivo CRISPR screening unveils RNA binding protein dependencies for leukemic stem cells and identifies ELAVL1 as a potential therapeutic target [RN…
Processing Steps
1
Raw reads were trimmed with cutadapt (v1.14) using the following parameters: -O 5 --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG ATCTCGTATGCCGTCTTCTGCTTG CGACAGGTTCAGAGTTCTACAGTCCGACGATC GATCGGAAGAGCACACGTCTGAACTCCAGTCAC AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
$ Bash example
# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=1.14

# Example usage for paired-end reads
# Replace input_R1.fastq.gz, input_R2.fastq.gz with your actual input files
# Replace output_R1_trimmed.fastq.gz, output_R2_trimmed.fastq.gz with your desired output files
# Note: -b adapters apply to the first read only; cutadapt takes second-read
# adapters via -B, which the source text does not list.
cutadapt \
  -O 5 \
  --match-read-wildcards \
  --times 2 \
  -e 0.0 \
  --quality-cutoff 6 \
  -m 18 \
  -b TCGTATGCCGTCTTCTGCTTG \
  -b ATCTCGTATGCCGTCTTCTGCTTG \
  -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
  -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
  -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
  -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
  -o output_R1_trimmed.fastq.gz \
  -p output_R2_trimmed.fastq.gz \
  input_R1.fastq.gz \
  input_R2.fastq.gz
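One quick way to sanity-check the trimming step is to confirm that no surviving read is shorter than the -m 18 minimum length. The sketch below is an illustration only and is not part of the published pipeline; the function name min_read_length is hypothetical, and it assumes a standard four-line-per-record FASTQ file.

```shell
# Hypothetical helper (not from the source): print the length of the
# shortest read in a FASTQ file. Assumes 4 lines per record.
min_read_length() {
  local fastq="$1"
  local reader="cat"
  # Decompress on the fly when the file name ends in .gz
  case "$fastq" in
    *.gz) reader="gzip -dc" ;;
  esac
  # Sequence lines are lines 2, 6, 10, ... of the file
  $reader "$fastq" | awk '
    NR % 4 == 2 {
      n = length($0)
      if (min == "" || n < min) min = n
    }
    END { if (min != "") print min }
  '
}

# Example call (file name is a placeholder):
# min_read_length output_R1_trimmed.fastq.gz
```

With -m 18 in effect, the value printed for a trimmed file should be 18 or greater.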
2
Trimmed reads were mapped to and filtered of mouse-specific repeat elements (RepBase 18.05) with STAR (2.4.0i) using the following parameters: --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix condition1 --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn r1.fastq r2.fastq --runMode alignReads --runThreadN 8
$ Bash example
# STAR (Spliced Transcripts Alignment to a Reference) is a fast RNA-seq aligner.
# Installation (example using conda):
# conda install -c bioconda star

# Note: The 'repbase' genome directory should have been pre-built using STAR's genomeGenerate mode
# with mouse-specific repeat elements from RepBase 18.05.
STAR \
  --runMode alignReads \
  --readFilesIn r1.fastq r2.fastq \
  --genomeDir repbase \
  --alignEndsType EndToEnd \
  --genomeLoad NoSharedMemory \
  --outBAMcompression 10 \
  --outFileNamePrefix condition1 \
  --outFilterMultimapNmax 10 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outSAMattrRGline ID:foo \
  --outSAMattributes All \
  --outSAMmode Full \
  --outSAMtype BAM Unsorted \
  --outSAMunmapped Within \
  --outStd Log \
  --runThreadN 8
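Because --outReadsUnmapped Fastx is set, STAR writes the reads that did not map to repeat elements to files named after the output prefix (Unmapped.out.mate1 and Unmapped.out.mate2 for paired-end input). These are the reads carried into the next step. A minimal sketch for checking that both mate files exist and hold the same number of reads follows; the function names are hypothetical and the check is an illustration, not something described in the source.

```shell
# Hypothetical helper: count reads in a 4-line-per-record FASTQ file.
count_fastq_reads() {
  awk 'END { print NR / 4 }' "$1"
}

# Hypothetical helper: verify the unmapped mate files written by STAR for a
# given --outFileNamePrefix and print the number of unmapped read pairs.
check_unmapped_mates() {
  local prefix="$1"
  local m1="${prefix}Unmapped.out.mate1"
  local m2="${prefix}Unmapped.out.mate2"
  local n1 n2

  if [ ! -f "$m1" ] || [ ! -f "$m2" ]; then
    echo "unmapped mate files not found for prefix '${prefix}'" >&2
    return 1
  fi

  n1=$(count_fastq_reads "$m1")
  n2=$(count_fastq_reads "$m2")
  if [ "$n1" != "$n2" ]; then
    echo "mate files differ in read count: ${n1} vs ${n2}" >&2
    return 1
  fi
  echo "$n1"
}

# Example call, using the prefix from the command above:
# check_unmapped_mates condition1
```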
3
Reads unmapped to repeat elements were mapped to the mouse genome with STAR using the same parameters as the previous step, using an mm9 index in place of the repeat element index
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star
# Install samtools (if not already installed, for sorting and indexing BAM files)
# conda install -c bioconda samtools

# Define variables
GENOME_DIR="/path/to/STAR_mm9_index"   # Placeholder for the mm9 STAR genome index directory
# Reads left unmapped by the repeat-element run. STAR writes these as
# <prefix>Unmapped.out.mate1/.mate2 when --outReadsUnmapped Fastx is set.
INPUT_R1="condition1Unmapped.out.mate1"
INPUT_R2="condition1Unmapped.out.mate2"
OUTPUT_PREFIX="mm9_aligned_"           # Placeholder prefix (assumption), chosen so the repeat-step outputs are not overwritten
NUM_THREADS=8

# Note: The STAR genome index for mm9 must be pre-built.
# Example command to build index (run once):
# STAR --runMode genomeGenerate \
#   --genomeDir "${GENOME_DIR}" \
#   --genomeFastaFiles /path/to/mm9.fa \
#   --sjdbGTFfile /path/to/mm9.gtf \
#   --runThreadN "${NUM_THREADS}"

# Align reads unmapped to repeat elements to the mouse genome (mm9).
# Per the source, the parameters match the repeat-element step and only the index
# changes; the input files and output prefix used here are assumptions.
STAR \
  --runMode alignReads \
  --readFilesIn "${INPUT_R1}" "${INPUT_R2}" \
  --genomeDir "${GENOME_DIR}" \
  --alignEndsType EndToEnd \
  --genomeLoad NoSharedMemory \
  --outBAMcompression 10 \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outFilterMultimapNmax 10 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outSAMattrRGline ID:foo \
  --outSAMattributes All \
  --outSAMmode Full \
  --outSAMtype BAM Unsorted \
  --outSAMunmapped Within \
  --outStd Log \
  --runThreadN "${NUM_THREADS}"

# The output BAM is unsorted, so sort it by coordinate before indexing
samtools sort -o "${OUTPUT_PREFIX}Aligned.sorted.bam" "${OUTPUT_PREFIX}Aligned.out.bam"
samtools index "${OUTPUT_PREFIX}Aligned.sorted.bam"
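After each STAR run, the summary statistics are written to a Log.final.out file named after the output prefix, with one "label | value" pair per line. The sketch below pulls a single value out of that file, which can help compare how many reads went to repeat elements versus the genome. The function name star_log_value is hypothetical, and this check is not described in the source.

```shell
# Hypothetical helper: print the value for one labelled row of a STAR
# Log.final.out file. Rows have the form "<label> |<TAB><value>".
star_log_value() {
  local log_file="$1"
  local key="$2"
  awk -F'|' -v key="$key" '
    {
      label = $1
      gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim padding around the label
    }
    label == key {
      value = $2
      gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim the tab before the value
      print value
      exit
    }
  ' "$log_file"
}

# Example call (the file name depends on the chosen --outFileNamePrefix):
# star_log_value condition1Log.final.out "Uniquely mapped reads %"
```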
4
Subread featureCounts (-a gencode.vM1.annotation.gtf -s 2 -p -o counts.txt RN2c.bam) was used to count features using mouse annotations (Gencode vM1)
$ Bash example
# Install Subread package (which includes featureCounts)
# conda install -c bioconda subread

# Download Gencode vM1 mouse annotation GTF file
# For example, from the Gencode website:
# wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M1/gencode.vM1.annotation.gtf.gz
# gunzip gencode.vM1.annotation.gtf.gz

# Execute featureCounts
featureCounts -a gencode.vM1.annotation.gtf -s 2 -p -o counts.txt RN2c.bam
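The counts.txt file written by featureCounts starts with a comment line recording the command, followed by a header row (Geneid, Chr, Start, End, Strand, Length, then one column per BAM file). For a single BAM file, as in the command above, the counts sit in column 7. A minimal sketch for reducing the file to a two-column gene/count table follows; the function name extract_counts is hypothetical.

```shell
# Hypothetical helper: reduce featureCounts output for a single BAM file
# to "gene_id<TAB>count" rows.
extract_counts() {
  awk -F'\t' '
    /^#/           { next }   # skip the comment line recording the command
    $1 == "Geneid" { next }   # skip the column header
    { print $1 "\t" $7 }      # column 7 holds the counts for the first BAM
  ' "$1"
}

# Example call:
# extract_counts counts.txt > gene_counts.tsv
```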
Tools Used
cutadapt (v1.14), STAR (2.4.0i), Subread featureCounts
Raw Source Text
Raw reads were trimmed using cutadapt (v1.14) using the following parameters: -O 5 --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG ATCTCGTATGCCGTCTTCTGCTTG CGACAGGTTCAGAGTTCTACAGTCCGACGATC GATCGGAAGAGCACACGTCTGAACTCCAGTCAC AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
Trimmed reads were mapped to and filtered of mouse-specific repeat elements (RepBase 18.05) with STAR (2.4.0i) using the following parameters: --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix condition1 --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn r1.fastq r2.fastq --runMode alignReads --runThreadN 8
Reads unmapped to repeat elements were mapped to the mouse genome with STAR using the same parameters as the previous step, using an mm9 index in place of the repeat element index
Subread featureCounts (-a gencode.vM1.annotation.gtf -s 2 -p -o counts.txt RN2c.bam) was used to count features using mouse annotations (Gencode vM1)
Genome_build: mm9
Supplementary_files_format_and_content: counts.txt contains read counts from mm9-mapped BAM files