GSE127743 Processing Pipeline

RNA-Seq, 4 steps, with code examples

Publication

In Vivo Screening Unveils Pervasive RNA-Binding Protein Dependencies in Leukemic Stem Cells and Identifies ELAVL1 as a Therapeutic Target.

Blood cancer discovery (2023) — PMID 36763002

Dataset

GSE127743

In vivo CRISPR screening unveils RNA binding protein dependencies for leukemic stem cells and identifies ELAVL1 as a potential therapeutic target [RN…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Raw reads were trimmed using cutadapt (v1.14) using the following parameters: -O 5 --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG ATCTCGTATGCCGTCTTCTGCTTG CGACAGGTTCAGAGTTCTACAGTCCGACGATC GATCGGAAGAGCACACGTCTGAACTCCAGTCAC AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

cutadapt v1.14

$ Bash example

# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=1.14

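# Example usage for paired-end reads (the STAR step takes r1.fastq and r2.fastq)
# Replace input_R1.fastq.gz and input_R2.fastq.gz with your actual input files,
# and output_R1_trimmed.fastq.gz and output_R2_trimmed.fastq.gz with your desired output files.
# The source lists all six adapters after a single -b; one -b per adapter is assumed here.
# Note: -b applies to read 1 only. Adapters for read 2 would need the uppercase -B option,
# which the source does not mention.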

cutadapt \
  -O 5 \
  --match-read-wildcards \
  --times 2 \
  -e 0.0 \
  --quality-cutoff 6 \
  -m 18 \
  -b TCGTATGCCGTCTTCTGCTTG \
  -b ATCTCGTATGCCGTCTTCTGCTTG \
  -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
  -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
  -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
  -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
  -o output_R1_trimmed.fastq.gz \
  -p output_R2_trimmed.fastq.gz \
  input_R1.fastq.gz \
  input_R2.fastq.gz
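
The parameter list above gives six adapter sequences after a single -b. The following is a minimal sketch, assuming each adapter gets its own -b flag, of how the command line could be assembled from an adapter list. The function name build_cutadapt_cmd and all file names are hypothetical. The function only prints the command and runs nothing. The usage shows the first two adapters; the full list from the step description would be supplied the same way.

```shell
# Dry run: print a cutadapt command with one -b flag per adapter read from stdin.
# $1, $2: input R1/R2 files; $3, $4: output R1/R2 files (all hypothetical names).
build_cutadapt_cmd() {
  cmd="cutadapt -O 5 --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18"
  while IFS= read -r adapter; do
    # Skip empty lines so a trailing newline does not add an empty -b
    if [ -n "$adapter" ]; then
      cmd="$cmd -b $adapter"
    fi
  done
  printf '%s\n' "$cmd -o $3 -p $4 $1 $2"
}

# Usage: adapters are passed one per line on stdin
build_cutadapt_cmd input_R1.fastq.gz input_R2.fastq.gz \
  output_R1_trimmed.fastq.gz output_R2_trimmed.fastq.gz <<'EOF'
TCGTATGCCGTCTTCTGCTTG
ATCTCGTATGCCGTCTTCTGCTTG
EOF
```

Printing the command first makes it easy to check the flag order before running it on real data.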

Trimmed reads were mapped to mouse-specific repeat elements (RepBase 18.05) with STAR (2.4.0i), and reads that mapped to repeats were filtered out, using the following parameters: --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix condition1 --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn r1.fastq r2.fastq --runMode alignReads --runThreadN 8

STAR v2.4.0i

$ Bash example

# STAR (Spliced Transcripts Alignment to a Reference) is a fast RNA-seq aligner.
# Installation (example using conda):
# conda install -c bioconda star

# Note: The 'repbase' genome directory should have been pre-built using STAR's genomeGenerate mode
# with mouse-specific repeat elements from RepBase 18.05.

STAR \
  --runMode alignReads \
  --readFilesIn r1.fastq r2.fastq \
  --genomeDir repbase \
  --alignEndsType EndToEnd \
  --genomeLoad NoSharedMemory \
  --outBAMcompression 10 \
  --outFileNamePrefix condition1 \
  --outFilterMultimapNmax 10 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outSAMattrRGline ID:foo \
  --outSAMattributes All \
  --outSAMmode Full \
  --outSAMtype BAM Unsorted \
  --outSAMunmapped Within \
  --outStd Log \
  --runThreadN 8
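
With --outReadsUnmapped Fastx and the prefix condition1, STAR writes the reads that did not map to repeats to condition1Unmapped.out.mate1 and condition1Unmapped.out.mate2. A small sketch for checking how many reads survive the repeat filter, assuming uncompressed FASTQ with four lines per record. The helper name count_fastq_reads is hypothetical.

```shell
# Print the number of records in an uncompressed FASTQ file (4 lines per record).
count_fastq_reads() {
  lines=$(wc -l < "$1")
  echo $((lines / 4))
}

# Report read counts for the unmapped mates, if the files exist in the working directory
for mate in condition1Unmapped.out.mate1 condition1Unmapped.out.mate2; do
  if [ -f "$mate" ]; then
    echo "$mate: $(count_fastq_reads "$mate") reads"
  fi
done
```

For paired-end data the two counts should match, since unmapped mates are written in pairs.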

Reads unmapped to repeat elements were mapped to the mouse genome with STAR using the same parameters as the previous step, using an mm9 index in place of the repeat element index.

STAR (version not stated for this step; the previous step used 2.4.0i)

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables (all paths and names are placeholders)
GENOME_DIR="/path/to/STAR_mm9_index" # mm9 STAR genome index directory
# With --outReadsUnmapped Fastx and the prefix "condition1", the repeat-element run
# writes its unmapped mates to condition1Unmapped.out.mate1 and condition1Unmapped.out.mate2.
INPUT_R1="condition1Unmapped.out.mate1"
INPUT_R2="condition1Unmapped.out.mate2"
OUTPUT_PREFIX="mm9_aligned_" # Prefix for output files
NUM_THREADS=8 # The source used 8 threads; adjust based on available resources

# Note: The STAR genome index for mm9 must be pre-built.
# Example command to build index (run once):
# STAR --runMode genomeGenerate \
#   --genomeDir "${GENOME_DIR}" \
#   --genomeFastaFiles /path/to/mm9.fa \
#   --sjdbGTFfile /path/to/mm9.gtf \
#   --runThreadN "${NUM_THREADS}"

# Align reads unmapped to repeat elements to the mouse genome (mm9).
# The source states that the parameters are the same as in the previous step, with only
# the index swapped, so the flags below mirror the repeat-element run.
# Input and output names are assumptions.
STAR \
  --runMode alignReads \
  --readFilesIn "${INPUT_R1}" "${INPUT_R2}" \
  --genomeDir "${GENOME_DIR}" \
  --alignEndsType EndToEnd \
  --genomeLoad NoSharedMemory \
  --outBAMcompression 10 \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outFilterMultimapNmax 10 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outSAMattrRGline ID:foo \
  --outSAMattributes All \
  --outSAMmode Full \
  --outSAMtype BAM Unsorted \
  --outSAMunmapped Within \
  --outStd Log \
  --runThreadN "${NUM_THREADS}"

# The output is an unsorted BAM (${OUTPUT_PREFIX}Aligned.out.bam). It must be
# coordinate-sorted before it can be indexed, for example:
# samtools sort -o "${OUTPUT_PREFIX}sorted.bam" "${OUTPUT_PREFIX}Aligned.out.bam"
# samtools index "${OUTPUT_PREFIX}sorted.bam"
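
The step description says the genome alignment reuses the parameters of the repeat-element alignment and swaps only the index. One way to keep the two runs consistent is to hold the shared flags in a single variable. This is a sketch, not the authors' script: STAR_SHARED_OPTS, star_align_cmd, and the mm9 paths and prefix are hypothetical names, and the function only prints each command.

```shell
# Flags shared by both STAR runs, copied from the repeat-element step description
STAR_SHARED_OPTS="--runMode alignReads --alignEndsType EndToEnd --genomeLoad NoSharedMemory"
STAR_SHARED_OPTS="$STAR_SHARED_OPTS --outBAMcompression 10 --outFilterMultimapNmax 10"
STAR_SHARED_OPTS="$STAR_SHARED_OPTS --outFilterMultimapScoreRange 1 --outFilterScoreMin 10"
STAR_SHARED_OPTS="$STAR_SHARED_OPTS --outFilterType BySJout --outReadsUnmapped Fastx"
STAR_SHARED_OPTS="$STAR_SHARED_OPTS --outSAMattrRGline ID:foo --outSAMattributes All"
STAR_SHARED_OPTS="$STAR_SHARED_OPTS --outSAMmode Full --outSAMtype BAM Unsorted"
STAR_SHARED_OPTS="$STAR_SHARED_OPTS --outSAMunmapped Within --outStd Log --runThreadN 8"

# Dry run: print the STAR command for one alignment pass.
# $1: genome index directory; $2: output prefix; $3, $4: read 1 and read 2 files
star_align_cmd() {
  printf '%s\n' "STAR $STAR_SHARED_OPTS --genomeDir $1 --outFileNamePrefix $2 --readFilesIn $3 $4"
}

# Pass 1: repeat elements. Pass 2: mm9, fed with the unmapped mates from pass 1.
star_align_cmd repbase condition1 r1.fastq r2.fastq
star_align_cmd /path/to/STAR_mm9_index mm9_aligned_ \
  condition1Unmapped.out.mate1 condition1Unmapped.out.mate2
```

Only the index, prefix, and input files differ between the two printed commands, which matches the description of the step.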

Subread featureCounts (-a gencode.vM1.annotation.gtf -s 2 -p -o counts.txt RN2c.bam) was used to count features using mouse annotations (Gencode vM1).

featureCounts (version not specified in description)

$ Bash example

# Install Subread package (which includes featureCounts)
# conda install -c bioconda subread

# Download Gencode vM1 mouse annotation GTF file
# For example, from the Gencode website:
# wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M1/gencode.vM1.annotation.gtf.gz
# gunzip gencode.vM1.annotation.gtf.gz

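# Execute featureCounts
# -s 2: reversely stranded counting; -p: paired-end input
# Note: in Subread 2.0.2 and later, -p alone no longer counts fragments and
# --countReadPairs must be added for that. The source does not state the version used.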
featureCounts -a gencode.vM1.annotation.gtf -s 2 -p -o counts.txt RN2c.bam
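
The counts.txt file written by featureCounts starts with a comment line beginning with "#", followed by a header row (Geneid, Chr, Start, End, Strand, Length, then one column per BAM file). A short sketch for reducing it to a two-column gene and count table, assuming a single input BAM so that the counts sit in column 7. The function name extract_counts and the output name gene_counts.tsv are hypothetical.

```shell
# Print gene ID and read count (column 7) from a featureCounts table,
# skipping the leading comment line and the header row.
extract_counts() {
  awk -F '\t' '!/^#/ && $1 != "Geneid" { print $1 "\t" $7 }' "$1"
}

# Usage: write a simplified table if counts.txt is present
if [ -f counts.txt ]; then
  extract_counts counts.txt > gene_counts.tsv
fi
```

With several BAM files in one featureCounts run, columns 8 onward would hold the additional samples.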

Tools Used

cutadapt

STAR

featureCounts (Subread)

Raw Source Text

Raw reads were trimmed using cutadapt (v1.14) using the following parameters: -O 5 --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG ATCTCGTATGCCGTCTTCTGCTTG CGACAGGTTCAGAGTTCTACAGTCCGACGATC GATCGGAAGAGCACACGTCTGAACTCCAGTCAC AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
Trimmed reads were mapped to and filtered of mouse-specific repeat elements (RepBase 18.05) with STAR (2.4.0i) using the following parameters: --alignEndsType  EndToEnd  --genomeDir  repbase  --genomeLoad  NoSharedMemory  --outBAMcompression  10  --outFileNamePrefix  condition1  --outFilterMultimapNmax  10  --outFilterMultimapScoreRange  1  --outFilterScoreMin  10  --outFilterType  BySJout  --outReadsUnmapped  Fastx  --outSAMattrRGline  ID:foo  --outSAMattributes  All  --outSAMmode  Full  --outSAMtype  BAM  Unsorted  --outSAMunmapped  Within  --outStd  Log  --readFilesIn  r1.fastq  r2.fastq  --runMode  alignReads  --runThreadN  8
Reads unmapped to repeat elements were mapped to the mouse genome with STAR using the same parameters as the previous step, using an mm9 index in place of the repeat element index
Subread featureCounts (-a gencode.vM1.annotation.gtf -s 2 -p -o counts.txt RN2c.bam) was used to count features using mouse annotations (Gencode vM1)
Genome_build: mm9
Supplementary_files_format_and_content: counts.txt contains read counts from mm9-mapped BAM files
