GSE153264 Processing Pipeline

RNA-Seq, 3 steps

Publication

Identification of the global miR-130a targetome reveals a role for TBL1XR1 in hematopoietic stem cell self-renewal and t(8;21) AML.

Cell Reports (2022), PMID 35263585

Dataset

GSE153264

Definition of a Small Core Transcriptional Circuit Regulated by AML1-ETO [RNA-seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Adaptors were trimmed using trimmomatic (SE for Kasumi-1 cells, PE for CD34+ cells).

Trimmomatic v0.39 (version assumed; the source does not state one)

$ Bash example

# Install Trimmomatic (if not already installed)
# conda install -c bioconda trimmomatic=0.39

# Define Trimmomatic path and adapter file paths
# (the TruSeq3 adapter sets are an assumption; the source does not name the adapter files)
TRIMMOMATIC_JAR="/path/to/trimmomatic-0.39.jar" # Adjust path to your Trimmomatic .jar file
ADAPTER_FILE_PE="/path/to/Trimmomatic/adapters/TruSeq3-PE.fa" # Path to Illumina adapter file for Paired-End
ADAPTER_FILE_SE="/path/to/Trimmomatic/adapters/TruSeq3-SE.fa" # Path to Illumina adapter file for Single-End
THREADS=8 # Number of threads to use

# --- Scenario 1: Single-End (SE) for Kasumi-1 cells ---
# Assuming input file is kasumi1_raw.fastq.gz
INPUT_SE_FASTQ="kasumi1_raw.fastq.gz"
OUTPUT_SE_TRIMMED="kasumi1_trimmed.fastq.gz"

echo "Trimming adapters for Single-End reads (Kasumi-1 cells)..."
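# -threads is an option and must come before the input files; trimming steps come last.
# The source only states that adaptors were trimmed. The quality-trimming steps
# (LEADING, TRAILING, SLIDINGWINDOW, MINLEN) are assumed values from the Trimmomatic manual.
java -jar "${TRIMMOMATIC_JAR}" SE -threads "${THREADS}" -phred33 \
    "${INPUT_SE_FASTQ}" \
    "${OUTPUT_SE_TRIMMED}" \
    ILLUMINACLIP:"${ADAPTER_FILE_SE}":2:30:10 \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36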

echo ""

# --- Scenario 2: Paired-End (PE) for CD34+ cells ---
# Assuming input files are cd34_R1_raw.fastq.gz and cd34_R2_raw.fastq.gz
INPUT_PE_R1="cd34_R1_raw.fastq.gz"
INPUT_PE_R2="cd34_R2_raw.fastq.gz"
OUTPUT_PE_R1_PAIRED="cd34_R1_paired.fastq.gz"
OUTPUT_PE_R1_UNPAIRED="cd34_R1_unpaired.fastq.gz"
OUTPUT_PE_R2_PAIRED="cd34_R2_paired.fastq.gz"
OUTPUT_PE_R2_UNPAIRED="cd34_R2_unpaired.fastq.gz"

echo "Trimming adapters for Paired-End reads (CD34+ cells)..."
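# -threads is an option and must come before the input files; trimming steps come last.
# Quality-trimming steps after ILLUMINACLIP are assumed values, not stated in the source.
java -jar "${TRIMMOMATIC_JAR}" PE -threads "${THREADS}" -phred33 \
    "${INPUT_PE_R1}" "${INPUT_PE_R2}" \
    "${OUTPUT_PE_R1_PAIRED}" "${OUTPUT_PE_R1_UNPAIRED}" \
    "${OUTPUT_PE_R2_PAIRED}" "${OUTPUT_PE_R2_UNPAIRED}" \
    ILLUMINACLIP:"${ADAPTER_FILE_PE}":2:30:10 \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36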
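
After trimming, it can be useful to check how many reads survived. Trimmomatic prints its own summary to standard error; the sketch below is an independent check that counts FASTQ records directly. The helper names `count_reads` and `report_survival` are hypothetical, the file names are the placeholders used in the trimming example, and the count assumes standard four-line FASTQ records.

```shell
#!/usr/bin/env bash
# Count reads in a gzip-compressed FASTQ file (assumes 4 lines per record).
# count_reads is a hypothetical helper, not part of Trimmomatic.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Report how many reads were kept, if both the raw and trimmed files exist.
report_survival() {
    local raw="$1" trimmed="$2"
    if [ -f "$raw" ] && [ -f "$trimmed" ]; then
        local n_raw n_trim
        n_raw=$(count_reads "$raw")
        n_trim=$(count_reads "$trimmed")
        echo "${trimmed}: ${n_trim} of ${n_raw} reads kept"
    else
        echo "Skipping ${trimmed}: input or output file not found" >&2
    fi
}

# Placeholder file names from the trimming example
report_survival "kasumi1_raw.fastq.gz" "kasumi1_trimmed.fastq.gz"
report_survival "cd34_R1_raw.fastq.gz" "cd34_R1_paired.fastq.gz"
```

For paired-end data, the R1 and R2 "paired" outputs should contain the same number of reads, so checking one of them is enough.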

Trimmed reads were aligned to the human genome (hg19) with TopHat v2.0.11.

TopHat v2.0.11

$ Bash example

# Install TopHat v2.0.11 (assumes this version is available from your conda channels)
# conda create -n tophat_env -c bioconda -c conda-forge tophat=2.0.11 -y
# conda activate tophat_env

# --- Reference Genome Preparation (if not already done) ---
# TopHat v2 uses Bowtie2 for alignment. Ensure an hg19 Bowtie2 index is available.
# Download the hg19 reference genome FASTA
# mkdir -p ref/hg19
# wget -P ref/hg19 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
# gunzip ref/hg19/hg19.fa.gz

# Build the Bowtie2 index for hg19
# bowtie2-build ref/hg19/hg19.fa ref/hg19/hg19_index

# --- Alignment Step ---
# Define input and output paths
INPUT_READS="kasumi1_trimmed.fastq.gz" # Trimmed single-end reads from the Trimmomatic step
GENOME_INDEX_PREFIX="ref/hg19/hg19_index" # Path to the Bowtie2 index prefix for hg19
OUTPUT_DIR="tophat_alignment_hg19"
THREADS=8 # Number of threads (adjust as needed)

# Run TopHat alignment
# -o: Output directory
# -p: Number of threads
# For paired-end samples (CD34+ cells), pass the two paired FASTQ files as separate arguments:
#   tophat -o "${OUTPUT_DIR}" -p "${THREADS}" "${GENOME_INDEX_PREFIX}" cd34_R1_paired.fastq.gz cd34_R2_paired.fastq.gz
tophat -o "${OUTPUT_DIR}" -p "${THREADS}" "${GENOME_INDEX_PREFIX}" "${INPUT_READS}"
# Alignments are written to ${OUTPUT_DIR}/accepted_hits.bam
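
Because the dataset mixes single-end (Kasumi-1) and paired-end (CD34+) libraries, the alignment call differs only in the number of read files passed. The following is a minimal sketch of a wrapper that handles both cases. `run_tophat` and the `DRY_RUN` switch are hypothetical names introduced here, and the output directory and file names are placeholders. With `DRY_RUN=1` (the default in this sketch) the command is only printed, so it can be inspected before anything runs.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run_tophat <out_dir> <index_prefix> <reads_1> [reads_2]
# One read file means single-end; two read files mean paired-end.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only, 0 = run TopHat
THREADS="${THREADS:-8}"

run_tophat() {
    if [ "$#" -lt 3 ] || [ "$#" -gt 4 ]; then
        echo "usage: run_tophat <out_dir> <index_prefix> <reads_1> [reads_2]" >&2
        return 1
    fi
    local out_dir="$1" index="$2"
    shift 2
    if [ "${DRY_RUN}" = "1" ]; then
        echo "tophat -o ${out_dir} -p ${THREADS} ${index} $*"
    else
        tophat -o "${out_dir}" -p "${THREADS}" "${index}" "$@"
    fi
}

# Kasumi-1: single-end, one trimmed file (placeholder names)
run_tophat "tophat_kasumi1" "ref/hg19/hg19_index" "kasumi1_trimmed.fastq.gz"

# CD34+: paired-end, the two "paired" outputs from Trimmomatic (placeholder names)
run_tophat "tophat_cd34" "ref/hg19/hg19_index" \
    "cd34_R1_paired.fastq.gz" "cd34_R2_paired.fastq.gz"
```

Set `DRY_RUN=0` to execute the alignments once the printed commands look right.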

Differential gene expression was determined using Cuffdiff v.2.1.1.

Cufflinks v2.1.1

$ Bash example

# Install Cufflinks (which includes Cuffdiff); assumes v2.1.1 is available from your conda channels
# conda install -c bioconda cufflinks=2.1.1

# Define variables for input and output
# TRANSCRIPTS_GTF is a reference annotation or a merged assembly (e.g., from Cuffmerge).
# It must use hg19 coordinates to match the alignments.
TRANSCRIPTS_GTF="path/to/merged_transcripts.gtf"

# Comma-separated BAM files for each condition/sample group
# Replace with actual paths to your aligned BAM files (TopHat writes accepted_hits.bam)
SAMPLE1_BAMS="path/to/sample1_rep1.bam,path/to/sample1_rep2.bam"
SAMPLE2_BAMS="path/to/sample2_rep1.bam,path/to/sample2_rep2.bam"

OUTPUT_DIR="cuffdiff_output"

# Reference genome FASTA for bias correction
# Use the same assembly as the alignment step (hg19/GRCh37), not GRCh38
GENOME_FASTA="path/to/hg19.fa"

# Create output directory
mkdir -p "${OUTPUT_DIR}"

# Run Cuffdiff for differential expression analysis
# -o: output directory
# -L: comma-separated list of condition labels (must match the order of BAM groups)
# -b: enable fragment bias correction using a reference genome FASTA file
#     (optional; the source does not say whether it was used)
# The annotation GTF is the first positional argument. Note that -u is the
# multi-read correction flag and does not take a GTF file.
cuffdiff -o "${OUTPUT_DIR}" \
         -L "ConditionA,ConditionB" \
         -b "${GENOME_FASTA}" \
         "${TRANSCRIPTS_GTF}" \
         "${SAMPLE1_BAMS}" \
         "${SAMPLE2_BAMS}"

Tools Used

Trimmomatic, TopHat, Cufflinks

Raw Source Text

Adaptors were trimmed using trimmomatic (SE for Kasumi-1 cells, PE for CD34+ cells).
Trimmed reads were aligned to the human genome (hg19) with TopHat v2.0.11.
Differential gene expression was determined using Cuffdiff v.2.1.1.
Genome_build: hg19 (GRCh37)
Supplementary_files_format_and_content: *_gene_exp.diff: Cuffdiff differential gene expression output; tab delimited text file.
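
The `*_gene_exp.diff` files described above are tab-delimited, so significant genes can be pulled out with standard command-line tools. The sketch below assumes the usual 14-column Cuffdiff layout, in which column 3 is the gene name, column 10 is `log2(fold_change)`, column 13 is `q_value`, and column 14 (`significant`) is `yes` or `no`. `filter_significant` is a hypothetical helper, and the paths are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the header line plus rows Cuffdiff flagged as significant.
# Assumes column 14 of gene_exp.diff is the "significant" column (yes/no).
filter_significant() {
    awk -F'\t' 'NR == 1 || $14 == "yes"' "$1"
}

DIFF_FILE="cuffdiff_output/gene_exp.diff"          # placeholder path
SIG_FILE="cuffdiff_output/gene_exp.significant.diff"

if [ -f "${DIFF_FILE}" ]; then
    filter_significant "${DIFF_FILE}" > "${SIG_FILE}"
    # Print gene, log2 fold change and q-value, smallest q-value first
    tail -n +2 "${SIG_FILE}" \
        | sort -t"$(printf '\t')" -k13,13g \
        | cut -f3,10,13
else
    echo "${DIFF_FILE} not found; run Cuffdiff first" >&2
fi
```

The `significant` flag reflects the false discovery rate threshold Cuffdiff was run with (0.05 by default), so a stricter cutoff would need an additional filter on column 13.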
