GSE153264 Processing Pipeline

RNA-Seq, code examples, 3 steps

Publication

Identification of the global miR-130a targetome reveals a role for TBL1XR1 in hematopoietic stem cell self-renewal and t(8;21) AML.

Cell Reports (2022), PMID 35263585

Dataset

GSE153264

Definition of a Small Core Transcriptional Circuit Regulated by AML1-ETO [RNA-seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Adaptors were trimmed using trimmomatic (SE for Kasumi-1 cells, PE for CD34+ cells).

    Trimmomatic v0.39 (version assumed; the source text does not state one)
    $ Bash example
    # Install Trimmomatic (if not already installed)
    # conda install -c bioconda trimmomatic=0.39
    
    # Define Trimmomatic path and adapter file paths
    TRIMMOMATIC_JAR="/path/to/trimmomatic-0.39.jar" # Adjust path to your Trimmomatic .jar file
    ADAPTER_FILE_PE="/path/to/Trimmomatic/adapters/TruSeq3-PE.fa" # Path to Illumina adapter file for Paired-End
    ADAPTER_FILE_SE="/path/to/Trimmomatic/adapters/TruSeq3-SE.fa" # Path to Illumina adapter file for Single-End
    THREADS=8 # Number of threads to use
    
    # --- Scenario 1: Single-End (SE) for Kasumi-1 cells ---
    # Assuming input file is kasumi1_raw.fastq.gz
    INPUT_SE_FASTQ="kasumi1_raw.fastq.gz"
    OUTPUT_SE_TRIMMED="kasumi1_trimmed.fastq.gz"
    
    echo "Trimming adapters for Single-End reads (Kasumi-1 cells)..."
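    # Only adapter trimming is stated in the source; the quality-trimming steps
    # (LEADING, TRAILING, SLIDINGWINDOW, MINLEN) and their thresholds are assumed values.
    # Options such as -threads must come before the input file; Trimmomatic
    # treats every argument after the output file as a trimming step.
    java -jar "${TRIMMOMATIC_JAR}" SE -threads "${THREADS}" -phred33 \
        "${INPUT_SE_FASTQ}" \
        "${OUTPUT_SE_TRIMMED}" \
        ILLUMINACLIP:"${ADAPTER_FILE_SE}":2:30:10 \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36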
    
    echo ""
    
    # --- Scenario 2: Paired-End (PE) for CD34+ cells ---
    # Assuming input files are cd34_R1_raw.fastq.gz and cd34_R2_raw.fastq.gz
    INPUT_PE_R1="cd34_R1_raw.fastq.gz"
    INPUT_PE_R2="cd34_R2_raw.fastq.gz"
    OUTPUT_PE_R1_PAIRED="cd34_R1_paired.fastq.gz"
    OUTPUT_PE_R1_UNPAIRED="cd34_R1_unpaired.fastq.gz"
    OUTPUT_PE_R2_PAIRED="cd34_R2_paired.fastq.gz"
    OUTPUT_PE_R2_UNPAIRED="cd34_R2_unpaired.fastq.gz"
    
    echo "Trimming adapters for Paired-End reads (CD34+ cells)..."
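    # As in the single-end call, -threads is placed before the input files,
    # and the quality-trimming thresholds are assumed values, not the authors' settings.
    java -jar "${TRIMMOMATIC_JAR}" PE -threads "${THREADS}" -phred33 \
        "${INPUT_PE_R1}" "${INPUT_PE_R2}" \
        "${OUTPUT_PE_R1_PAIRED}" "${OUTPUT_PE_R1_UNPAIRED}" \
        "${OUTPUT_PE_R2_PAIRED}" "${OUTPUT_PE_R2_UNPAIRED}" \
        ILLUMINACLIP:"${ADAPTER_FILE_PE}":2:30:10 \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36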
  2.

    Trimmed reads were aligned to the human genome (hg19) with TopHat v2.0.11.

    $ Bash example
    # Install TopHat v2.0.11
    # conda create -n tophat_env -c bioconda tophat=2.0.11 -y  # assumes this version is available from the bioconda channel
    # conda activate tophat_env
    
    # --- Reference Genome Preparation (if not already done) ---
    # TopHat v2 uses Bowtie2 for alignment. Ensure hg19 Bowtie2 index is available.
    # Download hg19 reference genome FASTA
    # mkdir -p ref/hg19
    # wget -P ref/hg19 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
    # gunzip ref/hg19/hg19.fa.gz
    
    # Build Bowtie2 index for hg19
    # bowtie2-build ref/hg19/hg19.fa ref/hg19/hg19_index
    
    # --- Alignment Step ---
    # Define input and output paths
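    INPUT_READS="trimmed_reads.fastq" # Placeholder for a single-end trimmed reads file (Kasumi-1)
    # For paired-end samples (CD34+), pass the two paired files from step 1 as
    # separate arguments after the index prefix: <R1_paired> <R2_paired>
    GENOME_INDEX_PREFIX="ref/hg19/hg19_index" # Path to the Bowtie2 index prefix for hg19
    OUTPUT_DIR="tophat_alignment_hg19"
    
    # Run TopHat alignment
    # -o: Output directory
    # -p: Number of threads (adjust as needed)
    # Alignments are written to ${OUTPUT_DIR}/accepted_hits.bam
    tophat -o "${OUTPUT_DIR}" -p 8 "${GENOME_INDEX_PREFIX}" "${INPUT_READS}"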
  3.

    Differential gene expression was determined using Cuffdiff v.2.1.1.

    $ Bash example
    # Install Cufflinks (which includes Cuffdiff)
    # conda install -c bioconda cufflinks=2.1.1
    
    # Define variables for input and output
    # Cuffdiff takes a transcript annotation as its first positional argument:
    # either a reference annotation or a merged assembly (e.g., from Cuffmerge).
    # The source does not say which was used, so this path is a placeholder.
    TRANSCRIPTS_GTF="path/to/merged_transcripts.gtf"
    
    # Comma-separated BAM files for each condition/sample group
    # Replace with actual paths to your aligned BAM files (e.g., TopHat's accepted_hits.bam)
    SAMPLE1_BAMS="path/to/sample1_rep1.bam,path/to/sample1_rep2.bam"
    SAMPLE2_BAMS="path/to/sample2_rep1.bam,path/to/sample2_rep2.bam"
    
    OUTPUT_DIR="cuffdiff_output"
    
    # Placeholder for the reference genome FASTA used for bias correction
    # Use the same build the reads were aligned to (hg19/GRCh37 for this dataset)
    GENOME_FASTA="path/to/hg19.fa"
    
    # Create output directory
    mkdir -p "${OUTPUT_DIR}"
    
    # Run Cuffdiff for differential expression analysis
    # -o: output directory
    # -L: comma-separated list of condition labels (must match the order of BAM groups)
    # -b: enable fragment bias correction using a reference genome FASTA file
    #     (optional; the source does not state whether it was used)
    # Note: -u is a flag without an argument (multi-read correction), not a way to
    # pass an annotation; the annotation is the first positional argument.
    cuffdiff -o "${OUTPUT_DIR}" \
             -L "ConditionA,ConditionB" \
             -b "${GENOME_FASTA}" \
             "${TRANSCRIPTS_GTF}" \
             "${SAMPLE1_BAMS}" \
             "${SAMPLE2_BAMS}"
    
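Cuffdiff expects the replicates of each condition as one comma-separated argument, and TopHat writes each sample's alignments to accepted_hits.bam inside its output directory. The following is a minimal sketch of collecting those paths, not part of the published pipeline; the helper name join_bams and the directory names in the usage comment are hypothetical.

```shell
# join_bams (hypothetical helper): for each TopHat output directory given as an
# argument, build the path to its accepted_hits.bam and join the paths with commas.
join_bams() {
    local joined="" dir
    for dir in "$@"; do
        # Strip a trailing slash so the result does not contain "//"
        dir="${dir%/}"
        # Prepend a comma only when something has already been collected
        joined="${joined:+${joined},}${dir}/accepted_hits.bam"
    done
    printf '%s\n' "${joined}"
}

# Usage (directory names are placeholders):
# SAMPLE1_BAMS="$(join_bams tophat_condA_rep1 tophat_condA_rep2)"
# SAMPLE2_BAMS="$(join_bams tophat_condB_rep1 tophat_condB_rep2)"
```

The helper only builds path strings; it does not check that the BAM files exist, so a typo in a directory name will surface later as a Cuffdiff error.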

Tools Used

Raw Source Text
Adaptors were trimmed using trimmomatic (SE for Kasumi-1 cells, PE for CD34+ cells).
Trimmed reads were aligned to the human genome (hg19) with TopHat v2.0.11.
Differential gene expression was determined using Cuffdiff v.2.1.1.
Genome_build: hg19 (GRCh37)
Supplementary_files_format_and_content: *_gene_exp.diff: Cuffdiff differential gene expression output; tab delimited text file.
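
Because the *_gene_exp.diff files are tab-delimited text, the significant genes can be extracted with standard shell tools. The sketch below assumes the file has a header row containing columns named status and significant, as Cuffdiff normally writes; the function name significant_genes and the file name in the usage comment are hypothetical, and the header of the actual file should be checked first.

```shell
# significant_genes (hypothetical helper): print the header plus every row of a
# Cuffdiff gene_exp.diff file whose test status is "OK" and whose "significant"
# flag is "yes". Columns are located by header name rather than by position.
significant_genes() {
    awk -F'\t' '
        NR == 1 {
            # Find the two columns of interest in the header row
            for (i = 1; i <= NF; i++) {
                if ($i == "status") status_col = i
                if ($i == "significant") sig_col = i
            }
            if (!status_col || !sig_col) {
                print "status/significant columns not found in header" > "/dev/stderr"
                exit 1
            }
            print
            next
        }
        # Keep only rows that were tested successfully and flagged significant
        $status_col == "OK" && $sig_col == "yes"
    ' "$1"
}

# Usage (file name is a placeholder):
# significant_genes sample_gene_exp.diff > significant_genes.tsv
```

Looking the columns up by name keeps the filter working even if a file carries extra or reordered columns, at the cost of failing outright when the header is missing.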