GSE153264 Processing Pipeline
RNA-Seq
3 steps
Publication
Identification of the global miR-130a targetome reveals a role for TBL1XR1 in hematopoietic stem cell self-renewal and t(8;21) AML. Cell Reports (2022), PMID 35263585
Dataset
GSE153264: Definition of a Small Core Transcriptional Circuit Regulated by AML1-ETO [RNA-seq]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Adaptors were trimmed using trimmomatic (SE for Kasumi-1 cells, PE for CD34+ cells).
Trimmomatic v0.39
$ Bash example
# Install Trimmomatic (if not already installed)
# conda install -c bioconda trimmomatic=0.39

# Paths below are placeholders; adjust them to your system.
# Adapter files and trimming thresholds are illustrative; the source does not report them.
TRIMMOMATIC_JAR="/path/to/trimmomatic-0.39.jar"
ADAPTER_FILE_PE="/path/to/Trimmomatic/adapters/TruSeq3-PE.fa"  # Illumina adapters, paired-end
ADAPTER_FILE_SE="/path/to/Trimmomatic/adapters/TruSeq3-SE.fa"  # Illumina adapters, single-end
THREADS=8  # Number of threads to use

# --- Scenario 1: Single-End (SE) for Kasumi-1 cells ---
# Assumes the input file is kasumi1_raw.fastq.gz
INPUT_SE_FASTQ="kasumi1_raw.fastq.gz"
OUTPUT_SE_TRIMMED="kasumi1_trimmed.fastq.gz"

echo "Trimming adapters for Single-End reads (Kasumi-1 cells)..."
# -threads must come before the input files; arguments after the
# output files are parsed as trimming steps.
java -jar "${TRIMMOMATIC_JAR}" SE -threads "${THREADS}" -phred33 \
    "${INPUT_SE_FASTQ}" \
    "${OUTPUT_SE_TRIMMED}" \
    ILLUMINACLIP:"${ADAPTER_FILE_SE}":2:30:10 \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

# --- Scenario 2: Paired-End (PE) for CD34+ cells ---
# Assumes the input files are cd34_R1_raw.fastq.gz and cd34_R2_raw.fastq.gz
INPUT_PE_R1="cd34_R1_raw.fastq.gz"
INPUT_PE_R2="cd34_R2_raw.fastq.gz"
OUTPUT_PE_R1_PAIRED="cd34_R1_paired.fastq.gz"
OUTPUT_PE_R1_UNPAIRED="cd34_R1_unpaired.fastq.gz"
OUTPUT_PE_R2_PAIRED="cd34_R2_paired.fastq.gz"
OUTPUT_PE_R2_UNPAIRED="cd34_R2_unpaired.fastq.gz"

echo "Trimming adapters for Paired-End reads (CD34+ cells)..."
java -jar "${TRIMMOMATIC_JAR}" PE -threads "${THREADS}" -phred33 \
    "${INPUT_PE_R1}" "${INPUT_PE_R2}" \
    "${OUTPUT_PE_R1_PAIRED}" "${OUTPUT_PE_R1_UNPAIRED}" \
    "${OUTPUT_PE_R2_PAIRED}" "${OUTPUT_PE_R2_UNPAIRED}" \
    ILLUMINACLIP:"${ADAPTER_FILE_PE}":2:30:10 \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
2
Trimmed reads were aligned to the human genome (hg19) with TopHat v2.0.11.
$ Bash example
# Install TopHat v2.0.11 (check that this version is available in your channel)
# conda create -n tophat_env -c bioconda tophat=2.0.11 -y
# conda activate tophat_env

# --- Reference Genome Preparation (if not already done) ---
# TopHat v2 uses Bowtie2 for alignment, so an hg19 Bowtie2 index must be available.
# Download the hg19 reference genome FASTA
# mkdir -p ref/hg19
# wget -P ref/hg19 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
# gunzip ref/hg19/hg19.fa.gz
# Build the Bowtie2 index for hg19
# bowtie2-build ref/hg19/hg19.fa ref/hg19/hg19_index

# --- Alignment Step ---
# Define input and output paths (placeholders)
INPUT_READS="trimmed_reads.fastq"          # Trimmed reads from step 1
GENOME_INDEX_PREFIX="ref/hg19/hg19_index"  # Bowtie2 index prefix for hg19
OUTPUT_DIR="tophat_alignment_hg19"

# Run TopHat alignment
# -o: Output directory
# -p: Number of threads (adjust as needed)
# For paired-end samples (CD34+ cells), pass the R1 and R2 files as two
# separate arguments after the index prefix.
tophat -o "${OUTPUT_DIR}" -p 8 "${GENOME_INDEX_PREFIX}" "${INPUT_READS}"
3
Differential gene expression was determined using Cuffdiff v.2.1.1.
$ Bash example
# Install Cufflinks (which includes Cuffdiff)
# conda install -c bioconda cufflinks=2.1.1

# Annotation GTF used for quantification. The source does not state which
# annotation was used; this path is a placeholder for an hg19 gene annotation.
TRANSCRIPTS_GTF="path/to/hg19_annotation.gtf"

# Comma-separated BAM files for each condition (replicates within a condition
# are separated by commas). Replace with the paths to your aligned BAM files.
SAMPLE1_BAMS="path/to/sample1_rep1.bam,path/to/sample1_rep2.bam"
SAMPLE2_BAMS="path/to/sample2_rep1.bam,path/to/sample2_rep2.bam"
OUTPUT_DIR="cuffdiff_output"

# Reference genome FASTA for bias correction. Use hg19 to match the
# alignment step and the genome build reported for this dataset.
GENOME_FASTA="path/to/hg19.fa"

# Create output directory
mkdir -p "${OUTPUT_DIR}"

# Run Cuffdiff for differential expression analysis
# -o: output directory
# -L: comma-separated condition labels (same order as the BAM groups)
# -b: enable fragment bias correction using the reference genome FASTA
# -u: enable multi-read correction (a flag; it takes no argument)
# Whether -b and -u were used in the original analysis is not stated.
cuffdiff -o "${OUTPUT_DIR}" \
    -L "ConditionA,ConditionB" \
    -b "${GENOME_FASTA}" \
    -u \
    "${TRANSCRIPTS_GTF}" \
    "${SAMPLE1_BAMS}" \
    "${SAMPLE2_BAMS}"
Raw Source Text
Adaptors were trimmed using trimmomatic (SE for Kasumi-1 cells, PE for CD34+ cells).
Trimmed reads were aligned to the human genome (hg19) with TopHat v2.0.11.
Differential gene expression was determined using Cuffdiff v.2.1.1.
Genome_build: hg19 (GRCh37)
Supplementary_files_format_and_content: *_gene_exp.diff: Cuffdiff differential gene expression output; tab delimited text file.
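The supplementary *_gene_exp.diff files are tab-delimited Cuffdiff output, so a quick way to inspect them is to keep only the rows Cuffdiff flagged as significant. The sketch below assumes the standard 14-column gene_exp.diff layout (status in column 7, the significant flag in column 14). The function name filter_significant and the file names are hypothetical and not part of the original pipeline.

```shell
# Hypothetical helper: print the header plus the genes Cuffdiff marked as significant.
# Assumed column order of gene_exp.diff:
#   test_id gene_id gene locus sample_1 sample_2 status value_1 value_2
#   log2(fold_change) test_stat p_value q_value significant
filter_significant() {
    local diff_file="$1"
    # Keep row 1 (header), then rows where status is OK and significant is "yes"
    awk -F'\t' 'NR == 1 || ($7 == "OK" && $14 == "yes")' "${diff_file}"
}

# Example call (paths are placeholders):
# filter_significant cuffdiff_output/gene_exp.diff > significant_genes.tsv
```

Filtering on status as well as the significant flag drops genes that Cuffdiff did not test (for example NOTEST or LOWDATA rows). The significance flag reflects whatever FDR threshold was set when Cuffdiff was run.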