GSE181138 Processing Pipeline

Publication

Identification of the global miR-130a targetome reveals a role for TBL1XR1 in hematopoietic stem cell self-renewal and t(8;21) AML.

Cell Reports (2022), PMID 35263585

Dataset

GSE181138

Identification of the Global miR-130a Targetome Reveals a Novel Role for TBL1XR1 in Hematopoietic Stem Cell Self-Renewal and t(8;21) AML [miR-130a OE]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Raw data were aligned to hg38 using STAR

STAR v2.7.10a (version inferred with models/gemini-2.5-flash; the source text does not state a version)

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Reference genome: hg38
# Before alignment, a STAR genome index for hg38 must be built.
# Example command to build index (run once):
# STAR --runMode genomeGenerate \
#      --genomeDir /path/to/hg38_star_index \
#      --genomeFastaFiles /path/to/hg38.fa \
#      --sjdbGTFfile /path/to/hg38.gtf \
#      --runThreadN 8 # Adjust threads as needed

# Assume hg38 STAR index is available at /path/to/hg38_star_index
# Assume input raw data is input.fastq.gz (for single-end reads)
# For paired-end reads, use: --readFilesIn input_R1.fastq.gz input_R2.fastq.gz

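# --readFilesCommand zcat is needed because the FASTQ input is gzip-compressed.
# With this prefix, STAR writes aligned_reads_Aligned.sortedByCoord.out.bam
STAR --runThreadN 8 \
     --genomeDir /path/to/hg38_star_index \
     --readFilesIn input.fastq.gz \
     --readFilesCommand zcat \
     --outFileNamePrefix aligned_reads_ \
     --outSAMtype BAM SortedByCoordinate \
     --outBAMcompression 6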
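
When several samples have to be aligned, it can help to generate the STAR command per sample rather than editing it by hand. The following is a minimal sketch under stated assumptions, not the authors' script: the helper names `star_cmd` and `star_bam_path` are hypothetical, the thread count is fixed at 8 as in the example above, and the command is only printed (a dry run) so it can be inspected before execution.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of STAR): build a STAR command line for one sample.

# Print the path of the coordinate-sorted BAM that STAR writes for a given
# --outFileNamePrefix when --outSAMtype BAM SortedByCoordinate is used.
star_bam_path() {
    printf '%s\n' "${1}Aligned.sortedByCoord.out.bam"
}

# Print (do not run) a STAR command.
# Usage: star_cmd INDEX_DIR PREFIX R1.fastq[.gz] [R2.fastq[.gz]]
star_cmd() {
    local index_dir="$1" prefix="$2"
    shift 2
    if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
        echo "usage: star_cmd INDEX_DIR PREFIX R1 [R2]" >&2
        return 1
    fi
    local cmd="STAR --runThreadN 8 --genomeDir ${index_dir}"
    cmd="${cmd} --readFilesIn $* --outFileNamePrefix ${prefix}"
    cmd="${cmd} --outSAMtype BAM SortedByCoordinate"
    # Gzip-compressed FASTQ files must be decompressed on the fly.
    case "$1" in
        *.gz) cmd="${cmd} --readFilesCommand zcat" ;;
    esac
    printf '%s\n' "${cmd}"
}
```

For example, `star_cmd /path/to/hg38_star_index sampleA_ sampleA_R1.fastq.gz sampleA_R2.fastq.gz` prints a paired-end command, and `star_bam_path sampleA_` gives the BAM file name to pass to the counting step. The sample names here are placeholders.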

HT-Seq count was used to obtain read counts over all GENCODE 32 genes

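HTSeq (htseq-count) with the GENCODE 32 annotation. The source page lists "v0.11.2" next to GENCODE; that is not a GENCODE release number and may refer to the HTSeq version.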

$ Bash example

# Install HTSeq if not already available
# conda install -c bioconda htseq

# Define input and output files
INPUT_BAM="aligned_reads_Aligned.sortedByCoord.out.bam" # Coordinate-sorted BAM written by STAR with the prefix aligned_reads_
GENCODE_GTF="gencode.v32.annotation.gtf"
OUTPUT_COUNTS="read_counts.txt"

# Download GENCODE v32 GTF if not already present
# mkdir -p references
# wget -O references/gencode.v32.annotation.gtf.gz ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.annotation.gtf.gz
# gunzip -f references/gencode.v32.annotation.gtf.gz
# GENCODE_GTF="references/gencode.v32.annotation.gtf"

# Run htseq-count
# Parameters:
# --format=bam: Input file format is BAM.
# --order=pos: The BAM is coordinate-sorted (htseq-count assumes name order by default; this matters for paired-end data).
# --stranded=no: Assumes unstranded library preparation. Adjust to 'yes' or 'reverse' if applicable.
# --mode=union: Default mode for counting reads overlapping features.
# --type=exon: Count reads overlapping 'exon' features.
# --idattr=gene_id: Use 'gene_id' attribute to group features and report counts.
htseq-count \
    --format=bam \
    --order=pos \
    --stranded=no \
    --mode=union \
    --type=exon \
    --idattr=gene_id \
    "${INPUT_BAM}" \
    "${GENCODE_GTF}" \
    > "${OUTPUT_COUNTS}"
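
The source describes the supplementary files only as "tab delimited read counts", so how per-sample outputs were combined is not documented. As one possible approach, the sketch below merges several htseq-count output files into a single tab-delimited matrix. The function name `merge_htseq_counts` is hypothetical, column names are taken from the input file names (an assumption), and the special counter rows that htseq-count appends (for example `__no_feature` and `__ambiguous`) are dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge htseq-count outputs (gene_id<TAB>count) into one matrix.
# Usage: merge_htseq_counts sampleA.txt sampleB.txt ... > count_matrix.tsv
merge_htseq_counts() {
    awk -F'\t' -v OFS='\t' '
        # First line of each file: register a new column named after the file.
        FNR == 1 {
            nfiles++
            name = FILENAME
            sub(/.*\//, "", name)    # strip the directory part
            sub(/\.txt$/, "", name)  # strip a trailing .txt, if present
            header = header OFS name
        }
        # Skip the summary counters htseq-count appends (__no_feature, ...).
        /^__/ { next }
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
            counts[$1, nfiles] = $2
        }
        END {
            print "gene_id" header
            for (i = 1; i <= n; i++) {
                line = order[i]
                for (j = 1; j <= nfiles; j++) {
                    key = order[i] SUBSEP j
                    # Genes missing from a file are reported as 0.
                    line = line OFS ((key in counts) ? counts[key] : 0)
                }
                print line
            }
        }
    ' "$@"
}
```

Genes are written in the order they first appear across the inputs. Keying the merge on gene ID, rather than pasting columns side by side, avoids silently misaligning rows if two files were produced with different annotation files.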

Tools Used

STAR

HTSeq (htseq-count)

Raw Source Text

Raw Data was aligned to hg38 using STAR
HT-Seq count was used to obtain read counts over all GENCODE 32 genes
Genome_build: hg38
Supplementary_files_format_and_content: tab delimited read counts
