GSE181138 Processing Pipeline


Publication

Identification of the global miR-130a targetome reveals a role for TBL1XR1 in hematopoietic stem cell self-renewal and t(8;21) AML.

Cell Reports (2022), PMID 35263585

Dataset

GSE181138

Identification of the Global miR-130a Targetome Reveals a Novel Role for TBL1XR1 in Hematopoietic Stem Cell Self-Renewal and t(8;21) AML [miR-130a OE]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Raw data were aligned to hg38 using STAR.

    STAR v2.7.10a (version inferred with models/gemini-2.5-flash; not stated in the source text)
    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Reference genome: hg38
    # Before alignment, a STAR genome index for hg38 must be built.
    # Example command to build index (run once):
    # STAR --runMode genomeGenerate \
    #      --genomeDir /path/to/hg38_star_index \
    #      --genomeFastaFiles /path/to/hg38.fa \
    #      --sjdbGTFfile /path/to/hg38.gtf \
    #      --runThreadN 8 # Adjust threads as needed
    
    # Assume hg38 STAR index is available at /path/to/hg38_star_index
    # Assume input raw data is input.fastq.gz (for single-end reads)
    # For paired-end reads, use: --readFilesIn input_R1.fastq.gz input_R2.fastq.gz
    
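    # --readFilesCommand zcat lets STAR read the gzip-compressed FASTQ directly.
    # With this prefix, the sorted BAM is written to aligned_reads_Aligned.sortedByCoord.out.bam
    STAR --runThreadN 8 \
         --genomeDir /path/to/hg38_star_index \
         --readFilesIn input.fastq.gz \
         --readFilesCommand zcat \
         --outFileNamePrefix aligned_reads_ \
         --outSAMtype BAM SortedByCoordinate \
         --outBAMcompression 6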
  2. htseq-count (HTSeq) was used to obtain read counts over all GENCODE release 32 genes.

    HTSeq v0.11.2 (version not stated in the source text), with the GENCODE release 32 annotation
    $ Bash example
    # Install HTSeq if not already available
    # conda install -c bioconda htseq
    
    # Define input and output files
    INPUT_BAM="aligned_reads_Aligned.sortedByCoord.out.bam" # Coordinate-sorted BAM written by STAR in step 1
    GENCODE_GTF="gencode.v32.annotation.gtf"
    OUTPUT_COUNTS="read_counts.txt"
    
    # Download GENCODE v32 GTF if not already present
    # mkdir -p references
    # wget -O references/gencode.v32.annotation.gtf.gz ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.annotation.gtf.gz
    # gunzip -f references/gencode.v32.annotation.gtf.gz
    # GENCODE_GTF="references/gencode.v32.annotation.gtf"
    
    # Run htseq-count
    # Parameters:
    # --format=bam: Input file format is BAM.
    # --order=pos: Input is sorted by coordinate, as produced in step 1. This only matters
    #              for paired-end data; htseq-count otherwise expects name-sorted input.
    # --stranded=no: Assumes unstranded library preparation. Adjust to 'yes' or 'reverse' if applicable.
    # --mode=union: Default mode for counting reads overlapping features.
    # --type=exon: Count reads overlapping 'exon' features.
    # --idattr=gene_id: Use 'gene_id' attribute to group features and report counts.
    htseq-count \
        --format=bam \
        --order=pos \
        --stranded=no \
        --mode=union \
        --type=exon \
        --idattr=gene_id \
        "${INPUT_BAM}" \
        "${GENCODE_GTF}" \
        > "${OUTPUT_COUNTS}"
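
As a hedged follow-up to the two steps above, the sketch below shows one way to sanity-check their outputs. It reads the uniquely mapped percentage from STAR's Log.final.out and separates gene counts from the special `__` counters that htseq-count appends to its output. The function names (`star_unique_pct`, `assigned_reads`, `special_counters`) and the demo values are hypothetical and are not part of the original pipeline.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and demo values are hypothetical.

# Print the "Uniquely mapped reads %" value from a STAR Log.final.out file.
star_unique_pct() {
    awk -F'|' '/Uniquely mapped reads %/ { gsub(/[ \t%]/, "", $2); print $2 }' "$1"
}

# Sum counts over gene rows, skipping htseq-count's special "__" counters.
assigned_reads() {
    awk -F'\t' '$1 !~ /^__/ { total += $2 } END { print total + 0 }' "$1"
}

# Print only the special counter rows (e.g. __no_feature, __ambiguous).
special_counters() {
    grep '^__' "$1" || true
}

# Demo on tiny made-up inputs (not real GSE181138 values).
demo_dir="$(mktemp -d)"
printf '      Uniquely mapped reads %% |\t90.00%%\n' > "${demo_dir}/Log.final.out"
printf 'geneA\t120\ngeneB\t0\n__no_feature\t35\n__ambiguous\t4\n' > "${demo_dir}/counts.txt"
echo "uniquely mapped: $(star_unique_pct "${demo_dir}/Log.final.out")%"
echo "assigned reads: $(assigned_reads "${demo_dir}/counts.txt")"
special_counters "${demo_dir}/counts.txt"
rm -rf "${demo_dir}"
```

With the file names used in the examples above, the real inputs would be aligned_reads_Log.final.out and read_counts.txt. A large `__no_feature` count relative to the assigned reads can hint that the `--stranded` setting does not match the library preparation.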

Raw Source Text
Raw Data was aligned to hg38 using STAR
HT-Seq count was used to obtain read counts over all GENCODE 32 genes
Genome_build: hg38
Supplementary_files_format_and_content: tab delimited read counts
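
The supplementary files are described as tab-delimited read counts. The sketch below shows how per-sample htseq-count outputs might be combined into one such table, assuming one file per sample with gene IDs in column 1 and counts in column 2. The `merge_counts` function and the sample file names are hypothetical, and this is not necessarily how the deposited files were produced.

```shell
#!/usr/bin/env bash
# Sketch only: merge_counts and the sample file names are hypothetical.

# Usage: merge_counts sampleA.txt sampleB.txt ... > count_matrix.tsv
# Writes a header row (gene_id plus one column per input file), then one row per gene.
# htseq-count's special "__" counters are dropped; genes missing from a file get 0.
merge_counts() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 { nfiles++; names[nfiles] = FILENAME }
        $1 ~ /^__/ { next }
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++ngenes] = $1 }
            counts[$1, nfiles] = $2
        }
        END {
            header = "gene_id"
            for (i = 1; i <= nfiles; i++) header = header OFS names[i]
            print header
            for (g = 1; g <= ngenes; g++) {
                row = order[g]
                for (i = 1; i <= nfiles; i++) {
                    key = order[g] SUBSEP i
                    row = row OFS ((key in counts) ? counts[key] : 0)
                }
                print row
            }
        }
    ' "$@"
}

# Demo on two tiny made-up count files (not real GSE181138 values).
demo_dir="$(mktemp -d)"
printf 'geneA\t12\ngeneB\t0\n__no_feature\t30\n' > "${demo_dir}/sample1.txt"
printf 'geneA\t7\ngeneB\t3\n__no_feature\t25\n' > "${demo_dir}/sample2.txt"
(cd "${demo_dir}" && merge_counts sample1.txt sample2.txt)
rm -rf "${demo_dir}"
```

Genes are kept in the order they first appear, so inputs produced with the same GTF yield rows in annotation order.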