GSE181138 Processing Pipeline
Publication
Identification of the global miR-130a targetome reveals a role for TBL1XR1 in hematopoietic stem cell self-renewal and t(8;21) AML. Cell Reports (2022), PMID 35263585
Dataset
GSE181138: Identification of the Global miR-130a Targetome Reveals a Novel Role for TBL1XR1 in Hematopoietic Stem Cell Self-Renewal and t(8;21) AML [miR-130a OE]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1. Raw data was aligned to hg38 using STAR.
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# Reference genome: hg38
# Before alignment, a STAR genome index for hg38 must be built.
# Example command to build the index (run once):
# STAR --runMode genomeGenerate \
#   --genomeDir /path/to/hg38_star_index \
#   --genomeFastaFiles /path/to/hg38.fa \
#   --sjdbGTFfile /path/to/hg38.gtf \
#   --runThreadN 8   # Adjust threads as needed

# Assume the hg38 STAR index is available at /path/to/hg38_star_index
# Assume the raw input is input.fastq.gz (single-end reads)
# For paired-end reads, use: --readFilesIn input_R1.fastq.gz input_R2.fastq.gz
# --readFilesCommand zcat is needed because the FASTQ input is gzip-compressed
STAR --runThreadN 8 \
  --genomeDir /path/to/hg38_star_index \
  --readFilesIn input.fastq.gz \
  --readFilesCommand zcat \
  --outFileNamePrefix aligned_reads_ \
  --outSAMtype BAM SortedByCoordinate \
  --outBAMcompression 6
# Output BAM: aligned_reads_Aligned.sortedByCoord.out.bam
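After alignment it can help to check the mapping rate before counting. The sketch below is not part of the published pipeline; it assumes STAR wrote its summary to `aligned_reads_Log.final.out` (the `aligned_reads_` prefix used above), and the helper name `star_log_value` is hypothetical.

```shell
# Print the value of one field from a STAR Log.final.out file.
# $1 = path to Log.final.out, $2 = exact field label (text left of the "|").
# Returns a non-zero exit status if the label is not found.
star_log_value() {
    awk -F'|' -v label="$2" '
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim padding around the label
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim the tab before the value
            if (key == label) { print val; found = 1; exit }
        }
        END { if (!found) exit 1 }
    ' "$1"
}

# Assumed log location, derived from --outFileNamePrefix aligned_reads_
LOG="aligned_reads_Log.final.out"
if [ -f "$LOG" ]; then
    star_log_value "$LOG" "Number of input reads"
    star_log_value "$LOG" "Uniquely mapped reads %"
fi
```

A low uniquely mapped percentage usually points to a wrong index, adapter contamination, or a mismatched species, and is worth resolving before running the counting step.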
2. htseq-count (HTSeq) was used to obtain read counts over all GENCODE 32 genes.
GENCODE 32
$ Bash example
# Install HTSeq if not already available
# conda install -c bioconda htseq

# Define input and output files
INPUT_BAM="aligned_reads_Aligned.sortedByCoord.out.bam"  # STAR output for prefix aligned_reads_
GENCODE_GTF="gencode.v32.annotation.gtf"
OUTPUT_COUNTS="read_counts.txt"

# Download GENCODE v32 GTF if not already present
# mkdir -p references
# wget -O references/gencode.v32.annotation.gtf.gz ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.annotation.gtf.gz
# gunzip -f references/gencode.v32.annotation.gtf.gz
# GENCODE_GTF="references/gencode.v32.annotation.gtf"

# Run htseq-count
# Parameters:
#   --format=bam:      Input file format is BAM.
#   --order=pos:       BAM is coordinate-sorted (only matters for paired-end data).
#   --stranded=no:     Assumes unstranded library preparation. Adjust to 'yes' or 'reverse' if applicable.
#   --mode=union:      Default mode for counting reads overlapping features.
#   --type=exon:       Count reads overlapping 'exon' features.
#   --idattr=gene_id:  Use the 'gene_id' attribute to group features and report counts.
htseq-count \
  --format=bam \
  --order=pos \
  --stranded=no \
  --mode=union \
  --type=exon \
  --idattr=gene_id \
  "${INPUT_BAM}" \
  "${GENCODE_GTF}" \
  > "${OUTPUT_COUNTS}"
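htseq-count appends summary counters (`__no_feature`, `__ambiguous`, `__too_low_aQual`, `__not_aligned`, `__alignment_not_unique`) to the end of its output, so these rows usually need to be separated from the per-gene counts before downstream analysis. The following is a minimal sketch of that step, assuming the counts were written to `read_counts.txt` as above; the function names and the output file `gene_counts.txt` are hypothetical and not part of the published pipeline.

```shell
# Print only the per-gene rows, dropping the trailing "__*" summary counters.
# $1 = htseq-count output file (tab-delimited: gene_id <TAB> count)
htseq_gene_rows() {
    awk -F'\t' '$1 !~ /^__/' "$1"
}

# Print each "__*" summary counter, followed by the total number of
# reads assigned to genes (sum of the per-gene counts).
htseq_summary() {
    awk -F'\t' -v OFS='\t' '
        $1 ~ /^__/ { print $1, $2; next }   # summary counter row
        { assigned += $2 }                  # per-gene row
        END { print "assigned_to_genes", assigned + 0 }
    ' "$1"
}

# Assumed file names; adjust to your own outputs.
COUNTS="read_counts.txt"
if [ -f "$COUNTS" ]; then
    htseq_gene_rows "$COUNTS" > gene_counts.txt
    htseq_summary "$COUNTS"
fi
```

If `__no_feature` dominates the summary, the `--stranded` setting is a common culprit and is worth re-checking against the library preparation protocol.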
Tools Used
Raw Source Text
Raw Data was aligned to hg38 using STAR
HT-Seq count was used to obtain read counts over all GENCODE 32 genes
Genome_build: hg38
Supplementary_files_format_and_content: tab delimited read counts