GSE181139 Processing Pipeline

Publication

Identification of the global miR-130a targetome reveals a role for TBL1XR1 in hematopoietic stem cell self-renewal and t(8;21) AML.

Cell Reports (2022) — PMID 35263585

Dataset

GSE181139

Identification of the Global miR-130a Targetome Reveals a Novel Role for TBL1XR1 in Hematopoietic Stem Cell Self-Renewal and t(8;21) AML [TBL1XR1 KD]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Raw reads were aligned using STAR against known GENCODE 32 transcripts

STAR v2.7.10a (Inferred with models/gemini-2.5-flash)

Bash example

# --- Installation (example using Conda) ---
# conda create -n star_env star=2.7.10a -y
# conda activate star_env

# --- Reference Data Setup (GENCODE 32 for GRCh38) ---
# This section demonstrates how to prepare the STAR genome index for GENCODE 32.
# The actual alignment command assumes this index is already built.

# # Create a directory for the genome index
# mkdir -p star_gencode32_index
# cd star_gencode32_index

# # Download GENCODE 32 human genome (GRCh38 primary assembly) FASTA and GTF files
# wget -c ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/GRCh38.primary_assembly.genome.fa.gz
# wget -c ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.annotation.gtf.gz

# # Unzip the downloaded files
# gunzip GRCh38.primary_assembly.genome.fa.gz
# gunzip gencode.v32.annotation.gtf.gz

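# # Build the STAR genome index (the STAR manual recommends at least 32 GB of RAM for a human genome).
# # Set --sjdbOverhang to the maximum read length minus 1 (e.g., 74 for 75 bp reads).
# # The read length of this dataset is not stated in the source, so 74 is only an example;
# # per the STAR manual, the default of 100 generally works nearly as well as the ideal value.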
# STAR --runMode genomeGenerate \
#      --genomeDir . \
#      --genomeFastaFiles GRCh38.primary_assembly.genome.fa \
#      --sjdbGTFfile gencode.v32.annotation.gtf \
#      --sjdbOverhang 74 \
#      --runThreadN 8 # Adjust number of threads as needed
# cd ..

# --- Alignment Step ---
set -euo pipefail # Stop at the first failing command

# Define variables for input and output (file names are placeholders)
GENOME_DIR="star_gencode32_index" # Path to your pre-built STAR genome index
READS_R1="raw_reads_R1.fastq.gz" # Input FASTQ file (Read 1)
READS_R2="raw_reads_R2.fastq.gz" # Input FASTQ file (Read 2); set to "" for single-end data
OUTPUT_PREFIX="aligned_sample" # Output prefix (e.g., aligned_sample.Log.final.out, aligned_sample.Aligned.sortedByCoord.out.bam)
NUM_THREADS=8 # Adjust number of threads as needed

# Pass one FASTQ file to STAR for single-end data and two for paired-end data
READ_FILES=("${READS_R1}")
if [[ -n "${READS_R2}" ]]; then
  READ_FILES+=("${READS_R2}")
fi

# Execute STAR alignment.
# The filter settings below resemble the ENCODE options listed in the STAR manual;
# the source does not report the parameters actually used.
# --readFilesCommand zcat assumes gzipped FASTQ input.
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READ_FILES[@]}" \
     --readFilesCommand zcat \
     --runThreadN "${NUM_THREADS}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}." \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes All \
     --outFilterMultimapNmax 20 \
     --outFilterMismatchNmax 999 \
     --outFilterMismatchNoverLmax 0.04 \
     --alignIntronMin 20 \
     --alignIntronMax 1000000 \
     --alignMatesGapMax 1000000 \
     --sjdbScore 1 \
     --quantMode GeneCounts # Optional: writes per-gene counts to aligned_sample.ReadsPerGene.out.tab
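
The index-building comments above tie --sjdbOverhang to read length, which the source does not report. As a minimal sketch, the longest read in a FASTQ file can be measured before building the index. The helper names (stream_fastq, max_read_length, sjdb_overhang_for) are hypothetical, not part of STAR, and the code assumes standard 4-line FASTQ records and that gzipped files end in .gz.

```shell
#!/usr/bin/env bash
# Minimal sketch: measure the longest read in a FASTQ file to choose
# --sjdbOverhang (max read length - 1). Helper names are hypothetical.
set -euo pipefail

# Stream a FASTQ file to stdout, decompressing it if the name ends in .gz
# (assumption: compression is indicated by the file extension).
stream_fastq() {
  case "$1" in
    *.gz) gzip -cd "$1" ;;
    *) cat "$1" ;;
  esac
}

# Print the maximum sequence length. In a 4-line FASTQ record the sequence
# is the 2nd line, i.e. every line where NR % 4 == 2.
max_read_length() {
  stream_fastq "$1" | awk 'BEGIN { max = 0 } NR % 4 == 2 && length($0) > max { max = length($0) } END { print max }'
}

# Print the suggested --sjdbOverhang (max read length - 1) for a FASTQ file.
sjdb_overhang_for() {
  local len
  len=$(max_read_length "$1")
  if (( len < 2 )); then
    echo "no usable reads found in $1" >&2
    return 1
  fi
  echo $(( len - 1 ))
}
```

For example, `sjdb_overhang_for raw_reads_R1.fastq.gz` would print the suggested value for the placeholder file used in the alignment step. Scanning the whole file is slow for deep libraries, but taking the maximum over all reads avoids underestimating length when reads have been trimmed to varying sizes.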

Tools Used

STAR

Raw Source Text

Raw Reads were aligned using STAR against known GENCODE 32 transcripts
Genome_build: hg38
Supplementary_files_format_and_content: tab delimited read counts
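
The supplementary files are described above only as tab-delimited read counts; the source does not say how they were produced. If they came from STAR's --quantMode GeneCounts output (an assumption), per-sample ReadsPerGene.out.tab files can be merged into a single matrix. In these files the first four rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) are summaries, column 2 holds unstranded counts, column 3 counts reads whose first mate matches the RNA strand (htseq-count -s yes), and column 4 counts the reverse (htseq-count -s reverse). The function names strand_totals and build_count_matrix are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: summarize and merge STAR ReadsPerGene.out.tab files.
# Function names are hypothetical; assumes all files come from the same index.
set -euo pipefail

# Sum gene-level counts for each strandedness column. Rows whose first field
# starts with N_ are STAR's summary rows and are skipped.
strand_totals() {
  awk '$1 !~ /^N_/ { u += $2; f += $3; r += $4 }
       END { printf "unstranded\t%d\nforward\t%d\nreverse\t%d\n", u, f, r }' "$1"
}

# Merge one count column (2 = unstranded, 3 = forward, 4 = reverse) from
# several files into a tab-delimited matrix: gene_id, then one column per
# sample named after its file with ".ReadsPerGene.out.tab" removed.
build_count_matrix() {
  local col="$1"
  shift
  case "$col" in
    2|3|4) ;;
    *) echo "column must be 2, 3 or 4" >&2; return 1 ;;
  esac
  awk -v col="$col" '
    BEGIN { OFS = "\t" }
    FNR == 1 {
      nfiles++
      name = FILENAME
      sub(/^.*\//, "", name)
      sub(/\.?ReadsPerGene\.out\.tab$/, "", name)
      header = header OFS name
    }
    $1 ~ /^N_/ { next }
    {
      # Gene order is taken from the first file.
      if (nfiles == 1) order[++n] = $1
      counts[$1] = counts[$1] OFS $col
    }
    END {
      print "gene_id" header
      for (i = 1; i <= n; i++) print order[i] counts[order[i]]
    }
  ' "$@"
}
```

Comparing the forward and reverse totals from strand_totals is one way to pick the column: when one total is much larger than the other, the library is stranded in that direction, while similar totals suggest an unstranded library (column 2). For instance, `build_count_matrix 4 *.ReadsPerGene.out.tab > counts.tsv` would merge the reverse-strand column. The matrix builder assumes every file lists the same genes in the same order, which holds when all samples were aligned against the same index.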
