GSE104502 Processing Pipeline


Publication

Short poly(A) tails are a conserved feature of highly expressed genes.

Nature Structural & Molecular Biology (2017), PMID 29106412

Dataset

GSE104502

Short Poly(A) Tails are a Conserved Feature of Highly Expressed Genes

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Step 1

    Read counts were quantified using kallisto.

    kallisto
    $ Bash example
    kallisto -h
  2. Step 2

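    These were then aligned to the C. elegans genome build WS247. Because kallisto pseudoaligns reads to a transcriptome rather than to a genome, this may describe the WS247 reference used for kallisto quantification rather than a separate alignment step.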

    STAR v2.7.10a (inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define paths and reference genome for C. elegans WS247
    # The C. elegans WS247 genome FASTA and GTF/GFF3 files can be obtained from WormBase (e.g., ftp://ftp.wormbase.org/pub/wormbase/releases/WS247/)
    GENOME_DIR="/path/to/C_elegans_WS247_STAR_index" # Directory containing STAR genome index
    READS_R1="input_reads_R1.fastq.gz" # Placeholder for input forward reads
    READS_R2="input_reads_R2.fastq.gz" # Placeholder for input reverse reads (paired-end only; not needed for single-end reads)
    OUTPUT_PREFIX="aligned_to_WS247"
    
    # --- Genome index generation (run once if the index does not exist) ---
    # Assuming you have uncompressed genome FASTA and GTF files for WS247, e.g.:
    # GENOME_FASTA="/path/to/c_elegans.PRJNA13758.WS247.genomic.fa"
    # GTF_FILE="/path/to/c_elegans.PRJNA13758.WS247.annotations.gtf" # Convert GFF3 to GTF if only GFF3 is available
    # mkdir -p ${GENOME_DIR} # Make sure the index directory exists
    # STAR --runMode genomeGenerate \
    #      --genomeDir ${GENOME_DIR} \
    #      --genomeFastaFiles ${GENOME_FASTA} \
    #      --sjdbGTFfile ${GTF_FILE} \
    #      --genomeSAindexNbases 12 \
    #      --runThreadN 8 # Adjust number of threads as needed
    # --genomeSAindexNbases: the STAR manual recommends min(14, log2(GenomeLength)/2 - 1) for small genomes,
    # which is about 12 for the ~100 Mb C. elegans genome
    # ---------------------------------------------------------------------------
    
    # Align reads to C. elegans genome WS247
    STAR --genomeDir ${GENOME_DIR} \
         --readFilesIn ${READS_R1} ${READS_R2} \
         --readFilesCommand zcat \
         --outFileNamePrefix ${OUTPUT_PREFIX}. \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outSAMattributes Standard \
         --runThreadN 8 # Adjust number of threads as needed
    
    # Note: For single-end reads, remove the ${READS_R2} from --readFilesIn.
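Step 1 lists only `kallisto -h`. The quantification it describes might be sketched as below, assuming a transcript FASTA derived from the WS247 annotation and 8 threads to match the STAR example. The source does not say whether the reads were single- or paired-end or which kallisto parameters were used, and the function names `build_index`, `quant_paired`, `quant_single` and the `THREADS` variable are hypothetical, chosen for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of kallisto quantification (not the authors' exact commands).
# Requires kallisto on PATH; all file names passed in are placeholders.
set -euo pipefail

# Build a kallisto index from a transcript FASTA (run once).
# Usage: build_index <transcripts.fa> <index.idx>
build_index() {
  kallisto index -i "$2" "$1"
}

# Quantify one paired-end sample; kallisto writes abundance.tsv
# (including a tpm column) into <out_dir>.
# Usage: quant_paired <index.idx> <out_dir> <R1.fastq.gz> <R2.fastq.gz>
quant_paired() {
  kallisto quant -i "$1" -o "$2" -t "${THREADS:-8}" "$3" "$4"
}

# Quantify one single-end sample. For single-end data kallisto needs the
# fragment length mean (-l) and standard deviation (-s) supplied explicitly.
# Usage: quant_single <index.idx> <out_dir> <reads.fastq.gz> <frag_mean> <frag_sd>
quant_single() {
  kallisto quant -i "$1" -o "$2" --single -l "$4" -s "$5" -t "${THREADS:-8}" "$3"
}
```

For example, `build_index WS247_transcripts.fa WS247.idx` followed by `quant_paired WS247.idx kallisto_rep1 rep1_R1.fastq.gz rep1_R2.fastq.gz` would write `kallisto_rep1/abundance.tsv`, whose `tpm` column holds per-transcript TPM estimates. Wrapping each command in a function keeps the paired-end and single-end variants side by side, since the single-end case differs only in the extra fragment-length flags.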
Raw Source Text
Read counts were quantified using kallisto.
These were then aligned to C. elegans genome WS247.
Genome_build: WS247
Supplementary_files_format_and_content: Csv; Contains tpm values for each replicate
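The supplementary file is described only as a CSV containing TPM values for each replicate. One way such a table could be assembled from per-replicate kallisto output is sketched below; the function `merge_tpm_csv`, its `name=path` argument convention, and the column layout are assumptions, not the authors' documented procedure. It locates the `tpm` column by header name in each `abundance.tsv` and assumes every file was produced with the same index, so all files list the same targets in the same order.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: merge the tpm column of several kallisto abundance.tsv
# files into one CSV (target_id followed by one TPM column per replicate).
# Assumes identical row order across files and target IDs without commas.
set -euo pipefail

# Usage: merge_tpm_csv <out.csv> <name1>=<abundance1.tsv> [<name2>=<abundance2.tsv> ...]
merge_tpm_csv() {
  local out="$1"; shift
  local names=() files=() pair
  for pair in "$@"; do
    names+=("${pair%%=*}")   # replicate name (text before the first =)
    files+=("${pair#*=}")    # file path (text after the first =)
  done
  local header
  header=$(IFS=,; echo "${names[*]}")   # join names with commas

  awk -F'\t' -v header="$header" '
    FNR == 1 {                       # header line of each input file
      nfile++
      for (i = 1; i <= NF; i++) if ($i == "tpm") col = i
      next
    }
    nfile == 1 { ids[FNR] = $1; nrow = FNR }   # target IDs from the first file
    { tpm[FNR, nfile] = $col }                 # TPM value for this row and file
    END {
      print "target_id," header
      for (r = 2; r <= nrow; r++) {
        line = ids[r]
        for (f = 1; f <= nfile; f++) line = line "," tpm[r, f]
        print line
      }
    }
  ' "${files[@]}" > "$out"
}
```

For example, `merge_tpm_csv tpm_by_replicate.csv rep1=kallisto_rep1/abundance.tsv rep2=kallisto_rep2/abundance.tsv` would write transcript-level TPMs; the source does not state whether the published values are per transcript or aggregated per gene.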