GSE104500 Processing Pipeline

RNA-Seq, 2 steps

Publication

Short poly(A) tails are a conserved feature of highly expressed genes.

Nature Structural & Molecular Biology (2017), PMID 29106412

Dataset

GSE104500

RNA-Seq of L4 C. elegans

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Read counts were quantified using kallisto.

    kallisto v0.46.0
    Bash example
    # Install kallisto (if not already installed)
    # conda install -c bioconda kallisto=0.46.0
    
    # Define variables (replace with actual paths and filenames)
    TRANSCRIPTOME_FASTA="c_elegans.PRJNA13758.WS247.mRNA_transcripts.fa.gz" # Placeholder: C. elegans WS247 transcriptome FASTA (e.g., from WormBase; verify the exact file name for your release)
    KALLISTO_INDEX="kallisto_index.idx"
    SAMPLE_R1_FASTQ="sample_R1.fastq.gz" # Replace with your R1 FASTQ file
    SAMPLE_R2_FASTQ="sample_R2.fastq.gz" # Replace with your R2 FASTQ file (omit if single-end)
    OUTPUT_DIR="kallisto_quant_output"
    NUM_THREADS=8 # Number of threads to use
    
    # 1. Build kallisto index (if not already built)
    # This step needs to be run once for a given transcriptome.
    # kallisto index -i "${KALLISTO_INDEX}" "${TRANSCRIPTOME_FASTA}"
    
    # 2. Quantify read counts using kallisto
    # For paired-end reads:
    kallisto quant \
      -i "${KALLISTO_INDEX}" \
      -o "${OUTPUT_DIR}" \
      --bias \
      --threads "${NUM_THREADS}" \
      "${SAMPLE_R1_FASTQ}" \
      "${SAMPLE_R2_FASTQ}"
    
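    # For single-end reads (uncomment and modify if applicable):
    # -l (mean fragment length) and -s (standard deviation) are required with --single.
    # The values 200 and 20 are placeholders, not taken from the source; use your library's estimates.
    # kallisto quant \
    #   -i "${KALLISTO_INDEX}" \
    #   -o "${OUTPUT_DIR}" \
    #   --single \
    #   -l 200 \
    #   -s 20 \
    #   --bias \
    #   --threads "${NUM_THREADS}" \
    #   "${SAMPLE_R1_FASTQ}"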
    
  2. These were then aligned to C. elegans genome WS247.

    STAR v2.7.10a (Inferred with models/gemini-2.5-flash)
    Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.7.10a
    
    # Define variables
    GENOME_DIR="celegans_WS247_STAR_index"
    GENOME_FASTA="c_elegans.PRJNA13758.WS247.genomic.fa"           # Placeholder: uncompressed WS247 genome FASTA
    GENOME_GTF="c_elegans.PRJNA13758.WS247.canonical_geneset.gtf"  # Placeholder: uncompressed WS247 GTF annotation
    READS_R1="input_reads_R1.fastq.gz"               # Placeholder for input forward reads
    READS_R2="input_reads_R2.fastq.gz"               # Placeholder for input reverse reads (if paired-end)
    OUTPUT_PREFIX="aligned_reads"
    
    # --- Reference Data Acquisition (Example - replace with actual download if needed) ---
    # For C. elegans genome WS247, reference files are typically found on WormBase FTP.
    # Example download commands (file names are assumptions; check the release directory listing):
    # wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS247/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS247.genomic.fa.gz
    # wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS247/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS247.canonical_geneset.gtf.gz
    # STAR genomeGenerate expects uncompressed FASTA and GTF files, so decompress after download:
    # gunzip c_elegans.PRJNA13758.WS247.genomic.fa.gz c_elegans.PRJNA13758.WS247.canonical_geneset.gtf.gz
    
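    # 1. Create STAR genome index
    # --sjdbOverhang should be (read length - 1); 100 assumes 101-nt reads, so adjust to your data.
    # --genomeSAindexNbases 12 follows the STAR manual's guidance for small genomes (~100 Mb for C. elegans).
    mkdir -p "${GENOME_DIR}"
    STAR --runMode genomeGenerate \
         --genomeDir "${GENOME_DIR}" \
         --genomeFastaFiles "${GENOME_FASTA}" \
         --sjdbGTFfile "${GENOME_GTF}" \
         --sjdbOverhang 100 \
         --genomeSAindexNbases 12 \
         --runThreadN 8 # Adjust threads as needed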
    
    # 2. Align reads to the C. elegans WS247 genome
    # Assuming paired-end reads. For single-end, remove "${READS_R2}"
    # --readFilesCommand zcat is needed because the FASTQ files are gzip-compressed
    # (on macOS, "gunzip -c" is the usual substitute for zcat).
    STAR --runMode alignReads \
         --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READS_R1}" "${READS_R2}" \
         --readFilesCommand zcat \
         --outFileNamePrefix "${OUTPUT_PREFIX}_" \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outSAMattributes Standard \
         --runThreadN 8 # Adjust threads as needed
    
    # Rename output file for clarity
    mv "${OUTPUT_PREFIX}_Aligned.sortedByCoord.out.bam" "${OUTPUT_PREFIX}.bam"
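
After the alignment step it can help to check how many reads mapped uniquely before going further. The sketch below reads that figure from the Log.final.out summary that STAR writes next to the BAM. The function name star_unique_rate is hypothetical, and the log path in the usage comment assumes the OUTPUT_PREFIX used in the example above; neither comes from the source.

```bash
# Hypothetical helper (not part of STAR): print the "Uniquely mapped reads %"
# value from a STAR Log.final.out file, without the percent sign.
star_unique_rate() {
  # $1: path to a Log.final.out file
  awk -F'|' '/Uniquely mapped reads %/ {
    gsub(/[ \t%]/, "", $2)   # strip spaces, tabs and the trailing percent sign
    print $2
    exit                     # the summary contains this line only once
  }' "$1"
}

# Usage (path assumes --outFileNamePrefix "aligned_reads_"):
# star_unique_rate aligned_reads_Log.final.out
```

A low unique-mapping rate often points to a mismatch between the reads and the reference build or annotation, which is worth ruling out before interpreting expression values.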
Raw Source Text
Read counts were quantified using kallisto.
These were then aligned to C. elegans genome WS247.
Genome_build: WS247
Supplementary_files_format_and_content: Csv; Contains tpm values for each replicate
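
The supplementary files are described as CSV tables of TPM values per replicate. A minimal sketch of how such a table could be assembled from kallisto output follows; it is not necessarily how the authors produced theirs. It assumes one kallisto output directory per replicate, each containing the standard abundance.tsv (columns target_id, length, eff_length, est_counts, tpm), all quantified against the same index. The function name merge_tpm_csv and the directory names in the usage comment are hypothetical.

```bash
# Hypothetical helper: merge the tpm column of several kallisto runs into one CSV.
# Arguments: one or more kallisto output directories, each holding abundance.tsv.
# The body runs in a subshell so its variables and argument changes do not leak.
merge_tpm_csv() (
  header="target_id"
  for dir in "$@"; do
    header="$header,$(basename "$dir")"        # column name = directory name
    set -- "$@" "${dir%/}/abundance.tsv"       # append the file path ...
    shift                                      # ... and drop the directory it replaces
  done
  printf '%s\n' "$header"
  awk -F'\t' '
    FNR == 1 { next }                          # skip the header of each file
    {
      if ($1 in tpm) {
        tpm[$1] = tpm[$1] "," $5               # later replicate: append its tpm
      } else {
        order[++n] = $1                        # first replicate: remember row order
        tpm[$1] = $5
      }
    }
    END { for (i = 1; i <= n; i++) print order[i] "," tpm[order[i]] }
  ' "$@"
)

# Usage (directory names are placeholders):
# merge_tpm_csv quant_rep1 quant_rep2 quant_rep3 > tpm_by_replicate.csv
```

Rows follow the transcript order of the first directory, and each column is named after its directory, so naming the kallisto output directories by replicate keeps the table self-describing.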