GSE104500 Processing Pipeline

RNA-Seq, 2 steps

Publication

Short poly(A) tails are a conserved feature of highly expressed genes.

Nature Structural & Molecular Biology (2017), PMID 29106412

Dataset

RNA-Seq of L4 C. elegans

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Read counts were quantified using kallisto.

kallisto v0.46.0

$ Bash example

# Install kallisto (if not already installed)
# conda install -c bioconda kallisto=0.46.0

# Define variables (replace with actual paths and filenames)
TRANSCRIPTOME_FASTA="c_elegans_WS247_transcriptome.fa" # Placeholder: replace with the path to your C. elegans transcriptome FASTA (e.g., WS247 transcripts from WormBase)
KALLISTO_INDEX="kallisto_index.idx"
SAMPLE_R1_FASTQ="sample_R1.fastq.gz" # Replace with your R1 FASTQ file
SAMPLE_R2_FASTQ="sample_R2.fastq.gz" # Replace with your R2 FASTQ file (omit if single-end)
OUTPUT_DIR="kallisto_quant_output"
NUM_THREADS=8 # Number of threads to use

# 1. Build kallisto index (if not already built)
# This step only needs to be run once for a given transcriptome.
# kallisto index -i "${KALLISTO_INDEX}" "${TRANSCRIPTOME_FASTA}"

# 2. Quantify read counts using kallisto
# For paired-end reads:
kallisto quant \
  -i "${KALLISTO_INDEX}" \
  -o "${OUTPUT_DIR}" \
  --bias \
  --threads "${NUM_THREADS}" \
  "${SAMPLE_R1_FASTQ}" \
  "${SAMPLE_R2_FASTQ}"

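# For single-end reads (uncomment and modify if applicable).
# --single requires the mean fragment length (-l) and its standard deviation (-s);
# 200 and 20 are placeholder values, not taken from the study.
# kallisto quant \
#   -i "${KALLISTO_INDEX}" \
#   -o "${OUTPUT_DIR}" \
#   --single \
#   -l 200 \
#   -s 20 \
#   --bias \
#   --threads "${NUM_THREADS}" \
#   "${SAMPLE_R1_FASTQ}"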
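
The dataset's supplementary files are described as CSVs containing TPM values for each replicate. The following is a minimal sketch of how such a table could be assembled from kallisto output. It assumes one output directory per replicate, each holding the standard abundance.tsv (columns: target_id, length, eff_length, est_counts, tpm) produced from the same index. The function name merge_tpm and the directory names are hypothetical, and this is not necessarily how the authors built their files.

```shell
# merge_tpm: print a CSV with one row per transcript and one TPM column per
# kallisto output directory passed as an argument.
# Assumption: every abundance.tsv lists the same transcripts (same index).
merge_tpm() {
  # Turn each directory argument into the path of its abundance.tsv.
  for dir in "$@"; do
    shift
    set -- "$@" "$dir/abundance.tsv"
  done

  awk -F'\t' -v OFS=',' '
    FNR == 1 {                          # header line of each abundance.tsv
      nfiles++
      name = FILENAME
      sub(/\/abundance\.tsv$/, "", name)  # label the column by its directory
      header = header OFS name
      next
    }
    {
      if (nfiles == 1) order[++n] = $1  # keep transcript order from the first file
      tpm[$1, nfiles] = $5              # column 5 is tpm
    }
    END {
      print "target_id" header
      for (i = 1; i <= n; i++) {
        row = order[i]
        for (j = 1; j <= nfiles; j++) row = row OFS tpm[order[i], j]
        print row
      }
    }
  ' "$@"
}
```

Example usage, with hypothetical directory names: `merge_tpm rep1_out rep2_out > tpm_per_replicate.csv`.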

These were then aligned to C. elegans genome WS247.

STAR (inferred with models/gemini-2.5-flash) v2.7.10a

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star=2.7.10a

# Define variables
GENOME_DIR="celegans_WS247_STAR_index"
GENOME_FASTA="c_elegans.PRJNA13758.WS247.genomic.fa"           # Placeholder: uncompressed WS247 genome FASTA
GENOME_GTF="c_elegans.PRJNA13758.WS247.canonical_geneset.gtf"  # Placeholder: uncompressed WS247 gene annotation (GTF)
READS_R1="input_reads_R1.fastq.gz"               # Placeholder for input forward reads
READS_R2="input_reads_R2.fastq.gz"               # Placeholder for input reverse reads (if paired-end)
OUTPUT_PREFIX="aligned_reads"

# --- Reference data acquisition (example; adjust paths and filenames as needed) ---
# WS247 reference files for C. elegans are distributed on the WormBase FTP site.
# The filenames below follow WormBase's usual naming and should be checked
# against the release directory listing before use.
# wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS247/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS247.genomic.fa.gz
# wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS247/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS247.canonical_geneset.gtf.gz
# STAR genomeGenerate expects uncompressed FASTA and GTF files:
# gunzip c_elegans.PRJNA13758.WS247.genomic.fa.gz c_elegans.PRJNA13758.WS247.canonical_geneset.gtf.gz

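# 1. Create STAR genome index
# --sjdbOverhang is ideally (read length - 1); 100 is the STAR default.
# The FASTA and GTF passed here must be uncompressed.
mkdir -p "${GENOME_DIR}"
STAR --runMode genomeGenerate \
     --genomeDir "${GENOME_DIR}" \
     --genomeFastaFiles "${GENOME_FASTA}" \
     --sjdbGTFfile "${GENOME_GTF}" \
     --sjdbOverhang 100 \
     --runThreadN 8 # Adjust threads as needed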

# 2. Align reads to the C. elegans WS247 genome
# Assuming paired-end reads. For single-end, remove "${READS_R2}".
# The input FASTQ files are gzipped, so STAR needs --readFilesCommand zcat
# (use "gunzip -c" where zcat does not handle .gz files, e.g. on macOS).
STAR --runMode alignReads \
     --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READS_R1}" "${READS_R2}" \
     --readFilesCommand zcat \
     --outFileNamePrefix "${OUTPUT_PREFIX}_" \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMunmapped Within \
     --outSAMattributes Standard \
     --runThreadN 8 # Adjust threads as needed

# Rename output file for clarity
mv "${OUTPUT_PREFIX}_Aligned.sortedByCoord.out.bam" "${OUTPUT_PREFIX}.bam"
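
For small genomes, the STAR manual recommends scaling --genomeSAindexNbases down to min(14, log2(GenomeLength)/2 - 1) when building the index. For the roughly 100 Mb C. elegans genome this works out to about 12. The sketch below computes the value from an uncompressed FASTA file. The function name sa_index_nbases is hypothetical, and whether the original analysis set this option is not stated in the source.

```shell
# sa_index_nbases: print min(14, floor(log2(genome_length)/2 - 1)) for an
# uncompressed FASTA file, following the scaling rule in the STAR manual.
sa_index_nbases() {
  awk '
    !/^>/ { len += length($0) }            # sum sequence characters, skip headers
    END {
      if (len == 0) { print "no sequence found" > "/dev/stderr"; exit 1 }
      n = int(log(len) / log(2) / 2 - 1)   # log() is the natural log, so divide by log(2)
      if (n > 14) n = 14
      print n
    }
  ' "$1"
}
```

The result could then be passed to the index step, for example by adding `--genomeSAindexNbases "$(sa_index_nbases "${GENOME_FASTA}")"` to the genomeGenerate command.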

Raw Source Text

Read counts were quantified using kallisto.
These were then aligned to C. elegans genome WS247.
Genome_build: WS247
Supplementary_files_format_and_content: Csv; Contains tpm values for each replicate