GSE104500 Processing Pipeline
RNA-Seq
2 steps
Publication
Short poly(A) tails are a conserved feature of highly expressed genes. Nature Structural & Molecular Biology (2017), PMID 29106412
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Read counts were quantified using kallisto.
$ Bash example
# Install kallisto (if not already installed)
# conda install -c bioconda kallisto=0.46.0

# Define variables (replace with actual paths and filenames)
TRANSCRIPTOME_FASTA="c_elegans.WS247.transcripts.fa.gz"  # Placeholder: C. elegans WS247 transcript FASTA (e.g., from the WormBase FTP release directory)
KALLISTO_INDEX="kallisto_index.idx"
SAMPLE_R1_FASTQ="sample_R1.fastq.gz"  # Replace with your R1 FASTQ file
SAMPLE_R2_FASTQ="sample_R2.fastq.gz"  # Replace with your R2 FASTQ file (omit if single-end)
OUTPUT_DIR="kallisto_quant_output"
NUM_THREADS=8  # Number of threads to use

# 1. Build the kallisto index (only needed once per transcriptome)
[ -f "${KALLISTO_INDEX}" ] || kallisto index -i "${KALLISTO_INDEX}" "${TRANSCRIPTOME_FASTA}"

# 2. Quantify paired-end reads (abundance.tsv, with est_counts and tpm columns, is written to OUTPUT_DIR)
kallisto quant \
  -i "${KALLISTO_INDEX}" \
  -o "${OUTPUT_DIR}" \
  --bias \
  --threads "${NUM_THREADS}" \
  "${SAMPLE_R1_FASTQ}" \
  "${SAMPLE_R2_FASTQ}"

# For single-end reads, use this command instead of the one above.
# --single requires the fragment length mean (-l) and standard deviation (-s);
# 200 and 20 are placeholders, set them to match your library.
# kallisto quant \
#   -i "${KALLISTO_INDEX}" \
#   -o "${OUTPUT_DIR}" \
#   --single \
#   -l 200 \
#   -s 20 \
#   --bias \
#   --threads "${NUM_THREADS}" \
#   "${SAMPLE_R1_FASTQ}"
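kallisto already reports TPM in the `tpm` column of `abundance.tsv`. As a sketch of how that value is defined, the hypothetical `recompute_tpm` function below derives TPM from `est_counts` and `eff_length` as TPM_i = (est_counts_i / eff_length_i) / sum_j(est_counts_j / eff_length_j) x 10^6. It assumes the standard five-column `abundance.tsv` layout (target_id, length, eff_length, est_counts, tpm); the result should closely match kallisto's own column, up to rounding.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original pipeline): recompute TPM
# from a kallisto abundance.tsv with columns
#   target_id  length  eff_length  est_counts  tpm
recompute_tpm() {
  local abundance_tsv="$1"
  awk -F'\t' -v OFS='\t' '
    NR == 1 { next }                       # skip the header line
    {
      id[NR] = $1
      # Reads per effective base; targets with eff_length 0 contribute nothing
      rate[NR] = ($3 > 0) ? $4 / $3 : 0
      total += rate[NR]
    }
    END {
      print "target_id", "tpm_recomputed"
      for (i = 2; i <= NR; i++) {
        # Scale each rate so that all TPM values sum to one million
        printf "%s\t%.4f\n", id[i], (total > 0 ? rate[i] / total * 1e6 : 0)
      }
    }
  ' "$abundance_tsv"
}
```

Usage, assuming the output directory from the step above: `recompute_tpm kallisto_quant_output/abundance.tsv | head`.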
2
Reads were then aligned to the C. elegans genome, WormBase release WS247.
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables
GENOME_DIR="celegans_WS247_STAR_index"
GENOME_FASTA="c_elegans.PRJNA13758.WS247.dna.toplevel.fa"  # Placeholder: WS247 genome FASTA from the WormBase FTP (decompressed)
GENOME_GTF="c_elegans.PRJNA13758.WS247.annotations.gtf"    # Placeholder: WS247 annotation in GTF format (decompressed); check the release directory for the exact filename
READS_R1="input_reads_R1.fastq.gz"  # Placeholder for input forward reads
READS_R2="input_reads_R2.fastq.gz"  # Placeholder for input reverse reads (if paired-end)
OUTPUT_PREFIX="aligned_reads"
THREADS=8  # Adjust threads as needed

# --- Reference data acquisition (adjust paths and filenames as necessary) ---
# For C. elegans WS247, reference files are distributed on the WormBase FTP, e.g.:
# wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS247/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS247.dna.toplevel.fa.gz
# wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS247/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS247.annotations.gtf.gz
# STAR genomeGenerate expects uncompressed FASTA and GTF files:
# gunzip c_elegans.PRJNA13758.WS247.dna.toplevel.fa.gz c_elegans.PRJNA13758.WS247.annotations.gtf.gz

# 1. Create the STAR genome index
#    --sjdbOverhang should be (read length - 1); 100 suits ~101 bp reads.
#    --genomeSAindexNbases 12 follows the STAR manual's min(14, log2(GenomeLength)/2 - 1)
#    recommendation for the ~100 Mb C. elegans genome.
mkdir -p "${GENOME_DIR}"
STAR --runMode genomeGenerate \
  --genomeDir "${GENOME_DIR}" \
  --genomeFastaFiles "${GENOME_FASTA}" \
  --sjdbGTFfile "${GENOME_GTF}" \
  --sjdbOverhang 100 \
  --genomeSAindexNbases 12 \
  --runThreadN "${THREADS}"

# 2. Align reads to the C. elegans WS247 genome
#    Assumes paired-end reads; for single-end, drop "${READS_R2}".
#    --readFilesCommand zcat is required because the FASTQ files are gzipped.
STAR --runMode alignReads \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READS_R1}" "${READS_R2}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}_" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outSAMattributes Standard \
  --runThreadN "${THREADS}"

# Rename the output BAM for clarity
mv "${OUTPUT_PREFIX}_Aligned.sortedByCoord.out.bam" "${OUTPUT_PREFIX}.bam"
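For small genomes such as C. elegans (roughly 100 Mb), the STAR manual recommends scaling `--genomeSAindexNbases` down to min(14, log2(GenomeLength)/2 - 1) instead of using the default of 14. The sketch below shows where a value for this parameter comes from: it sums the sequence length of an uncompressed FASTA and applies that rule. The function names `genome_length` and `star_sa_index_nbases` are hypothetical helpers, not part of STAR.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original pipeline).

# Total sequence length of an uncompressed FASTA; header lines (">...") are skipped.
genome_length() {
  awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# STAR manual recommendation for small genomes:
#   genomeSAindexNbases = min(14, log2(GenomeLength)/2 - 1), rounded down
star_sa_index_nbases() {
  awk -v len="$1" 'BEGIN {
    n = int(log(len) / log(2) / 2 - 1)
    if (n > 14) n = 14
    print n
  }'
}
```

Usage, assuming the decompressed WS247 FASTA is in the working directory: `star_sa_index_nbases "$(genome_length c_elegans.PRJNA13758.WS247.dna.toplevel.fa)"`. For a genome of about 100 Mb this yields 12, while large genomes such as human are capped at the default of 14.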
Raw Source Text
Read counts were quantified using kallisto. These were then aligned to C. elegans genome WS247.
Genome_build: WS247
Supplementary_files_format_and_content: Csv; Contains tpm values for each replicate
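The processed files are described as CSV with TPM values for each replicate. One way to assemble such a table from per-replicate kallisto runs is sketched below. The function name `merge_tpm_csv`, the one-output-directory-per-replicate layout, and naming each column after its directory are assumptions, not the authors' documented procedure. Because every run is quantified against the same index, rows should appear in the same target order; the sketch checks this before pasting columns together.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original pipeline).

# Print the target_id column of a kallisto abundance.tsv (header included).
target_ids() {
  awk -F'\t' 'NR == 1 { print "target_id"; next } { print $1 }' "$1"
}

# merge_tpm_csv OUT.csv rep1/abundance.tsv rep2/abundance.tsv ...
# Writes target_id plus one tpm column (column 5) per replicate.
merge_tpm_csv() {
  local out_csv="$1"; shift
  local tmpdir f col i=0
  tmpdir=$(mktemp -d)
  target_ids "$1" > "$tmpdir/ids"
  for f in "$@"; do
    i=$((i + 1))
    # Refuse to merge files whose target order differs from the first file
    if ! target_ids "$f" | cmp -s - "$tmpdir/ids"; then
      echo "target order differs in $f" >&2
      rm -rf "$tmpdir"
      return 1
    fi
    # Zero-padded names keep the glob below in input order
    printf -v col '%s/col_%04d' "$tmpdir" "$i"
    # Column header = name of the directory holding abundance.tsv
    awk -F'\t' -v name="$(basename "$(dirname "$f")")" \
      'NR == 1 { print name; next } { print $5 }' "$f" > "$col"
  done
  paste -d, "$tmpdir/ids" "$tmpdir"/col_* > "$out_csv"
  rm -rf "$tmpdir"
}
```

Usage, assuming one kallisto output directory per replicate: `merge_tpm_csv tpm_values.csv rep1/abundance.tsv rep2/abundance.tsv`.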