GSE133479 Processing Pipeline

RNA-Seq, 2 steps

Publication

Longitudinal assessment of tumor development using cancer avatars derived from genetically engineered pluripotent stem cells.

Nature Communications (2020), PMID 31992716

Dataset

GSE133479

Cancer avatars derived from genetically engineered pluripotent stem cells allow for longitudinal assessment of tumor development

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. RNA-seq reads were aligned to the human genome (hg19) with STAR 2.4.0h (outFilterMultimapNmax 20, outFilterMismatchNmax 999, outFilterMismatchNoverLmax 0.04, outFilterIntronMotifs RemoveNoncanonicalUnannotated, outSJfilterOverhangMin 6 6 6 6, seedSearchStartLmax 20, alignSJDBoverhangMin 1) using a gene database constructed from Gencode v19.

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables
    READ1_FASTQ="input_R1.fastq.gz" # Replace with your R1 fastq file
    # READ2_FASTQ="input_R2.fastq.gz" # Uncomment and replace if paired-end
    OUTPUT_PREFIX="aligned_reads/sample_name" # Replace with your desired output prefix
    STAR_INDEX_DIR="/path/to/your/star_index/hg19_gencode_v19" # Replace with the path to your STAR index (built with hg19 and Gencode v19)
    NUM_THREADS=8 # Adjust as needed
    
    # Create output directory if it doesn't exist
    mkdir -p "$(dirname "${OUTPUT_PREFIX}")"
    
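    # Run STAR alignment.
    # Paired-end data: pass both files, i.e. --readFilesIn "${READ1_FASTQ}" "${READ2_FASTQ}".
    # Keep comments out of the backslash-continued command below; a comment line there ends the command early.
    # The --outSAM* options at the end are not stated in the source and are added for convenience.
    STAR \
      --runThreadN "${NUM_THREADS}" \
      --genomeDir "${STAR_INDEX_DIR}" \
      --readFilesIn "${READ1_FASTQ}" \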
      --readFilesCommand zcat \
      --outFileNamePrefix "${OUTPUT_PREFIX}" \
      --outFilterMultimapNmax 20 \
      --outFilterMismatchNmax 999 \
      --outFilterMismatchNoverLmax 0.04 \
      --outFilterIntronMotifs RemoveNoncanonicalUnannotated \
      --outSJfilterOverhangMin 6 6 6 6 \
      --seedSearchStartLmax 20 \
      --alignSJDBoverhangMin 1 \
      --outSAMtype BAM SortedByCoordinate \
      --outSAMunmapped Within \
      --outSAMattributes All \
      --outSAMstrandField intronMotif
  2. Reads overlapping exon coordinates were counted using htseq-count (-s reverse -a 0 -t exon -i gene_id -m union).

    HTSeq v0.13.5 (tool name and version inferred with models/gemini-2.5-flash, not stated in the source)
    $ Bash example
    # Install HTSeq (if not already installed)
    # conda install -c bioconda htseq
    
    # Placeholders for the aligned reads (BAM/SAM) and the annotation (GTF/GFF):
    # replace 'aligned_reads.bam' with your alignment file and 'annotation.gtf'
    # with your gene annotation file (the source used Gencode v19).
    # The GTF/GFF file must contain 'exon' features with 'gene_id' attributes.
    # '-f bam' is an addition not stated in the source; older HTSeq releases expect SAM text unless told otherwise.
    # For paired-end, coordinate-sorted BAM input, also add '-r pos'.
    
    htseq-count -f bam -s reverse -a 0 -t exon -i gene_id -m union aligned_reads.bam annotation.gtf > gene_counts.txt
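
htseq-count writes summary rows whose names begin with a double underscore (for example __no_feature and __ambiguous) after the per-gene rows of its output. A minimal sketch of separating the two before downstream analysis is shown here. The helper names split_htseq_counts and total_assigned, the output file names, and the toy gene IDs are illustrative assumptions, not part of the source pipeline.

```shell
#!/usr/bin/env bash

# split_htseq_counts COUNTS_FILE GENES_OUT SUMMARY_OUT   (hypothetical helper)
# Writes per-gene rows to GENES_OUT and the "__" summary rows to SUMMARY_OUT.
split_htseq_counts() {
  local counts_file="$1" genes_out="$2" summary_out="$3"
  # grep exits non-zero when nothing matches; "|| true" tolerates an empty result.
  grep -v '^__' "${counts_file}" > "${genes_out}" || true
  grep '^__' "${counts_file}" > "${summary_out}" || true
}

# total_assigned GENES_FILE   (hypothetical helper)
# Sums the count column (column 2) of a tab-separated gene/count table.
total_assigned() {
  awk -F'\t' '{ sum += $2 } END { print sum + 0 }' "$1"
}

# Demonstration on a tiny made-up table in htseq-count's output layout.
demo_dir="$(mktemp -d)"
{
  printf 'GENE_A\t12\n'
  printf 'GENE_B\t0\n'
  printf 'GENE_C\t30\n'
  printf '__no_feature\t7\n'
  printf '__ambiguous\t2\n'
  printf '__too_low_aQual\t0\n'
  printf '__not_aligned\t1\n'
  printf '__alignment_not_unique\t4\n'
} > "${demo_dir}/gene_counts.txt"

split_htseq_counts "${demo_dir}/gene_counts.txt" \
  "${demo_dir}/genes_only.txt" "${demo_dir}/htseq_summary.txt"

# Reads assigned to genes in the toy table (12 + 0 + 30).
total_assigned "${demo_dir}/genes_only.txt"
```

Keeping the summary rows in their own file preserves the per-sample diagnostics while leaving a clean gene-by-count table for any later merging across samples.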

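For step 1, single-end and paired-end runs differ only in the arguments given to --readFilesIn. One way to switch between them without placing comments inside a continued command is to assemble the arguments in a Bash array. The sketch below assumes Bash; build_star_cmd and STAR_CMD are hypothetical names, the paths are placeholders, and the command is only printed, not executed. The --outSAMtype option is a convenience addition that the source does not state.

```shell
#!/usr/bin/env bash

# build_star_cmd INDEX_DIR OUT_PREFIX THREADS READ1 [READ2]   (hypothetical helper)
# Fills the global array STAR_CMD with a STAR invocation using the
# filter and seed settings quoted in the source description.
build_star_cmd() {
  local index_dir="$1" out_prefix="$2" threads="$3" read1="$4" read2="${5:-}"
  local read_files=("${read1}")
  if [ -n "${read2}" ]; then
    read_files+=("${read2}")   # paired-end: the second mate follows the first
  fi
  STAR_CMD=(
    STAR
    --runThreadN "${threads}"
    --genomeDir "${index_dir}"
    --readFilesIn "${read_files[@]}"
    --readFilesCommand zcat
    --outFileNamePrefix "${out_prefix}"
    --outFilterMultimapNmax 20
    --outFilterMismatchNmax 999
    --outFilterMismatchNoverLmax 0.04
    --outFilterIntronMotifs RemoveNoncanonicalUnannotated
    --outSJfilterOverhangMin 6 6 6 6
    --seedSearchStartLmax 20
    --alignSJDBoverhangMin 1
    --outSAMtype BAM SortedByCoordinate   # addition, not stated in the source
  )
}

# Dry run: print the single-end command instead of executing it.
build_star_cmd "/path/to/your/star_index/hg19_gencode_v19" \
  "aligned_reads/sample_name" 8 "input_R1.fastq.gz"
printf '%s ' "${STAR_CMD[@]}"
printf '\n'

# To execute for real (requires STAR on PATH and a built index):
# "${STAR_CMD[@]}"
```

Because each array element is passed as one argument, file names containing spaces survive intact, and adding a second FASTQ file requires no edits to the option list.
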
Tools Used

Raw Source Text
RNA-seq reads were aligned to the human genome (hg19) with STAR 2.4.0h (outFilterMultimapNmax 20, outFilterMismatchNmax 999, outFilterMismatchNoverLmax 0.04, outFilterIntronMotifs RemoveNoncanonicalUnannotated, outSJfilterOverhangMin 6 6 6 6, seedSearchStartLmax 20, alignSJDBoverhangMin 1) using a gene database constructed from Gencode v19
Reads that overlap with exon coordinates were counted using HTSeqcount (-s reverse -a 0 -t exon -i gene_id -m union)
Genome_build: hg19
Supplementary_files_format_and_content: tab-separated file created using featureCounts v1.5.0 (number of reads mapped to each Gencode V.19 gene)
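
The supplementary-file note above names featureCounts v1.5.0 rather than htseq-count. featureCounts output begins with a '#' comment line and a header row, followed by one row per gene in which annotation columns precede the count column. The sketch below reduces such a table to gene ID and count, assuming a single BAM file was counted so that the count sits in column 7. The function name, file names, and toy rows are made up, and the featureCounts invocation in the comment is an assumption, since the source does not give its options.

```shell
#!/usr/bin/env bash

# Assumed invocation (options not stated in the source; -s 2 is the
# reverse-stranded setting, the counterpart of htseq-count's -s reverse):
# featureCounts -s 2 -t exon -g gene_id -a annotation.gtf -o fc_counts.txt aligned_reads.bam

# featurecounts_to_two_col TABLE   (hypothetical helper)
# Prints "gene_id<TAB>count" for each gene row, skipping the comment and header lines.
featurecounts_to_two_col() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/            { next }   # program/command comment line
    $1 == "Geneid"  { next }   # column header row
                    { print $1, $7 }' "$1"
}

# Demonstration on a tiny made-up table in featureCounts' output layout.
fc_demo_dir="$(mktemp -d)"
{
  printf '# Program:featureCounts v1.5.0; Command:(omitted)\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\taligned_reads.bam\n'
  printf 'GENE_A\tchr1\t100\t200\t+\t101\t12\n'
  printf 'GENE_B\tchr2\t300;500\t400;600\t-;-\t202\t0\n'
} > "${fc_demo_dir}/fc_counts.txt"

featurecounts_to_two_col "${fc_demo_dir}/fc_counts.txt"
```

The resulting two-column layout matches the per-gene part of htseq-count output, which makes it easier to compare counts from the two tools for the same sample.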