GSE133479 Processing Pipeline
RNA-Seq
2 steps
Publication
Longitudinal assessment of tumor development using cancer avatars derived from genetically engineered pluripotent stem cells. Nature Communications (2020), PMID 31992716
Dataset
GSE133479: Cancer avatars derived from genetically engineered pluripotent stem cells allow for longitudinal assessment of tumor development
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
RNA-seq reads were aligned to the human genome (hg19) with STAR 2.4.0h (outFilterMultimapNmax 20, outFilterMismatchNmax 999, outFilterMismatchNoverLmax 0.04, outFilterIntronMotifs RemoveNoncanonicalUnannotated, outSJfilterOverhangMin 6 6 6 6, seedSearchStartLmax 20, alignSJDBoverhangMin 1) using a gene database constructed from Gencode v19
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables
READ1_FASTQ="input_R1.fastq.gz"    # Replace with your R1 FASTQ file
# READ2_FASTQ="input_R2.fastq.gz"  # Uncomment and replace if paired-end
# STAR appends suffixes such as Aligned.sortedByCoord.out.bam directly to this prefix
OUTPUT_PREFIX="aligned_reads/sample_name"  # Replace with your desired output prefix
STAR_INDEX_DIR="/path/to/your/star_index/hg19_gencode_v19"  # Path to your STAR index (built with hg19 and Gencode v19)
NUM_THREADS=8  # Adjust as needed

# Create output directory if it doesn't exist
mkdir -p "$(dirname "${OUTPUT_PREFIX}")"

# Run STAR alignment.
# For paired-end data, pass both files: --readFilesIn "${READ1_FASTQ}" "${READ2_FASTQ}"
# (a comment cannot sit between the continued lines below without breaking the command).
# The --outSAM* options are not part of the published description; they are
# inferred settings for writing a coordinate-sorted BAM.
STAR \
  --runThreadN "${NUM_THREADS}" \
  --genomeDir "${STAR_INDEX_DIR}" \
  --readFilesIn "${READ1_FASTQ}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outFilterMultimapNmax 20 \
  --outFilterMismatchNmax 999 \
  --outFilterMismatchNoverLmax 0.04 \
  --outFilterIntronMotifs RemoveNoncanonicalUnannotated \
  --outSJfilterOverhangMin 6 6 6 6 \
  --seedSearchStartLmax 20 \
  --alignSJDBoverhangMin 1 \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outSAMattributes All \
  --outSAMstrandField intronMotif
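The alignment step assumes that a STAR index built from hg19 and Gencode v19 already exists. The source does not describe how that index was generated, so the following is only a minimal sketch of one common way to build it. The FASTA and GTF paths, the read length of 100, and the `DRY_RUN` switch are all hypothetical placeholders; `--sjdbOverhang` is set to read length minus one, as the STAR manual recommends.

```shell
#!/bin/sh
# Sketch only: all paths and the read length are assumptions, not values from the source.
GENOME_FASTA="${GENOME_FASTA:-/path/to/hg19.fa}"
GENCODE_GTF="${GENCODE_GTF:-/path/to/gencode.v19.annotation.gtf}"
STAR_INDEX_DIR="${STAR_INDEX_DIR:-/path/to/your/star_index/hg19_gencode_v19}"
READ_LENGTH="${READ_LENGTH:-100}"   # assumption: read length is not stated in the source
NUM_THREADS="${NUM_THREADS:-8}"

# Print the command when DRY_RUN=1 (the default here); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the index; the splice-junction overhang is read length minus one.
build_star_index() {
  run mkdir -p "${STAR_INDEX_DIR}"
  run STAR \
    --runMode genomeGenerate \
    --runThreadN "${NUM_THREADS}" \
    --genomeDir "${STAR_INDEX_DIR}" \
    --genomeFastaFiles "${GENOME_FASTA}" \
    --sjdbGTFfile "${GENCODE_GTF}" \
    --sjdbOverhang "$((READ_LENGTH - 1))"
}

build_star_index
```

With the defaults, the script only prints the commands it would run. Setting `DRY_RUN=0` (with STAR installed and real paths supplied) executes them instead.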
2
Reads that overlap with exon coordinates were counted using htseq-count (-s reverse -a 0 -t exon -i gene_id -m union)
HTSeq v0.13.5 (tool and version inferred with models/gemini-2.5-flash)
$ Bash example
# Install HTSeq (if not already installed)
# conda install -c bioconda htseq

# Placeholders: replace 'aligned_reads.bam' with your aligned reads (BAM/SAM)
# and 'annotation.gtf' with your gene annotation (e.g., from GENCODE or Ensembl).
# The GTF/GFF file must contain 'exon' features with 'gene_id' attributes.
# Older HTSeq releases may need '-f bam' for BAM input; paired-end,
# coordinate-sorted input also needs '-r pos'.
htseq-count -s reverse -a 0 -t exon -i gene_id -m union aligned_reads.bam annotation.gtf > gene_counts.txt
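A quick sanity check on the htseq-count output can help confirm that the strandedness setting (`-s reverse`) suits the library. This sketch is not part of the published pipeline. It assumes the standard two-column, tab-separated htseq-count table, in which special counters such as `__no_feature` and `__ambiguous` start with a double underscore. The function name `summarise_htseq_counts` is hypothetical.

```shell
#!/bin/sh
# Sketch: summarise an htseq-count table (gene_id<TAB>count).
# Rows whose ID starts with "__" are htseq-count's special counters.
summarise_htseq_counts() {
  awk -F '\t' '
    $1 ~ /^__/ { special[$1] = $2; next }          # e.g. __no_feature, __ambiguous
    { assigned += $2; if ($2 > 0) detected++ }      # ordinary gene rows
    END {
      printf "assigned_reads\t%d\n", assigned
      printf "genes_with_reads\t%d\n", detected
      for (k in special) printf "%s\t%d\n", k, special[k]
    }
  ' "$1"
}
```

Usage: `summarise_htseq_counts gene_counts.txt`. If `__no_feature` is very large compared with `assigned_reads`, the strand setting may not match the library.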
Tools Used
Raw Source Text
RNA-seq reads were aligned to the human genome (hg19) with STAR 2.4.0h (outFilterMultimapNmax 20, outFilterMismatchNmax 999, outFilterMismatchNoverLmax 0.04, outFilterIntronMotifs RemoveNoncanonicalUnannotated, outSJfilterOverhangMin 6 6 6 6, seedSearchStartLmax 20, alignSJDBoverhangMin 1) using a gene database constructed from Gencode v19
Reads that overlap with exon coordinates were counted using HTSeqcount (-s reverse -a 0 -t exon -i gene_id -m union)
Genome_build: hg19
Supplementary_files_format_and_content: tab-separated file created using featureCounts v1.5.0 (number of reads mapped to each Gencode V.19 gene)
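The raw text says the deposited supplementary files were created with featureCounts v1.5.0, while the processing step names htseq-count. The source does not give the featureCounts options. The sketch below is therefore an assumption: it mirrors the htseq-count settings (reverse-stranded, exon features, grouped by gene_id, no mapping-quality cutoff). The file names, thread count, and `DRY_RUN` switch are hypothetical. featureCounts handles reads overlapping several features differently from htseq-count's union mode, so the two tools may not produce identical counts.

```shell
#!/bin/sh
# Sketch only: featureCounts options are inferred from the htseq-count settings,
# not taken from the source.
NUM_THREADS="${NUM_THREADS:-8}"

# Print the command when DRY_RUN=1 (the default here); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments: <aligned.bam> <annotation.gtf> <output.txt>
count_with_featurecounts() {
  # -s 2: reverse-stranded; -t exon / -g gene_id: count exons, summarise per gene;
  # -Q 0: no minimum mapping quality (analogous to htseq-count -a 0)
  run featureCounts -T "${NUM_THREADS}" -s 2 -t exon -g gene_id -Q 0 \
    -a "$2" -o "$3" "$1"
}
```

Usage: `count_with_featurecounts aligned_reads.bam annotation.gtf gene_counts_featurecounts.txt` prints the command; set `DRY_RUN=0` to run it.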