GSE92602 Processing Pipeline
RNA-Seq
5 steps
Publication
A role for alternative splicing in circadian control of exocytosis and glucose homeostasis. Genes & Development (2020), PMID 32616519
Dataset
GSE92602: Identification of islet-enriched long non-coding RNAs contributing to beta-cell failure in type 2 diabetes
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Base calling was performed with Illumina GAP Pipeline Software v1.90
$ Bash example
# Base calling is an initial processing step performed by the Illumina sequencing instrument's onboard software.
# It converts raw signal data (intensities) from the sequencer into base calls (A, T, C, G) and quality scores.
# This process is typically not executed by the user via a command-line tool post-sequencing.
# The specified software, Illumina GAP Pipeline Software v1.90, is proprietary to Illumina.
# No user-executable command is available for this step.
2
Sequences were aligned to the mm9 reference genome using TopHat 2.0.8 with option -g with mm9 reference GTF
$ Bash example
# Install TopHat 2.0.8 and its dependencies.
# Note: TopHat 2 aligns with Bowtie 2 by default (Bowtie 1 only with --bowtie1). It is an older tool and may need a dedicated environment.
# The package versions below are assumptions; check which builds your channels provide.
# conda create -n tophat2_env -c bioconda -c conda-forge tophat=2.0.8 bowtie2
# conda activate tophat2_env

# Input and output (placeholders; replace with actual paths)
READS_1="input_reads_R1.fastq.gz"       # Input FASTQ file(s) for read 1
# READS_2="input_reads_R2.fastq.gz"     # Uncomment and set for paired-end reads
OUTPUT_DIR="tophat_alignment_output"

# Reference files (placeholders; replace with actual paths)
GENOME_FASTA="/path/to/mm9.fa"                          # mm9 reference genome FASTA
GTF_FILE="/path/to/mm9.gtf"                             # mm9 reference GTF
BOWTIE_INDEX_PREFIX="/path/to/mm9_bowtie_index/mm9"     # Prefix of the Bowtie 2 index built from mm9

# TopHat requires a Bowtie index for the reference genome.
# If the mm9 index does not exist yet at BOWTIE_INDEX_PREFIX, uncomment and run:
# mkdir -p "$(dirname "$BOWTIE_INDEX_PREFIX")"
# bowtie2-build "$GENOME_FASTA" "$BOWTIE_INDEX_PREFIX"

# Run TopHat, supplying the reference GTF so that known splice junctions are used.
# In TopHat, uppercase -G/--GTF supplies the annotation; lowercase -g/--max-multihits takes an integer.
# The source text says "-g" without a value, so no multihit limit is set here; that choice is an assumption.
tophat2 \
    -o "$OUTPUT_DIR" \
    -G "$GTF_FILE" \
    "$BOWTIE_INDEX_PREFIX" \
    "$READS_1"    # Add "$READS_2" here if using paired-end reads
3
Novel transcripts were predicted with Cufflinks 2.1.1
$ Bash example
# Install Cufflinks (example using conda)
# conda install -c bioconda cufflinks=2.1.1

# Input and output paths (placeholders; replace with actual paths)
INPUT_BAM="path/to/aligned_reads.bam"       # e.g., accepted_hits.bam written by TopHat
REFERENCE_GTF="/path/to/mm9.gtf"            # mm9 reference annotation (GTF/GFF)
OUTPUT_DIR="cufflinks_novel_transcripts_output"

# Create the output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Run Cufflinks to predict novel transcripts
# -o: Output directory
# -g: Reference annotation used to guide assembly; novel transcripts are reported alongside reference ones
# -p: Number of threads (adjust as needed)
# The source does not state which options were used; guided assembly with -g is an assumption.
# Optional corrections, also not stated in the source:
#   -b/--frag-bias-correct takes the genome FASTA as its argument (e.g., -b /path/to/mm9.fa)
#   -u/--multi-read-correct weights reads that map to multiple locations
cufflinks \
    -o "${OUTPUT_DIR}" \
    -g "${REFERENCE_GTF}" \
    -p 8 \
    "${INPUT_BAM}"
4
Novel transcript predictions were merged with the mm9 reference genome using Cuffmerge 2.1.1 with option -G with mm9 reference GTF
$ Bash example
# Install the Cufflinks suite (which includes Cuffmerge)
# conda install -c bioconda cufflinks=2.1.1

# Reference paths (placeholders for the mm9 reference GTF and FASTA).
# These files can typically be downloaded from the UCSC Genome Browser (e.g., http://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/)
# or Ensembl (e.g., ftp://ftp.ensembl.org/pub/release-54/gtf/mus_musculus/)
MM9_REFERENCE_GTF="/path/to/mm9.ncbiRefSeq.gtf"
MM9_REFERENCE_FASTA="/path/to/mm9.fa"

# Text file in which each line is the path to a GTF file of transcript
# predictions (e.g., transcripts.gtf from each Cufflinks run).
NOVEL_TRANSCRIPTS_GTF_LIST="novel_transcript_assemblies.txt"

# Output directory; Cuffmerge writes merged.gtf inside it.
OUTPUT_DIR="cuffmerge_output"

# Merge the transcript predictions with the mm9 reference annotation.
# -g: Reference annotation GTF (the source text says "-G"; Cuffmerge's option is lowercase -g/--ref-gtf)
# -s: Reference genome FASTA
# -o: Output directory
# Options are given before the list file.
cuffmerge \
    -g "${MM9_REFERENCE_GTF}" \
    -s "${MM9_REFERENCE_FASTA}" \
    -o "${OUTPUT_DIR}" \
    "${NOVEL_TRANSCRIPTS_GTF_LIST}"
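The list file that Cuffmerge reads is not described in the source. A minimal sketch of one way to build it follows, assuming a hypothetical layout with one Cufflinks output directory per sample under `cufflinks_runs/`; the directory name and the `build_assembly_list` helper are illustrative and not part of the original pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: one Cufflinks output directory per sample under
# CUFFLINKS_ROOT, each containing the transcripts.gtf that Cufflinks writes.
CUFFLINKS_ROOT="${CUFFLINKS_ROOT:-cufflinks_runs}"
LIST_FILE="${LIST_FILE:-novel_transcript_assemblies.txt}"

build_assembly_list() {
    local root="$1" list="$2"
    # Collect every transcripts.gtf below the root, sorted for a reproducible order.
    find "$root" -type f -name 'transcripts.gtf' | sort > "$list"
    # Fail early if nothing was found, since Cuffmerge needs at least one assembly.
    if [ ! -s "$list" ]; then
        echo "No transcripts.gtf files found under $root" >&2
        return 1
    fi
}

# Only run when the (assumed) directory exists, so the script is safe to source.
if [ -d "$CUFFLINKS_ROOT" ]; then
    build_assembly_list "$CUFFLINKS_ROOT" "$LIST_FILE"
    echo "Wrote $(wc -l < "$LIST_FILE") assembly path(s) to $LIST_FILE"
fi
```

The resulting file has one GTF path per line and can be passed to Cuffmerge as its final argument.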
5
Counts were generated using htseq-count v0.5.4p3
$ Bash example
# Install HTSeq (if not already installed).
# The source used v0.5.4p3; current conda builds are newer and may differ in available options.
# conda install -c bioconda htseq

# Example usage of htseq-count for generating gene counts from an alignment file and a GTF annotation.
# Parameters are inferred from common RNA-seq usage; the source does not state them.
# -f bam: Input file format is BAM.
# -r pos: Reads are sorted by position (this option may not exist in 0.5.4p3; drop it if your version rejects it).
# -s no: Data is unstranded (use 'yes' or 'reverse' for stranded data).
# -a 10: Minimum alignment quality score is 10.
# -t exon: Feature type to count is 'exon'.
# -i gene_id: Attribute in the GTF file to use as feature ID.
# aligned_reads.bam: Placeholder for the input alignment file.
# /path/to/mm9_annotation.gtf: Placeholder for the mm9 annotation. Which GTF was counted against
#   (the reference GTF or the merged GTF from Cuffmerge) is not stated in the source.
# > gene_counts.txt: Output file for the generated counts.
htseq-count \
    -f bam \
    -r pos \
    -s no \
    -a 10 \
    -t exon \
    -i gene_id \
    aligned_reads.bam \
    /path/to/mm9_annotation.gtf \
    > gene_counts.txt
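htseq-count writes one two-column file per sample, while the dataset provides count data in tab-delimited format. How the per-sample files were combined is not described, so the following is only a sketch of one possible approach. The `merge_counts` helper and the `<sample>.counts.txt` naming convention are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Merge per-sample htseq-count outputs (two tab-separated columns: feature ID,
# count) into one matrix. Column names come from the file names, assuming the
# hypothetical convention <sample>.counts.txt.
merge_counts() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {
            # New input file: derive a column name from its base name.
            name = FILENAME
            sub(/^.*\//, "", name)
            sub(/\.counts\.txt$/, "", name)
            samples[++n] = name
        }
        # Skip the summary counters htseq-count appends (with or without "__").
        $1 ~ /^(__)?(no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique)$/ { next }
        {
            # Remember feature IDs in first-seen order and store the count.
            if (!($1 in seen)) { seen[$1] = 1; ids[++m] = $1 }
            count[$1, n] = $2
        }
        END {
            header = "gene_id"
            for (j = 1; j <= n; j++) header = header OFS samples[j]
            print header
            for (i = 1; i <= m; i++) {
                row = ids[i]
                for (j = 1; j <= n; j++) {
                    key = ids[i] SUBSEP j
                    # Features missing from a sample are reported as 0.
                    row = row OFS ((key in count) ? count[key] : 0)
                }
                print row
            }
        }
    ' "$@"
}
```

A call such as `merge_counts counts/*.counts.txt > gene_counts_matrix.tsv` would then produce a single table with one row per feature and one column per sample.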
Raw Source Text
Base calling was with Illumina GAP Pipeline Software v1.90
Sequences were aligned to mm9 reference genome using Tophat 2.0.8 with option -g with mm9 reference GTF
Novel transcripts were predicted with Cufflinks 2.1.1
Novel transcripts predictions were merged with mm9 reference genome using Cuffmerge 2.1.1 with option -G with mm9 reference GTF
Counts were generated using htseq-count v0.5.4p3
Genome_build: mm9
Supplementary_files_format_and_content: Raw count data for genes were normalized to the relative size of each library using R/Bioconductor package edgeR calcNormFactors function. Count data are provided in tab-delimited format
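The supplementary note says counts were normalized to the relative size of each library with edgeR's calcNormFactors. That step runs in R and is not reproduced here. As a quick sanity check before normalization, the sketch below only sums each sample column of a tab-delimited count matrix to report raw library sizes. The `library_sizes` helper and the matrix layout (header row, gene IDs in the first column) are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the raw library size (sum of counts) for every sample column of a
# tab-delimited count matrix whose first row is a header and whose first
# column holds the gene ID. This is not edgeR's normalization.
library_sizes() {
    awk -F'\t' '
        NR == 1 { for (j = 2; j <= NF; j++) name[j] = $j; ncol = NF; next }
        { for (j = 2; j <= ncol; j++) total[j] += $j }
        END { for (j = 2; j <= ncol; j++) printf "%s\t%.0f\n", name[j], total[j] }
    ' "$1"
}
```

Running `library_sizes gene_counts_matrix.tsv` prints one line per sample, which makes large differences in sequencing depth easy to spot.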