GSE214109 Processing Pipeline
RNA-Seq
code_examples
5 steps
Publication
An RNA-targeting CRISPR-Cas13d system alleviates disease-related phenotypes in Huntington's disease models.Nature neuroscience (2023) — PMID 36510111
Dataset
GSE214109RNA-Targeting CRISPR/Cas13d System Alleviates Disease-Related Phenotypes in Pre-clinical Models of Huntingtonâs Disease (Mouse).
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
Generate Jupyter Notebook-
1
RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013).
$ Bash example
# Install STAR if not already installed # conda install -c bioconda star=2.4.0i # --- Reference Data Setup (Prerequisites) --- # The description specifies mapping to "human-specific repetitive elements from RepBase (version 18.05)". # This requires obtaining the relevant FASTA file from RepBase and building a STAR index from it. # Placeholder for RepBase 18.05 human-specific repetitive elements FASTA file. # Note: Access to RepBase often requires a license or subscription. # For demonstration, let's assume a file named 'RepBase18.05_human_repeats.fasta' is available. # Example: wget -O RepBase18.05_human_repeats.fasta "URL_TO_REPEATS_FASTA" # Create STAR index for RepBase elements (if not already done) # GENOME_DIR="repbase_index_18.05" # mkdir -p "${GENOME_DIR}" # STAR --runMode genomeGenerate \ # --genomeDir "${GENOME_DIR}" \ # --genomeFastaFiles RepBase18.05_human_repeats.fasta \ # --runThreadN 8 # Adjust threads as needed for index generation # --- STAR Mapping Command --- # Assume adapter-trimmed reads are available from the previous Cutadapt step. TRIMMED_READS="cutadapt_trimmed_reads.fastq.gz" # Placeholder for actual trimmed reads file OUTPUT_PREFIX="star_repbase_mapping_output/" GENOME_DIR="repbase_index_18.05" # Directory containing the pre-built STAR index for RepBase mkdir -p "${OUTPUT_PREFIX}" STAR --genomeDir "${GENOME_DIR}" \ --readFilesIn "${TRIMMED_READS}" \ --outFileNamePrefix "${OUTPUT_PREFIX}" \ --runThreadN 8 \ --outSAMtype BAM SortedByCoordinate \ --outReadsUnmapped Fastx # Output unmapped reads to Fastx files, often useful for further analysis -
2
Repeat-mapping reads were removed, and remaining reads were mapped to the mouse genome assembly (mm10) with STAR (v2.4.0i)
$ Bash example
# Install STAR (if not already installed) # conda install -c bioconda star=2.4.0i # Define variables # Replace with the actual path to your pre-built STAR index for mm10 GENOME_DIR="/path/to/STAR_index/mm10" # Replace with your actual input FASTQ file(s). For paired-end, use "read1.fastq.gz read2.fastq.gz" READS_FILE="input_reads.fastq.gz" OUTPUT_DIR="STAR_output" # Output directory for STAR results NUM_THREADS=8 # Number of threads to use # Create output directory if it doesn't exist mkdir -p "${OUTPUT_DIR}" # Run STAR alignment # --outFilterMultimapNmax 1: Ensures only uniquely mapping reads are reported, effectively "removing" repeat-mapping reads from the output. # --outSAMtype BAM SortedByCoordinate: Outputs a sorted BAM file, which is standard for downstream analysis. # --outBAMcompression 6: Sets BAM compression level. STAR --genomeDir "${GENOME_DIR}" \ --readFilesIn "${READS_FILE}" \ --runThreadN "${NUM_THREADS}" \ --outFileNamePrefix "${OUTPUT_DIR}/" \ --outFilterMultimapNmax 1 \ --outSAMtype BAM SortedByCoordinate \ --outBAMcompression 6 -
3
Read counts for all genes annotated in GENCODE (vM20) were calculated using the read summarization program featureCounts (v1.5.3) (Liao et al., 2014).
$ Bash example
# Install subread (which includes featureCounts) if not already installed # conda install -c bioconda subread # Define variables for input and output ANNOTATION_GTF="gencode.vM20.annotation.gtf" # Placeholder for GENCODE vM20 annotation file INPUT_BAM="aligned_reads.bam" # Placeholder for your input BAM file(s) OUTPUT_COUNTS="gene_counts.txt" # Download GENCODE vM20 annotation if needed (example, adjust URL if necessary) # wget -O ${ANNOTATION_GTF}.gz "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M20/gencode.vM20.annotation.gtf.gz" # gunzip -f ${ANNOTATION_GTF}.gz # Run featureCounts # -a: Annotation file (GTF/GFF format) # -o: Output file for read counts # -t exon: Specify feature type to count (exons are typically used for gene-level counts) # -g gene_id: Specify attribute to group features by (gene_id for gene-level counts) # -s 0: Strandedness (0=unstranded, 1=forward, 2=reverse). Adjust based on library preparation protocol. # -T 8: Number of threads (adjust as needed for parallel processing) # --primary: Only count primary alignments (recommended for RNA-seq to avoid counting multi-mapped reads multiple times) featureCounts -a "${ANNOTATION_GTF}" \ -o "${OUTPUT_COUNTS}" \ -t exon \ -g gene_id \ -s 0 \ -T 8 \ --primary \ "${INPUT_BAM}" -
4
Batch correction was completed with Combat-seq
$ Bash example
# Install R and Bioconductor if not already installed # For Bioconductor 3.19 (R 4.4): # R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")' # R -e 'BiocManager::install("sva")' # Create dummy input files for demonstration (replace with your actual data) # Example counts.tsv echo -e "gene\tsample1\tsample2\tsample3\tsample4" > counts.tsv echo -e "geneA\t100\t120\t80\t90" >> counts.tsv echo -e "geneB\t50\t60\t40\t45" >> counts.tsv echo -e "geneC\t200\t210\t180\t190" >> counts.tsv # Example metadata.tsv echo -e "sample\tbatch\tcondition" > metadata.tsv echo -e "sample1\tbatch1\tcontrol" >> metadata.tsv echo -e "sample2\tbatch1\ttreated" >> metadata.tsv echo -e "sample3\tbatch2\tcontrol" >> metadata.tsv echo -e "sample4\tbatch2\ttreated" >> metadata.tsv # R script to perform ComBat-seq batch correction Rscript -e ' library(sva) library(data.table) # For fread/fwrite # Load count data counts_df <- fread("counts.tsv", data.table = FALSE) rownames(counts_df) <- counts_df$gene counts_matrix <- as.matrix(counts_df[, -1]) # Load metadata metadata_df <- fread("metadata.tsv", data.table = FALSE) rownames(metadata_df) <- metadata_df$sample # Ensure sample order matches between counts and metadata metadata_df <- metadata_df[colnames(counts_matrix), ] # Define batch variable batch <- metadata_df$batch # Define biological covariates to preserve (e.g., condition) # If no other covariates, set covar_mod = NULL covar_mod <- model.matrix(~condition, data = metadata_df) # Example with "condition" # Perform ComBat-seq batch correction # For more options, refer to ?ComBat_seq in R corrected_counts <- ComBat_seq( counts = counts_matrix, batch = batch, covar_mod = covar_mod, full_mod = TRUE # Use full_mod = TRUE if covar_mod is a full design matrix ) # Output corrected counts fwrite(as.data.frame(corrected_counts), "corrected_counts_combatseq.tsv", sep = "\t", row.names = TRUE) ' -
5
Differential expression was completed by DESeq2 (v1.5.3)
$ Bash example
# Install Bioconductor and DESeq2 if not already installed. # Note: DESeq2 v1.5.3 is an older version. To install it, you would typically need an older R version (e.g., R 3.0.x or 3.1.x) and an older Bioconductor release (e.g., Bioconductor 2.13 or 2.14). # The general installation method for older Bioconductor packages was: # source("http://bioconductor.org/biocLite.R") # biocLite("DESeq2") # This would install the version compatible with your R version at the time. # Create a placeholder R script for DESeq2 analysis cat << 'EOF' > run_deseq2.R # Load DESeq2 library library(DESeq2) # --- Placeholder for loading count data and sample metadata --- # Replace 'counts.tsv' and 'sample_info.tsv' with your actual input files. # 'counts.tsv' should be a tab-separated or comma-separated file with gene IDs as row names # and sample IDs as column names (e.g., output from featureCounts, HTSeq-count, or aggregated Salmon/Kallisto counts). # 'sample_info.tsv' should be a tab-separated or comma-separated file with sample IDs as row names # and experimental conditions/covariates as columns. # Example: Load count matrix # counts_matrix <- as.matrix(read.table("counts.tsv", header = TRUE, row.names = 1, sep = "\t")) # Example: Load sample metadata # sample_metadata <- read.table("sample_info.tsv", header = TRUE, row_names = 1, sep = "\t")) # Ensure sample names match and are in the same order # if (!all(rownames(sample_metadata) == colnames(counts_matrix))) { # stop("Sample names in metadata and counts matrix do not match or are not in the same order.") # } # --- Create a DESeqDataSet object --- # Replace 'condition' with the actual column name in your sample_metadata that defines your experimental groups. # dds <- DESeqDataSetFromMatrix(countData = counts_matrix, # colData = sample_metadata, # design = ~ condition) # --- Run DESeq2 analysis --- # dds <- DESeq(dds) # --- Extract results --- # Replace 'condition_treatment_vs_control' with your specific contrast. # For example, if 'condition' has levels 'control' and 'treatment': # res <- results(dds, contrast = c("condition", "treatment", "control")) # --- Optional: Apply LFC shrinkage (requires 'apeglm' or 'ashr' package) --- # resLFC <- lfcShrink(dds, coef="condition_treatment_vs_control", type="apeglm") # --- Save results --- # write.csv(as.data.frame(res), file = "deseq2_differential_expression_results.csv") # write.csv(as.data.frame(resLFC), file = "deseq2_lfc_shrinkage_results.csv") # --- Print session info for reproducibility --- sessionInfo() EOF # Execute the R script Rscript run_deseq2.R
Raw Source Text
RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013). Repeat-mapping reads were removed, and remaining reads were mapped to the mouse genome assembly (mm10) with STAR (v2.4.0i) Read counts for all genes annotated in GENCODE (vM20) were calculated using the read summarization program featureCounts (v1.5.3) (Liao et al., 2014). Batch correction was completed with Combat-seq Differential expression was completed by DESeq2 (v1.5.3) Assembly: mm10