GSE214110 Processing Pipeline


Publication

An RNA-targeting CRISPR-Cas13d system alleviates disease-related phenotypes in Huntington's disease models.

Nature Neuroscience (2023), PMID 36510111

Dataset

GSE214110

RNA-Targeting CRISPR/Cas13d System Alleviates Disease-Related Phenotypes in Pre-clinical Models of Huntington’s Disease

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013).

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables
    READS_FILE="trimmed_rnaseq_reads.fastq.gz" # Placeholder for adapter-trimmed RNAseq reads
    STAR_INDEX_DIR="repbase_star_index" # Placeholder for STAR index of RepBase v18.05 human repetitive elements
    OUTPUT_DIR="star_mapping_repbase"
    
    # Create output directory
    mkdir -p "${OUTPUT_DIR}"
    
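    # Run STAR for mapping
    # --readFilesCommand zcat is required because the input FASTQ is gzipped.
    # --outReadsUnmapped Fastx (an assumption, not stated in the source) writes
    # non-repeat reads to ${OUTPUT_DIR}/Unmapped.out.mate1 for use in the next step.
    STAR \
      --genomeDir "${STAR_INDEX_DIR}" \
      --readFilesIn "${READS_FILE}" \
      --readFilesCommand zcat \
      --runThreadN 8 \
      --outFileNamePrefix "${OUTPUT_DIR}/" \
      --outReadsUnmapped Fastx \
      --outSAMtype BAM SortedByCoordinate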
  2.

    Repeat-mapping reads were removed, and the remaining reads were mapped to the human genome assembly (hg19) with STAR.

    $ Bash example
    # Install STAR if not already installed
    # conda install -c bioconda star
    
    # --- Prepare STAR genome index (run once) ---
    # Replace /path/to/hg19_fasta and /path/to/hg19_gtf with actual paths
    # mkdir -p /path/to/STAR_index/hg19
    # STAR --runThreadN 16 \
    #      --runMode genomeGenerate \
    #      --genomeDir /path/to/STAR_index/hg19 \
    #      --genomeFastaFiles /path/to/hg19_fasta/hg19.fa \
    #      --sjdbGTFfile /path/to/hg19_gtf/hg19.gtf \
    #      --sjdbOverhang 100 # Recommended: read_length - 1
    #      # For much smaller references (e.g., a repeat-element index), the STAR manual recommends
    #      # lowering --genomeSAindexNbases below its default of 14: min(14, log2(GenomeLength)/2 - 1).
    
    # --- Align reads with STAR ---
    # Input FASTQ: reads left after repeat-mapping reads were removed (assumed gzipped here;
    # drop --readFilesCommand zcat below if the file is uncompressed)
    INPUT_FASTQ="input_reads.fastq.gz"
    # Output directory for STAR results
    OUTPUT_DIR="star_output"
    # Prefix for output files
    OUTPUT_PREFIX="${OUTPUT_DIR}/star_aligned"
    # Path to the pre-built STAR genome index for hg19
    GENOME_DIR="/path/to/STAR_index/hg19"
    # Number of threads to use
    NUM_THREADS=16 # Adjust based on available resources
    
    mkdir -p "${OUTPUT_DIR}"
    
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${INPUT_FASTQ}" \
         --readFilesCommand zcat \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --runThreadN "${NUM_THREADS}" \
         --outFilterMultimapNmax 1 \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes NH HI AS NM MD
  3.

    Read counts for all genes annotated in GENCODE (hg19) were calculated using the read summarization program featureCounts (Liao et al., 2014).

    featureCounts (version not stated in the source; 2014 is the publication year of Liao et al., not a release number)
    $ Bash example
    # Install featureCounts (command-line tool distributed with the Subread package)
    # conda install -c bioconda subread
    
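    # Download GENCODE hg19 annotation (release 19 is a common choice for hg19; the exact release is an assumption,
    # and the GEO supplementary-file note cites Gencode v29 instead)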
    # wget -O gencode.v19.annotation.gtf.gz ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
    # gunzip gencode.v19.annotation.gtf.gz
    
    # Run featureCounts to calculate read counts for all genes
    # Assumes 'input.bam' is the aligned BAM file and 'gencode.v19.annotation.gtf' is the unzipped annotation file.
    # -a: Annotation file
    # -F GTF: Specify annotation file format as GTF
    # -t exon: Count features of type 'exon' (the GEO supplementary-file note mentions CDS regions; use -t CDS to count those instead)
    # -g gene_id: Group features by 'gene_id' attribute to count reads per gene
    # -o: Output file for read counts
    featureCounts -a gencode.v19.annotation.gtf -F GTF -t exon -g gene_id -o gene_counts.txt input.bam
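
The steps describe adapter trimming with Cutadapt and the removal of repeat-mapping reads, but neither hand-off appears in the examples. The sketch below shows one possible way to chain them, assuming single-end reads. The file names, the adapter sequence (the common Illumina adapter prefix), and the use of STAR's `--outReadsUnmapped Fastx` to collect non-repeat reads are all assumptions, not details from the source. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: all file names, the adapter sequence, and index paths are hypothetical placeholders.
set -eu

RAW_FASTQ="raw_rnaseq_reads.fastq.gz"            # hypothetical raw input
TRIMMED_FASTQ="trimmed_rnaseq_reads.fastq.gz"    # adapter-trimmed output
ADAPTER="AGATCGGAAGAGC"                          # assumption: Illumina adapter prefix; not stated in the source
REPBASE_INDEX="repbase_star_index"               # hypothetical STAR index of RepBase repeats
REPEAT_DIR="star_mapping_repbase"
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print commands only, 0 = run them

run() {
  # Print the command; execute it only when DRY_RUN=0.
  printf '%s\n' "$*"
  if [ "${DRY_RUN}" = "0" ]; then "$@"; fi
}

# 1) Adapter trimming with Cutadapt (-a: 3' adapter sequence, -o: output file).
run cutadapt -a "${ADAPTER}" -o "${TRIMMED_FASTQ}" "${RAW_FASTQ}"

# 2) Map trimmed reads to the repeat index and keep the reads that did NOT map.
#    --outReadsUnmapped Fastx writes them to <prefix>Unmapped.out.mate1.
run mkdir -p "${REPEAT_DIR}"
run STAR --genomeDir "${REPBASE_INDEX}" \
  --readFilesIn "${TRIMMED_FASTQ}" \
  --readFilesCommand zcat \
  --outReadsUnmapped Fastx \
  --outFileNamePrefix "${REPEAT_DIR}/"

# 3) The repeat-depleted reads become the input to the hg19 alignment.
REPEAT_FILTERED_FASTQ="${REPEAT_DIR}/Unmapped.out.mate1"
echo "Next input: ${REPEAT_FILTERED_FASTQ}"
```

STAR writes `Unmapped.out.mate1` uncompressed, so the subsequent hg19 alignment would read it without `--readFilesCommand zcat`.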

Tools Used

Cutadapt (v1.14), STAR (v2.4.0i), featureCounts (version not stated)

Raw Source Text
RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013).
Repeat-mapping reads were removed, and remaining reads were mapped to the human genome assembly (hg19) with STAR
Read counts for all genes annotated in GENCODE (hg19) were calculated using the read summarization program featureCounts (Liao et al., 2014).
Assembly: hg19
Supplementary files format and content: FeatureCounts.txt contains counts across CDS regions taken from Gencode v29 annotations
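
A table produced by featureCounts normally starts with a `#` comment line, followed by a header with six annotation columns (Geneid, Chr, Start, End, Strand, Length) and one count column per BAM file. Assuming the deposited FeatureCounts.txt follows that standard layout (not verified here), the sketch below reduces such a table to gene IDs and counts. The `counts_only` helper and the mock values are hypothetical and for illustration only.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the standard featureCounts output layout.
set -eu

counts_only() {
  # Drop the leading "#" comment line, then keep column 1 (Geneid)
  # and columns 7 onward (one count column per sample).
  grep -v '^#' "$1" | cut -f1,7-
}

# Demonstration on a tiny mock table (invented values, not from GSE214110).
MOCK_TABLE="$(mktemp)"
printf '# Program:featureCounts; Command:(mock)\n' > "${MOCK_TABLE}"
printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tinput.bam\n' >> "${MOCK_TABLE}"
printf 'GENE_A\tchr1\t100\t200\t+\t101\t12\n' >> "${MOCK_TABLE}"
printf 'GENE_B\tchr2\t300\t450\t-\t151\t0\n' >> "${MOCK_TABLE}"

counts_only "${MOCK_TABLE}"
rm -f "${MOCK_TABLE}"
```

For a real file, `counts_only FeatureCounts.txt > counts_matrix.tsv` would give a tab-separated matrix suitable for downstream differential-expression tools.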