GSE214108 Processing Pipeline

RNA-Seq, 3 steps

Publication

An RNA-targeting CRISPR-Cas13d system alleviates disease-related phenotypes in Huntington's disease models.

Nature Neuroscience (2023), PMID 36510111

Dataset

GSE214108

RNA-Targeting CRISPR/Cas13d System Alleviates Disease-Related Phenotypes in Pre-clinical Models of Huntington’s Disease (Human).

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013).
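The adapter-trimming half of this step could look roughly like the sketch below. It assumes single-end reads and the common Illumina adapter prefix. The source states only "Cutadapt (v1.14)", so the adapter sequence, the minimum-length cutoff, the file names, and the helper name `trim_adapters` are all hypothetical.

```shell
# Hypothetical helper: adapter-trim one single-end FASTQ file with Cutadapt.
# ADAPTER and MIN_LEN are assumptions, not values taken from the source.
ADAPTER="AGATCGGAAGAGC"   # common Illumina adapter prefix (assumed)
MIN_LEN=18                # discard reads shorter than this after trimming (assumed)

trim_adapters() {
  in_fastq="$1"
  out_fastq="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: print the command instead of executing it
    echo "cutadapt -a ${ADAPTER} -m ${MIN_LEN} -o ${out_fastq} ${in_fastq}"
  else
    # -a: 3' adapter, -m: minimum read length to keep, -o: output file
    cutadapt -a "${ADAPTER}" -m "${MIN_LEN}" -o "${out_fastq}" "${in_fastq}"
  fi
}

# Usage (uncomment once Cutadapt is installed and the input file exists):
# trim_adapters raw_rnaseq_reads.fastq.gz trimmed_rnaseq_reads.fastq.gz
```

Cutadapt infers gzip compression from the `.gz` extension, so no separate decompression step is needed. Setting `DRY_RUN=1` prints the command for inspection before anything is run.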

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables
    NUM_THREADS=8 # Example number of threads
    INPUT_READS="trimmed_rnaseq_reads.fastq.gz" # Placeholder for adapter-trimmed RNAseq reads
    OUTPUT_DIR="star_repbase_alignment"
    REPBASE_STAR_INDEX="repbase_18.05_star_index" # Placeholder for the STAR genome index built from RepBase v18.05
    
    # Create output directory
    mkdir -p "${OUTPUT_DIR}"
    
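    # Run STAR to map RNAseq reads to human-specific repetitive elements from RepBase
    STAR \
      --runThreadN "${NUM_THREADS}" \
      --genomeDir "${REPBASE_STAR_INDEX}" \
      --readFilesIn "${INPUT_READS}" \
      --readFilesCommand zcat \
      --outFileNamePrefix "${OUTPUT_DIR}/" \
      --outSAMtype BAM SortedByCoordinate \
      --outReadsUnmapped Fastx
    # --readFilesCommand zcat is needed because the input FASTQ is gzip-compressed.
    # --outReadsUnmapped Fastx writes the reads that did not map to repeats to
    # ${OUTPUT_DIR}/Unmapped.out.mate1; these are the reads carried into step 2.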
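The `REPBASE_STAR_INDEX` placeholder presupposes a STAR index built from the RepBase sequences. For small references, the STAR manual recommends scaling `--genomeSAindexNbases` down to min(14, log2(GenomeLength)/2 - 1). The sketch below computes that value from a FASTA file. The helper name `sa_index_nbases` and the FASTA file name are hypothetical, and RepBase itself has to be obtained separately under its own license.

```shell
# Hypothetical helper: suggest a --genomeSAindexNbases value for a small FASTA
# reference, using the STAR manual's rule min(14, log2(GenomeLength)/2 - 1).
sa_index_nbases() {
  awk '
    !/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }   # count bases, skip header lines
    END {
      if (total == 0) exit 1                              # no sequence found
      n = int(log(total) / log(2) / 2 - 1)
      if (n > 14) n = 14
      if (n < 1) n = 1                                    # guard for tiny toy inputs
      print n
    }' "$1"
}

# Usage (placeholder paths; uncomment to build the index):
# NBASES="$(sa_index_nbases repbase_18.05_human.fa)"
# STAR --runMode genomeGenerate \
#      --genomeDir repbase_18.05_star_index \
#      --genomeFastaFiles repbase_18.05_human.fa \
#      --genomeSAindexNbases "${NBASES}" \
#      --runThreadN 8
```

For references made up of many short sequences, the manual also suggests lowering `--genomeChrBinNbits`. Whether the original authors adjusted either parameter is not stated.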
    
  2.

    Repeat-mapping reads were removed, and remaining reads were mapped to the human genome assembly (hg19) with STAR.

    STAR v2.7.10a (version inferred with models/gemini-2.5-flash, not stated in the source for this step; step 1 reports STAR v2.4.0i)
    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.7.10a
    
    # --- Reference Data Setup (Example Paths) ---
    # Download human genome assembly hg19 FASTA
    # wget -P /path/to/genome/hg19/ ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
    # gunzip /path/to/genome/hg19/hg19.fa.gz
    
    # Download Gencode v19 GTF annotation for hg19 (recommended for RNA-seq)
    # wget -P /path/to/genome/hg19/ ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
    # gunzip /path/to/genome/hg19/gencode.v19.annotation.gtf.gz
    
    # Create STAR genome index (if not already created)
    # Replace /path/to/star_index/hg19 with your desired index directory
    # Replace /path/to/genome/hg19/hg19.fa with your FASTA file path
    # Replace /path/to/genome/hg19/gencode.v19.annotation.gtf with your GTF file path
    # --sjdbOverhang 100 is a common value for read lengths around 100bp. Adjust if your reads are different.
    # STAR --runMode genomeGenerate \
    #      --genomeDir /path/to/star_index/hg19 \
    #      --genomeFastaFiles /path/to/genome/hg19/hg19.fa \
    #      --sjdbGTFfile /path/to/genome/hg19/gencode.v19.annotation.gtf \
    #      --sjdbOverhang 100 \
    #      --runThreadN 8
    
    # --- Alignment Command ---
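    # Assumes input_reads_R1.fastq.gz and input_reads_R2.fastq.gz are paired-end reads
    # from which repeat-mapping reads have already been removed (the read layout is not
    # stated in the source; for single-end data pass a single file).
    # Note: --outFilterMultimapNmax 1 keeps only uniquely mapping reads; it does not
    # remove reads that mapped to RepBase in step 1.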
    # Replace /path/to/star_index/hg19 with the actual path to your STAR genome index.
    # Adjust --runThreadN based on available CPU cores.
    # Adjust --limitBAMsortRAM based on available RAM (e.g., 30GB for 30000000000 bytes).
    STAR --genomeDir /path/to/star_index/hg19 \
         --readFilesIn input_reads_R1.fastq.gz input_reads_R2.fastq.gz \
         --readFilesCommand zcat \
         --runThreadN 8 \
         --outFileNamePrefix aligned_reads_ \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes Standard \
         --outFilterMultimapNmax 1 \
         --outFilterType BySJout \
         --outFilterMismatchNmax 999 \
         --outFilterMismatchNoverLmax 0.04 \
         --alignIntronMin 20 \
         --alignIntronMax 1000000 \
         --alignMatesGapMax 1000000 \
         --limitBAMsortRAM 30000000000
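One way to remove repeat-mapping reads is to carry forward only the reads STAR left unmapped in step 1. With `--outReadsUnmapped Fastx`, STAR writes them to `Unmapped.out.mate1` (and `Unmapped.out.mate2` for paired-end data) under the output prefix. The sketch below collects those files for `--readFilesIn`. The helper name `repeat_filtered_reads` is hypothetical, and whether the authors filtered this way is not stated in the source.

```shell
# Hypothetical helper: print the unmapped-read files STAR wrote under a prefix,
# space-separated, in the order expected by --readFilesIn.
# Assumes paths without spaces.
repeat_filtered_reads() {
  prefix="$1"
  files=""
  for mate in mate1 mate2; do
    f="${prefix}Unmapped.out.${mate}"
    if [ -s "$f" ]; then
      files="${files:+$files }$f"
    fi
  done
  if [ -z "$files" ]; then
    echo "no unmapped reads found under ${prefix}" >&2
    return 1
  fi
  echo "$files"
}

# Usage (uncomment to run). The files are uncompressed, so --readFilesCommand zcat
# is omitted; ${READS} is left unquoted on purpose so two mates become two arguments.
# READS="$(repeat_filtered_reads star_repbase_alignment/)"
# STAR --genomeDir /path/to/star_index/hg19 \
#      --readFilesIn ${READS} \
#      --runThreadN 8 \
#      --outFileNamePrefix aligned_reads_ \
#      --outSAMtype BAM SortedByCoordinate
```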
  3.

    Read counts for all genes annotated in GENCODE (hg19) were calculated using the read summarization program featureCounts (Liao et al., 2014).

    featureCounts v1.14.0 (version inferred from publication year, not stated in the source)
    $ Bash example
    # Install featureCounts (command-line tool from the Subread package)
    # conda install -c bioconda subread
    
    # Download GENCODE hg19 annotation GTF file (release 19 corresponds to hg19)
    # Note: This is a large file and may take some time to download.
    # wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
    # gunzip gencode.v19.annotation.gtf.gz
    
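    # Input BAM file (replace with your actual aligned BAM file).
    # With --outFileNamePrefix aligned_reads_ and coordinate-sorted BAM output,
    # STAR names the file aligned_reads_Aligned.sortedByCoord.out.bam.
    INPUT_BAM="aligned_reads_Aligned.sortedByCoord.out.bam"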
    
    # Output file for gene counts
    OUTPUT_COUNTS="gene_counts.txt"
    
    # GENCODE hg19 annotation file
    GENCODE_GTF="gencode.v19.annotation.gtf"
    
    # Run featureCounts to calculate read counts for all genes
    # -a: Annotation file (GTF/GFF format)
    # -o: Output file for read counts
    # -F GTF: Specify annotation file format as GTF
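    # -t exon: Summarize reads mapping to 'exon' features (default for gene counting).
    #          The supplementary-file note in the source mentions counts across CDS regions;
    #          use -t CDS to mirror that.
    # -g gene_id: Group features by 'gene_id' attribute to count reads per gene (default)
    # -s 0: Unstranded library (default; strandedness is not specified in the description)
    #       Use -s 1 for stranded forward, -s 2 for stranded reverse
    # For paired-end data, add -p to count fragments instead of reads
    #       (Subread 2.0.2 and later also require --countReadPairs).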
    featureCounts \
      -a "${GENCODE_GTF}" \
      -o "${OUTPUT_COUNTS}" \
      -F GTF \
      -t exon \
      -g gene_id \
      -s 0 \
      "${INPUT_BAM}"
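featureCounts writes a tab-delimited table. Its first line is a `#` comment recording the command, followed by a header row with the columns Geneid, Chr, Start, End, Strand, and Length, then one count column per BAM file. Downstream tools that expect only gene IDs and counts need the annotation columns dropped. A minimal sketch using standard Unix tools follows; `counts_matrix` and the output file name are hypothetical.

```shell
# Hypothetical helper: reduce featureCounts output to Geneid plus the count columns.
counts_matrix() {
  # Drop the leading '#' comment line, then keep column 1 (Geneid)
  # and columns 7 onward (one per BAM file).
  grep -v '^#' "$1" | cut -f 1,7-
}

# Usage:
# counts_matrix gene_counts.txt > gene_counts.matrix.tsv
```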
    

Tools Used

Cutadapt (v1.14), STAR (v2.4.0i), featureCounts

Raw Source Text
RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013).
Repeat-mapping reads were removed, and remaining reads were mapped to the human genome assembly (hg19) with STAR
Read counts for all genes annotated in GENCODE (hg19) were calculated using the read summarization program featureCounts (Liao et al., 2014).
Assembly: hg19
Supplementary files format and content: FeatureCounts.txt contains counts across CDS regions taken from Gencode v29 annotations