GSE214110 Processing Pipeline

Publication

An RNA-targeting CRISPR-Cas13d system alleviates disease-related phenotypes in Huntington's disease models.

Nature neuroscience (2023) — PMID 36510111

Dataset

RNA-Targeting CRISPR/Cas13d System Alleviates Disease-Related Phenotypes in Pre-clinical Models of Huntington's Disease

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013).
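
The adapter-trimming step is not shown elsewhere on this page, so the following is a minimal sketch of how it might look with Cutadapt. The file names, the adapter sequence, and the minimum length are all assumptions for illustration; the source states only the tool and version (Cutadapt v1.14), not its parameters. The `run` helper is hypothetical and simply prints the command when Cutadapt or the input file is unavailable.

```shell
# Hypothetical file names and settings; the source gives none of them.
RAW_READS="raw_rnaseq_reads.fastq.gz"
TRIMMED_READS="trimmed_rnaseq_reads.fastq.gz"
ADAPTER_3P="AGATCGGAAGAGC"  # Assumption: Illumina adapter prefix; confirm against the library kit
MIN_LENGTH=18               # Assumption: drop reads shorter than this after trimming

# Fall back to a dry run when cutadapt or the input file is missing.
if command -v cutadapt >/dev/null 2>&1 && [ -f "${RAW_READS}" ]; then
  DRY_RUN=0
else
  DRY_RUN=1
fi

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "Would run: $*"
  else
    "$@"
  fi
}

# -a: 3' adapter, -m: minimum read length kept, -o: trimmed output
run cutadapt -a "${ADAPTER_3P}" -m "${MIN_LENGTH}" \
  -o "${TRIMMED_READS}" "${RAW_READS}"
```

The trimmed output name matches the `READS_FILE` placeholder used in the STAR repeat-mapping example, so the two steps can be chained once real paths are filled in.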

STAR v2.4.0i

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables
READS_FILE="trimmed_rnaseq_reads.fastq.gz" # Placeholder for adapter-trimmed RNAseq reads
STAR_INDEX_DIR="repbase_star_index" # Placeholder for STAR index of RepBase v18.05 human repetitive elements
OUTPUT_DIR="star_mapping_repbase"

# Create output directory
mkdir -p "${OUTPUT_DIR}"

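# Run STAR for mapping
# --readFilesCommand zcat: required because the input FASTQ is gzipped
# --outReadsUnmapped Fastx: assumption, not stated in the source; writes reads that do not
#   map to repeats to ${OUTPUT_DIR}/Unmapped.out.mate1 for the downstream genome alignment
STAR \
  --genomeDir "${STAR_INDEX_DIR}" \
  --readFilesIn "${READS_FILE}" \
  --readFilesCommand zcat \
  --runThreadN 8 \
  --outFileNamePrefix "${OUTPUT_DIR}/" \
  --outSAMtype BAM SortedByCoordinate \
  --outReadsUnmapped Fastx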

Repeat-mapping reads were removed, and the remaining reads were mapped to the human genome assembly (hg19) with STAR.
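
The source does not say how repeat-mapping reads were removed. One possible approach, sketched here under stated assumptions, is to run the repeat-mapping STAR step with `--outReadsUnmapped Fastx`, which writes unaligned reads to `Unmapped.out.mate1` under the output prefix, and pass that file to the genome alignment. The paths and the `count_fastq_reads` helper below are hypothetical.

```shell
# Hypothetical paths; adjust to your own layout.
REPEAT_OUT_DIR="star_mapping_repbase"
UNMAPPED_READS="${REPEAT_OUT_DIR}/Unmapped.out.mate1"  # written by STAR --outReadsUnmapped Fastx
FILTERED_FASTQ="repeat_filtered_reads.fastq"

# Count reads in an uncompressed FASTQ file (each record spans four lines).
count_fastq_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

if [ -s "${UNMAPPED_READS}" ]; then
  # Reads that did not map to repeats become the input for the hg19 alignment.
  cp "${UNMAPPED_READS}" "${FILTERED_FASTQ}"
  echo "Reads retained after repeat filtering: $(count_fastq_reads "${FILTERED_FASTQ}")"
else
  echo "No unmapped-read file found at ${UNMAPPED_READS}" >&2
fi
```

Because STAR writes this file uncompressed, the genome alignment would read it without `--readFilesCommand zcat`, or it could be gzipped first.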

STAR v2.5.2b (the source does not state a STAR version for this step)

$ Bash example

# Install STAR if not already installed
# conda install -c bioconda star

# --- Prepare STAR genome index (run once) ---
# Replace /path/to/hg19_fasta and /path/to/hg19_gtf with actual paths
# mkdir -p /path/to/STAR_index/hg19
# STAR --runThreadN 16 \
#      --runMode genomeGenerate \
#      --genomeDir /path/to/STAR_index/hg19 \
#      --genomeFastaFiles /path/to/hg19_fasta/hg19.fa \
#      --sjdbGTFfile /path/to/hg19_gtf/hg19.gtf \
#      --sjdbOverhang 100 # Recommended: read_length - 1
#      # Note: --genomeSAindexNbases defaults to 14, which suits the human genome;
#      # it only needs lowering for much smaller references (e.g., a repeat-element index).

# --- Align reads with STAR ---
# Input FASTQ file (assuming it's gzipped and pre-processed if necessary)
INPUT_FASTQ="input_reads.fastq.gz"
# Output directory for STAR results
OUTPUT_DIR="star_output"
# Prefix for output files
OUTPUT_PREFIX="${OUTPUT_DIR}/star_aligned"
# Path to the pre-built STAR genome index for hg19
GENOME_DIR="/path/to/STAR_index/hg19"
# Number of threads to use
NUM_THREADS=16 # Adjust based on available resources

mkdir -p "${OUTPUT_DIR}"

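# Note: --outFilterMultimapNmax 1 (keep uniquely mapped reads only) and the SAM attribute
# list are assumptions; the source does not state the alignment parameters.
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${INPUT_FASTQ}" \
     --readFilesCommand zcat \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --runThreadN "${NUM_THREADS}" \
     --outFilterMultimapNmax 1 \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes NH HI AS NM MD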

Read counts for all genes annotated in GENCODE (hg19) were calculated using the read summarization program featureCounts (Liao et al., 2014).

featureCounts (version not stated in the source; v1.14.6 was inferred from the 2014 publication date and should be verified)

$ Bash example

# Install featureCounts (command-line tool distributed with the Subread package)
# conda install -c bioconda subread

# Download GENCODE hg19 annotation (release 19 is a common choice for hg19)
# wget -O gencode.v19.annotation.gtf.gz ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
# gunzip gencode.v19.annotation.gtf.gz

# Run featureCounts to calculate read counts for all genes
# Assumes 'input.bam' is the aligned BAM file and 'gencode.v19.annotation.gtf' is the unzipped annotation file.
# -a: Annotation file
# -F GTF: Specify annotation file format as GTF
# -t exon: Count features of type 'exon'
# -g gene_id: Group features by 'gene_id' attribute to count reads per gene
# -o: Output file for read counts
# Note: the GEO record describes FeatureCounts.txt as counts across CDS regions from Gencode v29
# annotations; -t CDS with that annotation may match the deposited file more closely (an assumption).
featureCounts -a gencode.v19.annotation.gtf -F GTF -t exon -g gene_id -o gene_counts.txt input.bam
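
The featureCounts output table starts with a `#` comment line recording the command, followed by a header row (`Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`) and one count column per BAM file; a companion `gene_counts.txt.summary` file reports assigned and unassigned read totals. As a rough sketch, the table can be reduced to a gene-by-sample matrix for downstream analysis. The `extract_counts` helper and the output file name are hypothetical.

```shell
# Keep only the gene identifier and the per-sample count columns.
extract_counts() {
  # Skip the leading "#" comment line, then keep Geneid (column 1)
  # and the count columns (column 7 onward).
  grep -v '^#' "$1" | cut -f1,7-
}

COUNTS_FILE="gene_counts.txt"  # output of the featureCounts command above
if [ -f "${COUNTS_FILE}" ]; then
  extract_counts "${COUNTS_FILE}" > gene_counts.matrix.tsv
fi
```

The resulting tab-separated file keeps the header row, so it can be read directly into R or Python with the first column as gene IDs.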

Tools Used

Cutadapt

STAR

featureCounts

Raw Source Text

RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013).
Repeat-mapping reads were removed, and remaining reads were mapped to the human genome assembly (hg19) with STAR
Read counts for all genes annotated in GENCODE (hg19) were calculated using the read summarization program featureCounts (Liao et al., 2014).
Assembly: hg19
Supplementary files format and content: FeatureCounts.txt contains counts across CDS regions taken from Gencode v29 annotations
