GSE214108 Processing Pipeline
RNA-Seq
3 steps
Publication
An RNA-targeting CRISPR-Cas13d system alleviates disease-related phenotypes in Huntington's disease models. Nature Neuroscience (2023), PMID 36510111
Dataset
GSE214108: RNA-Targeting CRISPR/Cas13d System Alleviates Disease-Related Phenotypes in Pre-clinical Models of Huntington's Disease (Human).
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013).
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables
NUM_THREADS=8                                  # Example number of threads
INPUT_READS="trimmed_rnaseq_reads.fastq.gz"    # Placeholder for adapter-trimmed RNAseq reads
OUTPUT_DIR="star_repbase_alignment"
REPBASE_STAR_INDEX="repbase_18.05_star_index"  # Placeholder for the STAR genome index built from RepBase v18.05

# Create output directory
mkdir -p "${OUTPUT_DIR}"

# Run STAR to map RNAseq reads to human-specific repetitive elements from RepBase.
# --readFilesCommand zcat is needed because the placeholder input is gzip-compressed.
# --outReadsUnmapped Fastx writes the reads that did not map to repeats
# (Unmapped.out.mate1 in the output directory); these feed the next step.
STAR \
  --runThreadN "${NUM_THREADS}" \
  --genomeDir "${REPBASE_STAR_INDEX}" \
  --readFilesIn "${INPUT_READS}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_DIR}/" \
  --outSAMtype BAM SortedByCoordinate \
  --outReadsUnmapped Fastx
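The step also mentions adapter trimming with Cutadapt (v1.14), which the STAR example above takes as already done. A minimal sketch of that trimming call might look like the following. The file names, the adapter sequence (a common Illumina adapter prefix), and the minimum length are assumptions, since the source does not state them, and the `run` helper is a hypothetical dry-run wrapper that only prints the command unless `DRY_RUN=0` is set.

```shell
# Hypothetical inputs: none of these values come from the source text.
RAW_READS="raw_rnaseq_reads.fastq.gz"
TRIMMED_READS="trimmed_rnaseq_reads.fastq.gz"   # matches the STAR example input
ADAPTER="AGATCGGAAGAGC"                         # assumed Illumina adapter prefix
MIN_LENGTH=18                                   # assumed minimum read length after trimming

# Print the command; execute it only when DRY_RUN=0 is set.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# -a: 3' adapter to remove, -m: discard reads shorter than this, -o: output file
run cutadapt -a "${ADAPTER}" -m "${MIN_LENGTH}" -o "${TRIMMED_READS}" "${RAW_READS}"
```

Running the block as written only echoes the Cutadapt command, which makes it easy to check the arguments before setting `DRY_RUN=0` for a real run.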
2
Repeat-mapping reads were removed, and remaining reads were mapped to the human genome assembly (hg19) with STAR.
$ Bash example
# Install STAR (if not already installed)
# The pinned version is an example; the source reports STAR v2.4.0i for the repeat-mapping step.
# conda install -c bioconda star=2.7.10a

# --- Reference Data Setup (Example Paths) ---
# Download human genome assembly hg19 FASTA
# wget -P /path/to/genome/hg19/ ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
# gunzip /path/to/genome/hg19/hg19.fa.gz

# Download Gencode v19 GTF annotation for hg19 (recommended for RNA-seq)
# wget -P /path/to/genome/hg19/ ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
# gunzip /path/to/genome/hg19/gencode.v19.annotation.gtf.gz

# Create STAR genome index (if not already created)
# Replace /path/to/star_index/hg19 with your desired index directory
# Replace /path/to/genome/hg19/hg19.fa with your FASTA file path
# Replace /path/to/genome/hg19/gencode.v19.annotation.gtf with your GTF file path
# --sjdbOverhang 100 is a common value for read lengths around 100bp. Adjust if your reads are different.
# STAR --runMode genomeGenerate \
#   --genomeDir /path/to/star_index/hg19 \
#   --genomeFastaFiles /path/to/genome/hg19/hg19.fa \
#   --sjdbGTFfile /path/to/genome/hg19/gencode.v19.annotation.gtf \
#   --sjdbOverhang 100 \
#   --runThreadN 8

# --- Alignment Command ---
# Assumes input_reads_R1.fastq.gz and input_reads_R2.fastq.gz are paired-end reads
# from which repeat-mapping reads have already been removed. The source does not say
# whether the library is single- or paired-end; give one file for single-end data.
# --outFilterMultimapNmax 1 keeps only uniquely mapped reads. It is an example setting,
# not stated in the source, and does not replace the RepBase filtering step.
# Replace /path/to/star_index/hg19 with the actual path to your STAR genome index.
# Adjust --runThreadN based on available CPU cores.
# Adjust --limitBAMsortRAM based on available RAM (30000000000 bytes is about 30 GB).
STAR --genomeDir /path/to/star_index/hg19 \
  --readFilesIn input_reads_R1.fastq.gz input_reads_R2.fastq.gz \
  --readFilesCommand zcat \
  --runThreadN 8 \
  --outFileNamePrefix aligned_reads_ \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes Standard \
  --outFilterMultimapNmax 1 \
  --outFilterType BySJout \
  --outFilterMismatchNmax 999 \
  --outFilterMismatchNoverLmax 0.04 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --limitBAMsortRAM 30000000000
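The description says repeat-mapping reads were removed before genome alignment, but the example above starts from already filtered files. One way to connect the two steps, sketched here under assumptions, is to use the unmapped reads that STAR writes when the repeat-mapping run uses `--outReadsUnmapped Fastx`. The paths and the `unmapped_reads` helper are hypothetical, and the sketch assumes single-end reads; whether the authors filtered this way is not stated in the source.

```shell
# Hypothetical paths; match them to the repeat-mapping run.
REPEAT_PREFIX="star_repbase_alignment/"
GENOME_INDEX="/path/to/star_index/hg19"
NUM_THREADS=8

# With --outReadsUnmapped Fastx, STAR writes single-end unmapped reads
# to <prefix>Unmapped.out.mate1 (paired-end runs also get .mate2).
unmapped_reads() {
    printf '%sUnmapped.out.mate1\n' "$1"
}

FILTERED_READS="$(unmapped_reads "${REPEAT_PREFIX}")"

if command -v STAR >/dev/null 2>&1 && [ -s "${FILTERED_READS}" ]; then
    # The unmapped-read file is uncompressed, so --readFilesCommand zcat is omitted.
    STAR --runThreadN "${NUM_THREADS}" \
         --genomeDir "${GENOME_INDEX}" \
         --readFilesIn "${FILTERED_READS}" \
         --outFileNamePrefix aligned_reads_ \
         --outSAMtype BAM SortedByCoordinate
else
    echo "Skipping alignment: STAR or ${FILTERED_READS} not available" >&2
fi
```

Reads that failed to align to RepBase are, by construction, the non-repeat reads, so no separate filtering tool is needed in this sketch.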
3
Read counts for all genes annotated in GENCODE (hg19) were calculated using the read summarization program featureCounts (Liao et al., 2014).
featureCounts v1.14.0 (inferred from publication year; the source text does not state a version)
$ Bash example
# Install featureCounts (part of the Subread package)
# conda install -c bioconda subread

# Download GENCODE hg19 annotation GTF file (release 19 corresponds to hg19)
# Note: This is a large file and may take some time to download.
# wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
# gunzip gencode.v19.annotation.gtf.gz

# Placeholder for input BAM file (replace with your actual aligned BAM file).
# STAR run with --outFileNamePrefix aligned_reads_ and sorted BAM output writes this name.
INPUT_BAM="aligned_reads_Aligned.sortedByCoord.out.bam"

# Output file for gene counts
OUTPUT_COUNTS="gene_counts.txt"

# GENCODE hg19 annotation file
GENCODE_GTF="gencode.v19.annotation.gtf"

# Run featureCounts to calculate read counts for all genes
# -a: Annotation file (GTF/GFF format)
# -o: Output file for read counts
# -F GTF: Specify annotation file format as GTF
# -t exon: Summarize reads mapping to 'exon' features (default for gene counting)
# -g gene_id: Group features by 'gene_id' attribute to count reads per gene (default)
# -s 0: Unstranded library (default, assuming no strandedness specified in description)
#       Use -s 1 for stranded forward, -s 2 for stranded reverse
# Add -p if the BAM file holds paired-end reads.
featureCounts \
  -a "${GENCODE_GTF}" \
  -o "${OUTPUT_COUNTS}" \
  -F GTF \
  -t exon \
  -g gene_id \
  -s 0 \
  "${INPUT_BAM}"
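featureCounts writes a tab-delimited table with a leading `#` comment line, a header row, six annotation columns (Geneid, Chr, Start, End, Strand, Length), and one count column per BAM file. As a rough illustration of how such a table could be reduced to gene IDs and counts for downstream tools, the sketch below keeps columns 1 and 7. The `extract_counts` helper and the output file name are hypothetical, and the column choice assumes a single input BAM.

```shell
# Keep the gene ID (column 1) and the first sample's counts (column 7),
# skipping the '#' comment line that featureCounts writes at the top.
extract_counts() {
    awk 'BEGIN { FS = OFS = "\t" } !/^#/ { print $1, $7 }' "$1"
}

COUNTS_FILE="gene_counts.txt"          # output of the featureCounts example
SIMPLE_COUNTS="gene_counts.simple.tsv" # hypothetical output name

if [ -f "${COUNTS_FILE}" ]; then
    extract_counts "${COUNTS_FILE}" > "${SIMPLE_COUNTS}"
else
    echo "Skipping: ${COUNTS_FILE} not found" >&2
fi
```

The header row is kept, so the second column of the result is labeled with the BAM file name.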
Tools Used
Raw Source Text
RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013). Repeat-mapping reads were removed, and remaining reads were mapped to the human genome assembly (hg19) with STAR. Read counts for all genes annotated in GENCODE (hg19) were calculated using the read summarization program featureCounts (Liao et al., 2014).
Assembly: hg19
Supplementary files format and content: FeatureCounts.txt contains counts across CDS regions taken from Gencode v29 annotations
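The supplementary-file note describes counts across CDS regions from Gencode v29, which differs from exon-level counting against a GENCODE release 19 annotation. If the goal is to approximate the deposited FeatureCounts.txt, a variant that counts over `CDS` features might look like the sketch below. The BAM name and annotation path are hypothetical placeholders (the source does not name the exact Gencode v29 file or the featureCounts options used), and the block skips the run when the tool or inputs are missing.

```shell
# Hypothetical inputs; the annotation path stands in for a Gencode v29 GTF
# in hg19 coordinates, which the source does not name.
INPUT_BAM="aligned_reads_Aligned.sortedByCoord.out.bam"
GENCODE_V29_GTF="/path/to/gencode_v29_hg19.gtf"
OUTPUT_COUNTS="FeatureCounts.txt"
FEATURE_TYPE="CDS"   # count over coding sequence features instead of exons

if command -v featureCounts >/dev/null 2>&1 \
    && [ -f "${INPUT_BAM}" ] && [ -f "${GENCODE_V29_GTF}" ]; then
    # -t selects the GTF feature type; -g groups features into genes
    featureCounts \
      -a "${GENCODE_V29_GTF}" \
      -o "${OUTPUT_COUNTS}" \
      -F GTF \
      -t "${FEATURE_TYPE}" \
      -g gene_id \
      -s 0 \
      "${INPUT_BAM}"
else
    echo "Skipping: featureCounts or input files not available" >&2
fi
```

Counting over CDS excludes reads that fall only in untranslated regions, so totals from this variant would generally be lower than exon-level counts for the same BAM.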