GSE214110 Processing Pipeline
3 steps
Publication
An RNA-targeting CRISPR-Cas13d system alleviates disease-related phenotypes in Huntington's disease models. Nature Neuroscience (2023), PMID 36510111
Dataset
GSE214110: RNA-Targeting CRISPR/Cas13d System Alleviates Disease-Related Phenotypes in Pre-clinical Models of Huntington's Disease
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013).
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables
READS_FILE="trimmed_rnaseq_reads.fastq.gz"   # Placeholder for adapter-trimmed RNAseq reads
STAR_INDEX_DIR="repbase_star_index"          # Placeholder for STAR index of RepBase v18.05 human repetitive elements
OUTPUT_DIR="star_mapping_repbase"

# Create output directory
mkdir -p "${OUTPUT_DIR}"

# Run STAR for mapping
# --readFilesCommand zcat is required because the input FASTQ is gzipped
STAR \
  --genomeDir "${STAR_INDEX_DIR}" \
  --readFilesIn "${READS_FILE}" \
  --readFilesCommand zcat \
  --runThreadN 8 \
  --outFileNamePrefix "${OUTPUT_DIR}/" \
  --outSAMtype BAM SortedByCoordinate
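Step 1 also names adapter trimming with Cutadapt, which the example above takes as already done. A minimal sketch of that trimming step follows, assuming single-end reads. The source does not state the adapter sequence, the minimum read length, or any file names, so `ADAPTER`, `MIN_LENGTH`, and both FASTQ names are placeholders to replace with the values from the actual library preparation.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical file names; the source does not list them.
RAW_FASTQ="raw_rnaseq_reads.fastq.gz"
TRIMMED_FASTQ="trimmed_rnaseq_reads.fastq.gz"

# Assumption: common prefix of the Illumina TruSeq adapters.
# Replace with the adapter used in the actual library preparation.
ADAPTER="AGATCGGAAGAGC"

# Assumption: discard reads shorter than 18 nt after trimming.
MIN_LENGTH=18

# Wrap the call in a function so the command can be reviewed before it runs.
trim_adapters() {
  cutadapt \
    -a "${ADAPTER}" \
    -m "${MIN_LENGTH}" \
    -o "${TRIMMED_FASTQ}" \
    "${RAW_FASTQ}"
}

# Run only when cutadapt and the input file are both available.
if command -v cutadapt >/dev/null 2>&1 && [ -f "${RAW_FASTQ}" ]; then
  trim_adapters
  TRIM_STATUS="done"
else
  TRIM_STATUS="skipped"
  echo "Skipping trimming: cutadapt or ${RAW_FASTQ} not found." >&2
fi
echo "Trimming ${TRIM_STATUS}; STAR input: ${TRIMMED_FASTQ}"
```

The trimmed file name matches the `READS_FILE` placeholder used in the STAR example, so the two snippets can be chained as written.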
2
Repeat-mapping reads were removed, and remaining reads were mapped to the human genome assembly (hg19) with STAR.
$ Bash example
# Install STAR if not already installed
# conda install -c bioconda star

# --- Prepare STAR genome index (run once) ---
# Replace /path/to/hg19_fasta and /path/to/hg19_gtf with actual paths
# mkdir -p /path/to/STAR_index/hg19
# STAR --runThreadN 16 \
#   --runMode genomeGenerate \
#   --genomeDir /path/to/STAR_index/hg19 \
#   --genomeFastaFiles /path/to/hg19_fasta/hg19.fa \
#   --sjdbGTFfile /path/to/hg19_gtf/hg19.gtf \
#   --sjdbOverhang 100
# Recommended value for --sjdbOverhang: read_length - 1
# For ENCODE-like pipelines, additional parameters might be used for genome generation,
# e.g., --genomeSAindexNbases 14 for smaller genomes or specific applications.

# --- Align reads with STAR ---
# Input FASTQ file (assuming it's gzipped and pre-processed if necessary)
INPUT_FASTQ="input_reads.fastq.gz"
# Output directory for STAR results
OUTPUT_DIR="star_output"
# Prefix for output files
OUTPUT_PREFIX="${OUTPUT_DIR}/star_aligned"
# Path to the pre-built STAR genome index for hg19
GENOME_DIR="/path/to/STAR_index/hg19"
# Number of threads to use
NUM_THREADS=16   # Adjust based on available resources

mkdir -p "${OUTPUT_DIR}"

STAR --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${INPUT_FASTQ}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --runThreadN "${NUM_THREADS}" \
  --outFilterMultimapNmax 1 \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes NH HI AS NM MD
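Step 2 says repeat-mapping reads were removed, but the source does not describe how. One common approach, sketched here as an assumption and not as the authors' confirmed method, is to have STAR write the reads that did not map to the RepBase index to a separate FASTQ with `--outReadsUnmapped Fastx`, then use that file as input to the hg19 alignment. The paths reuse the placeholders from the step 1 example.

```shell
#!/usr/bin/env bash
set -eu

# Placeholders reused from the step 1 example.
READS_FILE="trimmed_rnaseq_reads.fastq.gz"
STAR_INDEX_DIR="repbase_star_index"
REPEAT_OUT_DIR="star_mapping_repbase"

# With --outReadsUnmapped Fastx, STAR appends "Unmapped.out.mate1" to the
# output prefix; the file is uncompressed FASTQ.
REPEAT_FREE_FASTQ="${REPEAT_OUT_DIR}/Unmapped.out.mate1"

# Map to the repeat index and keep the unmapped reads as FASTQ.
map_to_repeats() {
  mkdir -p "${REPEAT_OUT_DIR}"
  STAR \
    --genomeDir "${STAR_INDEX_DIR}" \
    --readFilesIn "${READS_FILE}" \
    --readFilesCommand zcat \
    --runThreadN 8 \
    --outFileNamePrefix "${REPEAT_OUT_DIR}/" \
    --outSAMtype BAM SortedByCoordinate \
    --outReadsUnmapped Fastx
}

# Run only when STAR, the index, and the reads are all available.
if command -v STAR >/dev/null 2>&1 \
    && [ -d "${STAR_INDEX_DIR}" ] && [ -f "${READS_FILE}" ]; then
  map_to_repeats
  FILTER_STATUS="done"
else
  FILTER_STATUS="skipped"
  echo "Skipping repeat filter: STAR, index, or reads not found." >&2
fi
echo "Repeat filter ${FILTER_STATUS}; hg19 input: ${REPEAT_FREE_FASTQ}"
```

To chain this into the hg19 alignment, set `INPUT_FASTQ` to the `Unmapped.out.mate1` path and drop `--readFilesCommand zcat`, since that file is not gzipped.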
3
Read counts for all genes annotated in GENCODE (hg19) were calculated using the read summarization program featureCounts (Liao et al., 2014).
featureCounts v1.14.6 (inferred from publication date 2014)
$ Bash example
# Install featureCounts (command-line tool from the Subread package)
# conda install -c bioconda subread

# Download GENCODE hg19 annotation (release 19 is a common choice for hg19)
# wget -O gencode.v19.annotation.gtf.gz ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
# gunzip gencode.v19.annotation.gtf.gz

# Run featureCounts to calculate read counts for all genes
# Assumes 'input.bam' is the aligned BAM file and 'gencode.v19.annotation.gtf' is the unzipped annotation file.
# -a: Annotation file
# -F GTF: Specify annotation file format as GTF
# -t exon: Count features of type 'exon'
# -g gene_id: Group features by 'gene_id' attribute to count reads per gene
# -o: Output file for read counts
# Note: the GEO record describes FeatureCounts.txt as counts across CDS regions from Gencode v29,
# so '-t CDS' with a matching annotation may be closer to the deposited file than the exon/v19 choice assumed here.
featureCounts -a gencode.v19.annotation.gtf -F GTF -t exon -g gene_id -o gene_counts.txt input.bam
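The featureCounts output carries a leading comment line and five annotation columns (Chr, Start, End, Strand, Length) between the gene ID and the counts. The sketch below reduces it to a gene-by-sample matrix for downstream tools. The output name `gene_counts.matrix.tsv` is a placeholder. The mock rows are made-up values written only when no real `gene_counts.txt` is present, so the snippet can be tried on its own.

```shell
#!/usr/bin/env bash
set -eu

COUNTS_FILE="gene_counts.txt"
MATRIX_FILE="gene_counts.matrix.tsv"   # Hypothetical output name

# Demonstration only: write a tiny mock file in featureCounts layout
# when no real output exists. The values are made up.
if [ ! -f "${COUNTS_FILE}" ]; then
  {
    printf '# Program:featureCounts (mock header line)\n'
    printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tinput.bam\n'
    printf 'GENE_A\tchr1\t100\t200\t+\t101\t12\n'
    printf 'GENE_B\tchr2\t300\t450\t-\t151\t0\n'
  } > "${COUNTS_FILE}"
fi

# Drop comment lines, then keep Geneid (column 1) and the count
# columns (column 7 onward, one per BAM file).
grep -v '^#' "${COUNTS_FILE}" | cut -f1,7- > "${MATRIX_FILE}"

# Count genes with at least one read in the first sample.
DETECTED=$(awk -F'\t' 'NR > 1 && $2 > 0 { n++ } END { print n + 0 }' "${MATRIX_FILE}")
echo "Genes with non-zero counts in first sample: ${DETECTED}"
```

When several BAM files are passed to featureCounts in one call, the same `cut -f1,7-` keeps all sample columns.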
Tools Used
Raw Source Text
RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013). Repeat-mapping reads were removed, and remaining reads were mapped to the human genome assembly (hg19) with STAR. Read counts for all genes annotated in GENCODE (hg19) were calculated using the read summarization program featureCounts (Liao et al., 2014).
Assembly: hg19
Supplementary files format and content: FeatureCounts.txt contains counts across CDS regions taken from Gencode v29 annotations