GSE201897 Processing Pipeline
RNA-Seq
3 steps
Publication
MECP2-related pathways are dysregulated in a cortical organoid model of myotonic dystrophy. Science Translational Medicine (2022), PMID 35767654
Dataset
GSE201897: MECP2-related pathways are dysregulated in a cortical organoid model of Myotonic dystrophy [bulk RNA-Seq]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013).
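The source names Cutadapt (v1.14) for adapter trimming but does not report the adapter sequence, length cutoff, or library layout. A minimal sketch of that trimming step follows, assuming paired-end reads and the common Illumina adapter prefix; the adapter, the minimum length, and the raw input file names are illustrative assumptions, not values from the dataset record. The helper `trim_reads` is hypothetical and only prints the command when Cutadapt or the inputs are missing.

```shell
# Sketch only: adapter sequence, length cutoff and file names are assumptions,
# not values reported for GSE201897.
set -eu

ADAPTER="AGATCGGAAGAGC"                  # common Illumina adapter prefix (assumed)
MIN_LEN=18                               # discard reads shorter than this after trimming (assumed)
RAW_R1="sample_R1.fastq.gz"              # hypothetical raw input files
RAW_R2="sample_R2.fastq.gz"
TRIMMED_R1="sample_trimmed_R1.fastq.gz"  # outputs fed to the RepBase alignment
TRIMMED_R2="sample_trimmed_R2.fastq.gz"

# Run cutadapt with the given arguments, or print the command as a dry run
# when cutadapt or the raw input files are not available.
trim_reads() {
  if command -v cutadapt >/dev/null 2>&1 && [ -f "${RAW_R1}" ] && [ -f "${RAW_R2}" ]; then
    cutadapt "$@"
  else
    echo "Would run: cutadapt $*"
  fi
}

# -a/-A: 3' adapter for read 1 / read 2; -m: minimum length; -o/-p: paired outputs
trim_reads -a "${ADAPTER}" -A "${ADAPTER}" -m "${MIN_LEN}" \
  -o "${TRIMMED_R1}" -p "${TRIMMED_R2}" \
  "${RAW_R1}" "${RAW_R2}"
```

For single-end data, drop `-A`, `-p`, and the second input file.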
$ Bash example
# Install STAR if not already available
# conda install -c bioconda star=2.4.0i

# --- Reference data preparation (conceptual) ---
# The description states mapping to "human-specific repetitive elements from RepBase (version 18.05)".
# This implies a STAR genome index was generated from these sequences.
# Example command to generate such an index
# (assuming 'repbase_18.05_human_repeats.fasta' is the FASTA file):
# STAR --runMode genomeGenerate \
#   --genomeDir /path/to/STAR_RepBase_18.05_Index \
#   --genomeFastaFiles /path/to/repbase_18.05_human_repeats.fasta \
#   --runThreadN 8   # Adjust thread count as needed
# Note: a small reference like this usually needs a reduced --genomeSAindexNbases (see the STAR manual).

# --- Alignment step ---
# Assumes adapter-trimmed RNAseq reads are available (e.g., from Cutadapt)
# and a STAR genome index for the RepBase 18.05 human repetitive elements has been generated.

# Define variables for clarity
STAR_GENOME_DIR="/path/to/STAR_RepBase_18.05_Index"  # Placeholder for the RepBase genome index
TRIMMED_READS_R1="sample_trimmed_R1.fastq.gz"        # Placeholder for trimmed forward reads
TRIMMED_READS_R2="sample_trimmed_R2.fastq.gz"        # Placeholder for trimmed reverse reads (omit for single-end)
OUTPUT_PREFIX="sample_repbase_alignment"
NUM_THREADS=8                                        # Adjust thread count as needed

# --readFilesCommand zcat is required because the placeholder inputs are gzipped.
# --outReadsUnmapped Fastx is an assumption (the source does not say how repeat-mapping
# reads were removed); it writes reads that did not align to the repeats to
# <prefix>Unmapped.out.mate1 / <prefix>Unmapped.out.mate2 for the next step.
STAR --genomeDir "${STAR_GENOME_DIR}" \
  --readFilesIn "${TRIMMED_READS_R1}" "${TRIMMED_READS_R2}" \
  --readFilesCommand zcat \
  --runThreadN "${NUM_THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}_" \
  --outReadsUnmapped Fastx \
  --outSAMtype BAM SortedByCoordinate
2
Repeat-mapping reads were removed, and remaining reads were mapped to the human genome assembly (hg19) with STAR.
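The source does not say how the repeat-mapping reads were removed. One plausible route, sketched here as an assumption, is to run the RepBase alignment with STAR's `--outReadsUnmapped Fastx` option, which writes the reads that did not align to `<prefix>Unmapped.out.mate1` and `<prefix>Unmapped.out.mate2`, and then pass those files to the hg19 alignment. The helper `collect_unmapped` and the `_norepeat_` output names are hypothetical.

```shell
# Sketch only: assumes the RepBase STAR run used --outReadsUnmapped Fastx,
# which the source does not confirm. Output file names are hypothetical.
set -eu

# Compress STAR's unmapped-read files into FASTQ inputs for the hg19 run.
# $1 = value passed to --outFileNamePrefix in the RepBase run
# $2 = sample name used for the output files
collect_unmapped() {
  prefix=$1
  sample=$2
  for mate in 1 2; do
    src="${prefix}Unmapped.out.mate${mate}"
    # Mate 2 is simply skipped for single-end libraries.
    if [ -f "${src}" ]; then
      gzip -c "${src}" > "${sample}_norepeat_R${mate}.fastq.gz"
      echo "wrote ${sample}_norepeat_R${mate}.fastq.gz"
    fi
  done
}

# Example call; does nothing if the STAR outputs are not present.
collect_unmapped "sample_repbase_alignment_" "sample"
```

The resulting `sample_norepeat_R1.fastq.gz` and `sample_norepeat_R2.fastq.gz` would then serve as the read inputs for the hg19 alignment.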
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables
GENOME_DIR="/path/to/STAR_index/hg19"  # Replace with the path to your pre-built hg19 STAR genome index
READS_R1="sample_R1.fastq.gz"          # Input R1 FASTQ (reads remaining after repeat-mapping reads were removed)
READS_R2="sample_R2.fastq.gz"          # Input R2 FASTQ (remove this line and its use below for single-end reads)
OUTPUT_PREFIX="sample_aligned_"
THREADS=16                             # Adjust based on available CPU cores

# Note: the source does not list the STAR parameters used for this step;
# every filter and alignment option below is an illustrative assumption.
# Repeat-mapping reads are removed upstream, by discarding reads that aligned to RepBase.
# --outFilterMultimapNmax 1 is a separate, optional choice that reports only
# uniquely mapping reads; relax it if multi-mapping reads should be kept.

# Run STAR alignment
STAR \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READS_R1}" "${READS_R2}" \
  --readFilesCommand zcat \
  --runThreadN "${THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes Standard \
  --outFilterMultimapNmax 1 \
  --outFilterType BySJout \
  --outFilterScoreMinOverLread 0.3 \
  --outFilterMatchNminOverLread 0.3 \
  --alignSJDBoverhangMin 1 \
  --alignSJoverhangMin 8 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --limitBAMsortRAM 30000000000  # Adjust based on available RAM (e.g., 30GB)
3
Read counts for all genes annotated in GENCODE (hg19) were calculated using the read summarization program featureCounts (Liao et al., 2014).
featureCounts v1.4.6-p5
$ Bash example
# Install Subread (which includes featureCounts)
# conda install -c bioconda subread

# Download the GENCODE hg19 annotation (if not already available).
# The source says only "GENCODE (hg19)"; release 19 is assumed here.
# wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
# gunzip gencode.v19.annotation.gtf.gz

GENCODE_GTF="gencode.v19.annotation.gtf"  # Path to your GENCODE hg19 GTF file
INPUT_BAM="input.bam"                     # Placeholder for your aligned BAM file(s)
OUTPUT_COUNTS="gene_counts.txt"

# Calculate read counts for all genes using featureCounts
# -a: Annotation file
# -o: Output file
# -F GTF: Specify GTF format for the annotation file
# -t exon: Specify 'exon' as the feature type to count
# -g gene_id: Group features by 'gene_id' (summarizes exon counts to gene level)
# -s 0: Assume unstranded data (change to 1 for stranded, 2 for reversely stranded, if applicable)
# -T 8: Use 8 threads (adjust as needed)
# Add -p to count fragments instead of reads if the library is paired-end.
featureCounts -a "${GENCODE_GTF}" -o "${OUTPUT_COUNTS}" -F GTF -t exon -g gene_id -s 0 -T 8 "${INPUT_BAM}"
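The deposited supplementary files are described as containing RPKMs alongside raw gene counts, but the source does not say how the RPKMs were derived. As a hedged sketch, the standard definition (reads per kilobase of gene length per million counted reads) can be applied directly to a featureCounts table, whose sixth column is the gene length and seventh column is the count for a single BAM file. Using the sum of assigned counts as the library size is an assumption, and the helper `counts_to_rpkm` and the output name `gene_rpkm.txt` are hypothetical.

```shell
# Sketch only: the library-size definition (sum of assigned counts) is an
# assumption; the source does not state how RPKMs were computed.
set -eu

# Print "gene<TAB>RPKM" for a featureCounts table with one count column.
# $1 = featureCounts output (columns: Geneid Chr Start End Strand Length count)
counts_to_rpkm() {
  awk -F '\t' '
    /^#/ || $1 == "Geneid" { next }       # skip the comment and header lines
    NR == FNR { total += $7; next }       # first pass: sum counts as library size
    {                                     # second pass: RPKM per gene
      rpkm = (total > 0 && $6 > 0) ? ($7 * 1e9) / ($6 * total) : 0
      printf "%s\t%.2f\n", $1, rpkm
    }
  ' "$1" "$1"
}

# Example call; runs only when the counts table exists.
if [ -f "gene_counts.txt" ]; then
  counts_to_rpkm "gene_counts.txt" > "gene_rpkm.txt"
fi
```

As a check on the arithmetic, a 1000 bp gene with 100 reads in a library of 400 counted reads gives 100 × 10^9 / (1000 × 400) = 250000 RPKM.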
Raw Source Text
RNAseq reads were adapter-trimmed using Cutadapt (v1.14) and mapped to human-specific repetitive elements from RepBase (version 18.05) by STAR (v2.4.0i) (Dobin et al., 2013).
Repeat-mapping reads were removed, and remaining reads were mapped to the human genome assembly (hg19) with STAR.
Read counts for all genes annotated in GENCODE (hg19) were calculated using the read summarization program featureCounts (Liao et al., 2014).
Assembly: hg19
Supplementary files format and content: .txts with raw gene counts and RPKMs for each experimental group