GSE220845 Processing Pipeline
RNA-Seq
1 step
Publication
Proteomic discovery of chemical probes that perturb protein complexes in human cells. Molecular Cell (2023), PMID 37084731
Dataset
GSE220845: Proteomic discovery of chemical probes that perturb protein complexes in human cells II
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
For quantification of alternative RNA splicing, FASTQ files were adapter-trimmed with cutadapt 3.4 and aligned to GRCh38 with STAR 2.7.6. The alignments were then analyzed with rMATS (v4.1.1) (Shen et al., 2014), using the GENCODE (v35) GTF annotation for GRCh38 and the following parameters: -t paired --libType fr-unstranded --readLength 150 --novelSS
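The description names cutadapt 3.4 for adapter trimming but gives neither the adapter sequences nor any other trimming options. The following is a minimal sketch of a paired-end trimming call under stated assumptions: the adapter prefix `AGATCGGAAGAGC` (a common Illumina TruSeq-style prefix), all file paths, and the thread count are placeholders that do not come from the source. The `run` helper is hypothetical; it only prints the command unless `DRY_RUN=0` is set.

```bash
# Placeholder inputs and outputs; replace with your own files.
RAW_R1="path/to/your/raw/sample_R1.fastq.gz"
RAW_R2="path/to/your/raw/sample_R2.fastq.gz"
TRIM_R1="path/to/your/trimmed/sample_R1.fastq.gz"
TRIM_R2="path/to/your/trimmed/sample_R2.fastq.gz"
NUM_THREADS=8

# ASSUMPTION: the source does not state the adapters. Check your library prep kit.
ADAPTER_R1="AGATCGGAAGAGC"
ADAPTER_R2="AGATCGGAAGAGC"

# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

trim_adapters() {
  run mkdir -p "$(dirname "${TRIM_R1}")"
  # -a / -A: 3' adapters for read 1 / read 2; -o / -p: trimmed outputs; -j: CPU cores
  run cutadapt \
    -a "${ADAPTER_R1}" -A "${ADAPTER_R2}" \
    -j "${NUM_THREADS}" \
    -o "${TRIM_R1}" -p "${TRIM_R2}" \
    "${RAW_R1}" "${RAW_R2}"
}

trim_adapters
```

The trimmed files written here would be the inputs to the alignment step.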
$ Bash example
# Install STAR if not already installed
# conda install -c bioconda star

# Define variables for input and output
FASTQ_R1="path/to/your/sample_R1.fastq.gz"  # Adapter-trimmed fastq file R1
FASTQ_R2="path/to/your/sample_R2.fastq.gz"  # Adapter-trimmed fastq file R2
OUTPUT_DIR="path/to/your/output_directory"
SAMPLE_PREFIX="sample_name"  # e.g., SRR1234567

# Reference datasets (GRCh38, GENCODE v35 GTF)
# Ensure the STAR genome index for GRCh38 (with GENCODE v35) is pre-built.
# Example command to build index (run once):
# STAR --runMode genomeGenerate --genomeDir "path/to/your/STAR_genome_index/GRCh38_gencode_v35" \
#   --genomeFastaFiles "path/to/your/references/GRCh38.primary_assembly.genome.fa" \
#   --sjdbGTFfile "path/to/your/annotations/gencode.v35.annotation.gtf" \
#   --sjdbOverhang 149 --runThreadN 16  # Adjust sjdbOverhang to read length - 1 (150-1=149)
GENOME_DIR="path/to/your/STAR_genome_index/GRCh38_gencode_v35"  # Path to pre-built STAR index
GTF_FILE="path/to/your/annotations/gencode.v35.annotation.gtf"  # GENCODE v35 GTF file
NUM_THREADS=8    # Adjust as needed
READ_LENGTH=150  # Inferred from rMATS parameter --readLength 150

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}/aligned_reads"

# Run STAR alignment
STAR \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${FASTQ_R1}" "${FASTQ_R2}" \
  --readFilesCommand zcat \
  --runThreadN "${NUM_THREADS}" \
  --outFileNamePrefix "${OUTPUT_DIR}/aligned_reads/${SAMPLE_PREFIX}." \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --quantMode GeneCounts \
  --sjdbGTFfile "${GTF_FILE}" \
  --sjdbOverhang $((READ_LENGTH - 1))  # Read length - 1 for splice junction database overhang
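The splicing analysis itself is run with rMATS on the sorted BAM files. The sketch below passes the four parameters quoted in the description (`-t paired --libType fr-unstranded --readLength 150 --novelSS`) and the GENCODE v35 GTF. Everything else is an assumption for illustration: the `rmats.py` entry point (as installed by the bioconda `rmats` package), the two-group layout via `--b1`/`--b2` (each a text file holding a comma-separated list of BAM paths), the `--od`, `--tmp`, and `--nthread` values, and all paths. The `run` helper is hypothetical and prints the command unless `DRY_RUN=0` is set.

```bash
# Placeholder paths; replace with your own files.
RMATS_CMD="rmats.py"  # ASSUMPTION: bioconda-style entry point
GTF_FILE="path/to/your/annotations/gencode.v35.annotation.gtf"
B1_LIST="path/to/your/rmats/sample1_bams.txt"  # comma-separated BAM paths, group 1
B2_LIST="path/to/your/rmats/sample2_bams.txt"  # comma-separated BAM paths, group 2
RMATS_OUT="path/to/your/rmats/output"
RMATS_TMP="path/to/your/rmats/tmp"
NUM_THREADS=8
READ_LENGTH=150  # from the description: --readLength 150

# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

rmats_splicing() {
  run mkdir -p "${RMATS_OUT}" "${RMATS_TMP}"
  run "${RMATS_CMD}" \
    --b1 "${B1_LIST}" \
    --b2 "${B2_LIST}" \
    --gtf "${GTF_FILE}" \
    -t paired \
    --libType fr-unstranded \
    --readLength "${READ_LENGTH}" \
    --novelSS \
    --nthread "${NUM_THREADS}" \
    --od "${RMATS_OUT}" \
    --tmp "${RMATS_TMP}"
}

rmats_splicing
```

Note that adapter trimming can leave reads shorter than 150 nt, and rMATS by default only uses reads matching `--readLength`. The description does not list `--variable-read-length`, so it is left out here; whether to add it is a choice to validate against your own data.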
Raw Source Text
For quantification of alternative RNA splicing, fastq files are adapter-trimmed by cutadapt 3.4 and aligned to GRCh38 by STAR 2.7.6 and analyzed using rMATS (v4.1.1) (Shen et al., 2014) using the GENCODE (v35) GTF annotation for GRCh38 and the following parameters: -t paired --libType fr-unstranded --readLength 150 --novelSS
Supplementary files format and content: hg38
Supplementary files format and content: tsv files are rMATS output files. For column definitions, see https://github.com/Xinglab/rmats-turbo
Supplementary files format and content: file naming is SAMPLE1_SAMPLE2_EVENTTYPE.MATS.JC.txt
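The naming scheme SAMPLE1_SAMPLE2_EVENTTYPE.MATS.JC.txt can be split back into its parts when organizing the supplementary files. The sketch below is one way to do that: it takes the last underscore-separated field as the event type and checks it against the five event types rMATS reports (SE, A5SS, A3SS, MXE, RI). Because sample names may themselves contain underscores, it returns the SAMPLE1_SAMPLE2 part as a single comparison label and does not try to split it. The function name, the directory path, and the sample names in the usage note are hypothetical.

```bash
# Print "<comparison><TAB><event type>" for an rMATS JC output file name.
# Returns non-zero if the name does not follow the expected pattern.
parse_rmats_name() {
  local base stem event comparison
  base="${1##*/}"                 # drop any leading directory
  case "${base}" in
    *_*.MATS.JC.txt) ;;           # needs an underscore and the expected suffix
    *) return 1 ;;
  esac
  stem="${base%.MATS.JC.txt}"     # e.g. SAMPLE1_SAMPLE2_EVENTTYPE
  event="${stem##*_}"             # text after the last underscore
  comparison="${stem%_*}"         # everything before the last underscore
  [ -n "${comparison}" ] || return 1
  case "${event}" in
    SE|A5SS|A3SS|MXE|RI) ;;       # event types produced by rMATS
    *) return 1 ;;
  esac
  printf '%s\t%s\n' "${comparison}" "${event}"
}

# Placeholder directory holding the downloaded supplementary files.
RMATS_DIR="path/to/your/rmats_output"
for f in "${RMATS_DIR}"/*.MATS.JC.txt; do
  [ -e "${f}" ] || continue       # skip when the glob matches nothing
  parse_rmats_name "${f}" || printf 'skipped: %s\n' "${f}" >&2
done
```

For a hypothetical file named `DMSO_CmpdA_SE.MATS.JC.txt`, the function prints the comparison label `DMSO_CmpdA` and the event type `SE`, separated by a tab.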