GSE220845 Processing Pipeline

RNA-Seq, 1 step

Publication

Proteomic discovery of chemical probes that perturb protein complexes in human cells.

Molecular Cell (2023), PMID 37084731

Dataset

GSE220845

Proteomic discovery of chemical probes that perturb protein complexes in human cells II

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. For quantification of alternative RNA splicing, FASTQ files are adapter-trimmed with cutadapt 3.4, aligned to GRCh38 with STAR 2.7.6, and analyzed with rMATS (v4.1.1) (Shen et al., 2014) using the GENCODE (v35) GTF annotation for GRCh38 and the following parameters: -t paired --libType fr-unstranded --readLength 150 --novelSS

    $ Bash example
    # Install STAR if not already installed
    # conda install -c bioconda star
    
    # Define variables for input and output
    FASTQ_R1="path/to/your/sample_R1.fastq.gz" # Adapter-trimmed fastq file R1
    FASTQ_R2="path/to/your/sample_R2.fastq.gz" # Adapter-trimmed fastq file R2
    OUTPUT_DIR="path/to/your/output_directory"
    SAMPLE_PREFIX="sample_name" # e.g., SRR1234567
    
    # Reference datasets (GRCh38, GENCODE v35 GTF)
    # Ensure the STAR genome index for GRCh38 (with GENCODE v35) is pre-built.
    # Example command to build index (run once):
    # STAR --runMode genomeGenerate --genomeDir "path/to/your/STAR_genome_index/GRCh38_gencode_v35" \
    #      --genomeFastaFiles "path/to/your/references/GRCh38.primary_assembly.genome.fa" \
    #      --sjdbGTFfile "path/to/your/annotations/gencode.v35.annotation.gtf" \
    #      --sjdbOverhang 149 --runThreadN 16 # Adjust sjdbOverhang to read length - 1 (150-1=149)
    
    GENOME_DIR="path/to/your/STAR_genome_index/GRCh38_gencode_v35" # Path to pre-built STAR index
    GTF_FILE="path/to/your/annotations/gencode.v35.annotation.gtf" # GENCODE v35 GTF file
    NUM_THREADS=8 # Adjust as needed
    READ_LENGTH=150 # Inferred from rMATS parameter --readLength 150
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}/aligned_reads"
    
    # Run STAR alignment
    STAR \
      --genomeDir "${GENOME_DIR}" \
      --readFilesIn "${FASTQ_R1}" "${FASTQ_R2}" \
      --readFilesCommand zcat \
      --runThreadN "${NUM_THREADS}" \
      --outFileNamePrefix "${OUTPUT_DIR}/aligned_reads/${SAMPLE_PREFIX}." \
      --outSAMtype BAM SortedByCoordinate \
      --outSAMunmapped Within \
      --quantMode GeneCounts \
      --sjdbGTFfile "${GTF_FILE}" \
      --sjdbOverhang $((READ_LENGTH - 1)) # Read length - 1 for splice junction database overhang
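
The example above covers only the STAR alignment. The rMATS call that follows it in the step description could be sketched as below. Only `-t paired --libType fr-unstranded --readLength 150 --novelSS` and the GENCODE v35 GTF come from the source text. The BAM paths, the two-group layout, the output directory, the `RUN_RMATS` switch, and the `rmats.py` entry-point name (as exposed by a conda install of rmats-turbo) are assumptions for illustration. The sketch prints the command unless `RUN_RMATS=1` is set.

```shell
set -eu

# Hypothetical locations; replace with your own.
GTF_FILE="${GTF_FILE:-path/to/your/annotations/gencode.v35.annotation.gtf}"
RMATS_OUT="${RMATS_OUT:-rmats_output}"
NUM_THREADS="${NUM_THREADS:-8}"
B1_LIST="${RMATS_OUT}/b1.txt"   # BAMs for group 1
B2_LIST="${RMATS_OUT}/b2.txt"   # BAMs for group 2

mkdir -p "${RMATS_OUT}/tmp"

# rMATS reads each group from a text file holding one line of
# comma-separated BAM paths.
join_by_comma() ( IFS=,; printf '%s\n' "$*" )

# Placeholder BAM names (assumed); use the STAR outputs for each condition.
join_by_comma path/to/condition1_rep1.bam path/to/condition1_rep2.bam > "${B1_LIST}"
join_by_comma path/to/condition2_rep1.bam path/to/condition2_rep2.bam > "${B2_LIST}"

# Print the command by default; execute it only when RUN_RMATS=1.
run_or_print() {
  if [ "${RUN_RMATS:-0}" = "1" ]; then
    "$@"
  else
    printf '%s ' "$@"
    printf '\n'
  fi
}

run_or_print rmats.py \
  --b1 "${B1_LIST}" --b2 "${B2_LIST}" \
  --gtf "${GTF_FILE}" \
  -t paired --libType fr-unstranded --readLength 150 --novelSS \
  --od "${RMATS_OUT}" --tmp "${RMATS_OUT}/tmp" \
  --nthread "${NUM_THREADS}"
```

Printing first makes it easy to check the group files and flags before committing to a long run. How samples were actually grouped for each comparison is not stated in the source, so the two-condition layout here is only a guess.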
    

Tools Used

Raw Source Text
For quantification of alternative RNA splicing, fastq files are adapter-trimmed by cutadapt 3.4 and aligned to GRCh38 by STAR 2.7 .6 and analyzed using rMATS (v4.1.1) (Shen et al., 2014) using the GENCODE (v35) GTF annotation for GRCh38 and the following parameters: -t paired --libType fr-unstranded –readLength 150 --novelSS
Supplementary files format and content: hg38
Supplementary files format and content: tsv file represents rMATs output files . For column definitions, see https://github.com/Xinglab/rmats-turbo
Supplementary files format and content: file naming is SAMPLE1_SAMPLE2_EVENTTYPE.MATS.JC.txt
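
As a rough illustration of the naming scheme above, the following sketch splits a supplementary file name into its sample pair and event type. The function name `parse_rmats_name` and the example file name are hypothetical. The event-type list is the five classes rMATS reports (SE, A5SS, A3SS, MXE, RI). The sample pair is kept as one string, because sample names may themselves contain underscores and the source does not say how to split them.

```shell
# Split SAMPLE1_SAMPLE2_EVENTTYPE.MATS.JC.txt into "<sample pair><TAB><event type>".
parse_rmats_name() {
  local base stem event pair
  base="$(basename "$1")"
  stem="${base%.MATS.JC.txt}"        # drop the fixed suffix
  if [ "$stem" = "$base" ]; then     # suffix was not present
    echo "not an rMATS JC file: $base" >&2
    return 1
  fi
  event="${stem##*_}"                # text after the last underscore
  pair="${stem%_*}"                  # everything before it: SAMPLE1_SAMPLE2
  case "$event" in
    SE|A5SS|A3SS|MXE|RI) ;;
    *) echo "unknown event type: $event" >&2; return 1 ;;
  esac
  printf '%s\t%s\n' "$pair" "$event"
}

# Hypothetical file name, for illustration only.
parse_rmats_name "DMSO_CompoundX_SE.MATS.JC.txt"
```

Parsing from the right end keeps the event type unambiguous even when the sample names contain underscores.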