GSE220845 Processing Pipeline

RNA-Seq, 1 processing step

Publication

Proteomic discovery of chemical probes that perturb protein complexes in human cells.

Molecular Cell (2023), PMID 37084731

Dataset

GSE220845

Proteomic discovery of chemical probes that perturb protein complexes in human cells II

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


For quantification of alternative RNA splicing, FASTQ files are adapter-trimmed with cutadapt 3.4, aligned to GRCh38 with STAR 2.7.6, and analyzed with rMATS (v4.1.1) (Shen et al., 2014) using the GENCODE v35 GTF annotation for GRCh38 and the following parameters: -t paired --libType fr-unstranded --readLength 150 --novelSS
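The trimming step is reported only as "cutadapt 3.4"; the adapter sequences and any length filter are not given. The sketch below is a dry-run helper under stated assumptions: both reads are trimmed against the Illumina TruSeq adapter prefix AGATCGGAAGAGC, reads shorter than a hypothetical 20 nt are dropped, and the function name trim_pair and the output file names are illustrative only. It prints the cutadapt command unless RUN=1 is set.

```bash
#!/usr/bin/env bash
# Paired-end adapter trimming sketch (cutadapt 3.4 in the source).
# ASSUMPTION: adapters default to the Illumina TruSeq prefix; the source does not name them.
ADAPTER_FWD="${ADAPTER_FWD:-AGATCGGAAGAGC}"   # 3' adapter on read 1 (-a)
ADAPTER_REV="${ADAPTER_REV:-AGATCGGAAGAGC}"   # 3' adapter on read 2 (-A)
MIN_LEN="${MIN_LEN:-20}"                      # hypothetical minimum length after trimming

# trim_pair R1 R2 OUT_PREFIX [THREADS]
# Prints the cutadapt command (dry run); set RUN=1 to execute it instead.
trim_pair() {
  local r1="$1" r2="$2" prefix="$3" threads="${4:-8}"
  local -a cmd
  # -o/-p name the trimmed R1/R2 outputs; -m discards pairs where either read is too short
  cmd=(cutadapt -a "${ADAPTER_FWD}" -A "${ADAPTER_REV}"
       -m "${MIN_LEN}" -j "${threads}"
       -o "${prefix}_R1.trimmed.fastq.gz" -p "${prefix}_R2.trimmed.fastq.gz"
       "${r1}" "${r2}")
  if [[ "${RUN:-0}" == 1 ]]; then
    "${cmd[@]}"
  else
    printf '%s\n' "${cmd[*]}"
  fi
}
```

For example, trim_pair sample_R1.fastq.gz sample_R2.fastq.gz trimmed/sample 8 prints the command; its two *.trimmed.fastq.gz outputs would then be the FASTQ_R1 and FASTQ_R2 inputs of the STAR example.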

STAR v2.7

$ Bash example

# Install STAR if not already installed
# conda install -c bioconda star

# Define variables for input and output
FASTQ_R1="path/to/your/sample_R1.fastq.gz" # Adapter-trimmed fastq file R1
FASTQ_R2="path/to/your/sample_R2.fastq.gz" # Adapter-trimmed fastq file R2
OUTPUT_DIR="path/to/your/output_directory"
SAMPLE_PREFIX="sample_name" # e.g., SRR1234567

# Reference datasets (GRCh38, GENCODE v35 GTF)
# Ensure the STAR genome index for GRCh38 (with GENCODE v35) is pre-built.
# Example command to build index (run once):
# STAR --runMode genomeGenerate --genomeDir "path/to/your/STAR_genome_index/GRCh38_gencode_v35" \
#      --genomeFastaFiles "path/to/your/references/GRCh38.primary_assembly.genome.fa" \
#      --sjdbGTFfile "path/to/your/annotations/gencode.v35.annotation.gtf" \
#      --sjdbOverhang 149 --runThreadN 16 # Adjust sjdbOverhang to read length - 1 (150-1=149)

GENOME_DIR="path/to/your/STAR_genome_index/GRCh38_gencode_v35" # Path to pre-built STAR index
GTF_FILE="path/to/your/annotations/gencode.v35.annotation.gtf" # GENCODE v35 GTF file
NUM_THREADS=8 # Adjust as needed
READ_LENGTH=150 # Inferred from rMATS parameter --readLength 150

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}/aligned_reads"

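# Run STAR alignment
# ASSUMPTION (not stated in the source): --alignEndsType EndToEnd disables soft-clipping,
# assumed to mirror the STAR settings rMATS uses when it aligns FASTQ input itself,
# since rMATS may discard soft-clipped alignments by default.
# Re-passing the GTF is optional when the index was built with it; STAR then inserts the
# annotated junctions on the fly, and --sjdbOverhang should match the index (149).
STAR \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${FASTQ_R1}" "${FASTQ_R2}" \
  --readFilesCommand zcat \
  --runThreadN "${NUM_THREADS}" \
  --outFileNamePrefix "${OUTPUT_DIR}/aligned_reads/${SAMPLE_PREFIX}." \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --alignEndsType EndToEnd \
  --quantMode GeneCounts \
  --sjdbGTFfile "${GTF_FILE}" \
  --sjdbOverhang $((READ_LENGTH - 1)) # Read length - 1 for splice junction database overhang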
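The STAR example stops at alignment; the rMATS step can be sketched as follows. The flags -t paired --libType fr-unstranded --readLength 150 --novelSS come from the source. Everything else is an assumption: rMATS-turbo v4.1.1 is on PATH as rmats.py, each comparison group's STAR BAMs (named like sample.Aligned.sortedByCoord.out.bam) are listed comma-separated on one line of a text file passed via --b1/--b2, and write_bam_list, rmats_cmd, the thread count and the output/temp directories are hypothetical placeholders. rmats_cmd prints the command unless RUN=1 is set.

```bash
#!/usr/bin/env bash
# rMATS sketch; only -t/--libType/--readLength/--novelSS are taken from the source.

# write_bam_list OUT_TXT BAM...
# Writes the BAM paths as a single comma-separated line (the --b1/--b2 file format).
write_bam_list() {
  local out="$1"; shift
  local IFS=,
  printf '%s\n' "$*" > "${out}"
}

# rmats_cmd B1_TXT B2_TXT GTF OUT_DIR TMP_DIR [THREADS]
# Prints the rMATS command (dry run); set RUN=1 to execute it instead.
rmats_cmd() {
  local b1="$1" b2="$2" gtf="$3" od="$4" tmp="$5" threads="${6:-8}"
  local -a cmd
  cmd=(rmats.py --b1 "${b1}" --b2 "${b2}" --gtf "${gtf}"
       -t paired --libType fr-unstranded --readLength 150 --novelSS
       --nthread "${threads}" --od "${od}" --tmp "${tmp}")
  if [[ "${RUN:-0}" == 1 ]]; then
    "${cmd[@]}"
  else
    printf '%s\n' "${cmd[*]}"
  fi
}
```

Because --readLength 150 is fixed, rMATS expects reads of that length; trimmed reads that come out shorter are one reason to check the rMATS-turbo documentation on variable read lengths before running this.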


Tools Used

STAR

Raw Source Text

For quantification of alternative RNA splicing, fastq files are adapter-trimmed by cutadapt 3.4 and aligned to GRCh38 by STAR 2.7.6 and analyzed using rMATS (v4.1.1) (Shen et al., 2014) using the GENCODE (v35) GTF annotation for GRCh38 and the following parameters: -t paired --libType fr-unstranded --readLength 150 --novelSS
Supplementary files format and content: hg38
Supplementary files format and content: the tsv files are rMATS output files. For column definitions, see https://github.com/Xinglab/rmats-turbo
Supplementary files format and content: file naming is SAMPLE1_SAMPLE2_EVENTTYPE.MATS.JC.txt
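The supplementary tables follow rMATS's junction-count (JC) output, one file per sample pair and event type (rMATS reports SE, A5SS, A3SS, MXE and RI events). As a sketch, assuming sample names contain no underscores, the hypothetical helpers parse_name and filter_sig split a file name into its parts and keep events passing FDR and |IncLevelDifference| cut-offs. The 0.05 and 0.1 defaults are illustrative, not from the source; columns are looked up by header name because their positions differ between event types.

```bash
#!/usr/bin/env bash
# Helpers for SAMPLE1_SAMPLE2_EVENTTYPE.MATS.JC.txt tables.
# ASSUMPTION: sample names contain no underscores, so the name splits unambiguously.

# parse_name FILE -> prints "SAMPLE1<TAB>SAMPLE2<TAB>EVENTTYPE"
parse_name() {
  local base
  base="$(basename "$1" .MATS.JC.txt)"
  local event="${base##*_}"   # text after the last underscore, e.g. SE
  local pair="${base%_*}"     # SAMPLE1_SAMPLE2
  printf '%s\t%s\t%s\n' "${pair%%_*}" "${pair#*_}" "${event}"
}

# filter_sig FILE [MAX_FDR] [MIN_ABS_DPSI]
# Prints the header plus rows with FDR <= MAX_FDR and |IncLevelDifference| >= MIN_ABS_DPSI.
# Default cut-offs (0.05, 0.1) are hypothetical.
filter_sig() {
  awk -F'\t' -v maxfdr="${2:-0.05}" -v mindpsi="${3:-0.1}" '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i   # map column name -> position
      print
      next
    }
    {
      fdr = $(col["FDR"]) + 0
      d = $(col["IncLevelDifference"]) + 0
      if (d < 0) d = -d                       # absolute delta PSI
      if (fdr <= maxfdr && d >= mindpsi) print
    }' "$1"
}
```

For example, filter_sig ctrl_treat_SE.MATS.JC.txt 0.01 0.2 applies stricter cut-offs to one skipped-exon table.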
