GSE185373 Processing Pipeline

RNA-Seq, 4 steps, with code examples

Publication

Proteomic discovery of chemical probes that perturb protein complexes in human cells.

Molecular Cell (2023), PMID 37084731

Dataset

GSE185373

Function-first proteomic strategies for chemical probe discovery in human cells

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    FASTQ files were first trimmed using Trim_galore (v0.6.4) to remove sequencing adapters and low quality (Q<15) reads.

    $ Bash example
    # Install Trim Galore (and its dependencies like Cutadapt and FastQC)
    # conda install -c bioconda trim-galore
    
    # Define input and output paths
    INPUT_FASTQ="input.fastq.gz" # Placeholder for your input FASTQ file
    OUTPUT_DIR="trimmed_fastq"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Run Trim Galore to remove sequencing adapters and trim low-quality (Phred < 15) bases
    # --quality 15: trims low-quality bases (Phred score below 15) from the 3' end of each read;
    #               reads that become too short after trimming are discarded (--length, default 20 bp)
    # --output_dir: Specifies the directory for output files
    # Trim Galore automatically detects and removes common sequencing adapters.
    # For paired-end data, use: trim_galore --paired --quality 15 --output_dir "${OUTPUT_DIR}" input_R1.fastq.gz input_R2.fastq.gz
    trim_galore --quality 15 --output_dir "${OUTPUT_DIR}" "${INPUT_FASTQ}"
  2.

    Trimmed sequencing reads were aligned to the human Hg19 reference genome (GENCODE, GRCh37.p13) using STAR (v2.7.5).

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.7.5
    
    # Define variables
    STAR_VERSION="2.7.5"
    # Placeholder for the STAR genome index for human Hg19 (GRCh37.p13) with GENCODE annotations.
    # This index would typically be pre-built using a command like:
    # STAR --runThreadN <threads> --runMode genomeGenerate \
    #      --genomeDir /path/to/STAR_index_hg19_gencode \
    #      --genomeFastaFiles /path/to/GRCh37.p13.genome.fa \
    #      --sjdbGTFfile /path/to/gencode.v19.annotation.gtf \
    #      --sjdbOverhang 100 # Adjust based on read length
    GENOME_DIR="/path/to/STAR_index_hg19_gencode_GRCh37.p13"
    READS_FILE="trimmed_fastq/input_trimmed.fq.gz" # Trim Galore output from step 1 (input.fastq.gz -> input_trimmed.fq.gz)
    OUTPUT_PREFIX="aligned_reads"
    THREADS=8 # A common default for --runThreadN
    
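    # Align reads using STAR
    # Output is written as SAM (${OUTPUT_PREFIX}_Aligned.out.sam) because the source text
    # describes a separate SAM-to-BAM conversion with samtools in the next step.
    STAR --runThreadN "${THREADS}" \
         --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READS_FILE}" \
         --readFilesCommand zcat \
         --outFileNamePrefix "${OUTPUT_PREFIX}_" \
         --outSAMtype SAM \
         --outSAMunmapped Within \
         --outSAMattributes All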
  3.

    SAM files were subsequently converted to BAM files, sorted, and indexed using samtools (v1.9).

    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools=1.9
    
    # Placeholder for input SAM file
    INPUT_SAM="input.sam"
    # Placeholder for output sorted BAM file
    OUTPUT_BAM_SORTED="output_sorted.bam"
    
    # Convert SAM to BAM and sort by coordinate in one step
    # samtools sort accepts SAM input and writes BAM when the -o file name ends in .bam
    samtools sort -o "${OUTPUT_BAM_SORTED}" "${INPUT_SAM}"
    
    # Index the sorted BAM file
    samtools index "${OUTPUT_BAM_SORTED}"
  4.

    BAM files were used to generate bigwig files using bamCoverage (part of the Deeptools package; v3.3.1).

    $ Bash example
    # Install deepTools (if not already installed)
    # conda install -c bioconda deeptools=3.3.1
    
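    # Example usage: generate a bigWig file from a coordinate-sorted, indexed BAM file
    # Replace 'input.bam' with your actual BAM file path
    # Replace 'output.bw' with your desired output bigWig file path
    # --binSize 10 is an illustrative choice; the source text does not state a bin size
    # --numberOfProcessors accepts an integer, "max", or "max/2"
    bamCoverage -b input.bam -o output.bw --binSize 10 --numberOfProcessors max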
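
The four steps above can be chained so that each step's output file name feeds the next. The following is a minimal sketch, not the authors' script. It assumes single-end reads. The helper names (`run`, `sample_name`, `process_sample`), the directory layout (`trimmed_fastq/`, `aligned/`, `bigwig/`), the thread count, and the bin size are all hypothetical choices that do not come from the source text. STAR is asked for SAM output here because the source describes a separate SAM-to-BAM conversion with samtools. By default the script only prints the commands (`DRY_RUN=1`), so the file naming can be checked before anything is executed.

```shell
#!/usr/bin/env bash
# Sketch: chain trimming, alignment, sorting/indexing, and bigWig generation
# for single-end samples. All paths and helper names are illustrative.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = print and execute them
GENOME_DIR="${GENOME_DIR:-/path/to/STAR_index_hg19_gencode_GRCh37.p13}"  # placeholder path
THREADS="${THREADS:-8}"   # assumed value, not stated in the source

# Print a command, then execute it unless DRY_RUN=1.
run() {
  printf '%s\n' "$*"
  if [ "${DRY_RUN}" != "1" ]; then
    "$@"
  fi
}

# Strip the directory and FASTQ extension: /data/s1.fastq.gz -> s1
sample_name() {
  local base
  base="$(basename "$1")"
  base="${base%.gz}"
  base="${base%.fastq}"
  base="${base%.fq}"
  printf '%s\n' "${base}"
}

# Run (or print) all four steps for one FASTQ file.
process_sample() {
  local fastq="$1"
  local sample
  sample="$(sample_name "${fastq}")"
  local trimmed="trimmed_fastq/${sample}_trimmed.fq.gz"  # Trim Galore single-end output name
  local sam="aligned/${sample}_Aligned.out.sam"          # STAR SAM output for this prefix
  local bam="aligned/${sample}.sorted.bam"
  local bigwig="bigwig/${sample}.bw"

  run mkdir -p trimmed_fastq aligned bigwig
  run trim_galore --quality 15 --output_dir trimmed_fastq "${fastq}"
  run STAR --runThreadN "${THREADS}" --genomeDir "${GENOME_DIR}" \
      --readFilesIn "${trimmed}" --readFilesCommand zcat \
      --outFileNamePrefix "aligned/${sample}_" --outSAMtype SAM
  run samtools sort -o "${bam}" "${sam}"
  run samtools index "${bam}"
  run bamCoverage -b "${bam}" -o "${bigwig}" --binSize 10
}

# Process every FASTQ file given on the command line (does nothing without arguments).
if [ "$#" -gt 0 ]; then
  for fastq in "$@"; do
    process_sample "${fastq}"
  done
fi
```

Saved under a hypothetical name such as `run_pipeline.sh`, the script can be previewed with `bash run_pipeline.sh sample1.fastq.gz` and executed with `DRY_RUN=0 bash run_pipeline.sh sample1.fastq.gz` once the printed commands and paths look right.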

Tools Used

Trim Galore (v0.6.4), STAR (v2.7.5), samtools (v1.9), deepTools bamCoverage (v3.3.1)

Raw Source Text
FASTQ files were first trimmed using Trim_galore (v0.6.4) to remove sequencing adapters and low quality (Q<15) reads.
Trimmed sequencing reads were aligned to the human Hg19 reference genome (GENCODE, GRCh37.p13) using STAR (v2.7.5).
SAM files were subsequently converted to BAM files, sorted, and indexed using samtools (v1.9).
BAM files were used to generate bigwig files using bamCoverage (part of the Deeptools package; v3.3.1).
Genome_build: HG19
Supplementary_files_format_and_content: bigWig