GSE185373 Processing Pipeline
RNA-Seq
Publication
Proteomic discovery of chemical probes that perturb protein complexes in human cells. Molecular Cell (2023), PMID 37084731
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
FASTQ files were first trimmed using Trim Galore (v0.6.4) to remove sequencing adapters and trim low-quality (Q<15) bases.
$ Bash example
# Install Trim Galore (and its dependencies, Cutadapt and FastQC)
# conda install -c bioconda trim-galore

# Define input and output paths
INPUT_FASTQ="input.fastq.gz"   # Placeholder for your input FASTQ file
OUTPUT_DIR="trimmed_fastq"

# Create the output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Run Trim Galore to remove sequencing adapters and trim low-quality (Q<15) bases
# --quality 15: trims low-quality bases from the 3' end of each read (Phred cutoff 15),
#   using Cutadapt's BWA-style quality-trimming algorithm
# --output_dir: directory for the output files
# Trim Galore auto-detects common adapters (Illumina, Nextera, small RNA) from the
# first reads; reads shorter than 20 bp after trimming are discarded by default.
trim_galore --quality 15 --output_dir "${OUTPUT_DIR}" "${INPUT_FASTQ}"

# For paired-end data, use instead:
# trim_galore --paired --quality 15 --output_dir "${OUTPUT_DIR}" input_R1.fastq.gz input_R2.fastq.gz
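To see what `--quality 15` does to an individual read, the sketch below re-implements in plain awk the BWA-style 3' trimming rule that Cutadapt (which Trim Galore calls) documents: walking in from the 3' end, it sums `cutoff - quality` per base, stops once the running sum turns negative, and cuts where the sum peaked. It is an illustration only, assuming Phred+33 quality encoding; the function name `quality_trim_length` is hypothetical and this is not Cutadapt's own code.

```bash
# Hypothetical helper (not part of Trim Galore or Cutadapt):
# quality_trim_length QUALITY_STRING [CUTOFF]
# Prints how many bases remain after BWA-style 3-prime quality trimming.
# Assumes Phred+33 encoded quality characters.
quality_trim_length() {
  printf '%s\n' "$1" | awk -v cutoff="${2:-15}" '
    BEGIN {
      # awk has no ord(), so build a character -> ASCII code lookup table
      for (c = 33; c <= 126; c++) ord[sprintf("%c", c)] = c
    }
    {
      n = length($0)
      s = 0; max_s = 0; keep = n
      # Walk from the 3-prime end, summing (cutoff - Phred score)
      for (i = n; i >= 1; i--) {
        s += cutoff - (ord[substr($0, i, 1)] - 33)
        if (s < 0) break                           # high-quality stretch reached
        if (s > max_s) { max_s = s; keep = i - 1 }  # best cut so far: keep bases 1..i-1
      }
      print keep
    }'
}

# Example: five Q40 bases ("I") followed by five Q2 bases ("#")
quality_trim_length 'IIIII#####' 15   # prints 5
```

Note that an isolated low-quality base inside a read is kept if good bases follow it: for `II#I##` the sketch keeps 4 bases, because the single Q40 base does not outweigh the low-quality tail before the sum peaks.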
2
Trimmed sequencing reads were aligned to the human Hg19 reference genome (GENCODE, GRCh37.p13) using STAR (v2.7.5).
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star=2.7.5

# The STAR genome index for human hg19 (GRCh37.p13) with GENCODE annotation is
# assumed to be pre-built, e.g.:
# STAR --runThreadN <threads> --runMode genomeGenerate \
#      --genomeDir /path/to/STAR_index_hg19_gencode \
#      --genomeFastaFiles /path/to/GRCh37.p13.genome.fa \
#      --sjdbGTFfile /path/to/gencode.v19.annotation.gtf \
#      --sjdbOverhang 100   # ideally max read length - 1
GENOME_DIR="/path/to/STAR_index_hg19_gencode_GRCh37.p13"
READS_FILE="trimmed_reads.fastq.gz"   # Placeholder for the trimmed reads
OUTPUT_PREFIX="aligned_reads"
THREADS=8

# Align reads with STAR. Output is unsorted SAM (STAR's default), written to
# ${OUTPUT_PREFIX}_Aligned.out.sam, which samtools converts, sorts, and indexes next.
STAR --runThreadN "${THREADS}" \
     --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READS_FILE}" \
     --readFilesCommand zcat \
     --outFileNamePrefix "${OUTPUT_PREFIX}_" \
     --outSAMtype SAM \
     --outSAMunmapped Within \
     --outSAMattributes All
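The index-building comment sets `--sjdbOverhang`, which the STAR manual recommends setting to the maximum read length minus 1. A small sketch for estimating that value from a FASTQ file, assuming standard four-line FASTQ records; `sjdb_overhang` is a hypothetical helper name, not a STAR command.

```bash
# Hypothetical helper: print the recommended --sjdbOverhang (max read length - 1)
# for a plain or gzip-compressed FASTQ file with standard 4-line records.
sjdb_overhang() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;
    *)    cat -- "$1" ;;
  esac | awk '
    NR % 4 == 2 { if (length($0) > max) max = length($0) }  # sequence lines only
    END { print max - 1 }'
}

# Example (placeholder file name):
# sjdb_overhang trimmed_reads.fastq.gz
```

Because trimming shortens reads, running this on the untrimmed FASTQ gives the sequencing read length; for reads of varying length the manual notes that the maximum length is the ideal basis.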
3
SAM files were subsequently converted to BAM files, sorted, and indexed using samtools (v1.9).
$ Bash example
# Install samtools (if not already installed)
# conda install -c bioconda samtools=1.9

INPUT_SAM="input.sam"                  # Placeholder for the input SAM file (e.g. STAR output)
OUTPUT_BAM_SORTED="output_sorted.bam"  # Placeholder for the sorted BAM output

# Convert SAM to BAM and sort by coordinate in one step:
# samtools sort accepts SAM input and writes BAM when the -o file name ends in .bam
samtools sort -o "${OUTPUT_BAM_SORTED}" "${INPUT_SAM}"

# Index the sorted BAM file (creates ${OUTPUT_BAM_SORTED}.bai)
samtools index "${OUTPUT_BAM_SORTED}"
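To make "sorted" concrete: in coordinate mode, samtools sort orders alignments by reference sequence in the order of the `@SQ` header lines, then by leftmost position (POS), and places unmapped reads that have no reference at the end. The awk sketch below mimics that ordering on a small SAM text file. `coord_sort_sam` is a hypothetical name; ties are simply kept in input order, and the sketch neither writes BAM nor updates the `@HD` `SO` tag, so it is no substitute for samtools on real data.

```bash
# Hypothetical helper: coord_sort_sam FILE.sam
# Toy illustration of coordinate order; prints SAM text, not BAM.
coord_sort_sam() {
  awk -F '\t' -v OFS='\t' '
    /^@/ {
      # Header lines: remember @SQ order, keep headers first in input order
      if ($1 == "@SQ")
        for (i = 2; i <= NF; i++)
          if (substr($i, 1, 3) == "SN:") rank[substr($i, 4)] = ++n
      print 0, 0, NR, $0
      next
    }
    {
      # Alignments: key on reference rank, then POS; RNAME "*" sorts last
      r = ($3 in rank) ? rank[$3] : n + 1
      print r, $4, NR, $0
    }' "$1" |
    sort -t "$(printf '\t')" -k1,1n -k2,2n -k3,3n |
    cut -f 4-
}
```

The three numeric sort keys (reference rank, position, original line number) are prepended and stripped again with `cut`, which keeps the original tab-separated SAM fields intact.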
4
BAM files were used to generate bigWig files using bamCoverage (part of the deepTools package, v3.3.1).
$ Bash example
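# Install deepTools (if not already installed)
# conda install -c bioconda deeptools=3.3.1

# Generate a bigWig coverage track from a sorted, indexed BAM file.
# Replace 'input.bam' and 'output.bw' with your own paths.
# --binSize 10 is an example value (the source does not state the bin size; the default is 50).
# No normalization is applied here; the source does not state whether one was used.
# --numberOfProcessors accepts an integer, "max", or "max/2".
bamCoverage -b input.bam -o output.bw --binSize 10 --numberOfProcessors max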
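bamCoverage splits the genome into consecutive bins of `--binSize` bases and, without normalization, reports roughly the number of reads overlapping each bin. A simplified sketch of that counting idea on BED-style read intervals (0-based start, exclusive end), assuming unspliced reads and no read extension, filtering, or normalization, all of which deepTools handles and this does not. `bin_coverage` is a hypothetical name, and the output is bedGraph text rather than bigWig.

```bash
# Hypothetical helper: bin_coverage INTERVALS.bed [BIN_SIZE]
# INTERVALS.bed: tab-separated chrom, 0-based start, exclusive end (one read per line)
# Prints bedGraph-style lines: chrom, bin start, bin end, reads overlapping the bin
bin_coverage() {
  awk -F '\t' -v OFS='\t' -v bs="${2:-50}" '
    {
      first = int($2 / bs)          # bin holding the first base
      last = int(($3 - 1) / bs)     # bin holding the last base
      for (b = first; b <= last; b++) count[$1 OFS b]++
    }
    END {
      for (k in count) {
        split(k, p, OFS)
        print p[1], p[2] * bs, (p[2] + 1) * bs, count[k]
      }
    }' "$1" | sort -k1,1 -k2,2n
}
```

A read spanning a bin boundary is counted in every bin it touches; empty bins are omitted, and the last bin is not clipped to the chromosome length as a real coverage tool would do.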
Tools Used
Trim Galore (v0.6.4), STAR (v2.7.5), samtools (v1.9), deepTools bamCoverage (v3.3.1)
Raw Source Text
FASTQ files were first trimmed using Trim_galore (v0.6.4) to remove sequencing adapters and low quality (Q<15) reads. Trimmed sequencing reads were aligned to the human Hg19 reference genome (GENCODE, GRCh37.p13) using STAR (v2.7.5). SAM files were subsequently converted to BAM files, sorted, and indexed using samtools (v1.9). BAM files were used to generate bigwig files using bamCoverage (part of the Deeptools package; v3.3.1).
Genome_build: HG19
Supplementary_files_format_and_content: bigWig