GSE215251 Processing Pipeline

Publication

Transcriptome regulation by PARP13 in basal and antiviral states in human cells.

iScience (2024) — PMID 38495826

Dataset

Transcriptome Regulation by PARP13 in Basal and Antiviral States in Human Cells

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Reads were first trimmed of adapters and low-complexity sequences with cutadapt 1.14 (-O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT)

cutadapt v1.14

$ Bash example

# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=1.14

# Define input and output files (placeholders)
INPUT_FASTQ="input.fastq"
OUTPUT_FASTQ="output.fastq"

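# Trim adapters and low-complexity (poly-A/poly-T) sequences.
# Flags are as reported in the source for cutadapt 1.14; newer cutadapt releases may not accept all of them (for example -f).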
cutadapt -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 \
-b TCGTATGCCGTCTTCTGCTTG \
-b ATCTCGTATGCCGTCTTCTGCTTG \
-b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
-b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
-b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
-b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
-o "${OUTPUT_FASTQ}" "${INPUT_FASTQ}"
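
Because the command above sets -m 18, no read shorter than 18 nt should survive trimming. A minimal sketch of a sanity check for that follows. The helper name min_read_length is hypothetical, and it assumes an uncompressed FASTQ file with standard 4-line records.

```shell
# Hypothetical helper: print the shortest read length found in a FASTQ file.
# Assumes standard 4-line FASTQ records (the sequence is line 2 of every 4).
min_read_length() {
  awk 'NR % 4 == 2 {
         len = length($0)
         if (min == "" || len < min) min = len
       }
       END { if (min != "") print min }' "$1"
}

# Example usage (file name is a placeholder):
# min_read_length output.fastq   # should be at least 18 after trimming with -m 18
```

If the reported value is below 18, the file is probably not the cutadapt output or was produced with different settings.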

Trimmed reads were then sorted with fastq-tools (fastq-sort)

fastq-tools (version not specified)

$ Bash example

# Install fastq-tools (example using conda, adjust as needed)
# conda install -c bioconda fastq-tools

# Sort trimmed reads
# Assuming 'trimmed_reads.fastq' is the input file
fastq-sort trimmed_reads.fastq > sorted_reads.fastq
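
Sorting should reorder reads without adding or dropping any, so comparing record counts before and after is a cheap check. The sketch below is not part of the published pipeline. The helper name count_fastq_reads is hypothetical, and it assumes uncompressed 4-line FASTQ records.

```shell
# Hypothetical helper: count records in an uncompressed 4-line FASTQ file.
count_fastq_reads() {
  awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Example usage (file names are the placeholders used above):
# before=$(count_fastq_reads trimmed_reads.fastq)
# after=$(count_fastq_reads sorted_reads.fastq)
# [ "$before" -eq "$after" ] || echo "read count changed during sorting" >&2
```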

Trimmed reads were mapped against RepBase with STAR v2.4.0j to remove reads mapping to repetitive sequences (--outFilterMultimapNmax 10 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)

STAR v2.4.0j

$ Bash example

# Install STAR if not already installed
# conda install -c bioconda star

# Placeholder for STAR index creation for RepBase (if not already done)
# Replace repbase.fasta with the actual RepBase FASTA file and adjust threads.
# STAR --runMode genomeGenerate --genomeDir repbase_star_index --genomeFastaFiles repbase.fasta --runThreadN <num_threads>

# Map trimmed, sorted reads against RepBase to identify and remove repetitive sequences
# Input: sorted_reads.fastq (uncompressed FASTQ; for gzipped input, add --readFilesCommand zcat)
# Output: repbase_filtered_Unmapped.out.mate1 (and mate2 if paired-end) containing reads that did NOT map to RepBase
# Output: repbase_filtered_Aligned.out.bam containing reads that DID map to RepBase
STAR \
  --genomeDir repbase_star_index \
  --readFilesIn sorted_reads.fastq \
  --outFileNamePrefix repbase_filtered_ \
  --outFilterMultimapNmax 10 \
  --alignEndsType EndToEnd \
  --outFilterMultimapScoreRange 1 \
  --outSAMmode Full \
  --outFilterType BySJout \
  --outSAMtype BAM Unsorted \
  --outFilterScoreMin 10 \
  --outReadsUnmapped Fastx \
  --outSAMattributes All \
  --runThreadN 8 # Example: use 8 threads, adjust as needed

Remaining reads were mapped to the appropriate genome build (hg19) using STAR aligner (--outFilterMultimapNmax 10 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)

STAR v2.7.10a (version inferred; the source does not state a STAR version for this step)

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star=2.7.10a

# Placeholder variables
STAR_INDEX_DIR="/path/to/STAR_index/hg19" # Replace with actual path to hg19 STAR index
INPUT_READS="repbase_filtered_Unmapped.out.mate1" # Reads left unmapped by the RepBase step (uncompressed FASTQ; for gzipped input, add --readFilesCommand zcat)
OUTPUT_PREFIX="aligned_reads_" # Prefix for output files
NUM_THREADS=8 # Adjust as needed for your system

# Execute STAR alignment
STAR --genomeDir "${STAR_INDEX_DIR}" \
     --readFilesIn "${INPUT_READS}" \
     --runThreadN "${NUM_THREADS}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outFilterMultimapNmax 10 \
     --alignEndsType EndToEnd \
     --outFilterMultimapScoreRange 1 \
     --outSAMmode Full \
     --outFilterType BySJout \
     --outSAMtype BAM Unsorted \
     --outFilterScoreMin 10 \
     --outReadsUnmapped Fastx \
     --outSAMattributes All
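
STAR writes a summary file named <prefix>Log.final.out next to its alignments. Because the supplementary bigwigs are built from uniquely mapped reads, the unique-mapping rate is worth checking after this step. The sketch below pulls that figure out of the log. The helper name star_unique_pct is hypothetical, and the log file name assumes the aligned_reads_ prefix used above.

```shell
# Hypothetical helper: print the "Uniquely mapped reads %" value from a STAR Log.final.out file.
# Log.final.out separates each label from its value with a "|" character.
star_unique_pct() {
  awk -F'|' '/Uniquely mapped reads %/ {
               gsub(/[[:space:]]/, "", $2)   # strip the tab and spaces around the value
               print $2
             }' "$1"
}

# Example usage (file name follows from OUTPUT_PREFIX above):
# star_unique_pct aligned_reads_Log.final.out
```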

featureCounts was used to count reads according to gencode v19 annotations (-s 2 -M)

featureCounts v2.0.6 (version inferred; not stated in the source)

$ Bash example

# Install Subread package (includes featureCounts)
# conda install -c bioconda subread=2.0.6

# Define input and output files
INPUT_BAM="aligned_reads_Aligned.out.bam" # STAR output of the hg19 alignment step (prefix "aligned_reads_"); replace with your BAM file(s)
OUTPUT_COUNTS="gene_counts.txt" # Placeholder for output counts file
GENCODE_GTF="/path/to/gencode.v19.annotation.gtf" # Placeholder for Gencode v19 GTF file path

# Execute featureCounts
featureCounts -a "${GENCODE_GTF}" -o "${OUTPUT_COUNTS}" -s 2 -M "${INPUT_BAM}"
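
The featureCounts output starts with a comment line recording the command. A header row follows, and then one row per gene with annotation columns (Chr, Start, End, Strand, Length) ahead of the counts. For downstream tools that expect a plain two-column table, a sketch such as the following can help. The helper name extract_gene_counts is hypothetical, and it assumes a single input BAM, so the counts sit in column 7.

```shell
# Hypothetical helper: reduce featureCounts output to "gene_id<TAB>count".
# Assumes one BAM was counted, so the count is in column 7.
extract_gene_counts() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
              /^#/ { next }            # skip the leading comment line
              $1 == "Geneid" { next }  # skip the column header
              { print $1, $7 }' "$1"
}

# Example usage (file name is the placeholder used above):
# extract_gene_counts gene_counts.txt > gene_counts.simple.tsv
```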

Tools Used

cutadapt
fastq-tools (fastq-sort)
STAR
featureCounts

Raw Source Text

Reads were first trimmed of adapters and low-complexity sequences with cutadapt 1.14 (-O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT)
Trimmed reads were then sorted with fastq-tools (fastq-sort)
Trimmed reads were mapped against RepBase with STAR v2.4.0j to remove reads mapping to repetitive sequences (--outFilterMultimapNmax 10 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)
Remaining reads were mapped to the appropriate genome build (hg19) using STAR aligner (--outFilterMultimapNmax 10 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)
featureCounts was used to count reads according to gencode v19 annotations (-s 2 -M)
Assembly: hg19
Supplementary files format and content: bigwigs contain RPM-normalized read densities of uniquely-mapped reads
Supplementary files format and content: counts text files contain output from featureCounts
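
The source does not say how the RPM-normalized bigwigs were produced. As a rough sketch of the arithmetic only, RPM normalization scales coverage by 1,000,000 divided by the number of uniquely mapped reads. The helper name rpm_scale_factor is hypothetical. The commented commands use samtools, bedtools and bedGraphToBigWig, which are assumptions and not tools named in the source.

```shell
# Hypothetical helper: compute the reads-per-million scale factor for a given read count.
rpm_scale_factor() {
  awk -v n="$1" 'BEGIN {
    if (n <= 0) exit 1              # a positive read count is required
    printf "%.6f\n", 1000000 / n
  }'
}

# One possible way to apply it (assumed tools; file names are placeholders).
# By default STAR assigns MAPQ 255 to uniquely mapped reads, and the BAM must be coordinate-sorted.
# samtools view -b -q 255 sorted.bam > unique.bam
# scale=$(rpm_scale_factor "$(samtools view -c unique.bam)")
# bedtools genomecov -ibam unique.bam -bg -split -scale "$scale" \
#   | LC_ALL=C sort -k1,1 -k2,2n > sample.bedGraph
# bedGraphToBigWig sample.bedGraph hg19.chrom.sizes sample.bw
```

For example, a library with 2,000,000 uniquely mapped reads gives a scale factor of 0.5.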