GSE215251 Processing Pipeline

Publication

Transcriptome regulation by PARP13 in basal and antiviral states in human cells.

iScience (2024) — PMID 38495826

Dataset

GSE215251

Transcriptome Regulation by PARP13 in Basal and Antiviral States in Human Cells

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Reads were first trimmed of adapters and low-complexity sequences with cutadapt 1.14 (-O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT)

    cutadapt v1.14
    $ Bash example
    # Install cutadapt if not already installed
    # conda install -c bioconda cutadapt=1.14
    
    # Define input and output files (placeholders)
    INPUT_FASTQ="input.fastq"
    OUTPUT_FASTQ="trimmed_reads.fastq" # Matches the input name used in the sorting step
    
    # Reads were first trimmed of adapters and low-complexity sequences
    cutadapt -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 \
    -b TCGTATGCCGTCTTCTGCTTG \
    -b ATCTCGTATGCCGTCTTCTGCTTG \
    -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
    -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
    -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
    -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
    -o "${OUTPUT_FASTQ}" "${INPUT_FASTQ}"
  2.

    Trimmed reads were then sorted with fastq-tools (fastq-sort)

    fastq-tools (version not specified)
    $ Bash example
    # Install fastq-tools (example using conda, adjust as needed)
    # conda install -c bioconda fastq-tools
    
    # Sort trimmed reads
    # Assuming 'trimmed_reads.fastq' is the input file
    fastq-sort trimmed_reads.fastq > sorted_reads.fastq
  3.

    Trimmed reads were mapped against RepBase with STAR v2.4.0j to remove reads mapping to repetitive sequences (--outFilterMultimapNmax 10 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)

    $ Bash example
    # Install STAR if not already installed
    # conda install -c bioconda star
    
    # Placeholder for STAR index creation for RepBase (if not already done)
    # Replace repbase.fasta with the actual RepBase FASTA file and adjust threads.
    # STAR --runMode genomeGenerate --genomeDir repbase_star_index --genomeFastaFiles repbase.fasta --runThreadN <num_threads>
    
    # Map the trimmed, sorted reads against RepBase to identify and remove repetitive sequences
    # Input: sorted_reads.fastq from the sorting step (uncompressed; for gzipped input add --readFilesCommand zcat)
    # Output: repbase_filtered_Unmapped.out.mate1 containing reads that did NOT map to RepBase
    # Output: repbase_filtered_Aligned.out.bam containing reads that DID map to RepBase
    STAR \
      --genomeDir repbase_star_index \
      --readFilesIn sorted_reads.fastq \
      --outFileNamePrefix repbase_filtered_ \
      --outFilterMultimapNmax 10 \
      --alignEndsType EndToEnd \
      --outFilterMultimapScoreRange 1 \
      --outSAMmode Full \
      --outFilterType BySJout \
      --outSAMtype BAM Unsorted \
      --outFilterScoreMin 10 \
      --outReadsUnmapped Fastx \
      --outSAMattributes All \
      --runThreadN 8 # Example: use 8 threads, adjust as needed
    
  4.

    Remaining reads were mapped to the appropriate genome build (hg19) using STAR aligner (--outFilterMultimapNmax 10 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)

    $ Bash example
    # Install STAR (if not already installed)
    # The source does not state a STAR version for this step; the RepBase step used v2.4.0j.
    # conda install -c bioconda star
    
    # Placeholder variables
    STAR_INDEX_DIR="/path/to/STAR_index/hg19" # Replace with actual path to hg19 STAR index
    INPUT_READS="repbase_filtered_Unmapped.out.mate1" # Reads that did not map to RepBase in step 3 (uncompressed FASTQ; adjust to your file name)
    OUTPUT_PREFIX="aligned_reads_" # Prefix for output files
    NUM_THREADS=8 # Adjust as needed for your system
    
    # Execute STAR alignment
    STAR --genomeDir "${STAR_INDEX_DIR}" \
         --readFilesIn "${INPUT_READS}" \
         --runThreadN "${NUM_THREADS}" \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outFilterMultimapNmax 10 \
         --alignEndsType EndToEnd \
         --outFilterMultimapScoreRange 1 \
         --outSAMmode Full \
         --outFilterType BySJout \
         --outSAMtype BAM Unsorted \
         --outFilterScoreMin 10 \
         --outReadsUnmapped Fastx \
         --outSAMattributes All
  5.

    featureCounts was used to count reads according to gencode v19 annotations (-s 2 -M)

    featureCounts v2.0.6 (version inferred; not stated in the source text)
    $ Bash example
    # Install Subread package (includes featureCounts)
    # conda install -c bioconda subread=2.0.6
    
    # Define input and output files
    INPUT_BAM="aligned_reads_Aligned.out.bam" # Placeholder; STAR names its BAM <prefix>Aligned.out.bam
    OUTPUT_COUNTS="gene_counts.txt" # Placeholder for output counts file
    GENCODE_GTF="/path/to/gencode.v19.annotation.gtf" # Placeholder for Gencode v19 GTF file path
    
    # Execute featureCounts (-s 2: reverse-stranded counting; -M: also count multi-mapping reads)
    featureCounts -a "${GENCODE_GTF}" -o "${OUTPUT_COUNTS}" -s 2 -M "${INPUT_BAM}"
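
As a quick sanity check on step 1, the -m 18 setting means that no read shorter than 18 nt should survive trimming. The sketch below counts records and reports the shortest read in a FASTQ file using only awk. The three-read file it builds is a made-up stand-in for real cutadapt output, the variable names are illustrative and not part of the published pipeline, and it assumes standard four-line FASTQ records.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical three-read FASTQ standing in for real trimmed output
# (read lengths: 20, 18 and 25 nt).
TRIMMED_FASTQ="$(mktemp)"
{
  printf '@read1\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n'
  printf '@read2\nACGTACGTACGTACGTAC\n+\nIIIIIIIIIIIIIIIIII\n'
  printf '@read3\nACGTACGTACGTACGTACGTACGTA\n+\nIIIIIIIIIIIIIIIIIIIIIIIII\n'
} > "${TRIMMED_FASTQ}"

# Line 2 of every 4-line record holds the sequence.
N_READS="$(awk 'NR % 4 == 2 { n++ } END { print n + 0 }' "${TRIMMED_FASTQ}")"
MIN_LEN="$(awk 'NR % 4 == 2 { if (min == "" || length($0) < min) min = length($0) }
                END { print min + 0 }' "${TRIMMED_FASTQ}")"
rm -f "${TRIMMED_FASTQ}"

echo "reads=${N_READS} shortest=${MIN_LEN}"

# Flag output that violates the minimum length passed to cutadapt (-m 18).
if [ "${MIN_LEN}" -lt 18 ]; then
  echo "WARNING: reads shorter than 18 nt are present" >&2
fi
```

Running the same two awk commands on the files before and after fastq-sort is also a cheap way to confirm that sorting did not drop any records.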

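The table written by featureCounts in step 5 starts with a comment line and a header row, followed by one row per gene with the count in the seventh column when a single BAM file is supplied. The sketch below reduces such a table to gene and count columns and sums the counts. The miniature table, gene names, and variable names are invented for illustration and are not results from this dataset.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical miniature featureCounts table (tab-separated); not real data.
COUNTS_FILE="$(mktemp)"
{
  printf '# Program:featureCounts (placeholder comment line)\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\taligned_reads_Aligned.out.bam\n'
  printf 'GENE_A\tchr1\t100\t200\t+\t101\t15\n'
  printf 'GENE_B\tchr2\t300\t450\t-\t151\t0\n'
  printf 'GENE_C\tchr3\t500\t700\t+\t201\t85\n'
} > "${COUNTS_FILE}"

# Skip the comment line and the header row; keep gene ID and count.
GENE_COUNTS="$(awk -F'\t' '!/^#/ && $1 != "Geneid" { print $1 "\t" $7 }' "${COUNTS_FILE}")"

# Sum the count column across all genes.
TOTAL_COUNTS="$(awk -F'\t' '!/^#/ && $1 != "Geneid" { total += $7 }
                     END { print total + 0 }' "${COUNTS_FILE}")"
rm -f "${COUNTS_FILE}"

printf '%s\n' "${GENE_COUNTS}"
echo "total=${TOTAL_COUNTS}"
```

Because the pipeline passes -M, a multi-mapping read is counted once for each of its reported alignments, so the column total can exceed the number of reads.
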
Raw Source Text
Reads were first trimmed of adapters and low-complexity sequences with cutadapt 1.14 (-O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT)
Trimmed reads were then sorted with fastq-tools (fastq-sort)
Trimmed reads were mapped against RepBase with STAR v2.4.0j to remove reads mapping to repetitive sequences (--outFilterMultimapNmax 10 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)
Remaining reads were mapped to the appropriate genome build (hg19) using STAR aligner (--outFilterMultimapNmax 10 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)
featureCounts was used to count reads according to gencode v19 annotations (-s 2 -M)
Assembly: hg19
Supplementary files format and content: bigwigs contain RPM-normalized read densities of uniquely-mapped reads
Supplementary files format and content: counts text files contain output from featureCounts
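
The source does not say which tool produced the bigWig tracks, but the RPM (reads per million) normalization it mentions is simple arithmetic: each raw depth value is multiplied by 1,000,000 divided by the number of uniquely mapped reads. The sketch below works through that calculation. The read count and depth value are invented numbers, and the variable names are illustrative.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical inputs; replace with values from your own alignment.
UNIQUE_READS=25000000   # number of uniquely mapped reads (made-up value)
RAW_DEPTH=50            # raw read depth at one position (made-up value)

# RPM scale factor = 1e6 / uniquely mapped reads.
RPM_SCALE="$(awk -v n="${UNIQUE_READS}" 'BEGIN { printf "%.6f", 1e6 / n }')"

# Normalized depth = raw depth * scale factor.
RPM_DEPTH="$(awk -v d="${RAW_DEPTH}" -v s="${RPM_SCALE}" 'BEGIN { printf "%.2f", d * s }')"

echo "scale=${RPM_SCALE} rpm_depth=${RPM_DEPTH}"
```

With these example numbers the scale factor is 0.04, so a raw depth of 50 becomes 2 RPM.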