GSE215251 Processing Pipeline
Publication
Transcriptome regulation by PARP13 in basal and antiviral states in human cells. iScience (2024). PMID 38495826
Processing Steps
1
Reads were first trimmed of adapters and low-complexity sequences with cutadapt 1.14 (-O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT)
$ Bash example
# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=1.14

# Define input and output files (placeholder names)
INPUT_FASTQ="input.fastq"
OUTPUT_FASTQ="trimmed_reads.fastq"

# Trim adapters and low-complexity (poly-A / poly-T) sequences;
# reads shorter than 18 nt after trimming are discarded (-m 18)
cutadapt -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 \
  -b TCGTATGCCGTCTTCTGCTTG \
  -b ATCTCGTATGCCGTCTTCTGCTTG \
  -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
  -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
  -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
  -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
  -o "${OUTPUT_FASTQ}" "${INPUT_FASTQ}"
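As a quick sanity check on the trimming step, the sketch below summarizes read lengths in the trimmed file and confirms that the `-m 18` minimum-length filter held. The function names and the file name `trimmed_reads.fastq` are illustrative assumptions, not part of the original pipeline, and the code assumes plain four-line FASTQ records.

```shell
# Print the number of reads and the shortest read length in a FASTQ file.
# Assumes 4-line FASTQ records (sequence lines are not wrapped).
fastq_length_summary() {
  awk 'NR % 4 == 2 {
         n++
         len = length($0)
         if (n == 1 || len < min) min = len
       }
       END { printf "reads=%d min_len=%d\n", n, (n ? min : 0) }' "$1"
}

# Exit status 0 only if every read in file $1 is at least $2 bases long.
all_reads_at_least() {
  awk -v min_len="$2" '
    NR % 4 == 2 && length($0) < min_len { bad = 1; exit }
    END { exit bad + 0 }' "$1"
}

# Example usage (hypothetical file name):
#   fastq_length_summary trimmed_reads.fastq
#   all_reads_at_least trimmed_reads.fastq 18 && echo "minimum length OK"
```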
2
Trimmed reads were then sorted with fastq-tools (fastq-sort)
$ Bash example
# Install fastq-tools (example using conda, adjust as needed)
# conda install -c bioconda fastq-tools

# Sort trimmed reads ('trimmed_reads.fastq' is an assumed input file name)
fastq-sort trimmed_reads.fastq > sorted_reads.fastq
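Sorting should only reorder records, never add, drop, or alter them. A minimal sketch of that check follows, using only standard Unix tools rather than fastq-tools itself. The function names are hypothetical, and four-line FASTQ records without embedded tab characters are assumed.

```shell
# Print FASTQ records one per line (fields tab-joined), sorted bytewise, so
# that two files holding the same records in any order give identical output.
canonical_records() {
  paste - - - - < "$1" | LC_ALL=C sort
}

# Exit status 0 if both files contain exactly the same set of records
# (including duplicates), regardless of order.
same_fastq_records() {
  _a=$(mktemp) || return 2
  _b=$(mktemp) || { rm -f "$_a"; return 2; }
  canonical_records "$1" > "$_a"
  canonical_records "$2" > "$_b"
  cmp -s "$_a" "$_b"
  _status=$?
  rm -f "$_a" "$_b"
  return "$_status"
}

# Example usage (file names as in the sorting step above):
#   same_fastq_records trimmed_reads.fastq sorted_reads.fastq && echo "records preserved"
```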
3
Trimmed reads were mapped against RepBase with STAR v2.4.0j to remove reads mapping to repetitive sequences (--outFilterMultimapNmax 10 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)
$ Bash example
# Install STAR if not already installed
# (the pipeline reports STAR v2.4.0j; pin a version if you need to match it)
# conda install -c bioconda star

# Placeholder for STAR index creation for RepBase (if not already done)
# Replace repbase.fasta with the actual RepBase FASTA file and adjust threads.
# STAR --runMode genomeGenerate --genomeDir repbase_star_index --genomeFastaFiles repbase.fasta --runThreadN <num_threads>

# Map trimmed, sorted reads against RepBase to identify and remove repetitive sequences
# Input:  sorted_reads.fastq (uncompressed; for gzipped input add --readFilesCommand zcat)
# Output: repbase_filtered_Unmapped.out.mate1 (and mate2 if paired-end), reads that did NOT map to RepBase
# Output: repbase_filtered_Aligned.out.bam, reads that DID map to RepBase
STAR \
  --genomeDir repbase_star_index \
  --readFilesIn sorted_reads.fastq \
  --outFileNamePrefix repbase_filtered_ \
  --outFilterMultimapNmax 10 \
  --alignEndsType EndToEnd \
  --outFilterMultimapScoreRange 1 \
  --outSAMmode Full \
  --outFilterType BySJout \
  --outSAMtype BAM Unsorted \
  --outFilterScoreMin 10 \
  --outReadsUnmapped Fastx \
  --outSAMattributes All \
  --runThreadN 8  # Example: use 8 threads, adjust as needed
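To see how many reads the RepBase step touched, one option is to read the summary that STAR writes to `<prefix>Log.final.out`. The sketch below assumes the usual `label |<TAB>value` layout of that file. The function name `star_log_value` is hypothetical, and the exact labels may differ between STAR versions.

```shell
# Print the value recorded for a given label in a STAR Log.final.out file.
# $1 = path to Log.final.out, $2 = exact label text (without the trailing "|").
star_log_value() {
  awk -F'|' -v key="$2" '
    {
      label = $1
      gsub(/^[ \t]+|[ \t]+$/, "", label)    # trim padding around the label
    }
    label == key {
      value = $2
      gsub(/^[ \t]+|[ \t]+$/, "", value)    # trim the tab before the value
      print value
      exit
    }' "$1"
}

# Example usage (prefix as in the RepBase mapping step above):
#   star_log_value repbase_filtered_Log.final.out "Number of input reads"
#   star_log_value repbase_filtered_Log.final.out "Uniquely mapped reads %"
```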
4
Remaining reads were mapped to the appropriate genome build (hg19) using STAR aligner (--outFilterMultimapNmax 10 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)
$ Bash example
# Install STAR (if not already installed)
# Example pin only: the source does not state the STAR version for this step
# (the RepBase step reports v2.4.0j)
# conda install -c bioconda star=2.7.10a

# Placeholder variables
STAR_INDEX_DIR="/path/to/STAR_index/hg19"         # Replace with actual path to hg19 STAR index
INPUT_READS="repbase_filtered_Unmapped.out.mate1" # Reads left unmapped by the RepBase filtering step
OUTPUT_PREFIX="aligned_reads_"                    # Prefix for output files
NUM_THREADS=8                                     # Adjust as needed for your system

# Execute STAR alignment (writes ${OUTPUT_PREFIX}Aligned.out.bam)
STAR --genomeDir "${STAR_INDEX_DIR}" \
  --readFilesIn "${INPUT_READS}" \
  --runThreadN "${NUM_THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outFilterMultimapNmax 10 \
  --alignEndsType EndToEnd \
  --outFilterMultimapScoreRange 1 \
  --outSAMmode Full \
  --outFilterType BySJout \
  --outSAMtype BAM Unsorted \
  --outFilterScoreMin 10 \
  --outReadsUnmapped Fastx \
  --outSAMattributes All
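The RepBase and hg19 mapping steps use the same STAR options, so it can help to define them once. The following dry-run sketch only prints the two commands rather than running STAR. The variable `STAR_COMMON_ARGS`, the function `star_command`, and all paths are illustrative assumptions, and paths containing spaces would need extra quoting.

```shell
# Options shared by both mapping steps, copied from the processing description.
STAR_COMMON_ARGS="--outFilterMultimapNmax 10 --alignEndsType EndToEnd \
--outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout \
--outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx \
--outSAMattributes All"

# Print (do not execute) a STAR command line.
# $1 = genome index directory, $2 = input reads, $3 = output file prefix
star_command() {
  printf 'STAR --genomeDir %s --readFilesIn %s --outFileNamePrefix %s %s\n' \
    "$1" "$2" "$3" "$STAR_COMMON_ARGS"
}

# Dry run: RepBase filtering first, then hg19 mapping of the reads left unmapped.
star_command repbase_star_index sorted_reads.fastq repbase_filtered_
star_command /path/to/STAR_index/hg19 repbase_filtered_Unmapped.out.mate1 aligned_reads_
```

Printing the commands first makes it easy to confirm that both steps receive identical filtering options before committing compute time.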
5
featureCounts was used to count reads according to gencode v19 annotations (-s 2 -M)
$ Bash example
# Install Subread package (includes featureCounts)
# Example pin only: the source does not state the featureCounts version
# conda install -c bioconda subread=2.0.6

# Define input and output files (placeholders)
INPUT_BAM="aligned_reads_Aligned.out.bam"           # BAM written by the hg19 STAR step
OUTPUT_COUNTS="gene_counts.txt"                     # Output counts file
GENCODE_GTF="/path/to/gencode.v19.annotation.gtf"   # Gencode v19 GTF file path

# Execute featureCounts
# -s 2: reversely stranded counting; -M: also count multi-mapping reads
featureCounts -a "${GENCODE_GTF}" -o "${OUTPUT_COUNTS}" -s 2 -M "${INPUT_BAM}"
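The counts file written by featureCounts starts with a `#` comment line and a header row, followed by one row per gene with the count in the seventh column when a single BAM is given. A small sketch for pulling out a gene-to-count table follows. The function names are hypothetical, and the column layout is assumed to match the standard featureCounts output.

```shell
# Print "gene_id<TAB>count" for every gene in a single-sample featureCounts table.
# Skips the leading "# Program:..." comment and the "Geneid ..." header row.
gene_counts() {
  awk -F'\t' '!/^#/ && $1 != "Geneid" { print $1 "\t" $7 }' "$1"
}

# Print the sum of all per-gene counts.
total_assigned() {
  awk -F'\t' '!/^#/ && $1 != "Geneid" { sum += $7 } END { print sum + 0 }' "$1"
}

# Example usage (output file name as in the featureCounts step above):
#   gene_counts gene_counts.txt | sort -k2,2nr | head
#   total_assigned gene_counts.txt
```

Because `-M` counts multi-mapping reads, a read may contribute to more than one alignment, so the total can exceed the number of input reads.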
Tools Used
cutadapt 1.14
fastq-tools (fastq-sort)
STAR v2.4.0j
featureCounts
Raw Source Text
Reads were first trimmed of adapters and low-complexity sequences with cutadapt 1.14 (-O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT)
Trimmed reads were then sorted with fastq-tools (fastq-sort)
Trimmed reads were mapped against RepBase with STAR v2.4.0j to remove reads mapping to repetitive sequences (--outFilterMultimapNmax 10 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)
Remaining reads were mapped to the appropriate genome build (hg19) using STAR aligner (--outFilterMultimapNmax 10 --alignEndsType EndToEnd --outFilterMultimapScoreRange 1 --outSAMmode Full --outFilterType BySJout --outSAMtype BAM Unsorted --outFilterScoreMin 10 --outReadsUnmapped Fastx --outSAMattributes All)
featureCounts was used to count reads according to gencode v19 annotations (-s 2 -M)
Assembly: hg19
Supplementary files format and content: bigwigs contain RPM-normalized read densities of uniquely-mapped reads
Supplementary files format and content: counts text files contain output from featureCounts
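The bigwig description mentions RPM normalization of uniquely-mapped reads. The source does not say which tools produced these tracks, so the sketch below only illustrates the arithmetic: the scale factor is one million divided by the number of uniquely-mapped reads. The function name `rpm_scale_factor` and the commented tool invocations are assumptions.

```shell
# Print the RPM (reads per million) scale factor for a given read count.
# $1 = number of uniquely-mapped reads; fails if the count is not positive.
rpm_scale_factor() {
  awk -v n="$1" 'BEGIN {
    if (n <= 0) exit 1
    printf "%.6f\n", 1000000 / n
  }'
}

# Example: 2,000,000 uniquely-mapped reads give a factor of 0.5.
rpm_scale_factor 2000000

# One possible way to apply it (assumed tooling, not stated in the source):
#   n=$(samtools view -c -q 255 aligned.sorted.bam)   # STAR marks unique hits with MAPQ 255
#   bedtools genomecov -ibam unique.sorted.bam -bg -scale "$(rpm_scale_factor "$n")" > track.bedGraph
```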