VirusTaxo: Taxonomic classification of viruses from the genome sequence using k‑mer enrichment

Research
Virology
Bioinformatics
Authors

Rajan Saha Raju, Abdullah Al Nahid, Preonath Chondrow Dev, Rashedul Islam

DOI

https://doi.org/10.1016/j.ygeno.2022.110414

Abstract

Classifying viruses into taxonomic ranks (e.g., order, family, genus) provides a framework for organizing the large and diverse population of viruses. Next-generation metagenomic sequencing is rapidly increasing the volume of viral sequencing data, which requires bioinformatics tools for taxonomic analysis. Many metagenomic taxonomy classifiers have been developed to study microbiomes, but assigning taxonomy to diverse virus sequences remains particularly challenging, and there is a growing need for dedicated methods optimized to classify virus sequences into their taxa. VirusTaxo, developed using diverse genera of viruses (e.g., 402 DNA and 280 RNA genera), has an average accuracy of 93% at genus-level prediction for DNA and RNA viruses. VirusTaxo outperformed existing taxonomic classifiers by assigning taxonomy to a larger fraction of metagenomic contigs. Benchmarking of VirusTaxo on a collection of SARS-CoV-2 sequencing libraries and metavirome datasets suggests that it can characterize virus taxonomy from highly diverse contigs and provide reliable taxonomic assignments.

Keywords: Virus Taxonomy, Hierarchical Classification, k-mer, Genome
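
To give a rough sense of how a k-mer based taxonomic classifier can work, the toy sketch below indexes k-mers that occur in only one genus and assigns a query sequence to the genus that holds the largest share of its matched k-mers. This is not the VirusTaxo implementation: the function names (kmers, build_index, classify), the value k = 4, the min_fraction threshold, and the reference sequences are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to a genus, keeping only k-mers seen in exactly one genus."""
    owners = defaultdict(set)
    for genus, seqs in references.items():
        for seq in seqs:
            for kmer in kmers(seq, k):
                owners[kmer].add(genus)
    return {kmer: next(iter(g)) for kmer, g in owners.items() if len(g) == 1}


def classify(query, index, k, min_fraction=0.5):
    """Return the genus with the largest share of matched k-mers, or 'Unclassified'."""
    hits = Counter(index[kmer] for kmer in kmers(query, k) if kmer in index)
    total = sum(hits.values())
    if total == 0:
        return "Unclassified"
    genus, count = hits.most_common(1)[0]
    # min_fraction is a hypothetical cutoff, not a value taken from the paper.
    return genus if count / total >= min_fraction else "Unclassified"


# Toy reference sequences (invented for illustration, not real viral genomes).
references = {
    "GenusA": ["ACGTACGTTAGC", "ACGTACGATAGC"],
    "GenusB": ["TTGGCCAATTGG", "TTGGCCTATTGG"],
}
index = build_index(references, k=4)

print(classify("ACGTACGT", index, k=4))  # GenusA
print(classify("GGGGGGGG", index, k=4))  # Unclassified
```

In this sketch a query with no indexed k-mers is left unclassified instead of being forced into a genus, which is one simple way to avoid overconfident assignments for sequences unlike anything in the reference set.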

Results:

Classification of taxonomic ranks of viruses using VirusTaxo


Accuracy of VirusTaxo for order, family, and genus level classification in the pilot dataset.


Benchmarking of VirusTaxo for SARS-CoV-2 genomes

© Copyright 2023, Preonath Chondrow Dev