Tropical Diversity (2023) 3(1): 1-6.
ISSN: 2596-2388
DOI: 10.5281/zenodo.10911800
RESEARCH NOTE
© 2023 The Author
1
ACACIA: a generic conceptual schema for taxonomic databases
ACACIA: um esquema conceitual genérico para bancos de dados taxonômicos
Mauro José Cavalcanti1* https://orcid.org/0000-0003-2389-1902
Ecoinformatics Studio, PO Box 18123, CEP 20720-970, Rio de Janeiro, RJ, Brazil.
*Email: maurobio@gmail.com
Received: 3, July 2023 / Accepted: 1, November 2023 / Published: 3, November 2023
Resumo ACACIA é um esquema de dados
genérico para bancos de dados taxonômicos
relacionais e um pacote de programas que o
implementa. Tal esquema permite a representação
eficiente de todas as classes de dados requeridas
para o armazenamento e recuperação de dados de
biodiversidade, do nível taxonômico aos níveis
ecológico e genômico. O pacote de programas
ACACIA para o gerenciamento de bancos de
dados de biodiversidade, desenvolvido na
linguagem PHP, está disponível em
https://github.com/maurobio/acacia sob uma
licença livre GPL v3.
Palavras-Chave: Bancos de dados de biodiversidade,
cibertaxonomia, modelo relacional, BAOBAB,
DELTA.
Abstract ACACIA is both a generic data schema
for relational taxonomic databases and a software
package which implements it. Such schema
allows for the efficient representation of all data
classes required for the storage and retrieval of
biodiversity data, from taxonomic to ecological
and genomic levels. The ACACIA software
package for the management of biodiversity
databases, written in the PHP language, is
available from
https://github.com/maurobio/acacia under the
GPL v3 free license.
Keywords: Biodiversity databases, cybertaxonomy,
relational model, BAOBAB, DELTA.
__________________
Research supported by Conselho Nacional de
Desenvolvimento Científico e Tecnológico (CNPq) [grant
#350389/2011-0] and Fundação da Amparo à Pesquisa do
Estado do Amazonas (FAPEAM) [grant #2275/11].
Cavalcanti (2023)
Generic schema for taxonomic databases
© 2023 The Author
2
Introduction
In the last three decades, a large number of
taxonomic databases have been developed to
address curatorial management processes,
taxonomic revisions and applied biology needs
(Pankhurst, 1991) as well as the growing demand
for large-scale global biodiversity information
systems accessible over the World Wide Web
(Bisby, 2000; Curry & Humphries, 2007).
A basic step towards developing a taxonomic
database system is to build a data model to
describe the entities involved and relationships
among them. This has lead to the development of
several models for the design of taxonomic
databases (Allkin & Bisby, 1988; Allkin & White,
1982, 1988; Berendsohn, 1997; Berendsohn et al.,
1999). Although no doubt useful in clarifying
relationships among taxonomic entities and their
attributes, these models are, however, invariably
of such complexity as to make them rather
difficult to implement and manage (Morris, 2005).
More recently, this approach has been
expanded towards the development of
comprehensive solutions for taxonomic
computing, as the Scratchpad (Smith et al., 2009)
and the EDIT Platform for Cybertaxonomy
(Berendsohn, 2010; Berendsohn et al., 2011). The
former is built on top of the generic content
management system Drupal, whereas the latter
comprises both a common data model and a set of
specialized software tools for interacting with it.
These solutions, however, still do not offer the
average user an independent environment, at the
same time simple to use and to maintain, for
dealing with the complexity of taxonomic data, as
nomenclatural information, the taxonomic
hierarchy, structured and unstructured descriptive
data, geographic information, literature citations,
and ecological and genomic data.
The ACACIA design is an attempt to
overcome this difficulty, offering a simple but
flexible and extensible data model that can be
used as a framework for taxonomic information
systems. ACACIA intends to be a practical
implementation of the ideal of a "universal
biological database structure" (White & Allkin,
1993).
The name "Acacia" is an allusion to "Baobab",
a comprehensive database design for biologists
developed by Allkin & White (1982, 1988) and
partially implemented in the Alice species
diversity database management system (White &
Allkin, 1993; White et al., 1993). The ACACIA
model can be conceived of as a simpler version of
the full BAOBAB design. The genus Acacia has
also been frequently used as an example in
taxonomic database studies (Allkin et al., 1992).
Materials and Methods
As a database scheme, ACACIA is a set of 15
entities or tables based on the relational database
model (Codd, 1970), designed to convey all basic
classes of information required for taxonomic
databases and facilitate the recording of
taxonomic data from literature and other sources
(biological collections, field surveys, and other
databases). Four tables in the set are mandatory in
any ACACIA database: species (storing valid, ie.,
currently accepted, species names), synonyms
Cavalcanti (2023)
Generic schema for taxonomic databases
© 2023 The Author
3
(storing any other names by which a species may
also have been previously described), higher
taxon (storing basic taxonomic categories above
species-level, i.e., kingdom, division or phylum,
class, order, family, as well as intermediary levels
if required) and bibliography (storing
bibliographic references to each valid name and
any synonyms). These four tables constitute the
“Taxonomic Core” of the schema and are
mandatory in any ACACIA database, plus a
metadata table storing data about the database
itself (Figure 1). This generic schema can be used
in building taxon-oriented databases, using any
desired combination of computer platform,
operating system, database engine, and
application programming language. Once
implemented, the ACACIA schema can be used to
create a wide variety of taxonomic databases of
diverse content including monographic databases,
species inventories, annotated checklists or
identification keys (when used in combination
with the DELTA system; Dallwitz, 1980).
The ACACIA design is based on and fully
compatible with the International Legume
Database and Information Service (ILDIS) Type
One Data Fields (Bisby, 1989, 1993), as well as
with the Species 2000 (Bisby, 2000) and the
Catalogue of Life (Bisby & Roskov, 2010)
Standard Dataset (which is itself loosely derived
from the ILDIS standard).
The ACACIA scheme is completely neutral
and can be implemented in any relational database
management system, from Sqlite and MySQL to
PostgreSQL and Oracle. So far, all
implementations have been based on MySQL,
because of the widespread availability, ease of
installation, and low footprint of that particular
DBMS.
An integrated software package, written in the
PHP language, has been implemented to manage
databases designed with the ACACIA scheme.
This tool is an interactive data entry, querying,
and editing system for taxonomic databases based
on the generic ACACIA conceptual schema. It
combines the automated use of scientific names
and synonyms in a species checklist with online
access to geographical data, morphological
descriptors, genomics, ecology, vernacular names,
economic uses, structured notes and conservation
status about each species, with all these data being
cross-indexed to a citation list. Interactive keys
may be easily created and published online from
morphological data stored in database and
automatically translated into DELTA format
(Dallwitz, 1980), using the NaviKey Java applet
(Neubacher & Rambold, 2005). For DNA
sequence data, several manipulation routines from
the BioPHP project (http://www.biophp.org) are
provided (for computation of reversals,
complementarity, G+C content, nucleotide
composition, and conversion to RNA).
Cavalcanti (2023)
Generic schema for taxonomic databases
© 2023 The Author
4
Figure 1 The ACACIA database schema.
Results and Discussion
The software design allows for rapid
customization to suit its application to any
taxonomic group (plant, animal, or
microorganism). Since only the data tables
comprising the taxonomic core are mandatory in a
given database based on the ACACIA scheme,
every other data table can be individually
activated or deactivated when configuring the
tool, therefore allowing the database to be tailored
to better suit a specific field of application, as
taxonomy, conservation, or ethnobiology. This
also allows reducing the database server overload,
by avoiding the need to keep in the system empty
unused tables (for example, a genome table in a
database which does not store genomic data).
Three production databases have already
been implemented with the ACACIA schema and
are currently available on the web: (1) the
Neotropical Copepoda (NEOCOP) database
(http://neocop.biotupe.org); (2) the Marine-
derived Amazonian biota Research (MAR)
database (http://mar.biotupe.org); and (3) the
Brazilian Coral Reef Fish
(http://coralfish.scienceontheweb.net) database.
The development of a third database, the
Neotropical Cladocera (NEOCLAD) database
(http://neoclad.biotupe.org), has been started and
is currently in prototype stage.
Conclusions
The current version of the ACACIA software
package works in tandem with a program called
Feronia (in allusion to a goddess associated with
wildlife and fertility in Etruscan and Roman
mythology), written in the Python programming
language; this program allows the fast collation
and building of databases by using several
available web services in order to harvest and/or
check data on nomenclature (from the Catalog of
Life, http://www.catalogueoflife.org), distribution
(from the Global Biodiversity Information
Facility, http://www.gbif.org), genomics (from
GenBank, http://www.ncbi.nlm.nih.gov/genbank)
conservation status and habitats (from the IUCN
Red List (http://www.iucnredlist.org), and
literature (from http://www.itis.gov). Future
versions are expected to improve on this, both by
offering scripts for harvesting data from more
biological web services.
Cavalcanti (2023)
Generic schema for taxonomic databases
© 2023 The Author
5
Acknowledgements
Thanks to Haroldo Cavalcante de Lima
(Jardim Botânico do Rio de Janeiro) and Robert
Allkin (Royal Botanic Gardens, Kew) for
providing information on the ILDIS data standard
and the Alice database system, to Eduardo Dalcin
(Jardim Botânico do Rio de Janeiro) for earlier
discussions of the ACACIA model and useful
suggestions, and to Gustavo Martinelli (Jardim
Botânico do Rio de Janeiro) and Edinaldo Nelson
dos Santos Silva (Instituto Nacional de Pesquisas
da Amazônia) for support and interest. Veridiana
Scudeller (Universidade Federal do Amazonas)
and Valéria Gallo (Universidade do Estado do Rio
de Janeiro) provided constructive reviews and
valuable suggestions which much contributed to
the improvement of the original manuscript. This
paper is dedicated to the memory of Frank A.
Bisby (1945-2011), for his pioneering work on
taxonomic and biodiversity databases.
References
Allkin, R. & Bisby, F. A. 1988. The structure of
monographic databases. Taxon 37: 756-763.
Allkin, R. & White, R. J. 1982. Design criteria for
a computer program to facilitate the
acquisition, storage, retrieval and
reformatting of biological descriptions.
Southampton: University Research Fund
Report.
Allkin, R. & White, R. J. 1988. Data management
models for biological classification. In:
Bock, H.H. (ed.). Classification and Related
Methods of Data Analysis. Amsterdam:
Elsevier, pp. 653-660.
Allkin, R., White, R. J. & Winfield, P. J. 1992.
Handling the taxonomic structure of
biological data. Mathematical and Computer
Modelling 16: 1-9.
Berendsohn, W. G. 1997. A taxonomic
information model for botanical databases:
the IOPI model. Taxon 46: 283-309.
Berendsohn, W. G. 2010. Devising the EDIT
Platform for Cybertaxonomy. In: Nimis, P.
L & Vignes Lebbe, R. (eds.), Tools for
Identifying Biodiversity: Progress and
Problems. Trieste: Edizioni Università di
Trieste, pp. 1-6.
Berendsohn, W. G., Anagnostopoulos, A.,
Hagedorn, G., Jakupovic, J., Nimis, P. L,
Valdés, B., Güntsch, A., Pankhurst, R. J. &
White, R. J. 1999. A comprehensive
reference model for biological collections
and surveys. Taxon 48: 511-562.
Berendsohn, W. G., Güntsch, A., Hoffmann, N.,
Kohlbecker, A., Luther, K., & ller, A.
2011. Biodiversity information platforms:
From standards to interoperability. ZooKeys
150: 71-87.
Bisby, F. A. 1989. Databases, information
systems, and legume research. Monographs
in Systematic Botany from the Missouri
Botanical Garden 29: 811-825.
Bisby, F. A. 1993. Species diversity knowledge
systems. The ILDIS prototype for legumes.
Annals of the New York Academy of
Sciences 700: 159-164.
Cavalcanti (2023)
Generic schema for taxonomic databases
© 2023 The Author
6
Bisby, F. A. 2000. The quiet revolution:
biodiversity informatics and the internet.
Science 289: 2309-2312.
Bisby, F. A. & Roskov, Y. R. 2010. The
Catalogue of Life: towards an integrative
taxonomic backbone for biodiversity. In:
Nimis, P. L & Vignes Lebbe, R. (eds.).
Tools for Identifying Biodiversity: Progress
and Problems. Trieste: Edizioni Università
di Trieste, pp. 37-42.
Codd, E. F. 1970. A relational model of data for
large shared data banks. Communications of
the ACM 13: 377-387.
Curry, G. B. & Humphries, C. J. 2007.
Biodiversity Databases: Techniques,
Politics, and Applications. Boca Raton:
CRC Press.
Dallwitz, M. J. 1980. A general system for coding
taxonomic descriptions. Taxon 29: 41-46.
Morris, P. J. 2005. Relational database design and
implementation for biodiversity informatics.
PhyloInformatics 7: 1-66.
Neubacher, D. & Rambold, G. 2005. NaviKey a
Java applet and application for accessing
descriptive data coded in DELTA format.
http://www.navikey.net.
Pankhurst, R. J. 1991. Practical Taxonomic
Computing. Cambridge: Cambridge
University Press.
Smith, V. S., Rycroft, S. D., Harman, K. T., Scott,
B. & Roberts, D. 2009. Scratchpads: a data-
publishing framework to build, share and
manage information on the diversity of life.
BMC Bioinformatics 10 (Suppl 14): S6.
White, R. J. & Allkin, R. 1993. A strategy for the
evolution of database designs. In: Bisby, F.
A., Russell, G. F. & Pankhurst, R. J. (eds.).
Designs for a Global Plant Species
Information System. Oxford: Oxford
University Press, pp. 284-303.
White, R. J., Allkin, R. & Winfield, P. J. 1993.
Systematic databases: the BAOBAB design
and the ALICE system. In: Fortuner, R.
(ed.). Advances in Computer Methods for
Systematic Biology: Artificial Intelligence,
Databases, Computer Vision. Baltimore:
Johns Hopkins University Press, pp. 297-3