Managing vocabularies used for the description of biodiversity resources

Vocabulary Management Task Group Charter A Task Group of the TAG Interest Group

February 7, 2012 by Dag Endresen


Biodiversity Information Standards


Vocabulary Management Task Group Charter
A Task Group of the TAG Interest Group

Draft version, February 2012


1. Convenors

  • Dag Endresen (GBIF) <dendresen@gbif.org>
  • Éamonn Ó Tuama (GBIF) <eotuama@gbif.org>
  • David Remsen (GBIF) <dremsen@gbif.org>

Global Biodiversity Information Facility (GBIF)
Universitetsparken 15, DK-2100 Copenhagen


2. Core Members

  • Steve Baskauf (Vanderbilt University, Tennessee, USA)
  • Gregor Hagedorn (Julius Kühn-Institute, Germany)
  • ... Please join us!


3. Motivation

Recognising the need for a standards architecture that provides enough commonality to enable interoperability across open systems, the TDWG Technical Roadmaps of 2006[1], 2007[2] and 2008[3] identified community-supported vocabularies and ontologies (expressing the shared semantics of data) as one of three required components. The other two are common exchange protocols and persistent identifiers for the data. However, the development, management and governance of such vocabularies remain challenging for TDWG and the broader biodiversity community. For example, the GBIF LSID-GUID Task Group highlighted the need for GBIF to work with others to identify ongoing support mechanisms for essential shared vocabularies[4]. The GBIF-commissioned white paper Recommendations on the use of Knowledge Organization Systems by GBIF[5] proposed several specific tasks for GBIF in advancing Knowledge Organization Systems (KOS) for biodiversity, and distinguished the need for ontology management in tools such as BioPortal[6] from the lifecycle management of flat vocabularies. The TDWG Darwin Core vocabulary is among the most widely deployed biodiversity vocabularies, and both its management and its relation to the TDWG ontologies can help us develop best practices and adopt appropriate tools for other or new vocabularies. In addition, although GBIF has begun to address some gaps by building the GBIF Vocabulary Server, which focuses on developing flat extensions to the Darwin Core schema, such mechanisms need to be expanded to support general vocabulary creation and maintenance.

The Vocabulary Management Task Group (VOMAG)[7] will complement the RDF/OWL Task Group, whose charter[8] goals include “leveraging existing vocabularies, ontologies (...) developing a consensus on the appropriate class/rdf:type designations for biodiversity resources (...) Identify specific ontologies and vocabularies for use in the domain”. VOMAG will do so by helping to ensure that a robust KOS framework is in place to support these goals. VOMAG will build on the work initiated by the TDWG Ontology[9].


4. Goals, Outputs and Outcomes

The deliverables from this task group will be:

  • Report: requirements for a KOS architecture for biodiversity informatics.
  • Report: principal components of a KOS architecture for biodiversity informatics.
  • Deliver prototype information systems for the collaborative development of basic vocabularies for the description of terms (including Semantic MediaWiki[10]; ISOcat[11]; GBIF Vocabulary Server[12]).
  • The software tools for vocabulary management are to be delivered within one year after the formation of the task group.
  • Develop technical guidelines for management of TDWG vocabularies of basic terms and the best practices for maintaining these fundamental resources.
  • Develop a proposal including technical guidelines for a new TDWG resources repository to provide access to the ratified TDWG vocabularies of terms.
  • Develop technical guidelines and best practices for the management of the Darwin Core Archive schema including extensions and code lists.
  • Develop guidelines for the relationships and links between the TDWG vocabularies of terms, the TDWG ontologies and the Darwin Core Archive schema.


5. Strategy

  • The GBIF Vocabulary Server is already available. Possible new functionality and further improvements will be evaluated.
  • Prototype implementations of the Semantic MediaWiki and ISOcat are under development.
  • The task group members will evaluate the prototype implementations of these software systems for vocabulary development and maintenance and provide feedback on their suitability, strengths and weaknesses.
  • The prototype software tools will be evaluated against the requirements for the KOS architecture, in order to identify requirements that these tools do not satisfactorily meet.

What is in the scope?

  • Extract terms and concepts, including their definitions, from the existing TDWG standards and provide URIs for those terms that are not individually identifiable by a URI.
  • Build a resources discovery and resolution system based on existing resources and services in the TDWG and GBIF network.
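The first scope item (giving every extracted term its own URI) could be sketched roughly as follows. This is only an illustration under stated assumptions: the namespace http://example.org/terms/, the Term record and the assign_uri helper are hypothetical and are not part of any TDWG or GBIF system, and the definitions are placeholders. Terms that already have a URI, such as Darwin Core terms, are left untouched.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

# Hypothetical namespace, used only for illustration.
BASE_NAMESPACE = "http://example.org/terms/"


@dataclass
class Term:
    """A term extracted from an existing standard (hypothetical record layout)."""
    label: str
    definition: str
    source_standard: str
    uri: Optional[str] = None


def assign_uri(term: Term, base: str = BASE_NAMESPACE) -> Term:
    """Return the term unchanged if it has a URI, otherwise a copy with a minted one."""
    if term.uri:
        return term
    # Build a URL-safe local name from the label.
    local_name = quote(term.label.strip().replace(" ", "_"), safe="")
    return Term(term.label, term.definition, term.source_standard, base + local_name)


terms = [
    # Already individually identifiable: keep the existing URI.
    Term("scientificName", "Placeholder definition.", "Darwin Core",
         "http://rs.tdwg.org/dwc/terms/scientificName"),
    # Hypothetical term without a URI: one is minted in the example namespace.
    Term("collecting method", "Placeholder definition.", "Example standard"),
]

resolved = [assign_uri(t) for t in terms]
for t in resolved:
    print(t.label, "->", t.uri)
```

In practice the namespace and the rules for forming local names would have to be agreed by the group; the sketch only shows that existing identifiers are reused and new ones are minted solely where none exists.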

What is not in the scope?

  • Development of new vocabularies, terms and concepts.
  • Development of new ontologies.


6. Becoming Involved

  • Anyone with practical experience in developing/managing vocabularies/ontologies.
  • Domain experts with knowledge of biodiversity.
  • You are invited to contact one of the convenors to register your interest.


7. History/Context

  • The TDWG Technical Architecture Group (TAG)[13] was formed at the TDWG 2005 meeting in St Petersburg to develop a common standards-development-architecture for TDWG and to provide compatibility between TDWG standards and standards maintained by other groups.
  • The first TAG meeting[14] identified three required components of the TDWG architecture: (1) persistent identifiers (GUIDs), (2) a TDWG ontology, and (3) data exchange protocols.
  • The TDWG ontology was initiated with three conceptual layers: (1) base, (2) core and (3) domain ontologies. The base ontology included the higher-level abstract classes such as persistent identifier, name, title, actor, space, defined term and media object. The core ontologies extend the base ontology to define classes and properties that are common to the resources defined by the domain ontologies. Objects defined by the domain ontologies should be typed against the core ontology or, at a minimum, against the base ontology.
  • In 2011 the TDWG RDF/OWL task group was formed under the TDWG TAG interest group with the focus on mobilizing biodiversity data using semantic web technologies.
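The layering rule described for the TDWG ontology (domain objects should be typed to the core ontology or, failing that, to the base ontology) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the OntologyClass record, the anchoring_layer helper and the class names Collector, HerbariumCollector, SpecimenImage and Orphan are hypothetical; only Actor and MediaObject echo base classes named in the charter.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class OntologyClass:
    name: str
    layer: str                    # "base", "core" or "domain"
    parent: Optional[str] = None  # name of the class this one is typed to


def anchoring_layer(name: str, registry: Dict[str, OntologyClass]) -> Optional[str]:
    """Walk up the parent chain and return the layer ("core" or "base")
    of the first core or base class reached, or None if there is none."""
    seen = set()
    current = registry.get(name)
    while current is not None and current.name not in seen:
        seen.add(current.name)  # guards against cyclic parent links
        if current.layer in ("core", "base"):
            return current.layer
        current = registry.get(current.parent) if current.parent else None
    return None


classes = [
    OntologyClass("Actor", "base"),
    OntologyClass("MediaObject", "base"),
    OntologyClass("Collector", "core", parent="Actor"),                 # hypothetical
    OntologyClass("HerbariumCollector", "domain", parent="Collector"),  # hypothetical
    OntologyClass("SpecimenImage", "domain", parent="MediaObject"),     # hypothetical
    OntologyClass("Orphan", "domain"),                                  # hypothetical, untyped
]
registry = {c.name: c for c in classes}

for domain_class in ("HerbariumCollector", "SpecimenImage", "Orphan"):
    print(domain_class, "->", anchoring_layer(domain_class, registry))
```

A domain class for which the helper returns None would violate the layering rule, which is the kind of check a vocabulary management tool could run before a term is accepted.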


8. Summary

  • This proposal concerns the formation of a task group under TAG, focused on the creation and management of basic vocabularies for biodiversity concepts. The underlying principle is to provide a framework, tools, best practices and routines, starting with the definition of individual terms and concepts. The task group will not develop ontological relationships between the terms and concepts, but will form a basic resource to support such activities. The overall objective is not to define new terms and concepts, but to develop basic vocabularies by reusing, as far as possible, terms defined by the existing TDWG standards and by related standards from other groups. The Vocabulary Management Task Group (VOMAG) will develop the requirements for a Knowledge Organization System (KOS) for biodiversity information resources and evaluate best practices for the management of vocabularies of basic terms. Selected vocabulary management software tools, namely the Semantic MediaWiki, ISOcat and the GBIF Vocabulary Server, will be implemented as prototypes and evaluated by the group.


9. Resources


NOTE: The TDWG Executive Committee has stipulated that by default, products of TDWG will use the Creative Commons CC-BY license. This outcome is also reflected in the standards specification. As Convener, it is your responsibility to ensure that group members are aware of this when the Task Group is formed.


Note: This is a draft charter proposing a new Vocabulary Management Task Group. Please consider joining this task group! You are also most welcome to help us develop the final charter by providing your comments below or by participating in the other discussions taking place here. You may of course also contact any of the convenors by email. Thank you for your assistance.

Greg Whitbread (TDWG TAG) wrote:

"Personally, I would like to see the deliverables focussed more around recommendations as to appropriate technologies for evolution, naming and management of vocabularies - including proposal/guidelines for the establishment of a TDWG repository and the usability statements covering the ground-rules for participation and reuse. I'm not so sure about best practice for application schema and software development - even though these may be a necessary part of the process. If the scope is a system of vocabularies independent of the language or constraining grammar we might use to deploy them. It might also be useful to clearly state the relationship between the TDWG vocabularies and any ontologies that we develop to use them."

Dag Endresen 2209 days ago

I have now updated the charter to reflect some of the proposed improvements from Greg. Thanks!

Dag Endresen 2208 days ago

Comment from Steve Baskauf to the proposed VoMaG charter at the TDWG TAG mailing list: http://lists.tdwg.org/pipermail/tdwg-tag/2012-February/002429.html

Dag Endresen 2204 days ago