This current community page is also accessible as http:/
The current GBIFKOS Draft White paper is also accessible as
http:/
The GBIFKOS White Paper is a set of recommendations to GBIF for the deployment and community development of Knowledge Organization Systems (e.g. Controlled Vocabularies, Ontologies, Thesauri, etc., and their management) for biodiversity information systems.
You can fetch a copy of the draft White Paper HERE. Instructions for commenting on it are below.
Your comments are meant to inform the White Paper authors and may or may not be incorporated or addressed in the final draft. However, all will be available to GBIF and become part of the public record. In email comments you can indicate whether you wish that your comments not become public.
Regrettably, the time for initial public comment is very short. Anything not received by December 7, 2010 will not influence the final draft of the White Paper, though it will remain available for GBIF to consider as it acts on the White Paper recommendations.
There are several ways you can comment:
We appreciate your comments and look forward to considering them for our advice to GBIF.
Bob Morris, for the GBIFKOS White Paper authors.
Last updated 551 days ago by Bob Morris
The content of your report does not seem to match the scope specified in GBIF's request for proposals.
First, you've included suggestions that do not follow from the report guidelines. If I understand the RFP correctly, it's asking you to address the problem of effective production and use of resources (vocabularies, ontologies, etc. - I am not sure what 'KOS' means so I will say 'resources') *in general* so that we get insight that can be applied across the board. You have a number of items that relate to particular resources, and I think these are out of place. In particular I'm referring to the section 'GBIF participation in KOS standards development', items b, c, d, e, f, g. While these are worthy efforts, they all face common problems. It is the common problems that should be the subject of the report and its recommendations. Forming a bunch of separate incubation groups is not necessarily going to help achieve the kind of economies hinted at in the call.
Second, GBIF has asked for recommendations on governance, multi-lingual vocabularies, persistent identifiers, and dealing with heterogeneity of "modeling" approaches, but you don't touch on any of these. These topics are difficult and important. There is much to say, and much to be figured out.
In fact many of the areas listed in the bullets under 'guidelines' in the call are not discussed at all in your report. For best service to GBIF and its community I would ask that you go over these and make sure that all topics are addressed, if only to say that more work is needed. Here's the location again:
Jonathan Rees 540 days ago
Could you provide a definition of KOS? I'm a bit confused as to whether, say, DwC and ABCD are KOSes, or for that matter RDFS, OWL, XML Schema, or SKOS. Before reading the GBIF call I thought only the latter set were KOSes, but now I wonder if only the former set is. Your report seems to apply it in both senses.
I'm not sure how a set of definitions or a thesaurus would qualify as "knowledge", since definitions aren't falsifiable. If you want simple vocabularies or fiat classifications to be KOSes I suggest you define KOS as a term of art.
Personally I would not say KOSes are a "discipline", but I'm sort of ornery that way. A little evidence to back up this characterization would be helpful.
Definitions of as many other terms as possible would be nice, e.g. "information", "semantic", "concept map".
Best
Jonathan
Jonathan Rees 540 days ago
Saying that 'linked data' is "a somewhat ill-defined concept" seems out of place in this report (see previous comment). Besides, compared to many terms in this arena I think 'linked data' is exceptionally well-defined - the idea is laid out simply here: http:/
Best
Jonathan
Jonathan Rees 540 days ago
The bar graph in the 'KOS familiarity' section is intriguing but I found it hard to match the bars up with the text ("familiar or very familiar") and I found the key to be confusing. One or more what?
Best
Jonathan
Jonathan Rees 540 days ago
I thought that it might be useful to provide some initial comments on the GBIF KOS Report.
There are several issues but I will mention only a few in this email.
The first is "There appear to be no systematic attempts to develop use cases, competency questions, or other goals for use of KOS in the biodiversity informatics community."
What about these resources and efforts that have been going on for several years?
http:/
http:/
http:/
Note that this seems to be the only open SPARQL endpoint that is devoted to biodiversity informatics.
http:/
It is also the SPARQL endpoint for a number of the data sets that are mentioned.
It also has the only examples which use the "IETF scheme for URIs for geographic locations" mentioned in the report.
Also this: "there appears to be no semantically enabled discovery of these resources. Work across subdisciplines is hampered by this, as scientists haphazardly locate resources which may or may not be the most fit for their purpose. For example, a field biologist made aware of ITIS might never become aware of its relationship to the Catalog of Life."
This RDF snippet is from this record ( http:/
By querying one of the various LOD services a human or machine would find this interlinking.
<skos:closeMatch rdf:resource="urn:lsid:ubio.org:namebank:105509"/>
<skos:closeMatch rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>
<skos:closeMatch rdf:resource="http:/
<skos:closeMatch rdf:resource="http:/
<rdfs:seeAlso rdf:resource="http:/
<skos:closeMatch rdf:resource="http:/
<rdfs:seeAlso rdf:resource="http:/
<skos:closeMatch rdf:resource="http:/
<skos:closeMatch rdf:resource="http:/
<skos:closeMatch rdf:resource="http:/
<rdfs:seeAlso rdf:resource="http:/
<geospecies:hasGBIF>13815711</geospecies:hasGBIF>
<geospecies:hasGBIFPage rdf:resource="http:/
<foaf:page rdf:resource="http:/
<geospecies:hasITIS>552479</geospecies:hasITIS>
<foaf:page rdf:resource="http:/
<geospecies:hasNCBI>9696</geospecies:hasNCBI>
<foaf:page rdf:resource="http:/
<geospecies:hasBioLib>id1995</geospecies:hasBioLib>
<geospecies:hasBioLibPage rdf:resource="http:/
<foaf:page rdf:resource="http:/
<geospecies:hasBBCPage rdf:resource="http:/
<foaf:page rdf:resource="http:/
<geospecies:hasGNI>505310</geospecies:hasGNI>
<geospecies:hasGNIPage rdf:resource="http:/
<geospecies:hasWikipediaArticle rdf:resource="http:/
<foaf:page rdf:resource="http:/
<geospecies:hasWikispeciesArticle rdf:resource="http:/
<foaf:page rdf:resource="http:/
<geospecies:hasToLPage rdf:resource="http:/
<foaf:page rdf:resource="http:/
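As a minimal sketch of how a machine could pick up the interlinking shown in the snippet above, the following Python uses only the standard library to read the skos:closeMatch targets and the ITIS identifier from a small RDF/XML document. The two LSIDs and the ITIS number are copied from the snippet; the subject URI and the geospecies namespace URI are placeholders assumed for illustration, not the values used by the real record.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Assumption: placeholder namespace; the real geospecies ontology URI may differ.
GEO = "http://example.org/geospecies#"

# Well-formed wrapper around three lines taken from the snippet above.
# The rdf:about subject is a hypothetical placeholder.
SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}" xmlns:geospecies="{GEO}">
  <rdf:Description rdf:about="http://example.org/species/placeholder">
    <skos:closeMatch rdf:resource="urn:lsid:ubio.org:namebank:105509"/>
    <skos:closeMatch rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>
    <geospecies:hasITIS>552479</geospecies:hasITIS>
  </rdf:Description>
</rdf:RDF>"""


def close_matches(rdf_xml):
    """Return the rdf:resource target of every skos:closeMatch element."""
    root = ET.fromstring(rdf_xml)
    return [el.get(f"{{{RDF}}}resource") for el in root.iter(f"{{{SKOS}}}closeMatch")]


def itis_id(rdf_xml):
    """Return the text of the first geospecies:hasITIS element, or None."""
    root = ET.fromstring(rdf_xml)
    el = root.find(f".//{{{GEO}}}hasITIS")
    return el.text if el is not None else None


if __name__ == "__main__":
    for target in close_matches(SNIPPET):
        print("closeMatch:", target)
    print("ITIS id:", itis_id(SNIPPET))
```

A field biologist's tool that starts from the ITIS identifier could follow the Catalogue of Life LSID found this way, which is the kind of discovery the quoted passage says is missing.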
The Bio2RDF data set is over 15 billion triples on its own http:/
Pete DeVries 540 days ago
Being a member of W3C has costs and benefits, as does being a participant in HCLS. If you are going to recommend participation I think you should make a better case and acknowledge the downside. W3C is a financial commitment, and HCLS is a time commitment. Maybe you can point to accomplishments of HCLS relevant to the GBIF community that would argue that participation is a good idea, or W3C working groups whose recommendations are relevant.
Best
Jonathan
Jonathan Rees 540 days ago
You touch on the issue of ease of KOS (resource) creation vs. ease of use. I think you should bring this distinction to the fore. As these affect different (but overlapping) communities it would be worth analyzing which community is experiencing the most pain, since different interventions will benefit the two communities differentially.
Related to this is the issue of feedback from resource users to developers. Some of the OBO ontologies have paid a lot of attention to this, with well-articulated mechanisms for term submission and review. A survey of practices in and out of the GBIF sphere would be helpful and I think is along the lines of what GBIF asked for.
And related to this (sorry for free associating) is the issue of resources with infrequent major revisions, vs. frequent and fine-grained (term level) update. We see both of these forms happening in practice, and it might be helpful to list the pros and cons of each. (In fact the best run projects seem to do both, with an ongoing frequently updated "current" together with periodic snapshots that are given specific version numbers.)
Best
Jonathan Rees
Jonathan Rees 540 days ago
The recommendation to invest in Bioportal does not seem to be well motivated. It is offered as a "resource repository and directory and as part of the life cycle management of ontologies". I am skeptical that Bioportal would handle the discovery example given ('general gaps' i.). Assuming that ontology life cycle tools that are easy for domain scientists to use (ii.) is a sensible goal - and I'm not sure that it is - you would need to explain how Bioportal addresses this. It doesn't match any of the other gaps listed. So what would it be for?
By way of explaining Bioportal's value proposition, seven features are listed. Most of these are not relevant to the GBIF community, much less related to identified needs (your gap analysis) or the report guidelines in the CFP. Mentioning benefits that are not related to recognized needs makes the entire recommendation suspect.
Perhaps setting up Bioportal, and populating it with known resources, would help with resource discovery. But this process would need to be spelled out in more detail for the case to be made. How much effort is it to set it up? How will it be populated and kept up to date? It may be free, but is that "free as in puppy"? How many user visits would be expected? Would community benefit outweigh costs?
A report on exactly how MMI uses Bioportal, what investment they've made in it, and what benefit they get from it would be helpful.
The authors of the report seem to favor OWL and, by implication, linked data (http: URIs as vocabulary terms). Bioportal should therefore be examined for its support for OWL. For example, does it display logical axioms, or follow links?
To make the case better, you also need to enumerate what alternatives you considered as solutions to whatever problems you think Bioportal would solve - including doing nothing, as well as the ones listed in the CFP. We know that many current ontology efforts seem to be doing fine without Bioportal, and choose not to use it. What are they missing?
I have to admit I am probably prejudiced as my attempts to use Bioportal in my own work have never turned out well. I'm not saying "don't recommend Bioportal", but rather "make it more convincing and focussed".
Best
Jonathan
Jonathan Rees 540 days ago
"Its structure makes it less flexible for new applications than DwC" - could you explain this for those of us not familiar with ABCD? Do you mean the fact that it's a schema means it's hard to build on it and combine it with other things?
Jonathan Rees 540 days ago
"the notion of 'scientific observations' is gaining traction as a useful data modeling abstraction" -- could you support this claim (e.g. with a citation)? Thanks
Jonathan Rees 540 days ago
EOL is certainly interested in working with TDWG and GBIF and ViBRANT and others on vocabulary development and management, particularly with respect to SPM and how it interfaces with other kinds of data of interest to the biodiversity community. I finally had a chance to read through the draft white paper. A comprehensive review of the complex landscape is daunting, and I applaud the authoring group for its collective efforts. However, I admit that I found the organization (full of laundry lists) frustrating and some of the recommendations premature. My notes are available on the document which I will try to attach somewhere.
A more general issue is what audiences GBIF (and/or TDWG) would intend to serve: Knowledge systems can involve both developers as well as domain experts who build vocabularies, and also domain experts who use them (perhaps without knowing it) to achieve their analysis goals. I'm afraid these audience distinctions don't come clearly through in this document.
If there were some tabular way to summarize the feature sets and pros and cons of all the vocabularies, tools, and systems that would be a huge step forward.
Cyndy Parr 539 days ago
The following is posted with his permission, from email of Garry.Jolley-Rogers, CSIRO
Overall comments.
Useful Review. A really good survey and summary.
IMHO, whole of biodiversity domain KOS UNachievable. But piecewise KOS achievable.
Good review of the gaps, especially lifecycle, validation, and the need for citations/ground truthing.
Comments
1. Biodiversity KOS is not as simple as the data triangle analogy would imply. Indeed Knowledge in one Biodiversity domain may be data in another context.
The survey
2. It would help interpretation and analysis to know more about the cohort who answered the survey.
Who? How were they recruited? Representative of whom? Information scientists, managers, biologists? Without context and any details of the survey cohort, I could not contextualize figures or numbers, or make meaningful comparisons.
Impediments to adoption
3. You cannot underplay the limitations arising from insufficient funding and technical support.
** A whole new level of bureaucracy, QC, and data entry are necessary to make this enterprise work. Biodiversity is in no way resourced to do it.
4. Tools are immature / still in development
** This is still the field of the latest new method. Methodology and form remain immature and subject to change. Not yet a firm basis on which to build. Biologists have other more pressing domains to master (e.g. molecular methods and analysis) and so put their efforts elsewhere. This leaves the domain to "second tier and non biological experts" (myself included in this context).
Identification of Needs
5. Agree. The problem posed by the proliferation of vocabularies/ontologies is profound.
** Factual errors are a problem, but even more so are the problems that arise when a vocabulary or ontology is used outside of its original context (domain, basis). We all do it. But how do we not mangle meaning?
6. Change/variability in taxon concepts or phylogenetic trees should not prove a problem if they are applied appropriately... but it is sometimes difficult to do this, and often the necessary information for correct application is omitted.
Current Status of KOS
7. Does the multiplicity of renderings for knowledge pose problems in transition / translation?
8. Our (TRIN's) critique of SPM is overstated. Our systematic survey of past and current practices demonstrates that taxonomic knowledge "categorisations" do not generalize across taxon disciplines and applications (works). Not only do vocabularies vary with taxa but also the relevant knowledge/facts. However, we think SPM and other such "taxon profiles" can be unified by a simple crosswalking tool (and a lot of grunt work). This is where we are working now and then (collaborate) with Cyndy Parr.
Bob Morris 539 days ago
Thank you for the report - very interesting.
I notice, however, that under the heading "Current State of Biodiversity KOS" you haven't mentioned the considerable work being done by the FAO - especially with respect to their AGROVOC Thesaurus (see for example: http://aims.fao.org/website/Ontology-relationships/sub) and their KOS Registry (see http://aims.fao.org/en/website/KOS-Registry/sub). You do mention the Bioversity International work on Crop Wild Relatives, and much of this was done in conjunction with the FAO as part of a UNEP-GEF project.
Also in the Tool Development area, you may wish to look at the AGROVOC Concept Server Workbench (http://aims.fao.org/website/AGROVOC-Workbench/sub2) which they cite as a "Tool that shall help to build and structure multilingual ontologies and terminology systems in a distributed and collaborative environment."
Also in the Linking area, you may wish to look at the NeOn project (http://aims.fao.org/website/NeON/sub2), a €14.7 million project. To quote: "The aim of NeOn is to advance the state of the art in using ontologies for large-scale semantic applications in distributed organizations; and to create the first ever service-oriented, open infrastructure, and associated methodology, to support the development life-cycle of this new generation of semantic applications with economically viable solutions."
I am not sure how any of these may fit into the GBIF projects, but they should at least be cited.
Hope this helps
Arthur D. Chapman
Bob Morris 550 days ago