This group supports the activities for the Nodes associated with the 22nd meeting of the GBIF Governing Board in October 2015.


PRE-COURSE ACTIVITY: Experiences with sample data

Mélianie Raymond
907 days ago

Before the training in Madagascar we would like to start a discussion with Nodes and their teams on experiences with sample-based data:

What are your previous experiences with sample-based data? Have you had contact with communities that hold or produce this type of data? Have you received requests from data users for this type of information? What are your expectations for this new data publishing opportunity?

Cees Hof
903 days ago

Hi all,

Our research community and the (national) monitoring organisations in particular are very pleased with this development. For many years we have heard that GBIF's "occurrence" data format does not allow more specific information on the whereabouts of organisms in nature to be included, and providers find it a pity that all kinds of additional (sample) data they hold cannot be included in GBIF, or only in a rather makeshift way.

We are now planning to extend some of our datasets together with the providers and turn them into sample-based datasets. Pilots will be focussing on:

  • Vegetation data (to include sample methodology, plant coverage and plant community information)
  • Systematic benthic sampling of littoral (marine) systems
  • Standardised monitoring of aquatic biotic and abiotic characteristics of freshwater systems
  • (Maybe) some more complex bird breeding / territory data


Larissa Smirnova
903 days ago

Thank you, Cees, for your comment! It's nice to hear that there is indeed interest in this new functionality! In fact, the examples you mentioned are exactly the cases we plan to talk about!

By the way, let me introduce myself - I'm Larissa Smirnova and I'll be giving an introduction to sample-based data publishing. I'm looking forward to seeing you all and discussing this matter!

Andre Heughebaert
886 days ago

Hi all,

Scientific expeditions produce a lot of 'samples' that are not necessarily linked to species occurrences.

Two recent examples I have in mind: the 2010 Congo River expedition and on-board observations on voyages to Antarctica.

In the first example, the multidisciplinary expedition involved not only botanists and zoologists but also meteorologists, eco-toxicologists, ethnologists, linguists, biochemists... All these scientists gathered plenty of information that is only indirectly related to specimen captures or observations. Until now, it has not been possible to map this information to DwC occurrence data.

The second example is about bird and marine mammal observations made while travelling to and from Antarctica. At regular intervals, environmental measurements (boat position, speed, salinity, temperature...) are recorded. During the same voyage, birds and marine mammals are observed by scientists. Publishing this environmental data as event data, with the related human observations, is now possible: several observations will share the same event (and the same environmental data).
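To make the "several observations share one event" idea concrete, here is a minimal sketch in plain Python dictionaries, assuming a layout with an Event core plus Occurrence and MeasurementOrFact extensions keyed by `eventID`. The Darwin Core term names (`eventID`, `samplingProtocol`, `measurementType`, `occurrenceID`, `basisOfRecord`, ...) are real, but the identifiers, species and values are hypothetical, and the helper functions are illustrative, not part of the IPT or any GBIF tool.

```python
from collections import defaultdict

# Hypothetical records: one event (a one-hour watch on board) with its
# environmental measurements and two human observations sharing that event.
events = [
    {"eventID": "voyage1-watch001",
     "eventDate": "2015-01-10T08:00/2015-01-10T09:00",
     "samplingProtocol": "ship-based bird and marine mammal watch"},
]
measurements = [  # MeasurementOrFact rows, linked to the event by eventID
    {"eventID": "voyage1-watch001", "measurementType": "salinity",
     "measurementValue": "34.2", "measurementUnit": "PSU"},
    {"eventID": "voyage1-watch001", "measurementType": "water temperature",
     "measurementValue": "1.5", "measurementUnit": "°C"},
]
occurrences = [  # Occurrence rows, linked to the same event by eventID
    {"eventID": "voyage1-watch001", "occurrenceID": "occ-1",
     "scientificName": "Diomedea exulans", "basisOfRecord": "HumanObservation"},
    {"eventID": "voyage1-watch001", "occurrenceID": "occ-2",
     "scientificName": "Megaptera novaeangliae", "basisOfRecord": "HumanObservation"},
]


def environment_by_event(measurements):
    """Group measurement rows as {eventID: {measurementType: (value, unit)}}."""
    env = defaultdict(dict)
    for m in measurements:
        env[m["eventID"]][m["measurementType"]] = (m["measurementValue"], m["measurementUnit"])
    return env


def occurrences_with_environment(events, measurements, occurrences):
    """Pair each occurrence with the environmental data recorded once for its event."""
    known = {e["eventID"] for e in events}
    env = environment_by_event(measurements)
    return [(o["occurrenceID"], o["scientificName"], env.get(o["eventID"], {}))
            for o in occurrences if o["eventID"] in known]


for row in occurrences_with_environment(events, measurements, occurrences):
    print(row)
```

The environmental values are stored once per event rather than copied onto every observation, which is the practical gain over the flat occurrence format.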

I'm very excited by the new possibilities offered by sample-based data.


Hanna Koivula
886 days ago

I'm very excited about the possibility of publishing multiple observations under one event. We are investigating this data model for monitoring invasive alien species. There, data are collected from a site where conditions are recorded along with presence/absence observations of species. I'm interested in any use cases on this topic!
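One way to express presence/absence under a single site visit is to emit one occurrence row per target species, recording absences explicitly with the Darwin Core term `occurrenceStatus` (values "present" / "absent"). This is a minimal sketch under that assumption; the function name, the ID scheme and the species list are hypothetical and only illustrate the pattern.

```python
def presence_absence_rows(event_id, target_species, observed):
    """Build one Occurrence row per target species for a single site visit.

    Species on the target list that were not observed get occurrenceStatus
    "absent", so the absence is published as data rather than left implicit.
    """
    observed_set = set(observed)
    rows = []
    for i, name in enumerate(target_species, start=1):
        rows.append({
            "eventID": event_id,                    # links every row to the site visit
            "occurrenceID": f"{event_id}-occ{i}",   # hypothetical ID scheme
            "scientificName": name,
            "occurrenceStatus": "present" if name in observed_set else "absent",
            "basisOfRecord": "HumanObservation",
        })
    return rows


# Hypothetical target list of invasive alien species checked at one site visit.
targets = ["Heracleum mantegazzianum", "Impatiens glandulifera", "Reynoutria japonica"]
for row in presence_absence_rows("site01-2015-06-15", targets, ["Impatiens glandulifera"]):
    print(row["scientificName"], row["occurrenceStatus"])
```

Publishing absences this way only makes sense when the protocol guarantees that every target species was actually searched for during the event.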



886 days ago

Thank you, André, for these nice examples!

The first one (the Congo expedition) is particularly interesting because, if it works, the same approach will apply to the very exciting field of historical expeditions and the diaries of famous travellers. Many of these are already digitized (see The Field Book Project), and if they were published in GBIF, combined for example with itinerary visualization techniques (e.g. CartoDB) and illustrated with images, the result could be very nice and could attract a wider audience to biodiversity issues!

The second example looks suitable too. Remember that sample-based data should preferably have a fixed location, be repeatable and follow a defined protocol.

I hope that trying to publish different use cases will enrich our understanding of how it works and what the restrictions, issues, problems and benefits are, and will lead to a fruitful discussion!

Anne-Sophie Archambeau
882 days ago

Hi all,

Yes, it is a big step, and one our research community has been eagerly awaiting. At the national level, we already work with the Foundation for Research on Biodiversity, which includes a network of biodiversity research observatories. Two weeks ago we organised a training on data publication and talked about this new type of data. The participants were really interested.

Sophie Pamerlon is currently working on connecting new datasets; she will explain.

See you


Sophie Pamerlon
882 days ago

Hi all!

Indeed, during our training event last week we had a lot of positive feedback and requests for more help with sample-based data publishing. I already have contacts in several institutions that have sent, or are ready to send, me sample-based data for testing the new IPT extension. The data are quite diverse, from vegetation plots to ornithology listening stations and coral-reef abundance sampling. I will be able to test at least one of them (the ornithology dataset) during the Nodes training, and I already have some questions about the mapping to ask Larissa :)

Depending on the nature of the dataset, I think a few more DwC terms for abundance and other types of information (for example, details on the weight and size of samples) will be needed, as well as technical documentation for those of our data publishers who are already familiar with the IPT and know how to use it.

I am really looking forward to learning new things about sample-based data publishing during GB22 so I can put them to immediate use in helping our data publishers!

Best regards and see you soon in Madagascar,


Larissa Smirnova
881 days ago

Hi Sophie! Happy to see you again! You can always ask me, though I'm not sure I'll have the answer :)

But I'm pretty sure that together we will find a solution! There are indeed several new DwC terms, for example "sampleSizeValue"/"sampleSizeUnit" and "organismQuantity"/"organismQuantityType". We will talk about them during the training, but you can already find more information on our group page.
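As an illustration of why these terms come in pairs, here is a minimal sketch that derives a density from an event's `sampleSizeValue`/`sampleSizeUnit` and an occurrence's `organismQuantity`/`organismQuantityType`. The term names are Darwin Core; the `density` helper, the plot ID and the values are hypothetical, and this is not how GBIF or the IPT processes the data. The point is that a quantity is only comparable across datasets when its type and the sample size are known.

```python
def density(event, occurrence):
    """Return (value, unit) for individuals per unit of sample size, or None.

    Only counts of individuals are converted; other quantity types, such as
    a Braun-Blanquet class, are not counts and are left alone.
    """
    if occurrence["eventID"] != event["eventID"]:
        raise ValueError("occurrence does not belong to this event")
    if occurrence.get("organismQuantityType") != "individuals":
        return None
    size = float(event["sampleSizeValue"])
    if size <= 0:
        return None
    count = float(occurrence["organismQuantity"])
    return count / size, f"individuals per {event['sampleSizeUnit']}"


# Hypothetical vegetation plot of 25 square metres with two occurrence rows.
event = {"eventID": "plot-07", "sampleSizeValue": "25", "sampleSizeUnit": "square metre"}
counted = {"eventID": "plot-07", "organismQuantity": "10",
           "organismQuantityType": "individuals"}
scored = {"eventID": "plot-07", "organismQuantity": "2a",
          "organismQuantityType": "Braun-Blanquet Scale"}

print(density(event, counted))  # (0.4, 'individuals per square metre')
print(density(event, scored))   # None
```

Keeping the quantity type explicit lets cover-abundance classes and raw counts sit in the same dataset without being confused.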

All the best,


875 days ago

This new facility is really interesting for us (ARCOS). For instance, we conduct freshwater biodiversity surveys under our monitoring programme, and until now we could not publish data from these expeditions. We record data on the physical and chemical properties of the water and inventory the aquatic biodiversity found at these sites. Being able to publish this kind of data will add value to the work being carried out by many stakeholders.

Cees Hof
875 days ago

Maybe at the training we could also discuss some of the more general issues around sample-based data, such as how to avoid everybody in the (GBIF) network using these new possibilities in slightly different ways and ending up with datasets that cannot be compared properly. I know this is a challenge, but we should prepare for it.

Cees Hof
875 days ago

Another challenge I see for sample-based data is fisheries data: fishing techniques, net types, mesh sizes, duration, currents, salinity, tides, upstream vs. downstream, length classes, weight classes... We find it difficult to get these fisheries data into GBIF in a sensible way. It is also almost impossible not to discuss common vocabularies in this context. And what would be the ideal set-up for sample-based fisheries data in GBIF, considering the common practices set out in, for example, the EU Water Framework Directive?

Gautam Talukdar
871 days ago

At our institute we record data from point counts, quadrats and transects. I hope to get these kinds of data published through the sample-based framework.

Dairo Escobar
869 days ago

Hi all!

At SiB Colombia we are standardizing a resource of vegetation plots that will be monitored for several years; the data include measurements of both the event and the organisms. We are currently discussing how to handle this kind of data using the occurrenceCore and the new eventCore.

We propose to create two related resources: the first describing the biotic and abiotic data of each plot (eventCore + measurementOrFacts + relevé), and the second describing the organisms and their functional traits (occurrenceCore + measurementOrFacts). The resources will be related through eventID and parentEventID.
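When two resources are linked only by identifiers, it helps to check that the links actually resolve before publishing. The sketch below assumes the layout proposed above: an event resource where yearly plot visits point to the permanent plot through `parentEventID`, and an occurrence resource whose rows carry an `eventID`. The helpers `check_links` and `lineage`, and all IDs, are hypothetical; this is not an IPT or GBIF validator.

```python
def check_links(events, occurrences):
    """List identifiers that do not resolve between the two resources."""
    ids = {e["eventID"] for e in events}
    problems = []
    for e in events:
        parent = e.get("parentEventID")
        if parent and parent not in ids:
            problems.append(f"event {e['eventID']}: unknown parentEventID {parent}")
    for o in occurrences:
        if o["eventID"] not in ids:
            problems.append(f"occurrence {o['occurrenceID']}: unknown eventID {o['eventID']}")
    return problems


def lineage(events, event_id):
    """Follow parentEventID links from an event up to its top-level event."""
    parents = {e["eventID"]: e.get("parentEventID") for e in events}
    chain, seen = [], set()
    while event_id and event_id not in seen:  # 'seen' guards against cycles
        seen.add(event_id)
        chain.append(event_id)
        event_id = parents.get(event_id)
    return chain


# Resource 1 (event core): a permanent plot and its yearly visits.
plot_events = [
    {"eventID": "plot-01"},
    {"eventID": "plot-01-2015", "parentEventID": "plot-01"},
    {"eventID": "plot-01-2016", "parentEventID": "plot-01"},
]
# Resource 2 (occurrence core): organisms recorded during the visits.
organism_occurrences = [
    {"occurrenceID": "occ-1", "eventID": "plot-01-2015"},
    {"occurrenceID": "occ-2", "eventID": "plot-01-2016"},
    {"occurrenceID": "occ-3", "eventID": "plot-01-2017"},  # visit not yet published
]

print(check_links(plot_events, organism_occurrences))
print(lineage(plot_events, "plot-01-2015"))
```

A check like this catches occurrences that would end up orphaned if the two resources are updated on different schedules.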


At the training, we would like to hear what you think about this approach, whether anyone is dealing with the same kind of data and how you addressed it, and whether we will also need to relate the resources through the resourceRelationship extension.