This group supports the activities for Nodes associated with the 22nd meeting of the GBIF Governing Board in October 2015.

Work plan group - BLUE - Improving the quality of GBIF mediated data

Improving the quality of GBIF mediated data



1. Description of the NC work plan stream

This stream focuses on fitness for use, improved metadata and data completeness. Nodes have varied experience with this subject. Sharing tools, mechanisms and best practices on the one hand, and extending the data trends of the global portal on the other, will benefit both data publishers and data users.
GBIF, jointly with TDWG, has set up an interest group that aims to advance discussions on the quality of biodiversity data, how to improve that quality through data cleaning techniques, and how to evaluate and increase the data's fitness for use.

2. Suggested outcomes of the NC work plan stream

Please improve the list of possible outcomes after two years of implementing the Nodes Committee work plan:

  • Improved information resources on best practices in Data Quality

3. Suggested activities for the NC work plan stream

Please expand the list of possible activities for the two-year Nodes Committee work plan, for example:

  • Use-cases of Quality Assurance
  • Improved reporting of dataset quality/completeness issues on GBIF.org
  • e-algorithms or workflows (for example BioVEL and ALA projects)
  • Agreement on how to collaboratively update the inventory of data tools

4. Actions suggested by regional work plans or the NSG

The following list could be used as a resource from which to draw activities for the Nodes Committee work plan stream on promoting best practices in Data Quality for Nodes.

  • Europe
    • Share check algorithms
    • Document pre-publishing checks
    • Expose post-publishing checks
    • Carry out internal Data Gap Analysis
  • Latin America
    • Regional DQ workshops
    • Support DQ Tools development
    • Beta testing of GBIF DQ tools
    • Publish DQ estimated index of resources
  • NSG
    • Data cleaning tools, follow up on tools inventory
    • Training on data quality 
    • Determining recommended data quality checks at Node level
    • Data completeness (filling the gaps)/ Data Enrichment
    • Promoting data quality tools

5. Information resources already available in GBIF



6. Instructions for the NC work plan group at GNM13

The five work plan groups of streams 4-8 will work with the initial content on the topic of each stream at the community site and will consolidate and expand this content per stream. The results of the work plan groups will form the basis of the NC work plan descriptions for the relevant streams.

The description of the Nodes Committee work plan 2016-2017 can be found here.

Last updated 872 days ago by Andre Heughebaert






Facilitated by: André Heughebaert (Belgium)



Dag Terje Filip Endresen (Norway)

Faustin Gashakamba (ARCOS)

Innocent Akampurira (Uganda)

Moulaye Mohamed Baba Ainina (Mauritania)

Razfimpahanana Andriamandimbisoa (Madagascar)

Sophie Pamerlon (France)

Wouter Addink (Species 2000 & Naturalis)


WP activities

Automatic cleaning has issues

·      Promote tools to use for data cleaning by data owners (fix issues at the source)

·      Integrate some tools into the GBIF portal indexing routines for identification of data quality improvement potentials

·      Data improvement to be decided by the data publisher/owner

·      Tools (inside the GBIF portal and elsewhere) can assist the data owners in improving data quality

·      Documentation of the data cleaning processes, capture data provenance

·      Responsibility of the node to report back to the data publisher


Data cleaning guidelines

·      Documentation on how to clean datasets should be captured in one place and improved by the more advanced nodes.

·      Possibility of updating the Arthur Chapman data cleaning guidelines.


GBIF data portal can show data quality improvement potentials

·      Some now on taxonomy, time, localities – for all data mediated by the …

·      Indicators of data completeness inside the GBIF data portal

  • Indicators at Dataset, Data publisher and Node level

·      Color codes or highlighting with star system for showing data quality

·      More visibility of data quality/completeness in the data portal

·      Highlight reversed longitude and latitude

·      Data quality indicators to be integrated in the GBIF portal.

·      Feedback to the data provider that the data has been cleaned by the GBIF indexing routines…?

·      Partnerships with other portals and tools, e.g. OpenRefine
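
The "reversed longitude and latitude" item above can be illustrated with a minimal sketch. It assumes records are plain dictionaries keyed by the Darwin Core terms decimalLatitude and decimalLongitude; the function name is hypothetical and this is not how the GBIF indexing routines are implemented. It only catches the case where the latitude is out of range but the pair would be valid if swapped.

```python
def looks_swapped(record):
    """Flag a record whose latitude is out of range but would be valid if
    latitude and longitude were exchanged (hypothetical helper)."""
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        # Missing or non-numeric coordinates are a different issue; do not flag.
        return False
    latitude_out_of_range = abs(lat) > 90
    valid_if_swapped = abs(lon) <= 90 and abs(lat) <= 180
    return latitude_out_of_range and valid_if_swapped


if __name__ == "__main__":
    suspicious = {"decimalLatitude": "120.5", "decimalLongitude": "45.0"}
    plausible = {"decimalLatitude": 50.8, "decimalLongitude": 4.4}
    print(looks_swapped(suspicious), looks_swapped(plausible))
```

A swap where both values lie within ±90 cannot be detected this way; that would need extra evidence, such as comparing the point with the stated country. In line with the principle above, the flag is meant as feedback to the data publisher, not as an automatic correction.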



Data cleaning data standards – TDWG standardization

·      Prioritize Darwin Core terms for addressing data cleaning

·      Guidelines for how to map specific types of datasets

·      Let practices be developed and established

·      Highlight the most important Darwin Core terms to address in data cleaning – and to measure completeness of reported data values

·    Update the IPT with color-coded mandatory/strongly recommended DwC terms
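
As a rough illustration of measuring completeness for prioritized Darwin Core terms, the sketch below computes the share of records with a non-empty value per term. The list of priority terms is only an example chosen for illustration, not an agreed recommendation, and the function names are hypothetical.

```python
# Example priority list only; the actual set of terms is for the group to agree on.
PRIORITY_TERMS = ["scientificName", "eventDate", "decimalLatitude",
                  "decimalLongitude", "basisOfRecord"]


def is_filled(value):
    """Treat None and blank strings as missing; 0 counts as a real value."""
    return value is not None and str(value).strip() != ""


def term_completeness(records, terms=PRIORITY_TERMS):
    """Return, per term, the fraction of records with a non-empty value."""
    total = len(records)
    return {
        term: (sum(1 for r in records if is_filled(r.get(term))) / total
               if total else 0.0)
        for term in terms
    }


if __name__ == "__main__":
    sample = [
        {"scientificName": "Puma concolor", "eventDate": "2015-10-05",
         "decimalLatitude": 0, "decimalLongitude": 4.4},
        {"scientificName": " ", "eventDate": None},
    ]
    for term, share in term_completeness(sample).items():
        print(f"{term}: {share:.0%}")
```

The same per-term fractions could be aggregated at dataset, data publisher or node level to feed the completeness indicators discussed above.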



One year actions


Nodes self-assessment

·      Map the capabilities of nodes to offer data cleaning services for respective data publishers/owners

·      Ask/survey nodes to make a self-assessment of capabilities


Data use cases for the data that is included in the GBIF network

·      Describe data cleaning approaches

·      Approaches for identifying data gaps


Dataset metadata completeness and update routines

·      Routines for following up with data publishers to keep metadata up to date

·      Refine the mandatory (and highly recommended) metadata for each data type (occurrence, taxon, sample-based datasets)

·      Guidelines and best practices for how to work with data publishers to capture and maintain dataset metadata


Training activities/materials on data quality

·      Based on the Arthur Chapman guide – to update with more recent tools

·      Develop data cleaning guidelines within one year – update of the Arthur Chapman document





·    Focus on data quality during BID/BIFA related trainings


Dag Endresen 869 days ago