This stream focuses on fitness-for-use, improved metadata and data completeness. Nodes have varied experience with this subject. Sharing tools, mechanisms and best practices on the one hand, and extending the data trends of the global portal on the other, will eventually benefit both data publishers and data users.
GBIF has set up, jointly with TDWG, an interest group that aims to advance the discussion about the quality of biodiversity data, how to improve that quality through data cleaning techniques, and how to evaluate and increase the data's fitness-for-use.
Please improve the possible outcomes after a two-year implementation of the Nodes Committee work plan:
Please expand the list of possible activities for the two-year Nodes Committee work plan, for example:
The following list of activities could be used as a resource from which to draw activities for the stream on promoting best practices in Data Quality for Nodes in the Nodes Committee work plan.
The five work plan groups of streams 4-8 will work with the initial content on the topic of each stream at the community site and will consolidate and expand this content per stream. The results of the work plan groups will form the basis of the NC work plan descriptions for the relevant streams.
The description of the Nodes Committee work plan 2016-2017 can be found here.
Last updated 570 days ago by Andre Heughebaert
Notes from the WP Blue discussion:
Dag Endresen 568 days ago
WP Group Blue
Facilitated by: André Heughebaert (Belgium)
Dag Terje Filip Endresen (Norway)
Faustin Gashakamba (ARCOS)
Innocent Akampurira (Uganda)
Moulaye Mohamed Baba Ainina (Mauritania)
Razfimpahanana Andriamandimbisoa (Madagascar)
Sophie Pamerlon (France)
Wouter Addink (Species 2000 & Naturalis)
Automatic cleaning has issues
· Promote tools to use for data cleaning by data owners (fix issues at the source)
· Integrate some tools into the GBIF portal indexing routines for identification of data quality improvement potentials
· Data improvement to be decided by the data publisher/owner
· Tools (inside the GBIF portal and elsewhere) can assist the data owners in improving data quality
· Documentation of the data cleaning processes, capture data provenance
· Responsibility of the node to report back to the data publisher
Data cleaning guidelines
· Documentation on how to clean datasets should be captured in one place and improved by the more advanced nodes.
· Possibility of updating the Arthur Chapman data quality cleaning guidelines.
GBIF data portal can show data quality improvement potentials
· Some now on taxonomy, time, localities – for all data mediated by the …
· Indicators of data completeness inside the GBIF data portal
Indicators at Dataset, Data publisher and Node level
· Color codes or highlighting with star system for showing data quality
· More visibility of data quality/completeness in the data portal
· Highlight reversed longitude and latitude
· Data quality indicators to be integrated in the GBIF portal.
· Feedback to the data provider that the data has been cleaned by the GBIF indexing routines…?
· Partnerships with other portals, e.g. OpenRefine
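As an illustration of the "reversed longitude and latitude" indicator in the list above, here is a minimal sketch in Python. It assumes each record carries the Darwin Core terms decimalLatitude and decimalLongitude as numbers. The function name and the flag names are hypothetical, chosen for this example only, and are not the flags used by the GBIF indexing routines. A range check of this kind can only detect a swap when the reported latitude falls outside ±90; a swap where both values lie within ±90 would need an additional check, for example against the stated country.

```python
def coordinate_flags(lat, lon):
    """Return hypothetical quality flags for one latitude/longitude pair."""
    if lat is None or lon is None:
        return ["COORDINATES_MISSING"]

    lat_ok = -90 <= lat <= 90
    lon_ok = -180 <= lon <= 180
    if lat_ok and lon_ok:
        return []  # nothing to highlight from a range check alone

    # Latitude is out of range, but the pair would be valid if swapped.
    if not lat_ok and -90 <= lon <= 90 and -180 <= lat <= 180:
        return ["POSSIBLY_SWAPPED"]

    return ["OUT_OF_RANGE"]


# Example use: the second pair looks like longitude and latitude reversed.
for pair in [(50.8, 4.4), (120.5, 45.0), (50.8, 200.0)]:
    print(pair, coordinate_flags(*pair))
```

In line with the notes above, such a flag would only be reported back to the data publisher, who decides whether and how to correct the source data.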
Data cleaning data standards – TDWG standardization
· Prioritize Darwin Core terms for addressing data cleaning
· Guidelines for how to map specific types of datasets
· Let practices be developed and established
· Highlight the most important Darwin Core terms to address in data cleaning – and to measure completeness of reported data values
· Update the IPT with color-coded mandatory/strongly recommended DwC terms
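Measuring the completeness of reported values for prioritized Darwin Core terms, as suggested in the list above, could be sketched as follows. The terms themselves are real Darwin Core terms, but which ones to prioritize is an assumption made for this example, as are the function names. Records are assumed to be plain dictionaries keyed by term name.

```python
# Assumed priority list; the actual selection is for the group to decide.
PRIORITY_TERMS = [
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "basisOfRecord",
]


def is_filled(value):
    """A value counts as reported if it is not None and not blank."""
    return value is not None and str(value).strip() != ""


def term_completeness(records, terms=PRIORITY_TERMS):
    """Return, per term, the fraction of records with a reported value."""
    total = len(records)
    return {
        term: (sum(1 for r in records if is_filled(r.get(term))) / total
               if total else 0.0)
        for term in terms
    }


# Example use with two made-up records.
sample = [
    {"scientificName": "Puma concolor", "eventDate": "2015-06-01",
     "decimalLatitude": 0, "decimalLongitude": ""},
    {"scientificName": " ", "eventDate": None},
]
for term, share in term_completeness(sample).items():
    print(f"{term}: {share:.0%}")
```

The same fractions could be aggregated per dataset, data publisher or node to feed the indicators discussed earlier in these notes.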
One year actions
· Map the capabilities of nodes to offer data cleaning services for respective data publishers/owners
· Ask/survey nodes to make a self-assessment of capabilities
Data use cases for the data that is included in the GBIF network
· Describe data cleaning approaches
· Approaches for identifying data gaps
Dataset metadata completeness and update routines
· Routines for following up with data publishers to keep metadata up to date
· Refine the mandatory (and highly recommended) metadata for each data type (occurrence, taxon, sample-based datasets)
· Guidelines and best practices for how to work with data publishers to capture and maintain dataset metadata
Training activities/materials on data quality
· Based on the Arthur Chapman guide – to update with more recent tools
· Develop data cleaning guidelines within one year – update of the Arthur Chapman document
· Focus on data quality during BID/BIFA related trainings