Interest group about data quality, data cleaning and fitness for use: the TDWG Data Quality Interest Group.


Guideline for listing mechanisms for DQ Validation, Measurement and Improvement

 

Version: 0.1 (Draft)


If you would like to share your ideas, code, web services and/or other resources for checking, measuring or improving DQ, please enter them here: https://docs.google.com/spreadsheets/d/1D8tjfk2ZwubuN5leGAeH674Cz2aX5bdHd-nnOAmE9JM/edit#gid=2045540928

OVERVIEW:

The goal of this spreadsheet is to list and briefly describe DQ mechanisms that may be directly or indirectly reused by the Biodiversity Informatics community.

In this context, a mechanism is anything (algorithm, technique, tool, procedure, protocol, workflow) used to validate (check), measure or improve (through error correction and prevention) DQ.

In this scope, we organize mechanisms into three categories:

  • DQ Validation: Checks whether a data resource complies with a DQ Assertion (a description of how a data resource must be presented to have quality).
  • DQ Measurement: Assigns a quantitative or qualitative value to a DQ Dimension (a measurable quality aspect) of a data resource (dataset or record).
  • DQ Improvement: Prevents data errors or recommends data corrections.
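To make the three categories concrete, the following minimal Python sketch models what each kind of mechanism takes and returns. It assumes that a record is a dict keyed by Darwin Core term names and that a dataset is a list of such records; the type and function names (ValidationResult, Measurement, Recommendation, has_basis_of_record) are hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Assumption: a record is a dict keyed by Darwin Core term names,
# and a dataset is a list of such records.
Record = Dict[str, str]
DataResource = Union[Record, List[Record]]


@dataclass
class ValidationResult:
    assertion: str   # the DQ Assertion that was checked
    compliant: bool  # True if the data resource complies with the assertion


@dataclass
class Measurement:
    dimension: str            # the DQ Dimension that was measured
    value: Union[float, str]  # quantitative (e.g. 0.75) or qualitative (e.g. "high")


@dataclass
class Recommendation:
    term: str       # term whose value should change
    current: str    # value found in the data resource
    suggested: str  # recommended correction


# Hypothetical signatures for the three categories of mechanism.
Validator = Callable[[DataResource], ValidationResult]
Measurer = Callable[[DataResource], Measurement]
Improver = Callable[[DataResource], List[Recommendation]]


def has_basis_of_record(record: Record) -> ValidationResult:
    """Toy validation mechanism: the basisOfRecord term must be filled in."""
    filled = bool(record.get("basisOfRecord", "").strip())
    return ValidationResult(assertion="basisOfRecord is filled in", compliant=filled)
```

For example, has_basis_of_record({"basisOfRecord": "PreservedSpecimen"}).compliant is True, while an empty record gives False. Improvement mechanisms are modeled as returning recommendations rather than editing the data directly.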

In this context, the data resource is what the mechanism actually validates, measures or improves. It can be a single record or a dataset.

The data resource must not be confused with the information element. An information element is a piece of valuable information in the data resource, and it can be single or composed. For instance, the coordinates of species occurrence records can be defined as an information element composed of the decimalLatitude and decimalLongitude Darwin Core (DwC) terms, while basis of record can be defined as an information element related only to the basisOfRecord DwC term.
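The distinction between a data resource and its information elements can be sketched as a mapping from each information element to the DwC terms that compose it. This is a minimal illustration, assuming records are dicts keyed by DwC term names; INFORMATION_ELEMENTS, extract_element and is_composed are hypothetical names.

```python
from typing import Dict, List

# Information elements mapped to the DwC terms that compose them.
# The two entries mirror the examples in the text; the mapping is illustrative.
INFORMATION_ELEMENTS: Dict[str, List[str]] = {
    "coordinates": ["decimalLatitude", "decimalLongitude"],  # composed element
    "basis of record": ["basisOfRecord"],                    # single element
}


def extract_element(record: Dict[str, str], element: str) -> Dict[str, str]:
    """Return only the terms of `record` that make up the information element.

    Terms absent from the record are returned as empty strings.
    """
    return {term: record.get(term, "") for term in INFORMATION_ELEMENTS[element]}


def is_composed(element: str) -> bool:
    """An information element is composed when it spans more than one term."""
    return len(INFORMATION_ELEMENTS[element]) > 1
```

Applied to a record with decimalLatitude "-23.55", decimalLongitude "-46.63" and basisOfRecord "HumanObservation", extract_element(record, "coordinates") keeps only the two coordinate terms, whatever other terms the record (the data resource) carries.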

Mechanisms validate, measure or improve the DQ of an information element of a data resource. For example:

  1. a mechanism validates (checks) whether the taxonomic hierarchy (information element) of a species occurrence record (data resource) is filled down to at least the family level (assertion);
  2. a mechanism measures the completeness (dimension) of the coordinates (information element) of a species occurrence dataset (data resource);
  3. a mechanism improves the scientific name (information element) in the records of a taxon checklist (data resource) by recommending valid names from the Catalogue of Life (CoL) database using a fuzzy matching algorithm (improvement).
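The first example can be sketched as a record-level validation mechanism. This is a minimal sketch under two assumptions: the record is a dict keyed by DwC term names, and "filled down to at least the family level" means that every rank from kingdom through family has a non-blank value (lower ranks such as genus are optional). The names RANKS_TO_FAMILY and hierarchy_filled_to_family are hypothetical.

```python
from typing import Dict

# DwC rank terms from the most general rank down to family (assumed scope).
RANKS_TO_FAMILY = ["kingdom", "phylum", "class", "order", "family"]


def hierarchy_filled_to_family(record: Dict[str, str]) -> bool:
    """Validate the assertion 'the taxonomic hierarchy is filled down to at
    least the family level'. A rank counts as filled when its value is
    present and not only whitespace."""
    return all(record.get(rank, "").strip() for rank in RANKS_TO_FAMILY)
```

A record that also fills genus and specificEpithet still passes, because the assertion only sets a minimum depth; a record with an empty family fails.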
The proposed spreadsheet will contain descriptions of how mechanisms perform validations, measurements and improvements such as those listed above. The spreadsheet is not exhaustive, but since it is meant to be quick and as simple as possible to fill in, it is sufficient for this purpose.

GUIDELINE:
The spreadsheet has three sheets: DQ Validation Mechanisms, DQ Measurement Mechanisms and DQ Improvement Mechanisms.
Directions for filling in each field of each sheet are given below.
DQ Validation Mechanisms
  • Mechanism ID: Identifier of the mechanism, preferably a persistent identifier.
  • DQ Assertion (required): Specification of how data must be presented to have quality in the context of the institution.
  • Assertion Description: Detailed description of the assertion.
  • Schema: Metadata schema that the mechanism will handle.
  • Terms: Terms from the chosen metadata schema that the mechanism will handle. If the information element is single (e.g. basis of record), this field holds only one term; if it is composed (e.g. coordinates), it holds several terms.
  • Type: Type of data resource that the mechanism will handle: single record or dataset.
  • Mechanism description (required): Description of how the mechanism validates the assertion. Use natural language, preferably English.
  • Input: What the mechanism expects as input.
  • Output: What the mechanism returns.
  • Mechanism Implementation: Whenever possible, provide the mechanism's code in a formal language. If the code is complex and consists of several parts, try to simplify it.
  • Language: Language used in the mechanism implementation, for instance SQL, Java, pseudocode or natural language.
  • Code Repository URL: URL of the code or workflow repository, when one exists, so that it can be reused.
  • Documentation URL: URL of the documentation about the mechanism or about how to use the code.
  • Available Mechanism URL: URL of an available, ready-to-use web service or tool that implements the mechanism.
  • Application Constraints: Description of the conditions under which the mechanism can be applied.
  • Reference: Bibliographic reference related to the mechanism.
  • Authorship: Names of the mechanism's authors.
  • Where It Has Been Used: Tools or systems in which the mechanism has been used.
  • Institution: Name of the mechanism's institution.
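When preparing many entries, it may help to represent a row of this sheet in code and check the two required fields before submitting. The sketch below is hypothetical: the attribute names are snake_case versions of the field names above (Type becomes resource_type), and every field is kept as a plain string.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValidationMechanismRow:
    """One row of the 'DQ Validation Mechanisms' sheet (hypothetical model)."""
    dq_assertion: str            # required
    mechanism_description: str   # required
    mechanism_id: str = ""
    assertion_description: str = ""
    schema: str = ""
    terms: str = ""              # e.g. "decimalLatitude, decimalLongitude"
    resource_type: str = ""      # "single record" or "dataset"
    input: str = ""
    output: str = ""
    mechanism_implementation: str = ""
    language: str = ""           # e.g. "SQL", "Java", "Pseudocode"
    code_repository_url: str = ""
    documentation_url: str = ""
    available_mechanism_url: str = ""
    application_constraints: str = ""
    reference: str = ""
    authorship: str = ""
    where_it_has_been_used: str = ""
    institution: str = ""


REQUIRED_FIELDS = ("dq_assertion", "mechanism_description")


def missing_required(row: ValidationMechanismRow) -> List[str]:
    """Return the names of required fields that are blank."""
    return [name for name in REQUIRED_FIELDS if not getattr(row, name).strip()]
```

A row that states only the assertion, for example "The taxonomic hierarchy is filled down to at least the family level", is reported as missing mechanism_description.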
DQ Measurement Mechanisms
  • Mechanism ID: Identifier of the mechanism, preferably a persistent identifier.
  • DQ Dimension (required): Definition of the dimension that will be measured.
  • Dimension Description: Detailed description of the dimension.
  • Schema: Metadata schema that the mechanism will handle.
  • Terms: Terms from the chosen metadata schema that the mechanism will handle. If the information element is single (e.g. basis of record), this field holds only one term; if it is composed (e.g. coordinates), it holds several terms.
  • Type: Type of data resource that the mechanism will handle: single record or dataset.
  • Mechanism description (required): Description of how the mechanism measures the dimension. Use natural language, preferably English.
  • Input: What the mechanism expects as input.
  • Output: What the mechanism returns.
  • Mechanism Implementation: Whenever possible, provide the mechanism's code in a formal language. If the code is complex and consists of several parts, try to simplify it.
  • Language: Language used in the mechanism implementation, for instance SQL, Java, pseudocode or natural language.
  • Code Repository URL: URL of the code or workflow repository, when one exists, so that it can be reused.
  • Documentation URL: URL of the documentation about the mechanism or about how to use the code.
  • Available Mechanism URL: URL of an available, ready-to-use web service or tool that implements the mechanism.
  • Application Constraints: Description of the conditions under which the mechanism can be applied.
  • Reference: Bibliographic reference related to the mechanism.
  • Authorship: Names of the mechanism's authors.
  • Where It Has Been Used: Tools or systems in which the mechanism has been used.
  • Institution: Name of the mechanism's institution.
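As an illustration of the Mechanism Implementation and Language fields, the completeness of coordinates in a species occurrence dataset (the second example in the Overview) could be expressed in SQL. The Python sketch below runs that query against an in-memory SQLite table. The table name occurrence, the convention that a missing coordinate is stored as NULL, and the function name are assumptions.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# SQL that could be entered in the 'Mechanism Implementation' field
# (Language: SQL). Assumption: missing coordinates are stored as NULL.
COMPLETENESS_SQL = """
SELECT CAST(SUM(CASE WHEN decimalLatitude IS NOT NULL
                      AND decimalLongitude IS NOT NULL
                     THEN 1 ELSE 0 END) AS REAL) / COUNT(*)
FROM occurrence
"""


def coordinate_completeness(
    rows: Iterable[Tuple[Optional[float], Optional[float]]]
) -> Optional[float]:
    """Return the fraction of records with both coordinates present.

    Each row is a (decimalLatitude, decimalLongitude) pair; None marks a
    missing value. For an empty dataset the query yields NULL, so the
    function returns None.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE occurrence (decimalLatitude REAL, decimalLongitude REAL)"
        )
        conn.executemany("INSERT INTO occurrence VALUES (?, ?)", rows)
        (value,) = conn.execute(COMPLETENESS_SQL).fetchone()
        return value
    finally:
        conn.close()
```

With four records of which two have both coordinates, the measured completeness is 0.5. A record with only one of the two terms filled counts as incomplete, because the coordinates information element is composed of both.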
DQ Improvement Mechanisms
  • Mechanism ID: Identifier of the mechanism, preferably a persistent identifier.
  • DQ Improvement (required): Definition of the improvement actions.
  • Improvement Description: Detailed description of the improvement actions.
  • Schema: Metadata schema that the mechanism will handle.
  • Terms: Terms from the chosen metadata schema that the mechanism will handle. If the information element is single (e.g. basis of record), this field holds only one term; if it is composed (e.g. coordinates), it holds several terms.
  • Type: Type of data resource that the mechanism will handle: single record or dataset.
  • Mechanism description (required): Description of how the mechanism improves the data (prevents errors or recommends corrections). Use natural language, preferably English.
  • Input: What the mechanism expects as input.
  • Output: What the mechanism returns.
  • Mechanism Implementation: Whenever possible, provide the mechanism's code in a formal language. If the code is complex and consists of several parts, try to simplify it.
  • Language: Language used in the mechanism implementation, for instance SQL, Java, pseudocode or natural language.
  • Code Repository URL: URL of the code or workflow repository, when one exists, so that it can be reused.
  • Documentation URL: URL of the documentation about the mechanism or about how to use the code.
  • Available Mechanism URL: URL of an available, ready-to-use web service or tool that implements the mechanism.
  • Application Constraints: Description of the conditions under which the mechanism can be applied.
  • Reference: Bibliographic reference related to the mechanism.
  • Authorship: Names of the mechanism's authors.
  • Where It Has Been Used: Tools or systems in which the mechanism has been used.
  • Institution: Name of the mechanism's institution.
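The third example in the Overview recommends valid names from CoL using fuzzy matching. The sketch below does not query CoL: it swaps in a local list of accepted names and the similarity ratio from Python's standard difflib module as the fuzzy matcher, so both the reference list and the 0.85 cutoff are assumptions, and recommend_names is a hypothetical name. Records are left unchanged; the function only returns recommendations, in line with the "recommend data corrections" role of improvement mechanisms.

```python
import difflib
from typing import Dict, List, Tuple


def recommend_names(
    records: List[Dict[str, str]],
    valid_names: List[str],
    cutoff: float = 0.85,
) -> List[Tuple[int, str, str]]:
    """Suggest a valid name for each record whose scientificName is not valid.

    Returns (record index, current name, suggested name) tuples. A suggestion
    is made only when difflib's similarity ratio reaches `cutoff`.
    """
    valid = set(valid_names)
    recommendations = []
    for index, record in enumerate(records):
        name = record.get("scientificName", "").strip()
        if not name or name in valid:
            continue  # empty or already valid: nothing to recommend
        # Best single match from the local reference list (stand-in for CoL).
        matches = difflib.get_close_matches(name, valid_names, n=1, cutoff=cutoff)
        if matches:
            recommendations.append((index, name, matches[0]))
    return recommendations
```

For a checklist containing "Puma concolr", "Panthera onca" and "Xyz", matched against ["Puma concolor", "Panthera onca"], only the first record receives a recommendation ("Puma concolor"). The second is already valid and the third has no sufficiently similar name. The cutoff trades recall against the risk of suggesting a wrong but similar name, so it should be tuned per reference list.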

Last updated 997 days ago by Allan Koch Veiga