This interest group gathers people interested in the topic of persistent identifiers applied to the world of biodiversity informatics.

PID Course: draft agenda

Tuesday 7 February 2012

09:00 - Preparatory meeting for trainers

Trainers will meet to familiarize themselves with the venue, and to discuss and fine-tune the contents.

18:00 - Ice-breaker for all participants (Royal Botanical Garden - CSIC)

20:00 - End of the ice-breaker.

Wednesday 8 February 2012: Introduction and basic concepts

08:30 - Registration

09:00 - Welcome, practicalities, introductions (Alberto, host)
Background information and rationale for the course. Introductions by speakers AND participants (very important to understand who attends the course, their experience and expectations). Review of the programme. Practical information regarding coffee and lunch breaks, schedule, internet access, resources, etc.

10:30 - Coffee/tea break

11:00 - Information architecture and the benefits of persistent identifiers (Greg)
Basic concepts will be reviewed with the group of participants to ensure that everyone has the same understanding. We will start with concepts of information management, information architecture and information workflows. At the end we will review some concepts from databases and data management that will be used throughout the course. Includes 20-30 min. discussion.

12:30 - Lunch break

13:30 - Introduction to persistent identifiers (Nicky)

Definition of PID. Which objects can have an identifier assigned. Proliferation of identifiers. What persistent identifiers allow us to do. The challenge of REAL persistence. Social contract implications of entering the world of persistent identifiers. Present, real examples of robust systems using persistent identifiers. Includes 20 min. discussion, where participants will identify which kinds of objects they work with.

14:30 - Coffee/tea break

15:00 - Types of persistent identifiers (Kevin)
Characteristics of an ideal identifier. Commonly used Persistent Identifiers: URI, PURL, DOI, LSID. Dealing with events that impact an existing Persistent Identifier scheme. Includes 30 min. discussion.

17:00 - Close of the session.

Thursday 9 February 2012: Implementation of Persistent Identifiers

09:00 - Implementing and managing Persistent Identifiers
Choosing a Persistent Identifier scheme. PIDs in local databases. Classes of data for PID assignment. When to change the PID if the data changes. Versioning of PIDs. Deprecation of data records and the effect on the PID. Includes 30 min. discussion.

10:30 - Coffee/tea break

11:00 - Publishing Persistent Identifiers and their data (Nicky)

Recommendations. Vocabularies. Reuse of PIDs. Scope of data. Transferring datasets. Linked data. Web services. Granularity of data, e.g. does a PID refer to a very specific, fine-grained record, or to the "group" of data associated with that record (this affects what is returned when the PID is resolved). Includes 20 min. discussion.

12:30 - Lunch break

13:30 - Vocabularies and their expression in RDF (Kevin)
Recommended vocabularies. Introduction to RDF (syntax, semantics). Tools. Includes 20 min. discussion.

14:30 - Coffee/tea break

15:00 - Resolution and web services (Kevin)
HTTP redirects; SOAP; WSDL. LSID resolution. Includes 30 min. discussion.

17:00 - Close of the session.

Friday 10 February 2012: Practical sessions

09:00 - Database concepts (optional)
Those who need to review some of the database-related concepts will have the opportunity to see them in practice.

10:30 - Coffee/tea break

11:00 - Assigning (and maintaining!) persistent identifiers to your data (Greg could do it instead of the 9:00 session)
Different ways of assigning Persistent Identifiers will be discussed and applied. The most common problems will be identified and common solutions proposed. Software developers will have the opportunity to present proposals of how they plan to modify their software.

12:30 - Lunch break

13:30 - How to set up resolution mechanisms for your Persistent Identifiers (Kevin)
The main aspects of setting up a resolution service for both URIs and LSIDs will be demonstrated and the participants will have the opportunity to set up their own basic resolution services (URL resolver).

14:30 - Coffee/tea break

15:00 - Summary and evaluation of the course (Alberto)

15:30 - Free practice session
The participants will have the opportunity to practice with their own systems and data what they learnt during the course, and they will be given exercises to reinforce the most important concepts.

17:00 - End of the course.


Last updated 1066 days ago by Nicky Nicolson

Often asked questions:
- what does a PID refer to (the DB record, the physical object, etc)?
- what is the difference between data and metadata?
- when do we need to change the version of a PID?
- does a PID have to be resolvable/actionable?
- how do we know which PID to reuse?  Or when we need to create our own PID?
- what namespace should we use for our PIDs?


Kevin Richards 1124 days ago

A discussion in a related group: http://community.gbif.org/pg/forum/topic/18971/real-world-examples-of-lsids/

Also an email conversation that is related (I think LSIDs and their resolution, etc. may be an important thing to cover in the training course, to at least let people get an idea of how much is involved in working with them):

An LSID resolution service is required because by default most software applications do not know how to resolve, using standard HTTP, a URN in the format "urn:lsid:...".

So a service and a protocol have been defined for handling this resolution. The owner of the LSIDs must provide this service and the caller must use the protocol to resolve the (meta-)data associated with an LSID. This is a little limiting for applications like a standard web browser, as they do not have this built in. For this reason we have decided, in TDWG, to recommend that everyone implements the standard HTTP resolution method for LSIDs, so that the following will work:

urn:lsid:example.org:name:123 convert to -> http://example.org/authority/?lsid=urn:lsid:example.org:name:123, which will return the LSID (meta-)data, and will also work by default in web browsers.
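The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name `lsid_to_http` and the `proxy_base` parameter are hypothetical, and the base URL is the example.org placeholder from the text, not a real service. Each LSID owner decides the actual path of their own resolver.

```python
from urllib.parse import quote


def lsid_to_http(lsid: str, proxy_base: str = "http://example.org/authority/") -> str:
    """Build an HTTP URL that asks a resolver to return the (meta-)data for an LSID.

    proxy_base is an assumption: it stands for whatever address the LSID
    owner has chosen for their HTTP resolution service.
    """
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID: {lsid!r}")
    # Leave ':' unescaped so the result has the same form as the URL in the text.
    return f"{proxy_base}?lsid={quote(lsid, safe=':')}"


print(lsid_to_http("urn:lsid:example.org:name:123"))
```

The same function covers resolvers that live at a different path, for instance by passing `"http://lsid.ipni.org/authority/metadata/"` as `proxy_base`.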

What the exact return is depends on the LSID you are resolving, but the return should be XML/RDF, and there are several recommended RDF ontologies and vocabularies that could be used, e.g. DwC RDF, http://rs.tdwg.org/dwc/terms/, and the TDWG ontology, http://wiki.tdwg.org/twiki/bin/view/TAG/LsidVocs, with others in development.

LSIDs are a URN specification (Uniform Resource Name). They were defined more than a decade ago in an attempt to standardise the way identifiers are used in the life sciences communities. They provide all the desired features of an identifier/GUID, e.g. persistence, resolution, opacity... I agree that in these modern times we would probably all just use Linked Data practices, RESTful URIs and standard HTTP, but this is also a fine approach and is recommended by TDWG and GBIF. Which one you choose really depends on the circumstances.
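To make the URN form concrete, here is a minimal sketch that splits an LSID into its parts, assuming the layout `urn:lsid:authority:namespace:object` with an optional trailing revision. The names `Lsid` and `parse_lsid` are made up for this illustration and do not come from any existing library.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str           # e.g. "ipni.org"
    namespace: str           # e.g. "names"
    object_id: str           # e.g. "205294-1"
    revision: Optional[str]  # optional sixth component, None when absent


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its colon-separated components."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"missing 'urn:lsid:' prefix: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty authority, namespace or object: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


print(parse_lsid("urn:lsid:ipni.org:names:205294-1"))
```

A check like this is only syntactic. It says nothing about whether the authority actually runs a working resolution service.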

Examples of LSID services include the TDWG resolver service, http://lsid.tdwg.org, IPNI, e.g. http://lsid.ipni.org/authority/metadata/?lsid=urn:lsid:ipni.org:names:205294-1, Index Fungorum, e.g. http://lsid.indexfungorum.org/authority/metadata/?lsid=urn:lsid:indexfungorum.org:names:213645, Landcare Research, e.g. , ZooBank, e.g. http://zoobank.org/?lsid=urn:lsid:zoobank.org:act:8BDC0735-FEA4-4298-83FA-D04F67C3FBEC

Unfortunately a lot of the LSID services are currently broken.  We are working on getting them back up.

Kevin Richards 1112 days ago