This group aims to be a platform for general discussion on issues related to biodiversity data publishing


Publishing statistics

Hannu Saarenmaa
1572 days ago

The new portal is nice, but I am missing one thing: data sharing statistics. How much data is each GBIF Participant sharing? This used to be on the front page, and it created peer pressure in a positive way.

Would be nice to do this also along the lines of taxonomy. Who is leading in sharing, for example, bird, plant, and butterfly data?

Also statistics related to data "quality" would be nice.  Whose data is most complete and without issues?

Also statistics of data use.  Whose data has been downloaded most?

Some competition, please.

Andrea Hahn
1572 days ago

Thank you for introducing this topic, Hannu. Just to let everybody know, we linked it from our issue tracking / feature request system at http://dev.gbif.org/issues/browse/PF-1300, to make sure we keep the issue in mind and capture more details as they become available.

It would be very interesting for us to learn which of such comparative statistics people from the GBIF community would consider essential / important / nice to have / maybe risky, and in what kind of situations you would be most likely to want to have any specific one of them (or others?) available. Please let us know!

Cees Hof
1570 days ago

Hi Hannu, Andrea; indeed I mailed the same question to portal@gbif.....

The ranking is important: I can say from experience that a clear top-ten position helped us to gain a more stable financial position. Some competition amongst the highly ranked is good; it keeps us sharp.

Getting the old statistics back would be good (ranking by country / organisation, including providers per country, including datasets per provider including the size of the datasets).

Adding even some more characteristics such as no. of IPTs, and other technical installations, would be nice.

On the other hand, I can see the problem of always having the same top providers in the picture; some Nodes will never reach the top 10 or 20, as they simply don't have that amount of data available at all. We would therefore also need some other statistics:

Data density would be nice (no. of records per m2), but obviously with marine data separated from terrestrial data, so that requires some additional data processing.

Relative growth, absolute growth, per month or other time unit.

Statistics by region would also level-out the large differences we have in the entire community.
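As a rough illustration of the density and growth figures suggested above, here is a minimal Python sketch. The function names, the choice of area unit, and the handling of a zero baseline are assumptions made for the example; nothing here comes from the portal itself.

```python
def growth(previous, current):
    """Return (absolute, relative) growth between two record counts.

    Relative growth is None when the previous count is zero,
    because the ratio is undefined in that case (assumed convention).
    """
    absolute = current - previous
    relative = absolute / previous if previous else None
    return absolute, relative


def density(records, area):
    """Records per unit of area; the caller decides the unit.

    Marine and terrestrial records would need separate counts
    and separate areas, as noted in the post.
    """
    return records / area
```

For example, `growth(1000, 1250)` gives an absolute growth of 250 records and a relative growth of 0.25 for the chosen time unit.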

We need to figure out what is technically possible, assuming we are not going to spend too much programming time on this issue.

I agree with Hannu on the more in-depth statistics for datasets, but that might be quite a challenge.

Visualizing (real time) data traffic.... maybe something for a supplementary project?

Greetings, Cees




1568 days ago

This exercise can be more difficult than would be expected, because running a competition (which I like) needs some rules. And when I look at the portal, I clearly see that there are no rules regarding "What is a dataset exactly".

To become a top publisher, I could cut my "Florabank" dataset up by sampled grid square and make every square a dataset... Have a look at the number of published datasets by provider and you will see what I mean...

But, like Cees says, there might be other, easier metrics.

But indeed, this stuff needs a little more thought :)




Hannu Saarenmaa
1559 days ago

Cees' point that it is always the same top-ten participants is good and warrants some thought.

On number of records you can only score if there is a big birdwatcher community behind you. We need other metrics and filters as well. Filtering by taxonomy might work. Who is leading in butterflies? Or by basisOfRecord: only show specimens.

I once divided records by GNP. Guess which participant was leading? Little Iceland! It is not only because of size: they will soon be done digitising all of their collections.
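A normalised ranking of this kind could be sketched as follows in Python. This is only an illustration: the function name and the input dictionaries are hypothetical, and real record counts and GNP figures would have to be supplied by whoever runs it.

```python
def rank_by_ratio(records, gnp):
    """Rank participants by records divided by GNP, highest first.

    `records` and `gnp` are dicts keyed by participant name.
    Participants with a missing or zero GNP value are skipped.
    """
    ratios = {
        name: records[name] / gnp[name]
        for name in records
        if gnp.get(name)
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)
```

The same function works for any other denominator, such as area or population, which is one way to let smaller participants appear at the top of a list.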

Hannu Saarenmaa
1551 days ago

To continue on this theme, I just heard a remark from a big natural history museum: "Data from museums disappears into GBIF. Museums should build their own portal to present only collection data."

I think that it would be a shame if GBIF can't do it for them. Why not build views in the portal for the various communities?

Tim Robertson
1550 days ago

Hi Hannu.  It is certainly possible, and ALA have done this to good effect with:



These are both sites hosted and run by ALA and all point at the same backend which is http://ala.org.au/

This is something the GBIFS could consider with enough demand, or there is nothing stopping thematic communities from doing the same, starting by using the GBIF API.


The GBIF website is almost exclusively driven from the API, so it is quite feasible that others could build similarly scoped examples.
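As a starting point, a thematic view such as "specimens only" could begin with a count query against the API. The sketch below is a minimal Python example, assuming the public occurrence search endpoint under https://api.gbif.org/v1 and its `publishingCountry`, `basisOfRecord`, and `limit` parameters; the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed root of the public GBIF API.
API_ROOT = "https://api.gbif.org/v1"


def occurrence_count_url(publishing_country, basis_of_record=None):
    """Build an occurrence search URL that asks only for the total count."""
    params = {"publishingCountry": publishing_country, "limit": 0}
    if basis_of_record:
        params["basisOfRecord"] = basis_of_record
    return f"{API_ROOT}/occurrence/search?{urlencode(params)}"


def occurrence_count(publishing_country, basis_of_record=None):
    """Fetch the number of matching records (needs network access)."""
    url = occurrence_count_url(publishing_country, basis_of_record)
    with urlopen(url) as response:
        return json.load(response)["count"]
```

Calling `occurrence_count("IS", "PRESERVED_SPECIMEN")` would then return the number of specimen records published from Iceland, which a community site could use for its own statistics page.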

Alan Williams
1550 days ago

You could use a framework similar to Scratchpads, so the data owners could host and specialize their own website(s) but there would be common capabilities to show GBIF-related information for the data owner.