Hi Tim,

Thanks, that's all useful information. You confirmed what I suspected regarding the dates of publication.

As I was working with the data I noticed several problems and biases that confuse the results. A subset of 100 species is really too small to start breaking down into further subsets. As you noted, birds are over-represented.

Perhaps the only take-home message we should make at the moment is that publishing of these data is not fast, and that it would need to be improved if GBIF were to be used for early warning. I would not draw any conclusions about the improvement in speed of publication. This is a small and biased subset.

Regards,
Quentin

Dr. Quentin Groom
(Botany and Information Technology)
Botanic Garden Meise
Domein van Bouchout
B-1860 Meise
Belgium
Landline: +32 (0) 226 009 20 ext. 364
FAX: +32 (0) 226 009 45
Skype name: qgroom
Website: www.botanicgarden.be

On 26 September 2016 at 09:34, Tim Robertson <trobertson@gbif.org> wrote:

Hi Quentin,

[I am not sure this message will be delivered to the TG as it is a closed group – please consider forwarding it on if you think it is worthwhile]
It’s great to see these charts and a little saddening at the same time that things aren’t speeding up.
I’d just like to make sure you are aware of a possible issue in your analysis. You may already have safeguarded against this but I raise it just in case. It might be insignificant since you are dealing with a small group of species but under certain circumstances could introduce a large inaccuracy and should be verified.
At GBIF.org we repeatedly index datasets. You are probably aware that data management practice varies wildly across institutions, and it is common for publishers to delete a dataset and republish it, e.g. (1) when moving from DiGIR to IPT without following recommended guidelines, or (2) when changing record identifiers for the same original record (all UK NBN data have done this over the years). When this happens, the original individual records are deleted and new ones are created, which receive a new created_date while the event date remains the same. The result is that you'd measure a larger lag than was the reality.
The only feasible way I can see to safeguard against this would be to analyse the GBIF ID over the snapshots and identify cases where records are deleted and then recreated (perhaps considering the combination of location, species and date?). I have raised this with groups doing this kind of study before but to my knowledge it’s been ignored.
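To make the safeguard concrete, here is a minimal sketch of one way it might be done. It assumes records from several snapshots have already been pooled into a list of dictionaries; the field names (location, species, event_date, created) and the sample values are hypothetical and are not the actual GBIF snapshot schema. For each combination of location, species and date it keeps the earliest creation date seen, so a record that was deleted and republished is measured from its first appearance.

```python
from datetime import date


def earliest_created(records):
    """Map each (location, species, event_date) key to the earliest created date seen."""
    earliest = {}
    for rec in records:
        key = (rec["location"], rec["species"], rec["event_date"])
        created = rec["created"]
        if key not in earliest or created < earliest[key]:
            earliest[key] = created
    return earliest


def corrected_lags(records):
    """Lag in days from event date to the earliest known creation date, per key."""
    return {
        key: (created - key[2]).days
        for key, created in earliest_created(records).items()
    }


# Hypothetical pooled snapshot rows: the same observation appears twice
# because the dataset was deleted and republished under a new identifier.
pooled = [
    {"gbif_id": 1, "location": (51.0, 4.3), "species": "Fallopia japonica",
     "event_date": date(2012, 5, 1), "created": date(2012, 8, 1)},
    {"gbif_id": 9, "location": (51.0, 4.3), "species": "Fallopia japonica",
     "event_date": date(2012, 5, 1), "created": date(2015, 3, 1)},
]

for key, lag in corrected_lags(pooled).items():
    print(key[1], key[2], lag)
```

Matching on exact coordinates is a simplification; real data would probably need coordinates rounded or localities normalised before records can be treated as the same observation.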
Just for ideas: It might be worthwhile considering individual datasets and the publishing protocol. I strongly suspect that a very few datasets could be recognised as key conduits for this kind of data – e.g. iNaturalist and Artdatabanken (Sweden) would presumably be less than 1 week, and I suspect IPTs might be faster, similar to observation data versus specimens. The publishing country might be interesting too, and whether there are a few that are clearly more active than others. That would be an interesting metric on countries in general, of course. Then there is eBird, which is a huge dataset and will skew results for birds heavily because it is an annual publication with an inherent lag of around 0.5 – 1.5 years.
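As a rough illustration of the per-dataset idea, the sketch below groups lags by dataset and reports the median for each, which would make a single large, slow dataset easy to spot. The dataset names and lag values are placeholders invented for the example, not measurements.

```python
from collections import defaultdict
from statistics import median


def median_lag_by_dataset(rows):
    """Return the median lag in days for each dataset name.

    rows is an iterable of (dataset_name, lag_in_days) pairs.
    """
    lags = defaultdict(list)
    for dataset, lag in rows:
        lags[dataset].append(lag)
    return {name: median(values) for name, values in lags.items()}


# Placeholder data: one fast dataset and one slow dataset.
rows = [
    ("dataset_a", 3), ("dataset_a", 5), ("dataset_a", 4),
    ("dataset_b", 300), ("dataset_b", 400), ("dataset_b", 500),
]

for name, lag in sorted(median_lag_by_dataset(rows).items()):
    print(name, lag)
```

The same grouping could be keyed on publishing country or protocol instead of dataset name.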
I know all too well that as soon as you’ve shared some metrics, you immediately get a ton more work to do – I hope this is useful information though.
Many thanks,
Tim

From: Quentin Groom <quentin.groom@plantentuinmeise.be>
Date: Sunday 25 September 2016 at 13:53
To: Task Group on Data Fitness for Use in Research on Invasive Alien Species <dffu_ias@lists.gbif.org>, Tim Robertson <trobertson@gbif.org>, Jan Legind <jlegind@gbif.org>
Subject: Re: time from observation to publication on GBIF
One further graph...
These observations are separated into the main classes. Birds are clearly by far the most commonly observed class; note the logarithmic scale. The data hint that the publication rate for mammals is slower, but this is not clear.

Regards,
Quentin
On 25 September 2016 at 13:19, Quentin Groom <quentin.groom@plantentuinmeise.be> wrote:
I've had a look at the GBIF data for the 100 Worst Aliens. Just to remind you, the intention is to see how long it takes for records of invasive species to reach GBIF.
This first graph is a histogram showing the days between observation (eventdate) and publication date (date_created) for each publication year. What is notable is that there is no indication that the time to publication is reducing with time. The GBIF analytics (http://www.gbif.org/analytics/global) suggest that data are getting to GBIF faster, but this is not clear from this subset.
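For anyone wanting to reproduce this kind of breakdown, a minimal sketch of the calculation follows. It assumes each record has been reduced to a pair of dates (observation date, creation date); the sample dates are invented for illustration and the function names are hypothetical, not part of any GBIF tool.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def lag_days_by_publication_year(records):
    """Group lags (creation date minus observation date, in days) by creation year."""
    by_year = defaultdict(list)
    for event_date, created in records:
        by_year[created.year].append((created - event_date).days)
    return dict(by_year)


def median_lag_by_year(records):
    """Median lag in days for each publication year, in ascending year order."""
    grouped = lag_days_by_publication_year(records)
    return {year: median(lags) for year, lags in sorted(grouped.items())}


# Invented (observation date, creation date) pairs.
sample = [
    (date(2010, 1, 1), date(2011, 1, 1)),
    (date(2010, 6, 1), date(2011, 6, 1)),
    (date(2013, 1, 1), date(2014, 1, 11)),
]

for year, lag in median_lag_by_year(sample).items():
    print(year, lag)
```

A flat or rising median across years would be consistent with no speed-up in publication, subject to the caveats about small and biased subsets.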
Grouping all the observations together, this histogram summarises the rate of publication. Clearly, most observations take longer than a year to be published.
This is the same as above, except for specimens. There are far fewer records, and I assume they take longer to get to GBIF as a result of the current digitisation of herbaria and museums.

Regards,
Quentin