Hi Quentin
Many thanks for the results and work on this. I agree with your suggestion about simply pointing out that publication is not fast, and perhaps suggesting that this is something that gets tracked?
Best
Melodie
PS. to all - there is a folder on the G drive with the Melbourne logistics in case you need them soon or want to check in the meantime. We are just waiting for Dmitry to confirm a few things and will then send you a single e-mail with links to all the info.


On 26 September 2016 at 18:05, Quentin Groom <quentin.groom@plantentuinmeise.be> wrote:
Hi Tim,
Thanks, that's all useful information. You confirmed what I suspected regarding the dates of publication.

As I was working with the data I noticed several problems and biases that confuse the results. A subset of 100 species is really too small to start breaking down into different subsets. As you noted, birds are over-represented.

Perhaps the only take-home message we should make at the moment is that publishing of these data is not fast and that it would need to improve if GBIF were to be used for early warning.

I would not draw any conclusions about the improvement in speed of publication. This is a small and biased subset.
Regards
Quentin



Dr. Quentin Groom
(Botany and Information Technology)

Botanic Garden Meise
Domein van Bouchout
B-1860 Meise
Belgium


Landline; +32 (0) 226 009 20 ext. 364
FAX:      +32 (0) 226 009 45

Skype name: qgroom
Website:    www.botanicgarden.be


On 26 September 2016 at 09:34, Tim Robertson <trobertson@gbif.org> wrote:
Hi Quentin,
[I am not sure this message will be delivered to the TG as it is a closed group – please consider forwarding it on if you think it is worthwhile]

It’s great to see these charts and a little saddening at the same time that things aren’t speeding up.

I’d just like to make sure you are aware of a possible issue in your analysis.  You may already have safeguarded against this but I raise it just in case.  It might be insignificant since you are dealing with a small group of species but under certain circumstances could introduce a large inaccuracy and should be verified.

At GBIF.org we repeatedly index datasets. You are probably aware that data management practice varies wildly across institutions, and it is common for people to delete a dataset and republish it, e.g. (1) moving from DiGIR to IPT without following recommended guidelines, or (2) changing record identifiers for the same original record (all UK NBN data have done this over the years). When this happens, the original individual records are deleted and new ones are created, which receive a new created_date while the event date remains the same. The result is that you would measure a larger lag than was the reality.

The only feasible way I can see to safeguard against this would be to analyse the GBIF ID over the snapshots and identify cases where records are deleted and then recreated (perhaps considering the combination of location, species and date?). I have raised this with groups doing this kind of study before but to my knowledge it’s been ignored.
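
A minimal sketch of that check, assuming each snapshot can be loaded as a list of dicts. The field names (gbif_id, latitude, longitude, species, event_date, created_date) and the function name are hypothetical, not the actual snapshot schema. It flags records whose ID is new in the later snapshot but whose location, species and date match a record whose ID has disappeared, and reports the earliest original created_date so the lag can be corrected.

```python
from datetime import date


def find_republished(old_snapshot, new_snapshot):
    """Return (new_gbif_id, earliest_original_created_date) pairs for records
    that look like re-creations of records deleted since the old snapshot.

    Field names are illustrative assumptions, not the real snapshot schema.
    """
    old_ids = {r["gbif_id"] for r in old_snapshot}
    new_ids = {r["gbif_id"] for r in new_snapshot}

    def content_key(r):
        # Combination of location, species and date, as suggested above
        return (r["latitude"], r["longitude"], r["species"], r["event_date"])

    # Index records whose ID vanished between the two snapshots
    deleted = {}
    for r in old_snapshot:
        if r["gbif_id"] not in new_ids:
            deleted.setdefault(content_key(r), []).append(r)

    matches = []
    for r in new_snapshot:
        key = content_key(r)
        # A new ID with the same content as a deleted record is a suspect
        if r["gbif_id"] not in old_ids and key in deleted:
            earliest = min(d["created_date"] for d in deleted[key])
            matches.append((r["gbif_id"], earliest))
    return matches
```

Matching on exact coordinates and dates will miss records whose values were edited during republication, so this would give a lower bound on the problem rather than a full count.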

Just for ideas:
It might be worthwhile considering individual datasets and the publishing protocol. I strongly suspect that a very few datasets could be recognised as key conduits for this kind of data – e.g. iNaturalist and Artdatabanken (Sweden) would presumably be less than 1 week and I suspect IPTs might be faster - similar to observation data versus specimen. The publishing country might be interesting too, and whether there are a few that are clearly more active than others. That would be an interesting metric on countries in general of course. Then there is eBird, which is a huge dataset and will skew results for birds heavily because it is an annual publication with an inherent lag of around 0.5 – 1.5 years.
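
One way to look at this per dataset is sketched below, assuming the lag in days has already been computed for each record. The function name, the input shape (dataset name, lag in days) and the 7-day default are illustrative assumptions. The same grouping would work with publishing country or protocol as the key.

```python
from collections import defaultdict


def share_published_within(records, max_days=7):
    """Fraction of records per dataset published within max_days of observation.

    records: iterable of (dataset_name, lag_in_days) pairs (hypothetical shape).
    """
    totals = defaultdict(int)
    fast = defaultdict(int)
    for dataset, lag in records:
        totals[dataset] += 1
        if lag <= max_days:
            fast[dataset] += 1
    # Datasets with no fast records get 0.0 rather than being dropped
    return {dataset: fast[dataset] / totals[dataset] for dataset in totals}
```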

I know all too well that as soon as you’ve shared some metrics, you immediately get a ton more work to do – I hope this is useful information though.

Many thanks,
Tim


From: Quentin Groom <quentin.groom@plantentuinmeise.be>
Date: Sunday 25 September 2016 at 13:53
To: Task Group on Data Fitness for Use in Research on Invasive Alien Species <dffu_ias@lists.gbif.org>, Tim Robertson <trobertson@gbif.org>, Jan Legind <jlegind@gbif.org>
Subject: Re: time from observation to publication on GBIF

One further graph...

These observations are separated into the main classes. Birds are clearly by far the most commonly observed class (note the logarithmic scale). The data hint that the publication rate for mammals is slower, but this is not clear.
Regards
Quentin

Inline images 1







On 25 September 2016 at 13:19, Quentin Groom <quentin.groom@plantentuinmeise.be> wrote:
I've had a look at the GBIF data for the 100 Worst Aliens. Just to remind you, the intention is to see how long it takes for records of invasive species to reach GBIF.

This first graph is a histogram showing the days between observation (eventdate) and publication date (date_created) for each publication year. What is notable is that there is no indication that the time to publication is reducing with time. The GBIF analytics suggest that data are getting to GBIF faster, but this is not clear from this subset (http://www.gbif.org/analytics/global).
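
For anyone wanting to reproduce the comparison, here is a minimal sketch of the lag calculation, assuming each record has been reduced to a pair of Python dates (event date, creation date). The function names are hypothetical, and it summarises each publication year with a median instead of plotting a histogram.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def lag_days(event_date, created_date):
    """Days between observation and the record's creation on GBIF."""
    return (created_date - event_date).days


def median_lag_by_publication_year(records):
    """Median lag in days, keyed by the year the record was created.

    records: iterable of (event_date, created_date) pairs (hypothetical shape).
    """
    by_year = defaultdict(list)
    for event_date, created_date in records:
        by_year[created_date.year].append(lag_days(event_date, created_date))
    return {year: median(lags) for year, lags in sorted(by_year.items())}
```

A falling median from one publication year to the next would indicate faster publication. Flat or rising values would match what the histogram shows.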

Inline images 4

Grouping all the observations together, this histogram summarises the rate of publication. Clearly, most observations take longer than a year to be published.
Inline images 2

This is the same as above, but for specimens. There are far fewer records, and I assume they take longer to get to GBIF because herbaria and museums are still being digitised.
Inline images 3

Regards
Quentin






_______________________________________________
DFFU_IAS mailing list
DFFU_IAS@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/dffu_ias




--
 
Melodie A. McGeoch
 
Associate Professor, School of Biological Sciences, Monash University, Clayton Campus (Bld 18, Innovation Walk, Rm 121), Melbourne, Victoria 3800, Australia. Tel: +61 3 99020464, Mobile: +61 (0) 499 954180, E-mail: melodie.mcgeoch@monash.edu, ISI Researcher ID F-8353-2011, Research Group website: http://melodiemcgeoch.com/