Hello,
Context:
We are currently developing an Early Alert system for invasive species where all data transits through GBIF:
- on one side, we are helping multiple data providers push their observations to GBIF at a high frequency (probably daily);
- on the other side, we have a web application (https://github.com/riparias/early-warning-webapp) that lets users be alerted about specific species in specific locations. To do so, that web application *refreshes its internal database daily, by dropping and recreating all occurrences based on an automated GBIF data download*. Data in the gbifId field is used as a long-term occurrence identifier in this web application.
It is crucial for our system that we can unambiguously distinguish new occurrences from existing occurrences that were recently republished (and maybe updated), and also detect occurrences that have been updated in the source dataset.
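To make that requirement concrete, here is a minimal sketch of the comparison we need to be able to trust. It is not the actual code of the web application: the function name `classify_occurrences`, the snapshot layout (a dict keyed by gbifId) and the example IDs are all made up for illustration. It only works if gbifId stays stable when an occurrence is republished, which is exactly what I am asking about.

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: one daily download, reduced to {gbifId: record fields}.
Snapshot = Dict[str, Dict[str, str]]


def classify_occurrences(
    previous: Snapshot, current: Snapshot
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare two daily snapshots and return (new, updated, removed) gbifIds."""
    previous_ids = set(previous)
    current_ids = set(current)

    # Present today but not yesterday: treated as new occurrences.
    new = current_ids - previous_ids
    # Present yesterday but not today: deleted, or reissued under another ID.
    removed = previous_ids - current_ids
    # Present in both, but with at least one changed field.
    updated = {
        gbif_id
        for gbif_id in current_ids & previous_ids
        if current[gbif_id] != previous[gbif_id]
    }
    return new, updated, removed


# Made-up example data (these IDs do not refer to real GBIF records).
yesterday = {
    "1000": {"species": "Species A", "individualCount": "1"},
    "1001": {"species": "Species B", "individualCount": "2"},
    "1002": {"species": "Species C", "individualCount": "1"},
}
today = {
    "1001": {"species": "Species B", "individualCount": "2"},
    "1002": {"species": "Species C", "individualCount": "5"},
    "1003": {"species": "Species D", "individualCount": "1"},
}

new_ids, updated_ids, removed_ids = classify_occurrences(yesterday, today)
```

If a republished occurrence were given a fresh gbifId, it would show up here as one "removed" and one "new" record instead of an "updated" one, and our users would get a false alert.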
Finally, the question:
In the GBIF infrastructure, when exactly is a new occurrence GBIF ID issued (vs. reused)? Is it just a matter of following the publisher's occurrenceID, or is the algorithm that chooses GBIF IDs more complex?
Or, more practically: what should I say to our data providers to guarantee all new occurrences they publish get a new gbifId and existing occurrences that get republished keep the same gbifId?
I'd also be interested to know whether you think the concept of using GBIF IDs as stable identifiers in an external tool is fundamentally flawed (but I hope not!).
Thanks a lot, and don't hesitate to redirect me if this is not the right place to ask this question.
Nicolas
My apologies if this quest