API for batch calling GBIF id -> checklist ID
Hi
I have a large number of GBIF IDs (~2 million), for which I wish to know any existing International Plant Names Index (IPNI) IDs. What’s the easiest way to do this? I can use the standard API, e.g.
http://api.gbif.org/v1/species/2705290/related
for 2705290 = Avena sativa, which returns several IPNI IDs (in this case 164949-3, 391732-1 and 27200-2). Of these, 391732-1 seems to be the canonical one, although I can’t figure out how to identify it as such from the GBIF API output.
It will obviously be tedious (and unwarranted) to make 2 million API calls. What’s the recommended way to do this?
Yan
Hi Yan,
the way you describe is the current way to go. Unfortunately, IPNI has 2 problems you should be aware of. First, the dataset in GBIF is somewhat outdated and has not been updated since 2009. We have been working with Kew for a long time to publish a new version, but things progress very slowly.
The other issue is that IPNI is really 3 datasets combined, any of which can cover the same name. You should therefore be prepared to find multiple IDs for the same name, even on their current website. See http://www.ipni.org/about_the_index.html
For example, Avena sativa gives 3 hits, from IK (Index Kewensis), GCI (Gray Card Index) and APNI (Australian Plant Names Index): http://www.ipni.org/ipni/simplePlantNameSearch.do?find_wholeName=Avena+sativ...
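Because one GBIF key can map to several IPNI IDs, a lookup script should collect a list per key rather than expect a single value. A minimal sketch, assuming the `/related` response is a JSON object with a `results` list whose entries carry `datasetKey` and `taxonID` fields, and that an IPNI `taxonID` ends in the `NNNN-N` form seen above. `IPNI_DATASET_KEY` is a hypothetical placeholder for the IPNI checklist's dataset key in GBIF; check all of these assumptions against a real response first.

```python
import re
from typing import Any

# Hypothetical placeholder: replace with the GBIF dataset key of the IPNI checklist.
IPNI_DATASET_KEY = "ipni-dataset-key-placeholder"

# Assumed shape of an IPNI ID: digits, a hyphen, then a numeric suffix,
# possibly preceded by a prefix such as an LSID namespace.
_IPNI_ID = re.compile(r"(\d+-\d+)$")


def ipni_ids_from_related(response: dict[str, Any],
                          ipni_dataset_key: str = IPNI_DATASET_KEY) -> list[str]:
    """Collect every IPNI ID in a parsed /species/{key}/related response.

    Returns all matches (deduplicated, in first-seen order) rather than one
    value, because a name can appear in several of IPNI's source indexes.
    """
    ids: list[str] = []
    for usage in response.get("results", []):
        # Skip related usages that come from checklists other than IPNI.
        if usage.get("datasetKey") != ipni_dataset_key:
            continue
        match = _IPNI_ID.search(str(usage.get("taxonID", "")))
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids
```

Keeping the parsing separate from the HTTP call makes it easy to test against a saved response before running it over millions of keys.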
In the future we would like to add a scientificNameID property to the GBIF backbone taxa and populate it with IPNI, IF (Index Fungorum) or ZooBank IDs, but that is not available yet.
Best, Markus
On 22 Feb 2017, at 12:03, Yan Wong <yan@pixie.org.uk> wrote:
_______________________________________________ API-users mailing list API-users@lists.gbif.org http://lists.gbif.org/mailman/listinfo/api-users
Hi,
Adding to what Markus says:
Many (perhaps most, or even almost all) duplicate IPNI records have been linked within IPNI, so 391732-1 is indeed the correct one. This linking work is more recent than the 2009 export to GBIF, so GBIF still has the duplicates. The IPNI website displays the links, but unfortunately they are not included in the RDF format.
You can run 2 million queries against our API; this should not be a problem. When we are crawling large datasets, we average around 500 requests/second to the species API. Try using several parallel connections if it's too slow otherwise.
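The suggestion of several parallel connections can be sketched with Python's standard-library `ThreadPoolExecutor`. This is a minimal sketch, not an official GBIF client: `fetch_related` and `fetch_many` are hypothetical helper names, the worker count and chunk size are arbitrary assumptions, and it assumes one request per key returns all related usages (it does not follow paging, so check whether any responses are truncated).

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Iterator

# Endpoint pattern from the thread (requested over https here).
BASE_URL = "https://api.gbif.org/v1/species/{key}/related"


def fetch_related(key: int, timeout: float = 30.0) -> dict[str, Any]:
    """Fetch and decode the /related response for one GBIF species key."""
    with urllib.request.urlopen(BASE_URL.format(key=key), timeout=timeout) as resp:
        return json.load(resp)


def fetch_many(keys: Iterable[int],
               fetch: Callable[[int], dict[str, Any]] = fetch_related,
               max_workers: int = 8,
               chunk_size: int = 1000) -> Iterator[tuple[int, dict[str, Any]]]:
    """Yield (key, response) pairs, fetching each chunk of keys in parallel.

    Working in chunks keeps memory bounded for millions of keys. A failed
    request yields an empty dict so one bad key does not abort the run;
    a real crawl would log and retry those keys instead.
    """
    def safe_fetch(key: int) -> dict[str, Any]:
        try:
            return fetch(key)
        except Exception:  # network, HTTP or JSON decoding errors
            return {}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batch: list[int] = []
        for key in keys:
            batch.append(key)
            if len(batch) == chunk_size:
                # pool.map preserves input order, so results line up with keys.
                yield from zip(batch, pool.map(safe_fetch, batch))
                batch = []
        if batch:
            yield from zip(batch, pool.map(safe_fetch, batch))
```

A full run would loop `for key, resp in fetch_many(keys): ...` and write each key's IPNI IDs to disk as it goes, so progress survives an interruption. The `fetch` parameter exists so the pooling logic can be tested without network access.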
Cheers,
Matt
On 22/02/17 12:14, Markus Döring wrote:
On 22 Feb 2017, at 11:14, Markus Döring <mdoering@gbif.org> wrote:
> the way you describe is the current way to go.
Thanks a lot
> We are working with Kew to publish a new version for a long time, but things progress very slowly.
OK - looking forward to this.
> The other issue is that IPNI really is 3 datasets combined all of which can cover the same name. You should therefore be prepared to find multiple ids for the same name - even on their current website. See http://www.ipni.org/about_the_index.html
I’m assuming that is the purpose of the -1, -2 and -3 suffixes on their ID numbers: to specify which dataset the name comes from.
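If the suffix does identify the source index (an assumption; the thread does not confirm it), grouping the IDs found for a name by suffix shows which sources contributed. A minimal sketch with hypothetical helper names; it attaches no meaning to any particular suffix value, and it cannot pick the canonical ID, which per Matt's reply depends on duplicate links maintained in IPNI that the 2009 export to GBIF lacks.

```python
from collections import defaultdict


def split_ipni_id(ipni_id: str) -> tuple[int, int]:
    """Split an IPNI-style ID such as '391732-1' into (number, suffix)."""
    number, _, suffix = ipni_id.rpartition("-")
    if not number.isdigit() or not suffix.isdigit():
        raise ValueError(f"not an IPNI-style ID: {ipni_id!r}")
    return int(number), int(suffix)


def group_by_suffix(ipni_ids: list[str]) -> dict[int, list[str]]:
    """Group IPNI IDs by their numeric suffix (1, 2, 3, ...)."""
    groups: dict[int, list[str]] = defaultdict(list)
    for ipni_id in ipni_ids:
        groups[split_ipni_id(ipni_id)[1]].append(ipni_id)
    return dict(groups)
```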
> In the future we would like to add a scientificNameID property to the GBIF backbone taxa and populate that with IPNI, IF or ZooBank ids. But that is not available so far.
Is there an ETA for this? 1 year? 2 years? longer?!
Cheers
Yan
participants (3)
- Markus Döring
- Matthew Blissett
- Yan Wong