Thanks Raffael,
There is a parameter "min" that defines how many sources must share a name before that name is included in the results. By default this is just one, so either iNat or COL. If you only want the names for which both identifiers exist, you would set min to 2 and POST to this:
POST https://api.checklistbank.org/nidx/export?datasetKey=3LR&datasetKey=1398...
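As a rough sketch, the export request could be assembled like this in Python. The URL above is truncated, so the full parameter list is not known: the second dataset key (139831, the iNaturalist key listed in the PS) and the spelling "min=2" as a query parameter are assumptions, and the helper name nidx_export_request is made up for illustration. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

NIDX_EXPORT = "https://api.checklistbank.org/nidx/export"

def nidx_export_request(dataset_keys, min_sources=1):
    """Build (but do not send) a POST request for a names index export.

    Hypothetical helper: passing `min` as a query parameter is an
    assumption based on the description in this thread.
    """
    # One datasetKey parameter per source, as in the URL quoted above.
    params = [("datasetKey", key) for key in dataset_keys]
    # Number of sources that must share a name for it to be exported.
    params.append(("min", min_sources))
    return Request(f"{NIDX_EXPORT}?{urlencode(params)}", method="POST")

# 3LR is the key used in the URL above; 139831 is assumed from the PS.
req = nidx_export_request(["3LR", "139831"], min_sources=2)
print(req.get_method(), req.full_url)
```

Actually sending it would take urllib.request.urlopen(req), and the API may also expect credentials, which this sketch leaves out.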
At first glance I can see lots of species names with two versions: one with authorship (COL) and one without (iNat). These are different entries in the names index, because the "canonical" entry without authorship acts as a superset of all such names with any authorship.
For example these first rows in the result:
species  Aa achalensis   Schltr.   8H9J
species  Aa achalensis             https://www.inaturalist.org/taxa/602054
species  Aa argyrolepis  Rchb.f.   8H9K
species  Aa argyrolepis            https://www.inaturalist.org/taxa/829647
So this is expected behavior. To also align these names, one would have to use the regular name matching instead.
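To make the pairing visible, here is a deliberately naive sketch that joins the example rows on rank and name while ignoring authorship. The column layout (rank, name, authorship, identifier) is my reading of the rows above, and the function pair_by_canonical is hypothetical. This is not how the ChecklistBank matcher works; it only shows why an authored COL entry and an authorless iNat entry end up as separate rows.

```python
from collections import defaultdict

# The four example rows, with the assumed column layout
# (rank, scientific name, authorship, identifier).
rows = [
    ("species", "Aa achalensis", "Schltr.", "8H9J"),
    ("species", "Aa achalensis", "", "https://www.inaturalist.org/taxa/602054"),
    ("species", "Aa argyrolepis", "Rchb.f.", "8H9K"),
    ("species", "Aa argyrolepis", "", "https://www.inaturalist.org/taxa/829647"),
]

def pair_by_canonical(rows):
    """Pair COL and iNat identifiers that share rank and authorless name."""
    groups = defaultdict(lambda: {"col": [], "inat": []})
    for rank, name, _authorship, identifier in rows:
        # Treat iNaturalist taxon URLs as iNat IDs, everything else as COL IDs.
        is_inat = identifier.startswith("https://www.inaturalist.org/")
        groups[(rank, name)]["inat" if is_inat else "col"].append(identifier)
    # Keep only unambiguous one-to-one pairs; homonyms would need real matching.
    return {
        g["col"][0]: g["inat"][0]
        for g in groups.values()
        if len(g["col"]) == 1 and len(g["inat"]) == 1
    }

pairs = pair_by_canonical(rows)
for col_id, inat_url in pairs.items():
    print(col_id, "->", inat_url)
```

A join like this breaks as soon as two different authorships share one canonical name, which is one reason to prefer the proper name matching.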
To match all of iNat's names against COL, one would submit a matching job like this:
POST https://api.checklistbank.org/dataset/3LR/match/nameusage/job?sourceDatasetK...
In a similar manner one can also upload CSV files with names to be matched against any of the datasets in ChecklistBank.
Best, Markus
PS: here are the results of the calls above:
NIDX EXPORT, MIN=1: https://download.checklistbank.org/job/34/34a68b4c-3a6a-41fc-a681-654dfcdbf2... [81.5 MB]
NIDX EXPORT, MIN=2: https://download.checklistbank.org/job/0a/0af45acd-f723-4b27-8189-cdacda64b7... [1.9 MB]
NAME MATCH iNat->COL: https://download.checklistbank.org/job/6b/6bbb3e5f-2a99-4d56-a87d-a5d2a3af6b... [36.6 MB]
With the following dataset keys:
9923: Catalogue of Life Checklist, version 2023-08-17
139831: iNaturalist Taxonomy, version 2023-08-01
On 14. Sep 2023, at 14:20, MANCINI Raffael via COL-Users col-users@lists.gbif.org wrote:
Dear list,
I'm using the very handy NamesIndex to translate between different taxonomic sources. In the case of CoL and iNaturalist, I found that only roughly half of the entries (from a recent export) have both the CoL ID and the iNaturalist ID set. For the entries of rank "species" the situation is particularly bad: only roughly 10% (4353 of 41109) of the entries do any real translation.
• Why does the nidx export contain rows with either of the IDs empty at all?
• Why are so many of the species in particular not correctly mapped? Is this due to missing authorship in the iNaturalist checklist coupled with a strict matching algorithm?
Best regards,
--
Raffael Mancini
IT administrator and developer
Service d'information digital sur le patrimoine naturel (SIDPNAT)
Musée National d'Histoire Naturelle Luxembourg
T: +352 247 66667 - https://mnhn.lu