[COL-Users] NamesIndex missing translations?

Markus Döring mdoering at gbif.org
Fri Sep 15 06:41:03 UTC 2023

Thanks Raffael,

There is a parameter "min" that defines how many sources must share the same name before that name is included in the results.
By default this is just one, so a name from either iNat or COL is enough. If you only want the names for which both identifiers exist, you would POST to this:

POST https://api.checklistbank.org/nidx/export?datasetKey=3LR&datasetKey=139831&min=2
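For scripting, the same export URL can be built from a list of dataset keys. The sketch below assumes only what the call above shows: one datasetKey parameter per dataset, plus min. The helper name nidx_export_url is hypothetical, and actually sending the POST (and any credentials it may need) is not shown.

```python
from urllib.parse import urlencode

# Base URL taken from the call above; the helper function itself is hypothetical.
NIDX_EXPORT = "https://api.checklistbank.org/nidx/export"

def nidx_export_url(dataset_keys, min_sources=1):
    """Build the nidx export URL: one datasetKey parameter per dataset, plus min."""
    params = [("datasetKey", key) for key in dataset_keys]  # repeated key, kept in order
    params.append(("min", min_sources))
    return NIDX_EXPORT + "?" + urlencode(params)

url = nidx_export_url(["3LR", "139831"], min_sources=2)
print(url)
# https://api.checklistbank.org/nidx/export?datasetKey=3LR&datasetKey=139831&min=2
```

Passing a list of tuples (rather than a dict) to urlencode is what allows datasetKey to repeat.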

At first glance I can see lots of species names with 2 versions: one with authorship (COL) and one without (iNat).
These are different entries in the names index, as the "canonical" one without authorship acts as a superset of all such names with any authorship.

For example, these are the first rows of the result:

species	Aa achalensis	Schltr.	8H9J	
species	Aa achalensis			https://www.inaturalist.org/taxa/602054
species	Aa argyrolepis	Rchb.f.	8H9K	
species	Aa argyrolepis			https://www.inaturalist.org/taxa/829647
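To see why these rows come in pairs, one can group them by rank and name while ignoring authorship. This is only a crude local illustration, not ChecklistBank's name matching. The column layout (rank, name, authorship, COL ID, iNat ID) is an assumption inferred from the four sample rows alone, and the function name pair_by_canonical_name is hypothetical.

```python
from collections import defaultdict

# The four sample rows from the export above. Assumed column layout, inferred
# from these rows only: rank, name, authorship, COL ID, iNat ID.
SAMPLE = (
    "species\tAa achalensis\tSchltr.\t8H9J\t\n"
    "species\tAa achalensis\t\t\thttps://www.inaturalist.org/taxa/602054\n"
    "species\tAa argyrolepis\tRchb.f.\t8H9K\t\n"
    "species\tAa argyrolepis\t\t\thttps://www.inaturalist.org/taxa/829647\n"
)

def pair_by_canonical_name(tsv_text):
    """Group rows by (rank, name), ignoring authorship, and collect both IDs."""
    pairs = defaultdict(lambda: {"col": None, "inat": None})
    for line in tsv_text.splitlines():
        rank, name, _authorship, col_id, inat_id = line.split("\t")
        entry = pairs[(rank, name)]
        if col_id:
            entry["col"] = col_id
        if inat_id:
            entry["inat"] = inat_id
    return dict(pairs)

for (rank, name), ids in pair_by_canonical_name(SAMPLE).items():
    print(rank, name, ids["col"], ids["inat"])
```

Dropping authorship like this would merge homonyms that differ only by author. That is exactly the kind of decision the regular name matching is designed to handle properly.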

So this is expected behavior.
To align these names as well, one would have to use the regular name matching instead.

To match all of iNat's names to COL, one would do this:

POST https://api.checklistbank.org/dataset/3LR/match/nameusage/job?sourceDatasetKey=139831
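The match job request above can be prepared with the standard library alone. The sketch takes the URL pattern verbatim from that call, with the target dataset in the path and the source in sourceDatasetKey. The helper match_job_request is hypothetical, and it builds the request without sending it. Submitting it would be a urllib.request.urlopen call plus whatever authentication the API may require, which is not covered here.

```python
from urllib.parse import urlencode
from urllib.request import Request

API = "https://api.checklistbank.org"

def match_job_request(target_key, source_key):
    """Prepare (but do not send) a POST matching source_key's names against target_key."""
    url = f"{API}/dataset/{target_key}/match/nameusage/job?" + urlencode(
        {"sourceDatasetKey": source_key}
    )
    return Request(url, method="POST")

req = match_job_request("3LR", 139831)  # match iNat (139831) against COL (3LR)
print(req.get_method(), req.full_url)
```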

In a similar manner one can also upload CSV files with names to be matched against any of the datasets in ChecklistBank.
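A CSV of names for such an upload could be prepared as follows. The column names scientificName, authorship and rank are assumptions chosen for illustration and should be checked against ChecklistBank's documentation. The upload endpoint itself is deliberately not shown. The helper names_to_csv is hypothetical.

```python
import csv
import io

def names_to_csv(rows):
    """Write (scientificName, authorship, rank) tuples as CSV text with a header row.
    The header names are assumptions, not a confirmed ChecklistBank format."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["scientificName", "authorship", "rank"])
    writer.writerows(rows)  # csv handles quoting if a name or author contains commas
    return buf.getvalue()

text = names_to_csv([("Aa achalensis", "Schltr.", "species"), ("Aa argyrolepis", "", "species")])
print(text)
```

Keeping authorship in its own column, even when empty, lets a matcher use it when present, as with the COL rows above.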


PS: here are the results of the above calls:

NIDX EXPORT, MIN=1: https://download.checklistbank.org/job/34/34a68b4c-3a6a-41fc-a681-654dfcdbf2a6.zip [81.5 MB]
NIDX EXPORT, MIN=2: https://download.checklistbank.org/job/0a/0af45acd-f723-4b27-8189-cdacda64b727.zip [1.9 MB]
NAME MATCH iNat->COL: https://download.checklistbank.org/job/6b/6bbb3e5f-2a99-4d56-a87d-a5d2a3af6bf4.zip [36.6 MB]

With the following dataset keys:
 9923: Catalogue of Life Checklist, version 2023-08-17
 139831: iNaturalist Taxonomy, version 2023-08-01

> On 14. Sep 2023, at 14:20, MANCINI Raffael via COL-Users <col-users at lists.gbif.org> wrote:
> Dear list,
> I'm using the very handy NamesIndex to translate between different taxonomic sources. In the case of CoL and iNaturalist, I found that only roughly half of the entries (from a recent export) have both the CoL ID and the iNaturalist ID set. For the entries of rank "species" the situation is particularly bad: only roughly 10% (4353 of 41109) of the entries provide an actual translation.
>     • Why does the nidx export contain rows with either of the IDs empty at all?
>     • Why are so many of the species in particular not correctly mapped? Is this due to missing authorship in the iNaturalist checklist coupled with a strict matching algorithm?
> Best regards,
> --
> Raffael Mancini
> IT administrator and developer
> Service d'information digital sur le patrimoine naturel (SIDPNAT)
> Musée National d'Histoire Naturelle Luxembourg
> T: +352 247 66667 - https://mnhn.lu
> _______________________________________________
> COL-Users mailing list
> COL-Users at lists.gbif.org
> https://lists.gbif.org/mailman/listinfo/col-users
