[COL-Users] NamesIndex missing translations?

MANCINI Raffael Raffael.MANCINI at mnhn.lu
Fri Sep 15 08:09:17 UTC 2023

Dear Markus,

thanks for the quick reply. I still have some questions left:

  1.  How does the NameIndex match names? Only by strict correspondence of the scientific name and authorship (modulo white-space and special characters)?
  2.  Do I understand correctly that the min parameter only determines whether a name gets included in the result, not whether it gets "collapsed"/"matched" into a single row? The second NIDX export with min=2 still has a lot of rows with only one of the IDs set.
  3.  The name "Abelona gigliotosi" shows up in your export with min=2 but it's only present in CoL not iNat (confirmed by a manual search on CLB). How come? To my understanding it should only show up in the min=1 export.
  4.  Are either of the matching algorithms (NameIndex and the regular /match/nameusages/job matching) documented somewhere?

In general, if I want some rather liberal matching, should I just go for the matching api instead of the NameIndex facility?

Best regards!


Raffael Mancini

IT administrator and developer

Service d'information digital sur le patrimoine naturel (SIDPNAT)

Musée National d'Histoire Naturelle Luxembourg

T: +352 247 66667 - https://mnhn.lu

From: Markus Döring <mdoering at gbif.org>
Sent: 15 September 2023 08:41:03
To: MANCINI Raffael; Catalogue of Life user announcements and discussion
Cc: Tim Robertson
Subject: Re: [COL-Users] NamesIndex missing translations?

Thanks Raffael,

There is a parameter "min" that defines how many sources must have the same name before the name gets included in the results.
By default this is just one, so either iNat or COL. If you only want those where both identifiers exist you would POST to this:

POST https://api.checklistbank.org/nidx/export?datasetKey=3LR&datasetKey=139831&min=2
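
As a rough illustration, the request above could be assembled in Python as follows. The endpoint, the dataset keys and the "min" parameter are taken from this mail; the helper names (nidx_export_url, build_post) are made up for the example, and the sketch deliberately stops short of sending the request, since authentication and the shape of the response are not described here.

```python
from urllib.parse import urlencode
import urllib.request

# Endpoint as quoted in this mail.
NIDX_EXPORT = "https://api.checklistbank.org/nidx/export"


def nidx_export_url(dataset_keys, min_sources=1):
    """Build the export URL with one datasetKey parameter per dataset.

    min_sources maps to the "min" parameter: how many of the listed
    datasets must share a name before it is included in the export.
    """
    params = [("datasetKey", key) for key in dataset_keys]
    params.append(("min", min_sources))
    return NIDX_EXPORT + "?" + urlencode(params)


def build_post(url):
    """Prepare (but do not send) an empty-bodied POST request."""
    return urllib.request.Request(url, data=b"", method="POST")


url = nidx_export_url(["3LR", 139831], min_sources=2)
print(url)
```

Passing the request to urllib.request.urlopen would submit it; that step is left out because it is likely to need credentials.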

At first glance I can see lots of species names with 2 versions. One with authorship (COL) and one without (iNat).
These are different entries in the names index, as the "canonical" one without authorship acts as a superset of all such names with any authorship.

For example these first rows in the result:

species Aa achalensis   Schltr. 8H9J
species Aa achalensis                   https://www.inaturalist.org/taxa/602054
species Aa argyrolepis  Rchb.f. 8H9K
species Aa argyrolepis                  https://www.inaturalist.org/taxa/829647

So this is expected behavior.
To align these names as well, one would have to use the regular name matching instead.
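
For a quick look at how many rows would pair up if authorship were ignored, a naive client-side collapse can be sketched as below. This is not how ChecklistBank matches names. The row layout (rank, name, authorship, identifier) is an assumption modelled on the four example rows, and telling the two sources apart by the iNaturalist URL prefix is likewise only a convenience for this example. Grouping on rank and name alone will wrongly merge homonyms, which is one reason to prefer the regular matching.

```python
from collections import defaultdict

# Assumed layout, modelled on the example rows: (rank, name, authorship, identifier).
rows = [
    ("species", "Aa achalensis", "Schltr.", "8H9J"),
    ("species", "Aa achalensis", "", "https://www.inaturalist.org/taxa/602054"),
    ("species", "Aa argyrolepis", "Rchb.f.", "8H9K"),
    ("species", "Aa argyrolepis", "", "https://www.inaturalist.org/taxa/829647"),
]

INAT_PREFIX = "https://www.inaturalist.org/taxa/"


def is_inat(identifier):
    """Heuristic for this example only: iNat identifiers are taxon URLs."""
    return identifier.startswith(INAT_PREFIX)


def pair_ids(rows):
    """Group rows by (rank, name), ignoring authorship, and keep only
    groups with exactly one COL identifier and one iNat identifier."""
    grouped = defaultdict(list)
    for rank, name, _authorship, identifier in rows:
        grouped[(rank, name)].append(identifier)

    pairs = {}
    for key, identifiers in grouped.items():
        col_ids = [i for i in identifiers if not is_inat(i)]
        inat_ids = [i for i in identifiers if is_inat(i)]
        if len(col_ids) == 1 and len(inat_ids) == 1:
            pairs[key] = (col_ids[0], inat_ids[0])
    return pairs


for (rank, name), (col_id, inat_id) in pair_ids(rows).items():
    print(rank, name, col_id, inat_id)
```

Groups with several candidates on either side are dropped rather than guessed at, so ambiguous names stay unpaired.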

To match all of iNat's names to COL one would do this instead:

POST https://api.checklistbank.org/dataset/3LR/match/nameusage/job?sourceDatasetKey=139831

In a similar manner one can also upload CSV files with names to be matched against any of the datasets in ChecklistBank.
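
The matching job request above could be prepared from Python roughly like this. The URL pattern and the sourceDatasetKey parameter come from this mail; the function name match_job_request is invented for the example, and the optional HTTP Basic credentials are an assumption, since the mail does not say how the API authenticates. The request is only constructed here, not sent.

```python
import base64
import urllib.request
from urllib.parse import quote, urlencode

API = "https://api.checklistbank.org"


def match_job_request(target_key, source_key, user=None, password=None):
    """Prepare a POST that asks for all names of source_key to be
    matched against the dataset target_key.

    user/password are optional and only illustrate HTTP Basic auth;
    whether the API expects this scheme is an assumption.
    """
    url = (
        f"{API}/dataset/{quote(str(target_key))}/match/nameusage/job?"
        + urlencode({"sourceDatasetKey": source_key})
    )
    request = urllib.request.Request(url, data=b"", method="POST")
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


request = match_job_request("3LR", 139831)
print(request.get_method(), request.full_url)
```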


PS: here are the results of the calls above

NIDX EXPORT, MIN=1: https://download.checklistbank.org/job/34/34a68b4c-3a6a-41fc-a681-654dfcdbf2a6.zip [81.5 MB]
NIDX EXPORT, MIN=2: https://download.checklistbank.org/job/0a/0af45acd-f723-4b27-8189-cdacda64b727.zip [1.9 MB]
NAME MATCH iNat->COL: https://download.checklistbank.org/job/6b/6bbb3e5f-2a99-4d56-a87d-a5d2a3af6bf4.zip [36.6 MB]

With the following dataset keys:
 9923: Catalogue of Life Checklist, version 2023-08-17
 139831: iNaturalist Taxonomy, version 2023-08-01

> On 14. Sep 2023, at 14:20, MANCINI Raffael via COL-Users <col-users at lists.gbif.org> wrote:
> Dear list,
> I'm using the very handy NamesIndex in order to translate between different taxonomic sources. In the case of CoL and iNaturalist, I found that only roughly half of the entries (from a recent export) have both the CoL ID and iNaturalist ID set. For the entries pertaining to the rank "species" the situation is particularly bad, only roughly 10% (4353 of 41109) of the entries do any real translation.
>     • Why does the nidx export contain rows with either of the IDs empty at all?
>     • Why are so many of the species in particular not correctly mapped? Is this due to missing authorship in the iNaturalist checklist coupled with a strict matching algorithm?
> Best regards,
> --
> Raffael Mancini
> IT administrator and developer
> Service d'information digital sur le patrimoine naturel (SIDPNAT)
> Musée National d'Histoire Naturelle Luxembourg
> T: +352 247 66667 - https://mnhn.lu
