Thanks Raffael,
There is a parameter "min" that defines how many sources must have the same name before the name gets included in the results.
By default this is 1, so a name is included if it occurs in either iNat or COL. If you only want those where both identifiers exist you would POST to this:
POST
https://api.checklistbank.org/nidx/export?datasetKey=3LR&datasetKey=139831&min=2
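For anyone scripting this, here is a minimal Python sketch of the same request. It relies only on the endpoint and parameters shown above. The function names are made up for illustration, and the Basic authentication header is an assumption on my side, so adjust or drop it to fit your account setup.

```python
import base64
import urllib.parse
import urllib.request

API = "https://api.checklistbank.org"


def nidx_export_url(dataset_keys, min_sources=1):
    """Build the names index export URL with one datasetKey parameter per dataset."""
    params = [("datasetKey", str(key)) for key in dataset_keys]
    params.append(("min", str(min_sources)))
    return f"{API}/nidx/export?{urllib.parse.urlencode(params)}"


def post_export(url, user, password):
    """Send the POST with an empty body. Basic auth is an assumption, not confirmed here."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


# Only the URL is built here; call post_export(url, user, password) to submit it.
url = nidx_export_url(["3LR", 139831], min_sources=2)
print(url)
```

Setting min_sources=1 (the default) gives the larger export that also contains names found in only one of the two datasets.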
At first glance I can see lots of species names with two versions: one with authorship (COL) and one without (iNat).
These are different entries in the names index, as the "canonical" one without authorship acts as a superset of all such names with any authorship.
For example these first rows in the result:
species  Aa achalensis Schltr.  8H9J
species  Aa achalensis  https://www.inaturalist.org/taxa/602054
species  Aa argyrolepis Rchb.f.  8H9K
species  Aa argyrolepis  https://www.inaturalist.org/taxa/829647
So this is expected behavior.
To align these names as well, one would have to use the regular name matching instead.
To match all of iNat's names to COL, one would do this:
POST
https://api.checklistbank.org/dataset/3LR/match/nameusage/job?sourceDatasetKey=139831
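As a rough sketch, the matching request can be prepared in Python like this. The path and the sourceDatasetKey parameter are taken from the URL above; the helper name match_job_url is hypothetical, and the request is only constructed, not sent, since submitting it will most likely need your ChecklistBank credentials.

```python
import urllib.parse
import urllib.request

API = "https://api.checklistbank.org"


def match_job_url(target_key, source_key):
    """Build the URL that matches all names of source_key against target_key."""
    target = urllib.parse.quote(str(target_key), safe="")
    query = urllib.parse.urlencode({"sourceDatasetKey": source_key})
    return f"{API}/dataset/{target}/match/nameusage/job?{query}"


# Target is COL (3LR), source is the iNaturalist taxonomy (139831).
match_url = match_job_url("3LR", 139831)

# Prepare a POST with an empty body; pass it to urllib.request.urlopen to submit.
request = urllib.request.Request(match_url, data=b"", method="POST")
print(request.get_method(), request.full_url)
```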
In a similar manner one can also upload CSV files with names to be matched against any of the datasets in ChecklistBank.
Best,
Markus
PS: here are the results of the calls above:
NIDX EXPORT, MIN=1:
https://download.checklistbank.org/job/34/34a68b4c-3a6a-41fc-a681-654dfcdbf2a6.zip [81.5 MB]
NIDX EXPORT, MIN=2:
https://download.checklistbank.org/job/0a/0af45acd-f723-4b27-8189-cdacda64b727.zip [1.9 MB]
NAME MATCH iNat->COL:
https://download.checklistbank.org/job/6b/6bbb3e5f-2a99-4d56-a87d-a5d2a3af6bf4.zip [36.6 MB]
With the following dataset keys:
9923: Catalogue of Life Checklist, version 2023-08-17
139831: iNaturalist Taxonomy, version 2023-08-01
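If you want to peek into one of these archives from a script, something along these lines should work. It is only a sketch using the Python standard library: the helper names are made up, and I make no assumption about which files the archives contain, so the demonstration runs on a tiny in-memory zip with an invented file name.

```python
import io
import urllib.request
import zipfile


def fetch(url):
    """Download a result archive into memory (the largest one is about 80 MB)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def list_members(zip_bytes):
    """Return (file name, uncompressed size in bytes) for every entry in the zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Offline demonstration; "example.tsv" is a placeholder, not a real export file name.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.tsv", "rank\tname\n")

members = list_members(buffer.getvalue())
print(members)
```

For a real archive, pass the bytes returned by fetch() with one of the download URLs above to list_members().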
> On 14. Sep 2023, at 14:20, MANCINI Raffael via COL-Users <col-users@lists.gbif.org> wrote:
>
> Dear list,
>
> I'm using the very handy NamesIndex in order to translate between different taxonomic sources. In the case of CoL and iNaturalist, I found that only roughly half of the entries (from a recent export) have both the CoL ID and iNaturalist ID set. For the entries pertaining to the rank "species" the situation is particularly bad, only roughly 10% (4353 of 41109) of the entries do any real translation.
>
> • Why does the nidx export contain rows with either of the IDs empty at all?
> • Why are so many of the species in particular not correctly mapped? Is this due to missing authorship in the iNaturalist checklist coupled with a strict matching algorithm?
>
> Best regards,
>
> --
> Raffael Mancini
> IT administrator and developer
> Service d'information digital sur le patrimoine naturel (SIDPNAT)
> Musée National d'Histoire Naturelle Luxembourg
> T: +352 247 66667 - https://mnhn.lu
>
> _______________________________________________
> COL-Users mailing list
> COL-Users@lists.gbif.org
> https://lists.gbif.org/mailman/listinfo/col-users