<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">

<meta name="Generator" content="Microsoft Exchange Server">

<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>

</head>

<body>

<meta content="text/html; charset=UTF-8">

<style type="text/css" style="">

<!--

p

        {margin-top:0;

        margin-bottom:0}

-->

</style>

<div dir="ltr">

<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">

<p>Dear Markus,</p>

<p><br>

</p>

<p>Thanks for the quick reply. I still have a few questions:</p>

<p><br>

</p>

<ol style="margin-bottom:0px; margin-top:0px">

<li>How does the NameIndex match names? Only by strict correspondence of the scientific name and authorship (modulo white-space and special characters)?<br>

</li><li>Do I understand correctly that the min parameter only controls whether a name is included in the result, not whether its entries get "collapsed"/"matched" into a single row? The second NIDX export with min=2 still has a lot of rows with only one of the IDs set.</li><li>The name "<span>Abelona gigliotosi</span>" shows up in your export with min=2, but it is only present in CoL, not in iNat (confirmed by a manual search on CLB). How come? To my understanding it should only show up in the min=1 export.<br>

</li><li>Is either of the matching algorithms (NameIndex and the regular /match/nameusage/job matching) documented somewhere?</li></ol>

<div><br>

</div>

<div>In general, if I want rather liberal matching, should I just use the matching API instead of the NameIndex facility?<br>

</div>

<div><br>

</div>

<div>Best regards!<br>

</div>

<p><br>

</p>

<div id="x_Signature">

<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:rgb(0,0,0); font-family:Calibri,Helvetica,sans-serif,'EmojiFont','Apple Color Emoji','Segoe UI Emoji',NotoColorEmoji,'Segoe UI Symbol','Android Emoji',EmojiSymbols">

<p></p>

<div>

<p>--<br>

</p>

<p>Raffael Mancini</p>

<p>IT administrator and developer</p>

<p>Service d'information digital sur le patrimoine naturel (SIDPNAT)<br>

</p>

<p>Musée National d'Histoire Naturelle Luxembourg</p>

T: +352 247 66667 - https://mnhn.lu</div>

<p><br>

</p>

</div>

</div>

</div>

<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Markus Döring &lt;mdoering@gbif.org&gt;<br>

<b>Sent:</b> 15 September 2023 08:41:03<br>

<b>To:</b> MANCINI Raffael; Catalogue of Life user announcements and discussion<br>

<b>Cc:</b> Tim Robertson<br>

<b>Subject:</b> Re: [COL-Users] NamesIndex missing translations?</font>

<div> </div>

</div>

</div>

<font size="2"><span style="font-size:10pt;">

<div class="PlainText">Thanks Raffael,<br>

<br>

There is a parameter "min" that defines how many sources must have the same name before the name gets included in the results.<br>

By default this is just one, so either iNat or COL. If you only want those where both identifiers exist you would POST to this:<br>

<br>

POST <a href="https://api.checklistbank.org/nidx/export?datasetKey=3LR&datasetKey=139831&min=2">

https://api.checklistbank.org/nidx/export?datasetKey=3LR&datasetKey=139831&min=2</a><br>
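<br>
A minimal Python sketch of building that request with the standard library, not an official client. The helper name nidx_export_request is made up for illustration, and authentication and the way the job result is delivered are not covered here:<br>
<pre>
```python
from urllib.parse import urlencode
from urllib.request import Request

API = "https://api.checklistbank.org"


def nidx_export_request(dataset_keys, min_sources=1):
    """Build (but do not send) the POST request for a names index export.

    dataset_keys: dataset keys to compare, e.g. ["3LR", "139831"].
    min_sources:  value of the "min" parameter, i.e. how many of the
                  datasets must share a name for it to be included.
    """
    # Repeat the datasetKey parameter once per dataset, then append min.
    params = [("datasetKey", str(key)) for key in dataset_keys]
    params.append(("min", str(min_sources)))
    return Request(API + "/nidx/export?" + urlencode(params), method="POST")


req = nidx_export_request(["3LR", "139831"], min_sources=2)
print(req.get_method(), req.full_url)
```
</pre>
Sending it would then be a matter of passing req to urllib.request.urlopen, together with whatever credentials the API requires.<br>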

<br>

At first glance I can see lots of species names with two versions: one with authorship (COL) and one without (iNat).<br>

These are different entries in the names index, as the "canonical" one without authorship acts as a superset of all such names with any authorship.<br>

<br>

For example these first rows in the result:<br>

<br>

species Aa achalensis   Schltr. 8H9J    <br>

species Aa achalensis                   <a href="https://www.inaturalist.org/taxa/602054">

https://www.inaturalist.org/taxa/602054</a><br>

species Aa argyrolepis  Rchb.f. 8H9K    <br>

species Aa argyrolepis                  <a href="https://www.inaturalist.org/taxa/829647">

https://www.inaturalist.org/taxa/829647</a><br>

<br>

So this is expected behavior. <br>

To also align these names, one would have to use the regular name matching instead.<br>
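<br>
For a quick local look at how many such pairs exist, rows could be grouped by rank and name while ignoring authorship. This is only a rough Python sketch and not how the name matching works: the five-column row layout and the function name pair_by_canonical are assumptions for illustration, and ignoring authorship will also merge homonyms that differ only by author.<br>
<pre>
```python
from collections import defaultdict


def pair_by_canonical(rows):
    """Group rows by (rank, name), ignoring authorship.

    rows: iterable of (rank, name, authorship, col_id, inat_id) tuples,
          with empty strings for missing values. This column layout is
          an assumption made for the sketch.
    Returns a dict mapping (rank, name) to a (col_ids, inat_ids) pair of
    sorted lists.
    """
    groups = defaultdict(lambda: (set(), set()))
    for rank, name, _authorship, col_id, inat_id in rows:
        col_ids, inat_ids = groups[(rank, name)]
        if col_id:
            col_ids.add(col_id)
        if inat_id:
            inat_ids.add(inat_id)
    return {key: (sorted(c), sorted(i)) for key, (c, i) in groups.items()}


# The four example rows quoted above.
rows = [
    ("species", "Aa achalensis", "Schltr.", "8H9J", ""),
    ("species", "Aa achalensis", "", "", "https://www.inaturalist.org/taxa/602054"),
    ("species", "Aa argyrolepis", "Rchb.f.", "8H9K", ""),
    ("species", "Aa argyrolepis", "", "", "https://www.inaturalist.org/taxa/829647"),
]
for key, (col_ids, inat_ids) in pair_by_canonical(rows).items():
    print(key, col_ids, inat_ids)
```
</pre>
Groups where both lists are non-empty are candidates for a COL to iNat translation; the name matching job remains the more reliable route.<br>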

<br>

To match all of iNat's names to COL, one would do this instead:<br>

<br>

POST <a href="https://api.checklistbank.org/dataset/3LR/match/nameusage/job?sourceDatasetKey=139831">

https://api.checklistbank.org/dataset/3LR/match/nameusage/job?sourceDatasetKey=139831</a><br>

<br>

In a similar manner one can also upload CSV files with names to be matched against any of the datasets in ChecklistBank.<br>

<br>

<br>

Best,<br>

Markus<br>

<br>

<br>

<br>

PS: here are the results of the calls above:<br>

<br>

NIDX EXPORT, MIN=1: <a href="https://download.checklistbank.org/job/34/34a68b4c-3a6a-41fc-a681-654dfcdbf2a6.zip">

https://download.checklistbank.org/job/34/34a68b4c-3a6a-41fc-a681-654dfcdbf2a6.zip</a> [81.5 MB]<br>

NIDX EXPORT, MIN=2:  <a href="https://download.checklistbank.org/job/0a/0af45acd-f723-4b27-8189-cdacda64b727.zip">

https://download.checklistbank.org/job/0a/0af45acd-f723-4b27-8189-cdacda64b727.zip</a> [1.9 MB]<br>

NAME MATCH iNat->COL: <a href="https://download.checklistbank.org/job/6b/6bbb3e5f-2a99-4d56-a87d-a5d2a3af6bf4.zip">

https://download.checklistbank.org/job/6b/6bbb3e5f-2a99-4d56-a87d-a5d2a3af6bf4.zip</a> [36.6 MB]<br>

<br>

With the following dataset keys:<br>

 9923: Catalogue of Life Checklist, version 2023-08-17<br>

 139831: iNaturalist Taxonomy, version 2023-08-01<br>

<br>

<br>

<br>

<br>

> On 14. Sep 2023, at 14:20, MANCINI Raffael via COL-Users &lt;col-users@lists.gbif.org&gt; wrote:<br>

> <br>

> Dear list,<br>

> <br>

> I'm using the very handy NamesIndex in order to translate between different taxonomic sources. In the case of CoL and iNaturalist, I found that only roughly half of the entries (from a recent export) have both the CoL ID and iNaturalist ID set. For the entries

 pertaining to the rank "species" the situation is particularly bad, only roughly 10% (4353 of 41109) of the entries do any real translation.<br>

> <br>

>     • Why does the nidx export contain rows with either of the IDs empty at all?<br>

>     • Why are so many of the species in particular not correctly mapped? Is this due to missing authorship in the iNaturalist checklist coupled with a strict matching algorithm?<br>

> <br>

> Best regards,<br>

> <br>

> --<br>

> Raffael Mancini<br>

> IT administrator and developer<br>

> Service d'information digital sur le patrimoine naturel (SIDPNAT)<br>

> Musée National d'Histoire Naturelle Luxembourg<br>

> T: +352 247 66667 - <a href="https://mnhn.lu">https://mnhn.lu</a><br>

> <br>

> _______________________________________________<br>

> COL-Users mailing list<br>

> COL-Users@lists.gbif.org<br>

> <a href="https://lists.gbif.org/mailman/listinfo/col-users">https://lists.gbif.org/mailman/listinfo/col-users</a><br>

<br>

<br>

</div>

</span></font>

</body>

</html>