Hi Scott,

 

I’m sure that someone with more direct knowledge of the GBIF taxonomy backbone will answer more specifically.  But in general, essentially all large taxonomic databases have these sorts of duplicate records due to spelling variations, etc.  Most such databases began by harvesting lists of (messy) text-string names from various sources, with the early emphasis being on quantity rather than quality.  In recent years, the emphasis has shifted towards improving quality, and to greater or lesser degrees, most large databases and aggregators have made tremendous progress in reconciling and correcting these sorts of issues.  However, these kinds of lexical variants (i.e., two slightly different spellings being mistakenly represented as separate names) continue to exist, and probably will continue to for quite some time (especially in large taxonomic aggregators, such as GBIF).  The Global Names Architecture has current NSF funding (PI: Dima Mozzherin) to develop tools to help reconcile these sorts of lexical variants, and we have another NSF grant pending that will flesh those cleaned/reconciled text-string names out into metadata-rich names and name-usages… so there is some additional hope of accelerated clean-up in the next few years.  But until then, I’m afraid these kinds of duplicates will continue to be discovered and addressed on a case-by-case basis.

 

Not sure if that helps…. But if you do restrict to a single source (like CoL), you’re less likely to encounter these kinds of duplicates, and the presumption is that linking to either one will eventually get straightened out.

 

Aloha,

Rich

 

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences | Associate Zoologist in Ichthyology | Dive Safety Officer
Department of Natural Sciences, Bishop Museum, 1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html

 

 

 

From: API-users [mailto:api-users-bounces@lists.gbif.org] On Behalf Of Scott Chamberlain
Sent: Wednesday, May 11, 2016 11:23 AM
To: api-users@lists.gbif.org
Cc: juli g. pausas
Subject: [API-users] Scientific names questions

 

Hi all,

 

Not sure where is best to ask this... so here goes. Let me know if there's a better place.  

 

The following are examples some users have highlighted for me as leading to confusion when searching for taxa.

 

1. Macrozamia platyrachis (http://www.gbif.org/species/4928834) vs. Macrozamia platyrhachis (http://www.gbif.org/species/2683551)

 

Here, the two spellings (with/without h) are accepted, and exact matches. The sci. authority seems to differ with F. M. Bailey vs. F.M.Bailey. The first is from GRIN taxonomy and the second from COL. 

 

Anyway, for users of, e.g., the R client, this is a bit confusing. I had thought the backbone taxonomy would have only one master taxon key and name for each real taxon, but here it seems like there are two?

 

2. Cycas circinalis (http://www.gbif.org/species/2683264 ) vs. Cycas circinnalis (http://www.gbif.org/species/3594916 )


Here, the two spellings (with 1 or 2 "n"'s) are accepted, and exact matches. The sci. authorities here are exactly the same. The first is from COL and the second from IPNI taxonomy. 

 

3. Isolona perrieri (http://www.gbif.org/species/3648546 ) vs Isolona perrierii (http://www.gbif.org/species/6308376 )


Here, the two spellings (with 1 or 2 "i"'s) are accepted, and exact matches. The sci. authorities here are exactly the same. The first is from TPL and the second from COL 
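
For illustration only, here is a minimal Python sketch of how a client might flag name pairs like the three above as probable spelling variants before choosing a taxon key. It uses only the standard library and does not reflect how the GBIF backbone itself is built or matched. The function name likely_variants and the 0.9 similarity threshold are assumptions made for this example, and a threshold that works for these names may be too loose or too strict for others.

```python
from difflib import SequenceMatcher
from itertools import combinations


def likely_variants(names, threshold=0.9):
    """Return pairs of distinct names whose similarity ratio meets the threshold.

    Hypothetical helper: the 0.9 default is an arbitrary choice for this
    sketch, not a value taken from GBIF or any taxonomic tool.
    """
    pairs = []
    # De-duplicate and sort so the output order is deterministic.
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


# The three pairs of spellings quoted in this message.
names = [
    "Macrozamia platyrachis", "Macrozamia platyrhachis",
    "Cycas circinalis", "Cycas circinnalis",
    "Isolona perrieri", "Isolona perrierii",
]

for a, b in likely_variants(names):
    print(a, "<->", b)
```

A check like this only flags candidates for a person to review. Two genuinely different names can differ by a single letter, so a high similarity score is not proof that two records are the same taxon.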

 

--------

 

Should I advise users, when searching the backbone taxonomy, to limit results to COL to avoid any confusion about names?

 

Best, 

Scott Chamberlain