Is there a GBIF specific LSID that can be used?
Working with a range of web services, I have found myself making extensive use of the LSIDs that are specific to each data source. For example, for ITIS, I use the Ecsenius bicolor LSID: *urn:lsid:itis.gov:itis_tsn:636326*
For WoRMS, the LSID for Ecsenius bicolor is: *urn:lsid:marinespecies.org:taxname:277652*
For the Atlas of Living Australia, the LSID for Ecsenius bicolor is: *urn:lsid:biodiversity.org.au:afd.taxon:99c29e7c-5b04-4e57-8e6b-82aa442a801a*
Is there a GBIF LSID that can similarly be used as a unique identifier for a taxon? I have come across the various GBIF unique keys, but these are not unique outside of the GBIF environment. Within the Gaia Guide systems, I am deciding how best to work with them, ensuring their uniqueness, alongside identifiers from other data sources.
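To make the structure of the LSIDs quoted above concrete, here is a minimal Python sketch that splits them into their parts. It assumes the general LSID layout `urn:lsid:<authority>:<namespace>:<object>` with an optional trailing revision. The `Lsid` type and `parse_lsid` function are hypothetical names for illustration only; they are not part of any ITIS, WoRMS, ALA or GBIF library.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Components of an LSID (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into authority, namespace, object and revision."""
    parts = text.strip().split(":")
    # Expected pieces: urn, lsid, authority, namespace, object[, revision]
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:])
    )
    if not well_formed:
        raise ValueError(f"not an LSID: {text!r}")
    return Lsid(*parts[2:])


examples = [
    "urn:lsid:itis.gov:itis_tsn:636326",
    "urn:lsid:marinespecies.org:taxname:277652",
    "urn:lsid:biodiversity.org.au:afd.taxon:99c29e7c-5b04-4e57-8e6b-82aa442a801a",
]
for lsid in examples:
    print(parse_lsid(lsid))
```

The (authority, namespace, object) triple is what keeps identifiers from different sources from colliding, which is the property a bare GBIF integer key lacks.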
Thanks again for your assistance.
Geoff Shuetrim Gaia Guide Association http://www.gaiaguide.info/
Hello Geoff,
GBIF uses simple integers as taxon identifiers, for example 2396049 for Ecsenius bicolor. These ids are stable, but obviously not globally unique. If you need a URI right now, I would recommend using our RESTful portal URL: http://www.gbif.org/species/2396049
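As a rough sketch of that recommendation, the Python snippet below turns a GBIF integer key into the portal URL and back again, so the URL can be stored next to LSIDs from other sources. The URL pattern is taken from the message above. The function names are hypothetical and this is not an official GBIF client.

```python
import re

# Portal URL pattern as given in the message above.
GBIF_SPECIES_BASE = "http://www.gbif.org/species/"


def gbif_key_to_uri(key: int) -> str:
    """Build a globally unique URI from a GBIF integer taxon key."""
    if key <= 0:
        raise ValueError(f"GBIF keys are positive integers, got {key}")
    return f"{GBIF_SPECIES_BASE}{key}"


def uri_to_gbif_key(uri: str) -> int:
    """Recover the integer key from a portal species URL."""
    match = re.fullmatch(re.escape(GBIF_SPECIES_BASE) + r"(\d+)/?", uri)
    if match is None:
        raise ValueError(f"not a GBIF species URL: {uri!r}")
    return int(match.group(1))


uri = gbif_key_to_uri(2396049)
print(uri)
print(uri_to_gbif_key(uri))
```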
In the future I could imagine us assigning DOIs to taxa, reusing the current integer ids, but that has to be carefully evaluated first. Would a taxon DOI be a valuable feature for you?
Cheers, Markus
-- Markus Döring Software Developer Global Biodiversity Information Facility (GBIF) mdoering@gbif.org http://www.gbif.org
(In reply to Geoff Shuetrim geoff@galexy.net, 05 Aug 2014, at 05:17.)
_______________________________________________ API-users mailing list API-users@lists.gbif.org http://lists.gbif.org/mailman/listinfo/api-users
Markus, I think the answer to the question "Would a taxon DOI be a valuable feature for you?" really depends on some of the details. With a taxon name, you are putting a DOI on a string, and one that has been dissociated from its source(s). I think a DOI linked to the checklist that contained the name would be more valuable, perhaps with a passthrough (a la suffix passthroughs in the EZID system) to the individual name. That way I can resolve that taxon name to the source from whence it came.
Best, Rob
(In reply to Markus Döring mdoering@gbif.org, Mon, Aug 18, 2014 at 3:44 AM.)
Hi Rob,
At the risk of opening the whole taxon/name/concept can of worms, I’d see this a little differently.
For me a taxon name is a name + the original publication, rather than simply a text string. A taxon is different again, being essentially a statement about a collection of things that belong to the same taxon, and a statement of what to call them.
Taxon databases (e.g., GBIF) tend to use strings for names, when it would be more elegant to use identifiers for names + publications. We could go some way towards cleaning the mess we’ve accumulated if we adopted (and reused) identifiers for these things. For a start, name strings that don’t map to identifiers in nomenclators would immediately be under suspicion as being potentially erroneous. It also links names to evidence, which is something we’re spectacularly bad at doing at the moment.
For example, “Pristimantis vilcabambae” is a text string which isn’t terribly useful. But if we combine that with details on where and when it was published we get something a bit more useful:
“Pristimantis vilcabambae Lehr 2007”, published in DOI http://dx.doi.org/10.3099/0027-4100(2007)159%5B145:NEFLPP%5D2.0.CO;2
This is the information I’m accumulating in BioNames, by combining metadata from ION LSIDs with data from CrossRef and BioStor; see http://bionames.org/names/cluster/1949681
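The DOI in the link above is percent-encoded (`%5B` and `%5D` stand for square brackets). A small Python sketch, assuming the `http://dx.doi.org/` resolver prefix used in the message, shows how the bare DOI could be recovered for storage next to a name string. `doi_from_url` is a hypothetical helper name.

```python
from urllib.parse import unquote

# Resolver prefix as it appears in the message above.
DOI_RESOLVER = "http://dx.doi.org/"


def doi_from_url(url: str) -> str:
    """Strip the resolver prefix and decode percent-escapes to get the bare DOI."""
    if not url.startswith(DOI_RESOLVER):
        raise ValueError(f"not a dx.doi.org URL: {url!r}")
    return unquote(url[len(DOI_RESOLVER):])


url = "http://dx.doi.org/10.3099/0027-4100(2007)159%5B145:NEFLPP%5D2.0.CO;2"
print(doi_from_url(url))
```

The prefix is removed by plain string slicing rather than `urllib.parse.urlparse`, because `urlparse` treats the `;2` at the end of the path as a separate parameter and would truncate this particular DOI.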
Should this “name string + publication” get a DOI? Sure. Then I’d want GBIF (and other taxon databases) to link to this name on their taxon pages. In other words, http://www.gbif.org/species/2425396 should have an identifier for the taxon name, instead of simply using a text string.
I’m beginning to sound like Rich Pyle, and he and I would almost certainly model these things differently, but name strings <> taxon names <> taxa.
Regards
Rod
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk Tel: +44 141 330 4778 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com ORCID: http://orcid.org/0000-0002-7101-9767 Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
(In reply to Robert Guralnick Robert.Guralnick@colorado.edu, 18 Aug 2014, at 14:29.)
Since Rod opened the can of worms, I’ll dig into it and feast along with the others.
Here is what seven years of NOMINA (http://globalnames.org/Nomina) meetings, plus millions of conversations at TDWG, Pro-iBiosphere, ICZN, ICB, iDigBio and many other regional, national, and international conferences, plus millions of dollars of targeted funding from various sources to drive the Global Names initiative... have led us to.
First, the biodiversity informatics realm is full of name-strings. These are strings of text characters, usually encoded as UTF-8, purported to represent taxon names of organisms. They may or may not include authorships, and/or abbreviations, and/or qualifiers of various sorts. These are the things that are indexed in GNI (http://gni.globalnames.org).
I completely agree with Rod that a “taxon name” is much more than just the string of UTF-8 characters used to render it. For clarity of communication (as if that were even possible in these kinds of discussions), I refer to these as “name objects”. They are conceptual (abstract) constructs, and are uniquely represented by a rich suite of metadata (publication metadata in which the name was originally established in accordance with a nomenclatural Code, authorship metadata, type specimen or type taxon metadata, etc.). A single taxon name might be represented via different name-strings (e.g., different alternate spellings, different genus combinations, etc.), and a single name-string might be applied to different name-objects (homonyms & homographs).
And, again, I completely agree with Rod that a “taxon” (= taxon concept, = taxonomic circumscription) is something else: it is another conceptual (abstract) construct, typically represented by a broader collection of metadata, including things like included child taxa, included synonym taxa, biological characters, and possibly other stuff such as geographic distribution. A single taxon might have more than one taxon name applied to it (synonyms), and a single taxon name (in the name-object sense, not just the name-string sense) might have been used to represent different taxon concepts (e.g., sensu stricto vs. sensu lato senses of the same name-object). The most practical way to refer to a taxon is the combination of a name-object (as described above), plus usage instance, e.g. “Aus bus Linnaeus 1758 sec. Pyle 2014” (the part before the “sec.” represents the name-object, and the part after the “sec.” refers to the specific usage instance that applies the name-object to a taxon concept).
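The “sec.” convention described above can be illustrated with a few lines of Python. This is only a sketch of the labelling scheme from the example string; `TaxonConceptLabel` and `split_sec` are hypothetical names and do not reflect how GNUB stores TNUs.

```python
from typing import NamedTuple


class TaxonConceptLabel(NamedTuple):
    """A name-object plus the usage instance that applies it to a concept."""
    name_object: str
    usage: str


def split_sec(label: str) -> TaxonConceptLabel:
    """Split 'NAME sec. USAGE' at the first ' sec. ' separator."""
    name, separator, usage = label.partition(" sec. ")
    if not separator or not name.strip() or not usage.strip():
        raise ValueError(f"no 'sec.' usage found in {label!r}")
    return TaxonConceptLabel(name.strip(), usage.strip())


print(split_sec("Aus bus Linnaeus 1758 sec. Pyle 2014"))
```

Two labels that share a `name_object` but differ in `usage` would then represent potentially different taxon concepts carrying the same name.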
Classifications (per se) are a little bit different, but are often included in the taxon concept space, even though they are technically not (logically) part of the taxon concept. The taxon concept is really the circumscribed set of organisms included within the concept. Changing the higher classification, by itself, has no impact on the circumscribed set of organisms included within the concept. However, that’s a topic for another can-of-worms discussion.
So... the seven years of NOMINA meetings, millions of conversations and millions of dollars have revealed that the notion of a “Taxon Name Usage” instance (TNU), as indexed in the Global Names Usage Bank (GNUB), is an extremely powerful unit that addresses taxon names (name-objects), taxon concepts, and classifications, all with a single domain of identifiers (minted for TNUs). Rob Whitton and I have functioning prototypes that demonstrate the power of TNUs for managing nomenclatural, taxonomic, and classification data, and just last week we submitted a proposal to NSF to expand these prototypes into full-function services.
The seven years and millions of conversations and dollars have also taught us that the most practical way to manage this information in biodiversity informatics-land is through two nodes: a “dirty bucket” (GNI name-strings), and a “clean bucket” (GNUB). Dima Mozzherin has new funding from NSF to begin developing the service workflows to bridge name-strings (as they exist in most biodiversity databases) to Protonyms (the subset of TNUs that represent name-objects). Starting in October, we will begin to bridge our respective prototypes (funded by NSF through the Global Names project) into a seamless tool. We hope to have something more meaningful to say about this at TDWG, but one of the key things to keep in mind is that GNA (which includes GNI & GNUB) provides low-level cross-linking tools and services, NOT replacements for CoL, ITIS, EOL, GBIF, WoRMS, NCBI taxonomy, etc., etc., etc. These other initiatives provide the information that end-users actually want. The role of GNA is to provide a core infrastructure (analogous to DNS) that most people use every day without ever knowing it.
The DOI thing is a bit of a misdirection. The “identifiers” (sensu non-LOD world) for name-strings are managed by GNI, and for TNUs by GNUB. Both are UUIDs, and as such are pure identifiers (i.e., not actionable by themselves). DOI is one of many possible identifier dereferencing services (ARC is another, and there are a host of others). DOI happens to be a particularly robust and useful dereferencing service, and as such it makes perfect sense to me to represent TNU identifiers as DOIs, as long as someone has the funding to make it happen.
So... to follow on Rod’s example, the TNU representing the “name-object” for the species epithet “vilcabambae”, as originally established in the publication Lehr 2007, is:
4B913B74-E880-4EC9-B0A9-F3AB9F02288B
Alone, that UUID does even less for you than the text-string “Pristimantis vilcabambae” does. However, by combining it with a dereferencing service, such as http://zoobank.org/, you can start doing some more interesting things:
http://zoobank.org/4B913B74-E880-4EC9-B0A9-F3AB9F02288B
For example, you can get to the original publication as registered in ZooBank (http://zoobank.org/37BFC245-DDD6-4AB4-B4B1-DD6826B86873), which gets you a link to the DOI and the ResearchGate page for this reference. You can also get a link to the GBIF page, ITIS page, EOL page, ION page, and a few others (you’d also get links to the ASW site, if they had continued to expose their internal identifiers; though now it seems that they don’t anymore). You also see a call to BHL’s OpenURL service to “automagically” get the page image of the original description. And you get a resultset from GNI to see links to other datasets.
And that’s all from just ONE metadata dereferencing service (ZooBank). I think it would be WONDERFUL to have this identifier represented within DOI-space as well (e.g., http://dx.doi.org/10.XXXXX/4B913B74-E880-4EC9-B0A9-F3AB9F02288B), but someone needs to step forward as the “XXXXX” domain to mint the DOI. By doing so, not only would you be plugged into the GNA infrastructure (as described above), but also the CrossRef infrastructure and all the whizbang services that it provides. PLAZI and GNA have agreed that a taxon treatment = a TNU, and hence will share the same UUIDs for them (thus opening up the PLAZI services for use with the same identifiers).
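The idea of one pure identifier made actionable through several dereferencing services can be sketched in Python as follows. The ZooBank pattern comes from the URLs above. The DOI pattern keeps the unassigned `10.XXXXX` placeholder from the message and will not resolve. The `RESOLVERS` table and `dereference_urls` function are hypothetical names.

```python
import uuid

# Dereferencing services keyed by a short label.
RESOLVERS = {
    "zoobank": "http://zoobank.org/{id}",
    # "10.XXXXX" is the placeholder from the message, not a registered DOI prefix.
    "doi": "http://dx.doi.org/10.XXXXX/{id}",
}


def dereference_urls(identifier: str) -> dict:
    """Validate a UUID and return one URL per dereferencing service."""
    # uuid.UUID raises ValueError for malformed input; upper-case to match the thread.
    canonical = str(uuid.UUID(identifier)).upper()
    return {name: template.format(id=canonical) for name, template in RESOLVERS.items()}


for name, url in dereference_urls("4B913B74-E880-4EC9-B0A9-F3AB9F02288B").items():
    print(name, url)
```

Adding another dereferencing service then means adding one entry to the table, while the identifier itself is reused unchanged.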
In summary, taxon name-strings, name-objects, concepts (and also classifications) are very different things, with different implied properties, and different implied meanings. GNA is well on its way to providing robust services based on persistent identifiers that are actionable through multiple dereferencing services. Including more dereferencing services (like DOI) is a GOOD thing! Re-using identifiers is a GOOD thing. Unnecessarily re-inventing wheels is NOT a particularly good thing.
Aloha,
Rich
P.S. The astute among you will have noticed that the GNA cross-links and services (including ZooBank registrations) described above did not exist before I started replying to this email. And that is the POINT. GNA is an INFRASTRUCTURE to allow *US* (we the biodiversity practitioners of the world) to cross-link content. The fact that I was able to use the EXISTING GNA infrastructure to cross-link all these resources associated with the text-string name “Pristimantis vilcabambae” in FAR LESS time than it took me to compose this email message speaks volumes about the potential that such an infrastructure can have.
(In reply to Roderic Page, sent Monday, August 18, 2014 3:53 AM.)
Hi Rich, Rod & Rob,
Thanks for this interesting taxonomic / GNA discussion. It might be a little confusing and boring to GBIF API users, so maybe we should continue privately and restrict discussions on this list to GBIF API-related topics.
The original question raised was whether GBIF provides LSIDs or other globally unique identifiers for GBIF backbone taxa. As we only have local ids now, and GBIF will be able to issue DataCite DOIs very soon, I wondered whether it would help to mint DOIs on top of the local ids to make them globally unique. Any thoughts on this would be appreciated.
A checklist bank id refers to a "name usage" and is, I suppose, similar to a TNU in GNUB. It identifies a taxon name being used within a certain (taxonomic) dataset and can refer to either an accepted taxon or a synonym. Identifiers are stable over different versions of the backbone, but the exact classification and list of synonyms for an accepted taxon are allowed to change. In the near future I would also like to allow the name string to change in case of misspellings and other small variations.
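A minimal Python sketch of the stability rule just described: the id stays fixed while classification and synonym lists may differ between backbone versions. The `NameUsage` class, its field names and the placeholder classification values are assumptions made for illustration; they are not GBIF's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NameUsage:
    """Hypothetical record for a name used within one taxonomic dataset."""
    key: int                 # stable across backbone versions
    scientific_name: str     # might later change for misspellings
    is_synonym: bool = False
    classification: List[str] = field(default_factory=list)  # allowed to change
    synonym_keys: List[int] = field(default_factory=list)    # allowed to change


def same_usage(old: NameUsage, new: NameUsage) -> bool:
    """Two records describe the same name usage when their stable keys match."""
    return old.key == new.key


# Placeholder classification values; only the key is taken from the thread.
v1 = NameUsage(2396049, "Ecsenius bicolor", classification=["Parent A"])
v2 = NameUsage(2396049, "Ecsenius bicolor", classification=["Parent A", "Parent B"])

print(same_usage(v1, v2))  # identity is decided by the key alone
print(v1 == v2)            # field-by-field comparison also sees the classification
```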
For concrete implementations it is quite a challenge to come up with a clear definition of when a *taxon* identifier should change and when it should remain the same. Would users like to see true taxon concept identifiers for the GBIF backbone that remain stable as long as GBIF still regards the taxon as the same, whatever scientific name is used as the currently accepted label? If we had better information about types and original name usages (protonyms, basionyms), we could try to assign stable ids to a fixed set of protonyms in the GBIF backbone. Does that sound reasonable?
Cheers, Markus
On 18 Aug 2014, at 23:20, Richard Pyle deepreef@bishopmuseum.org wrote:
Since Rod opened the can of worms, I’ll dig in to it an feast along with the others.
Here is what seven years of NOMINA (http://globalnames.org/Nomina) meetings, plus millions of conversations at TDWG, Pro-iBiosphere, ICZN, ICB, iDigBio and many other regional, national, and international conferences, plus millions of dollars of targeted funding from various sources to drive the Global Names initiative….has led us to.
First, the biodiversity informatics realm is full of name-strings. These are strings of text characters, usually encoded as UTF-8, purported to represent taxon names of organisms. They may or may not include authorships, and/or abbreviations, and/or qualifiers of various sorts. These are the things that are indexed in GNI (http://gni.globalnames.org)
I completely agree with Rod that a “taxon name” is much more than just the string of UTF-8 characters used to render it. For clarity of communication (as if that were even possible in these kinds of discussions), I refer to these as “name objects”. They are conceptual (abstract) constructs, and are uniquely represented by a rich suite of metadata (publication metadata in which the name was originally established in accordance with a nomenclatural Code, authorship metadata, type specimen or type taxon metadata, etc.). A single taxon name might be represented via different name-strings (e.g., different alternate spellings, different genus combinations, etc.), and a single name-string might be applied to different name-objects (homonyms & homographs).
And, again, I completely agree with Rod that a “taxon” (=taxon concept, = taxonomic circumscription) is something else – it is another conceptual (abstract) construct, typically represented by a broader collection of metadata, including things like included child taxa, included synonym taxa, biological characters, and possibly other stuff such as geographic distribution. A single taxon might have more than one taxon name applied to it (synonyms), and a single taxon name (in the name-object sense, not just the name-string sense) might have been used to represent different taxon concepts (e.g., sensu stricto vs. sensu lato senses of the same name-object). The most practical way to refer to a taxon is the combination of a name-object (as described above), plus usage instance, e.g. “Aus bus Linnaeus 1758 sec. Pyle 2014” (the part before the “sec.” represents the name-object, and the part after the “sec.” refers to the specific usage instance that applies the name-object to a taxon concept).
Classifications (per se) are a little bit different, but are often included in the taxon concept space, even though they are technically not (logically) part of the taxon concept. The taxon concept is really the circumscribed set of organisms included within the concept. Changing the higher classification, by itself, has no impact on the circumscribed set of organisms included within the concept. However, that’s a topic for another can-of-worms discussion.
So…. The seven years of NOMINA meetings, millions of conversations and millions of dollars has revealed that the notion of a “Taxon Name Usage” instances (TNU), as indexed in the Global Names Usage Bank (GNUB), is an extremely powerful unit that addresses taxon names (name-objects), taxon concepts, and classifications; all with a single domain of identifiers (minted for TNUs). Rob Whitton and I have functioning prototypes that demonstrate the power of TNUs for managing nomenclatural, taxonomic, and classification data; and we just last week submitted a proposal to NSF to expand these prototypes into full-function services.
The seven years and millions of conversations and dollars has also taught us that the most practical way to manage this information in biodiversity informatics-land is through two nodes: a “dirty bucket” (GNI name-strings), and a “clean bucket” (GNUB). Dima Mozzherin has new funding from NSF to begin developing the service workflows to bridge name-strings (as they exist in most biodiversity databases) to Protonyms (the subset of TNUs that represent name-objects). Starting in October, we will begin to bridge our respective prototypes (funded by NSF through the Global Names project) into a seamless tool. We hope to have something more meaningful to say about this at TDWG; but one of the key things to keep in mind is that GNA (which includes GNI & GNUB) are low-level cross-linking tools and services – NOT replacements for CoL, ITIS, EOL, GBIF, WoRMS, NCBI taxonomy, etc., etc., etc. These other initiatives provide the information that end-users actually want. The role of GNA is to provide a core infrastructure (analogous to DNS) that most people use every day without ever knowing it.
The DOI thing is a bit of a misdirection. The “identifiers” (sensu non-LOD world) for name-strings are managed by GNI, and for TNUs by GNUB. Both are UUIDs, and as such are pure identifiers (i.e., not actionable by themselves). DOI is one of many possible identifier dereferencing services (ARC is another, and there are a host of others). DOI happens to be a particularly robust and useful dereferencing services, and as such it makes perfect sense to me to represent TNU identifiers as DOIs, as long as someone has the funding to make it happen.
So… to follow on Rod’s example, the TNU representing the “name-object” for the species epithet “vilcabambae”, as originally established in the publication Lehr 2007, is: 4B913B74-E880-4EC9-B0A9-F3AB9F02288B
Alone, that UUID does even less for you than the text-string “Pristimantis vilcabambae” does. However, combining it with a dereferencing service, such as http://zoobank.org/, you can start doing some more interesting things: http://zoobank.org/4B913B74-E880-4EC9-B0A9-F3AB9F02288B
For example, you can get to the original publication as registered in ZooBank (http://zoobank.org/37BFC245-DDD6-4AB4-B4B1-DD6826B86873), which gets you a link to the DOI and the ResearchGate page for this reference. You can also get a link to the GBIF page, ITIS page, EOL page, ION page, and a few others (you’d also get links to the ASW site, if they had continued to expose their internal identifiers; though now it seems that they don’t anymore). You also see a call to BHL’s OpenURL service to “automagically” get the page image of the original description. And you get a resultset from GNI to see links to other datasets.
And that’s all from just ONE metadata dereferencing service (ZooBank). I think it would be WONDERFUL to have this identifier represented within DOI-space as well (e.g., http://dx.doi.org/10.XXXXX/4B913B74-E880-4EC9-B0A9-F3AB9F02288B), but someone needs to step forward as the “XXXXX” domain to mint the DOI. By doing so, not only would you be plugged into the GNA infrastructure (as described above), but also the CrossRef infrastructure and all the whizbang services that it provides. PLAZI and GNA have agreed that a taxon treatment = a TNU, and hence will share the same UUIDs for them (thus opening up the PLAZI services for use with the same identifiers).
In summary, Taxon name-strings, name-objects, concepts (and also classifications) are very different things, with different implied properties, and different implied meanings. GNA is well on its way to serving robust services based on persistent identifiers that are actionable through multiple dereferencing services. Including more dereferencing services (like DOI) is a GOOD thing! Re-using identifiers is a GOOD thing. Unnecessarily re-inventing wheels is NOT a particularly good thing.
Aloha, Rich
P.S The astute among you will have noticed that the GNA cross-links and services (including ZooBank registrations) described above did not exist before I started replying to this email. And that is the POINT. GNA is an INFRASTRUCTURE to allow *US* (we the biodiversity practitioners of the world) to cross-link content. The fact that I was able to use the EXISTING GNA infrastructure to cross-link all these resources associated with the text-string name “Pristimantis vilcabambae” in FAR LESS time than it took me to compose this email message, speaks volumes about the potential that such an infrastructure can have.
From: api-users-bounces@lists.gbif.org [mailto:api-users-bounces@lists.gbif.org] On Behalf Of Roderic Page Sent: Monday, August 18, 2014 3:53 AM To: Rob Guralnick Cc: api-users@lists.gbif.org Subject: Re: [API-users] Is there a GBIF specific LSID that can be used?
Hi Rob,
At the risk of opening the whole taxon/name/concept can of worms, I’d see this a little differently.
For me a taxon name is a name + the original publication, rather than simply a text string. A taxon is different again, being essentially a statement about a collection of things that belong to the same taxon, and a statement of what to call them.
Taxon databases (e.g., GBIF) tend to use strings for names, when it would be more elegant to use identifiers for names + publications. We could go some way towards cleaning the mess we’ve accumulated if we adopted (and reused) identifiers for these things. For a start, name strings that don’t map to identifiers in nomenclators would immediately be under suspicion as being potentially erroneous. It also links names to evidence, which is something we’re spectacularly bad at doing at the moment.
For example, “Pristimantis vilcabambae” is a text string which isn’t terribly useful. But if we combine that with details on where and when it was published we get something a bit more useful:
Pristimantis vilcabambae Lehr 2007 published in DOI http://dx.doi.org/10.3099/0027-4100(2007)159%5B145:NEFLPP%5D2.0.CO;2
This is the information I’m accumulating in BioNames, by combining metadata from ION LSIDs with data from CrossRef and BioStor; see http://bionames.org/names/cluster/1949681
Should this “name string + publication” get a DOI? Sure. Then I’d want GBIF (and other taxon databases) to link to this name on their taxon pages. In other words, http://www.gbif.org/species/2425396 should have an identifier for the taxon name, instead of simply using a text string.
I’m beginning to sound like Rich Pyle, and he and I would almost certainly model these things differently, but name strings <> taxon names <> taxa.
Regards
Rod
Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk Tel: +44 141 330 4778 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com ORCID: http://orcid.org/0000-0002-7101-9767 Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
On 18 Aug 2014, at 14:29, Robert Guralnick Robert.Guralnick@colorado.edu wrote:
Markus --- I think the answer to the question: "Would a taxon DOI be a valuable feature for you?" really depends on some of the details. With a taxon name, you are putting a DOI on a string and one that has been dissociated from its source(s). I would think more valuable would be a DOI linked to the checklist that contained the name, and maybe a passthrough (a la suffix passthroughs in the EZID system) to the individual name. That way I can resolve that taxon name to the source from whence it came.
Best, Rob
On Mon, Aug 18, 2014 at 3:44 AM, Markus Döring mdoering@gbif.org wrote: Hello Geoff,
GBIF uses simple integers as taxon identifiers, for example 2396049 for Ecsenius bicolor. These ids are stable, but obviously not globally unique. If you need a URI right now, I would recommend using our RESTful portal URL: http://www.gbif.org/species/2396049
For the future I could imagine us assigning DOIs to taxa reusing the current integer ids, but that has to be carefully evaluated first. Would a taxon DOI be a valuable feature for you?
Cheers, Markus
-- Markus Döring Software Developer Global Biodiversity Information Facility (GBIF) mdoering@gbif.org http://www.gbif.org
On 05 Aug 2014, at 05:17, Geoff Shuetrim geoff@galexy.net wrote:
Working with a range of web services, I have found myself making extensive use of the LSIDs that are specific to each data source. For example, for ITIS, I use the Ecsenius bicolor LSID: urn:lsid:itis.gov:itis_tsn:636326
For WoRMS, the LSID for Ecsenius bicolor is: urn:lsid:marinespecies.org:taxname:277652
For Atlas of living Australia the LSID for Ecsenius bicolor is: urn:lsid:biodiversity.org.au:afd.taxon:99c29e7c-5b04-4e57-8e6b-82aa442a801a
Is there a GBIF LSID that can similarly be used as a unique identifier for a taxon? I have come across the various GBIF unique keys but these are not unique outside of the GBIF environment and within the Gaia Guide systems I am deciding how best to work with these, ensuring their uniqueness, alongside identifiers from other data sources.
Thanks again for your assistance.
Geoff Shuetrim Gaia Guide Association http://www.gaiaguide.info/ _______________________________________________ API-users mailing list API-users@lists.gbif.org http://lists.gbif.org/mailman/listinfo/api-users
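One way to keep source-specific identifiers unique side by side, as the original question asks, is sketched below in Python. It assumes the standard LSID layout `urn:lsid:authority:namespace:object[:revision]`; the `Lsid` type and the `parse_lsid` and `gbif_uri` helpers are hypothetical names, and using the portal URL as the key for GBIF taxa simply follows the recommendation quoted above rather than any GBIF-issued LSID.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split an LSID into its parts, or raise ValueError."""
    parts = text.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {text!r}")
    return Lsid(*parts[2:])


def gbif_uri(taxon_key):
    """Wrap a local GBIF integer key in the portal URL from the thread."""
    return f"http://www.gbif.org/species/{int(taxon_key)}"


# Identifiers for Ecsenius bicolor as given in the thread.
identifiers = {
    "ITIS": "urn:lsid:itis.gov:itis_tsn:636326",
    "WoRMS": "urn:lsid:marinespecies.org:taxname:277652",
    "GBIF": gbif_uri(2396049),
}

itis = parse_lsid(identifiers["ITIS"])
print(itis.authority, itis.namespace, itis.object_id)
```

Because the authority is part of every key, the integer 636326 from ITIS cannot collide with the same integer issued by another source.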
Hi Markus,
Sorry for the long diatribe; I agree it was probably not the right forum for it. Back to the original question, speaking from the perspective of API users, I would say:
1) If you are going to expose your internal integer ids (as you do, and as I'm very GLAD that you do), make sure that your internal identifiers are as stable as they can be (i.e., don't re-assign them in the future without warning). If you do expose them, no doubt API users will cache them and cross-link to them, and rely on them to be there.
2) If you provide a dereferencing service in the form of [httpURL_prefix]/[internal_integer_identifier] (as you do with the http://www.gbif.org/species/ and http://www.gbif.org/occurrence/ RESTful portals, and as I'm very GLAD that you do), try to persist this service using the same http URL prefix.
3) If you must re-assign internal integer identifiers in the future (for whatever reason), then provide an API that cross-links the old integers to the corresponding new identifier, so people will be able to update their locally cached cross-links. Also, avoid using overlapping numbers in this case, so that the [httpURL_prefix]/[internal_integer_identifier] for both old integers and new integers remain globally unique without collision.
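Point 3 can be sketched from the API user's side in Python. Everything here is an assumption for illustration: the thread does not describe any such GBIF API, the `REASSIGNED` table stands in for whatever cross-link data such an API might return, and the integers in it are made up.

```python
from typing import Dict, Iterable, Set

PREFIX = "http://www.gbif.org/species/"  # URL prefix quoted in the thread

# Hypothetical cross-link table: old integer -> replacement integer.
# The numbers are invented for this sketch.
REASSIGNED: Dict[int, int] = {101: 900001, 102: 900002, 900002: 900003}


def current_key(key: int, reassigned: Dict[int, int] = REASSIGNED) -> int:
    """Follow re-assignments until reaching a key that was never replaced."""
    seen: Set[int] = set()
    while key in reassigned:
        if key in seen:
            raise ValueError(f"cycle in re-assignment table at {key}")
        seen.add(key)
        key = reassigned[key]
    return key


def reused_keys(reassigned: Dict[int, int], live_keys: Iterable[int]) -> Set[int]:
    """Retired integers that are also in use as live keys (ideally none)."""
    return set(reassigned) & set(live_keys)


def refresh_link(url: str) -> str:
    """Rewrite a cached portal URL so that it points at the current key."""
    if not url.startswith(PREFIX):
        return url
    key = int(url[len(PREFIX):].strip("/"))
    return PREFIX + str(current_key(key))


print(refresh_link(PREFIX + "102"))
```

`reused_keys` expresses the "no overlapping numbers" condition: if a retired integer were handed out again, an old cached URL would silently resolve to a different taxon.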
I think these are by far the most important three things from the perspective of an API user. I think the addition of support for DOIs in the future would be wonderful as an additional dereferencing service, and I think it would be very valuable and appropriate for GBIF to provide this service to end-users. However, if the DOIs take the form of 10.[GBIFDOIDomain]/[Identifier], then I would STRONGLY recommend that the [Identifier] part itself be globally unique, independently of the 10.[GBIFDOIDomain]. Doing so costs you nothing, and gains you something VERY valuable: independence of the identifier from the dereferencing service. Such independence means that the global uniqueness of the Identifier (sensu stricto) does not depend on the persistence of the dereferencing service. Cross-dependence between the two weakens the utility of identifiers, because it means that if EITHER the dereferencing service http URL prefix OR the identifier suffix needs to change for some reason, then the GUID dies.
I'll reply in more detail to the rest of your email off-list, as it involves more specifics than are appropriate here; any subscribers interested in those details should ping me and I'll forward it along.
Aloha,
Rich
From: Markus Döring [mailto:mdoering@gbif.org] Sent: Tuesday, August 19, 2014 12:38 AM To: Richard Pyle Cc: Roderic Page; Rob Guralnick; api-users@lists.gbif.org Subject: Re: [API-users] Is there a GBIF specific LSID that can be used?
Hi Rich, Rod & Rob,
thanks for this interesting taxonomic / GNA discussion. It might be a little confusing and boring to GBIF API users, so maybe we continue privately and restrict discussions on this list to GBIF API related topics.
The original question raised was if GBIF provides LSIDs or other globally unique identifiers for GBIF backbone taxa. As we only have local ids now and GBIF will be able to issue DataCite DOIs very soon I wondered if it helps to mint DOIs on top of the local ids to make them globally unique. Any thoughts on this would be appreciated.
A checklist bank id refers to a "name usage" and is similar to a TNU in GNUB I suppose. It identifies a taxon name being used within a certain (taxonomic) dataset and can refer to either an accepted taxon or a synonym. Identifiers are stable over different versions of the backbone, but the exact classification and list of synonyms for an accepted taxon is allowed to change. In the near future I would also like to allow the name string to change in case of misspellings and other small variations.
For concrete implementations it is quite a challenge to come up with a clear definition when a *taxon* identifier should change and when it should remain the same. Would users like to see true taxon concept identifiers for the GBIF backbone that remain stable as long as GBIF regards the taxon still the same whatever scientific name is used as the currently accepted label? If we had better information about types and original name usages (protonyms, basionyms) we could try to assign stable ids to a fixed set of protonyms in the GBIF backbone. Does that sound reasonable?
Cheers,
Markus
Hi Markus
"If we had better information about types and original name usages (protonyms, basionyms) we could try to assign stable ids to a fixed set of protonyms in the GBIF backbone. Does that sound reasonable?"
This stable id for a protonym, or more generally for treatments of any sort of taxonomic name usage, is what Plazi supplies by minting a stable HTTP URI. It is also what we supply to you in the DwC-A for observation records from the literature.
The treatment also provides metadata that, among other things, resolves to the source article.
Right now, we make sure that all those article links are DOIs, either a CrossRef DOI or a DataCite DOI minted through the Biodiversity Literature Repository.
The treatment HTTP URI is also linked, as far as possible, with the URIs minted by ZooBank, and coordinated with Pensoft journals and with others, if they want to.
This is relevant, since all the protonyms will hopefully be included in ZooBank.
We are working on an RDF representation of the treatments that should be available around TDWG this year.
Since a taxonomic name usage is linked to a particular place in a publication, normally given by the page number complementing the bibliographic reference, we do want to provide this resolution. Really, we want to provide the content, and thus we extract the treatments and make them accessible for legacy data, which also allows us to provide you with materials citations that can be linked back to the treatment through its UI, or, in prospective publications, using TaxPub, which has a treatment element (see the Pensoft publications).
So, you could provide a link to the treatments in Plazi, which you already have for all those TNUs we supply.
Donat
From: api-users-bounces@lists.gbif.org [mailto:api-users-bounces@lists.gbif.org] On Behalf Of Markus Döring Sent: Tuesday, August 19, 2014 12:38 PM To: Richard Pyle Cc: api-users@lists.gbif.org Subject: Re: [API-users] Is there a GBIF specific LSID that can be used?
Hi Rich, Rod & Rob,
thanks for this interesting taxonomic / GNA discussion. It might be a little confusing and boring to GBIF API users, so maybe we continue privately and restrict discussions on this list to GBIF API related topics.
The original question raised was if GBIF provides LSIDs or other globally unique identifiers for GBIF backbone taxa. As we only have local ids now and GBIF will be able to issue DataCite DOIs very soon I wondered if it helps to mint DOIs on top of the local ids to make them globally unique. Any thoughts on this would be appreciated.
A checklist bank id refers to a "name usage" and is similar to a TNU in GNUB I suppose. It identifies a taxon name being used within a certain (taxonomic) dataset and can refer to either an accepted taxon or a synonym. Identifiers are stable over different versions of the backbone, but the exact classification and list of synonyms for an accepted taxon is allowed to change. In the near future I would also like to allow the name string to change in case of misspellings and other small variations.
For concrete implementations it is quite a challenge to come up with a clear definition when a *taxon* identifier should change and when it should remain the same. Would users like to see true taxon concept identifiers for the GBIF backbone that remain stable as long as GBIF regards the taxon still the same whatever scientific name is used as the currently accepted label? If we had better information about types and original name usages (protonyms, basionyms) we could try to assign stable ids to a fixed set of protonyms in the GBIF backbone. Does that sound reasonable?
Cheers, Markus
On 18 Aug 2014, at 23:20, Richard Pyle <deepreef@bishopmuseum.org> wrote:
Since Rod opened the can of worms, I'll dig into it and feast along with the others.
Here is what seven years of NOMINA (http://globalnames.org/Nomina) meetings, plus millions of conversations at TDWG, Pro-iBiosphere, ICZN, ICB, iDigBio and many other regional, national, and international conferences, plus millions of dollars of targeted funding from various sources to drive the Global Names initiative....has led us to.
First, the biodiversity informatics realm is full of name-strings. These are strings of text characters, usually encoded as UTF-8, purported to represent taxon names of organisms. They may or may not include authorships, and/or abbreviations, and/or qualifiers of various sorts. These are the things that are indexed in GNI (http://gni.globalnames.org/).
I completely agree with Rod that a "taxon name" is much more than just the string of UTF-8 characters used to render it. For clarity of communication (as if that were even possible in these kinds of discussions), I refer to these as "name objects". They are conceptual (abstract) constructs, and are uniquely represented by a rich suite of metadata (publication metadata in which the name was originally established in accordance with a nomenclatural Code, authorship metadata, type specimen or type taxon metadata, etc.). A single taxon name might be represented via different name-strings (e.g., different alternate spellings, different genus combinations, etc.), and a single name-string might be applied to different name-objects (homonyms & homographs).
And, again, I completely agree with Rod that a "taxon" (=taxon concept, = taxonomic circumscription) is something else - it is another conceptual (abstract) construct, typically represented by a broader collection of metadata, including things like included child taxa, included synonym taxa, biological characters, and possibly other stuff such as geographic distribution. A single taxon might have more than one taxon name applied to it (synonyms), and a single taxon name (in the name-object sense, not just the name-string sense) might have been used to represent different taxon concepts (e.g., sensu stricto vs. sensu lato senses of the same name-object). The most practical way to refer to a taxon is the combination of a name-object (as described above), plus usage instance, e.g. "Aus bus Linnaeus 1758 sec. Pyle 2014" (the part before the "sec." represents the name-object, and the part after the "sec." refers to the specific usage instance that applies the name-object to a taxon concept).
Classifications (per se) are a little bit different, but are often included in the taxon concept space, even though they are technically not (logically) part of the taxon concept. The taxon concept is really the circumscribed set of organisms included within the concept. Changing the higher classification, by itself, has no impact on the circumscribed set of organisms included within the concept. However, that's a topic for another can-of-worms discussion.
So... The seven years of NOMINA meetings, millions of conversations, and millions of dollars have revealed that the notion of a "Taxon Name Usage" instance (TNU), as indexed in the Global Names Usage Bank (GNUB), is an extremely powerful unit that addresses taxon names (name-objects), taxon concepts, and classifications; all with a single domain of identifiers (minted for TNUs). Rob Whitton and I have functioning prototypes that demonstrate the power of TNUs for managing nomenclatural, taxonomic, and classification data; and we just last week submitted a proposal to NSF to expand these prototypes into full-function services.
The seven years and millions of conversations and dollars have also taught us that the most practical way to manage this information in biodiversity informatics-land is through two nodes: a "dirty bucket" (GNI name-strings), and a "clean bucket" (GNUB). Dima Mozzherin has new funding from NSF to begin developing the service workflows to bridge name-strings (as they exist in most biodiversity databases) to Protonyms (the subset of TNUs that represent name-objects). Starting in October, we will begin to bridge our respective prototypes (funded by NSF through the Global Names project) into a seamless tool. We hope to have something more meaningful to say about this at TDWG; but one of the key things to keep in mind is that GNA (which includes GNI & GNUB) is a set of low-level cross-linking tools and services, NOT a replacement for CoL, ITIS, EOL, GBIF, WoRMS, NCBI taxonomy, etc., etc., etc. These other initiatives provide the information that end-users actually want. The role of GNA is to provide a core infrastructure (analogous to DNS) that most people use every day without ever knowing it.
The DOI thing is a bit of a misdirection. The "identifiers" (sensu non-LOD world) for name-strings are managed by GNI, and for TNUs by GNUB. Both are UUIDs, and as such are pure identifiers (i.e., not actionable by themselves). DOI is one of many possible identifier dereferencing services (ARC is another, and there are a host of others). DOI happens to be a particularly robust and useful dereferencing service, and as such it makes perfect sense to me to represent TNU identifiers as DOIs, as long as someone has the funding to make it happen.
So... to follow on Rod's example, the TNU representing the "name-object" for the species epithet "vilcabambae", as originally established in the publication Lehr 2007, is: 4B913B74-E880-4EC9-B0A9-F3AB9F02288B
Alone, that UUID does even less for you than the text-string "Pristimantis vilcabambae" does. However, combining it with a dereferencing service, such as http://zoobank.org/, you can start doing some more interesting things: http://zoobank.org/4B913B74-E880-4EC9-B0A9-F3AB9F02288B
For example, you can get to the original publication as registered in ZooBank (http://zoobank.org/37BFC245-DDD6-4AB4-B4B1-DD6826B86873), which gets you a link to the DOI and the ResearchGate page for this reference. You can also get a link to the GBIF page, ITIS page, EOL page, ION page, and a few others (you'd also get links to the ASW site, if they had continued to expose their internal identifiers; though now it seems that they don't anymore). You also see a call to BHL's OpenURL service to "automagically" get the page image of the original description. And you get a resultset from GNI to see links to other datasets.
And that's all from just ONE metadata dereferencing service (ZooBank). I think it would be WONDERFUL to have this identifier represented within DOI-space as well (e.g., http://dx.doi.org/10.XXXXX/4B913B74-E880-4EC9-B0A9-F3AB9F02288B), but someone needs to step forward as the "XXXXX" domain to mint the DOI. By doing so, not only would you be plugged into the GNA infrastructure (as described above), but also the CrossRef infrastructure and all the whizbang services that it provides. PLAZI and GNA have agreed that a taxon treatment = a TNU, and hence will share the same UUIDs for them (thus opening up the PLAZI services for use with the same identifiers).
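The pattern described above (a bare UUID that becomes actionable only when paired with a dereferencing service) can be sketched in a few lines of Python. The function name is an assumption for illustration; the ZooBank base URL is the one quoted above, and the DOI prefix is deliberately left out because nobody has yet stepped forward as the "XXXXX" domain.

```python
import uuid

ZOOBANK_RESOLVER = "http://zoobank.org/"


def dereference_url(identifier: str, resolver: str = ZOOBANK_RESOLVER) -> str:
    """Combine a bare UUID with a resolver base URL (illustrative helper)."""
    # uuid.UUID raises ValueError if the string is not a well-formed UUID.
    parsed = uuid.UUID(identifier)
    # str(parsed) is lowercase; the identifiers in this thread are uppercase.
    return resolver.rstrip("/") + "/" + str(parsed).upper()


tnu = "4B913B74-E880-4EC9-B0A9-F3AB9F02288B"
print(dereference_url(tnu))
# http://zoobank.org/4B913B74-E880-4EC9-B0A9-F3AB9F02288B
```

The same identifier could be handed to any other resolver by passing a different base URL, which is the sense in which the UUID itself stays a pure identifier.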
In summary, taxon name-strings, name-objects, concepts (and also classifications) are very different things, with different implied properties, and different implied meanings. GNA is well on its way to serving robust services based on persistent identifiers that are actionable through multiple dereferencing services. Including more dereferencing services (like DOI) is a GOOD thing! Re-using identifiers is a GOOD thing. Unnecessarily re-inventing wheels is NOT a particularly good thing.
Aloha, Rich
P.S. The astute among you will have noticed that the GNA cross-links and services (including ZooBank registrations) described above did not exist before I started replying to this email. And that is the POINT. GNA is an INFRASTRUCTURE to allow *US* (we the biodiversity practitioners of the world) to cross-link content. The fact that I was able to use the EXISTING GNA infrastructure to cross-link all these resources associated with the text-string name "Pristimantis vilcabambae" in FAR LESS time than it took me to compose this email message, speaks volumes about the potential that such an infrastructure can have.
From: api-users-bounces@lists.gbif.org [mailto:api-users-bounces@lists.gbif.org] On Behalf Of Roderic Page
Sent: Monday, August 18, 2014 3:53 AM
To: Rob Guralnick
Cc: api-users@lists.gbif.org
Subject: Re: [API-users] Is there a GBIF specific LSID that can be used?
Hi Rob,
At the risk of opening the whole taxon/name/concept can of worms, I'd see this a little differently.
For me a taxon name is a name + the original publication, rather than simply a text string. A taxon is different again, being essentially a statement about a collection of things that belong to the same taxon, and a statement of what to call them.
Taxon databases (e.g., GBIF) tend to use strings for names, when it would be more elegant to use identifiers for names + publications. We could go some way towards cleaning the mess we've accumulated if we adopted (and reused) identifiers for these things. For a start, name strings that don't map to identifiers in nomenclators would immediately be under suspicion as being potentially erroneous. It also links names to evidence, which is something we're spectacularly bad at doing at the moment.
For example, "Pristimantis vilcabambae" is a text string which isn't terribly useful. But if we combine that with details on where and when it was published we get something a bit more useful:
"Pristimantis vilcabambae Lehr 2007 published in DOI http://dx.doi.org/10.3099/0027-4100(2007)159%5B145:NEFLPP%5D2.0.CO;2" This is the information I'm accumulating in BioNames, by combining metadata from ION LSIDs with data from CrossRef and BioStor, see http://bionames.org/names/cluster/1949681
Should this "name string + publication" get a DOI? Sure. Then I'd want GBIF (and other taxon databases) to link to this name on their taxon pages. In other words, http://www.gbif.org/species/2425396 should have an identifier for the taxon name, instead of simply using a text string.
I'm beginning to sound like Rich Pyle, and he and I would almost certainly model these things differently, but name strings <> taxon names <> taxa
Regards
Rod
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk Tel: +44 141 330 4778 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com/ ORCID: http://orcid.org/0000-0002-7101-9767 Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
On 18 Aug 2014, at 14:29, Robert Guralnick <Robert.Guralnick@colorado.edu> wrote:
Markus --- I think the answer to the question: "Would a taxon DOI be a valuable feature for you?" really depends on some of the details. With a taxon name, you are putting a DOI on a string and one that has been dissociated from its source(s). I would think more valuable would be a DOI linked to the checklist that contained the name, and maybe a passthrough (a la suffix passthroughs in the EZID system) to the individual name. That way I can resolve that taxon name to the source from whence it came.
Best, Rob
On Mon, Aug 18, 2014 at 3:44 AM, Markus Döring <mdoering@gbif.org> wrote: Hello Geoff,
GBIF uses simples integers as taxon identifiers, for example 2396049 for Ecsenius bicolor. These ids are stable, but obviously not globally unique. If you need a URI right now I would recommend for now to use our restful portal URL: http://www.gbif.org/species/2396049
For the future I could imagine us assigning DOIs to taxa reusing the current integer ids, but that has to be carefully evaluated first. Would a taxon DOI be a valuable feature for you?
Cheers, Markus
-- Markus Döring Software Developer Global Biodiversity Information Facility (GBIF) mdoering@gbif.org http://www.gbif.org/
On 05 Aug 2014, at 05:17, Geoff Shuetrim <geoff@galexy.net> wrote:
Working with a range of web services, I have found myself making extensive use of the LSIDs that are specific to each data source. For example, for ITIS, I use the Ecsenius bicolor LSID: urn:lsid:itis.gov:itis_tsn:636326
For WoRMS, the LSID for Ecsenius bicolor is: urn:lsid:marinespecies.org:taxname:277652
For Atlas of living Australia the LSID for Ecsenius bicolor is: urn:lsid:biodiversity.org.au:afd.taxon:99c29e7c-5b04-4e57-8e6b-82aa442a801a
Is there a GBIF LSID that can similarly be used as a unique identifier for a taxon? I have come across the various GBIF unique keys but these are not unique outside of the GBIF environment and within the Gaia Guide systems I am deciding how best to work with these, ensuring their uniqueness, alongside identifiers from other data sources.
Thanks again for your assistance.
Geoff Shuetrim Gaia Guide Association http://www.gaiaguide.info/
_______________________________________________ API-users mailing list API-users@lists.gbif.org http://lists.gbif.org/mailman/listinfo/api-users
Hi Markus and others:
Apologies for my delay in responding to the questions raised in this thread. Many thanks everyone for the considered input on the issue.
With regard to Markus' various questions about whether to create a globally unique system of GBIF identifiers for taxa, I started trying to give answers based on how I hope to interact with the GBIF system and found myself having to review many parts of this thread to give a considered answer. Perhaps the best response I can give is to explain how I depend on taxon identifiers in external systems.
My system includes documentation of a set of taxa. That documentation references external sources, including resources like WoRMS, ITIS, and Atlas of Living Australia. I need to:
1. Know that the external sources are all describing the taxon documented by my system.
2. Know how to ask the web service exposed by each external source for updated information about the taxon including what name that external source deems to be valid for the taxon.
Currently my system starts with a kingdom, a taxonomic rank and a name for a taxon (ideally that would also include a reference to a paper, but it does not for now) and uses that information to test whether there is a matching taxon in each external system. If I get a match, I can be fairly confident that I have found a new external source pertinent to the taxon in my system. Each taxon in my system has a set of identifiers: a local one, which happens to have an LSID syntax, and external ones, which are either LSIDs or IDs local to external systems that have been made global by adding a prefix to create a globally unique identifier with an LSID syntax.
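As a rough illustration of the identifier handling just described, the sketch below parses an LSID into its parts (urn:lsid:authority:namespace:object, with an optional revision) and wraps a source-local ID in the same syntax. The helper names are assumptions, and the "gbif.org" / "species" prefix in the example is purely hypothetical: GBIF does not issue such an LSID, it is only the kind of locally minted prefix mentioned above.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into authority, namespace, object and revision."""
    parts = text.split(":")
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:])
    )
    if not well_formed:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


def wrap_local_id(authority: str, namespace: str, local_id) -> LSID:
    """Give a source-local ID an LSID syntax by adding a prefix (hypothetical scheme)."""
    return LSID(authority, namespace, str(local_id))


worms = parse_lsid("urn:lsid:marinespecies.org:taxname:277652")
print(worms.authority, worms.object_id)  # marinespecies.org 277652

# Hypothetical, locally minted wrapper around the GBIF integer key:
print(wrap_local_id("gbif.org", "species", 2396049))
# urn:lsid:gbif.org:species:2396049
```

Keeping the authority as a separate field is what lets identifiers from different sources sit side by side without colliding.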
My system regularly reviews the external identifiers associated with each locally documented taxon to determine if they identify a taxon with a valid name, and the same name/rank as the locally documented taxon. External identifiers indicated to be associated with invalid names are replaced. When external services indicate that a replacement is not possible, because the taxon has been marked as obsolete, my system is updated to reflect that change (somewhat more manually).
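The review cycle described above might be sketched as follows. Everything here is an assumption made for illustration: the record fields, the "valid" / "invalid" / "obsolete" status vocabulary, and the `lookup` callable standing in for a call to an external web service. The handling of a valid record whose name or rank no longer matches is not stated above, so the sketch simply flags it for manual review.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ExternalRecord:
    name: str
    rank: str
    status: str                        # "valid", "invalid" or "obsolete" (assumed vocabulary)
    replaced_by: Optional[str] = None  # identifier of the valid record, if the source gives one


def review_identifier(
    local_name: str,
    local_rank: str,
    identifier: str,
    lookup: Callable[[str], ExternalRecord],
) -> Tuple[str, str]:
    """Return (action, identifier), where action is 'keep', 'replace' or 'manual'."""
    record = lookup(identifier)
    if record.status == "valid":
        if (record.name, record.rank) == (local_name, local_rank):
            return "keep", identifier
        # Assumption: a name/rank mismatch is left for a person to resolve.
        return "manual", identifier
    if record.status == "invalid" and record.replaced_by is not None:
        return "replace", record.replaced_by
    # Obsolete taxa, or invalid names with no replacement, need manual handling.
    return "manual", identifier


# A stand-in for an external service, keyed by made-up identifiers.
fake_source = {
    "src:1": ExternalRecord("Ecsenius bicolor", "species", "valid"),
    "src:2": ExternalRecord("Ecsenius bicolor", "species", "invalid", replaced_by="src:1"),
    "src:3": ExternalRecord("Ecsenius bicolor", "species", "obsolete"),
}

for ident in fake_source:
    print(ident, review_identifier("Ecsenius bicolor", "species", ident, fake_source.__getitem__))
```

Passing the lookup in as a callable keeps the decision logic independent of any particular web service.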
I hope that perspective is useful,
Regards
Geoff Shuetrim
participants (6)
- Donat Agosti
- Geoff Shuetrim
- Markus Döring
- Richard Pyle
- Robert Guralnick
- Roderic Page