Re: [API-users] Some questions from a beginner

15 Sep 2015

Hi Alex,

Absolutely. I think part of the challenge is to show that adding data to aggregators can yield real, tangible benefits. Part of the problem is that such benefits are often not obvious or, indeed, not available. If aggregators could offer richer ways of augmenting data (e.g., “oh, I see you’ve added these specimens, did you know that they’ve been cited in these papers, are vouchers for these sequences, that these taxonomic names have changed recently”, etc.) then I think that might also help encourage people to see value in doing this, rather than simply checking the box marked “send data to aggregator”.

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email:  Roderic.Page@glasgow.ac.uk
Tel:  +44 141 330 4778
Skype:  rdmpage
Facebook:  http://www.facebook.com/rdmpage
LinkedIn:  http://uk.linkedin.com/in/rdmpage
Twitter:  http://twitter.com/rdmpage
Blog:  http://iphylo.blogspot.com
ORCID:  http://orcid.org/0000-0002-7101-9767
Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
ResearchGate https://www.researchgate.net/profile/Roderic_Page

On 15 Sep 2015, at 15:28, Alex Thompson <godfoder@acis.ufl.edu> wrote:

It seems insane to us on the aggregator / technical side of things, but we get a ton of collections people telling us at iDigBio "We've never been able to look at our data on a map before." Or they do know how to use mapping tools, but they've only ever looked at a handful of points at once, and never the thousand or so that form the X=Y straight line.
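A minimal sketch of the kind of check that would catch the X=Y line described above: it flags records whose latitude and longitude are identical. The record layout (dicts with Darwin Core-style `decimalLatitude` and `decimalLongitude` keys) and the function name `flag_equal_coordinates` are assumptions for illustration, not a description of what iDigBio actually runs.

```python
from typing import Iterable, List, Mapping, Optional


def _to_float(value) -> Optional[float]:
    """Return value as a float, or None if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def flag_equal_coordinates(records: Iterable[Mapping], tolerance: float = 1e-9) -> List[Mapping]:
    """Return the records whose latitude and longitude are (nearly) identical.

    Points at exactly (0, 0) are skipped because they usually indicate
    missing coordinates, which is a separate problem.
    """
    flagged = []
    for record in records:
        lat = _to_float(record.get("decimalLatitude"))
        lon = _to_float(record.get("decimalLongitude"))
        if lat is None or lon is None:
            continue  # no usable coordinates to compare
        if lat == 0 and lon == 0:
            continue  # handled by a different check
        if abs(lat - lon) <= tolerance:
            flagged.append(record)
    return flagged


# Made-up sample records, for illustration only.
sample = [
    {"id": "a", "decimalLatitude": "29.65", "decimalLongitude": "-82.32"},
    {"id": "b", "decimalLatitude": "29.65", "decimalLongitude": "29.65"},
    {"id": "c", "decimalLatitude": None, "decimalLongitude": "10.0"},
]
suspect_ids = [r["id"] for r in flag_equal_coordinates(sample)]
print(suspect_ids)
```

A single such point can be legitimate, so a check like this is probably more useful as a count per dataset than as a verdict on any one record.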

Part of the solution we're trying to get across is to get providers to come to iDigBio and interact with their own data in our systems. This is both because we want feedback on how it comes out, and because we're trying to build a sense of engagement with the process. We (as a community) desperately need collections to see data publishing as less "throwing data over the fence" and more of a collaborative effort with aggregators.

People have been criticizing aggregated biodiversity data quality for years, but it's rare that I meet anyone who views themselves as part of the solution. It's just an impediment to getting the "real work" done.

- Alex

On 09/15/2015 10:04 AM, Roderic Page wrote:
Hi Eduardo,

I thought that the feedback about data improvement should be sent directly to the data provider but, please, if there is something else let me know.

In an ideal world, yes, the feedback should go to the data provider, things get fixed, then GBIF gets updated. However, data providers don’t always have the resources to fix things. I’m also interested in how many of the data issues that come up are things that GBIF itself can detect and flag. In my experience, there are issues that the provider was unaware of, but that become apparent once the data is exposed by GBIF.

For example, here’s a case of a data set supplied to GBIF with a serious error (https://github.com/ttu-vertnet/ttu-mammals/issues/12). This was obvious in GBIF simply by looking at the map, but apparently not to the data provider (this error has now been fixed).

The more we know about the sort of errors that can happen, the better placed we are to develop tools to catch them.

Regards

Rod

On 15 Sep 2015, at 14:53, Eduardo Dalcin <edalcin@jbrj.org> wrote:

Hi Rod,

As you saw in the other message, the main problem we have now is that the same voucher is represented twice, because NYBG had a DIGIR source and now has an IPT source. People at NYBG said that they asked GBIF to remove the DIGIR source, but it is still there. Maybe this occurs with other sources as well.
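One way to look for this kind of duplication, sketched under assumptions: group occurrence records by institution code and catalog number, and report any voucher that appears under more than one dataset. The dict layout and the function name `find_cross_dataset_duplicates` are hypothetical; `institutionCode` and `catalogNumber` are Darwin Core terms, and `datasetKey` is the dataset identifier that GBIF attaches to occurrence records.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple

VoucherKey = Tuple[str, str]  # (institutionCode, catalogNumber)


def find_cross_dataset_duplicates(
    records: Iterable[Mapping[str, str]],
) -> Dict[VoucherKey, Set[str]]:
    """Map each voucher seen in more than one dataset to the datasets holding it."""
    datasets_by_voucher: Dict[VoucherKey, Set[str]] = defaultdict(set)
    for record in records:
        # Normalise case and whitespace so trivially different spellings still match.
        institution = (record.get("institutionCode") or "").strip().upper()
        catalog = (record.get("catalogNumber") or "").strip().upper()
        dataset = record.get("datasetKey")
        if not institution or not catalog or not dataset:
            continue  # cannot identify the voucher or its source
        datasets_by_voucher[(institution, catalog)].add(dataset)
    # Keep only vouchers that occur in two or more datasets.
    return {key: ds for key, ds in datasets_by_voucher.items() if len(ds) > 1}


# Made-up sample values, for illustration only.
sample = [
    {"institutionCode": "NY", "catalogNumber": "00123", "datasetKey": "old-digir-dataset"},
    {"institutionCode": "NY", "catalogNumber": "00123", "datasetKey": "new-ipt-dataset"},
    {"institutionCode": "NY", "catalogNumber": "00456", "datasetKey": "new-ipt-dataset"},
]
duplicates = find_cross_dataset_duplicates(sample)
for voucher, datasets in duplicates.items():
    print(voucher, sorted(datasets))
```

Exact matching on catalog numbers will miss duplicates whose identifiers were reformatted between the two sources, so this is a first pass rather than a complete answer.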

Regarding feedback from the data cleaning process, I am indeed interested in this discussion, but I'm not sure this list is the best forum for it.

Here at the National Center for Flora Conservation (CNCFlora), our risk assessments use only occurrences that have been validated by experts, taxonomically and spatially. This information may be useful, especially if the expert made some correction or comment on the occurrence. I can see that this is related to annotation initiatives such as AnnoSys and FilteredPush. In my ideal and fantastic world, we would have an annotation feature on GBIF occurrences, where experts could interact with the material. In our Virtual Herbarium of Repatriated Plants <http://www.herbariovirtualreflora.jbrj.gov.br/jabot/herbarioVirtual/ConsultaPublicoHVUC/ResultadoDaConsultaNovaConsulta.do?lingua=en>, experts can suggest new names if they have a login.

However, what usually happens is duplication of effort in georeferencing legacy occurrences. For example, different efforts, methodologies and uncertainty levels have been applied to different duplicates of the same occurrence held by different herbaria.

I thought that the feedback about data improvement should be sent directly to the data provider but, please, if there is something else let me know.

Cheers,

Eduardo

--------------------------------
Eduardo Dalcin <http://eduardo.dalc.in>
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Work: +55 21 3204 2116
--------------------------------
Alternate email: edalcin@jbrj.org
--------------------------------
Schedule a meeting: http://agendar.dalc.in

On Mon, Sep 14, 2015 at 1:50 PM, Roderic Page <Roderic.Page@glasgow.ac.uk> wrote:
Hi Eduardo,

It would be interesting to have examples of the kinds of problems you encounter with GBIF data, so that we can look at ways to fix them. It would also be interesting to know whether you would be able to provide GBIF with the corrections you make to GBIF data. It seems clear that lots of people are cleaning data in their own projects, but that doesn’t filter back to GBIF.

Regards

Rod

On 14 Sep 2015, at 17:34, Eduardo Dalcin <edalcin@jbrj.org> wrote:

The problem with these tools (LontraHarvest, OpenRefine, etc.) is that they are just data *retrieval* tools and do not provide data analysis and representation functionality.

Mauro, for me this is a blessing! :)

In the CNCFlora workflow, the data from GBIF is useless as it is, because it has to be validated first, taxonomically and spatially. Only after cleaning, georeferencing, and validation by the expert is the data analyzed as part of the risk assessment.

Cheers

Eduardo

_______________________________________________
API-users mailing list
API-users@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/api-users
