Hi Eduardo,


I thought that the feedback about data improvement should be sent directly to the data provider but, please, if there is something else let me know.

In an ideal world, yes: the feedback goes to the data provider, things get fixed, and then GBIF gets updated. However, data providers don’t always have the resources to fix things. I’m also interested in how many of the data issues that come up are things that GBIF itself can detect and flag. In my experience, there are issues that the provider was unaware of but that become apparent once the data is exposed by GBIF.

For example, here’s a case of a data set supplied to GBIF with a serious error (https://github.com/ttu-vertnet/ttu-mammals/issues/12). This was obvious in GBIF simply by looking at the map, but apparently not to the data provider (the error has now been fixed).

The more we know about the sort of errors that can happen, the better placed we are to develop tools to catch them.
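As a rough illustration of what such a tool could look like, here is a minimal Python sketch of a few coordinate sanity checks. The field names are Darwin Core terms; the issue labels, the function name and the bounding-box table are hypothetical, the box values are only approximate, and this is not a description of how GBIF itself flags records.

```python
from typing import Dict, List

# Hypothetical, very rough bounding boxes (min_lat, max_lat, min_lon, max_lon).
# The values are illustrative only, e.g. roughly the contiguous United States.
COUNTRY_BBOX = {
    "US": (24.0, 50.0, -125.0, -66.0),
}


def flag_coordinate_issues(record: Dict) -> List[str]:
    """Return a list of (hypothetical) issue labels for one occurrence record."""
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        return ["COORDINATES_MISSING"]

    issues = []
    # Latitude and longitude must fall inside their valid ranges.
    if not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        issues.append("COORDINATES_OUT_OF_RANGE")
    # (0, 0) is very often a placeholder rather than a real locality.
    if lat == 0 and lon == 0:
        issues.append("ZERO_COORDINATES")

    # Only compare against the country box if nothing else is wrong.
    bbox = COUNTRY_BBOX.get(record.get("countryCode"))
    if bbox and not issues:
        min_lat, max_lat, min_lon, max_lon = bbox
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            issues.append("OUTSIDE_COUNTRY_BBOX")
    return issues
```

A record stated to be in the US but carrying a positive longitude would come back with the bounding-box flag, which is the sort of mistake that is otherwise only spotted by looking at the map.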

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email:  Roderic.Page@glasgow.ac.uk
Tel:  +44 141 330 4778
Skype:  rdmpage
Facebook:  http://www.facebook.com/rdmpage
LinkedIn:  http://uk.linkedin.com/in/rdmpage
Twitter:  http://twitter.com/rdmpage
Blog:  http://iphylo.blogspot.com
ORCID:  http://orcid.org/0000-0002-7101-9767

On 15 Sep 2015, at 14:53, Eduardo Dalcin <edalcin@jbrj.org> wrote:

Hi Rod,

As you saw in the other message, the main problem we have now is that the same voucher is represented twice, because NYBG had a DIGIR source and now has an IPT source. People at NYBG said that they asked GBIF to remove the DIGIR source, but it is still there. Maybe this occurs with other sources as well.
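To make the duplication problem concrete, here is a minimal Python sketch of how such vouchers might be detected in a set of downloaded occurrence records. It assumes that both copies of a voucher share the same institutionCode and catalogNumber and differ only in datasetKey; that is an assumption about the data, and the function name is hypothetical.

```python
from collections import defaultdict


def find_cross_dataset_duplicates(records):
    """Group records by (institutionCode, catalogNumber) and keep only
    the groups whose records come from more than one dataset."""
    groups = defaultdict(list)
    for rec in records:
        # Normalise case and whitespace so trivially different spellings match.
        inst = (rec.get("institutionCode") or "").strip().upper()
        cat = (rec.get("catalogNumber") or "").strip().upper()
        if inst and cat:
            groups[(inst, cat)].append(rec)

    return {
        key: recs
        for key, recs in groups.items()
        if len({r.get("datasetKey") for r in recs}) > 1
    }
```

Each key in the result is a voucher that appears in at least two datasets, so one of the copies can be dropped before analysis. Matching on these two fields alone will miss duplicates whose catalogue numbers were formatted differently in the two sources.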

Regarding feedback from the data cleaning process, I am indeed interested in this discussion, but I'm not sure this list is the best forum for it.

Here at the National Center for Flora Conservation (CNCFlora), our risk assessments use only occurrences that have been validated by experts, both taxonomically and spatially. This information may be useful, especially if the expert made a correction or comment on the occurrence. I can see that this is related to annotation initiatives such as AnnoSys and FilteredPush. In my ideal and fantastic world, we would have an annotation feature on GBIF occurrences, where experts could interact with the material. In our Virtual Herbarium of Repatriated Plants, experts can suggest new names if they have a login.

However, what usually happens is that the effort of georeferencing legacy occurrences is duplicated. For example, different efforts, methodologies and uncertainty levels have been applied to different duplicates of the same occurrence held by different herbaria.

I thought that the feedback about data improvement should be sent directly to the data provider but, please, if there is something else let me know.

Cheers,

Eduardo



--------------------------------
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
Trabalho / Work: +55 21 3204 2116
--------------------------------
e-mail alternativo /  alternate email: edalcin@jbrj.org
--------------------------------
Agendar reunião / Schedule a meeting: http://agendar.dalc.in

On Mon, Sep 14, 2015 at 1:50 PM, Roderic Page <Roderic.Page@glasgow.ac.uk> wrote:
Hi Eduardo,

It would be interesting to have examples of the kinds of problems you encounter with GBIF data, so that we can look at ways to fix them. It would also be interesting to know whether you would be able to provide GBIF with the corrections you make to GBIF data. It seems clear that lots of people are cleaning data in their own projects, but that doesn’t filter back to GBIF.

Regards

Rod

On 14 Sep 2015, at 17:34, Eduardo Dalcin <edalcin@jbrj.org> wrote:

The problem with these tools (LontraHarvest, OpenRefine, etc.) is that they are just data *retrieval* tools; they do not provide data analysis or representation functionality.

Mauro, for me this is a blessing! :)

In the CNCFlora workflow, data from GBIF is useless as it stands, because it has to be validated first, taxonomically and spatially. Only after cleaning, georeferencing and validation with the expert is the data analyzed as part of the risk assessment.

Cheers

Eduardo



_______________________________________________
API-users mailing list
API-users@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/api-users