Eduardo,
another difference when using periodic downloads is that you get the interpreted data from us (together with the original, if you want it). The interpreted data already includes quite a bit of data cleaning and alignment to controlled vocabularies that might be painful to reproduce otherwise. Also, publishers are *very* often offline. Especially with the long-running XML harvesting protocols (BioCASe, TAPIR, DiGIR), indexing them entirely can be a bit of a challenge.
Markus
On 09 Sep 2015, at 20:02, Eduardo Dalcin edalcin@jbrj.org wrote:
Thanks Alex. Food for thought.
Best,
Eduardo
Eduardo Dalcin http://eduardo.dalc.in
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Wed, Sep 9, 2015 at 2:28 PM, Alex Thompson <godfoder@acis.ufl.edu> wrote:
I'm kind of seconding Rod here.
It might make more sense, depending on your use case and local computing resources, to just get a download of Plantae *AND* Brazil from GBIF periodically, then process that to exclude existing Brazilian datasets. You could then use something like Apache Hadoop / Spark to efficiently split the file by dataset or by institution code.
This would greatly simplify your interactions with GBIF (down to just periodically generating a download programmatically) and you would have an easy place to insert any additional data transformations you want. This is the path I take for my work at least - the incremental cost of a couple million more records is worth the reduction in complexity overall.
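As a rough illustration of "periodically generating a download programmatically", the sketch below only builds the JSON body for a download of Plantae *AND* Brazil. The field names and predicate keys reflect my understanding of the GBIF occurrence download API and should be checked against the current documentation; the user name and e-mail address are hypothetical placeholders.

```python
import json


def build_download_request(creator, email, country="BR", taxon_key=6):
    """Build the JSON body for a GBIF occurrence download request.

    Assumption: field names and predicate keys follow the GBIF occurrence
    download API as I understand it; verify before relying on them.
    """
    return {
        "creator": creator,                 # GBIF user name (placeholder below)
        "notificationAddresses": [email],   # where GBIF sends the "ready" mail
        "sendNotification": True,
        "format": "SIMPLE_CSV",
        "predicate": {
            "type": "and",                  # both filters must match
            "predicates": [
                {"type": "equals", "key": "COUNTRY", "value": country},
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    body = build_download_request("my_gbif_user", "me@example.org")
    print(json.dumps(body, indent=2))
```

To my knowledge the body is sent as an authenticated POST to the download request endpoint under api.gbif.org/v1/occurrence/download; the sketch stops short of the network call so it can be run and inspected offline.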
- Alex
On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
Hi Rod,
The real purpose is to have a list of UUIDs and the "source web page" for each dataset. Thus, one way to do it is to select those resources whose counts are <> 0 for PLANTAE *AND* Brazil.
I don't want to do any stats analysis, just feed a local harvester / aggregator.
The problem is that, considering the reply from Jan Legind on Sep 3, we have to check the datasets one by one (https://goo.gl/3wysaA) to see whether each is a Herbarium / Preserved Specimen (Plantae) dataset or not, from the request http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN.
Does it make sense?
Thanks for your curiosity! :)
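For the "list of UUIDs plus source web page" described above, a minimal sketch could map each dataset UUID to the places where its description can be looked up. The registry path /v1/dataset/{uuid}, the portal path /dataset/{uuid} and the record fields "title" and "homepage" are assumptions based on my understanding of the GBIF registry, not something stated in this thread; the helper names are made up for illustration.

```python
# Assumed URL patterns; verify against the GBIF registry documentation.
REGISTRY_API = "http://api.gbif.org/v1/dataset/"
PORTAL_PAGE = "http://www.gbif.org/dataset/"


def dataset_links(dataset_key):
    """Return the registry API URL and portal page URL for one dataset UUID."""
    return {
        "uuid": dataset_key,
        "api": REGISTRY_API + dataset_key,
        "page": PORTAL_PAGE + dataset_key,
    }


def summarize(record):
    """Pick a few fields from an already-parsed registry record (a dict).

    Missing fields come back as None rather than raising an error.
    """
    return {
        "title": record.get("title"),
        "homepage": record.get("homepage"),
    }


if __name__ == "__main__":
    # UUID taken from the example quoted later in this thread (FishBase).
    print(dataset_links("197908d0-5565-11d8-b290-b8a03c50a862"))
```

Fetching each "api" URL and passing the parsed JSON to summarize() would give one row per dataset for the harvester's configuration, which still leaves the Herbarium / Preserved Specimen check to be done per dataset.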
Cheers,
Eduardo
On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page <Roderic.Page@glasgow.ac.uk> wrote:
Hi Eduardo,
I’m curious, is the purpose to get counts by dataset by country, or to get all the plant occurrences for Brazil? The latter can be obtained by downloading all plant occurrences in Brazil http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could then compute the per-dataset stats locally). I realise that this isn’t as convenient as having GBIF slice the data for you in the API.
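Computing the per-dataset stats locally can be quite small. The sketch below assumes the download is a tab-delimited text file with a datasetKey column (my understanding of GBIF's simple CSV format; adjust the column name if your file differs) and uses a tiny in-memory sample instead of a real download.

```python
import csv
import io
from collections import Counter


def count_by_dataset(lines, key_column="datasetKey"):
    """Count occurrence records per dataset in a tab-delimited download.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    # QUOTE_NONE: treat quote characters as ordinary data in the TSV.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[key_column] for row in reader)


if __name__ == "__main__":
    # Made-up sample rows; a real file would be opened with open(path, encoding="utf-8").
    sample = io.StringIO(
        "gbifID\tdatasetKey\tcountryCode\n"
        "1\taaaa\tBR\n"
        "2\tbbbb\tBR\n"
        "3\taaaa\tBR\n"
    )
    for dataset_key, n in count_by_dataset(sample).most_common():
        print(dataset_key, n)
```

Because the file is streamed row by row, memory use depends on the number of distinct datasets rather than on the number of records.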
Regards
Rod
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk
Tel: +44 141 330 4778
Skype: rdmpage
Facebook: http://www.facebook.com/rdmpage
LinkedIn: http://uk.linkedin.com/in/rdmpage
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
ORCID: http://orcid.org/0000-0002-7101-9767
Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
ResearchGate: https://www.researchgate.net/profile/Roderic_Page
On 4 Sep 2015, at 10:39, Eduardo Dalcin <edalcin@jbrj.org> wrote:
Hi Markus,
Yes, it's a shame I can't have country and "nub" together. Is there any hope for it?
Eduardo
On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring <mdoering@gbif.org> wrote:
Eduardo,
as you might have seen from my issue comment, the web service uses a different parameter name for taxonKey, which is a bug we need to fix at some point. For now, please use nubKey to call the service, like this:
http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
The real problem for you will be that we do not support the combination of the country and the taxon filter, just one of the two. So I am afraid you cannot search for plants in Brazil, only for datasets about Brazil and, separately, for datasets with plant records.
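Given that only one filter can be used per request, one possible workaround is to call the service twice and intersect the dataset keys client-side. This is only a sketch: it assumes the response is a JSON map of dataset UUID to count (as in the sample quoted later in this thread), the function names are made up, and the result is a list of candidates (datasets that have records from Brazil and also have plant records), not a guarantee that a dataset holds plants from Brazil.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://api.gbif.org/v1/occurrence/counts/datasets"


def counts_url(**params):
    """Build a counts/datasets URL with a single filter, e.g. country='BR'."""
    return BASE + "?" + urlencode(params)


def fetch_counts(url):
    """Fetch the {datasetKey: count} JSON map (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def candidate_datasets(by_country, by_taxon):
    """Dataset keys that appear in both maps with non-zero counts."""
    shared = by_country.keys() & by_taxon.keys()
    return sorted(
        key for key in shared
        if by_country[key] > 0 and by_taxon[key] > 0
    )


# Usage (not run here because it needs the network):
#   brazil = fetch_counts(counts_url(country="BR"))
#   plants = fetch_counts(counts_url(nubKey=6))
#   print(len(candidate_datasets(brazil, plants)))
```

Each candidate still has to be checked individually, for example via a periodic download, to confirm that it really contains plant records from Brazil.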
Markus
On 03 Sep 2015, at 14:12, Eduardo Dalcin <edalcin@jbrj.org> wrote:
Thanks Jan. I'll keep exploring and I'll be in touch if I need to.
Best,
Eduardo
On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] <jlegind@gbif.org> wrote:
Dear Eduardo,
Thanks for getting in touch with us about these issues.
The first request http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN returns the number of records located in Brazil for the facets in the request.
The second query http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN uses the Occurrence Inventories web service http://www.gbif.org/developer/occurrence#inventories which does not support the basis-of-record facet in the /datasets request. I understand that it would be better if the API response yielded an error message in this instance.
Concerning the other issues – you are indeed right that the counts do not make sense in the context of taxon key 6 which is Plantae. Actually the API does not handle the taxonKey search at all, contrary to what the documentation states:
Resource URL: /occurrence/counts/datasets
Method: GET
Response: Counts
Description: Lists occurrence counts for datasets that cover a given taxon or country.
Parameters: country, taxonKey
As you can see here, http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this request doesn’t return anything.
The GBIF developers will handle this issue in due time.
You can follow the issue in our bug tracking service here: http://dev.gbif.org/issues/browse/POR-2828
With best regards,
Jan K. Legind
Data manager, GBIF Secretariat
From: API-users [mailto:api-users-bounces@lists.gbif.org] On Behalf Of Eduardo Dalcin
Sent: 2 September 2015 20:06
To: api-users@lists.gbif.org; dev@gbif.org
Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo Avancini
Subject: [API-users] Some questions from a begginer
Hi folks,
This is my first message to the list. So, please, be nice :)
I'm working here at the Rio de Janeiro Botanical Garden, together with the guys at the National Center for Flora Conservation. We are doing the risk assessment of the Brazilian flora for the government. So far we have assessed the risk of ca. 6,000 species, but we still have to assess ca. 35,000. Access to occurrence records for Brazil is crucial, and every occurrence is important.
That means that we have to put together occurrence data from different sources and, after the first batch of the risk assessment, we realized that we need to build our own aggregator. We are planning to do this with the Lontra-harvester, with the help of the guys at the Brazilian GBIF Node.
So, one of the first steps was to list the available resources to understand the scale of the task, and that brings me to my questions.
First:
The request:
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
returns 4,982,689 records
And the request:
http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
returns (here) 7,406,310 records
Comments?
Second:
The request:
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
returns things like this:
"197908d0-5565-11d8-b290-b8a03c50a862":27629
But a query on the same dataset:
http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5565-11d8-b290-b8a03c50a862
returns "null" (of course, it is FishBase!)
I have plenty of examples like this, highlighted in yellow here (not finished!):
https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYyj459VTc/edit?usp=sharing
Comments?
I think those two questions are a good start. Please let me know if I'm doing something wrong.
Cheers,
Eduardo
API-users mailing list
API-users@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/api-users