Hi folks,
This is my first message to the list. So, please, be nice :)
I'm working here at the Rio de Janeiro Botanical Garden, together with the guys at the National Center for Flora Conservation. We are carrying out the risk assessment of the Brazilian flora for the government. So far we have assessed the risk of ca. 6,000 species (http://cncflora.jbrj.gov.br/arquivos/arquivos/pdfs/LivroVermelho.pdf), but we still have to assess ca. 35,000. Access to occurrence records for Brazil is crucial, and every occurrence is important.
That means that we have to put together occurrence data from different sources and, after the first batch of the risk assessment, we realized that we need to build our own aggregator. We are planning to do this with the Lontra-harvester, with the help of the guys at the Brazilian GBIF Node.
So, one of the first steps was to list the available resources to understand the dimension of the task, and that brings me to my questions.
First:
The request:
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
returns 4,982,689 records
And the request:
http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
returns 7,406,310 records (listed here: https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYyj459VTc/edit?usp=sharing)
Comments?
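For readers who want to reproduce the comparison, here is a minimal Python sketch. It only builds the two request URLs from the same filters and sums a per-dataset map into one total. It assumes the second figure was obtained by adding up the per-dataset counts; the helper names and the sample keys are hypothetical, and nothing here calls the network.

```python
from urllib.parse import urlencode

API_ROOT = "http://api.gbif.org/v1"  # base URL taken from the requests above

# The three filters used in both requests.
FILTERS = {"country": "BR", "taxonKey": 6, "basisOfRecord": "PRESERVED_SPECIMEN"}


def build_url(path, params):
    """Return the full request URL for an API path and a dict of query parameters."""
    return f"{API_ROOT}{path}?{urlencode(params)}"


def total_from_dataset_counts(dataset_counts):
    """Sum a {datasetKey: count} mapping such as the one /occurrence/counts/datasets returns."""
    return sum(dataset_counts.values())


count_url = build_url("/occurrence/count", FILTERS)
datasets_url = build_url("/occurrence/counts/datasets", FILTERS)

# Made-up response body, for illustration only (keys and numbers are not real data).
sample_counts = {"hypothetical-dataset-a": 120, "hypothetical-dataset-b": 30}

print(count_url)
print(datasets_url)
print(total_from_dataset_counts(sample_counts))
```

If the summed total and the single count disagree for identical filters, the two endpoints are probably not applying the same filters, which is the question raised here.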
Second:
The request:
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
returns things like this:
"197908d0-5565-11d8-b290-b8a03c50a862":27629
But a query on the same dataset:
http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5565-11d8-b290-b8a03c50a862
returns "null" (of course, it is FishBase!)
I have plenty of examples like this, highlighted in yellow here (not finished!):
https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYyj459VTc/edit?usp=sharing
Comments?
I think those two questions are a good start. Please let me know if I'm doing something wrong.
Cheers,
Eduardo
--------------------------------
Eduardo Dalcin (http://eduardo.dalc.in)
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
--------------------------------
e-mail alternativo / alternate email: edalcin@jbrj.org
--------------------------------
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
Dear Eduardo,
Thanks for getting in touch with us about these issues.
The first request http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN returns the number of records located in Brazil for the facets in the request.
The second query http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN uses the Occurrence Inventories web service (http://www.gbif.org/developer/occurrence#inventories), which does not support the basis-of-record facet in the /datasets request. I understand that it would be better if the API response yielded an error message in this instance.
Concerning the other issues – you are indeed right that the counts do not make sense in the context of taxon key 6 which is Plantae. Actually the API does not handle the taxonKey search at all, contrary to what the documentation states:
    Resource: /occurrence/counts/datasets
    Method: GET
    Response: Counts (example: http://api.gbif.org/v1/occurrence/counts/datasets?country=DE)
    Description: Lists occurrence counts for datasets that cover a given taxon or country.
    Parameters: country (http://www.gbif.org/developer/occurrence#p_country), taxonKey (http://www.gbif.org/developer/occurrence#p_taxonKey)
As you can see here, http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this request doesn’t return anything.
The GBIF developers will handle this issue in due time.
You can follow the issue in our bug tracking service here: http://dev.gbif.org/issues/browse/POR-2828
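Until the service reports such errors itself, a client can guard against silently ignored filters. The following Python sketch checks a request's parameters against the two that the documentation excerpt above lists for /occurrence/counts/datasets. The function and constant names are hypothetical, and the allowed set is an assumption taken only from that excerpt.

```python
# Parameters the quoted documentation lists for /occurrence/counts/datasets.
DOCUMENTED_PARAMS = frozenset({"country", "taxonKey"})


def check_inventory_params(params):
    """Return a copy of params, or raise ValueError naming any undocumented filter."""
    unsupported = sorted(set(params) - DOCUMENTED_PARAMS)
    if unsupported:
        raise ValueError(
            "not supported by /occurrence/counts/datasets: " + ", ".join(unsupported)
        )
    return dict(params)


requested = {"country": "BR", "taxonKey": 6, "basisOfRecord": "PRESERVED_SPECIMEN"}
try:
    check_inventory_params(requested)
except ValueError as err:
    # The basisOfRecord filter is rejected locally instead of being ignored remotely.
    print(err)
```

Failing early on the client side makes it obvious that a total was computed without the intended filter.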
With best regards,
Jan K. Legind
Data manager, GBIF Secretariat
From: API-users [mailto:api-users-bounces@lists.gbif.org] On Behalf Of Eduardo Dalcin Sent: 2. september 2015 20:06 To: api-users@lists.gbif.org; dev@gbif.org Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo Avancini Subject: [API-users] Some questions from a begginer
Thanks, Jan. I'll keep exploring and will be in touch if I need to.
Best,
Eduardo
On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] jlegind@gbif.org wrote:
Eduardo,
As you might have seen from my issue comment, the web service uses a different parameter name for taxonKey, which is a bug we need to fix at some point. For now, please use nubKey instead, like this:
http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
The real problem for you will be that we do not support the combination of the country and the taxon filter, just one of the two. So you cannot search for plants in Brazil I am afraid, just for datasets about Brazil and datasets with plant records.
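The two constraints described above (the taxon filter is currently named nubKey, and country and taxon cannot be combined) can be captured in a small URL builder. This is only a sketch in Python; the function name and its arguments are hypothetical, and it builds the URL without sending any request.

```python
from urllib.parse import urlencode

DATASETS_ENDPOINT = "http://api.gbif.org/v1/occurrence/counts/datasets"


def dataset_counts_url(country=None, taxon_key=None):
    """Build a /counts/datasets URL with exactly one filter.

    The taxon filter is sent as nubKey, the parameter name the service
    currently expects according to this message.
    """
    if (country is None) == (taxon_key is None):
        raise ValueError("pass exactly one of country or taxon_key")
    if country is not None:
        params = {"country": country}
    else:
        params = {"nubKey": taxon_key}
    return f"{DATASETS_ENDPOINT}?{urlencode(params)}"


print(dataset_counts_url(taxon_key=6))     # datasets with plant records
print(dataset_counts_url(country="BR"))    # datasets about Brazil
```

Asking for both filters at once raises an error locally rather than returning counts that only reflect one of them.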
Markus
On 03 Sep 2015, at 14:12, Eduardo Dalcin edalcin@jbrj.org wrote:
Hi Markus,
Yes, it's a shame I can't use country and "nub" together. Is there any hope of that changing?
Eduardo
On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring mdoering@gbif.org wrote:
Hi Eduardo,
I’m curious: is the purpose to get counts by dataset by country, or to get all the plant occurrences for Brazil? The latter can be obtained by downloading all plant occurrences in Brazil from http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could then compute the per-dataset stats locally). I realise that this isn’t as convenient as having GBIF slice the data for you in the API.
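Computing the per-dataset stats locally could look roughly like the Python sketch below. It assumes the downloaded occurrence table is tab-delimited with a header row and a datasetKey column; the column name, the delimiter, and the sample rows are assumptions to check against the actual download.

```python
import csv
import io
from collections import Counter


def counts_per_dataset(lines, key_column="datasetKey", delimiter="\t"):
    """Count rows per dataset in a delimited occurrence table with a header row."""
    # QUOTE_NONE treats quote characters inside fields as ordinary text.
    reader = csv.DictReader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return Counter(row[key_column] for row in reader)


# Tiny made-up table standing in for the downloaded file (not real data).
sample = (
    "gbifID\tdatasetKey\tscientificName\n"
    "1\thypothetical-dataset-a\tSpecies one\n"
    "2\thypothetical-dataset-a\tSpecies two\n"
    "3\thypothetical-dataset-b\tSpecies three\n"
)

counts = counts_per_dataset(io.StringIO(sample))
for dataset_key, n in counts.most_common():
    print(dataset_key, n)
```

For a real download, pass an open file object (opened with newline="" and the file's encoding) in place of the StringIO sample; the rows are streamed, so the whole file never has to fit in memory.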
Regards
Rod
---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk
Tel: +44 141 330 4778
Skype: rdmpage
Facebook: http://www.facebook.com/rdmpage
LinkedIn: http://uk.linkedin.com/in/rdmpage
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
ORCID: http://orcid.org/0000-0002-7101-9767
Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
ResearchGate: https://www.researchgate.net/profile/Roderic_Page
On 4 Sep 2015, at 10:39, Eduardo Dalcin <edalcin@jbrj.org> wrote:
Hi Markus,
Yes, that's a shame I can't have country and "nub" together. There is any hope about it?
Eduardo
-------------------------------- Eduardo Dalcinhttps://mailtrack.io/trace/link/bac23864202354f3789938ce352a878faa0cd8b8?url=http%3A%2F%2Feduardo.dalc.in&signature=aea58ef6f439535b Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.brmailto:edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116 -------------------------------- e-mail alternativo / alternate email: edalcin@jbrj.orgmailto:edalcin@jbrj.org -------------------------------- Agendar reunião / Schedule a meeting: http://agendar.dalc.inhttps://mailtrack.io/trace/link/db57b837be515d4b7caefe43d55b60467cd7c2c1?url=http%3A%2F%2Fagendar.dalc.in&signature=69b244942739c0f5
On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring <mdoering@gbif.orgmailto:mdoering@gbif.org> wrote: Eduardo,
as you might have seen from my issue comment the webservice uses a different parameter name for taxonKey which is a bug we need to fix at some point. Please use nubKey for now to use the service like that:
http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
The real problem for you will be that we do not support the combination of the country and the taxon filter, just one of the two. So you cannot search for plants in Brazil I am afraid, just for datasets about Brazil and datasets with plant records.
Markus
On 03 Sep 2015, at 14:12, Eduardo Dalcin <edalcin@jbrj.orgmailto:edalcin@jbrj.org> wrote:
Thanks Jan. I'll keep exploring and I'll be in touch, if I need.
Best,
Eduardo
Eduardo Dalcin Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.brmailto:edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116tel:%2B55%2021%203204%202116
e-mail alternativo / alternate email: edalcin@jbrj.orgmailto:edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.inhttps://mailtrack.io/trace/link/db57b837be515d4b7caefe43d55b60467cd7c2c1?url=http%3A%2F%2Fagendar.dalc.in&signature=69b244942739c0f5
On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] <jlegind@gbif.orgmailto:jlegind@gbif.org> wrote: Dear Eduardo,
Thanks for getting in touch with us about these issues.
The first request http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO... returns the number of records located in Brazil for the facets in the request.
The second query http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... uses the Occurrence Inventories web service http://www.gbif.org/developer/occurrence#inventories which does not support the basis-of-record facet in the /datasets request. I understand that it would be better if the API response yielded an error message in this instance.
Concerning the other issues – you are indeed right that the counts do not make sense in the context of taxon key 6 which is Plantae. Actually the API does not handle the taxonKey search at all, contrary to what the documentation states:
/occurrence/counts/datasets
GET
Counts
Lists occurrence counts for datasets that cover a given taxon or country.
country, taxonKey
As you can see here, http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this request doesn’t return anything.
The GBIF developers will handle this issue in due time.
You can follow the issue in our bug tracking service here: http://dev.gbif.org/issues/browse/POR-2828
With best regards,
Jan K. Legind
Data manager, GBIF Secretariat
From: API-users [mailto:api-users-bounces@lists.gbif.orgmailto:api-users-bounces@lists.gbif.org] On Behalf Of Eduardo Dalcin Sent: 2. september 2015 20:06 To: api-users@lists.gbif.orgmailto:api-users@lists.gbif.org; dev@gbif.orgmailto:dev@gbif.org Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo Avancini Subject: [API-users] Some questions from a begginer
_______________________________________________
API-users mailing list
API-users@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/api-users
Dalcin,
For retrieving, counting, statistically analysing, databasing, and plotting charts and maps of GBIF data, I have been using, quite successfully, the R software environment for statistical computing and graphics (https://www.r-project.org/) with the rgbif package (https://github.com/ropensci/rgbif) written by Scott Chamberlain, which makes direct calls to the GBIF API. The advantages of R as a data retrieval and analytical tool are overwhelming and would be worth exploring in your case.
Hope this helps.
Salud!
2015-09-07 12:33 GMT-03:00 Roderic Page Roderic.Page@glasgow.ac.uk:
Hi Eduardo,
I’m curious: is the purpose to get counts by dataset by country, or to get all the plant occurrences for Brazil? The latter can be obtained by downloading all plant occurrences in Brazil http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could then compute the per-dataset stats locally). I realise that this isn’t as convenient as having GBIF slice the data for you in the API.
Regards
Rod
Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk Tel: +44 141 330 4778 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com ORCID: http://orcid.org/0000-0002-7101-9767 Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ResearchGate https://www.researchgate.net/profile/Roderic_Page
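Rod's suggestion of computing the per-dataset stats locally could be sketched in Python roughly as follows. This is only a minimal sketch, assuming the download is a tab-delimited text file with a header row and a datasetKey column; the function name count_by_dataset and the sample rows are made up for illustration and are not from the thread.

```python
import csv
import io
from collections import Counter


def count_by_dataset(lines, key_column="datasetKey"):
    """Count occurrence rows per dataset key in a tab-delimited file.

    `lines` is any iterable of text lines (an open file, for example).
    The column name is an assumption about the download layout.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        counts[row[key_column]] += 1
    return counts


# Tiny made-up sample standing in for a real download file.
sample = io.StringIO(
    "gbifID\tdatasetKey\tcountryCode\n"
    "1\taaaa\tBR\n"
    "2\taaaa\tBR\n"
    "3\tbbbb\tBR\n"
)
per_dataset = count_by_dataset(sample)
for dataset_key, n in per_dataset.most_common():
    print(dataset_key, n)
```

With a real download, the same function could be given an open file handle instead of the in-memory sample; because it reads row by row, the whole file never has to fit in memory.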
On 4 Sep 2015, at 10:39, Eduardo Dalcin edalcin@jbrj.org wrote:
Hi Markus,
Yes, it's a shame I can't combine country and "nub". Is there any hope of that being supported?
Eduardo
Eduardo Dalcin
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring mdoering@gbif.org wrote:
Eduardo,
As you might have seen from my comment on the issue, the web service uses a different parameter name for taxonKey, which is a bug we need to fix at some point. For now, please use nubKey instead, like this:
http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
The real problem for you will be that we do not support combining the country and the taxon filter, only one of the two. So I am afraid you cannot search for plants in Brazil, only for datasets about Brazil or for datasets with plant records.
Markus
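Since the inventory service takes either country or nubKey but not both, one possible workaround is to run the two requests separately and intersect the dataset keys on the client side. The sketch below assumes each response is a JSON object mapping dataset key to count, as in the "197908d0-...":27629 example quoted in this thread; the function name and the sample data are hypothetical. The intersection is only a candidate list: a dataset with fish records from Brazil and plant records from elsewhere would still pass, so each candidate would still need checking.

```python
def candidate_datasets(country_counts, taxon_counts):
    """Return the dataset keys that appear in both inventory responses.

    Both arguments are dicts of {datasetKey: count}. A key in the result
    means the dataset has records for the country AND records for the
    taxon, not necessarily records that satisfy both at once.
    """
    shared = set(country_counts) & set(taxon_counts)
    return sorted(shared)


# Made-up responses; real ones would come from the two API calls.
brazil = {"ds-herbarium": 120000, "ds-fish": 27629, "ds-birds": 900}
plants = {"ds-herbarium": 450000, "ds-global-flora": 3000000}

candidates = candidate_datasets(brazil, plants)
print(candidates)
```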
On 03 Sep 2015, at 14:12, Eduardo Dalcin edalcin@jbrj.org wrote:
Thanks, Jan. I'll keep exploring and will get in touch if I need anything.
Best,
Eduardo
Eduardo Dalcin Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
Thanks Mauro.
--------------------------------
Eduardo Dalcin
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
--------------------------------
e-mail alternativo / alternate email: edalcin@jbrj.org
--------------------------------
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Mon, Sep 7, 2015 at 7:21 PM, Mauro Cavalcanti maurobio@gmail.com wrote:
Dalcin,
For retrieving, counting, statistically analysing, databasing, and plotting charts and maps of GBIF data, I have been using, quite successfully, the R software environment for statistical computing and graphics (https://www.r-project.org/) with the rgbif package (https://github.com/ropensci/rgbif) written by Scott Chamberlain, which makes direct calls to the GBIF API. The advantages of R as a data retrieval and analytical tool are overwhelming and would be worth exploring in your case.
Hope this helps.
Salud!
-- Dr. Mauro J. Cavalcanti E-mail: maurobio@gmail.com Web: http://sites.google.com/site/maurobio
Hi Rod,
The real purpose is to have a list of the UUIDs and the "source web page" for each dataset. One way to do that is to select the resources whose counts are <> 0 for PLANTAE *AND* Brazil.
I don't want to do any statistical analysis, just feed a local harvester / aggregator.
The problem is that, given Jan Legind's reply of Sep 3, we have to go through the results of the request http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... one by one (https://goo.gl/3wysaA) to check whether each dataset is a Herbarium / Preserved Specimen (Plantae) dataset or not.
Does it make sense?
Thanks for your curiosity! :)
Cheers,
Eduardo
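The one-by-one check described above could in principle be scripted by building one filtered query per dataset key. The sketch below only constructs the URL. It assumes, without confirmation from this thread, that the occurrence search endpoint accepts datasetKey, country, taxonKey, basisOfRecord and limit together and reports the total in a count field of its JSON response; the function name is hypothetical. The dataset key used is the FishBase one quoted in the original question.

```python
from urllib.parse import urlencode

# Assumed endpoint: the occurrence search service, not the inventory one.
SEARCH_URL = "http://api.gbif.org/v1/occurrence/search"


def plant_specimen_query(dataset_key):
    """Build a query for preserved plant specimens from Brazil in one dataset.

    limit=0 is meant to ask for the total only, without any records.
    """
    params = [
        ("datasetKey", dataset_key),
        ("country", "BR"),
        ("taxonKey", 6),  # 6 is Plantae, as stated in the thread
        ("basisOfRecord", "PRESERVED_SPECIMEN"),
        ("limit", 0),
    ]
    return SEARCH_URL + "?" + urlencode(params)


url = plant_specimen_query("197908d0-5565-11d8-b290-b8a03c50a862")
print(url)
```

Fetching each URL and keeping the dataset keys whose total is greater than zero would then give the list of UUIDs to feed to the harvester, if the assumptions above hold.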
--------------------------------
Eduardo Dalcin
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
--------------------------------
e-mail alternativo / alternate email: edalcin@jbrj.org
--------------------------------
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
I'm kind of seconding Rod here.
It might make more sense, depending on your use case and local computing resources, to just get a download of Plantae *AND* Brazil from GBIF periodically, then process that to exclude existing Brazilian datasets. You could then use something like Apache Hadoop / Spark to efficiently split the file by dataset or by institution code.
This would greatly simplify your interactions with GBIF (down to just periodically generating a download programmatically) and you would have an easy place to insert any additional data transformations you want. This is the path I take for my work at least: the incremental cost of a couple million more records is worth the reduction in complexity overall.
- Alex
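For a file that fits on one machine, the exclude-then-split step Alex describes can be approximated in plain Python rather than Hadoop or Spark. This is a rough sketch under assumptions: a tab-delimited download with a datasetKey column, and a set of dataset keys you already hold locally. The function name, the keys and the sample rows are all made up.

```python
import csv
import io
from collections import defaultdict


def split_by_dataset(lines, exclude=frozenset(), key_column="datasetKey"):
    """Group rows by dataset key, skipping datasets listed in `exclude`."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    groups = defaultdict(list)
    for row in reader:
        key = row[key_column]
        if key in exclude:
            continue  # already harvested from the local source
        groups[key].append(row)
    return dict(groups)


# Made-up sample standing in for a periodic download.
sample = io.StringIO(
    "gbifID\tdatasetKey\tinstitutionCode\n"
    "1\taaaa\tX\n"
    "2\tlocal-herbarium\tRB\n"
    "3\tbbbb\tY\n"
    "4\taaaa\tX\n"
)
groups = split_by_dataset(sample, exclude={"local-herbarium"})
for key, rows in groups.items():
    print(key, len(rows))
```

Grouping by institutionCode instead would only need a different key_column. Note that this version holds every kept row in memory, which is the main reason a cluster tool may be preferable for very large downloads.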
On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
Hi Rod,
The real purpose is to have a list of the UUIDs and the "source web page" for each dataset. One way to do that is to select the resources whose counts are <> 0 for PLANTAE *AND* Brazil.
I don't want to do any statistical analysis, just feed a local harvester / aggregator.
The problem is that, given Jan Legind's reply of Sep 3, we have to go through the results of the request http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... one by one (https://goo.gl/3wysaA) to check whether each dataset is a Herbarium / Preserved Specimen (Plantae) dataset or not.
Does it make sense?
Thanks for your curiosity! :)
Cheers,
Eduardo
API-users mailing list API-users@lists.gbif.org http://lists.gbif.org/mailman/listinfo/api-users
Note that the R client rgbif does interface with the GBIF download API in addition to the search API - making it easier to deal with larger datasets. This works even if you downloaded bulk data from the GBIF GUI. Ignore this if you don't use R :)
Best, S
On Wed, Sep 9, 2015 at 10:35 AM Alex Thompson godfoder@acis.ufl.edu wrote:
I'm kind of seconding Rod here.
It might make more sense, depending on your use case and local computing resources, to just get a download of Plantae *AND* Brazil from GBIF periodically, then process that to exclude existing Brazilian datasets. You could then use something like Apache Hadoop / Spark to efficiently split the file by dataset or by institution code.
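As a rough illustration of the splitting step, here is a plain-Python sketch (no Hadoop or Spark) that streams a delimited occurrence download and writes one file per dataset. The tab delimiter and the `datasetKey` column name are assumptions about the download format, and `split_by_dataset` is a hypothetical helper, not part of any GBIF tool.

```python
import os


def split_by_dataset(download_path, out_dir, key_column="datasetKey", delimiter="\t"):
    """Stream a delimited occurrence file and write one file per dataset key.

    Returns a dict mapping each key to the number of data rows written for it.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # dataset key -> open output file
    counts = {}   # dataset key -> number of rows written
    with open(download_path, encoding="utf-8", newline="") as src:
        header = src.readline()
        columns = header.rstrip("\r\n").split(delimiter)
        key_index = columns.index(key_column)  # ValueError if the column is missing
        try:
            for line in src:
                fields = line.rstrip("\r\n").split(delimiter)
                has_key = key_index < len(fields) and fields[key_index]
                key = fields[key_index] if has_key else "unknown"
                if key not in handles:
                    path = os.path.join(out_dir, key + ".tsv")
                    handles[key] = open(path, "w", encoding="utf-8", newline="")
                    handles[key].write(header)  # repeat the header in every output file
                handles[key].write(line)
                counts[key] = counts.get(key, 0) + 1
        finally:
            for handle in handles.values():
                handle.close()
    return counts
```

The sketch keeps one output file open per dataset, which is fine for a few hundred datasets but may hit the operating system's open-file limit beyond that. Splitting by institution code instead would only need a different `key_column`.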
This would greatly simplify your interactions with GBIF (down to just periodically generating a download programmatically) and you would have an easy place to insert any additional data transformations you want. This is the path I take for my work at least - the incremental cost of a couple million more records is worth the reduction in complexity overall.
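A minimal sketch of what "generating a download programmatically" might involve: building the JSON body for a request to the GBIF occurrence download API. The endpoint path, the field names (`creator`, `notificationAddresses`, `format`, `predicate`) and the predicate keys reflect my understanding of that API and should be checked against the current GBIF documentation. The user name and e-mail address are placeholders, and the authenticated POST that submits the request is not shown.

```python
import json

# Assumed endpoint for submitting download requests (requires GBIF credentials).
DOWNLOAD_REQUEST_URL = "https://api.gbif.org/v1/occurrence/download/request"


def build_download_request(creator, email, taxon_key=6, country="BR"):
    """Build the request body for 'all occurrences of a taxon in a country'."""
    return {
        "creator": creator,
        "notificationAddresses": [email],
        "format": "SIMPLE_CSV",
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {"type": "equals", "key": "COUNTRY", "value": country},
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    body = build_download_request("my_gbif_user", "me@example.org")
    print(json.dumps(body, indent=2))
```

Unlike the counts service discussed in this thread, the predicate here combines the taxon and country filters in one request, which is what makes the periodic-download route attractive.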
- Alex
On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
Hi Rod,
The real purpose is to have a list of UUIDs and the "source web page" for each dataset. One way to do that is to select the resources whose counts are <> 0 for Plantae *AND* Brazil.
I don't want to do any statistical analysis, just feed a local harvester / aggregator.
The problem is that, given Jan Legind's reply of Sep 3, we have to go through the results of the request http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... one by one (https://goo.gl/3wysaA) to check whether each is a Herbarium / Preserved Specimen (Plantae) dataset or not.
Does it make sense?
Thanks for your curiosity! :)
Cheers,
Eduardo
Eduardo Dalcin (http://eduardo.dalc.in)
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page Roderic.Page@glasgow.ac.uk wrote:
Hi Eduardo,
I’m curious, is the purpose to get counts by dataset by country, or to get all the plant occurrences for Brazil? The latter can be obtained by downloading all plant occurrences in Brazil http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could then compute the per-dataset stats locally). I realise that this isn’t as convenient as having GBIF slice the data for you in the API.
Regards
Rod
Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk Tel: +44 141 330 4778 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com ORCID: http://orcid.org/0000-0002-7101-9767 Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ResearchGate https://www.researchgate.net/profile/Roderic_Page
On 4 Sep 2015, at 10:39, Eduardo Dalcin edalcin@jbrj.org wrote:
Hi Markus,
Yes, it's a shame I can't use country and "nub" together. Is there any hope of that?
Eduardo
On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring mdoering@gbif.org wrote:
Eduardo,
As you might have seen from my issue comment, the web service uses a different parameter name for taxonKey, which is a bug we need to fix at some point. For now, please use nubKey instead, like this:
http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
The real problem for you will be that we do not support the combination of the country and the taxon filter, just one of the two. So you cannot search for plants in Brazil I am afraid, just for datasets about Brazil and datasets with plant records.
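Since the service accepts either filter but not both, one partial workaround is to call it twice (once with country=BR, once with nubKey=6) and intersect the dataset keys. This is only a sketch of that idea, not something the service offers: the result is a candidate list, because a dataset can hold Brazilian records and plant records without holding any Brazilian plant records. It assumes each response is a JSON object mapping dataset UUID to count, as in the `"197908d0-...":27629` example quoted in this thread; `candidate_datasets` is a hypothetical helper, and fetching the two responses is left to the caller.

```python
def candidate_datasets(by_country, by_taxon):
    """Return the dataset keys that have a non-zero count in both maps, sorted.

    by_country: {datasetKey: count} parsed from .../counts/datasets?country=BR
    by_taxon:   {datasetKey: count} parsed from .../counts/datasets?nubKey=6
    """
    shared = by_country.keys() & by_taxon.keys()
    return sorted(k for k in shared if by_country[k] > 0 and by_taxon[k] > 0)
```

Each candidate still needs a per-dataset check before it can be trusted, but the intersection at least discards datasets with no plant records at all.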
Markus
On 03 Sep 2015, at 14:12, Eduardo Dalcin edalcin@jbrj.org wrote:
Thanks Jan. I'll keep exploring and I'll be in touch if I need to.
Best,
Eduardo
On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] jlegind@gbif.org wrote:
Dear Eduardo,
Thanks for getting in touch with us about these issues.
The first request http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO... returns the number of records located in Brazil for the facets in the request.
The second query http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... uses the Occurrence Inventories web service http://www.gbif.org/developer/occurrence#inventories which does not support the basis-of-record facet in the /datasets request. I understand that it would be better if the API response yielded an error message in this instance.
Concerning the other issues – you are indeed right that the counts do not make sense in the context of taxon key 6 which is Plantae. Actually the API does not handle the taxonKey search at all, contrary to what the documentation states:
/occurrence/counts/datasets | GET | Counts | Lists occurrence counts for datasets that cover a given taxon or country. | country, taxonKey
As you can see here, http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this request doesn’t return anything.
The GBIF developers will handle this issue in due time.
You can follow the issue in our bug tracking service here:
http://dev.gbif.org/issues/browse/POR-2828
With best regards,
Jan K. Legind
Data manager, GBIF Secretariat
From: API-users [mailto:api-users-bounces@lists.gbif.org] On Behalf Of Eduardo Dalcin
Sent: 2. september 2015 20:06
To: api-users@lists.gbif.org; dev@gbif.org
Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo Avancini
Subject: [API-users] Some questions from a begginer
Hi folks,
This is my first message to the list. So, please, be nice :)
I'm working here at Rio de Janeiro Botanical Garden, together with the guys at the National Center for Flora Conservation. We are doing the risk assessment of the Brazilian flora for the government. We have assessed, so far, the risk of ca. 6.000 species, but we still have to assess ca. 35.000. Access to occurrence records for Brazil is crucial, and every occurrence is important.
That means that we have to put together occurrence data from different sources and, after the first batch of the risk assessment, we realized that we need to build our own aggregator. We are planning to do this with the Lontra-harvester, with the help of the guys at the Brazilian GBIF Node.
So, one of the first steps was to list the available resources to understand the dimension of the task, and that brings me to my questions.
First:
The request:
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
returns 4.982.689 records
And the request:
http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&...
returns (here) 7.406.310 records
Comments?
Second:
The request:
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
returns things like this:
"197908d0-5565-11d8-b290-b8a03c50a862":27629
But a query on the same dataset:
http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5...
returns "null" (of course, it is FishBase!)
I have plenty of examples like this, highlighted in yellow here (not finished!):
https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYy...
Comments?
I think those two questions are a good start. Please let me know if I'm doing something wrong.
Cheers,
Eduardo
Thanks Scott. I don't use R, but I'm sure one of my students does ;)
It might be worth a look.
Best,
Eduardo
On Wed, Sep 9, 2015 at 2:44 PM, Scott Chamberlain scott@ropensci.org wrote:
Note that the R client rgbif does interface with the GBIF download API in addition to the search API - making it easier to deal with larger datasets. This works even if you downloaded bulk data from the GBIF GUI. Ignore this if you don't use R :)
Best, S
Scott,
That's my very point: using R and rgbif should be the best path to take in this case, both because of the easier access to the GBIF API that rgbif provides and because of the huge data-analysis capabilities of R itself. I had been working on a paper discussing this in the context of conservation databases (using R/rgbif and a Red-Listed group of mammals as an example), but unfortunately this work has been delayed by unexpected health problems. I hope it can see the light of day someday, however.
Best regards,
On 09/09/2015 14:44, "Scott Chamberlain" scott@ropensci.org wrote:
Note that the R client rgbif does interface with the GBIF download API in addition to the search API - making it easier to deal with larger datasets. This works even if you downloaded bulk data from the GBIF GUI. Ignore this if you don't use R :)
Best, S
On Wed, Sep 9, 2015 at 10:35 AM Alex Thompson godfoder@acis.ufl.edu wrote:
I'm kind of seconding Rod here.
It might make more sense, depending on your use case and local computer resources, to just get a download of Plantae *AND* Brazil from GBIF periodically, then process that to exclude existing Brazilian datasets. You could then use something like Apache hadoop / spark to efficiently split the file by dataset or by institution code.
This would greatly simplify your interactions with GBIF (down to just periodically generating a download programmatically) and you would have an easy place to insert any additional data transformations you want. This is the path i take for my work at least - the incremental cost of a couple million more records is worth the reduction in complexity overall.
- Alex
On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
Hi Rod,
The real purpose is to have a list of UUID and the "source web page" for the data set. Thus, one way to do it is to select those resources that counts <> 0 for PLANTAE *AND* Brazil.
I don't want to do any stats analysis, but feed up one local harverster / agregator.
The problem is, considering the reply from Jan Legind at Sep 3, we have to check one by one (https://goo.gl/3wysaA) to check if it is a Herbarium / Preserved Specimen (Plantae) or not, from the request http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... .
Does it make sense?
Thanks for your curiosity! :)
Cheers,
Eduardo
*Eduardo Dalcin https://mailtrack.io/trace/link/5516ed5e4f903c6ee9bd9fb3876fb65ffffc687c?url=http%3A%2F%2Feduardo.dalc.in&signature=cda9e9bf584a828c* Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
*e-mail alternativo / * *alternate email:** edalcin@jbrj.org
edalcin@jbrj.org*
Agendar reunião / Schedule a meeting: http://agendar.dalc.in https://mailtrack.io/trace/link/3a5eaa1df56016285886497766577e5357ddc6c1?url=http%3A%2F%2Fagendar.dalc.in&signature=c4e8d8113c34937f
On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page <Roderic.Page@glasgow.ac.uk
wrote:
Hi Eduardo,
I’m curious, is the purpose to get counts by dataset by country, or to get all the plant occurrences for Brazil? The later can be obtained by downloading all plant occurrences in Brazil http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could then compute the per-dataset stats locally). I realise that this isn’t as convenient as having GBIF slice the data for you in the API.
Regards
Rod
Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk Tel: +44 141 330 4778 <%2B44%20141%20330%204778> Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com ORCID: http://orcid.org/0000-0002-7101-9767 Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ResearchGate https://www.researchgate.net/profile/Roderic_Page
On 4 Sep 2015, at 10:39, Eduardo Dalcin edalcin@jbrj.org wrote:
Hi Markus,
Yes, that's a shame I can't have country and "nub" together. There is any hope about it?
Eduardo
*Eduardo Dalcin https://mailtrack.io/trace/link/bac23864202354f3789938ce352a878faa0cd8b8?url=http%3A%2F%2Feduardo.dalc.in&signature=aea58ef6f439535b* Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
*e-mail alternativo / * *alternate email:** edalcin@jbrj.org
edalcin@jbrj.org*
Agendar reunião / Schedule a meeting: http://agendar.dalc.in https://mailtrack.io/trace/link/db57b837be515d4b7caefe43d55b60467cd7c2c1?url=http%3A%2F%2Fagendar.dalc.in&signature=69b244942739c0f5
On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring mdoering@gbif.org wrote:
Eduardo,
as you might have seen from my issue comment the webservice uses a different parameter name for taxonKey which is a bug we need to fix at some point. Please use nubKey for now to use the service like that:
http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
The real problem for you will be that we do not support the combination of the country and the taxon filter, just one of the two. So you cannot search for plants in Brazil I am afraid, just for datasets about Brazil and datasets with plant records.
Markus
On 03 Sep 2015, at 14:12, Eduardo Dalcin edalcin@jbrj.org wrote:
Thanks Jan. I'll keep exploring and I'll be in touch, if I need.
Best,
Eduardo
Eduardo Dalcin Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] jlegind@gbif.org
wrote:
Dear Eduardo,
Thanks for getting in touch with us about these issues.
The first request
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO... returns the number of records located in Brazil for the facets in the request.
The second query
http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... uses the Occurrence Inventories web service http://www.gbif.org/developer/occurrence#inventories which does not support the basis-of-record facet in the /datasets request. I understand that it would be better if the API response yielded an error message in this instance.
Concerning the other issues – you are indeed right that the counts do
not make sense in the context of taxon key 6 which is Plantae. Actually the API does not handle the taxonKey search at all, contrary to what the documentation states:
/occurrence/counts/datasets
GET
Counts
Lists occurrence counts for datasets that cover a given taxon or
country.
country, taxonKey
As you can see here,
http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this request doesn’t return anything.
The GBIF developers will handle this issue in due time.
You can follow the issue in our bug tracking service here:
http://dev.gbif.org/issues/browse/POR-2828
With best regards,
Jan K. Legind
Data manager, GBIF Secretariat
From: API-users [mailto:api-users-bounces@lists.gbif.org] On Behalf
Of Eduardo Dalcin
Sent: 2. september 2015 20:06 To: api-users@lists.gbif.org; dev@gbif.org Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo
Avancini
Subject: [API-users] Some questions from a begginer
Hi folks,
This is my first message to the list. So, please, be nice :)
I'm working here at Rio de Janeiro Botanical Garden, together with
the guys at the National Center for Flora Conservation. We are doing the risk assessment of the Brazilian flora to the government. We assess, so far, the risk of ca. 6.000 species, but we still have to assess ca. 35.000. Access occurrence records for Brazil is crucial, and every occurrence is important.
That means that we have to put together occurrence data from
different sources and, after the first batch of the risk assessment, we realize that we need to build up our aggregator. We are planning to do this with the Lontra-harvester, with the help of the guys at Brazilian GBIF Node.
So, the one of the firsts steps was to list the available resources
to understand the dimension of the task and, that brings me to my questions.
First:
The request:
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
returns 4.982.689 records
And the request:
http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&...
returns (here) 7.406.310 records
Comments?
Second:
The request:
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
return things like this:
"197908d0-5565-11d8-b290-b8a03c50a862":27629
But the consult of the same dataset:
http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5...
Returns "null" (of course, is a FishBase!)
I have plenty of examples like this, on yellow here (not finished!):
https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYy...
Comments?
I think those two questions is a good start. Please, let me know if
I'm doing something wrong.
Cheers,
Eduardo
Eduardo Dalcin
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
API-users mailing list API-users@lists.gbif.org http://lists.gbif.org/mailman/listinfo/api-users
Hi Eduardo (et al.),
If I understand correctly, the list at https://goo.gl/3wysaA shows the resources with data from Brazil and you want to filter out those with records other than plants, am I right? Have you considered using OpenRefine (http://openrefine.org/) for this task? OpenRefine can fetch URLs built from data in other columns, which plays very well with the GBIF APIs. You can make the program dynamically build the API request URL from the dataset UUID, then fetch and parse the JSON response, without having to download the data and with almost no coding. The way I would go here is:
1. Create a column based on the value in column A of your table, to extract just the dataset UUID.
2. Create a new column fetching the GBIF API, adding the value in the previous column to a template URL: http://api.gbif.org/v1/occurrence/search?TAXON_KEY=6&limit=1&DATASET... <value>. The "limit=1" part makes things faster by avoiding having to show the default 20 records in the column.
3. Create yet another column parsing the JSON result from the previous column, extracting just the value in the field "count". The result is the number of plant records in that dataset (therefore, resources such as FishBase will have a value of zero).
Actually, you can add as many columns as you want, with as many API calls, to fill the rest of the fields in your table. Using the "registry" API, you can get the title, external data link and the protocol (IPT, DiGIR...).
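For readers who would rather script it, the three OpenRefine steps above could be sketched in Python roughly as follows. This is only an illustration, not what was actually run for the table: the function names are made up for this example, and the camelCase parameter names taxonKey and datasetKey are an assumption based on the GBIF occurrence search documentation (the template URL in the message is truncated, so its exact spelling is not reproduced here).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the occurrence search service mentioned in the message.
SEARCH_URL = "http://api.gbif.org/v1/occurrence/search"


def build_count_url(dataset_key, taxon_key=6):
    """Step 2: build the request URL for one dataset UUID.

    taxonKey/datasetKey are assumed parameter names; limit=1 keeps the
    response small because only the total count is needed.
    """
    params = {"taxonKey": taxon_key, "datasetKey": dataset_key, "limit": 1}
    return SEARCH_URL + "?" + urlencode(params)


def extract_count(response_text):
    """Step 3: parse the JSON response and return the value of "count"."""
    return json.loads(response_text)["count"]


def plant_count(dataset_key):
    """Fetch the URL and return the number of plant records (needs network)."""
    with urlopen(build_count_url(dataset_key)) as response:
        return extract_count(response.read().decode("utf-8"))
```

Calling plant_count() once per dataset UUID from the spreadsheet would fill the same column that OpenRefine produces; a dataset such as FishBase would be expected to come back with a count of zero.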
Hope this helps. Let me know if you are interested in this approach and need more help using OpenRefine. Cheers!
Javier Otegui http://www.jotegui.com
On Wed, Sep 9, 2015 at 8:07 PM, Mauro Cavalcanti maurobio@gmail.com wrote:
Scott,
That's my very point - that using R and rgbif should be the best path to take in this case, both because of the easier access to the GBIF API provided by rgbif and the HUGE data analytical capabilities of R itself. I had been working on a paper discussing this in the context of conservation databases (using R/rgbif and a Red-Listed group of mammals as an example), but unfortunately this work has been delayed by unexpected health problems. Hope it can see the light someday, however.
Best regards,
On 09/09/2015 14:44, "Scott Chamberlain" scott@ropensci.org wrote:
Note that the R client rgbif does interface with the GBIF download API in addition to the search API - making it easier to deal with larger datasets. This works even if you downloaded bulk data from the GBIF GUI. Ignore this if you don't use R :)
Best, S
On Wed, Sep 9, 2015 at 10:35 AM Alex Thompson godfoder@acis.ufl.edu wrote:
I'm kind of seconding Rod here.
It might make more sense, depending on your use case and local computer resources, to just get a download of Plantae *AND* Brazil from GBIF periodically, then process that to exclude existing Brazilian datasets. You could then use something like Apache Hadoop / Spark to efficiently split the file by dataset or by institution code.
This would greatly simplify your interactions with GBIF (down to just periodically generating a download programmatically) and you would have an easy place to insert any additional data transformations you want. This is the path I take for my work at least - the incremental cost of a couple million more records is worth the reduction in complexity overall.
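As a rough idea of what "generating a download programmatically" could involve, the sketch below only builds the JSON body for a Plantae *AND* Brazil download request. The body layout (a "creator" plus an "and" predicate over TAXON_KEY and COUNTRY) reflects my understanding of the GBIF occurrence download service and should be checked against its documentation; the function name and the user name are hypothetical, and the authenticated POST that would submit the request is deliberately left out.

```python
import json


def plantae_brazil_download_request(creator):
    """Build a download request body for Plantae (taxon key 6) in Brazil.

    `creator` is the GBIF user name that would own the download
    (a placeholder here). Nothing is sent over the network.
    """
    return {
        "creator": creator,
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": "6"},
                {"type": "equals", "key": "COUNTRY", "value": "BR"},
            ],
        },
    }


if __name__ == "__main__":
    # Print the body that a scheduled job would submit.
    print(json.dumps(plantae_brazil_download_request("example_user"), indent=2))
```

A periodic job would submit this body with the user's credentials and then fetch the resulting file once GBIF has prepared it.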
- Alex
On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
Hi Rod,
The real purpose is to have a list of UUIDs and the "source web page" for each dataset. Thus, one way to do it is to select those resources whose counts are <> 0 for PLANTAE *AND* Brazil.
I don't want to do any statistical analysis, just feed a local harvester / aggregator.
The problem is that, considering the reply from Jan Legind on Sep 3, we have to go through the list one by one (https://goo.gl/3wysaA) to check whether each dataset from the request http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... is a Herbarium / Preserved Specimen (Plantae) dataset or not.
Does it make sense?
Thanks for your curiosity! :)
Cheers,
Eduardo
Eduardo Dalcin
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page < Roderic.Page@glasgow.ac.uk> wrote:
Hi Eduardo,
I’m curious, is the purpose to get counts by dataset by country, or to get all the plant occurrences for Brazil? The latter can be obtained by downloading all plant occurrences in Brazil http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could then compute the per-dataset stats locally). I realise that this isn’t as convenient as having GBIF slice the data for you in the API.
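Computing the per-dataset stats locally can be quite short. The sketch below assumes the download is a tab-delimited text file with a header row containing a datasetKey column; that layout, the column name, and the sample rows are assumptions for illustration, so adjust them to whatever the actual download contains.

```python
import csv
import io
from collections import Counter


def counts_by_dataset(lines, key_column="datasetKey"):
    """Count occurrence records per dataset in a tab-delimited download.

    `lines` is any iterable of text lines, for example an open file.
    QUOTE_NONE is used so stray quote characters in free-text fields
    are not interpreted as CSV quoting.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[key_column] for row in reader)


if __name__ == "__main__":
    # Tiny made-up sample standing in for a real download file.
    sample = (
        "gbifID\tdatasetKey\tkingdom\n"
        "1\tdataset-a\tPlantae\n"
        "2\tdataset-a\tPlantae\n"
        "3\tdataset-b\tPlantae\n"
    )
    for dataset, n in counts_by_dataset(io.StringIO(sample)).most_common():
        print(dataset, n)
```

With a real file, passing open(path, encoding="utf-8", newline="") instead of the StringIO object would stream the records without loading everything into memory.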
Regards
Rod
Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk Tel: +44 141 330 4778 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com ORCID: http://orcid.org/0000-0002-7101-9767 Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ResearchGate https://www.researchgate.net/profile/Roderic_Page
On 4 Sep 2015, at 10:39, Eduardo Dalcin edalcin@jbrj.org wrote:
Hi Markus,
Yes, it's a shame that I can't have country and "nub" together. Is there any hope for it?
Eduardo
Eduardo Dalcin
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring mdoering@gbif.org wrote:
Eduardo,
as you might have seen from my issue comment, the web service uses a different parameter name for taxonKey, which is a bug we need to fix at some point. For now, please use nubKey to call the service, like this:
http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
The real problem for you will be that we do not support the combination of the country and the taxon filter, just one of the two. So you cannot search for plants in Brazil I am afraid, just for datasets about Brazil and datasets with plant records.
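Given that limitation, one possible workaround is to call the service twice (once with country=BR, once with nubKey=6) and intersect the two results. The sketch below assumes each response is a JSON map from dataset UUID to record count, as in the "197908d0-...":27629 example earlier in the thread; the function name and all UUIDs other than the FishBase one are made up. Note that the intersection only yields candidates: a dataset can hold Brazilian records and plant records without holding any Brazilian plant records, so each candidate still needs an individual check.

```python
def candidate_datasets(brazil_counts, plant_counts):
    """Return dataset UUIDs present in both per-dataset count maps.

    brazil_counts: map of dataset UUID -> count, from the country=BR call.
    plant_counts:  map of dataset UUID -> count, from the nubKey=6 call.
    The result is a sorted list of candidates, not a confirmed list of
    datasets with plant records from Brazil.
    """
    return sorted(brazil_counts.keys() & plant_counts.keys())


if __name__ == "__main__":
    # Made-up maps; only the FishBase UUID and count come from the thread.
    brazil = {
        "197908d0-5565-11d8-b290-b8a03c50a862": 27629,  # FishBase
        "hypothetical-herbarium-uuid": 1200,
    }
    plants = {
        "hypothetical-herbarium-uuid": 53000,
        "hypothetical-foreign-herbarium-uuid": 800,
    }
    print(candidate_datasets(brazil, plants))
```

In this made-up example FishBase drops out because it has no plant records, which is the filtering the spreadsheet was trying to achieve by hand.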
Markus
On 03 Sep 2015, at 14:12, Eduardo Dalcin edalcin@jbrj.org wrote:
Thanks Jan. I'll keep exploring and I'll be in touch, if I need.
Best,
Eduardo
Eduardo Dalcin
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] jlegind@gbif.org wrote:
Dear Eduardo,
Thanks for getting in touch with us about these issues.
The first request
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO... returns the number of records located in Brazil for the facets in the request.
The second query
http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... uses the Occurrence Inventories web service http://www.gbif.org/developer/occurrence#inventories which does not support the basis-of-record facet in the /datasets request. I understand that it would be better if the API response yielded an error message in this instance.
Concerning the other issues – you are indeed right that the counts
do not make sense in the context of taxon key 6 which is Plantae. Actually the API does not handle the taxonKey search at all, contrary to what the documentation states:
/occurrence/counts/datasets | GET | Counts | Lists occurrence counts for datasets that cover a given taxon or country. | country, taxonKey
As you can see here,
http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this request doesn’t return anything.
The GBIF developers will handle this issue in due time.
You can follow the issue in our bug tracking service here:
http://dev.gbif.org/issues/browse/POR-2828
With best regards,
Jan K. Legind
Data manager, GBIF Secretariat
Javier,
The problem with these tools (LontraHarvest, OpenRefine, etc.) is that they are just data *retrieval* tools and do not provide data analysis and presentation functions -- one or more different tools have to be used after retrieving the data of interest for plotting maps, charts, tabulation, statistical analyses, etc. This is just where R excels, allowing you to perform all these operations in a unified, straightforward workflow.
Salud!
Mauro,
Agreed 100% (I'm actually a regular R user), but given Eduardo's needs (or my understanding of them), I think that might be overkill here. I replicated and completed Eduardo's table in less than 30 minutes with OpenRefine, while using R would probably take longer just to get the data. But again, for more serious analytics, few things (if any) beat R.
Cheers!
On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring mdoering@gbif.org wrote:
> Eduardo,
>
> as you might have seen from my issue comment the webservice uses a
> different parameter name for taxonKey which is a bug we need to fix at some
> point.
> Please use nubKey for now to use the service like that:
>
> http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
>
> The real problem for you will be that we do not support the
> combination of the country and the taxon filter, just one of the two. So
> you cannot search for plants in Brazil I am afraid, just for datasets about
> Brazil and datasets with plant records.
>
> Markus
>
> > On 03 Sep 2015, at 14:12, Eduardo Dalcin edalcin@jbrj.org wrote:
> >
> > Thanks Jan. I'll keep exploring and I'll be in touch, if I need.
> >
> > Best,
> >
> > Eduardo
> >
> > On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] <jlegind@gbif.org> wrote:
> > Dear Eduardo,
> >
> > Thanks for getting in touch with us about these issues.
> >
> > The first request
> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
> > returns the number of records located in Brazil for the facets in the
> > request.
> >
> > The second query
> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&...
> > uses the Occurrence Inventories web service
> > http://www.gbif.org/developer/occurrence#inventories which does not
> > support the basis-of-record facet in the /datasets request. I understand
> > that it would be better if the API response yielded an error message in
> > this instance.
> >
> > Concerning the other issues – you are indeed right that the counts
> > do not make sense in the context of taxon key 6 which is Plantae. Actually
> > the API does not handle the taxonKey search at all, contrary to what the
> > documentation states:
> >
> > /occurrence/counts/datasets
> > GET
> > Counts
> > Lists occurrence counts for datasets that cover a given taxon or country.
> > country, taxonKey
> >
> > As you can see here,
> > http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this
> > request doesn’t return anything.
> >
> > The GBIF developers will handle this issue in due time.
> >
> > You can follow the issue in our bug tracking service here:
> > http://dev.gbif.org/issues/browse/POR-2828
> >
> > With best regards,
> >
> > Jan K. Legind
> > Data manager, GBIF Secretariat
> >
> > From: API-users [mailto:api-users-bounces@lists.gbif.org] On Behalf Of Eduardo Dalcin
> > Sent: 2. september 2015 20:06
> > To: api-users@lists.gbif.org; dev@gbif.org
> > Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo Avancini
> > Subject: [API-users] Some questions from a begginer
> >
> > Hi folks,
> >
> > This is my first message to the list. So, please, be nice :)
> >
> > I'm working here at Rio de Janeiro Botanical Garden, together with
> > the guys at the National Center for Flora Conservation. We are doing the
> > risk assessment of the Brazilian flora to the government. We assess, so
> > far, the risk of ca. 6.000 species, but we still have to assess ca. 35.000.
> > Access occurrence records for Brazil is crucial, and every occurrence is
> > important.
> >
> > That means that we have to put together occurrence data from
> > different sources and, after the first batch of the risk assessment, we
> > realize that we need to build up our aggregator. We are planning to do this
> > with the Lontra-harvester, with the help of the guys at Brazilian GBIF Node.
> >
> > So, the one of the firsts steps was to list the available
> > resources to understand the dimension of the task and, that brings me to my
> > questions.
> >
> > First:
> >
> > The request:
> >
> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
> >
> > returns 4.982.689 records
> >
> > And the request:
> >
> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&...
> >
> > returns (here) 7.406.310 records
> >
> > Comments?
> >
> > Second:
> >
> > The request:
> >
> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
> >
> > return things like this:
> >
> > "197908d0-5565-11d8-b290-b8a03c50a862":27629
> >
> > But the consult of the same dataset:
> >
> > http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5...
> >
> > Returns "null" (of course, is a FishBase!)
> >
> > I have plenty of examples like this, on yellow here (not finished!):
> >
> > https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYy...
> >
> > Comments?
> >
> > I think those two questions is a good start. Please, let me know
> > if I'm doing something wrong.
> >
> > Cheers,
> >
> > Eduardo

_______________________________________________
API-users mailing list
API-users@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/api-users
-- Dr. Mauro J. Cavalcanti E-mail: maurobio@gmail.com Web: http://sites.google.com/site/maurobio
Javier,
Without some kind of empirical benchmark, a fair comparison between R and one of these other tools cannot be stated so clearly. In my own experience, using R/rgbif to retrieve *and* locally database about 20,000 records from GBIF took less than 15 minutes. All subsequent processing (counting, tabulating, plotting, etc.) is done on the local machine, by means of R, so this depends on the performance of said machine. I understand that the dataset under discussion is much larger than the one I have been working with, but I cannot say that R would be "overkill" in any instance. After all, why would anyone just want to retrieve records from GBIF? To do what with them? A conservation monitoring center is supposed to generate *products*, in the form of charts, tables, maps and statistics, and therefore some kind of specialized tool will be required at some point in the process. So why not use an integrated package (like R) from the start?
Salud!
2015-09-09 18:16 GMT-03:00 Javier Otegui javier.otegui@gmail.com:
Mauro,
Agreed 100% (I'm actually a regular R user), but according to Eduardo's needs (or my understanding of his needs), I think that might be an overkill here. I have replicated and completed Eduardo's table in less than 30 minutes with OpenRefine, while using R would probably take a larger amount of time just to get the data. But again, for more serious analytics, few things (if any) beat R.
Cheers!
On Wed, 09/09/2015, 22:59, Mauro Cavalcanti maurobio@gmail.com wrote:
Javier,
The problem with these tools (LontraHarvest, OpenRefine, etc.) is that they are just data *retrieval* tools, not providing for data analytical and representation functionalities -- one or more different tools should be used after retrieving the data of interest for plotting maps, charts, tabulation, statistical analyses, etc. This is just where R excels, allowing to perform all these operations in a unified, straightforward workflow.
Salud!
2015-09-09 17:23 GMT-03:00 Javier Otegui javier.otegui@gmail.com:
Hi Eduardo (et al.),
If I understand correctly, the list at https://goo.gl/3wysaA shows the resources with data from Brazil and you want to filter out those with records other than Plants, am I right? Have you considered using OpenRefine (http://openrefine.org/) for this task? OpenRefine has a service to fetch URLs built from data in other columns, which plays very well with the GBIF APIs. You can make the program dynamically build the API request URL based on the dataset UUID, and fetch and parse the JSON response, without having to download the data and with almost no coding. The way I would go here is:
1. Create a column based off of the value in column A of your table, to extract just the dataset UUID.
2. Create a new column fetching the GBIF API, adding the value in the previous column to a template URL: http://api.gbif.org/v1/occurrence/search?TAXON_KEY=6&limit=1&DATASET... <value>. The "limit=1" part makes things faster by avoiding having to show the default 20 records in the column.
3. Create yet another column parsing the JSON result from the previous column, extracting just the value in the field "count". The result is the number of plant records in that dataset (therefore, resources such as FishBase will have a value of zero).
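For readers who prefer a script to OpenRefine, the three steps above can be sketched in Python roughly as follows. This is only an illustration, not what was actually run for the table: the function names are made up, and it assumes the lower-camel-case parameter names `taxonKey` and `datasetKey` of the occurrence search service rather than the truncated template URL above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "http://api.gbif.org/v1/occurrence/search"


def build_count_url(dataset_key, taxon_key=6):
    """Step 2: build the search URL for one dataset (limit=1 keeps the reply small)."""
    params = {"taxonKey": taxon_key, "datasetKey": dataset_key, "limit": 1}
    return SEARCH_URL + "?" + urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode its JSON body (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def plant_count(dataset_key, fetch=fetch_json):
    """Step 3: keep only the "count" field of the search response."""
    return fetch(build_count_url(dataset_key))["count"]
```

Calling `plant_count("197908d0-5565-11d8-b290-b8a03c50a862")` would then query the FishBase dataset mentioned earlier in the thread. The `fetch` argument exists only so the function can be tried without network access.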
Actually, you can add as many columns as you want, with as many API calls, to fill the rest of the fields in your table. Using the "registry" API, you can get the title, external data link and the protocol (IPT, DiGIR...).
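A rough Python equivalent of the registry lookup might look like the sketch below. The field names (`title`, `homepage`, `endpoints` with `type` and `url`) reflect my understanding of the dataset record returned by the registry service and should be checked against a real response; the function names are hypothetical.

```python
import json
from urllib.request import urlopen

REGISTRY_URL = "http://api.gbif.org/v1/dataset/"


def fetch_dataset(dataset_key):
    """Fetch one dataset record from the registry (needs network access)."""
    with urlopen(REGISTRY_URL + dataset_key) as response:
        return json.load(response)


def summarize_dataset(record):
    """Pick out title, link and protocol; missing fields come back empty."""
    endpoints = record.get("endpoints", [])
    return {
        "title": record.get("title"),
        "homepage": record.get("homepage"),
        # Endpoint types are expected to look like DIGIR or DWC_ARCHIVE.
        "protocols": sorted({e["type"] for e in endpoints if e.get("type")}),
        "endpoint_urls": [e["url"] for e in endpoints if e.get("url")],
    }
```

`summarize_dataset(fetch_dataset(uuid))` would fill one row of the table for a given dataset UUID.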
Hope this helps. Let me know if you are interested in this approach and need more help using OpenRefine. Cheers!
Javier Otegui http://www.jotegui.com
On Wed, Sep 9, 2015 at 8:07 PM, Mauro Cavalcanti maurobio@gmail.com wrote:
Scott,
That's my very point - that using R and rgbif should be the best path to take in this case, both because of the easier access to the GBIF API provided by rgbif and the HUGE data analytical capabilities of R itself. I had been working on a paper discussing this in the context of conservation databases (using R/rgbif and a Red-Listed group of mammals as an example), but unfortunately this work has been delayed by unexpected health problems. I hope it can see the light of day someday, however.
Best regards,
On 09/09/2015 14:44, "Scott Chamberlain" scott@ropensci.org wrote:
Note that the R client rgbif does interface with the GBIF download API in addition to the search API - making it easier to deal with larger datasets. This works even if you downloaded bulk data from the GBIF GUI. Ignore this if you don't use R :)
Best, S
On Wed, Sep 9, 2015 at 10:35 AM Alex Thompson godfoder@acis.ufl.edu wrote:
I'm kind of seconding Rod here.
It might make more sense, depending on your use case and local computer resources, to just get a download of Plantae *AND* Brazil from GBIF periodically, then process that to exclude existing Brazilian datasets. You could then use something like Apache Hadoop / Spark to efficiently split the file by dataset or by institution code.
This would greatly simplify your interactions with GBIF (down to just periodically generating a download programmatically) and you would have an easy place to insert any additional data transformations you want. This is the path I take for my work at least - the incremental cost of a couple million more records is worth the reduction in complexity overall.
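As a sketch of what "generating a download programmatically" could involve, the snippet below builds the JSON body for a Plantae-and-Brazil download request. The layout (an "and" predicate over `COUNTRY` and `TAXON_KEY`, the `SIMPLE_CSV` format) follows my reading of the GBIF occurrence download documentation and should be verified there; the function name and the example user are placeholders.

```python
import json


def build_download_request(creator, email, country="BR", taxon_key=6):
    """Build the request body for a download of one taxon in one country."""
    return {
        "creator": creator,                # GBIF user name (placeholder)
        "notificationAddresses": [email],  # where the "download ready" mail goes
        "sendNotification": True,
        "format": "SIMPLE_CSV",
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "COUNTRY", "value": country},
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
            ],
        },
    }


if __name__ == "__main__":
    body = build_download_request("some_user", "some_user@example.org")
    print(json.dumps(body, indent=2))
```

The body would then be sent, with the user's credentials, to the occurrence download request endpoint; that step is left out here because it needs an account.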
- Alex
On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
Hi Rod,
The real purpose is to have a list of UUIDs and the "source web page" for each data set. Thus, one way to do it is to select those resources whose counts are non-zero for PLANTAE *AND* Brazil.
I don't want to do any statistical analysis, just feed a local harvester / aggregator.
The problem is that, given the reply from Jan Legind on Sep 3, we have to go through the datasets returned by the request http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... one by one (https://goo.gl/3wysaA) to check whether each is a Herbarium / Preserved Specimen (Plantae) dataset or not.
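One possible way around the one-by-one check, not discussed in this thread, is to ask the occurrence search service itself for counts faceted by dataset. The sketch below assumes that the `facet`, `facetLimit` and `limit=0` parameters are available and that the response carries a `facets` list whose entries hold `counts` of `name`/`count` pairs; both are my understanding of later versions of the API and need checking. Function names are made up.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://api.gbif.org/v1/occurrence/search"


def build_facet_url(country="BR", taxon_key=6,
                    basis="PRESERVED_SPECIMEN", facet_limit=1000):
    """Build a search URL that returns no records, only per-dataset counts."""
    params = {
        "country": country,
        "taxonKey": taxon_key,
        "basisOfRecord": basis,
        "facet": "datasetKey",
        "facetLimit": facet_limit,
        "limit": 0,
    }
    return SEARCH_URL + "?" + urlencode(params)


def dataset_counts(response):
    """Map dataset UUID to record count from a decoded facet response."""
    counts = {}
    for facet in response.get("facets", []):
        for entry in facet.get("counts", []):
            counts[entry["name"]] = entry["count"]
    return counts
```

Only datasets with at least one matching record should appear among the facet counts, which would give the "counts <> 0" list directly.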
Does it make sense?
Thanks for your curiosity! :)
Cheers,
Eduardo
-- Dr. Mauro J. Cavalcanti E-mail: maurobio@gmail.com Web: http://sites.google.com/site/maurobio
> The problem with these tools (LontraHarvest, OpenRefine, etc.) is that they are just data *retrieval* tools, not providing for data analytical and representation functionalities
Mauro, for me this is a blessing! :)
In the CNCFlora workflow, the data from GBIF cannot be used as it is, because it has to be validated first, taxonomically and spatially. Only after cleaning, georeferencing and validation with the expert will the data be analyzed as part of the risk assessment.
Cheers
Eduardo
Hi Eduardo,
it would be interesting to have examples of the kinds of problems you encounter with GBIF data, so that we can look at ways to fix the problems. It would also be interesting to know whether you would be able to provide GBIF with the corrections you make to GBIF data. It seems clear that lots of people are cleaning data in their own projects, but that doesn’t filter back to GBIF.
Regards
Rod
Hi Rod,
As you saw in the other message, the main problem we have now is that the same voucher is represented twice, because NYBG had a DiGIR source and now has an IPT source. People at NYBG said that they asked GBIF to remove the DiGIR source, but it is still there. Maybe this happens with other sources as well.
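To illustrate how such doubled vouchers could be spotted once the records are local, here is a small Python sketch. It is not how CNCFlora detects them: it assumes each record is a dict carrying the Darwin Core fields `institutionCode` and `catalogNumber` plus a `datasetKey`, and that the same pair appearing under two different datasets points to the same voucher. Real data would need fuzzier matching.

```python
from collections import defaultdict


def find_cross_dataset_duplicates(records):
    """Return {(institutionCode, catalogNumber): [datasetKeys]} for vouchers
    that appear under more than one dataset."""
    seen = defaultdict(set)
    for record in records:
        inst = (record.get("institutionCode") or "").strip().upper()
        cat = (record.get("catalogNumber") or "").strip().upper()
        dataset = record.get("datasetKey")
        if inst and cat and dataset:
            seen[(inst, cat)].add(dataset)
    return {key: sorted(keys) for key, keys in seen.items() if len(keys) > 1}
```

The keys of the result are the suspect vouchers and the values list the datasets that each one shows up in.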
Regarding feedback from the data cleaning process, I'm indeed interested in this discussion, but I'm not sure this list is the best forum for it.
Here at the National Center for Flora Conservation - CNCFlora, in the risk assessments, we only use occurrences that were validated by experts, taxonomically and spatially. This information may be useful, especially if the expert made some correction or comment on the occurrence. I can see that this is related to annotation initiatives such as AnnoSys and FilteredPush. In my ideal and fantastic world, we would have an annotation feature on GBIF occurrences, where experts can interact with the material. In our Virtual Herbarium of Repatriated Plants (http://www.herbariovirtualreflora.jbrj.gov.br/jabot/herbarioVirtual/ConsultaPublicoHVUC/ResultadoDaConsultaNovaConsulta.do?lingua=en), the experts can suggest new names if they have a login.
However, what is usual is the duplication of efforts for georeferencing the legacy occurrences. For example, different efforts, methodologies and uncertainty levels have been applied in different duplicates of the same occurrence, held by different herbaria.
I thought that the feedback about data improvement should be sent directly to the data provider but, please, if there is something else let me know.
Cheers,
Eduardo
Hi Eduardo,
> I thought that the feedback about data improvement should be sent directly to the data provider but, please, if there is something else let me know.
In an ideal world, yes the feedback should go to the data provider, things get fixed, then GBIF gets updated. However, data providers don’t always have the resources to fix things. I’m also interested in how many of the data issues that come up are things that GBIF itself can detect and flag. In my experience, there are issues that the provider was unaware of, but become apparent once the data is exposed by GBIF.
For example, here’s a case of a data set supplied to GBIF with a serious error https://github.com/ttu-vertnet/ttu-mammals/issues/12 This was obvious in GBIF simply by looking at the map, but apparently not to the data provider (this error has now been fixed).
The more we know about the sort of errors that can happen, the better placed we are to develop tools to catch them.
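As one small illustration of what GBIF can already detect, occurrence records returned by the API carry an `issues` list of flags. The sketch below tallies those flags over a set of records so the most frequent problems stand out. The flag names in the comment (`ZERO_COORDINATE`, `COUNTRY_COORDINATE_MISMATCH`) are given as examples of the vocabulary as I understand it, and the function name is made up.

```python
from collections import Counter


def tally_issues(records):
    """Count how often each issue flag occurs across occurrence records."""
    tally = Counter()
    for record in records:
        # Each record is expected to hold a list such as
        # ["ZERO_COORDINATE", "COUNTRY_COORDINATE_MISMATCH"].
        tally.update(record.get("issues", []))
    return tally
```

`tally_issues(records).most_common(10)` would list the ten most frequent flags in a batch of records.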
Regards
Rod
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.ukmailto:Roderic.Page@glasgow.ac.uk Tel: +44 141 330 4778 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com ORCID: http://orcid.org/0000-0002-7101-9767 Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ResearchGate https://www.researchgate.net/profile/Roderic_Page
On 15 Sep 2015, at 14:53, Eduardo Dalcin <edalcin@jbrj.orgmailto:edalcin@jbrj.org> wrote:
Hi Rod,
As you saw in the other message, the main problem that we have now is having the same voucher represented twice, because NYBG had a DiGIR source and now has an IPT source. People at NYBG said that they asked GBIF to remove the DiGIR source, but it is still there. Maybe this occurs with other sources as well.
Regarding feedback from the data cleaning process, I'm indeed interested in this discussion, but I'm not sure this list is the best forum for it.
Here at the National Center for Flora Conservation (CNCFlora), in the risk assessments, we only use occurrences that were validated by experts, taxonomically and spatially. This information may be useful, especially if the expert made some correction or comment on the occurrence. I can see that this is related to annotation initiatives such as AnnoSys and FilteredPush. In my ideal and fantastic world, we would have an annotation feature on GBIF occurrences, where experts can interact with the material. In our Virtual Herbarium of Repatriated Plants (http://www.herbariovirtualreflora.jbrj.gov.br/jabot/herbarioVirtual/ConsultaPublicoHVUC/ResultadoDaConsultaNovaConsulta.do?lingua=en), the experts can suggest new names if they have a login.
However, what usually happens is duplication of effort in georeferencing legacy occurrences. For example, different efforts, methodologies and uncertainty levels have been applied to different duplicates of the same occurrence held by different herbaria.
I thought that feedback about data improvement should be sent directly to the data provider but, please, if there is something else let me know.
Cheers,
Eduardo
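The duplicated-voucher problem described in this message (the same specimen arriving through both an old DiGIR source and a newer IPT source) could be spotted by grouping records on the Darwin Core triple institutionCode, collectionCode and catalogNumber. The sketch below is only an illustration under assumptions: each record is taken to be a dict carrying those three fields plus datasetKey, and the two sources are assumed to format the codes identically apart from case and surrounding whitespace, which may not hold for real data.

```python
from collections import defaultdict

TRIPLE_FIELDS = ("institutionCode", "collectionCode", "catalogNumber")


def triple(record):
    """Return a normalised (institutionCode, collectionCode, catalogNumber) key."""
    return tuple((record.get(field) or "").strip().upper() for field in TRIPLE_FIELDS)


def find_duplicates(records):
    """Map each triple seen in more than one dataset to the sorted dataset keys."""
    seen = defaultdict(set)
    for record in records:
        key = triple(record)
        if all(key):  # skip records where any part of the triple is missing
            seen[key].add(str(record.get("datasetKey")))
    return {key: sorted(keys) for key, keys in seen.items() if len(keys) > 1}
```

Any triple returned here is only a candidate duplicate; an expert would still need to confirm that both records describe the same sheet.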
--------------------------------
Eduardo Dalcin (http://eduardo.dalc.in)
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Work: +55 21 3204 2116
Alternate email: edalcin@jbrj.org
Schedule a meeting: http://agendar.dalc.in
--------------------------------
On Mon, Sep 14, 2015 at 1:50 PM, Roderic Page <Roderic.Page@glasgow.ac.uk> wrote:
Hi Eduardo,
It would be interesting to have examples of the kinds of problems you encounter with GBIF data, so that we can look at ways to fix them. It would also be interesting to know whether you would be able to provide GBIF with the corrections you make to GBIF data. It seems clear that lots of people are cleaning data in their own projects, but that doesn’t filter back to GBIF.
Regards
Rod
On 14 Sep 2015, at 17:34, Eduardo Dalcin <edalcin@jbrj.org> wrote:
> The problem with these tools (LontraHarvest, OpenRefine, etc.) is that they are just data *retrieval* tools, not providing for data analytical and representation functionalities
Mauro, for me this is a blessing! :)
In the CNCFlora workflow, the data from GBIF is useless as it is, because it has to be validated first, taxonomically and spatially. Only after cleaning, georeferencing and validation with the expert will the data be analyzed as part of the risk assessment.
Cheers
Eduardo
_______________________________________________
API-users mailing list
API-users@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/api-users
It seems insane to us on the aggregator / technical side of things, but we get a ton of collections people telling us at iDigBio "We've never been able to look at our data on a map before." Or they do know how to use mapping tools, but they've only ever looked at a handful of points at once, and never the thousand or so that form the X=Y straight line.
Part of the solution to that we're trying to get across is to really get providers to come to iDigBio and interact with their own data in our systems. This is both because we want feedback on how it comes out, and because we're trying to build a sense of engagement with the process. We (as a community) desperately need collections to see data publishing as less "throwing data over the fence" and more of a collaborative effort with aggregators.
People have been criticizing aggregated biodiversity data quality for years, but it's rare that I meet anyone who views themselves as part of the solution. It's just an impediment to getting the "real work" done.
- Alex
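The "X=Y straight line" mentioned above is the kind of error a provider could check for before publishing. As a minimal sketch, assuming records are dicts with the Darwin Core fields decimalLatitude and decimalLongitude (already parsed to numbers or numeric strings), the share of georeferenced records sitting exactly on that diagonal can be computed like this. The function names and the tolerance are illustrative, not part of any iDigBio or GBIF tool.

```python
def coordinates(record):
    """Return (lat, lon) as floats, or None when either value is missing."""
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat in (None, "") or lon in (None, ""):
        return None
    return float(lat), float(lon)


def on_diagonal(record, tolerance=1e-9):
    """True when a georeferenced record has latitude equal to longitude."""
    coords = coordinates(record)
    return coords is not None and abs(coords[0] - coords[1]) <= tolerance


def diagonal_share(records):
    """Fraction of georeferenced records whose latitude equals longitude."""
    located = [r for r in records if coordinates(r) is not None]
    if not located:
        return 0.0
    return sum(on_diagonal(r) for r in located) / len(located)
```

A share well above zero for a large dataset would be a hint that one coordinate column was copied into the other, though a few genuine points can fall on the diagonal.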
On 09/15/2015 10:04 AM, Roderic Page wrote:
Hi Eduardo,
> I thought that the feedback about data improvement should be sent directly to the data provider but, please, if there is something else let me know.
In an ideal world, yes the feedback should go to the data provider, things get fixed, then GBIF gets updated. However, data providers don’t always have the resources to fix things. I’m also interested in how many of the data issues that come up are things that GBIF itself can detect and flag. In my experience, there are issues that the provider was unaware of, but become apparent once the data is exposed by GBIF.
For example, here’s a case of a data set supplied to GBIF with a serious error https://github.com/ttu-vertnet/ttu-mammals/issues/12 This was obvious in GBIF simply by looking at the map, but apparently not to the data provider (this error has now been fixed).
The more we know about the sort of errors that can happen, the better placed we are to develop tools to catch them.
Regards
Rod
Hi Alex,
Absolutely. I think part of the challenge is to show that adding data to aggregators can yield real, tangible benefits. Part of the problem is that such benefits are often not obvious or, indeed, not available. If aggregators could offer richer ways of augmenting data (e.g., “oh, I see you’ve added these specimens; did you know that they’ve been cited in these papers, that they are vouchers for these sequences, that these taxonomic names have changed recently?”, etc.), then I think that might also help encourage people to see value in doing this, rather than simply checking the box marked “send data to aggregator”.
Regards
Rod
Javier,
Thank you for your suggestion. I may come back to you if I need help, OK? Thanks for the offer, too!
Eduardo
On Wed, Sep 9, 2015 at 5:23 PM, Javier Otegui javier.otegui@gmail.com wrote:
Hi Eduardo (et al.),
If I understand correctly, the list at https://goo.gl/3wysaA shows the resources with data from Brazil and you want to filter out those with records other than plants, am I right? Have you considered using OpenRefine (http://openrefine.org/) for this task? OpenRefine has a service to fetch URLs built from data in other columns, which plays very well with the GBIF APIs. You can make the program dynamically build the API request URL from the dataset UUID, then fetch and parse the JSON response, without having to download the data and with almost no coding. The way I would go here is:
1. Create a column based on the value in column A of your table, to extract just the dataset UUID.
2. Create a new column fetching the GBIF API, adding the value in the previous column to a template URL: http://api.gbif.org/v1/occurrence/search?TAXON_KEY=6&limit=1&DATASET... <value>. The "limit=1" part makes things faster by avoiding having to show the default 20 records in the column.
3. Create yet another column parsing the JSON result from the previous column, extracting just the value in the field "count". The result is the number of plant records in that dataset (therefore, resources such as FishBase will have a value of zero).
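For readers who would rather script the three steps above than use OpenRefine, here is a rough Python equivalent. It is a sketch under assumptions: the cell in column A is assumed to contain the dataset UUID somewhere in its text, and the query uses the camelCase parameter names taxonKey and datasetKey from the public occurrence search API rather than the truncated template URL quoted above. The function names are made up for illustration.

```python
import json
import re
import urllib.parse
import urllib.request

SEARCH = "http://api.gbif.org/v1/occurrence/search"
UUID_RE = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def extract_uuid(cell):
    """Step 1: pull the dataset UUID out of a spreadsheet cell, or None."""
    match = UUID_RE.search(cell)
    return match.group(0).lower() if match else None


def build_url(dataset_key, taxon_key=6):
    """Step 2: build the search URL; limit=1 keeps the response small."""
    query = urllib.parse.urlencode(
        {"taxonKey": taxon_key, "datasetKey": dataset_key, "limit": 1}
    )
    return f"{SEARCH}?{query}"


def parse_count(body):
    """Step 3: read the "count" field from the JSON response text."""
    return json.loads(body)["count"]


def plant_count(cell):
    """Run all three steps for one cell (performs a network request)."""
    dataset_key = extract_uuid(cell)
    if dataset_key is None:
        return None
    with urllib.request.urlopen(build_url(dataset_key), timeout=30) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Calling plant_count on each row of the list would then give the per-dataset plant count, with zero expected for resources that hold no plant records.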
Actually, you can add as many columns as you want, with as many API calls, to fill the rest of the fields in your table. Using the "registry" API, you can get the title, external data link and the protocol (IPT, DiGIR...).
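The registry lookup mentioned here can be sketched in the same way. The snippet assumes that the dataset resource at /v1/dataset/<UUID> returns JSON with "title", "homepage" and a list of "endpoints" that each carry a "type" such as DIGIR or DWC_ARCHIVE; treat those field names as my reading of the registry API, to be checked against the current documentation.

```python
import json
import urllib.request

REGISTRY = "http://api.gbif.org/v1/dataset/"


def summarise(dataset):
    """Reduce a registry dataset record to title, homepage and protocol types."""
    endpoints = dataset.get("endpoints") or []
    return {
        "title": dataset.get("title"),
        "homepage": dataset.get("homepage"),
        "protocols": sorted({e["type"] for e in endpoints if e.get("type")}),
    }


def fetch_summary(dataset_key):
    """Fetch one dataset record from the registry (performs a network request)."""
    with urllib.request.urlopen(REGISTRY + dataset_key, timeout=30) as resp:
        return summarise(json.load(resp))
```

A dataset that lists both a DiGIR and a Darwin Core Archive endpoint, or two datasets from the same publisher with different protocols, would be worth a closer look for duplicated records.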
Hope this helps. Let me know if you are interested in this approach and need more help using OpenRefine. Cheers!
Javier Otegui http://www.jotegui.com
On Wed, Sep 9, 2015 at 8:07 PM, Mauro Cavalcanti maurobio@gmail.com wrote:
Scott,
That's my very point - that using R and rgbif should be the best path to take in this case, both because of the easier access to the GBIF API provided by rgbif and the HUGE data analytical capabilities of R itself. I had been working on a paper discussing this in the context of conservation databases (using R/rgbif and a Red-Listed group of mammals as an example), but unfortunately this work has been delayed by unexpected health problems. I hope it can see the light someday, however.
Best regards,
On 09/09/2015 14:44, "Scott Chamberlain" scott@ropensci.org wrote:
Note that the R client rgbif does interface with the GBIF download API in addition to the search API - making it easier to deal with larger datasets. This works even if you downloaded bulk data from the GBIF GUI. Ignore this if you don't use R :)
Best, S
On Wed, Sep 9, 2015 at 10:35 AM Alex Thompson godfoder@acis.ufl.edu wrote:
I'm kind of seconding Rod here.
It might make more sense, depending on your use case and local computer resources, to just get a download of Plantae *AND* Brazil from GBIF periodically, then process that to exclude existing Brazilian datasets. You could then use something like Apache Hadoop / Spark to efficiently split the file by dataset or by institution code.
This would greatly simplify your interactions with GBIF (down to just periodically generating a download programmatically) and you would have an easy place to insert any additional data transformations you want. This is the path I take for my work at least - the incremental cost of a couple million more records is worth the reduction in complexity overall.
- Alex
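For a file that fits on one machine, the exclude-then-split step can be tried in plain Python before reaching for Hadoop or Spark. This sketch assumes a tab-delimited occurrence download with a datasetKey column, and that the keys of the datasets you already hold locally are known; both are assumptions about your setup rather than anything GBIF prescribes.

```python
import csv
from collections import defaultdict


def split_by_dataset(lines, exclude=()):
    """Group rows of a tab-delimited occurrence file by datasetKey.

    `lines` is any iterable of text lines (an open file works);
    `exclude` holds dataset keys to drop, e.g. datasets already held locally.
    """
    skip = set(exclude)
    groups = defaultdict(list)
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        key = row["datasetKey"]
        if key not in skip:
            groups[key].append(row)
    return dict(groups)
```

Because every row is held in memory, this only suits modest files; for several million records, writing each row straight to a per-dataset file (or using the cluster tools Alex mentions) would be the safer route.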
On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
Hi Rod,
The real purpose is to have a list of UUIDs and the "source web page" for each data set. Thus, one way to do it is to select those resources whose counts are <> 0 for PLANTAE *AND* Brazil.
I don't want to do any statistical analysis, just feed a local harvester / aggregator.
The problem is that, considering the reply from Jan Legind on Sep 3, we have to go through the results of the request http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... one by one (https://goo.gl/3wysaA) to check whether each is a Herbarium / Preserved Specimen (Plantae) dataset or not.
Does it make sense?
Thanks for your curiosity! :)
Cheers,
Eduardo
On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page < Roderic.Page@glasgow.ac.uk> wrote:
Hi Eduardo,
I’m curious, is the purpose to get counts by dataset by country, or to get all the plant occurrences for Brazil? The latter can be obtained by downloading all plant occurrences in Brazil http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could then compute the per-dataset stats locally). I realise that this isn’t as convenient as having GBIF slice the data for you in the API.
Regards
Rod
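The download suggested here can also be requested programmatically instead of through the portal. The sketch below only builds the JSON body for such a request. It assumes the occurrence download service accepts a POST to /v1/occurrence/download/request with HTTP basic authentication and a predicate using the TAXON_KEY and COUNTRY keys; the user name is a placeholder, and the details should be checked against the current download API documentation before use.

```python
import json


def download_request(creator, taxon_key=6, country="BR"):
    """Build the request body for 'all occurrences of a taxon in a country'."""
    return {
        "creator": creator,  # GBIF user name (placeholder in this sketch)
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {"type": "equals", "key": "COUNTRY", "value": country},
            ],
        },
    }


def as_json(body):
    """Serialise the body for the POST request."""
    return json.dumps(body, indent=2)
```

The resulting file could then be summarised per dataset locally, as suggested in the message above.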
On 4 Sep 2015, at 10:39, Eduardo Dalcin edalcin@jbrj.org wrote:
Hi Markus,
Yes, it's a shame I can't have country and "nub" together. Is there any hope for it?
Eduardo
On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring mdoering@gbif.org wrote:
Eduardo,
As you might have seen from my issue comment, the web service uses a different parameter name for taxonKey, which is a bug we need to fix at some point. For now, please use nubKey, like this:
http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
The real problem for you will be that we do not support the combination of the country and the taxon filter, just one of the two. So you cannot search for plants in Brazil I am afraid, just for datasets about Brazil and datasets with plant records.
Markus
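Since the inventory service takes either the country or the taxon filter but not both, one workaround is to call it twice and intersect the results. The sketch assumes each call returns a JSON object mapping dataset UUIDs to counts, as in the example quoted elsewhere in this thread. Note that the intersection only yields candidates: a dataset with plant records from elsewhere and non-plant records from Brazil would still appear in both lists.

```python
import json
import urllib.parse
import urllib.request

INVENTORY = "http://api.gbif.org/v1/occurrence/counts/datasets"


def fetch_counts(**params):
    """Call the inventory service with one filter (performs a network request)."""
    url = INVENTORY + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def candidate_datasets(by_country, by_taxon):
    """Keep datasets present in both per-dataset count maps."""
    shared = by_country.keys() & by_taxon.keys()
    return {
        key: {"country": by_country[key], "taxon": by_taxon[key]}
        for key in sorted(shared)
    }


# Example call (needs network access):
# candidates = candidate_datasets(fetch_counts(country="BR"), fetch_counts(nubKey=6))
```

Each candidate could then be confirmed with a per-dataset occurrence search that combines the taxon and dataset filters.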
> On 03 Sep 2015, at 14:12, Eduardo Dalcin edalcin@jbrj.org wrote:
>
> Thanks Jan. I'll keep exploring and I'll be in touch, if I need.
>
> Best,
>
> Eduardo
>
> On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] jlegind@gbif.org wrote:
>
> Dear Eduardo,
>
> Thanks for getting in touch with us about these issues.
>
> The first request http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO... returns the number of records located in Brazil for the facets in the request.
>
> The second query http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... uses the Occurrence Inventories web service http://www.gbif.org/developer/occurrence#inventories which does not support the basis-of-record facet in the /datasets request. I understand that it would be better if the API response yielded an error message in this instance.
>
> Concerning the other issues – you are indeed right that the counts do not make sense in the context of taxon key 6 which is Plantae. Actually the API does not handle the taxonKey search at all, contrary to what the documentation states:
>
> /occurrence/counts/datasets | GET | Counts | Lists occurrence counts for datasets that cover a given taxon or country. | country, taxonKey
>
> As you can see here, http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this request doesn’t return anything.
>
> The GBIF developers will handle this issue in due time.
>
> You can follow the issue in our bug tracking service here: http://dev.gbif.org/issues/browse/POR-2828
>
> With best regards,
>
> Jan K. Legind
> Data manager, GBIF Secretariat
>
> From: API-users [mailto:api-users-bounces@lists.gbif.org] On Behalf Of Eduardo Dalcin
> Sent: 2. september 2015 20:06
> To: api-users@lists.gbif.org; dev@gbif.org
> Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo Avancini
> Subject: [API-users] Some questions from a begginer
>
> Hi folks,
>
> This is my first message to the list. So, please, be nice :)
>
> I'm working here at Rio de Janeiro Botanical Garden, together with the guys at the National Center for Flora Conservation. We are doing the risk assessment of the Brazilian flora to the government. We assess, so far, the risk of ca. 6.000 species, but we still have to assess ca. 35.000. Access occurrence records for Brazil is crucial, and every occurrence is important.
>
> That means that we have to put together occurrence data from different sources and, after the first batch of the risk assessment, we realize that we need to build up our aggregator. We are planning to do this with the Lontra-harvester, with the help of the guys at Brazilian GBIF Node.
>
> So, the one of the firsts steps was to list the available resources to understand the dimension of the task and, that brings me to my questions.
>
> First:
>
> The request:
>
> http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
>
> returns 4.982.689 records
>
> And the request:
>
> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&...
>
> returns (here) 7.406.310 records
>
> Comments?
>
> Second:
>
> The request:
>
> http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
>
> return things like this:
>
> "197908d0-5565-11d8-b290-b8a03c50a862":27629
>
> But the consult of the same dataset:
>
> http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5...
>
> Returns "null" (of course, is a FishBase!)
>
> I have plenty of examples like this, on yellow here (not finished!):
>
> https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYy...
>
> Comments?
>
> I think those two questions is a good start. Please, let me know if I'm doing something wrong.
>
> Cheers,
>
> Eduardo
API-users mailing list API-users@lists.gbif.org http://lists.gbif.org/mailman/listinfo/api-users
Thanks Alex. Food for thought.
Best,
Eduardo
Eduardo Dalcin Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Wed, Sep 9, 2015 at 2:28 PM, Alex Thompson godfoder@acis.ufl.edu wrote:
I'm kind of seconding Rod here.
It might make more sense, depending on your use case and local computing resources, to just get a download of Plantae *AND* Brazil from GBIF periodically, then process that to exclude existing Brazilian datasets. You could then use something like Apache Hadoop / Spark to efficiently split the file by dataset or by institution code.
This would greatly simplify your interactions with GBIF (down to just periodically generating a download programmatically), and you would have an easy place to insert any additional data transformations you want. This is the path I take for my work at least: the incremental cost of a couple million more records is worth the reduction in complexity overall.
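For a file that fits on one machine, the splitting step could be sketched in plain Python before reaching for Hadoop or Spark. This is only an illustration under stated assumptions: the download is taken to be a tab-delimited text file with a header row and a `datasetKey` column, and the function name `split_by_dataset` and its `exclude` parameter are hypothetical, not part of any GBIF tool.

```python
from pathlib import Path


def split_by_dataset(download_path, out_dir, key_column="datasetKey", exclude=()):
    """Write one tab-delimited file per dataset key; return rows written per key.

    Assumes a header row and a column named `key_column` (an assumption about
    the download layout). Keys listed in `exclude` are skipped, e.g. datasets
    that are already harvested locally.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    excluded = set(exclude)
    handles = {}  # one open output file per dataset key
    counts = {}
    try:
        with open(download_path, encoding="utf-8", newline="") as src:
            header = src.readline()
            key_index = header.rstrip("\r\n").split("\t").index(key_column)
            for line in src:
                key = line.rstrip("\r\n").split("\t")[key_index]
                if key in excluded:
                    continue
                if key not in handles:
                    # Note: with thousands of datasets this may hit the OS limit
                    # on open files; a real job would batch or sort first.
                    handles[key] = open(out_dir / f"{key}.tsv", "w",
                                        encoding="utf-8", newline="")
                    handles[key].write(header)
                handles[key].write(line)
                counts[key] = counts.get(key, 0) + 1
    finally:
        for fh in handles.values():
            fh.close()
    return counts
```

The returned dictionary doubles as a per-dataset record count, so the same pass gives the list of dataset keys that actually contain records in the download.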
- Alex
On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
Hi Rod,
The real purpose is to have a list of UUIDs and the "source web page" for each dataset. Thus, one way to do it is to select those resources whose counts are <> 0 for PLANTAE *AND* Brazil.
I don't want to do any statistical analysis, just feed a local harvester / aggregator.
The problem is that, considering Jan Legind's reply of Sep 3, we have to go through the results of the request http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... one by one (https://goo.gl/3wysaA) to check whether each dataset is a Herbarium / Preserved Specimen (Plantae) dataset or not.
Does it make sense?
Thanks for your curiosity! :)
Cheers,
Eduardo
Eduardo Dalcin Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page Roderic.Page@glasgow.ac.uk wrote:
Hi Eduardo,
I’m curious: is the purpose to get counts by dataset by country, or to get all the plant occurrences for Brazil? The latter can be obtained by downloading all plant occurrences in Brazil http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could then compute the per-dataset stats locally). I realise that this isn’t as convenient as having GBIF slice the data for you in the API.
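Computing the per-dataset stats locally could look roughly like the sketch below. It assumes the downloaded occurrences are tab-delimited with a header row that includes a `datasetKey` column; the function name `count_by_dataset` is made up for this illustration.

```python
import csv
from collections import Counter


def count_by_dataset(lines, key_column="datasetKey"):
    """Count occurrence rows per dataset key.

    `lines` is any iterable of text lines (for example an open file) whose
    first line is a tab-delimited header containing `key_column`.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[key_column] for row in reader)
```

With a real file this would be called as `count_by_dataset(open(path, encoding="utf-8", newline=""))`, and `.most_common(20)` on the result lists the largest contributing datasets first.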
Regards
Rod
Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk Tel: +44 141 330 4778 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com ORCID: http://orcid.org/0000-0002-7101-9767 Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ResearchGate: https://www.researchgate.net/profile/Roderic_Page
On 4 Sep 2015, at 10:39, Eduardo Dalcin edalcin@jbrj.org wrote:
Hi Markus,
Yes, it's a shame I can't use country and "nub" together. Is there any hope for that?
Eduardo
Eduardo Dalcin Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring mdoering@gbif.org wrote:
Eduardo,
as you might have seen from my issue comment, the web service uses a different parameter name for taxonKey, which is a bug we need to fix at some point. For now, please use nubKey to call the service like this:
http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
The real problem for you will be that we do not support the combination of the country and the taxon filter, just one of the two. So you cannot search for plants in Brazil I am afraid, just for datasets about Brazil and datasets with plant records.
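Since the service accepts only one of the two filters per call, one possible workaround is to request the counts twice (once by country, once by nubKey) and intersect the dataset keys on the client. The sketch below assumes the response is a JSON object mapping dataset UUID to count, as in the `"197908d0-...":27629` example quoted in this thread, and that `country=BR` on its own is accepted. The helper names are hypothetical. Note that the intersection only yields candidates: a dataset with animals in Brazil and plants elsewhere would still appear.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://api.gbif.org/v1/occurrence/counts/datasets"


def inventory_url(**params):
    """Build the request URL for a single filter, e.g. nubKey=6 or country='BR'."""
    return f"{BASE}?{urlencode(params)}"


def fetch_counts(**params):
    """Fetch {datasetKey: count} for one filter (performs a network request)."""
    with urlopen(inventory_url(**params)) as resp:
        return json.load(resp)


def candidate_datasets(by_country, by_taxon):
    """Keep dataset keys present in both maps, with both counts side by side."""
    shared = by_country.keys() & by_taxon.keys()
    return {key: (by_country[key], by_taxon[key]) for key in sorted(shared)}
```

A run would then be `candidate_datasets(fetch_counts(country="BR"), fetch_counts(nubKey=6))`. A fish-only dataset with Brazilian records drops out because it has no entry in the plant map.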
Markus
On 03 Sep 2015, at 14:12, Eduardo Dalcin edalcin@jbrj.org wrote:
Thanks, Jan. I'll keep exploring and I'll be in touch if I need to.
Best,
Eduardo
Eduardo Dalcin Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] jlegind@gbif.org wrote:
Dear Eduardo,
Thanks for getting in touch with us about these issues.
The first request
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO... returns the number of records located in Brazil for the facets in the request.
The second query
http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... uses the Occurrence Inventories web service http://www.gbif.org/developer/occurrence#inventories which does not support the basis-of-record facet in the /datasets request. I understand that it would be better if the API response yielded an error message in this instance.
Concerning the other issues – you are indeed right that the counts do
not make sense in the context of taxon key 6 which is Plantae. Actually the API does not handle the taxonKey search at all, contrary to what the documentation states:
/occurrence/counts/datasets | GET | Counts | Lists occurrence counts for datasets that cover a given taxon or country. | country, taxonKey
As you can see here,
http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this request doesn’t return anything.
The GBIF developers will handle this issue in due time.
You can follow the issue in our bug tracking service here:
http://dev.gbif.org/issues/browse/POR-2828
With best regards,
Jan K. Legind
Data manager, GBIF Secretariat
From: API-users [mailto:api-users-bounces@lists.gbif.org] On Behalf Of Eduardo Dalcin
Sent: 2. september 2015 20:06
To: api-users@lists.gbif.org; dev@gbif.org
Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo Avancini
Subject: [API-users] Some questions from a begginer
Hi folks,
This is my first message to the list. So, please, be nice :)
I'm working here at the Rio de Janeiro Botanical Garden, together with the guys at the National Center for Flora Conservation. We are doing the risk assessment of the Brazilian flora for the government. So far we have assessed the risk of ca. 6.000 species, but we still have to assess ca. 35.000. Access to occurrence records for Brazil is crucial, and every occurrence is important.
That means that we have to put together occurrence data from different sources and, after the first batch of the risk assessment, we realized that we need to build our own aggregator. We are planning to do this with the Lontra-harvester, with the help of the guys at the Brazilian GBIF Node.
So, one of the first steps was to list the available resources to understand the scale of the task, and that brings me to my questions.
First:
The request:
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
returns 4.982.689 records
And the request:
http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&...
returns (here) 7.406.310 records
Comments?
Second:
The request:
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
returns things like this:
"197908d0-5565-11d8-b290-b8a03c50a862":27629
But querying the same dataset:
http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5...
returns "null" (of course, it is FishBase!)
I have plenty of examples like this, highlighted in yellow here (not finished!):
https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYy...
Comments?
I think those two questions are a good start. Please let me know if I'm doing something wrong.
Cheers,
Eduardo
Eduardo Dalcin
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
API-users mailing list API-users@lists.gbif.org http://lists.gbif.org/mailman/listinfo/api-users
Eduardo,
another difference in using downloads periodically is that you get the interpreted data from us (together with the original if you want it). That already includes quite a bit of data cleaning and alignment to controlled vocabularies that might be painful to reproduce otherwise. Also, publishers are *very* often offline. Especially with the long-running XML harvesting protocols (BioCASe, TAPIR, DiGIR), indexing them entirely can be a bit of a challenge.
Markus
API-users mailing list API-users@lists.gbif.org http://lists.gbif.org/mailman/listinfo/api-users
Good points, Markus. Thanks!
However, other publishers are *very* online, like this example:
"The New York Botanical Garden Herbarium (NY) - Vascular Plant Collection" ( http://www.gbif.org/dataset/d415c253-4d61-4459-9d25-4015b9084fb0 ) and the "Herbarium of The New York Botanical Garden" ( http://www.gbif.org/dataset/7133ff0a-f762-11e1-a439-00145eb45e9a ).
Same stuff, twice.
The thing is that when we search for, say, "Belemia fucsioides", we get duplicate records of the same entity:
http://www.gbif.org/occurrence/216419815
http://www.gbif.org/occurrence/1098393958
This is very annoying and gives us a lot of work to clean up.
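One way such cross-dataset duplicates might be flagged during clean-up is to group records by an identifying key and report the groups that span more than one dataset. The sketch below is only an assumption about what identifies "the same entity": it uses the Darwin Core terms `institutionCode` and `catalogNumber` as the key, which may or may not match how the two NY datasets are populated, and the function name is hypothetical.

```python
from collections import defaultdict


def find_cross_dataset_duplicates(records,
                                  key_fields=("institutionCode", "catalogNumber")):
    """Group records by `key_fields`; return groups found in more than one dataset.

    `records` is an iterable of dicts that also carry a `datasetKey`.
    Records with any empty key field are ignored rather than guessed at.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple((rec.get(field) or "").strip().lower() for field in key_fields)
        if all(key):
            groups[key].append(rec)
    return {
        key: recs
        for key, recs in groups.items()
        if len({rec.get("datasetKey") for rec in recs}) > 1
    }
```

Each returned group can then be reviewed or collapsed to a single record, for example by preferring one of the two datasets.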
Cheers,
Eduardo
Eduardo Dalcin Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Thu, Sep 10, 2015 at 4:50 AM, Markus Döring mdoering@gbif.org wrote:
Eduardo,
another difference in using downloads periodically is that you get the interpreted data from us (together with the original if you want to). That already contains quite a bit of data cleaning and aligning to controlled vocabularies that might be painful to reproduce otherwise. Also publishers are *very* often offline. Especially for the long running xml harvesting protocols (biocase,tapir,digir) this can be a bit of a challenge to index them entirely.
Markus
On 09 Sep 2015, at 20:02, Eduardo Dalcin edalcin@jbrj.org wrote:
Thanks Alex. Food for thought.
Best,
Eduardo
*Eduardo Dalcin https://mailtrack.io/trace/link/216b27d1db9fb6d2aa5a6d7b07aaf7a6b6b19ed9?url=http%3A%2F%2Feduardo.dalc.in&signature=fd3c844b73e3c400* Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
*e-mail alternativo / * *alternate email:** edalcin@jbrj.org
edalcin@jbrj.org*
Agendar reunião / Schedule a meeting: http://agendar.dalc.in https://mailtrack.io/trace/link/dc51cf0b6ea2d4897c2ae7a70bab52163650de15?url=http%3A%2F%2Fagendar.dalc.in&signature=c87a5e78e01a7976
On Wed, Sep 9, 2015 at 2:28 PM, Alex Thompson godfoder@acis.ufl.edu wrote:
I'm kind of seconding Rod here.
It might make more sense, depending on your use case and local computer resources, to just get a download of Plantae *AND* Brazil from GBIF periodically, then process that to exclude existing Brazilian datasets. You could then use something like Apache hadoop / spark to efficiently split the file by dataset or by institution code.
This would greatly simplify your interactions with GBIF (down to just periodically generating a download programmatically) and you would have an easy place to insert any additional data transformations you want. This is the path i take for my work at least - the incremental cost of a couple million more records is worth the reduction in complexity overall.
- Alex
On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
Hi Rod,
The real purpose is to have a list of UUID and the "source web page" for the data set. Thus, one way to do it is to select those resources that counts <> 0 for PLANTAE *AND* Brazil.
I don't want to do any stats analysis, but feed up one local harverster / agregator.
The problem is, considering the reply from Jan Legind at Sep 3, we have to check one by one (https://goo.gl/3wysaA) to check if it is a Herbarium / Preserved Specimen (Plantae) or not, from the request http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... .
Does it make sense?
Thanks for your curiosity! :)
Cheers,
Eduardo
*Eduardo Dalcin https://mailtrack.io/trace/link/5516ed5e4f903c6ee9bd9fb3876fb65ffffc687c?url=http%3A%2F%2Feduardo.dalc.in&signature=cda9e9bf584a828c* Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
*e-mail alternativo / * *alternate email:** edalcin@jbrj.org
edalcin@jbrj.org*
Agendar reunião / Schedule a meeting: http://agendar.dalc.in https://mailtrack.io/trace/link/3a5eaa1df56016285886497766577e5357ddc6c1?url=http%3A%2F%2Fagendar.dalc.in&signature=c4e8d8113c34937f
On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page <Roderic.Page@glasgow.ac.uk
wrote:
Hi Eduardo,
I’m curious, is the purpose to get counts by dataset by country, or to get all the plant occurrences for Brazil? The later can be obtained by downloading all plant occurrences in Brazil http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could then compute the per-dataset stats locally). I realise that this isn’t as convenient as having GBIF slice the data for you in the API.
Regards
Rod
Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk Tel: +44 141 330 4778 <%2B44%20141%20330%204778> Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com ORCID: http://orcid.org/0000-0002-7101-9767 Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ResearchGate https://www.researchgate.net/profile/Roderic_Page
On 4 Sep 2015, at 10:39, Eduardo Dalcin edalcin@jbrj.org wrote:
Hi Markus,
Yes, that's a shame I can't have country and "nub" together. There is any hope about it?
Eduardo
*Eduardo Dalcin https://mailtrack.io/trace/link/bac23864202354f3789938ce352a878faa0cd8b8?url=http%3A%2F%2Feduardo.dalc.in&signature=aea58ef6f439535b* Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
*e-mail alternativo / * *alternate email:** edalcin@jbrj.org
edalcin@jbrj.org*
Agendar reunião / Schedule a meeting: http://agendar.dalc.in https://mailtrack.io/trace/link/db57b837be515d4b7caefe43d55b60467cd7c2c1?url=http%3A%2F%2Fagendar.dalc.in&signature=69b244942739c0f5
On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring mdoering@gbif.org wrote:
Eduardo,
as you might have seen from my issue comment the webservice uses a different parameter name for taxonKey which is a bug we need to fix at some point. Please use nubKey for now to use the service like that:
http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
The real problem for you will be that we do not support the combination of the country and the taxon filter, just one of the two. So you cannot search for plants in Brazil I am afraid, just for datasets about Brazil and datasets with plant records.
Markus
On 03 Sep 2015, at 14:12, Eduardo Dalcin edalcin@jbrj.org wrote:
Thanks Jan. I'll keep exploring and I'll be in touch, if I need.
Best,
Eduardo
Eduardo Dalcin Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ e-mail: edalcin@jbrj.gov.br Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] jlegind@gbif.org
wrote:
Dear Eduardo,
Thanks for getting in touch with us about these issues.
The first request
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO... returns the number of records located in Brazil for the facets in the request.
The second query
http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... uses the Occurrence Inventories web service http://www.gbif.org/developer/occurrence#inventories which does not support the basis-of-record facet in the /datasets request. I understand that it would be better if the API response yielded an error message in this instance.
Concerning the other issues – you are indeed right that the counts do
not make sense in the context of taxon key 6 which is Plantae. Actually the API does not handle the taxonKey search at all, contrary to what the documentation states:
/occurrence/counts/datasets
GET
Counts
Lists occurrence counts for datasets that cover a given taxon or
country.
country, taxonKey
As you can see here,
http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this request doesn’t return anything.
The GBIF developers will handle this issue in due time.
You can follow the issue in our bug tracking service here:
http://dev.gbif.org/issues/browse/POR-2828
With best regards,
Jan K. Legind
Data manager, GBIF Secretariat
From: API-users [mailto:api-users-bounces@lists.gbif.org] On Behalf Of Eduardo Dalcin
Sent: 2 September 2015 20:06
To: api-users@lists.gbif.org; dev@gbif.org
Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo Avancini
Subject: [API-users] Some questions from a begginer
Hi folks,
This is my first message to the list. So, please, be nice :)
I'm working here at Rio de Janeiro Botanical Garden, together with the guys at the National Center for Flora Conservation. We are doing the risk assessment of the Brazilian flora for the government. So far we have assessed the risk of ca. 6,000 species, but we still have to assess ca. 35,000. Access to occurrence records for Brazil is crucial, and every occurrence is important.
That means that we have to put together occurrence data from different sources and, after the first batch of the risk assessment, we realized that we need to build our own aggregator. We are planning to do this with the Lontra-harvester, with the help of the guys at the Brazilian GBIF Node.
So, one of the first steps was to list the available resources to understand the dimension of the task, and that brings me to my questions.
First:
The request:
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
returns 4,982,689 records
And the request:
http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&...
returns (here) 7,406,310 records
Comments?
Second:
The request:
http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisO...
returns things like this:
"197908d0-5565-11d8-b290-b8a03c50a862":27629
But a query on the same dataset:
http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5...
returns "null" (of course, it is FishBase!)
I have plenty of examples like this, highlighted in yellow here (not finished!):
https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYy...
Comments?
I think those two questions are a good start. Please let me know if I'm doing something wrong.
Cheers,
Eduardo
Eduardo Dalcin
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
API-users mailing list API-users@lists.gbif.org http://lists.gbif.org/mailman/listinfo/api-users
Hi Eduardo,
Unfortunately this is a commonly encountered problem, usually because data providers change the data GBIF harvests. In this case, the difference between the data sets is that one has the field “occurrenceID” set and the other doesn’t, so the records appear to be two different records to GBIF. In an ideal world the older dataset would be deleted or otherwise deprecated, and only the newer data displayed.
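A minimal sketch of how such duplicates might be flagged locally once the records are in hand. It assumes each record is a dict carrying the Darwin Core fields `institutionCode`, `catalogNumber` and `datasetKey`; matching on institution code plus catalogue number is an assumption for illustration, not GBIF's own logic, and the sample values are made up.

```python
from collections import defaultdict


def find_cross_dataset_duplicates(records):
    """Group records that look like the same specimen published in several datasets.

    Assumption: institutionCode + catalogNumber identifies a specimen.
    Records missing either field are skipped rather than guessed at.
    """
    groups = defaultdict(list)
    for rec in records:
        institution = (rec.get("institutionCode") or "").strip().upper()
        catalog = (rec.get("catalogNumber") or "").strip().upper()
        if not institution or not catalog:
            continue
        groups[(institution, catalog)].append(rec)
    # Keep only the groups whose records come from more than one dataset.
    return {
        key: recs
        for key, recs in groups.items()
        if len({r.get("datasetKey") for r in recs}) > 1
    }


# Hypothetical sample: the same sheet published twice, once with occurrenceID set.
records = [
    {"gbifID": "1", "datasetKey": "dataset-old",
     "institutionCode": "NY", "catalogNumber": "12345"},
    {"gbifID": "2", "datasetKey": "dataset-new",
     "institutionCode": "ny", "catalogNumber": " 12345 ",
     "occurrenceID": "urn:example:1"},
    {"gbifID": "3", "datasetKey": "dataset-new",
     "institutionCode": "NY", "catalogNumber": "67890"},
]
duplicates = find_cross_dataset_duplicates(records)
```

The groups returned are only candidates for review; which copy to keep (for example the one with `occurrenceID` set) remains a decision for whoever curates the data.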
Regards
Rod
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk Tel: +44 141 330 4778 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com ORCID: http://orcid.org/0000-0002-7101-9767 Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ResearchGate https://www.researchgate.net/profile/Roderic_Page
On 14 Sep 2015, at 17:25, Eduardo Dalcin <edalcin@jbrj.org> wrote:
Good points Markus, Thanks!
However, other publishers are *very* online, like this example:
"The New York Botanical Garden Herbarium (NY) - Vascular Plant Collection" (http://www.gbif.org/dataset/d415c253-4d61-4459-9d25-4015b9084fb0) and the "Herbarium of The New York Botanical Garden" (http://www.gbif.org/dataset/7133ff0a-f762-11e1-a439-00145eb45e9a).
Same stuff, twice.
The thing is that when we search for, for instance, "Belemia fucsioides", we get duplicate records of the same entity:
[Screenshot: occurrence search results for TAXON_KEY=5553637]
http://www.gbif.org/occurrence/216419815
http://www.gbif.org/occurrence/1098393958
This is very annoying and gives us a lot of work to clean up.
Cheers,
Eduardo
Eduardo Dalcin http://eduardo.dalc.in
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Thu, Sep 10, 2015 at 4:50 AM, Markus Döring <mdoering@gbif.org> wrote:
Eduardo,
another difference in using downloads periodically is that you get the interpreted data from us (together with the original if you want to). That already contains quite a bit of data cleaning and aligning to controlled vocabularies that might be painful to reproduce otherwise. Also publishers are *very* often offline. Especially with the long-running XML harvesting protocols (BioCASe, TAPIR, DiGIR), it can be a bit of a challenge to index them entirely.
Markus
On 09 Sep 2015, at 20:02, Eduardo Dalcin <edalcin@jbrj.org> wrote:
Thanks Alex. Food for thought.
Best,
Eduardo
Eduardo Dalcin http://eduardo.dalc.in
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Wed, Sep 9, 2015 at 2:28 PM, Alex Thompson <godfoder@acis.ufl.edu> wrote:
I'm kind of seconding Rod here.
It might make more sense, depending on your use case and local computer resources, to just get a download of Plantae *AND* Brazil from GBIF periodically, then process that to exclude existing Brazilian datasets. You could then use something like Apache Hadoop / Spark to efficiently split the file by dataset or by institution code.
This would greatly simplify your interactions with GBIF (down to just periodically generating a download programmatically) and you would have an easy place to insert any additional data transformations you want. This is the path I take for my work at least - the incremental cost of a couple million more records is worth the reduction in complexity overall.
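As a rough illustration of the "exclude, then split" step, here is a small sketch in plain Python rather than Hadoop or Spark. It assumes the download is a tab-delimited text file with a header row that includes a `datasetKey` column; the sample rows and the keys `ds-a` and `ds-b` are made up.

```python
import csv
import io
from collections import defaultdict


def split_by_dataset(lines, exclude_keys=frozenset(), key_column="datasetKey"):
    """Group rows of a tab-delimited download by dataset, skipping excluded ones.

    `lines` is any iterable of text lines (an open file works).
    `key_column` is an assumption about the download's header.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    by_dataset = defaultdict(list)
    for row in reader:
        key = row[key_column]
        if key in exclude_keys:
            continue  # e.g. datasets already harvested from the Brazilian node
        by_dataset[key].append(row)
    return dict(by_dataset)


# Made-up three-row download; "ds-b" stands in for an existing Brazilian dataset.
sample = io.StringIO(
    "gbifID\tdatasetKey\tinstitutionCode\n"
    "1\tds-a\tNY\n"
    "2\tds-b\tRB\n"
    "3\tds-a\tNY\n"
)
groups = split_by_dataset(sample, exclude_keys={"ds-b"})
per_dataset_counts = {key: len(rows) for key, rows in groups.items()}
```

This keeps every row in memory, which is fine for a test file but not for millions of records; for a full download, writing each row straight to a per-dataset file (or using Spark as suggested) would be the safer choice.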
- Alex
On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
Hi Rod,
The real purpose is to have a list of UUIDs and the "source web page" for each dataset. Thus, one way to do it is to select those resources whose counts are <> 0 for PLANTAE *AND* Brazil.
I don't want to do any statistical analysis, just feed a local harvester / aggregator.
The problem is that, considering the reply from Jan Legind on Sep 3, we have to go through the datasets returned by the request http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&... one by one (https://goo.gl/3wysaA) to check whether each is a Herbarium / Preserved Specimen (Plantae) dataset or not.
Does it make sense?
Thanks for your curiosity! :)
Cheers,
Eduardo
Eduardo Dalcin http://eduardo.dalc.in
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page <Roderic.Page@glasgow.ac.uk> wrote:
Hi Eduardo,
I’m curious, is the purpose to get counts by dataset by country, or to get all the plant occurrences for Brazil? The latter can be obtained by downloading all plant occurrences in Brazil http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could then compute the per-dataset stats locally). I realise that this isn’t as convenient as having GBIF slice the data for you in the API.
Regards
Rod
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: Roderic.Page@glasgow.ac.uk Tel: +44 141 330 4778 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com ORCID: http://orcid.org/0000-0002-7101-9767 Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ResearchGate https://www.researchgate.net/profile/Roderic_Page
On 4 Sep 2015, at 10:39, Eduardo Dalcin <edalcin@jbrj.org> wrote:
Hi Markus,
Yes, it's a shame I can't have country and "nub" together. Is there any hope for it?
Eduardo
Eduardo Dalcin http://eduardo.dalc.in
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin@jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
e-mail alternativo / alternate email: edalcin@jbrj.org
Agendar reunião / Schedule a meeting: http://agendar.dalc.in
On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring <mdoering@gbif.org> wrote:
Eduardo,
as you might have seen from my issue comment the webservice uses a different parameter name for taxonKey which is a bug we need to fix at some point. Please use nubKey for now to use the service like that:
http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
The real problem for you will be that we do not support the combination of the country and the taxon filter, just one of the two. So you cannot search for plants in Brazil I am afraid, just for datasets about Brazil and datasets with plant records.
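One possible workaround, sketched here under stated assumptions: query the inventory once by country and once by nubKey, then intersect the dataset keys locally. The helper names are hypothetical, and the response is assumed to be a JSON object mapping dataset key to count, as in the `"197908d0-...":27629` example quoted in this thread. The sample dictionaries are made up so the sketch runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://api.gbif.org/v1/occurrence/counts/datasets"


def inventory_url(**filters):
    """Build a /counts/datasets URL with exactly one of country or nubKey."""
    allowed = {"country", "nubKey"}
    if len(filters) != 1 or not set(filters) <= allowed:
        raise ValueError("pass exactly one of: country, nubKey")
    return BASE + "?" + urlencode(filters)


def fetch_counts(url):
    """Fetch a {datasetKey: count} JSON object (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def candidate_datasets(by_country, by_taxon):
    """Dataset keys present in both inventories, sorted for stable output."""
    return sorted(set(by_country) & set(by_taxon))


# Offline stand-ins for fetch_counts(inventory_url(country="BR")) and
# fetch_counts(inventory_url(nubKey=6)); keys and counts are invented.
by_country = {"ds-a": 10, "ds-b": 5}
by_taxon = {"ds-b": 7, "ds-c": 1}
candidates = candidate_datasets(by_country, by_taxon)
```

The intersection is only a list of candidates: a dataset with some records from Brazil and some plant records does not necessarily hold any plant records from Brazil, and the two counts cannot be combined into a Plantae-in-Brazil count.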
Markus
participants (8)
- Alex Thompson
- Eduardo Dalcin
- Jan Legind [GBIF]
- Javier Otegui
- Markus Döring
- Mauro Cavalcanti
- Roderic Page
- Scott Chamberlain