[API-users] Some questions from a beginner

Alex Thompson godfoder at acis.ufl.edu
Wed Sep 9 19:28:41 CEST 2015


I'm kind of seconding Rod here.

It might make more sense, depending on your use case and local computer 
resources, to just get a download of Plantae *AND* Brazil from GBIF 
periodically, then process that to exclude existing Brazilian datasets. 
You could then use something like Apache Hadoop / Spark to efficiently 
split the file by dataset or by institution code.
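
A minimal sketch of the exclude-and-split step in plain Python, for files 
small enough to hold in memory. It assumes a tab-separated download with 
`datasetKey` and `institutionCode` columns; the sample rows and the keys 
`ds-a`, `ds-b`, `ds-c` are made up for illustration, and `split_by_dataset` 
is a hypothetical helper name:

```python
import csv
import io
from collections import defaultdict


def split_by_dataset(lines, exclude_keys=frozenset(), key_column="datasetKey"):
    """Group tab-separated occurrence rows by key_column, skipping excluded keys."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    groups = defaultdict(list)
    for row in reader:
        key = row[key_column]
        if key in exclude_keys:
            # e.g. a Brazilian dataset that is already harvested locally
            continue
        groups[key].append(row)
    return dict(groups)


# Made-up sample standing in for a real download file (assumed column names).
sample = io.StringIO(
    "gbifID\tdatasetKey\tinstitutionCode\n"
    "1\tds-a\tRB\n"
    "2\tds-b\tK\n"
    "3\tds-a\tRB\n"
    "4\tds-c\tNY\n"
)

groups = split_by_dataset(sample, exclude_keys={"ds-b"})
for key, rows in sorted(groups.items()):
    print(key, len(rows))
```

Passing key_column="institutionCode" groups by institution instead. For a 
download with millions of rows the same logic would be better written to 
stream each group to its own file, or expressed as a Spark job.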

This would greatly simplify your interactions with GBIF (down to just 
periodically generating a download programmatically) and you would have 
an easy place to insert any additional data transformations you want. 
This is the path I take for my work at least - the incremental cost of a 
couple million more records is worth the reduction in complexity overall.
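
For the "generating a download programmatically" part, a sketch of building 
the request body in Python. The field names (`creator`, 
`notificationAddresses`, `format`, `predicate`), the `SIMPLE_CSV` value, the 
endpoint mentioned in the comment, and the placeholder user name and address 
are all assumptions to verify against the current GBIF occurrence download 
documentation; `build_download_request` is a hypothetical helper and no 
request is actually sent here:

```python
import json


def build_download_request(creator, email, taxon_key=6, country="BR"):
    """Build a download request body for Plantae (taxon key 6) AND Brazil."""
    return {
        "creator": creator,                # assumed: GBIF user name
        "notificationAddresses": [email],  # assumed: where the "ready" mail goes
        "format": "SIMPLE_CSV",            # assumed format name
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {"type": "equals", "key": "COUNTRY", "value": country},
            ],
        },
    }


# Placeholder credentials; the JSON would then be POSTed (with authentication)
# to the download request endpoint, assumed to be /v1/occurrence/download/request.
body = json.dumps(build_download_request("my_gbif_user", "me@example.org"), indent=2)
print(body)
```

Running this from a scheduled job (cron or similar) would give the periodic 
refresh; the exclusion of existing datasets then happens locally.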

- Alex

On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
> Hi Rod,
>
> The real purpose is to have a list of UUIDs and the "source web page" 
> for each dataset. Thus, one way to do it is to select those resources 
> whose counts are <> 0 for PLANTAE *AND* Brazil.
>
> I don't want to do any statistical analysis, just feed a local 
> harvester / aggregator.
>
> The problem is that, considering the reply from Jan Legind on Sep 3, we 
> have to go through them one by one (https://goo.gl/3wysaA) to check 
> whether each is a Herbarium / Preserved Specimen (Plantae) dataset or 
> not, starting from the request 
> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN.
>
> Does it make sense?
>
> Thanks for your curiosity! :)
>
> Cheers,
>
> Eduardo
>
>
> --------------------------------
> Eduardo Dalcin <http://eduardo.dalc.in>
> Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
> e-mail: edalcin at jbrj.gov.br
> Trabalho / Work: +55 21 3204 2116
> --------------------------------
> e-mail alternativo / alternate email: edalcin at jbrj.org
> --------------------------------
> Agendar reunião / Schedule a meeting: http://agendar.dalc.in
>
> On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page 
> <Roderic.Page at glasgow.ac.uk> wrote:
>
>     Hi Eduardo,
>
>     I’m curious, is the purpose to get counts by dataset by country,
>     or to get all the plant occurrences for Brazil? The latter can be
>     obtained by downloading all plant occurrences in Brazil
>     http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you
>     could then compute the per-dataset stats locally). I realise that
>     this isn’t as convenient as having GBIF slice the data for you in
>     the API.
>
>     Regards
>
>     Rod
>
>     ---------------------------------------------------------
>     Roderic Page
>     Professor of Taxonomy
>     Institute of Biodiversity, Animal Health and Comparative Medicine
>     College of Medical, Veterinary and Life Sciences
>     Graham Kerr Building
>     University of Glasgow
>     Glasgow G12 8QQ, UK
>
>     Email: Roderic.Page at glasgow.ac.uk
>     Tel: +44 141 330 4778
>     Skype: rdmpage
>     Facebook: http://www.facebook.com/rdmpage
>     LinkedIn: http://uk.linkedin.com/in/rdmpage
>     Twitter: http://twitter.com/rdmpage
>     Blog: http://iphylo.blogspot.com
>     ORCID: http://orcid.org/0000-0002-7101-9767
>     Citations:
>     http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
>     ResearchGate: https://www.researchgate.net/profile/Roderic_Page
>
>
>>     On 4 Sep 2015, at 10:39, Eduardo Dalcin <edalcin at jbrj.org>
>>     wrote:
>>
>>     Hi Markus,
>>
>>     Yes, it's a shame I can't have country and "nub" together.
>>     Is there any hope of that changing?
>>
>>     Eduardo
>>
>>
>>
>>     On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring <mdoering at gbif.org>
>>     wrote:
>>
>>         Eduardo,
>>
>>         As you might have seen from my issue comment, the web service
>>         uses a different parameter name for taxonKey, which is a bug
>>         we need to fix at some point.
>>         Please use nubKey for now, like this:
>>
>>         http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
>>
>>         The real problem for you will be that we do not support the
>>         combination of the country and the taxon filter, just one of
>>         the two. So you cannot search for plants in Brazil I am
>>         afraid, just for datasets about Brazil and datasets with
>>         plant records.
>>
>>         Markus
>>
>>
>>
>>         > On 03 Sep 2015, at 14:12, Eduardo Dalcin <edalcin at jbrj.org>
>>         wrote:
>>         >
>>         > Thanks Jan. I'll keep exploring and I'll be in touch, if I
>>         need.
>>         >
>>         > Best,
>>         >
>>         > Eduardo
>>         >
>>         >
>>         >
>>         >
>>         > On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF]
>>         <jlegind at gbif.org> wrote:
>>         > Dear Eduardo,
>>         >
>>         >
>>         >
>>         > Thanks for getting in touch with us about these issues.
>>         >
>>         >
>>         >
>>         > The first request
>>         http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>         returns the number of records located in Brazil for the
>>         facets in the request.
>>         >
>>         > The second query
>>         http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>         uses the Occurrence Inventories web service
>>         http://www.gbif.org/developer/occurrence#inventories which
>>         does not support the basis-of-record facet in the /datasets
>>         request. I understand that it would be better if the API
>>         response yielded an error message in this instance.
>>         >
>>         >
>>         >
>>         > Concerning the other issues – you are indeed right that the
>>         counts do not make sense in the context of taxon key 6 which
>>         is Plantae. Actually the API does not handle the taxonKey
>>         search at all, contrary to what the documentation states:
>>         >
>>         >
>>         >
>>         > /occurrence/counts/datasets | GET | Counts | Lists
>>         occurrence counts for datasets that cover a given taxon or
>>         country. | country, taxonKey
>>         >
>>         >
>>         >
>>         > As you can see here,
>>         http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6
>>         , this request doesn’t return anything.
>>         >
>>         >
>>         >
>>         > The GBIF developers will handle this issue in due time.
>>         >
>>         > You can follow the issue in our bug tracking service here:
>>         http://dev.gbif.org/issues/browse/POR-2828
>>         >
>>         >
>>         >
>>         >
>>         >
>>         > With best regards,
>>         >
>>         >
>>         >
>>         > Jan K. Legind
>>         >
>>         > Data manager, GBIF Secretariat
>>         >
>>         >
>>         >
>>         >
>>         >
>>         > From: API-users [mailto:api-users-bounces at lists.gbif.org]
>>         On Behalf Of Eduardo Dalcin
>>         > Sent: 2 September 2015 20:06
>>         > To: api-users at lists.gbif.org; dev at gbif.org
>>         > Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva;
>>         Laura; Ricardo Avancini
>>         > Subject: [API-users] Some questions from a begginer
>>         >
>>         >
>>         >
>>         > Hi folks,
>>         >
>>         >
>>         >
>>         > This is my first message to the list. So, please, be nice :)
>>         >
>>         >
>>         >
>>         > I'm working here at Rio de Janeiro Botanical Garden,
>>         together with the guys at the National Center for Flora
>>         Conservation. We are doing the risk assessment of the
>>         Brazilian flora for the government. We have assessed, so far,
>>         the risk of ca. 6,000 species, but we still have to assess ca.
>>         35,000. Access to occurrence records for Brazil is crucial, and
>>         every occurrence is important.
>>         >
>>         >
>>         >
>>         > That means that we have to put together occurrence data
>>         from different sources and, after the first batch of the risk
>>         assessment, we realized that we need to build our own
>>         aggregator. We are planning to do this with the
>>         Lontra-harvester, with the help of the guys at the Brazilian
>>         GBIF Node.
>>         >
>>         >
>>         >
>>         > So, one of the first steps was to list the available
>>         resources to understand the scale of the task, and that
>>         brings me to my questions.
>>         >
>>         >
>>         >
>>         > First:
>>         >
>>         >
>>         >
>>         > The request:
>>         >
>>         >
>>         >
>>         >
>>         http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>         >
>>         >
>>         >
>>         > returns 4,982,689 records
>>         >
>>         >
>>         >
>>         > And the request:
>>         >
>>         >
>>         >
>>         >
>>         http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>         >
>>         >
>>         >
>>         > returns (here) 7,406,310 records
>>         >
>>         >
>>         >
>>         > Comments?
>>         >
>>         >
>>         >
>>         > Second:
>>         >
>>         >
>>         >
>>         > The request:
>>         >
>>         >
>>         >
>>         >
>>         http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>         >
>>         >
>>         >
>>         > returns things like this:
>>         >
>>         >
>>         >
>>         > "197908d0-5565-11d8-b290-b8a03c50a862":27629
>>         >
>>         >
>>         > But querying the same dataset:
>>         >
>>         >
>>         >
>>         >
>>         http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5565-11d8-b290-b8a03c50a862
>>         >
>>         >
>>         >
>>         > returns "null" (of course, it is FishBase!)
>>         >
>>         >
>>         >
>>         > I have plenty of examples like this, highlighted in yellow
>>         here (not finished!):
>>         >
>>         >
>>         >
>>         >
>>         https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYyj459VTc/edit?usp=sharing
>>         >
>>         >
>>         >
>>         > Comments?
>>         >
>>         >
>>         >
>>         > I think those two questions are a good start. Please let me
>>         know if I'm doing something wrong.
>>         >
>>         >
>>         >
>>         > Cheers,
>>         >
>>         >
>>         >
>>         > Eduardo
>>         >
>>         >
>>         >
>>         >
>>         >
>>
>>
>>     _______________________________________________
>>     API-users mailing list
>>     API-users at lists.gbif.org <mailto:API-users at lists.gbif.org>
>>     http://lists.gbif.org/mailman/listinfo/api-users
>
>
>
>