[API-users] Some questions from a begginer

Javier Otegui javier.otegui at gmail.com
Wed Sep 9 22:23:49 CEST 2015


Hi Eduardo (et al.),

If I understand correctly, the list at https://goo.gl/3wysaA shows the
resources with data from Brazil, and you want to filter out those that
contain no plant records. Am I right? Have you considered using OpenRefine
(http://openrefine.org/) for this task? OpenRefine can fetch URLs built
from the values in other columns, which works very well with the GBIF
APIs. You can have the program dynamically build the API request URL from
the dataset UUID, then fetch and parse the JSON response, without
downloading the data and with almost no coding. This is how I would
approach it:

   1. Create a column based on the value in column A of your table, to
   extract just the dataset UUID.
   2. Create a new column that fetches from the GBIF API, appending the value
   in the previous column to a template URL:
   http://api.gbif.org/v1/occurrence/search?TAXON_KEY=6&limit=1&DATASET_KEY=
   <value>. The "limit=1" part makes things faster by avoiding fetching the
   default 20 records into the column.
   3. Create yet another column that parses the JSON result from the previous
   column, extracting just the value of the "count" field. The result is the
   number of plant records in that dataset (so resources such as
   FishBase will have a value of zero).
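
For anyone who prefers a script to OpenRefine, the three steps above could
be sketched in Python roughly as follows. This is only an illustration, not
something I have run against your table: the function names are made up, I
am assuming column A holds some string that contains the dataset UUID, and I
use the camelCase parameter names (taxonKey, datasetKey) from the API
documentation rather than the upper-case ones in the URL above.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: documented camelCase parameter names of the occurrence search
# API (taxonKey, datasetKey). Taxon key 6 is Plantae, as used in this thread.
SEARCH_URL = "http://api.gbif.org/v1/occurrence/search"
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def extract_uuid(cell):
    """Step 1: pull the dataset UUID out of a cell (e.g. a dataset URL)."""
    match = UUID_RE.search(cell)
    return match.group(0).lower() if match else None


def build_url(dataset_key, taxon_key=6):
    """Step 2: build the request URL; limit=1 keeps the response small."""
    query = urllib.parse.urlencode(
        {"taxonKey": taxon_key, "limit": 1, "datasetKey": dataset_key}
    )
    return SEARCH_URL + "?" + query


def parse_count(body):
    """Step 3: read the "count" field from the JSON response text."""
    return json.loads(body)["count"]


def plant_count(cell):
    """Run all three steps for one cell (this one needs network access)."""
    dataset_key = extract_uuid(cell)
    if dataset_key is None:
        return None
    with urllib.request.urlopen(build_url(dataset_key)) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling plant_count() on each value of column A and keeping the rows where
the result is greater than zero would give the same filter as the OpenRefine
columns. For many datasets it would be polite to pause between requests.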

Actually, you can add as many columns as you want, with as many API calls,
to fill the rest of the fields in your table. Using the "registry" API, you
can get the title, external data link and the protocol (IPT, DiGIR...).
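
As a rough sketch of that registry lookup in Python (again, only an
illustration): I am assuming the dataset record is served at
http://api.gbif.org/v1/dataset/<UUID> and that its JSON has "title",
"homepage" and an "endpoints" list whose items carry "type" and "url".
Please check those field names against a real response before relying on
them. The function name is made up.

```python
import json

# Assumption: registry URL pattern for a single dataset record.
DATASET_URL = "http://api.gbif.org/v1/dataset/{key}"


def summarize_dataset(body):
    """Pick the title, homepage and endpoint details out of a dataset record.

    `body` is the JSON text of one dataset record. Missing fields come back
    as None or as empty lists instead of raising an error.
    """
    record = json.loads(body)
    endpoints = record.get("endpoints", [])
    return {
        "title": record.get("title"),
        "homepage": record.get("homepage"),
        # Distinct endpoint types (the protocol), sorted for stable output.
        "protocols": sorted({e["type"] for e in endpoints if e.get("type")}),
        "urls": [e["url"] for e in endpoints if e.get("url")],
    }
```

The JSON text can be fetched with urllib.request.urlopen() on
DATASET_URL.format(key=uuid), and the returned dictionary maps directly onto
extra columns in the table.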

Hope this helps. Let me know if you are interested in this approach and
need more help using OpenRefine.
Cheers!

Javier Otegui
http://www.jotegui.com

On Wed, Sep 9, 2015 at 8:07 PM, Mauro Cavalcanti <maurobio at gmail.com> wrote:

> Scott,
>
> That's my very point - that using R and rgbif should be the best path to
> take in this case, both because of the easier access to the GBIF API
> provided by rgbif and the HUGE data analysis capabilities of R itself. I
> had been working on a paper discussing this in the context of conservation
> databases (using R/rgbif and a Red-Listed group of mammals as an example),
> but unfortunately this work has been delayed by unexpected health problems.
> I hope it can see the light of day someday, however.
>
> Best regards,
> On 09/09/2015 14:44, "Scott Chamberlain" <scott at ropensci.org> wrote:
>
>> Note that the R client rgbif does interface with the GBIF download API in
>> addition to the search API - making it easier to deal with larger datasets.
>> This works even if you downloaded bulk data from the GBIF GUI.  Ignore this
>> if you don't use R :)
>>
>> Best, S
>>
>> On Wed, Sep 9, 2015 at 10:35 AM Alex Thompson <godfoder at acis.ufl.edu>
>> wrote:
>>
>>> I'm kind of seconding Rod here.
>>>
>>> It might make more sense, depending on your use case and local computer
>>> resources, to just get a download of Plantae *AND* Brazil from GBIF
>>> periodically, then process that to exclude existing Brazilian datasets. You
>>> could then use something like Apache Hadoop / Spark to efficiently split
>>> the file by dataset or by institution code.
>>>
>>> This would greatly simplify your interactions with GBIF (down to just
>>> periodically generating a download programmatically) and you would have an
>>> easy place to insert any additional data transformations you want. This is
>>> the path I take for my work, at least - the incremental cost of a couple
>>> million more records is worth the reduction in complexity overall.
>>>
>>>
>>> - Alex
>>>
>>>
>>> On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
>>>
>>> Hi Rod,
>>>
>>> The real purpose is to have a list of UUIDs and the "source web page" for
>>> each data set. Thus, one way to do it is to select those resources whose
>>> counts are <> 0 for PLANTAE *AND* Brazil.
>>>
>>> I don't want to do any statistical analysis, just feed a local harvester
>>> / aggregator.
>>>
>>> The problem is that, considering the reply from Jan Legind on Sep 3, we have
>>> to go through the list (https://goo.gl/3wysaA) one by one to check whether
>>> each resource returned by the request
>>> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>> is a Herbarium / Preserved Specimen (Plantae) or not.
>>>
>>> Does it make sense?
>>>
>>> Thanks for your curiosity! :)
>>>
>>> Cheers,
>>>
>>> Eduardo
>>>
>>>
>>> --------------------------------
>>> Eduardo Dalcin <http://eduardo.dalc.in>
>>> Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
>>> e-mail: edalcin at jbrj.gov.br
>>> Trabalho / Work: +55 21 3204 2116
>>> --------------------------------
>>> e-mail alternativo / alternate email: edalcin at jbrj.org
>>> --------------------------------
>>> Agendar reunião / Schedule a meeting: http://agendar.dalc.in
>>>
>>> On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page <
>>> Roderic.Page at glasgow.ac.uk> wrote:
>>>
>>>> Hi Eduardo,
>>>>
>>>> I’m curious, is the purpose to get counts by dataset by country, or to
>>>> get all the plant occurrences for Brazil? The latter can be obtained by
>>>> downloading all plant occurrences in Brazil
>>>> http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you
>>>> could then compute the per-dataset stats locally). I realise that this
>>>> isn’t as convenient as having GBIF slice the data for you in the API.
>>>>
>>>> Regards
>>>>
>>>> Rod
>>>>
>>>> ---------------------------------------------------------
>>>> Roderic Page
>>>> Professor of Taxonomy
>>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>>> College of Medical, Veterinary and Life Sciences
>>>> Graham Kerr Building
>>>> University of Glasgow
>>>> Glasgow G12 8QQ, UK
>>>>
>>>> Email:  Roderic.Page at glasgow.ac.uk
>>>> Tel:  +44 141 330 4778
>>>> Skype:  rdmpage
>>>> Facebook:  http://www.facebook.com/rdmpage
>>>> LinkedIn:  http://uk.linkedin.com/in/rdmpage
>>>> Twitter:  http://twitter.com/rdmpage
>>>> Blog:  http://iphylo.blogspot.com
>>>> ORCID:  http://orcid.org/0000-0002-7101-9767
>>>> Citations:
>>>> http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
>>>> ResearchGate https://www.researchgate.net/profile/Roderic_Page
>>>>
>>>>
>>>> On 4 Sep 2015, at 10:39, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>>>
>>>> Hi Markus,
>>>>
>>>> Yes, it's a shame I can't use country and "nub" together. Is there
>>>> any hope of that?
>>>>
>>>> Eduardo
>>>>
>>>>
>>>> --------------------------------
>>>> Eduardo Dalcin <http://eduardo.dalc.in>
>>>> Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
>>>> e-mail: edalcin at jbrj.gov.br
>>>> Trabalho / Work: +55 21 3204 2116
>>>> --------------------------------
>>>> e-mail alternativo / alternate email: edalcin at jbrj.org
>>>> --------------------------------
>>>> Agendar reunião / Schedule a meeting: http://agendar.dalc.in
>>>>
>>>> On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring <mdoering at gbif.org>
>>>> wrote:
>>>>
>>>>> Eduardo,
>>>>>
>>>>> As you might have seen from my issue comment, the web service uses a
>>>>> different parameter name for taxonKey, which is a bug we need to fix at
>>>>> some point.
>>>>> For now, please use nubKey to call the service, like this:
>>>>>
>>>>> http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
>>>>>
>>>>> The real problem for you will be that we do not support the
>>>>> combination of the country and the taxon filter, just one of the two. So
>>>>> you cannot search for plants in Brazil I am afraid, just for datasets about
>>>>> Brazil and datasets with plant records.
>>>>>
>>>>> Markus
>>>>>
>>>>>
>>>>>
>>>>> > On 03 Sep 2015, at 14:12, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>>>> >
>>>>> > Thanks, Jan. I'll keep exploring and I'll be in touch if I need to.
>>>>> >
>>>>> > Best,
>>>>> >
>>>>> > Eduardo
>>>>> >
>>>>> >
>>>>> >
>>>>> > --------------------------------
>>>>> > Eduardo Dalcin
>>>>> > Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
>>>>> > e-mail: edalcin at jbrj.gov.br
>>>>> > Trabalho / Work: +55 21 3204 2116
>>>>> > --------------------------------
>>>>> > e-mail alternativo /  alternate email: edalcin at jbrj.org
>>>>> > --------------------------------
>>>>> > Agendar reunião / Schedule a meeting: http://agendar.dalc.in
>>>>> >
>>>>> > On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] <jlegind at gbif.org> wrote:
>>>>> > Dear Eduardo,
>>>>> >
>>>>> >
>>>>> >
>>>>> > Thanks for getting in touch with us about these issues.
>>>>> >
>>>>> >
>>>>> >
>>>>> > The first request
>>>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>> > returns the number of records located in Brazil that match the facets
>>>>> > in the request.
>>>>> >
>>>>> > The second query
>>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>> > uses the Occurrence Inventories web service
>>>>> > (http://www.gbif.org/developer/occurrence#inventories), which does not
>>>>> > support the basis-of-record facet in the /datasets request. I understand
>>>>> > that it would be better if the API response returned an error message
>>>>> > in this case.
>>>>> >
>>>>> >
>>>>> >
>>>>> > Concerning the other issues – you are indeed right that the counts
>>>>> > do not make sense in the context of taxon key 6, which is Plantae.
>>>>> > Actually, the API does not handle the taxonKey search at all, contrary
>>>>> > to what the documentation states:
>>>>> >
>>>>> >   Resource URL:  /occurrence/counts/datasets
>>>>> >   Method:        GET
>>>>> >   Response:      Counts
>>>>> >   Description:   Lists occurrence counts for datasets that cover a
>>>>> >                  given taxon or country.
>>>>> >   Parameters:    country, taxonKey
>>>>> >
>>>>> > As you can see here,
>>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this
>>>>> > request doesn’t return anything.
>>>>> >
>>>>> >
>>>>> >
>>>>> > The GBIF developers will handle this issue in due time.
>>>>> >
>>>>> > You can follow the issue in our bug tracking service here:
>>>>> http://dev.gbif.org/issues/browse/POR-2828
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > With best regards,
>>>>> >
>>>>> >
>>>>> >
>>>>> > Jan K. Legind
>>>>> >
>>>>> > Data manager, GBIF Secretariat
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > From: API-users [mailto:api-users-bounces at lists.gbif.org] On Behalf
>>>>> > Of Eduardo Dalcin
>>>>> > Sent: 2 September 2015 20:06
>>>>> > To: api-users at lists.gbif.org; dev at gbif.org
>>>>> > Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura;
>>>>> > Ricardo Avancini
>>>>> > Subject: [API-users] Some questions from a begginer
>>>>> >
>>>>> >
>>>>> >
>>>>> > Hi folks,
>>>>> >
>>>>> >
>>>>> >
>>>>> > This is my first message to the list. So, please, be nice :)
>>>>> >
>>>>> >
>>>>> >
>>>>> > I'm working here at the Rio de Janeiro Botanical Garden, together with
>>>>> > the guys at the National Center for Flora Conservation. We are carrying
>>>>> > out the risk assessment of the Brazilian flora for the government. So
>>>>> > far we have assessed the risk of ca. 6,000 species, but we still have
>>>>> > to assess ca. 35,000. Access to occurrence records for Brazil is
>>>>> > crucial, and every occurrence is important.
>>>>> >
>>>>> >
>>>>> >
>>>>> > That means that we have to put together occurrence data from
>>>>> > different sources and, after the first batch of the risk assessment, we
>>>>> > realized that we need to build our own aggregator. We are planning to
>>>>> > do this with the Lontra-harvester, with the help of the guys at the
>>>>> > Brazilian GBIF Node.
>>>>> >
>>>>> >
>>>>> >
>>>>> > So, one of the first steps was to list the available resources to
>>>>> > understand the dimension of the task, and that brings me to my questions.
>>>>> >
>>>>> >
>>>>> >
>>>>> > First:
>>>>> >
>>>>> >
>>>>> >
>>>>> > The request:
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>> >
>>>>> >
>>>>> >
>>>>> > returns 4,982,689 records
>>>>> >
>>>>> >
>>>>> >
>>>>> > And the request:
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>> >
>>>>> >
>>>>> >
>>>>> > returns (here) 7,406,310 records
>>>>> >
>>>>> >
>>>>> >
>>>>> > Comments?
>>>>> >
>>>>> >
>>>>> >
>>>>> > Second:
>>>>> >
>>>>> >
>>>>> >
>>>>> > The request:
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>> >
>>>>> >
>>>>> >
>>>>> > returns things like this:
>>>>> >
>>>>> >
>>>>> >
>>>>> > "197908d0-5565-11d8-b290-b8a03c50a862":27629
>>>>> >
>>>>> >
>>>>> > But a query on the same dataset:
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5565-11d8-b290-b8a03c50a862
>>>>> >
>>>>> >
>>>>> >
>>>>> > returns "null" (of course, it is FishBase!)
>>>>> >
>>>>> >
>>>>> >
>>>>> > I have plenty of examples like this, highlighted in yellow here (not finished!):
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYyj459VTc/edit?usp=sharing
>>>>> >
>>>>> >
>>>>> >
>>>>> > Comments?
>>>>> >
>>>>> >
>>>>> >
>>>>> > I think those two questions are a good start. Please let me know if
>>>>> > I'm doing something wrong.
>>>>> >
>>>>> >
>>>>> >
>>>>> > Cheers,
>>>>> >
>>>>> >
>>>>> >
>>>>> > Eduardo
>>>>> >
>>>>> > --------------------------------
>>>>> >
>>>>> > Eduardo Dalcin
>>>>> >
>>>>> > Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
>>>>> >
>>>>> > e-mail: edalcin at jbrj.gov.br
>>>>> >
>>>>> > Trabalho / Work: +55 21 3204 2116
>>>>> >
>>>>> > --------------------------------
>>>>> >
>>>>> > e-mail alternativo /  alternate email: edalcin at jbrj.org
>>>>> >
>>>>> > --------------------------------
>>>>> >
>>>>> > Agendar reunião / Schedule a meeting: http://agendar.dalc.in
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>> _______________________________________________
>>>> API-users mailing list
>>>> API-users at lists.gbif.org
>>>> http://lists.gbif.org/mailman/listinfo/api-users
>>>>
>>>>
>>>>
>>>
>>>