[API-users] Some questions from a beginner

Javier Otegui javier.otegui at gmail.com
Wed Sep 9 23:16:16 CEST 2015


Mauro,

Agreed 100% (I'm actually a regular R user), but given Eduardo's needs
(or my understanding of them), I think R might be overkill here. I
replicated and completed Eduardo's table in less than 30 minutes with
OpenRefine, while with R it would probably take longer than that just to
get the data. But again, for more serious analytics, few things (if any)
beat R.
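
For anyone who would rather script it than use OpenRefine, the three steps from my earlier message (quoted below) could be sketched roughly like this in Python. This is only an illustration under stated assumptions: the function names are made up for the example, the query parameters are copied as written in that message, and I am assuming column A contains text (for example a dataset page URL) with the UUID somewhere in it.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and parameter names are taken from the message in this thread.
SEARCH_URL = "http://api.gbif.org/v1/occurrence/search"

# Standard 8-4-4-4-12 hexadecimal UUID pattern.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def extract_uuid(cell):
    """Step 1: pull the dataset UUID out of a cell (assumed to contain one)."""
    match = UUID_RE.search(cell)
    return match.group(0).lower() if match else None


def build_url(dataset_key, taxon_key=6):
    """Step 2: fill the template URL; limit=1 keeps the response small."""
    query = urlencode(
        {"TAXON_KEY": taxon_key, "limit": 1, "DATASET_KEY": dataset_key}
    )
    return f"{SEARCH_URL}?{query}"


def parse_count(body):
    """Step 3: keep only the 'count' field of the JSON response."""
    return json.loads(body)["count"]


def plant_records(dataset_key):
    """Fetch the URL and return the count (needs network access)."""
    with urlopen(build_url(dataset_key)) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling `plant_records(extract_uuid(cell))` for each row would then fill the "number of plant records" column, one request per dataset.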

Cheers!

On Wed, 09/09/2015, 22:59, Mauro Cavalcanti <maurobio at gmail.com> wrote:

>
> Javier,
>
> The problem with these tools (LontraHarvest, OpenRefine, etc.) is that
> they are just data *retrieval* tools; they provide no data analysis or
> presentation functionality. One or more separate tools have to be used
> after retrieving the data of interest for plotting maps, charts,
> tabulation, statistical analyses, etc.
> This is exactly where R excels: it lets you perform all these operations
> in a unified, straightforward workflow.
>
> Cheers!
>
> 2015-09-09 17:23 GMT-03:00 Javier Otegui <javier.otegui at gmail.com>:
>
>> Hi Eduardo (et al.),
>>
>> If I understand correctly, the list at https://goo.gl/3wysaA shows the
>> resources with data from Brazil and you want to filter out those with
>> records other than plants, am I right? Have you considered using OpenRefine
>> (http://openrefine.org/) for this task? OpenRefine can fetch URLs built
>> from the values in other columns, which plays very well with the GBIF
>> APIs. You can make the program dynamically build the API request URL
>> from the dataset UUID, then fetch and parse the JSON response, without
>> having to download the data and with almost no coding. The way I would
>> go here is:
>>
>>    1. Create a column based on the value in column A of your table,
>>    to extract just the dataset UUID
>>    2. Create a new column that fetches from the GBIF API, appending the
>>    value in the previous column to a template URL:
>>    http://api.gbif.org/v1/occurrence/search?TAXON_KEY=6&limit=1&DATASET_KEY=<value>
>>    The "limit=1" part makes things faster by returning a single record
>>    instead of the default 20
>>    3. Create yet another column parsing the JSON result from the
>>    previous column, extracting just the value in the field "count". The result
>>    is the number of plant records in that dataset (therefore, resources such
>>    as FishBase will have a value of zero)
>>
>> Actually, you can add as many columns as you want, with as many API
>> calls, to fill the rest of the fields in your table. Using the "registry"
>> API, you can get the title, external data link and the protocol (IPT,
>> DiGIR...).
>>
>> Hope this helps. Let me know if you are interested in this approach and
>> need more help using OpenRefine.
>> Cheers!
>>
>> Javier Otegui
>> http://www.jotegui.com
>>
>> On Wed, Sep 9, 2015 at 8:07 PM, Mauro Cavalcanti <maurobio at gmail.com>
>> wrote:
>>
>>> Scott,
>>>
>>> That's my very point: using R and rgbif should be the best path to
>>> take in this case, both because of the easier access to the GBIF API
>>> provided by rgbif and the HUGE data analysis capabilities of R itself. I
>>> had been working on a paper discussing this in the context of conservation
>>> databases (using R/rgbif and a Red-Listed group of mammals as an example),
>>> but unfortunately this work has been delayed by unexpected health problems.
>>> I hope it can see the light someday, however.
>>>
>>> Best regards,
>>> On 09/09/2015 14:44, "Scott Chamberlain" <scott at ropensci.org> wrote:
>>>
>>>> Note that the R client rgbif does interface with the GBIF download API
>>>> in addition to the search API - making it easier to deal with larger
>>>> datasets. This works even if you downloaded bulk data from the GBIF GUI.
>>>> Ignore this if you don't use R :)
>>>>
>>>> Best, S
>>>>
>>>> On Wed, Sep 9, 2015 at 10:35 AM Alex Thompson <godfoder at acis.ufl.edu>
>>>> wrote:
>>>>
>>>>> I'm kind of seconding Rod here.
>>>>>
>>>>> It might make more sense, depending on your use case and local
>>>>> computer resources, to just get a download of Plantae *AND* Brazil from
>>>>> GBIF periodically, then process that to exclude existing Brazilian
>>>>> datasets. You could then use something like Apache Hadoop / Spark to
>>>>> efficiently split the file by dataset or by institution code.
>>>>>
>>>>> This would greatly simplify your interactions with GBIF (down to just
>>>>> periodically generating a download programmatically) and you would have an
>>>>> easy place to insert any additional data transformations you want. This is
>>>>> the path I take for my work at least: the incremental cost of a couple
>>>>> million more records is worth the reduction in complexity overall.
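
At a much smaller scale than Hadoop or Spark, the per-dataset tally described above can be sketched with the Python standard library alone. This is a hypothetical illustration, not what Alex runs: it assumes the download is a tab-delimited text file with a header row that includes a `datasetKey` column, which is an assumption about the export format and not something stated in this thread. The function name is made up for the example.

```python
import csv
import io
from collections import Counter


def count_by_dataset(lines, key_column="datasetKey"):
    """Tally occurrence rows per dataset from tab-delimited text.

    `lines` is any iterable of text lines (an open file works);
    `key_column` is the assumed name of the dataset identifier column.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[key_column] for row in reader)


# Tiny in-memory stand-in for a real download file.
sample = io.StringIO(
    "gbifID\tdatasetKey\tkingdom\n"
    "1\tdataset-a\tPlantae\n"
    "2\tdataset-a\tPlantae\n"
    "3\tdataset-b\tPlantae\n"
)
counts = count_by_dataset(sample)
for dataset_key, n in counts.most_common():
    print(dataset_key, n)
```

For a real file, passing `open(path, encoding="utf-8", newline="")` instead of the `StringIO` object streams the rows without loading the whole download into memory.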
>>>>>
>>>>>
>>>>> - Alex
>>>>>
>>>>>
>>>>> On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
>>>>>
>>>>> Hi Rod,
>>>>>
>>>>> The real purpose is to have a list of UUIDs and the "source web page"
>>>>> for each dataset. Thus, one way to do it is to select those resources
>>>>> whose counts are <> 0 for PLANTAE *AND* Brazil.
>>>>>
>>>>> I don't want to do any statistical analysis, just feed a local
>>>>> harvester / aggregator.
>>>>>
>>>>> The problem is that, considering the reply from Jan Legind on Sep 3, we
>>>>> have to go through the list (https://goo.gl/3wysaA) one by one to check
>>>>> whether each dataset is a Herbarium / Preserved Specimen (Plantae) one
>>>>> or not, starting from the request
>>>>> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>>
>>>>> Does it make sense?
>>>>>
>>>>> Thanks for your curiosity! :)
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Eduardo
>>>>>
>>>>>
>>>>> --------------------------------
>>>>> Eduardo Dalcin <http://eduardo.dalc.in>
>>>>> Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
>>>>> e-mail: edalcin at jbrj.gov.br
>>>>> Work: +55 21 3204 2116
>>>>> --------------------------------
>>>>> alternate email: edalcin at jbrj.org
>>>>> --------------------------------
>>>>> Schedule a meeting: http://agendar.dalc.in
>>>>>
>>>>> On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page <
>>>>> Roderic.Page at glasgow.ac.uk> wrote:
>>>>>
>>>>>> Hi Eduardo,
>>>>>>
>>>>>> I’m curious, is the purpose to get counts by dataset by country, or
>>>>>> to get all the plant occurrences for Brazil? The latter can be obtained by
>>>>>> downloading all plant occurrences in Brazil
>>>>>> http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you
>>>>>> could then compute the per-dataset stats locally). I realise that this
>>>>>> isn’t as convenient as having GBIF slice the data for you in the API.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Rod
>>>>>>
>>>>>> ---------------------------------------------------------
>>>>>> Roderic Page
>>>>>> Professor of Taxonomy
>>>>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>>>>> College of Medical, Veterinary and Life Sciences
>>>>>> Graham Kerr Building
>>>>>> University of Glasgow
>>>>>> Glasgow G12 8QQ, UK
>>>>>>
>>>>>> Email:  Roderic.Page at glasgow.ac.uk
>>>>>> Tel:  +44 141 330 4778
>>>>>> Skype:  rdmpage
>>>>>> Facebook:  http://www.facebook.com/rdmpage
>>>>>> LinkedIn:  http://uk.linkedin.com/in/rdmpage
>>>>>> Twitter:  http://twitter.com/rdmpage
>>>>>> Blog:  http://iphylo.blogspot.com
>>>>>> ORCID:  http://orcid.org/0000-0002-7101-9767
>>>>>> Citations:
>>>>>> http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
>>>>>> ResearchGate https://www.researchgate.net/profile/Roderic_Page
>>>>>>
>>>>>>
>>>>>> On 4 Sep 2015, at 10:39, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>>>>>
>>>>>> Hi Markus,
>>>>>>
>>>>>> Yes, it's a shame I can't have country and "nub" together. Is there
>>>>>> any hope of that?
>>>>>>
>>>>>> Eduardo
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring <mdoering at gbif.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Eduardo,
>>>>>>>
>>>>>>> as you might have seen from my comment on the issue, the web service
>>>>>>> uses a different parameter name for taxonKey, which is a bug we need to
>>>>>>> fix at some point.
>>>>>>> Please use nubKey for now, like this:
>>>>>>>
>>>>>>> http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
>>>>>>>
>>>>>>> The real problem for you will be that we do not support the
>>>>>>> combination of the country and the taxon filter, just one of the two. So
>>>>>>> you cannot search for plants in Brazil I am afraid, just for datasets about
>>>>>>> Brazil and datasets with plant records.
>>>>>>>
>>>>>>> Markus
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> > On 03 Sep 2015, at 14:12, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>>>>>> >
>>>>>>> > Thanks Jan. I'll keep exploring and I'll be in touch if I need to.
>>>>>>> >
>>>>>>> > Best,
>>>>>>> >
>>>>>>> > Eduardo
>>>>>>> >
>>>>>>> > On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] <
>>>>>>> jlegind at gbif.org> wrote:
>>>>>>> > Dear Eduardo,
>>>>>>> >
>>>>>>> > Thanks for getting in touch with us about these issues.
>>>>>>> >
>>>>>>> > The first request
>>>>>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>>>> > returns the number of records located in Brazil for the facets in the
>>>>>>> > request.
>>>>>>> >
>>>>>>> > The second query
>>>>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>>>> > uses the Occurrence Inventories web service
>>>>>>> > http://www.gbif.org/developer/occurrence#inventories which does not
>>>>>>> > support the basis-of-record facet in the /datasets request. I understand
>>>>>>> > that it would be better if the API response yielded an error message in
>>>>>>> > this instance.
>>>>>>> >
>>>>>>> > Concerning the other issues – you are indeed right that the counts
>>>>>>> > do not make sense in the context of taxon key 6 which is Plantae. Actually
>>>>>>> > the API does not handle the taxonKey search at all, contrary to what the
>>>>>>> > documentation states:
>>>>>>> >
>>>>>>> >     /occurrence/counts/datasets
>>>>>>> >     GET
>>>>>>> >     Counts
>>>>>>> >     Lists occurrence counts for datasets that cover a given taxon or
>>>>>>> >     country.
>>>>>>> >     country, taxonKey
>>>>>>> >
>>>>>>> > As you can see here,
>>>>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this
>>>>>>> > request doesn’t return anything.
>>>>>>> >
>>>>>>> > The GBIF developers will handle this issue in due time.
>>>>>>> > You can follow the issue in our bug tracking service here:
>>>>>>> > http://dev.gbif.org/issues/browse/POR-2828
>>>>>>> >
>>>>>>> > With best regards,
>>>>>>> >
>>>>>>> > Jan K. Legind
>>>>>>> > Data manager, GBIF Secretariat
>>>>>>> >
>>>>>>> > From: API-users [mailto:api-users-bounces at lists.gbif.org] On
>>>>>>> Behalf Of Eduardo Dalcin
>>>>>>> > Sent: 2. september 2015 20:06
>>>>>>> > To: api-users at lists.gbif.org; dev at gbif.org
>>>>>>> > Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura;
>>>>>>> Ricardo Avancini
>>>>>>> > Subject: [API-users] Some questions from a beginner
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > Hi folks,
>>>>>>> >
>>>>>>> > This is my first message to the list. So, please, be nice :)
>>>>>>> >
>>>>>>> > I'm working here at the Rio de Janeiro Botanical Garden, together with
>>>>>>> > the guys at the National Center for Flora Conservation. We are carrying
>>>>>>> > out the risk assessment of the Brazilian flora for the government. So
>>>>>>> > far we have assessed the risk of ca. 6,000 species, but we still have
>>>>>>> > to assess ca. 35,000. Access to occurrence records for Brazil is
>>>>>>> > crucial, and every occurrence is important.
>>>>>>> >
>>>>>>> > That means that we have to put together occurrence data from
>>>>>>> > different sources and, after the first batch of the risk assessment, we
>>>>>>> > realized that we need to build our own aggregator. We are planning to do
>>>>>>> > this with the Lontra-harvester, with the help of the guys at the
>>>>>>> > Brazilian GBIF Node.
>>>>>>> >
>>>>>>> > So, one of the first steps was to list the available resources to
>>>>>>> > understand the dimension of the task, and that brings me to my
>>>>>>> > questions.
>>>>>>> >
>>>>>>> > First:
>>>>>>> >
>>>>>>> > The request:
>>>>>>> >
>>>>>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>>>> >
>>>>>>> > returns 4,982,689 records
>>>>>>> >
>>>>>>> > And the request:
>>>>>>> >
>>>>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>>>> >
>>>>>>> > returns (here) 7,406,310 records
>>>>>>> >
>>>>>>> > Comments?
>>>>>>> >
>>>>>>> > Second:
>>>>>>> >
>>>>>>> > The request:
>>>>>>> >
>>>>>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>>>> >
>>>>>>> > returns things like this:
>>>>>>> >
>>>>>>> > "197908d0-5565-11d8-b290-b8a03c50a862":27629
>>>>>>> >
>>>>>>> > But querying the same dataset:
>>>>>>> >
>>>>>>> > http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5565-11d8-b290-b8a03c50a862
>>>>>>> >
>>>>>>> > returns "null" (of course, it is FishBase!)
>>>>>>> >
>>>>>>> > I have plenty of examples like this, highlighted in yellow here (not
>>>>>>> > finished!):
>>>>>>> >
>>>>>>> > https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYyj459VTc/edit?usp=sharing
>>>>>>> >
>>>>>>> > Comments?
>>>>>>> >
>>>>>>> > I think those two questions are a good start. Please let me know
>>>>>>> > if I'm doing something wrong.
>>>>>>> >
>>>>>>> > Cheers,
>>>>>>> >
>>>>>>> > Eduardo
>>>>>>>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> API-users mailing list
>>>>>> API-users at lists.gbif.org
>>>>>> http://lists.gbif.org/mailman/listinfo/api-users
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>
>
> --
> Dr. Mauro J. Cavalcanti
> E-mail: maurobio at gmail.com
> Web: http://sites.google.com/site/maurobio
>