[API-users] Some questions from a beginner

Eduardo Dalcin edalcin at jbrj.org
Mon Sep 14 18:25:59 CEST 2015


Good points, Markus. Thanks!

However, other publishers are *very* online, like this example:

"The New York Botanical Garden Herbarium (NY) - Vascular Plant Collection"
(http://www.gbif.org/dataset/d415c253-4d61-4459-9d25-4015b9084fb0) and the
"Herbarium of The New York Botanical Garden"
(http://www.gbif.org/dataset/7133ff0a-f762-11e1-a439-00145eb45e9a).

Same stuff, twice.

The thing is that when we search for, for instance, "Belemia fucsioides", we
get duplicate records of the same specimen:

http://www.gbif.org/occurrence/216419815
http://www.gbif.org/occurrence/1098393958

This is very annoying and gives us a lot of clean-up work.
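One way to flag such cross-dataset duplicates locally is to group records on a specimen identifier. The sketch below is a hypothetical heuristic, not a GBIF feature: it assumes each record is a dict carrying the Darwin Core fields `institutionCode` and `catalogNumber` plus GBIF's `datasetKey`, and that a matching (institutionCode, catalogNumber) pair means the same physical specimen. Publishers do not always fill these fields identically, so matches should be reviewed rather than trusted blindly; the sample records are made up.

```python
from collections import defaultdict


def find_cross_dataset_duplicates(records):
    """Return {(institutionCode, catalogNumber): {datasetKey, ...}} for every
    specimen key that occurs in more than one dataset.

    Hypothetical heuristic: two records describe the same specimen when their
    normalised institutionCode and catalogNumber match.
    """
    groups = defaultdict(set)
    for rec in records:
        inst = (rec.get("institutionCode") or "").strip().upper()
        cat = (rec.get("catalogNumber") or "").strip().upper()
        if not inst or not cat:
            continue  # records lacking either identifier cannot be matched
        groups[(inst, cat)].add(rec["datasetKey"])
    # Keep only keys that were published by two or more distinct datasets.
    return {key: datasets for key, datasets in groups.items() if len(datasets) > 1}


if __name__ == "__main__":
    sample = [  # made-up records for illustration, not real GBIF data
        {"datasetKey": "dataset-A", "institutionCode": "NY", "catalogNumber": "00012345"},
        {"datasetKey": "dataset-B", "institutionCode": "ny ", "catalogNumber": "00012345"},
        {"datasetKey": "dataset-A", "institutionCode": "NY", "catalogNumber": "00099999"},
    ]
    for key, datasets in find_cross_dataset_duplicates(sample).items():
        print(key, sorted(datasets))
    # prints: ('NY', '00012345') ['dataset-A', 'dataset-B']
```

Normalising case and whitespace catches trivial differences such as "NY" vs "ny "; anything fuzzier (prefixes on catalogue numbers, for example) would need extra rules.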

Cheers,

Eduardo




--------------------------------
*Eduardo Dalcin* <http://eduardo.dalc.in>
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin at jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
--------------------------------
e-mail alternativo / alternate email: edalcin at jbrj.org
--------------------------------
Agendar reunião / Schedule a meeting: http://agendar.dalc.in

On Thu, Sep 10, 2015 at 4:50 AM, Markus Döring <mdoering at gbif.org> wrote:

> Eduardo,
>
> another difference in using downloads periodically is that you get the
> interpreted data from us (together with the original, if you want it).
> That already includes quite a bit of data cleaning and alignment to
> controlled vocabularies that might be painful to reproduce otherwise.
> Also, publishers are *very* often offline. Especially for the long-running
> XML harvesting protocols (BioCASe, TAPIR, DiGIR), indexing them entirely
> can be a bit of a challenge.
>
> Markus
>
>
> On 09 Sep 2015, at 20:02, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>
> Thanks Alex. Food for thought.
>
> Best,
>
> Eduardo
>
> On Wed, Sep 9, 2015 at 2:28 PM, Alex Thompson <godfoder at acis.ufl.edu>
> wrote:
>
>> I'm kind of seconding Rod here.
>>
>> It might make more sense, depending on your use case and local computing
>> resources, to just get a download of Plantae *AND* Brazil from GBIF
>> periodically, then process that to exclude existing Brazilian datasets. You
>> could then use something like Apache Hadoop / Spark to efficiently split
>> the file by dataset or by institution code.
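For a download that fits on one machine, the split step Alex describes can also be sketched in plain Python instead of Hadoop / Spark. This is a minimal sketch under stated assumptions: the download is a tab-delimited file whose header row contains a `datasetKey` column (GBIF's tab-delimited downloads include one, but check the header of your own file), and `exclude` is a hypothetical set of dataset keys you already harvest directly. It keeps one open file per dataset, so a download spanning thousands of datasets may hit the operating system's open-file limit.

```python
import os


def split_by_dataset(download_path, out_dir, exclude=frozenset(), key_column="datasetKey"):
    """Stream a tab-delimited occurrence file and write one TSV per dataset.

    Returns {datasetKey: rows_written}. Rows whose key is in `exclude`
    (e.g. datasets already harvested locally) are skipped.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles, counts = {}, {}
    try:
        with open(download_path, encoding="utf-8") as src:
            header = src.readline()
            idx = header.rstrip("\r\n").split("\t").index(key_column)
            for line in src:
                if not line.strip():
                    continue  # ignore blank lines
                key = line.rstrip("\r\n").split("\t")[idx]
                if key in exclude:
                    continue
                if key not in handles:
                    # First row for this dataset: open its file and copy the header.
                    fh = open(os.path.join(out_dir, key + ".tsv"), "w", encoding="utf-8")
                    fh.write(header)
                    handles[key] = fh
                    counts[key] = 0
                handles[key].write(line)
                counts[key] += 1
    finally:
        for fh in handles.values():
            fh.close()
    return counts
```

Rows are copied verbatim rather than re-serialised, which avoids any quoting surprises in free-text fields; splitting by institution code instead would only mean passing `key_column="institutionCode"`.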
>>
>> This would greatly simplify your interactions with GBIF (down to just
>> periodically generating a download programmatically), and you would have an
>> easy place to insert any additional data transformations you want. This is
>> the path I take for my work, at least: the incremental cost of a couple
>> million more records is worth the overall reduction in complexity.
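Generating the download programmatically means POSTing a JSON predicate to GBIF's occurrence download API. The sketch below builds a request for Plantae (taxon key 6) in Brazil using the `and` / `equals` predicate structure with the `TAXON_KEY` and `COUNTRY` keys, and submits it with HTTP basic auth using GBIF.org account credentials. It deliberately sets only `creator` and `predicate`; GBIF's documentation lists further optional fields (for example e-mail notification and file format), whose exact names should be checked there rather than taken from this sketch. The username and password in the usage line are placeholders.

```python
import base64
import json
import urllib.request

# Endpoint of GBIF's occurrence download API (requires a GBIF.org account).
DOWNLOAD_URL = "https://api.gbif.org/v1/occurrence/download/request"


def build_download_request(creator, taxon_key=6, country="BR"):
    """JSON body asking for all occurrences of one taxon in one country."""
    return {
        "creator": creator,
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {"type": "equals", "key": "COUNTRY", "value": country},
            ],
        },
    }


def submit_download(body, user, password):
    """POST the request with HTTP basic auth and return GBIF's reply
    (expected to be the key of the new download)."""
    req = urllib.request.Request(
        DOWNLOAD_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8").strip()
```

Usage, e.g. from a scheduled job: `key = submit_download(build_download_request("myuser"), "myuser", "secret")`. Unlike the counts services discussed in this thread, the download predicate accepts taxon and country together.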
>>
>> - Alex
>>
>>
>> On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
>>
>> Hi Rod,
>>
>> The real purpose is to have a list of dataset UUIDs and the "source web
>> page" for each dataset. One way to do this is to select the resources whose
>> count is non-zero for Plantae *AND* Brazil.
>>
>> I don't want to do any statistical analysis, just feed a local harvester /
>> aggregator.
>>
>> The problem is that, given Jan Legind's reply of Sep 3, we have to check,
>> one by one (https://goo.gl/3wysaA), whether each dataset returned by the
>> request
>> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>> is a herbarium / preserved specimen (Plantae) collection or not.
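The one-by-one check can be automated. As far as we know, the occurrence search service (`/occurrence/search`) accepts `datasetKey`, `taxonKey`, `country` and `basisOfRecord` together, and `limit=0` returns only the total `count` without any records. The sketch below re-checks each candidate dataset with all filters applied and keeps the non-zero ones, so a FishBase-style dataset should drop out. The helper names are hypothetical, and it issues one HTTP request per dataset, so a long candidate list takes a while.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.gbif.org/v1"


def get_json(path, params):
    """GET an API path with query parameters and decode the JSON reply."""
    url = API + path + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def search_count(params):
    """Number of matching records; limit=0 asks for the count only."""
    return get_json("/occurrence/search", dict(params, limit=0))["count"]


def plant_specimen_datasets(candidate_keys, count_fn=search_count):
    """Re-check each candidate dataset with all filters applied together
    and keep those with at least one matching record."""
    keep = {}
    for key in candidate_keys:
        n = count_fn({
            "datasetKey": key,
            "taxonKey": 6,  # Plantae
            "country": "BR",
            "basisOfRecord": "PRESERVED_SPECIMEN",
        })
        if n > 0:
            keep[key] = n
    return keep
```

Usage: `candidates = get_json("/occurrence/counts/datasets", {"country": "BR"})`, then `plant_specimen_datasets(candidates)`; iterating the returned mapping yields its dataset keys. `count_fn` is injectable so the filtering logic can be tested without network access.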
>>
>> Does it make sense?
>>
>> Thanks for your curiosity! :)
>>
>> Cheers,
>>
>> Eduardo
>>
>> On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page <Roderic.Page at glasgow.ac.uk>
>> wrote:
>>
>>> Hi Eduardo,
>>>
>>> I’m curious: is the purpose to get counts by dataset by country, or to
>>> get all the plant occurrences for Brazil? The latter can be obtained by
>>> downloading all plant occurrences in Brazil
>>> http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could
>>> then compute the per-dataset stats locally). I realise that this isn’t as
>>> convenient as having GBIF slice the data for you in the API.
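Computing the per-dataset stats locally can be as simple as counting rows per dataset key in the downloaded file. This sketch assumes a tab-delimited download with a header row containing a `datasetKey` column (check your own file's header); the keys of the result are exactly the datasets with a non-zero count for the downloaded query, which is the list Eduardo is after.

```python
from collections import Counter


def counts_per_dataset(download_path, key_column="datasetKey"):
    """Count occurrence rows per dataset in a tab-delimited GBIF download.

    Assumes the first line is a header that includes `key_column`.
    """
    counts = Counter()
    with open(download_path, encoding="utf-8") as src:
        header = src.readline().rstrip("\r\n").split("\t")
        idx = header.index(key_column)
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            counts[line.rstrip("\r\n").split("\t")[idx]] += 1
    return counts
```

`counts_per_dataset("occurrence.txt").most_common()` (file name hypothetical) then lists datasets from largest to smallest.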
>>>
>>> Regards
>>>
>>> Rod
>>>
>>> ---------------------------------------------------------
>>> Roderic Page
>>> Professor of Taxonomy
>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>> College of Medical, Veterinary and Life Sciences
>>> Graham Kerr Building
>>> University of Glasgow
>>> Glasgow G12 8QQ, UK
>>>
>>> Email:  Roderic.Page at glasgow.ac.uk
>>> Tel:  +44 141 330 4778
>>> Skype:  rdmpage
>>> Facebook:  http://www.facebook.com/rdmpage
>>> LinkedIn:  http://uk.linkedin.com/in/rdmpage
>>> Twitter:  http://twitter.com/rdmpage
>>> Blog:  http://iphylo.blogspot.com
>>> ORCID:  http://orcid.org/0000-0002-7101-9767
>>> Citations:
>>> http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
>>> ResearchGate https://www.researchgate.net/profile/Roderic_Page
>>>
>>>
>>> On 4 Sep 2015, at 10:39, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>>
>>> Hi Markus,
>>>
>>> Yes, it's a shame I can't use country and "nub" together. Is there any
>>> hope of that changing?
>>>
>>> Eduardo
>>>
>>> On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring <mdoering at gbif.org> wrote:
>>>
>>>> Eduardo,
>>>>
>>>> as you might have seen from my comment on the issue, the web service uses
>>>> a different parameter name for taxonKey, which is a bug we need to fix at
>>>> some point.
>>>> For now, please use nubKey instead, like this:
>>>>
>>>> http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
>>>>
>>>> The real problem for you will be that we do not support combining the
>>>> country and taxon filters, only one of the two. So, I am afraid, you
>>>> cannot search for plants in Brazil, only for datasets about Brazil or for
>>>> datasets with plant records.
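Given that limitation, one workaround is to call the counts service twice, once with `nubKey=6` and once with `country=BR`, and intersect the dataset keys. This is a sketch only: the parameter names follow this thread (2015) and may have changed since, and a dataset in the intersection has *some* plant records and *some* records from Brazil, which does not guarantee plant records from Brazil. Treat the result as a shortlist to verify, not as the answer.

```python
import json
import urllib.parse
import urllib.request

COUNTS_URL = "https://api.gbif.org/v1/occurrence/counts/datasets"


def fetch_dataset_counts(**params):
    """Return the {datasetKey: count} mapping for a single filter."""
    url = COUNTS_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def candidate_datasets(plant_counts, brazil_counts):
    """Dataset keys present in both mappings, sorted.

    Membership in both lists is necessary, but not sufficient, for a
    dataset to hold plant records from Brazil.
    """
    return sorted(set(plant_counts) & set(brazil_counts))

# Usage (network calls, shown as comments):
# plants = fetch_dataset_counts(nubKey=6)   # workaround name from this thread
# brazil = fetch_dataset_counts(country="BR")
# shortlist = candidate_datasets(plants, brazil)
```

The per-dataset numbers from the two calls cannot be combined into a "plants in Brazil" count; only the key sets are meaningful here.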
>>>>
>>>> Markus
>>>>
>>>>
>>>>
>>>> > On 03 Sep 2015, at 14:12, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>>> >
>>>> > Thanks Jan. I'll keep exploring and will be in touch if I need anything.
>>>> >
>>>> > Best,
>>>> >
>>>> > Eduardo
>>>> >
>>>> > On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] <jlegind at gbif.org>
>>>> > wrote:
>>>> >
>>>> > Dear Eduardo,
>>>> >
>>>> > Thanks for getting in touch with us about these issues.
>>>> >
>>>> > The first request
>>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>> > returns the number of records located in Brazil for the facets in the
>>>> > request.
>>>> >
>>>> > The second query
>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>> > uses the Occurrence Inventories web service
>>>> > (http://www.gbif.org/developer/occurrence#inventories), which does not
>>>> > support the basis-of-record facet in the /datasets request. I understand
>>>> > that it would be better if the API response yielded an error message in
>>>> > this instance.
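Because the inventories service silently drops filters it does not support, a quick local check is useful: request the per-dataset counts with and without a filter and compare. If every count is unchanged, the filter was most likely ignored. Summing the mapping also shows why the totals in Eduardo's first question can disagree: if basisOfRecord (and, per Jan's next paragraph, taxonKey) are dropped, the sum plausibly covers all records matching country=BR, which can exceed the fully filtered `/occurrence/count` figure. The helper names below are hypothetical; the network calls are shown only as comments.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.gbif.org/v1"


def get_json(path, **params):
    """GET an API path; /occurrence/count replies with a bare number,
    /occurrence/counts/datasets with a {datasetKey: count} object."""
    url = API + path + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def total_from_dataset_counts(dataset_counts):
    """Sum the per-dataset counts into one figure."""
    return sum(int(n) for n in dataset_counts.values())


def filter_seems_ignored(with_filter, without_filter):
    """Heuristic: if adding a filter leaves every per-dataset count unchanged,
    the service most likely ignored that filter."""
    return with_filter == without_filter

# Usage (network calls, shown as comments):
# narrow = get_json("/occurrence/counts/datasets", country="BR",
#                   basisOfRecord="PRESERVED_SPECIMEN")
# broad = get_json("/occurrence/counts/datasets", country="BR")
# filter_seems_ignored(narrow, broad)
# total_from_dataset_counts(narrow) versus
# get_json("/occurrence/count", country="BR", taxonKey=6,
#          basisOfRecord="PRESERVED_SPECIMEN")
```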
>>>> >
>>>> > Concerning the other issues: you are indeed right that the counts do
>>>> > not make sense in the context of taxon key 6, which is Plantae. In fact,
>>>> > the API does not handle the taxonKey search at all, contrary to what the
>>>> > documentation states:
>>>> >
>>>> >   Resource:    /occurrence/counts/datasets
>>>> >   Method:      GET
>>>> >   Response:    Counts
>>>> >   Description: Lists occurrence counts for datasets that cover a given
>>>> >                taxon or country.
>>>> >   Parameters:  country, taxonKey
>>>> >
>>>> > As you can see here
>>>> > (http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6), this
>>>> > request doesn’t return anything.
>>>> >
>>>> > The GBIF developers will handle this issue in due time.
>>>> > You can follow the issue in our bug tracking service here:
>>>> > http://dev.gbif.org/issues/browse/POR-2828
>>>> >
>>>> > With best regards,
>>>> >
>>>> > Jan K. Legind
>>>> > Data manager, GBIF Secretariat
>>>> >
>>>> > From: API-users [mailto:api-users-bounces at lists.gbif.org] On Behalf
>>>> > Of Eduardo Dalcin
>>>> > Sent: 2 September 2015 20:06
>>>> > To: api-users at lists.gbif.org; dev at gbif.org
>>>> > Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo
>>>> > Avancini
>>>> > Subject: [API-users] Some questions from a beginner
>>>> >
>>>> > Hi folks,
>>>> >
>>>> > This is my first message to the list, so please be nice :)
>>>> >
>>>> > I'm working here at the Rio de Janeiro Botanical Garden, together with
>>>> > the guys at the National Center for Flora Conservation. We are carrying
>>>> > out the risk assessment of the Brazilian flora for the government. So far
>>>> > we have assessed the risk for ca. 6,000 species, but we still have to
>>>> > assess ca. 35,000. Access to occurrence records for Brazil is crucial,
>>>> > and every occurrence is important.
>>>> >
>>>> > That means we have to put together occurrence data from different
>>>> > sources and, after the first batch of the risk assessment, we realized
>>>> > that we need to build our own aggregator. We are planning to do this
>>>> > with the Lontra-harvester, with the help of the guys at the Brazilian
>>>> > GBIF Node.
>>>> >
>>>> > So, one of the first steps was to list the available resources to
>>>> > understand the scale of the task, and that brings me to my questions.
>>>> >
>>>> > First:
>>>> >
>>>> > The request
>>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>> > returns 4,982,689 records.
>>>> >
>>>> > And the request
>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>> > returns (here) 7,406,310 records.
>>>> >
>>>> > Comments?
>>>> >
>>>> > Second:
>>>> >
>>>> > The request
>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>> > returns entries like this:
>>>> >
>>>> > "197908d0-5565-11d8-b290-b8a03c50a862":27629
>>>> >
>>>> > But querying the same dataset
>>>> > http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5565-11d8-b290-b8a03c50a862
>>>> > returns "null" (of course: it is FishBase!).
>>>> >
>>>> > I have plenty of examples like this, highlighted in yellow here (not
>>>> > finished!):
>>>> > https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYyj459VTc/edit?usp=sharing
>>>> >
>>>> > Comments?
>>>> >
>>>> > I think these two questions are a good start. Please let me know if
>>>> > I'm doing something wrong.
>>>> >
>>>> > Cheers,
>>>> >
>>>> > Eduardo
>>>> >
>>>>
>>>>
>>> _______________________________________________
>>> API-users mailing list
>>> API-users at lists.gbif.org
>>> http://lists.gbif.org/mailman/listinfo/api-users
>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: FireShot Pro Screen Capture #076 - 'Occurrence Search Results' - www_gbif_org_occurrence_search_TAXON_KEY=5553637.png
Type: image/png
Size: 21327 bytes
Desc: not available
URL: <http://lists.gbif.org/pipermail/api-users/attachments/20150914/64560c76/attachment-0001.png>

