[API-users] birdhouse meets GBIF

Thu Jun 2 14:25:45 CEST 2016

Dear all

Here is a current problem to solve :-) for the birdhouse WPS and GBIF 
developers.
There is a species distribution process in the birdhouse WPS that processes 
climate data based on occurrence coordinates from GBIF.

Currently the GBIF data are passed to the process via the URL of a GBIF 
CSV file (which has to be generated manually beforehand).
I have been implementing a direct occurrence search from birdhouse to GBIF 
with pygbif. It worked fine, BUT there is a limit of 300 records per 
request, which is far too few to train the species distribution model.
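
For reference, a minimal sketch of the CSV step described above, assuming 
the GBIF export is tab-delimited and carries latitude and longitude in 
columns named decimalLatitude and decimalLongitude (matched here without 
regard to case). The function name read_coordinates is hypothetical and 
is not part of birdhouse or pygbif; the code targets Python 3.

```python
import csv
import io


def read_coordinates(csv_text, delimiter="\t"):
    """Return (latitude, longitude) tuples from the text of an occurrence CSV.

    Assumption: the file has a header row containing decimalLatitude and
    decimalLongitude columns; rows without usable coordinates are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    coordinates = []
    for row in reader:
        # Normalise header names so the lookup is case-insensitive.
        fields = {name.lower(): value for name, value in row.items() if name}
        try:
            latitude = float(fields["decimallatitude"])
            longitude = float(fields["decimallongitude"])
        except (KeyError, ValueError, TypeError):
            # Column missing, empty cell, or short row: skip this record.
            continue
        coordinates.append((latitude, longitude))
    return coordinates
```

The text of the manually generated file, once fetched from its URL and 
decoded, can be passed in directly; the resulting list of tuples is what a 
species distribution process would consume as training points.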

So here is the question: is there a way to request the species occurrence 
coordinates directly (e.g. like a WPS request)?

(The previous conversation follows below.)

And some links for background information:

Birdhouse:
http://birdhouse.readthedocs.io/en/latest/
GBIF:
http://www.gbif.org/

Species distribution process:
http://flyingpigeon.readthedocs.io/en/latest/tutorials/sdm.html

Birdhouse architecture:
https://github.com/bird-house/birdhouse-docs/blob/master/slides/birdhouse-architecture/birdhouse-architecture.pdf

Merci
Nils

-------- Forwarded Message -------
Subject: 	Re: pygbif for occurrence coordinates
Date: 	Thu, 2 Jun 2016 09:00:49 -0300
From: 	Mauro Cavalcanti <maurobio at gmail.com>
To: 	Nils Hempelmann <info at nilshempelmann.de>
CC: 	wps-dev at lists.dkrz.de, Carsten Ehbrecht <ehbrecht at dkrz.de>, 
Wolfgang.Falk at lwf.bayern.de, Scott Chamberlain <myrmecocystus at gmail.com>

Nils,

No, there is no other way to get a larger number of records at once 
using the GBIF API (although one could achieve this by 
sequentially querying the server in "batches" of 200 or 300 records 
each). As I said, there are good operational reasons for the GBIF 
developers to have imposed such a limit.
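
The batching idea above could be sketched roughly as follows. This is only 
an illustration under stated assumptions: it calls the search URL that 
appears in the traceback further down this thread, assumes the JSON 
response carries "results" and "endOfRecords" fields and that records with 
coordinates expose decimalLatitude and decimalLongitude, and uses 
hypothetical helper names (fetch_page, collect_coordinates) that are not 
part of pygbif. It is written for Python 3.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Search endpoint as it appears in the traceback quoted in this thread.
API_URL = "http://api.gbif.org/v1/occurrence/search"
PAGE_SIZE = 300  # per-request cap observed in this thread


def fetch_page(taxon_key, offset, limit=PAGE_SIZE):
    """Fetch one batch of occurrence records as a parsed JSON dict."""
    query = urlencode({
        "taxonKey": taxon_key,
        "hasCoordinate": "true",  # only records that carry coordinates
        "limit": limit,
        "offset": offset,
    })
    with urlopen(API_URL + "?" + query) as response:
        return json.load(response)


def collect_coordinates(taxon_key, max_records=1000, fetch=fetch_page):
    """Page through the search results and gather (lat, lon) tuples.

    Stops when max_records coordinates are collected, when the server
    reports the end of the records, or when a batch comes back empty.
    """
    coordinates = []
    offset = 0
    while len(coordinates) < max_records:
        page = fetch(taxon_key, offset)
        results = page.get("results", [])
        for record in results:
            latitude = record.get("decimalLatitude")
            longitude = record.get("decimalLongitude")
            if latitude is not None and longitude is not None:
                coordinates.append((latitude, longitude))
        # Assumption: a missing endOfRecords flag is treated as "done".
        if page.get("endOfRecords", True) or not results:
            break
        offset += len(results)
    return coordinates[:max_records]
```

A call such as collect_coordinates(2882316, max_records=3000) would issue 
one request per batch of 300. Since very large limits were rejected with 
"Bad Request" in the session quoted below, max_records should stay well 
under that range, and the manual CSV download remains the safer route for 
very large record sets.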

As for your other question, it would be better put to the GBIF 
developers themselves (on the discussion list, so that we can all 
benefit from the answers! :-))

With warmest regards,

--
Dr. Mauro J. Cavalcanti
E-mail: maurobio at gmail.com <mailto:maurobio at gmail.com>
Web: http://sites.google.com/site/maurobio

On 02/06/2016 08:49, "Nils Hempelmann" <info at nilshempelmann.de 
<mailto:info at nilshempelmann.de>> wrote:

    Hi Mauro

    Oh ...
    No way to set it unlimited?
    The 'manual via browser' option and parsing the returned CSV is the
    current status.
    Or is there any alternative to pygbif?

    If I understood it correctly, the GBIF database is organized as a web
    server, so I guess there should be a way to connect it to the birdhouse
    WPS, am I right?

    (put the web-dev list in copy)

    Merci
    Nils

    On 02/06/2016 12:40, Mauro Cavalcanti wrote:
>
>     Nils,
>
>     That's a limit imposed (for good operational reasons) by the GBIF
>     API. If you want a larger number of records, you'll have to
>     download them "manually" (that is, via browser) and then parse
>     the CSV file returned from the GBIF server locally.
>
>     Hope this helps.
>
>     Best regards,
>
>     --
>     Dr. Mauro J. Cavalcanti
>     E-mail: maurobio at gmail.com <mailto:maurobio at gmail.com>
>     Web: http://sites.google.com/site/maurobio
>
>     On 02/06/2016 04:48, "Nils Hempelmann" <info at nilshempelmann.de
>     <mailto:info at nilshempelmann.de>> wrote:
>
>         Hi Scott
>
>         Works fine. Thanks a lot. I just have a question about the search
>         limits:
>
>         The maximum number of records seems to be limited to 300. Is that on
>         purpose?
>         And requesting more than 200000 gives a 'bad request'.
>
>         Merci
>         Nils
>
>
>         In [68]: len( occurrences.search(taxonKey=key, limit=100)['results'])
>         Out[68]: 100
>
>         In [69]: len( occurrences.search(taxonKey=key, limit=300)['results'])
>         Out[69]: 300
>
>         In [70]: len( occurrences.search(taxonKey=key, limit=3000)['results'])
>         Out[70]: 300
>
>         In [71]: len( occurrences.search(taxonKey=key, limit=200000)['results'])
>         Out[71]: 300
>
>         In [72]: len( occurrences.search(taxonKey=key, limit=200001)['results'])
>         ---------------------------------------------------------------------------
>         HTTPError                                 Traceback (most
>         recent call last)
>         <ipython-input-72-2f7d7b4ccba0> in <module>()
>         ----> 1 len( occurrences.search(taxonKey=key,
>         limit=200001)['results'])
>
>         /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/occurrences/search.pyc
>         in search(taxonKey, scientificName, country,
>         publishingCountry, hasCoordinate, typeStatus, recordNumber,
>         lastInterpreted, continent, geometry, recordedBy,
>         basisOfRecord, datasetKey, eventDate, catalogNumber, year,
>         month, decimalLatitude, decimalLongitude, elevation, depth,
>         institutionCode, collectionCode, hasGeospatialIssue, issue, q,
>         mediatype, limit, offset, **kwargs)
>             251         'collectionCode': collectionCode,
>         'hasGeospatialIssue': hasGeospatialIssue,
>             252         'issue': issue, 'q': q, 'mediatype':
>         mediatype, 'limit': limit,
>         --> 253         'offset': offset}, **kwargs)
>             254     return out
>
>         /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/gbifutils.pyc
>         in gbif_GET(url, args, **kwargs)
>              17 def gbif_GET(url, args, **kwargs):
>              18   out = requests.get(url, params=args,
>         headers=make_ua(), **kwargs)
>         ---> 19   out.raise_for_status()
>              20   stopifnot(out.headers['content-type'])
>              21   return out.json()
>
>         /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/requests/models.pyc
>         in raise_for_status(self)
>             842
>             843         if http_error_msg:
>         --> 844             raise HTTPError(http_error_msg, response=self)
>             845
>             846     def close(self):
>
>         HTTPError: 400 Client Error: Bad Request for url:
>         http://api.gbif.org/v1/occurrence/search?taxonKey=2882316&limit=200001&offset=0
>
>         In [73]:
>
>
>         On 02/06/2016 02:22, Scott Chamberlain wrote:
>>         Fixes to docs and fix to download_get are up on Github now.
>>         Will push new version to pypi soon. Let me know if something
>>         still doesn't work for you after reinstalling from github
>>
>>         S
>>
>>         On Wed, Jun 1, 2016 at 4:51 PM Nils Hempelmann
>>         <info at nilshempelmann.de <mailto:info at nilshempelmann.de>> wrote:
>>
>>             Hi Scott and Mauro
>>
>>             Mauro sent me a snippet of code which worked fine:
>>             https://github.com/bird-house/flyingpigeon/blob/develop/scripts/pygbif_occurence.py
>>
>>             I found the example here:
>>             https://github.com/maurobio/pygbif#occurrence-data
>>
>>             and here:
>>             https://github.com/sckott/pygbif#occurrences-module
>>
>>             Thanks a lot, that is great. It enables a lot ;-)
>>             I'll keep you posted
>>
>>             merci
>>             Nils
>>
>>
>>
>>
>>
>>             On 02/06/2016 01:45, Scott Chamberlain wrote:
>>>             And where are those examples from exactly? I don't see
>>>             those examples when searching the repo (which includes all
>>>             docs).
>>>
>>>             `pygbif.name_suggest` wouldn't work because
>>>             `name_suggest` is a method in the `species` module, so
>>>             you'd have to do pygbif.species.name_suggest, or `from
>>>             pygbif import species`, then `species.name_suggest`
>>>
>>>             Looks like occ.get(taxonKey = 252408386)  is a
>>>             documentation bug, that should be `key` instead of
>>>             `taxonKey`, a copy paste error. will fix that.
>>>
>>>             The `occ.download_get` call has a small bug, will fix that
>>>
>>>             All other calls work for me
>>>
>>>             S
>>>
>>>             On Wed, Jun 1, 2016 at 4:34 PM Scott Chamberlain
>>>             <myrmecocystus at gmail.com
>>>             <mailto:myrmecocystus at gmail.com>> wrote:
>>>
>>>                 Hi,
>>>
>>>                 What version of pygbif are you on? And what version
>>>                 of Python?
>>>
>>>                 Best, Scott
>>>
>>>                 On Wed, Jun 1, 2016 at 4:02 PM Nils Hempelmann
>>>                 <info at nilshempelmann.de
>>>                 <mailto:info at nilshempelmann.de>> wrote:
>>>
>>>                     Hi Mauro and Scott
>>>
>>>                     I was checking out pygbif. It seems to be a very
>>>                     useful tool.
>>>
>>>                     Can you help me with the syntax (or forward me
>>>                     to the appropriate person
>>>                     ;-) )
>>>                     The given snippets of code are outdated.
>>>
>>>                     I am basically just looking for the occurrence
>>>                     coordinates:
>>>                     here is my first try :
>>>
>>>                     import pygbif
>>>                     occ = pygbif.occurrences.search(scientificName='Fagus sylvatica')
>>>                     occ['count']
>>>
>>>                     ... and further? ;-)
>>>
>>>                     The examples in the docs are throwing errors:
>>>
>>>                     key= pygbif.name_suggest(q='Helianthus annuus',rank='species')['key']
>>>                     pygbif.search(taxonKey=key[0]['key'],limit=2)
>>>
>>>                     from pygbif import occurrences as occ
>>>                     occ.search(taxonKey = 3329049)
>>>                     occ.get(taxonKey = 252408386)
>>>                     occ.count(isGeoreferenced = True)
>>>                     occ.download_list(user  = "sckott",  limit  =  5)
>>>                     occ.download_meta(key  = "0000099-140929101555934")
>>>                     occ.download_get("0000099-140929101555934")
>>>
>>>
>>>                     Thanks
>>>                     Nils
>>>
>>
>
