[API-users] birdhouse meets GBIF

Nils Hempelmann info at nilshempelmann.de
Thu Jun 2 15:00:03 CEST 2016


Hi Tim

Thanks for the quick answer. If there is no OGC service, how can the 
download API be called from outside of GBIF?

Merci
Nils


On 02/06/2016 14:45, Tim Robertson wrote:
> Hi Nils
>
> We don’t have any OGC services, but there is an asynchronous download 
> API which can deliver CSVs.
> Off the top of my head, the only way you can automate this at the 
> moment would be to periodically issue a download (e.g. daily), 
> process it as you see fit, and cache the result for your app.
>
> With the download API you can get any size, from 1 to 660 million 
> records, which is why it is asynchronous. It’s used a lot by various 
> applications and communities.
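
A minimal sketch of the polling half of that workflow, assuming Python 3.
The `get_status` callable is a hypothetical stand-in for whatever returns
the metadata of a download (for instance a thin wrapper around pygbif's
`occurrences.download_meta`). The status strings SUCCEEDED, FAILED, KILLED
and CANCELLED are assumptions about what the download service reports and
should be checked against the GBIF API documentation.

```python
import time


def wait_for_download(get_status, key, poll_seconds=60, max_polls=120,
                      sleep=time.sleep):
    """Poll an asynchronous download until it finishes; return its metadata.

    get_status is a hypothetical callable: it takes a download key and
    returns a dict that contains a 'status' entry.
    """
    for _ in range(max_polls):
        meta = get_status(key)
        status = meta.get("status")
        if status == "SUCCEEDED":  # assumed name of the terminal success state
            return meta
        if status in ("FAILED", "KILLED", "CANCELLED"):  # assumed failure states
            raise RuntimeError("download %s ended with status %s" % (key, status))
        sleep(poll_seconds)  # still being prepared, wait before asking again
    raise TimeoutError("download %s not ready after %d polls" % (key, max_polls))
```

A scheduled job (e.g. daily) could call this after issuing the request,
fetch the CSV once the call returns, and store it where the application
reads its cached copy.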
>
> I hope this helps,
> Tim
>
>
> From: API-users <api-users-bounces at lists.gbif.org> on behalf of Nils 
> Hempelmann <info at nilshempelmann.de>
> Date: Thursday 2 June 2016 at 14:25
> To: "api-users at lists.gbif.org" <api-users at lists.gbif.org>, 
> "wps-dev at lists.dkrz.de" <wps-dev at lists.dkrz.de>
> Subject: [API-users] birdhouse meets GBIF
>
> Dear all
>
> Here is a current problem to solve :-) for the birdhouse WPS and GBIF 
> developers.
> There is a species distribution process in the birdhouse WPS, which 
> processes climate data based on occurrence coordinates from GBIF.
>
> Currently the GBIF data are passed to the process via a URL of the 
> GBIF CSV (which has to be generated manually beforehand).
> I have been implementing a direct occurrence search from birdhouse to 
> GBIF with pygbif. It worked fine, BUT there is a limit of 300 records, 
> which is far too few to train the species distribution model.
>
> So here is the question: is there a way to request the species 
> occurrence coordinates directly (e.g. like a WPS request)?
>
> (The previous conversation follows below.)
>
> And some links for background information:
>
> Birdhouse:
> http://birdhouse.readthedocs.io/en/latest/
> GBIF:
> http://www.gbif.org/
>
> Species distribution process:
> http://flyingpigeon.readthedocs.io/en/latest/tutorials/sdm.html
>
> Birdhouse architecture:
> https://github.com/bird-house/birdhouse-docs/blob/master/slides/birdhouse-architecture/birdhouse-architecture.pdf
>
> Merci
> Nils
>
> -------- Forwarded Message -------
> Subject: 	Re: pygbif for occurrence coordinates
> Date: 	Thu, 2 Jun 2016 09:00:49 -0300
> From: 	Mauro Cavalcanti <maurobio at gmail.com>
> To: 	Nils Hempelmann <info at nilshempelmann.de>
> CC: 	wps-dev at lists.dkrz.de, Carsten Ehbrecht <ehbrecht at dkrz.de>, 
> Wolfgang.Falk at lwf.bayern.de, Scott Chamberlain <myrmecocystus at gmail.com>
>
>
>
> Nils,
>
> No, there is no other way to get a larger number of records at once 
> using the GBIF API (although one could achieve this by sequentially 
> querying the server in "batches" of 200 or 300 records each). As I 
> said, there are good operational reasons for the GBIF developers to 
> have imposed such a limit.
>
> As for your other question, it would be better put to the GBIF 
> developers themselves (on the discussion list, so that we can all 
> benefit from the answers! :-))
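
A minimal sketch of the batch approach, assuming the search function
accepts `limit` and `offset` and returns a dict with `results` and
`endOfRecords` entries. The `search` argument is a placeholder for a
callable such as pygbif's `occurrences.search`. The default
`max_records=200000` mirrors the request size that produced a 'bad
request' earlier in this thread and is an assumption, not a documented
constant.

```python
def fetch_all(search, page_size=300, max_records=200000, **query):
    """Collect records page by page using limit/offset paging."""
    records = []
    offset = 0
    while offset < max_records:
        # Never ask for more than the remaining budget.
        limit = min(page_size, max_records - offset)
        page = search(limit=limit, offset=offset, **query)
        results = page.get("results", [])
        records.extend(results)
        # Stop when the server says there is nothing more, or on an empty page.
        if page.get("endOfRecords", True) or not results:
            break
        offset += len(results)
    return records
```

With pygbif this might be called as
`fetch_all(occurrences.search, taxonKey=key, hasCoordinate=True)`. For
very large result sets the asynchronous download remains the better fit,
since every page is a separate request to the server.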
>
> With warmest regards,
>
> --
> Dr. Mauro J. Cavalcanti
> E-mail: maurobio at gmail.com
> Web: http://sites.google.com/site/maurobio
>
> On 02/06/2016 08:49, "Nils Hempelmann" <info at nilshempelmann.de> 
> wrote:
>
>     Hi Mauro
>
>     Oh ...
>     Is there no way to set it to unlimited?
>     The 'manual via browser' option, followed by parsing the returned
>     CSV, is the current status.
>     Or is there any alternative to pygbif?
>
>     If I understood correctly, the GBIF database is organized as a web
>     server, so I guess there should be a way to connect it to the
>     birdhouse WPS, am I right?
>
>     (I put the wps-dev list in copy)
>
>     Merci
>     Nils
>
>
>     On 02/06/2016 12:40, Mauro Cavalcanti wrote:
>>
>>     Nils,
>>
>>     That's a limit imposed (for good operational reasons) by the GBIF
>>     API. If you want a larger number of records, you'll have to
>>     download them "manually" (that is, via browser) and then parse
>>     the CSV file returned from the GBIF server locally.
>>
>>     Hope this helps.
>>
>>     Best regards,
>>
>>     --
>>     Dr. Mauro J. Cavalcanti
>>     E-mail: maurobio at gmail.com
>>     Web: http://sites.google.com/site/maurobio
>>
>>     On 02/06/2016 04:48, "Nils Hempelmann" <info at nilshempelmann.de>
>>     wrote:
>>
>>         Hi Scott
>>
>>         It works fine. Thanks a lot. I just have a question about the
>>         search limits:
>>
>>         The maximum number of records seems to be limited to 300. Is
>>         that on purpose?
>>         And requesting more than 200000 gives a 'bad request'.
>>
>>         Merci
>>         Nils
>>
>>
>>         In [68]: len(occurrences.search(taxonKey=key, limit=100)['results'])
>>         Out[68]: 100
>>
>>         In [69]: len(occurrences.search(taxonKey=key, limit=300)['results'])
>>         Out[69]: 300
>>
>>         In [70]: len(occurrences.search(taxonKey=key, limit=3000)['results'])
>>         Out[70]: 300
>>
>>         In [71]: len(occurrences.search(taxonKey=key, limit=200000)['results'])
>>         Out[71]: 300
>>
>>         In [72]: len(occurrences.search(taxonKey=key, limit=200001)['results'])
>>         ---------------------------------------------------------------------------
>>         HTTPError Traceback (most recent call last)
>>         <ipython-input-72-2f7d7b4ccba0> in <module>()
>>         ----> 1 len( occurrences.search(taxonKey=key,
>>         limit=200001)['results'])
>>
>>         /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/occurrences/search.pyc
>>         in search(taxonKey, scientificName, country,
>>         publishingCountry, hasCoordinate, typeStatus, recordNumber,
>>         lastInterpreted, continent, geometry, recordedBy,
>>         basisOfRecord, datasetKey, eventDate, catalogNumber, year,
>>         month, decimalLatitude, decimalLongitude, elevation, depth,
>>         institutionCode, collectionCode, hasGeospatialIssue, issue,
>>         q, mediatype, limit, offset, **kwargs)
>>             251         'collectionCode': collectionCode,
>>         'hasGeospatialIssue': hasGeospatialIssue,
>>             252         'issue': issue, 'q': q, 'mediatype':
>>         mediatype, 'limit': limit,
>>         --> 253         'offset': offset}, **kwargs)
>>             254     return out
>>
>>         /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/gbifutils.pyc
>>         in gbif_GET(url, args, **kwargs)
>>              17 def gbif_GET(url, args, **kwargs):
>>              18   out = requests.get(url, params=args,
>>         headers=make_ua(), **kwargs)
>>         ---> 19   out.raise_for_status()
>>              20 stopifnot(out.headers['content-type'])
>>              21   return out.json()
>>
>>         /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/requests/models.pyc
>>         in raise_for_status(self)
>>             842
>>             843         if http_error_msg:
>>         --> 844             raise HTTPError(http_error_msg,
>>         response=self)
>>             845
>>             846     def close(self):
>>
>>         HTTPError: 400 Client Error: Bad Request for url:
>>         http://api.gbif.org/v1/occurrence/search?taxonKey=2882316&limit=200001&offset=0
>>
>>         In [73]:
>>
>>
>>         On 02/06/2016 02:22, Scott Chamberlain wrote:
>>>         Fixes to the docs and a fix to download_get are up on GitHub
>>>         now. I will push a new version to PyPI soon. Let me know if
>>>         something still doesn't work for you after reinstalling from
>>>         GitHub.
>>>
>>>         S
>>>
>>>         On Wed, Jun 1, 2016 at 4:51 PM Nils Hempelmann
>>>         <info at nilshempelmann.de> wrote:
>>>
>>>             Hi Scott and Mauro
>>>
>>>             Mauro sent me a snippet of code which worked fine:
>>>             https://github.com/bird-house/flyingpigeon/blob/develop/scripts/pygbif_occurence.py
>>>
>>>             I found the example here:
>>>             https://github.com/maurobio/pygbif#occurrence-data
>>>
>>>             and here:
>>>             https://github.com/sckott/pygbif#occurrences-module
>>>
>>>             Thanks a lot, this is great. That enables a lot ;-)
>>>             I'll keep you posted.
>>>
>>>             merci
>>>             Nils
>>>
>>>
>>>
>>>
>>>
>>>             On 02/06/2016 01:45, Scott Chamberlain wrote:
>>>>             And where are those examples from exactly? I don't see
>>>>             them when searching the repo (which includes all
>>>>             docs).
>>>>
>>>>             `pygbif.name_suggest` wouldn't work because
>>>>             `name_suggest` is a method in the `species` module, so
>>>>             you'd have to do pygbif.species.name_suggest, or `from
>>>>             pygbif import species`, then `species.name_suggest`
>>>>
>>>>             Looks like occ.get(taxonKey = 252408386) is a
>>>>             documentation bug; that should be `key` instead of
>>>>             `taxonKey`, a copy-paste error. I will fix that.
>>>>
>>>>             The `occ.download_get` call has a small bug; I will fix
>>>>             that too.
>>>>
>>>>             All other calls work for me
>>>>
>>>>             S
>>>>
>>>>             On Wed, Jun 1, 2016 at 4:34 PM Scott Chamberlain
>>>>             <myrmecocystus at gmail.com> wrote:
>>>>
>>>>                 Hi,
>>>>
>>>>                 What version of pygbif are you on? And what version
>>>>                 of Python?
>>>>
>>>>                 Best, Scott
>>>>
>>>>                 On Wed, Jun 1, 2016 at 4:02 PM Nils Hempelmann
>>>>                 <info at nilshempelmann.de> wrote:
>>>>
>>>>                     Hi Mauro and Scott
>>>>
>>>>                     I was checking out pygbif; it seems to be a very
>>>>                     useful tool.
>>>>
>>>>                     Can you help me with the syntax (or forward me
>>>>                     to the appropriate person
>>>>                     ;-) )
>>>>                     The given snippets of code are outdated.
>>>>
>>>>                     I am basically just looking for the occurrence
>>>>                     coordinates.
>>>>                     Here is my first try:
>>>>
>>>>                     import pygbif
>>>>                     occ = pygbif.occurrences.search(scientificName='Fagus sylvatica')
>>>>                     occ['count']
>>>>
>>>>                     ... and further? ;-)
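
One possible next step, sketched under the assumption that each entry in
the `results` list of the search response is a dict that may carry
`decimalLatitude` and `decimalLongitude` fields. The helper name
`extract_coordinates` is hypothetical and not part of pygbif.

```python
def extract_coordinates(results):
    """Return (latitude, longitude) tuples for records that carry both values."""
    coords = []
    for record in results:
        lat = record.get("decimalLatitude")
        lon = record.get("decimalLongitude")
        # Skip records without a complete coordinate pair.
        if lat is not None and lon is not None:
            coords.append((lat, lon))
    return coords
```

With the search above this might be used as
`extract_coordinates(occ['results'])`.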
>>>>
>>>>                     The examples in the documentation throw errors:
>>>>
>>>>                     key= pygbif.name_suggest(q='Helianthus annuus',rank='species')['key']
>>>>                     pygbif.search(taxonKey=key[0]['key'],limit=2)
>>>>
>>>>                     from pygbif import occurrences as occ
>>>>                     occ.search(taxonKey = 3329049)
>>>>                     occ.get(taxonKey = 252408386)
>>>>                     occ.count(isGeoreferenced = True)
>>>>                     occ.download_list(user  = "sckott",  limit  =  5)
>>>>                     occ.download_meta(key  = "0000099-140929101555934")
>>>>                     occ.download_get("0000099-140929101555934")
>>>>
>>>>
>>>>                     Thanks
>>>>                     Nils
>>>>
>>>
>>
>
>
>
