[API-users] birdhouse meets GBIF

Scott Chamberlain scott at ropensci.org
Thu Jun 2 16:17:45 CEST 2016


Nils,

We do have some download API functions in pygbif, and there's a pull
request now to add the methods to request downloads. We'll add some
functionality to read Darwin Core dumps as well.

Scott

On Thu, Jun 2, 2016 at 6:06 AM Tim Robertson <trobertson at gbif.org> wrote:

> Hi Nils,
>
> It’s documented here:  http://www.gbif.org/developer/occurrence#download
> You POST a JSON doc with the query using HTTP basic authentication.
>
> If you need any help, please say so and we can provide some examples using
> curl or Python.
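The basic shape of such a request in Python might look like the sketch below. The predicate structure follows the download documentation linked above; the taxon key, user, and notification address are placeholder values, not something from this thread.

```python
import json

def build_download_request(creator, taxon_key):
    # A minimal GBIF download predicate: all occurrences of one taxon
    # that have coordinates. The taxon_key value is a placeholder.
    return {
        "creator": creator,
        "notification_address": [creator + "@example.org"],  # placeholder
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {"type": "equals", "key": "HAS_COORDINATE", "value": "true"},
            ],
        },
    }

def request_download(user, password, taxon_key):
    # POST the JSON doc with HTTP basic authentication, per
    # http://www.gbif.org/developer/occurrence#download
    import requests  # third-party: pip install requests
    resp = requests.post(
        "http://api.gbif.org/v1/occurrence/download/request",
        data=json.dumps(build_download_request(user, taxon_key)),
        headers={"Content-Type": "application/json"},
        auth=(user, password),
    )
    resp.raise_for_status()
    return resp.text  # the server replies with a download key
```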
>
> I am actually working on a revision to the mapping API, and will see what
> may be possible in terms of returning distinct locations.  It’s tricky to
> do in real time though.
> What kind of accuracy do you need please?
>
> Thanks,
> Tim
>
> From: Nils Hempelmann <info at nilshempelmann.de>
> Date: Thursday 2 June 2016 at 15:00
> To: Tim Robertson <trobertson at gbif.org>, "api-users at lists.gbif.org"
> <api-users at lists.gbif.org>, "wps-dev at lists.dkrz.de" <wps-dev at lists.dkrz.de>
> Subject: Re: [API-users] birdhouse meets GBIF
>
> Hi Tim
>
> Thanks for the quick answer. If not OGC service, how is the download API
> callable from outside of GBIF?
>
> Merci
> Nils
>
>
> On 02/06/2016 14:45, Tim Robertson wrote:
>
> Hi Nils
>
> We don’t have any OGC services, but there is an asynchronous download API
> which can deliver CSVs.
> Off the top of my head, the only way you can automate this at the moment
> would be to periodically issue a download (e.g. daily), process it as you
> see fit, and cache the result for your app.
>
> In the download API you can get any size from 1 to over 660 million records,
> which is why it is asynchronous.  It’s used a lot by various applications
> and communities.
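Because the download is asynchronous, a client has to poll until the download is ready before fetching the file. A sketch of that loop, assuming pygbif's `download_meta` (which appears later in this thread); the status strings are assumptions about what the API reports:

```python
import time

def wait_for_download(key, get_meta=None, poll_seconds=60, max_polls=120):
    # Poll an asynchronous GBIF download until it finishes. get_meta
    # defaults to pygbif's download_meta but is injectable so the loop
    # can be exercised without network access. The status values below
    # are illustrative assumptions, not confirmed in this thread.
    if get_meta is None:
        from pygbif import occurrences  # third-party: pip install pygbif
        get_meta = occurrences.download_meta
    for _ in range(max_polls):
        status = get_meta(key).get("status")
        if status == "SUCCEEDED":
            return True
        if status in ("FAILED", "KILLED", "CANCELLED"):
            return False
        time.sleep(poll_seconds)
    raise TimeoutError("download %s still not finished" % key)
```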
>
> I hope this helps,
> Tim
>
>
> From: API-users <api-users-bounces at lists.gbif.org> on behalf of
> Nils Hempelmann <info at nilshempelmann.de>
> Date: Thursday 2 June 2016 at 14:25
> To: "api-users at lists.gbif.org" <api-users at lists.gbif.org>,
> "wps-dev at lists.dkrz.de" <wps-dev at lists.dkrz.de>
> Subject: [API-users] birdhouse meets GBIF
>
> Dear all
>
> Here is a current problem to solve :-) for the birdhouse WPS and GBIF
> developers.
> There is a species distribution process in the birdhouse WPS that processes
> climate data based on occurrence coordinates from GBIF.
>
> Currently the GBIF data are passed to the process via a URL to the GBIF
> CSV (which has to be generated manually beforehand).
> I implemented a direct occurrence search from birdhouse to GBIF with
> pygbif. It worked fine, BUT there is a limit of 300 records per request,
> which is far too few to train the species distribution model.
>
> So here is the question: is there a way to request the species occurrence
> coordinates somehow directly (e.g. like a WPS request)?
>
> (The previous conversation follows below.)
>
> And some links for background information:
>
> Birdhouse:
> http://birdhouse.readthedocs.io/en/latest/
> GBIF:
> http://www.gbif.org/
>
> Species distribution process:
> http://flyingpigeon.readthedocs.io/en/latest/tutorials/sdm.html
>
> Birdhouse architecture:
>
> https://github.com/bird-house/birdhouse-docs/blob/master/slides/birdhouse-architecture/birdhouse-architecture.pdf
>
> Merci
> Nils
>
> -------- Forwarded Message --------
> Subject: Re: pygbif for occurrence coordinates
> Date: Thu, 2 Jun 2016 09:00:49 -0300
> From: Mauro Cavalcanti <maurobio at gmail.com>
> To: Nils Hempelmann <info at nilshempelmann.de>
> CC: wps-dev at lists.dkrz.de, Carsten Ehbrecht <ehbrecht at dkrz.de>,
> Wolfgang.Falk at lwf.bayern.de, Scott Chamberlain <myrmecocystus at gmail.com>
>
> Nils,
>
> No, there is no other way to get a larger number of records at once using
> the GBIF API (although one could achieve this by sequentially
> querying the server in "batches" of 200 or 300 records each). As I said,
> there are good operational reasons for the GBIF developers to have imposed
> such a limit.
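Such sequential batching can be sketched as follows. `fetch_all_occurrences` is a hypothetical helper, not part of pygbif; it assumes the per-request cap of 300 seen earlier in this thread, and any server-side ceiling on the total still applies.

```python
def fetch_all_occurrences(taxon_key, max_records=3000, fetch=None):
    # Page through GBIF occurrence search results in batches of up to
    # 300 records (the apparent per-request maximum), advancing the
    # offset each time. fetch defaults to pygbif.occurrences.search but
    # is injectable so the loop can be tested without network access.
    if fetch is None:
        from pygbif import occurrences  # third-party: pip install pygbif
        fetch = occurrences.search
    results, offset = [], 0
    while len(results) < max_records:
        page = fetch(taxonKey=taxon_key, limit=300, offset=offset)
        batch = page.get("results", [])
        if not batch:
            break  # server has no more records for this query
        results.extend(batch)
        offset += len(batch)
    return results[:max_records]
```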
>
> As for your other question, it would be better put to the GBIF developers
> themselves (on the discussion list, so that we can all benefit from the
> answers! :-))
>
> With warmest regards,
>
> --
> Dr. Mauro J. Cavalcanti
> E-mail: maurobio at gmail.com
> Web: http://sites.google.com/site/maurobio
> On 02/06/2016 08:49, "Nils Hempelmann" <info at nilshempelmann.de> wrote:
>
>> Hi Mauro
>>
>> Oh ...
>> Is there no way to set it to unlimited?
>> The 'manually via browser' option with parsing of the returned CSV is the
>> current status.
>> Or is there any alternative to pygbif?
>>
>> If I understood correctly, the GBIF database is exposed as a web service,
>> so I guess there should be a way to connect it to the birdhouse WPS, am I
>> right?
>>
>> (I have put the wps-dev list in copy.)
>>
>> Merci
>> Nils
>>
>>
>> On 02/06/2016 12:40, Mauro Cavalcanti wrote:
>>
>> Nils,
>>
>> That's a limit imposed (for good operational reasons) by the GBIF API. If
>> you want a larger number of records, you'll have to download them
>> "manually" (that is, via browser) and then locally parse the CSV file
>> returned by the GBIF server.
>>
>> Hope this helps.
>>
>> Best regards,
>>
>> --
>> Dr. Mauro J. Cavalcanti
>> E-mail: maurobio at gmail.com
>> Web: http://sites.google.com/site/maurobio
>> On 02/06/2016 04:48, "Nils Hempelmann" <info at nilshempelmann.de> wrote:
>>
>>> Hi Scott
>>>
>>> It works fine, thanks a lot. I just have a question about the search limits:
>>>
>>> The maximum number of records seems to be limited to 300. Is that on purpose?
>>> And requesting more than 200000 gives a 'Bad Request' error:
>>>
>>> Merci
>>> Nils
>>>
>>>
>>> In [68]: len( occurrences.search(taxonKey=key, limit=100)['results'])
>>> Out[68]: 100
>>>
>>> In [69]: len( occurrences.search(taxonKey=key, limit=300)['results'])
>>> Out[69]: 300
>>>
>>> In [70]: len( occurrences.search(taxonKey=key, limit=3000)['results'])
>>> Out[70]: 300
>>>
>>> In [71]: len( occurrences.search(taxonKey=key, limit=200000)['results'])
>>> Out[71]: 300
>>>
>>> In [72]: len( occurrences.search(taxonKey=key, limit=200001)['results'])
>>>
>>> ---------------------------------------------------------------------------
>>> HTTPError                                 Traceback (most recent call
>>> last)
>>> <ipython-input-72-2f7d7b4ccba0> in <module>()
>>> ----> 1 len( occurrences.search(taxonKey=key, limit=200001)['results'])
>>>
>>> /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/occurrences/search.pyc
>>> in search(taxonKey, scientificName, country, publishingCountry,
>>> hasCoordinate, typeStatus, recordNumber, lastInterpreted, continent,
>>> geometry, recordedBy, basisOfRecord, datasetKey, eventDate, catalogNumber,
>>> year, month, decimalLatitude, decimalLongitude, elevation, depth,
>>> institutionCode, collectionCode, hasGeospatialIssue, issue, q, mediatype,
>>> limit, offset, **kwargs)
>>>     251         'collectionCode': collectionCode, 'hasGeospatialIssue':
>>> hasGeospatialIssue,
>>>     252         'issue': issue, 'q': q, 'mediatype': mediatype, 'limit':
>>> limit,
>>> --> 253         'offset': offset}, **kwargs)
>>>     254     return out
>>>
>>> /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/gbifutils.pyc
>>> in gbif_GET(url, args, **kwargs)
>>>      17 def gbif_GET(url, args, **kwargs):
>>>      18   out = requests.get(url, params=args, headers=make_ua(),
>>> **kwargs)
>>> ---> 19   out.raise_for_status()
>>>      20   stopifnot(out.headers['content-type'])
>>>      21   return out.json()
>>>
>>> /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/requests/models.pyc
>>> in raise_for_status(self)
>>>     842
>>>     843         if http_error_msg:
>>> --> 844             raise HTTPError(http_error_msg, response=self)
>>>     845
>>>     846     def close(self):
>>>
>>> HTTPError: 400 Client Error: Bad Request for url:
>>> http://api.gbif.org/v1/occurrence/search?taxonKey=2882316&limit=200001&offset=0
>>>
>>> In [73]:
>>>
>>>
>>> On 02/06/2016 02:22, Scott Chamberlain wrote:
>>>
>>> Fixes to the docs and a fix to download_get are up on GitHub now. Will
>>> push a new version to PyPI soon. Let me know if some things still don't
>>> work for you after reinstalling from GitHub.
>>>
>>> S
>>>
>>> On Wed, Jun 1, 2016 at 4:51 PM Nils Hempelmann
>>> <info at nilshempelmann.de> wrote:
>>>
>>>> Hi Scott and Mauro
>>>>
>>>> Mauro sent me a snippet of code which worked fine:
>>>>
>>>> https://github.com/bird-house/flyingpigeon/blob/develop/scripts/pygbif_occurence.py
>>>>
>>>> I found the example here:
>>>> https://github.com/maurobio/pygbif#occurrence-data
>>>>
>>>> and here:
>>>> https://github.com/sckott/pygbif#occurrences-module
>>>>
>>>> Thanks a lot, very great. That enables a lot ;-)
>>>> I'll keep you posted.
>>>>
>>>> merci
>>>> Nils
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 02/06/2016 01:45, Scott Chamberlain wrote:
>>>>
>>>> And where are those examples from exactly? I don't see those examples
>>>> when searching the repo (which includes all docs).
>>>>
>>>> `pygbif.name_suggest` wouldn't work because `name_suggest` is a method
>>>> in the `species` module, so you'd have to do `pygbif.species.name_suggest`,
>>>> or `from pygbif import species` and then `species.name_suggest`.
>>>>
>>>> Looks like `occ.get(taxonKey = 252408386)` is a documentation bug; that
>>>> should be `key` instead of `taxonKey`, a copy-paste error. Will fix that.
>>>>
>>>> The `occ.download_get` call has a small bug; will fix that too.
>>>>
>>>> All other calls work for me
>>>>
>>>> S
>>>>
>>>> On Wed, Jun 1, 2016 at 4:34 PM Scott Chamberlain
>>>> <myrmecocystus at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> What version of pygbif are you on? And what version of Python?
>>>>>
>>>>> Best, Scott
>>>>>
>>>>> On Wed, Jun 1, 2016 at 4:02 PM Nils Hempelmann
>>>>> <info at nilshempelmann.de> wrote:
>>>>>
>>>>>> Hi Mauro and Scott
>>>>>>
>>>>>> I was checking out pygbif; it seems to be a very useful tool.
>>>>>>
>>>>>> Can you help me with the syntax (or forward me to the appropriate
>>>>>> person ;-))?
>>>>>> The given snippets of code seem to be outdated.
>>>>>>
>>>>>> I am basically just looking for the occurrence coordinates.
>>>>>> Here is my first try:
>>>>>>
>>>>>> import pygbif
>>>>>> occ = pygbif.occurrences.search(scientificName='Fagus sylvatica')
>>>>>> occ['count']
>>>>>>
>>>>>> ... and further? ;-)
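Going from one page of search results to coordinate pairs can be sketched like this; `extract_coordinates` is a hypothetical helper, relying only on the `decimalLatitude`/`decimalLongitude` fields of each record:

```python
def extract_coordinates(search_result):
    # Pull (latitude, longitude) pairs out of one page of pygbif
    # occurrence search results; records lacking coordinates are skipped.
    coords = []
    for rec in search_result.get("results", []):
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")
        if lat is not None and lon is not None:
            coords.append((lat, lon))
    return coords
```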
>>>>>>
>>>>>> The examples in the docs are throwing errors:
>>>>>>
>>>>>> key = pygbif.name_suggest(q='Helianthus annuus', rank='species')['key']
>>>>>> pygbif.search(taxonKey=key[0]['key'], limit=2)
>>>>>>
>>>>>> from pygbif import occurrences as occ
>>>>>> occ.search(taxonKey = 3329049)
>>>>>> occ.get(taxonKey = 252408386)
>>>>>> occ.count(isGeoreferenced = True)
>>>>>> occ.download_list(user = "sckott", limit = 5)
>>>>>> occ.download_meta(key = "0000099-140929101555934")
>>>>>> occ.download_get("0000099-140929101555934")
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Nils
>>>>>>
>>>>>>
>>>>
>>>
>>
>
>
> _______________________________________________
> API-users mailing list
> API-users at lists.gbif.org
> http://lists.gbif.org/mailman/listinfo/api-users
>