[API-users] birdhouse meets GBIF

Scott Chamberlain scott at ropensci.org
Fri Jun 3 19:10:54 CEST 2016


Hi Nico,

Of course, I'd rather use something that works already than write my own
code. I'll let you know if I have any questions.

Best, Scott

On Fri, Jun 3, 2016 at 12:41 AM Nicolas Noé <n.noe at biodiversity.be> wrote:

> Hey Scott,
>
> About the functionality to read Darwin Core dumps: don't hesitate to
> interface with python-dwca-reader
> <https://github.com/BelgianBiodiversityPlatform/python-dwca-reader> to
> avoid reinventing the wheel. It was written exactly for this purpose, is
> now rather mature, and is used by several groups of users.
>
> I'm also always willing to put in some effort to make its
> integration with projects like pygbif easier. Just file a GitHub issue or
> send me a mail for any clarification, bug, or missing feature.
>
> Best,
>
> Nico
>
> On 2/06/16 16:17, Scott Chamberlain wrote:
>
> Nils,
>
> We do have some download API functions in pygbif, and there's a pull
> request now to add the methods to request downloads. We'll add some
> functionality to read Darwin Core dumps as well.
>
> Scott
>
> On Thu, Jun 2, 2016 at 6:06 AM Tim Robertson <trobertson at gbif.org> wrote:
>
>> Hi Nils,
>>
>> It’s documented here: http://www.gbif.org/developer/occurrence#download
>> You POST a JSON doc with the query using HTTP basic authentication.
>>
>> If you need any help, please say so and we can provide some examples using
>> curl or Python.
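
A minimal sketch of the request described above (a POST of a JSON query document with HTTP basic authentication), using only the Python standard library. The endpoint path and the JSON field names (`creator`, `notification_address`, `predicate`) are assumptions that do not come from this thread and should be checked against the developer page linked above. The request is only built here, not sent.

```python
import base64
import json
import urllib.request

# Assumed endpoint; verify against the GBIF developer documentation.
DOWNLOAD_URL = "http://api.gbif.org/v1/occurrence/download/request"


def build_download_request(user, password, email, taxon_key):
    """Build (but do not send) a POST request for an occurrence download."""
    # Assumed shape of the JSON query document.
    query = {
        "creator": user,
        "notification_address": [email],
        "predicate": {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
    }
    # HTTP basic authentication: base64("user:password") in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        DOWNLOAD_URL,
        data=json.dumps(query).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


# Placeholder credentials; the taxon key is the one used later in this thread.
req = build_download_request("some_user", "some_password", "me@example.org", 2882316)
print(req.get_method(), req.full_url)
```

Sending is left to the caller, for example with `urllib.request.urlopen(req)`. Because the download is asynchronous, the file itself is not expected in the response to this request.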
>>
>> I am actually working on a revision to the mapping API, and will see what
>> may be possible for returning distinct locations.  It’s tricky to do in real
>> time though.
>> What kind of accuracy do you need?
>>
>> Thanks,
>> Tim
>>
>> From: Nils Hempelmann <info at nilshempelmann.de>
>> Date: Thursday 2 June 2016 at 15:00
>> To: Tim Robertson <trobertson at gbif.org>, "api-users at lists.gbif.org"
>> <api-users at lists.gbif.org>, "wps-dev at lists.dkrz.de" <wps-dev at lists.dkrz.de>
>> Subject: Re: [API-users] birdhouse meets GBIF
>>
>> Hi Tim
>>
>> Thanks for the quick answer. If there is no OGC service, how can the
>> download API be called from outside GBIF?
>>
>> Merci
>> Nils
>>
>>
>> On 02/06/2016 14:45, Tim Robertson wrote:
>>
>> Hi Nils
>>
>> We don’t have any OGC services, but there is an asynchronous download API
>> which can deliver CSVs.
>> Off the top of my head, the only way you can automate this at the moment
>> would be to periodically issue a download (e.g. daily), process it as you
>> see fit, and cache the result for your app.
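
The "download periodically and cache" idea could be sketched roughly as below. `fetch` is a hypothetical callable standing in for whatever code requests a GBIF download and returns the finished CSV as bytes. The one-day maximum age follows the "e.g. daily" suggestion above.

```python
import os
import tempfile
import time

ONE_DAY = 24 * 60 * 60  # seconds


def is_stale(path, max_age=ONE_DAY, now=None):
    """Return True if the cached file is missing or older than max_age seconds."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > max_age


def cached_csv(path, fetch, max_age=ONE_DAY):
    """Return path, calling fetch() to refresh the file only when it is stale.

    fetch is a hypothetical callable returning the CSV content as bytes.
    """
    if is_stale(path, max_age):
        tmp = path + ".tmp"
        with open(tmp, "wb") as handle:
            handle.write(fetch())
        os.replace(tmp, path)  # swap in one step so readers never see a partial file
    return path


# Demo with a stand-in fetch function instead of a real GBIF download.
cache_dir = tempfile.mkdtemp()
path = cached_csv(os.path.join(cache_dir, "occurrences.csv"), lambda: b"gbifid\n1\n")
print(is_stale(path))
```

The application then always reads from the cached file, and only the first caller after the file has expired pays for a new download.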
>>
>> In the download API you can get any size from 1 - > 660 million records,
>> which is why it is asynchronous.  It’s used a lot by various applications
>> and communities.
>>
>> I hope this helps,
>> Tim
>>
>>
>> From: API-users <api-users-bounces at lists.gbif.org> on behalf of Nils
>> Hempelmann <info at nilshempelmann.de>
>> Date: Thursday 2 June 2016 at 14:25
>> To: "api-users at lists.gbif.org" <api-users at lists.gbif.org>,
>> "wps-dev at lists.dkrz.de" <wps-dev at lists.dkrz.de>
>> Subject: [API-users] birdhouse meets GBIF
>>
>> Dear all
>>
>> Here is a current problem to solve :-) for the birdhouse WPS and GBIF
>> developers.
>> There is a species distribution process in the birdhouse WPS that processes
>> climate data based on occurrence coordinates from GBIF.
>>
>> Currently the GBIF data are passed to the process via the URL of a GBIF
>> CSV (which has to be generated manually beforehand).
>> I was implementing a direct occurrence search from birdhouse to GBIF with
>> pygbif. It worked fine, BUT there is a limit of 300 records, which is far too
>> few to train the species distribution model.
>>
>> So here is the question: Is there a way to request the species occurrence
>> coordinates somehow directly (e.g. like a WPS request)?
>>
>> (The previous conversation follows.)
>>
>> And some links for background information:
>>
>> Birdhouse:
>> http://birdhouse.readthedocs.io/en/latest/
>> GBIF:
>> http://www.gbif.org/
>>
>> Species distribution process:
>> http://flyingpigeon.readthedocs.io/en/latest/tutorials/sdm.html
>>
>> Birdhouse architecture:
>>
>> https://github.com/bird-house/birdhouse-docs/blob/master/slides/birdhouse-architecture/birdhouse-architecture.pdf
>>
>> Merci
>> Nils
>>
>> -------- Forwarded Message -------
>> Subject: Re: pygbif for occurrence coordinates
>> Date: Thu, 2 Jun 2016 09:00:49 -0300
>> From: Mauro Cavalcanti <maurobio at gmail.com>
>> To: Nils Hempelmann <info at nilshempelmann.de>
>> CC: wps-dev at lists.dkrz.de, Carsten Ehbrecht <ehbrecht at dkrz.de>,
>> Wolfgang.Falk at lwf.bayern.de, Scott Chamberlain <myrmecocystus at gmail.com>
>>
>> Nils,
>>
>> No, there is no other way to get a larger number of records at once using
>> the GBIF API (although one could achieve this by sequentially
>> querying the server in "batches" of 200 or 300 records each). As I said,
>> there are good operational reasons for the GBIF developers to have imposed
>> such a limit.
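
The batch approach mentioned above could look roughly like the sketch below, which pages through results with the `limit` and `offset` parameters of the search call. `search` is any callable with the same keyword interface as pygbif's `occurrences.search`. The `endOfRecords` flag is an assumption about the response format, and the loop also stops on a short or empty page in case the flag is absent.

```python
def fetch_all(search, page_size=300, max_records=None, **query):
    """Collect records by repeatedly calling search with limit/offset.

    search must accept limit and offset keyword arguments and return a
    dict with a 'results' list, like pygbif's occurrences.search.
    """
    records = []
    offset = 0
    while max_records is None or len(records) < max_records:
        page = search(limit=page_size, offset=offset, **query)
        results = page.get("results", [])
        records.extend(results)
        offset += len(results)
        # Stop on an empty or short page, or when the server reports the end.
        # 'endOfRecords' is assumed to be part of the response.
        if len(results) < page_size or page.get("endOfRecords", False):
            break
    return records if max_records is None else records[:max_records]


# Intended use with pygbif (not executed here, needs network access):
# from pygbif import occurrences
# records = fetch_all(occurrences.search, taxonKey=2882316, max_records=1000)
```

Each page is a separate HTTP request, so this is slow for large result sets. The 400 error shown further down in this thread also suggests the search API rejects values beyond 200000, so the asynchronous download API is probably the better route for large extracts.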
>>
>> As for your other question, it would be better put to the GBIF developers
>> themselves (on the discussion list, so that we can all benefit from the
>> answers! :-))
>>
>> With warmest regards,
>>
>> --
>> Dr. Mauro J. Cavalcanti
>> E-mail: maurobio at gmail.com
>> Web: http://sites.google.com/site/maurobio
>> On 02/06/2016 08:49, "Nils Hempelmann" <info at nilshempelmann.de> wrote:
>>
>>> Hi Mauro
>>>
>>> Oh ...
>>> No way to set it to unlimited?
>>> The 'manual via browser' option and parsing the returned CSV is the
>>> current status.
>>> Or is there any alternative to pygbif?
>>>
>>> If I understood correctly, the GBIF database is organized as a web server,
>>> so I guess there should be a way to connect it to the birdhouse WPS, am I
>>> right?
>>>
>>> (put the web-dev list in copy)
>>>
>>> Merci
>>> Nils
>>>
>>>
>>> On 02/06/2016 12:40, Mauro Cavalcanti wrote:
>>>
>>> Nils,
>>>
>>> That's a limit imposed (for good operational reasons) by the GBIF API.
>>> If you want a larger number of records, you'll have to download them
>>> "manually" (that is, via browser) and then parse locally the csv file
>>> returned from the GBIF server.
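
Parsing the downloaded file locally for coordinates might look like the sketch below. The tab delimiter and the column names `decimallatitude` and `decimallongitude` are assumptions about the download format, so check the header of the actual file. The sample data is made up.

```python
import csv
import io


def read_coordinates(handle, lat_col="decimallatitude",
                     lon_col="decimallongitude", delimiter="\t"):
    """Yield (latitude, longitude) floats from a delimited occurrence file.

    Column names and delimiter are assumptions; adjust them to the file.
    """
    for row in csv.DictReader(handle, delimiter=delimiter):
        lat, lon = row.get(lat_col), row.get(lon_col)
        if not lat or not lon:
            continue  # skip records without coordinates
        try:
            yield float(lat), float(lon)
        except ValueError:
            continue  # skip values that are not numeric


# Made-up sample standing in for a downloaded file.
sample = io.StringIO(
    "gbifid\tspecies\tdecimallatitude\tdecimallongitude\n"
    "1\tFagus sylvatica\t48.1\t11.5\n"
    "2\tFagus sylvatica\t\t\n"
    "3\tFagus sylvatica\t50.9\t4.4\n"
)
coords = list(read_coordinates(sample))
print(coords)  # [(48.1, 11.5), (50.9, 4.4)]
```

For a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, newline="", encoding="utf-8")`.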
>>>
>>> Hope this helps.
>>>
>>> Best regards,
>>>
>>> --
>>> Dr. Mauro J. Cavalcanti
>>> E-mail: maurobio at gmail.com
>>> Web: http://sites.google.com/site/maurobio
>>> On 02/06/2016 04:48, "Nils Hempelmann" <info at nilshempelmann.de>
>>> wrote:
>>>
>>>> Hi Scott
>>>>
>>>> Works fine. Thanks a lot. I just have a question about the search limits:
>>>>
>>>> The maximum number of records seems to be limited to 300. Is that on purpose?
>>>> And requesting more than 200000 gives a 'bad request'.
>>>>
>>>> Merci
>>>> Nils
>>>>
>>>>
>>>> In [68]: len( occurrences.search(taxonKey=key, limit=100)['results'])
>>>> Out[68]: 100
>>>>
>>>> In [69]: len( occurrences.search(taxonKey=key, limit=300)['results'])
>>>> Out[69]: 300
>>>>
>>>> In [70]: len( occurrences.search(taxonKey=key, limit=3000)['results'])
>>>> Out[70]: 300
>>>>
>>>> In [71]: len( occurrences.search(taxonKey=key, limit=200000)['results'])
>>>> Out[71]: 300
>>>>
>>>> In [72]: len( occurrences.search(taxonKey=key, limit=200001)['results'])
>>>>
>>>> ---------------------------------------------------------------------------
>>>> HTTPError                                 Traceback (most recent call last)
>>>> <ipython-input-72-2f7d7b4ccba0> in <module>()
>>>> ----> 1 len( occurrences.search(taxonKey=key, limit=200001)['results'])
>>>>
>>>> /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/occurrences/search.pyc
>>>> in search(taxonKey, scientificName, country, publishingCountry,
>>>> hasCoordinate, typeStatus, recordNumber, lastInterpreted, continent,
>>>> geometry, recordedBy, basisOfRecord, datasetKey, eventDate, catalogNumber,
>>>> year, month, decimalLatitude, decimalLongitude, elevation, depth,
>>>> institutionCode, collectionCode, hasGeospatialIssue, issue, q, mediatype,
>>>> limit, offset, **kwargs)
>>>>     251         'collectionCode': collectionCode, 'hasGeospatialIssue':
>>>> hasGeospatialIssue,
>>>>     252         'issue': issue, 'q': q, 'mediatype': mediatype,
>>>> 'limit': limit,
>>>> --> 253         'offset': offset}, **kwargs)
>>>>     254     return out
>>>>
>>>> /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/gbifutils.pyc
>>>> in gbif_GET(url, args, **kwargs)
>>>>      17 def gbif_GET(url, args, **kwargs):
>>>>      18   out = requests.get(url, params=args, headers=make_ua(),
>>>> **kwargs)
>>>> ---> 19   out.raise_for_status()
>>>>      20   stopifnot(out.headers['content-type'])
>>>>      21   return out.json()
>>>>
>>>> /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/requests/models.pyc
>>>> in raise_for_status(self)
>>>>     842
>>>>     843         if http_error_msg:
>>>> --> 844             raise HTTPError(http_error_msg, response=self)
>>>>     845
>>>>     846     def close(self):
>>>>
>>>> HTTPError: 400 Client Error: Bad Request for url:
>>>> http://api.gbif.org/v1/occurrence/search?taxonKey=2882316&limit=200001&offset=0
>>>>
>>>> In [73]:
>>>>
>>>>
>>>> On 02/06/2016 02:22, Scott Chamberlain wrote:
>>>>
>>>> Fixes to docs and fix to download_get are up on GitHub now. Will push a
>>>> new version to PyPI soon. Let me know if something still doesn't work for
>>>> you after reinstalling from GitHub.
>>>>
>>>> S
>>>>
>>>> On Wed, Jun 1, 2016 at 4:51 PM Nils Hempelmann <info at nilshempelmann.de>
>>>> wrote:
>>>>
>>>>> Hi Scott and Mauro
>>>>>
>>>>> Mauro sent me a snippet of code which worked fine:
>>>>>
>>>>> https://github.com/bird-house/flyingpigeon/blob/develop/scripts/pygbif_occurence.py
>>>>>
>>>>> I found the example here:
>>>>> https://github.com/maurobio/pygbif#occurrence-data
>>>>>
>>>>> and here:
>>>>> https://github.com/sckott/pygbif#occurrences-module
>>>>>
>>>>> Thanks a lot, that's great. It enables a lot ;-)
>>>>> I'll keep you posted.
>>>>>
>>>>> merci
>>>>> Nils
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 02/06/2016 01:45, Scott Chamberlain wrote:
>>>>>
>>>>> And where are those examples from exactly? I don't see those examples
>>>>> when searching the repo (which includes all docs).
>>>>>
>>>>> `pygbif.name_suggest` wouldn't work because `name_suggest` is a method
>>>>> in the `species` module, so you'd have to do pygbif.species.name_suggest,
>>>>> or `from pygbif import species`, then `species.name_suggest`
>>>>>
>>>>> Looks like occ.get(taxonKey = 252408386) is a documentation bug;
>>>>> that should be `key` instead of `taxonKey`, a copy-paste error. Will fix
>>>>> that.
>>>>>
>>>>> The `occ.download_get` call has a small bug; will fix that.
>>>>>
>>>>> All other calls work for me
>>>>>
>>>>> S
>>>>>
>>>>> On Wed, Jun 1, 2016 at 4:34 PM Scott Chamberlain <
>>>>> myrmecocystus at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> What version of pygbif are you on? And what version of Python?
>>>>>>
>>>>>> Best, Scott
>>>>>>
>>>>>> On Wed, Jun 1, 2016 at 4:02 PM Nils Hempelmann <
>>>>>> info at nilshempelmann.de> wrote:
>>>>>>
>>>>>>> Hi Mauro and Scott
>>>>>>>
>>>>>>> I was checking out pygbif. It seems to be a very useful tool.
>>>>>>>
>>>>>>> Can you help me with the syntax (or forward me to the appropriate
>>>>>>> person ;-) )?
>>>>>>> The given snippets of code are outdated.
>>>>>>>
>>>>>>> I am basically just looking for the occurrence coordinates.
>>>>>>> Here is my first try:
>>>>>>>
>>>>>>> import pygbif
>>>>>>> occ = pygbif.occurrences.search(scientificName='Fagus sylvatica')
>>>>>>> occ['count']
>>>>>>>
>>>>>>> ... and further? ;-)
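
One way to go from the search result to coordinates is sketched below. It assumes the result is a dict with a `results` list whose records may carry `decimalLatitude` and `decimalLongitude` keys. The `occ` dict here is a hand-written stand-in, not real GBIF output.

```python
def coordinates(search_result):
    """Return (latitude, longitude) pairs from an occurrence search result.

    Assumes each record in search_result['results'] is a dict that may
    carry 'decimalLatitude' and 'decimalLongitude' keys.
    """
    pairs = []
    for record in search_result.get("results", []):
        lat = record.get("decimalLatitude")
        lon = record.get("decimalLongitude")
        if lat is not None and lon is not None:
            pairs.append((lat, lon))
    return pairs


# Hand-written stand-in for the dict returned by occurrences.search(...).
occ = {
    "count": 3,
    "results": [
        {"key": 1, "decimalLatitude": 48.1, "decimalLongitude": 11.5},
        {"key": 2},  # record without coordinates
        {"key": 3, "decimalLatitude": 50.9, "decimalLongitude": 4.4},
    ],
}
print(coordinates(occ))  # [(48.1, 11.5), (50.9, 4.4)]
```

The search function also lists a `hasCoordinate` parameter in its signature, which could be used to leave out records without coordinates on the server side.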
>>>>>>>
>>>>>>> The examples in the docs are throwing errors:
>>>>>>>
>>>>>>> key = pygbif.name_suggest(q='Helianthus annuus', rank='species')['key']
>>>>>>> pygbif.search(taxonKey=key[0]['key'], limit=2)
>>>>>>>
>>>>>>> from pygbif import occurrences as occ
>>>>>>> occ.search(taxonKey = 3329049)
>>>>>>> occ.get(taxonKey = 252408386)
>>>>>>> occ.count(isGeoreferenced = True)
>>>>>>> occ.download_list(user = "sckott", limit = 5)
>>>>>>> occ.download_meta(key = "0000099-140929101555934")
>>>>>>> occ.download_get("0000099-140929101555934")
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Nils
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> _______________________________________________
>> API-users mailing list
>> API-users at lists.gbif.org
>> http://lists.gbif.org/mailman/listinfo/api-users
>>
>
>