[API-users] birdhouse meets GBIF

Nils Hempelmann info at nilshempelmann.de
Mon Jun 6 15:16:29 CEST 2016


Hi Scott et al.

I implemented pygbif in the SDM process in birdhouse. Since all 
required dependencies there are fetched as conda packages, Carsten (from 
DKRZ) built a conda package for it.
For those who use Anaconda (https://www.continuum.io/why-anaconda), it 
can be installed with

conda install -c birdhouse pygbif
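For the 300-records-per-request cap discussed below, paging with `offset` is one workaround. Here is a minimal sketch; the `fetch_page` parameter is a hypothetical injection point standing in for a wrapper around pygbif's `occurrences.search`, so the paging logic can be shown without a live query:

```python
# Page through GBIF occurrence results in batches of up to 300 records
# (the per-request maximum discussed in this thread), collecting the
# decimalLatitude/decimalLongitude pairs the SDM process needs.
# `fetch_page` is a hypothetical parameter: pass a function that takes
# limit/offset and returns one page of the search response dict.
def fetch_coordinates(fetch_page, max_records=900, page_size=300):
    coords = []
    offset = 0
    while offset < max_records:
        resp = fetch_page(limit=page_size, offset=offset)
        results = resp.get('results', [])
        if not results:
            break
        for rec in results:
            lat = rec.get('decimalLatitude')
            lon = rec.get('decimalLongitude')
            if lat is not None and lon is not None:
                coords.append((lat, lon))
        offset += len(results)
        if resp.get('endOfRecords', False):
            break
    return coords
```

With pygbif this could be wired up as, e.g., `fetch_coordinates(lambda **kw: occurrences.search(taxonKey=key, **kw))`.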

Best
Nils


On 02/06/2016 16:17, Scott Chamberlain wrote:
> Nils,
>
> We do have some download API functions in pygbif, and there's a pull 
> request now to add the methods to request downloads. We'll add some 
> functionality to read darwin core dumps as well.
>
> Scott
>
> On Thu, Jun 2, 2016 at 6:06 AM Tim Robertson <trobertson at gbif.org> wrote:
>
>     Hi Nils,
>
>     It’s documented here:
>     http://www.gbif.org/developer/occurrence#download
>     You POST a JSON doc with the query using HTTP basic authentication.
>
>     If you need any help, please say and we can provide some examples
>     using CURL or python.
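Pending those examples, a rough Python sketch of the documented flow (the endpoint and predicate shape are taken from the developer page above; USER/PASS and the predicate values are placeholders). The request is only prepared, not sent:

```python
# Build (without sending) a GBIF occurrence download request: a JSON
# predicate POSTed with HTTP basic authentication, as the developer
# docs describe. Sending it would return a download key that can then
# be polled until the archive is ready.
import json
import requests

def build_download_request(user, password, predicate):
    req = requests.Request(
        method='POST',
        url='http://api.gbif.org/v1/occurrence/download/request',
        auth=(user, password),  # HTTP basic authentication
        headers={'Content-Type': 'application/json'},
        data=json.dumps({'creator': user, 'predicate': predicate}),
    )
    return req.prepare()  # session.send(prepared) would submit it

prepared = build_download_request(
    'USER', 'PASS',
    {'type': 'equals', 'key': 'TAXON_KEY', 'value': '2882316'})
```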
>
>     I am actually working on a revision to the mapping API, and will
>     see what may be possible for returning distinct locations.  It's
>     tricky to do in real time, though.
>     What kind of accuracy do you need?
>
>     Thanks,
>     Tim
>
>     From: Nils Hempelmann <info at nilshempelmann.de>
>     Date: Thursday 2 June 2016 at 15:00
>     To: Tim Robertson <trobertson at gbif.org>, api-users at lists.gbif.org,
>     wps-dev at lists.dkrz.de
>     Subject: Re: [API-users] birdhouse meets GBIF
>
>     Hi Tim
>
>     Thanks for the quick answer. If not an OGC service, how is the
>     download API callable from outside of GBIF?
>
>     Merci
>     Nils
>
>
>     On 02/06/2016 14:45, Tim Robertson wrote:
>>     Hi Nils
>>
>>     We don’t have any OGC services, but there is an asynchronous
>>     download API which can deliver CSVs.
>>     Off the top of my head, the only way you can automate this at the
>>     moment would be to periodically issue a download (e.g. daily),
>>     process it as you see fit, and cache the result for your app.
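That periodic download-and-cache approach can be sketched minimally as follows (the cache path, TTL, and `fetch_fn` are all illustrative; `fetch_fn` stands in for a real client of the download API):

```python
import os
import time

CACHE_TTL = 24 * 60 * 60  # refresh daily, as suggested above

def cached_fetch(path, fetch_fn, ttl=CACHE_TTL):
    """Return cached download contents, refreshing via fetch_fn when
    the cache file is missing or older than ttl seconds."""
    stale = (not os.path.exists(path)
             or time.time() - os.path.getmtime(path) > ttl)
    if stale:
        data = fetch_fn()  # e.g. issue a download request and poll for the CSV
        with open(path, 'w') as fh:
            fh.write(data)
    with open(path) as fh:
        return fh.read()
```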
>>
>>     In the download API you can request anything from 1 to over 660
>>     million records, which is why it is asynchronous.  It's used a lot
>>     by various applications and communities.
>>
>>     I hope this helps,
>>     Tim
>>
>>
>>     From: API-users <api-users-bounces at lists.gbif.org> on behalf of
>>     Nils Hempelmann <info at nilshempelmann.de>
>>     Date: Thursday 2 June 2016 at 14:25
>>     To: api-users at lists.gbif.org, wps-dev at lists.dkrz.de
>>     Subject: [API-users] birdhouse meets GBIF
>>
>>     Dear all
>>
>>     Here is a current problem to solve :-) for the birdhouse WPS and
>>     GBIF developers.
>>     There is a species distribution process in the birdhouse WPS that
>>     processes climate data based on occurrence coordinates from GBIF.
>>
>>     Currently the GBIF data are passed to the process via a URL to a
>>     GBIF CSV (which has to be generated manually beforehand).
>>     I implemented a direct occurrence search from birdhouse to GBIF
>>     with pygbif. It worked fine, BUT: there is a limit of 300 records
>>     per request, which is far too few to train the species
>>     distribution model.
>>
>>     So here is the question: is there a way to request the species
>>     occurrence coordinates directly (e.g. like a WPS request)?
>>
>>     (The previous conversation follows below.)
>>
>>     And some links for background information:
>>
>>     Birdhouse:
>>     http://birdhouse.readthedocs.io/en/latest/
>>     GBIF:
>>     http://www.gbif.org/
>>
>>     Species distribution process:
>>     http://flyingpigeon.readthedocs.io/en/latest/tutorials/sdm.html
>>
>>     Birdhouse architecture:
>>     https://github.com/bird-house/birdhouse-docs/blob/master/slides/birdhouse-architecture/birdhouse-architecture.pdf
>>
>>     Merci
>>     Nils
>>
>>     -------- Forwarded Message -------
>>     Subject: 	Re: pygbif for occurrence coordinates
>>     Date: 	Thu, 2 Jun 2016 09:00:49 -0300
>>     From: 	Mauro Cavalcanti <maurobio at gmail.com>
>>     To: 	Nils Hempelmann <info at nilshempelmann.de>
>>     CC: 	wps-dev at lists.dkrz.de, Carsten Ehbrecht <ehbrecht at dkrz.de>,
>>     Wolfgang.Falk at lwf.bayern.de,
>>     Scott Chamberlain <myrmecocystus at gmail.com>
>>
>>
>>
>>     Nils,
>>
>>     No, there is no other way to get a larger number of records at
>>     once using the GBIF API (although one could achieve this by
>>     sequentially querying the server in "batches" of 200 or 300
>>     records each). As I said, there are good operational reasons for
>>     the GBIF developers to have imposed such a limit.
>>
>>     As for your other question, it would be better put to the GBIF
>>     developers themselves (on the discussion list, so that we can all
>>     benefit from the answers! :-))
>>
>>     With warmest regards,
>>
>>     --
>>     Dr. Mauro J. Cavalcanti
>>     E-mail: maurobio at gmail.com
>>     Web: http://sites.google.com/site/maurobio
>>
>>     On 02/06/2016 08:49, "Nils Hempelmann" <info at nilshempelmann.de>
>>     wrote:
>>
>>         Hi Mauro
>>
>>         Oh ...
>>         Is there no way to set it to unlimited?
>>         The 'manual via browser' option, with parsing of the returned
>>         CSV, is the current status.
>>         Or is there any alternative to pygbif?
>>
>>         If I understood it correctly, the GBIF database is exposed as
>>         a web service, so I guess there should be a way to connect it
>>         to the birdhouse WPS, am I right?
>>
>>         (putting the wps-dev list in copy)
>>
>>         Merci
>>         Nils
>>
>>
>>         On 02/06/2016 12:40, Mauro Cavalcanti wrote:
>>>
>>>         Nils,
>>>
>>>         That's a limit imposed (for good operational reasons) by the
>>>         GBIF API. If you want a larger number of records, you'll
>>>         have to download them "manually" (that is, via browser) and
>>>         then parse the returned CSV file locally.
>>>
>>>         Hope this helps.
>>>
>>>         Best regards,
>>>
>>>         --
>>>         Dr. Mauro J. Cavalcanti
>>>         E-mail: maurobio at gmail.com
>>>         Web: http://sites.google.com/site/maurobio
>>>
>>>         On 02/06/2016 04:48, "Nils Hempelmann"
>>>         <info at nilshempelmann.de> wrote:
>>>
>>>             Hi Scott
>>>
>>>             It works fine. Thanks a lot. I just have a question
>>>             about the search limits:
>>>
>>>             The maximum number of records seems to be capped at 300.
>>>             Is that intentional?
>>>             And requesting more than 200000 gives a 'bad request'.
>>>
>>>             Merci
>>>             Nils
>>>
>>>
>>>             In [68]: len(occurrences.search(taxonKey=key, limit=100)['results'])
>>>             Out[68]: 100
>>>             In [69]: len(occurrences.search(taxonKey=key, limit=300)['results'])
>>>             Out[69]: 300
>>>             In [70]: len(occurrences.search(taxonKey=key, limit=3000)['results'])
>>>             Out[70]: 300
>>>             In [71]: len(occurrences.search(taxonKey=key, limit=200000)['results'])
>>>             Out[71]: 300
>>>             In [72]: len(occurrences.search(taxonKey=key, limit=200001)['results'])
>>>             ---------------------------------------------------------------------------
>>>             HTTPError                 Traceback (most recent call last)
>>>             <ipython-input-72-2f7d7b4ccba0> in <module>()
>>>             ----> 1 len(occurrences.search(taxonKey=key, limit=200001)['results'])
>>>
>>>             /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/occurrences/search.pyc in search(taxonKey, scientificName, country, publishingCountry, hasCoordinate, typeStatus, recordNumber, lastInterpreted, continent, geometry, recordedBy, basisOfRecord, datasetKey, eventDate, catalogNumber, year, month, decimalLatitude, decimalLongitude, elevation, depth, institutionCode, collectionCode, hasGeospatialIssue, issue, q, mediatype, limit, offset, **kwargs)
>>>                 251         'collectionCode': collectionCode, 'hasGeospatialIssue': hasGeospatialIssue,
>>>                 252         'issue': issue, 'q': q, 'mediatype': mediatype, 'limit': limit,
>>>             --> 253         'offset': offset}, **kwargs)
>>>                 254     return out
>>>
>>>             /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/gbifutils.pyc in gbif_GET(url, args, **kwargs)
>>>                  17 def gbif_GET(url, args, **kwargs):
>>>                  18   out = requests.get(url, params=args, headers=make_ua(), **kwargs)
>>>             ---> 19   out.raise_for_status()
>>>                  20   stopifnot(out.headers['content-type'])
>>>                  21   return out.json()
>>>
>>>             /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/requests/models.pyc in raise_for_status(self)
>>>                 842
>>>                 843         if http_error_msg:
>>>             --> 844             raise HTTPError(http_error_msg, response=self)
>>>                 845
>>>                 846     def close(self):
>>>
>>>             HTTPError: 400 Client Error: Bad Request for url:
>>>             http://api.gbif.org/v1/occurrence/search?taxonKey=2882316&limit=200001&offset=0
>>>
>>>             In [73]:
>>>
>>>
>>>             On 02/06/2016 02:22, Scott Chamberlain wrote:
>>>>             Fixes to the docs and a fix to download_get are up on
>>>>             GitHub now. Will push a new version to PyPI soon. Let me
>>>>             know if something still doesn't work for you after
>>>>             reinstalling from GitHub.
>>>>
>>>>             S
>>>>
>>>>             On Wed, Jun 1, 2016 at 4:51 PM Nils Hempelmann
>>>>             <info at nilshempelmann.de> wrote:
>>>>
>>>>                 Hi Scott and Mauro
>>>>
>>>>                 Mauro sent me a snippet of code which worked fine:
>>>>
>>>>                 I found the example here:
>>>>                 https://github.com/maurobio/pygbif#occurrence-data
>>>>
>>>>                 and here:
>>>>                 https://github.com/sckott/pygbif#occurrences-module
>>>>
>>>>                 Thanks a lot, very great. That enables a lot ;-)
>>>>                 I'll keep you posted
>>>>
>>>>                 merci
>>>>                 Nils
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>                 On 02/06/2016 01:45, Scott Chamberlain wrote:
>>>>>                 And where are those examples from exactly? I don't
>>>>>                 see them when searching the repo (which includes
>>>>>                 all docs).
>>>>>
>>>>>                 `pygbif.name_suggest` wouldn't work because
>>>>>                 `name_suggest` is a method in the `species`
>>>>>                 module, so you'd have to do
>>>>>                 `pygbif.species.name_suggest`, or `from pygbif
>>>>>                 import species`, then `species.name_suggest`.
>>>>>
>>>>>                 Looks like `occ.get(taxonKey = 252408386)` is a
>>>>>                 documentation bug; that should be `key` instead of
>>>>>                 `taxonKey`, a copy-paste error. Will fix that.
>>>>>
>>>>>                 The `occ.download_get` call has a small bug; will
>>>>>                 fix that too.
>>>>>
>>>>>                 All other calls work for me
>>>>>
>>>>>                 S
>>>>>
>>>>>                 On Wed, Jun 1, 2016 at 4:34 PM Scott Chamberlain
>>>>>                 <myrmecocystus at gmail.com> wrote:
>>>>>
>>>>>                     Hi,
>>>>>
>>>>>                     What version of pygbif are you on? And what
>>>>>                     version of Python?
>>>>>
>>>>>                     Best, Scott
>>>>>
>>>>>                     On Wed, Jun 1, 2016 at 4:02 PM Nils Hempelmann
>>>>>                     <info at nilshempelmann.de> wrote:
>>>>>
>>>>>                         Hi Mauro and Scott
>>>>>
>>>>>                         I was checking out pygbif. It seems to be
>>>>>                         a very useful tool.
>>>>>
>>>>>                         Can you help me with the syntax (or
>>>>>                         forward me to the appropriate person
>>>>>                         ;-) )?
>>>>>                         The given snippets of code are outdated.
>>>>>
>>>>>                         I am basically just looking for the
>>>>>                         occurrence coordinates:
>>>>>                         here is my first try :
>>>>>
>>>>>                         import pygbif
>>>>>                         occ = pygbif.occurrences.search(scientificName='Fagus sylvatica')
>>>>>                         occ['count']
>>>>>
>>>>>                         ... and further? ;-)
>>>>>
>>>>>                         The examples in the docs throw errors:
>>>>>
>>>>>                         key = pygbif.name_suggest(q='Helianthus annuus', rank='species')['key']
>>>>>                         pygbif.search(taxonKey=key[0]['key'], limit=2)
>>>>>
>>>>>                         from pygbif import occurrences as occ
>>>>>                         occ.search(taxonKey = 3329049)
>>>>>                         occ.get(taxonKey = 252408386)
>>>>>                         occ.count(isGeoreferenced = True)
>>>>>                         occ.download_list(user = "sckott", limit = 5)
>>>>>                         occ.download_meta(key = "0000099-140929101555934")
>>>>>                         occ.download_get("0000099-140929101555934")
>>>>>
>>>>>
>>>>>                         Thanks
>>>>>                         Nils
>>>>>
>>>>
>>>
>>
>>
>>
>
>     _______________________________________________
>     API-users mailing list
>     API-users at lists.gbif.org
>     http://lists.gbif.org/mailman/listinfo/api-users
>
