[API-users] birdhouse meets GBIF

Fri Jun 3 09:41:49 CEST 2016

Hey Scott,

About the functionality to read Darwin Core dumps: don't hesitate to
interface with python-dwca-reader
<https://github.com/BelgianBiodiversityPlatform/python-dwca-reader> to
avoid reinventing the wheel. It was written exactly for this purpose,
is now fairly mature, and is used by several groups of users.
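
For illustration, here is a minimal Python 3 sketch of reading occurrence coordinates with python-dwca-reader. It assumes that the library's `DwCAReader` class (in `dwca.read`) is used as a context manager and yields core rows. It also assumes each row's `data` attribute is a dict keyed by full Darwin Core term URIs. `coordinates_from_rows` and `read_dwca_coordinates` are hypothetical helper names and are not part of the library:

```python
# Sketch: collect (lat, lon) pairs from a Darwin Core Archive with python-dwca-reader.
# Assumes `pip install python-dwca-reader`; the two helper names are hypothetical.

# Full Darwin Core term URIs, used as keys in each core row's `data` dict.
DWC_LAT = "http://rs.tdwg.org/dwc/terms/decimalLatitude"
DWC_LON = "http://rs.tdwg.org/dwc/terms/decimalLongitude"


def coordinates_from_rows(rows):
    """Return (lat, lon) floats for every row dict that has both terms filled in."""
    coords = []
    for data in rows:
        lat, lon = data.get(DWC_LAT), data.get(DWC_LON)
        if lat and lon:  # skips missing keys and empty strings
            coords.append((float(lat), float(lon)))
    return coords


def read_dwca_coordinates(path):
    """Read the core rows of the archive at `path` and return their coordinates."""
    # Imported here so coordinates_from_rows can be used without the library installed.
    from dwca.read import DwCAReader

    with DwCAReader(path) as dwca:
        return coordinates_from_rows(row.data for row in dwca)
```

Keeping the parsing in a plain function over dicts means it can be checked without an archive on disk. `read_dwca_coordinates("download.zip")` (a placeholder file name) would then return the pairs for a downloaded archive.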

I'm also always willing to put in some effort to make its
integration with projects like pygbif easier. Just file a GitHub issue
or send me an email for any clarification, bug, or missing feature.

Best,

Nico

On 2/06/16 16:17, Scott Chamberlain wrote:
> Nils,
>
> We do have some download API functions in pygbif, and there's now a pull
> request to add methods for requesting downloads. We'll add some
> functionality to read Darwin Core dumps as well.
>
> Scott
>
> On Thu, Jun 2, 2016 at 6:06 AM Tim Robertson <trobertson at gbif.org> wrote:
>
>     Hi Nils,
>
>     It’s documented here:
>     http://www.gbif.org/developer/occurrence#download
>     You POST a JSON doc with the query using HTTP basic authentication.
>
>     If you need any help, just say so and we can provide some examples
>     using curl or Python.
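
Here is a minimal Python 3 sketch of such a request that uses only the standard library. Several details are assumptions based on the GBIF occurrence download documentation linked above: the endpoint URL, the JSON field names (`creator`, `notification_address`, `predicate`), and the predicate keys `TAXON_KEY` and `HAS_COORDINATE`. `build_download_request` is a hypothetical helper. It only builds the request and leaves sending it to the caller:

```python
import base64
import json
import urllib.request

# Assumed endpoint, taken from the GBIF occurrence download documentation.
DOWNLOAD_URL = "https://api.gbif.org/v1/occurrence/download/request"


def build_download_request(user, password, email, taxon_key):
    """Build (but do not send) a POST request for georeferenced records of one taxon."""
    query = {
        "creator": user,
        "notification_address": [email],
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {"type": "equals", "key": "HAS_COORDINATE", "value": "true"},
            ],
        },
    }
    # HTTP basic authentication: base64("user:password") in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        DOWNLOAD_URL,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` should return the key of the new download in the response body. This is also an assumption to check against the documentation. That key is what pygbif's `occ.download_meta` and `occ.download_get` take.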
>
>     I am actually working on a revision to the mapping API, and will
>     see what may be possible in terms of returning distinct locations.
>     It’s tricky to do in real time, though.
>     What kind of accuracy do you need?
>
>     Thanks,
>     Tim
>
>     From: Nils Hempelmann <info at nilshempelmann.de>
>     Date: Thursday 2 June 2016 at 15:00
>     To: Tim Robertson <trobertson at gbif.org>,
>     "api-users at lists.gbif.org" <api-users at lists.gbif.org>,
>     "wps-dev at lists.dkrz.de" <wps-dev at lists.dkrz.de>
>     Subject: Re: [API-users] birdhouse meets GBIF
>
>     Hi Tim
>
>     Thanks for the quick answer. If not through an OGC service, how
>     can the download API be called from outside GBIF?
>
>     Thanks
>     Nils
>
>
>     On 02/06/2016 14:45, Tim Robertson wrote:
>>     Hi Nils
>>
>>     We don’t have any OGC services, but there is an asynchronous
>>     download API which can deliver CSVs.
>>     Off the top of my head, the only way to automate this at the
>>     moment would be to periodically issue a download (e.g. daily),
>>     process it as you see fit, and cache the result for your app.
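
The "download periodically and cache" approach can be sketched as a freshness check around whatever code issues the download. This is a minimal Python 3 sketch. It assumes the cached result is a single file whose modification time records when it was fetched. `cache_is_stale`, `refresh_if_stale` and the `fetch` callback are hypothetical names:

```python
import os
import time

ONE_DAY = 24 * 60 * 60  # seconds


def cache_is_stale(path, max_age=ONE_DAY, now=None):
    """True if `path` does not exist or was last modified more than `max_age` seconds ago."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > max_age


def refresh_if_stale(path, fetch, max_age=ONE_DAY):
    """Call fetch(path) only when the cached copy is missing or too old; return `path`."""
    if cache_is_stale(path, max_age):
        fetch(path)  # e.g. request a GBIF download, wait for it, save it to `path`
    return path
```

A daily scheduled job could call `refresh_if_stale` before each run of the process. Requests that arrive in between would then reuse the cached file instead of triggering a new download.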
>>
>>     The download API can deliver anything from 1 to more than 660
>>     million records, which is why it is asynchronous. It’s used a lot
>>     by various applications and communities.
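
Because downloads are asynchronous, a client has to poll until the job finishes. Here is a minimal Python 3 sketch in which the status lookup is passed in as a function, so it can be swapped out or faked. The set of final status names is an assumption about GBIF's download metadata, and `wait_for_download` is a hypothetical helper:

```python
import time

# Status values treated as final; assumed from GBIF's download metadata format.
FINISHED = {"SUCCEEDED", "FAILED", "KILLED", "CANCELLED"}


def wait_for_download(get_status, interval=60, timeout=6 * 60 * 60, sleep=time.sleep):
    """Poll get_status() every `interval` seconds until it reports a final status.

    Raises TimeoutError once `timeout` seconds of waiting have passed without one.
    """
    waited = 0
    while True:
        status = get_status()
        if status in FINISHED:
            return status
        if waited >= timeout:
            raise TimeoutError(f"download still {status} after {waited} s")
        sleep(interval)
        waited += interval
```

With pygbif, `get_status` could be something like `lambda: occ.download_meta(key=download_key)["status"]`. This assumes the metadata dict carries a `status` field.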
>>
>>     I hope this helps,
>>     Tim
>>
>>
>>     From: API-users <api-users-bounces at lists.gbif.org> on behalf of Nils
>>     Hempelmann <info at nilshempelmann.de>
>>     Date: Thursday 2 June 2016 at 14:25
>>     To: "api-users at lists.gbif.org" <api-users at lists.gbif.org>,
>>     "wps-dev at lists.dkrz.de" <wps-dev at lists.dkrz.de>
>>     Subject: [API-users] birdhouse meets GBIF
>>
>>     Dear all
>>
>>     Here is a current problem to solve :-) for the birdhouse WPS and
>>     GBIF developers.
>>     There is a species distribution process in the birdhouse WPS that
>>     processes climate data based on occurrence coordinates from GBIF.
>>
>>     Currently the GBIF data are passed to the process via the URL of
>>     a GBIF CSV file (which has to be generated manually beforehand).
>>     I implemented a direct occurrence search from birdhouse to
>>     GBIF with pygbif. It worked fine, BUT there is a limit of 300
>>     records, which is far too few to train the species distribution
>>     model.
>>
>>     So here is the question: is there a way to request the species
>>     occurrence coordinates directly (e.g. like a WPS request)?
>>
>>     (The previous conversation follows.)
>>
>>     And some links for background information:
>>
>>     Birdhouse:
>>     http://birdhouse.readthedocs.io/en/latest/
>>     GBIF:
>>     http://www.gbif.org/
>>
>>     Species distribution process:
>>     http://flyingpigeon.readthedocs.io/en/latest/tutorials/sdm.html
>>
>>     Birdhouse architecture:
>>     https://github.com/bird-house/birdhouse-docs/blob/master/slides/birdhouse-architecture/birdhouse-architecture.pdf
>>
>>     Thanks
>>     Nils
>>
>>     -------- Forwarded Message --------
>>     Subject: Re: pygbif for occurrence coordinates
>>     Date: Thu, 2 Jun 2016 09:00:49 -0300
>>     From: Mauro Cavalcanti <maurobio at gmail.com>
>>     To: Nils Hempelmann <info at nilshempelmann.de>
>>     CC: wps-dev at lists.dkrz.de, Carsten Ehbrecht <ehbrecht at dkrz.de>,
>>     Wolfgang.Falk at lwf.bayern.de,
>>     Scott Chamberlain <myrmecocystus at gmail.com>
>>
>>
>>
>>     Nils,
>>
>>     No, there is no other way to get a larger number of records at
>>     once using the GBIF API (although one could achieve this by
>>     sequentially querying the server in "batches" of 200 or 300
>>     records each). As I said, there are good operational reasons for
>>     the GBIF developers to have imposed such a limit.
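
The batch approach Mauro mentions can be sketched as a loop that advances `offset` by the number of records received until the response reports `endOfRecords`. This minimal Python 3 sketch assumes a search function that accepts `limit` and `offset` keywords, as pygbif's `occurrences.search` does according to the traceback further down this thread. It also assumes the function returns a dict with `results` and `endOfRecords` entries. `fetch_all` is a hypothetical helper:

```python
def fetch_all(search, page_size=300, max_records=None, **params):
    """Collect records from a paged search(limit=..., offset=..., **params) function.

    Stops when the response sets endOfRecords, returns an empty page,
    or `max_records` records have been collected.
    """
    records = []
    offset = 0
    while max_records is None or len(records) < max_records:
        limit = page_size if max_records is None else min(page_size, max_records - len(records))
        page = search(limit=limit, offset=offset, **params)
        records.extend(page["results"])
        if page.get("endOfRecords", True) or not page["results"]:
            break
        # Advance by what the server actually returned, which may be less than `limit`.
        offset += len(page["results"])
    return records
```

Usage would look like `fetch_all(occurrences.search, taxonKey=key, hasCoordinate=True)`. Each page is a separate HTTP request, so for very large result sets the asynchronous download API is likely the better fit.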
>>
>>     As for your other question, it would be better put to the GBIF
>>     developers themselves (on the discussion list, so that we can all
>>     benefit from the answers! :-))
>>
>>     With warmest regards,
>>
>>     --
>>     Dr. Mauro J. Cavalcanti
>>     E-mail: maurobio at gmail.com
>>     Web: http://sites.google.com/site/maurobio
>>
>>     On 02/06/2016 08:49, "Nils Hempelmann" <info at nilshempelmann.de>
>>     wrote:
>>
>>         Hi Mauro
>>
>>         Oh ...
>>         No way to set it to unlimited?
>>         The 'manual via browser' option and parsing the returned CSV
>>         is the current status.
>>         Or is there any alternative to pygbif?
>>
>>         If I understood correctly, the GBIF database is organized as a
>>         web server, so I guess there should be a way to connect it to
>>         the birdhouse WPS, am I right?
>>
>>         (I've put the wps-dev list in CC)
>>
>>         Thanks
>>         Nils
>>
>>
>>         On 02/06/2016 12:40, Mauro Cavalcanti wrote:
>>>
>>>         Nils,
>>>
>>>         That's a limit imposed (for good operational reasons) by the
>>>         GBIF API. If you want a larger number of records, you'll
>>>         have to download them "manually" (that is, via browser) and
>>>         then parse the CSV file returned from the GBIF server locally.
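
Parsing such a file locally needs only the standard library. This minimal Python 3 sketch assumes a header row with `decimalLatitude` and `decimalLongitude` columns and tab-separated, unquoted values. GBIF's simple CSV exports are usually laid out this way, but check the actual file. `coordinates_from_csv` is a hypothetical helper:

```python
import csv
import io


def coordinates_from_csv(fileobj, delimiter="\t"):
    """Yield (lat, lon) floats for each row that has both coordinate columns filled in."""
    # QUOTE_NONE: the export is assumed to be unquoted, so stray quote
    # characters inside fields are kept as plain text.
    reader = csv.DictReader(fileobj, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        lat, lon = row.get("decimalLatitude"), row.get("decimalLongitude")
        if lat and lon:
            yield float(lat), float(lon)
```

For a file on disk: `with open("occurrences.csv", newline="", encoding="utf-8") as f: coords = list(coordinates_from_csv(f))`. The file name is a placeholder.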
>>>
>>>         Hope this helps.
>>>
>>>         Best regards,
>>>
>>>         --
>>>         Dr. Mauro J. Cavalcanti
>>>         E-mail: maurobio at gmail.com
>>>         Web: http://sites.google.com/site/maurobio
>>>
>>>         On 02/06/2016 04:48, "Nils Hempelmann"
>>>         <info at nilshempelmann.de> wrote:
>>>
>>>             Hi Scott
>>>
>>>             Works fine. Thanks a lot. I just have a question about
>>>             the search limits:
>>>
>>>             The maximum number of records seems to be limited to
>>>             300. Is that on purpose?
>>>             And requesting more than 200000 gives a 'bad request'.
>>>
>>>             Thanks
>>>             Nils
>>>
>>>
>>>             In [68]: len( occurrences.search(taxonKey=key, limit=100)['results'])
>>>             Out[68]: 100
>>>
>>>             In [69]: len( occurrences.search(taxonKey=key, limit=300)['results'])
>>>             Out[69]: 300
>>>
>>>             In [70]: len( occurrences.search(taxonKey=key, limit=3000)['results'])
>>>             Out[70]: 300
>>>
>>>             In [71]: len( occurrences.search(taxonKey=key, limit=200000)['results'])
>>>             Out[71]: 300
>>>
>>>             In [72]: len( occurrences.search(taxonKey=key, limit=200001)['results'])
>>>             ---------------------------------------------------------------------------
>>>             HTTPError Traceback (most recent call last)
>>>             <ipython-input-72-2f7d7b4ccba0> in <module>()
>>>             ----> 1 len( occurrences.search(taxonKey=key, limit=200001)['results'])
>>>
>>>             /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/occurrences/search.pyc in search(taxonKey, scientificName, country, publishingCountry, hasCoordinate, typeStatus, recordNumber, lastInterpreted, continent, geometry, recordedBy, basisOfRecord, datasetKey, eventDate, catalogNumber, year, month, decimalLatitude, decimalLongitude, elevation, depth, institutionCode, collectionCode, hasGeospatialIssue, issue, q, mediatype, limit, offset, **kwargs)
>>>                 251         'collectionCode': collectionCode, 'hasGeospatialIssue': hasGeospatialIssue,
>>>                 252         'issue': issue, 'q': q, 'mediatype': mediatype, 'limit': limit,
>>>             --> 253         'offset': offset}, **kwargs)
>>>                 254     return out
>>>
>>>             /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/gbifutils.pyc in gbif_GET(url, args, **kwargs)
>>>                  17 def gbif_GET(url, args, **kwargs):
>>>                  18   out = requests.get(url, params=args, headers=make_ua(), **kwargs)
>>>             ---> 19   out.raise_for_status()
>>>                  20   stopifnot(out.headers['content-type'])
>>>                  21   return out.json()
>>>
>>>             /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/requests/models.pyc in raise_for_status(self)
>>>                 842
>>>                 843         if http_error_msg:
>>>             --> 844             raise HTTPError(http_error_msg, response=self)
>>>                 845
>>>                 846     def close(self):
>>>
>>>             HTTPError: 400 Client Error: Bad Request for url: http://api.gbif.org/v1/occurrence/search?taxonKey=2882316&limit=200001&offset=0
>>>
>>>             In [73]:
>>>
>>>
>>>             On 02/06/2016 02:22, Scott Chamberlain wrote:
>>>>             Fixes to the docs and a fix to download_get are up on GitHub
>>>>             now. I'll push a new version to PyPI soon. Let me know if
>>>>             something still doesn't work for you after reinstalling
>>>>             from GitHub.
>>>>
>>>>             S
>>>>
>>>>             On Wed, Jun 1, 2016 at 4:51 PM Nils Hempelmann
>>>>             <info at nilshempelmann.de> wrote:
>>>>
>>>>                 Hi Scott and Mauro
>>>>
>>>>                 Mauro sent me a snippet of code which worked fine:
>>>>                 https://github.com/bird-house/flyingpigeon/blob/develop/scripts/pygbif_occurence.py
>>>>
>>>>                 I found the example here:
>>>>                 https://github.com/maurobio/pygbif#occurrence-data
>>>>
>>>>                 and here:
>>>>                 https://github.com/sckott/pygbif#occurrences-module
>>>>
>>>>                 Thanks a lot, that's great. That enables a lot ;-)
>>>>                 I'll keep you posted.
>>>>
>>>>                 Thanks
>>>>                 Nils
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>                 On 02/06/2016 01:45, Scott Chamberlain wrote:
>>>>>                 And where exactly are those examples from? I don't
>>>>>                 find them when searching the repo (which
>>>>>                 includes all the docs).
>>>>>
>>>>>                 `pygbif.name_suggest` wouldn't work because
>>>>>                 `name_suggest` is a function in the `species`
>>>>>                 module, so you'd have to call
>>>>>                 `pygbif.species.name_suggest`, or do `from pygbif
>>>>>                 import species` and then call `species.name_suggest`.
>>>>>
>>>>>                 Looks like `occ.get(taxonKey = 252408386)` is a
>>>>>                 documentation bug: it should be `key` instead of
>>>>>                 `taxonKey` (a copy-paste error). Will fix that.
>>>>>
>>>>>                 The `occ.download_get` call has a small bug; will
>>>>>                 fix that.
>>>>>
>>>>>                 All other calls work for me
>>>>>
>>>>>                 S
>>>>>
>>>>>                 On Wed, Jun 1, 2016 at 4:34 PM Scott Chamberlain
>>>>>                 <myrmecocystus at gmail.com> wrote:
>>>>>
>>>>>                     Hi,
>>>>>
>>>>>                     What version of pygbif are you on? And what
>>>>>                     version of Python?
>>>>>
>>>>>                     Best, Scott
>>>>>
>>>>>                     On Wed, Jun 1, 2016 at 4:02 PM Nils Hempelmann
>>>>>                     <info at nilshempelmann.de> wrote:
>>>>>
>>>>>                         Hi Mauro and Scott
>>>>>
>>>>>                         I was checking out pygbif; it seems to be a
>>>>>                         very useful tool.
>>>>>
>>>>>                         Can you help me with the syntax (or
>>>>>                         forward me to the appropriate person
>>>>>                         ;-) )? The given code snippets are outdated.
>>>>>
>>>>>                         I am basically just looking for the
>>>>>                         occurrence coordinates. Here is my first try:
>>>>>
>>>>>                         import pygbif
>>>>>                         occ = pygbif.occurrences.search(scientificName='Fagus sylvatica')
>>>>>                         occ['count']
>>>>>
>>>>>                         ... and further? ;-)
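
One way to continue from `occ['count']` is to read the coordinates from the `results` list of the same response. This minimal Python 3 sketch assumes each record is a dict that carries `decimalLatitude` and `decimalLongitude` only when it is georeferenced. `coordinates_from_search` is a hypothetical helper:

```python
def coordinates_from_search(response):
    """Return (lat, lon) pairs from one occurrence search response dict.

    Records lacking either coordinate are skipped.
    """
    return [
        (rec["decimalLatitude"], rec["decimalLongitude"])
        for rec in response.get("results", [])
        if rec.get("decimalLatitude") is not None and rec.get("decimalLongitude") is not None
    ]
```

Usage would look like `coordinates_from_search(pygbif.occurrences.search(scientificName='Fagus sylvatica', hasCoordinate=True))`. This covers only the single page returned by one call, which is where the 300-record limit discussed elsewhere in this thread comes in.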
>>>>>
>>>>>                         The examples in the docs are throwing errors:
>>>>>
>>>>>                         key= pygbif.name_suggest(q='Helianthus annuus',rank='species')['key']
>>>>>                         pygbif.search(taxonKey=key[0]['key'],limit=2)
>>>>>
>>>>>                         from pygbif import occurrences as occ
>>>>>                         occ.search(taxonKey = 3329049)
>>>>>                         occ.get(taxonKey = 252408386)
>>>>>                         occ.count(isGeoreferenced = True)
>>>>>                         occ.download_list(user = "sckott", limit = 5)
>>>>>                         occ.download_meta(key = "0000099-140929101555934")
>>>>>                         occ.download_get("0000099-140929101555934")
>>>>>
>>>>>
>>>>>                         Thanks
>>>>>                         Nils
>>>>>
>>>>
>>>
>>
>>
>>
>
>     _______________________________________________
>     API-users mailing list
>     API-users at lists.gbif.org
>     http://lists.gbif.org/mailman/listinfo/api-users
>
