[API-users] birdhouse meets GBIF
Nils Hempelmann
info at nilshempelmann.de
Mon Jun 6 15:16:29 CEST 2016
Hi Scott et al.
I implemented pygbif in the SDM process in birdhouse. Since all
required dependencies there are fetched as conda packages, Carsten (from
DKRZ) has built a conda package for pygbif.
For those who use Anaconda (https://www.continuum.io/why-anaconda) it
can be installed with:
conda install -c birdhouse pygbif
Best
Nils
On 02/06/2016 16:17, Scott Chamberlain wrote:
> Nils,
>
> We do have some download API functions in pygbif, and there's a pull
> request open now to add methods for requesting downloads. We'll add
> functionality to read Darwin Core dumps as well.
>
> Scott
>
> On Thu, Jun 2, 2016 at 6:06 AM Tim Robertson <trobertson at gbif.org
> <mailto:trobertson at gbif.org>> wrote:
>
> Hi Nils,
>
> It’s documented here:
> http://www.gbif.org/developer/occurrence#download
> You POST a JSON doc with the query using HTTP basic authentication.
>
> If you need any help, please say so and we can provide some examples
> using curl or Python.
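For reference, a minimal Python sketch of such a request might look like the following. The endpoint and predicate field names are taken from the download documentation linked above, but treat the exact payload as an assumption to verify; the user name, password, and taxon key are placeholders.

```python
import json

def build_download_request(creator, taxon_key, email=None):
    """Assemble the JSON body for a GBIF occurrence download request.

    The predicate below asks for all occurrences of one taxon key;
    field names follow http://www.gbif.org/developer/occurrence#download.
    """
    body = {
        "creator": creator,
        "notification_address": [email] if email else [],
        "predicate": {
            "type": "equals",
            "key": "TAXON_KEY",
            "value": str(taxon_key),
        },
    }
    return json.dumps(body)

# Submitting it would be a POST with HTTP basic authentication, e.g.:
#   import requests
#   requests.post("http://api.gbif.org/v1/occurrence/download/request",
#                 data=build_download_request("myuser", 2882316),
#                 auth=("myuser", "mypassword"),
#                 headers={"Content-Type": "application/json"})
```

The POST returns a download key, which can then be polled until the CSV is ready.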
>
> I am actually working on a revision to the mapping API, and will
> see what may be possible for returning distinct locations. It’s
> tricky to do in real time, though.
> What kind of accuracy do you need?
>
> Thanks,
> Tim
>
> From: Nils Hempelmann <info at nilshempelmann.de
> <mailto:info at nilshempelmann.de>>
> Date: Thursday 2 June 2016 at 15:00
> To: Tim Robertson <trobertson at gbif.org
> <mailto:trobertson at gbif.org>>, "api-users at lists.gbif.org
> <mailto:api-users at lists.gbif.org>" <api-users at lists.gbif.org
> <mailto:api-users at lists.gbif.org>>, "wps-dev at lists.dkrz.de
> <mailto:wps-dev at lists.dkrz.de>" <wps-dev at lists.dkrz.de
> <mailto:wps-dev at lists.dkrz.de>>
> Subject: Re: [API-users] birdhouse meets GBIF
>
> Hi Tim
>
> Thanks for the quick answer. If not via an OGC service, how is the
> download API callable from outside GBIF?
>
> Merci
> Nils
>
>
> On 02/06/2016 14:45, Tim Robertson wrote:
>> Hi Nils
>>
>> We don’t have any OGC services, but there is an asynchronous
>> download API which can deliver CSVs.
>> Off the top of my head, the only way you can automate this at the
>> moment would be to periodically issue a download (e.g. daily),
>> process it as you see fit, and cache the result for your app.
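The issue-then-poll pattern described here could be sketched generically as below. The status strings match those used by the GBIF download API, but the `get_status` callable is a placeholder you would implement yourself, e.g. around pygbif's download metadata call (an assumption to verify against the pygbif docs).

```python
import time

def wait_for_download(get_status, key, poll_seconds=60.0, max_polls=60):
    """Poll an asynchronous download until it finishes.

    get_status is any callable mapping a download key to a status
    string such as "RUNNING", "SUCCEEDED" or "FAILED"; with pygbif it
    could wrap occurrences.download_meta(key)['status'].
    """
    for _ in range(max_polls):
        status = get_status(key)
        if status == "SUCCEEDED":
            return True
        if status in ("FAILED", "KILLED", "CANCELLED"):
            return False
        time.sleep(poll_seconds)  # download still preparing; wait and retry
    raise TimeoutError("download %s still running after %d polls" % (key, max_polls))
```

Once it returns True, the app would fetch the finished CSV and cache it locally, as suggested above.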
>>
>> In the download API you can request anything from 1 to more than
>> 660 million records, which is why it is asynchronous. It’s used a
>> lot by various applications and communities.
>>
>> I hope this helps,
>> Tim
>>
>>
>> From: API-users <api-users-bounces at lists.gbif.org
>> <mailto:api-users-bounces at lists.gbif.org>> on behalf of Nils
>> Hempelmann <info at nilshempelmann.de <mailto:info at nilshempelmann.de>>
>> Date: Thursday 2 June 2016 at 14:25
>> To: "api-users at lists.gbif.org <mailto:api-users at lists.gbif.org>"
>> <api-users at lists.gbif.org <mailto:api-users at lists.gbif.org>>,
>> "wps-dev at lists.dkrz.de <mailto:wps-dev at lists.dkrz.de>"
>> <wps-dev at lists.dkrz.de <mailto:wps-dev at lists.dkrz.de>>
>> Subject: [API-users] birdhouse meets GBIF
>>
>> Dear all
>>
>> Here is a current problem to solve :-) for the birdhouse WPS and
>> GBIF developers.
>> There is a species distribution process in the birdhouse WPS, which
>> processes climate data based on occurrence coordinates from GBIF.
>>
>> Currently the GBIF data are passed to the process via a URL of a
>> GBIF CSV (which has to be generated manually beforehand).
>> I implemented a direct occurrence search from birdhouse to GBIF
>> with pygbif. It worked fine, BUT: there is a limit of 300 records,
>> which is far too few to train the species distribution model.
>>
>> So here is the question: is there a way to request the species
>> occurrence coordinates directly (e.g. like a WPS request)?
>>
>> (The previous conversation follows below.)
>>
>> And some links for background information:
>>
>> Birdhouse:
>> http://birdhouse.readthedocs.io/en/latest/
>> GBIF:
>> http://www.gbif.org/
>>
>> Species distribution process:
>> http://flyingpigeon.readthedocs.io/en/latest/tutorials/sdm.html
>>
>> Birdhouse architecture:
>> https://github.com/bird-house/birdhouse-docs/blob/master/slides/birdhouse-architecture/birdhouse-architecture.pdf
>>
>> Merci
>> Nils
>>
>> -------- Forwarded Message -------
>> Subject: Re: pygbif for occurrence coordinates
>> Date: Thu, 2 Jun 2016 09:00:49 -0300
>> From: Mauro Cavalcanti <maurobio at gmail.com>
>> <mailto:maurobio at gmail.com>
>> To: Nils Hempelmann <info at nilshempelmann.de>
>> <mailto:info at nilshempelmann.de>
>> CC: wps-dev at lists.dkrz.de <mailto:wps-dev at lists.dkrz.de>,
>> Carsten Ehbrecht <ehbrecht at dkrz.de> <mailto:ehbrecht at dkrz.de>,
>> Wolfgang.Falk at lwf.bayern.de <mailto:Wolfgang.Falk at lwf.bayern.de>,
>> Scott Chamberlain <myrmecocystus at gmail.com>
>> <mailto:myrmecocystus at gmail.com>
>>
>>
>>
>> Nils,
>>
>> No, there is no other way to get a larger number of records at
>> once using the GBIF API (although one could achieve this by
>> sequentially querying the server in "batches" of 200 or 300
>> records each). As I said, there are good operational reasons for
>> the GBIF developers to have imposed such a limit.
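Such sequential "batch" querying can be written generically. The sketch below assumes only that the search callable behaves like pygbif's occurrences.search, i.e. it accepts limit/offset keywords and returns a dict with 'results' and 'endOfRecords' (as seen in the session transcript later in this thread); note that the search API rejects requests where offset + limit exceeds 200000, so very large result sets still need the download API.

```python
def fetch_all(search, max_records=10000, page_size=300, **params):
    """Collect occurrence records page by page until the server
    reports the end of the result set or max_records is reached."""
    records = []
    offset = 0
    while len(records) < max_records:
        page = search(limit=page_size, offset=offset, **params)
        records.extend(page["results"])
        if page.get("endOfRecords") or not page["results"]:
            break  # no more records on the server
        offset += page_size
    return records[:max_records]

# With pygbif this could be called as, e.g.:
#   from pygbif import occurrences
#   recs = fetch_all(occurrences.search, taxonKey=2882316, max_records=5000)
```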
>>
>> As for your other question, it would be better put to the GBIF
>> developers themselves (on the discussion list, so that we can all
>> benefit from the answers! :-))
>>
>> With warmest regards,
>>
>> --
>> Dr. Mauro J. Cavalcanti
>> E-mail: maurobio at gmail.com <mailto:maurobio at gmail.com>
>> Web: http://sites.google.com/site/maurobio
>>
>> On 02/06/2016 08:49, "Nils Hempelmann" <info at nilshempelmann.de
>> <mailto:info at nilshempelmann.de>> wrote:
>>
>> Hi Mauro
>>
>> Oh...
>> Is there no way to make it unlimited?
>> The 'manual via browser' option with parsing of the returned CSV
>> is the current status.
>> Or is there an alternative to pygbif?
>>
>> If I understood correctly, the GBIF database is exposed as a web
>> service, so I guess there should be a way to connect it to the
>> birdhouse WPS, am I right?
>>
>> (I have put the wps-dev list in CC.)
>>
>> Merci
>> Nils
>>
>>
>> On 02/06/2016 12:40, Mauro Cavalcanti wrote:
>>>
>>> Nils,
>>>
>>> That's a limit imposed (for good operational reasons) by the
>>> GBIF API. If you want a larger number of records, you'll have
>>> to download them "manually" (that is, via browser) and then
>>> parse the CSV file returned from the GBIF server locally.
>>>
>>> Hope this helps.
>>>
>>> Best regards,
>>>
>>> --
>>> Dr. Mauro J. Cavalcanti
>>> E-mail: maurobio at gmail.com <mailto:maurobio at gmail.com>
>>> Web: http://sites.google.com/site/maurobio
>>>
>>> On 02/06/2016 04:48, "Nils Hempelmann"
>>> <info at nilshempelmann.de <mailto:info at nilshempelmann.de>>
>>> wrote:
>>>
>>> Hi Scott
>>>
>>> works fine. Thanks a lot. I just have a question about the
>>> search limits:
>>>
>>> The maximum number of records seems to be limited to 300. Is
>>> that on purpose?
>>> And requesting more than 200000 gives a 'bad request'.
>>>
>>> Merci
>>> Nils
>>>
>>>
>>> In [68]: len( occurrences.search(taxonKey=key,
>>> limit=100)['results'])
>>> Out[68]: 100
>>>
>>> In [69]: len( occurrences.search(taxonKey=key,
>>> limit=300)['results'])
>>> Out[69]: 300
>>>
>>> In [70]: len( occurrences.search(taxonKey=key,
>>> limit=3000)['results'])
>>> Out[70]: 300
>>>
>>> In [71]: len( occurrences.search(taxonKey=key,
>>> limit=200000)['results'])
>>> Out[71]: 300
>>>
>>> In [72]: len( occurrences.search(taxonKey=key,
>>> limit=200001)['results'])
>>> ---------------------------------------------------------------------------
>>> HTTPError Traceback (most recent call last)
>>> <ipython-input-72-2f7d7b4ccba0> in <module>()
>>> ----> 1 len( occurrences.search(taxonKey=key,
>>> limit=200001)['results'])
>>>
>>> /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/occurrences/search.pyc
>>> in search(taxonKey, scientificName, country,
>>> publishingCountry, hasCoordinate, typeStatus,
>>> recordNumber, lastInterpreted, continent, geometry,
>>> recordedBy, basisOfRecord, datasetKey, eventDate,
>>> catalogNumber, year, month, decimalLatitude,
>>> decimalLongitude, elevation, depth, institutionCode,
>>> collectionCode, hasGeospatialIssue, issue, q, mediatype,
>>> limit, offset, **kwargs)
>>> 251 'collectionCode': collectionCode,
>>> 'hasGeospatialIssue': hasGeospatialIssue,
>>> 252 'issue': issue, 'q': q, 'mediatype':
>>> mediatype, 'limit': limit,
>>> --> 253 'offset': offset}, **kwargs)
>>> 254 return out
>>>
>>> /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/gbifutils.pyc
>>> in gbif_GET(url, args, **kwargs)
>>> 17 def gbif_GET(url, args, **kwargs):
>>> 18 out = requests.get(url, params=args,
>>> headers=make_ua(), **kwargs)
>>> ---> 19 out.raise_for_status()
>>> 20 stopifnot(out.headers['content-type'])
>>> 21 return out.json()
>>>
>>> /home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/requests/models.pyc
>>> in raise_for_status(self)
>>> 842
>>> 843 if http_error_msg:
>>> --> 844 raise HTTPError(http_error_msg,
>>> response=self)
>>> 845
>>> 846 def close(self):
>>>
>>> HTTPError: 400 Client Error: Bad Request for url:
>>> http://api.gbif.org/v1/occurrence/search?taxonKey=2882316&limit=200001&offset=0
>>>
>>> In [73]:
>>>
>>>
>>> On 02/06/2016 02:22, Scott Chamberlain wrote:
>>>> Fixes to the docs and a fix to download_get are up on GitHub
>>>> now. Will push a new version to PyPI soon. Let me know if
>>>> something still doesn't work for you after reinstalling
>>>> from GitHub.
>>>>
>>>> S
>>>>
>>>> On Wed, Jun 1, 2016 at 4:51 PM Nils Hempelmann
>>>> <info at nilshempelmann.de
>>>> <mailto:info at nilshempelmann.de>> wrote:
>>>>
>>>> Hi Scott and Mauro
>>>>
>>>> Mauro sent me a snippet of code which worked fine:
>>>> https://github.com/bird-house/flyingpigeon/blob/develop/scripts/pygbif_occurence.py
>>>>
>>>> I found the example here:
>>>> https://github.com/maurobio/pygbif#occurrence-data
>>>>
>>>> and here:
>>>> https://github.com/sckott/pygbif#occurrences-module
>>>>
>>>> Thanks a lot, that's great. It enables a lot ;-)
>>>> I'll keep you posted.
>>>>
>>>> merci
>>>> Nils
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 02/06/2016 01:45, Scott Chamberlain wrote:
>>>>> And where are those examples from, exactly? I don't
>>>>> see them when searching the repo (which includes all
>>>>> the docs).
>>>>>
>>>>> `pygbif.name_suggest` wouldn't work because
>>>>> `name_suggest` is a method in the `species` module,
>>>>> so you'd have to do `pygbif.species.name_suggest`,
>>>>> or `from pygbif import species`, then
>>>>> `species.name_suggest`.
>>>>>
>>>>> Looks like `occ.get(taxonKey = 252408386)` is a
>>>>> documentation bug; that should be `key` instead of
>>>>> `taxonKey`, a copy-paste error. Will fix that.
>>>>>
>>>>> The `occ.download_get` call has a small bug; will
>>>>> fix that.
>>>>>
>>>>> All other calls work for me
>>>>>
>>>>> S
>>>>>
>>>>> On Wed, Jun 1, 2016 at 4:34 PM Scott Chamberlain
>>>>> <myrmecocystus at gmail.com
>>>>> <mailto:myrmecocystus at gmail.com>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> What version of pygbif are you on? And what
>>>>> version of Python?
>>>>>
>>>>> Best, Scott
>>>>>
>>>>> On Wed, Jun 1, 2016 at 4:02 PM Nils Hempelmann
>>>>> <info at nilshempelmann.de
>>>>> <mailto:info at nilshempelmann.de>> wrote:
>>>>>
>>>>> Hi Mauro and Scott
>>>>>
>>>>> I was checking out pygbif. It seems to be a
>>>>> very useful tool.
>>>>>
>>>>> Can you help me with the syntax (or forward
>>>>> me to the appropriate person ;-))?
>>>>> The given snippets of code are outdated.
>>>>>
>>>>> I am basically just looking for the
>>>>> occurrence coordinates; here is my first
>>>>> try:
>>>>>
>>>>> import pygbif
>>>>> occ = pygbif.occurrences.search(scientificName='Fagus sylvatica')
>>>>> occ['count']
>>>>>
>>>>> ... and further? ;-)
>>>>>
>>>>> the examples in the docs are throwing errors:
>>>>>
>>>>> key = pygbif.name_suggest(q='Helianthus annuus', rank='species')['key']
>>>>> pygbif.search(taxonKey=key[0]['key'], limit=2)
>>>>>
>>>>> from pygbif import occurrences as occ
>>>>> occ.search(taxonKey = 3329049)
>>>>> occ.get(taxonKey = 252408386)
>>>>> occ.count(isGeoreferenced = True)
>>>>> occ.download_list(user = "sckott", limit = 5)
>>>>> occ.download_meta(key = "0000099-140929101555934")
>>>>> occ.download_get("0000099-140929101555934")
>>>>>
>>>>>
>>>>> Thanks
>>>>> Nils
>>>>>
>>>>
>>>
>>
>>
>>
>
> _______________________________________________
> API-users mailing list
> API-users at lists.gbif.org <mailto:API-users at lists.gbif.org>
> http://lists.gbif.org/mailman/listinfo/api-users
>