Re: [API-users] birdhouse meets GBIF

3 Jun 2016

Hi Nico,

Of course, I'd rather use something that works already than write my own
code. I'll let you know if I have any questions.

Best, Scott

On Fri, Jun 3, 2016 at 12:41 AM Nicolas Noé <n.noe@biodiversity.be> wrote:
...
Hey Scott,
About the functionality to read Darwin Core dumps: don't hesitate to
interface with python-dwca-reader
<https://github.com/BelgianBiodiversityPlatform/python-dwca-reader> to
avoid reinventing the wheel. It was written for exactly this purpose, is
now rather mature, and is used by several groups of users.
I'm also always willing to put in some effort to make its
integration with projects like pygbif easier. Just file a GitHub issue or
send me an email for any clarification, bug, or missing feature.
Best,
Nico
On 2/06/16 16:17, Scott Chamberlain wrote:
Nils,
We do have some download API functions in pygbif, and there's a pull
request now to add the methods to request downloads. We'll add some
functionality to read Darwin Core dumps as well.
Scott
On Thu, Jun 2, 2016 at 6:06 AM Tim Robertson <trobertson@gbif.org> wrote:
...
Hi Nils,
It’s documented here:  http://www.gbif.org/developer/occurrence#download
You POST a JSON doc with the query using HTTP basic authentication.
If you need any help, please say so and we can provide some examples using
curl or Python.
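
As an illustration of "POST a JSON doc with the query using HTTP basic
authentication", here is a minimal Python 3 sketch that only builds the
request and does not send it. The endpoint URL, the JSON field names
(creator, notificationAddresses, predicate) and the helper name
build_download_request are assumptions made for this example and should be
checked against the documentation linked above.

```python
import base64
import json
import urllib.request

# Assumed endpoint; verify it against the GBIF developer documentation.
DOWNLOAD_URL = "http://api.gbif.org/v1/occurrence/download/request"


def build_download_request(user, password, email, taxon_key, url=DOWNLOAD_URL):
    """Build (but do not send) a POST request asking for an occurrence download."""
    # The field names below are assumptions, not confirmed by this thread.
    query = {
        "creator": user,
        "notificationAddresses": [email],
        "predicate": {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
    }
    body = json.dumps(query).encode("utf-8")

    # HTTP basic authentication: base64("user:password") in the Authorization header.
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )
```

Passing the returned object to urllib.request.urlopen would submit the
request. Because the download is asynchronous, the response would identify
the download rather than contain the records themselves.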
I am actually working on a revision to the mapping API, and will see what
may be possible for returning distinct locations.  It’s tricky to do in real
time though.
What kind of accuracy do you need please?
Thanks,
Tim
From: Nils Hempelmann <info@nilshempelmann.de>
Date: Thursday 2 June 2016 at 15:00
To: Tim Robertson <trobertson@gbif.org>, "api-users@lists.gbif.org" <
api-users@lists.gbif.org>, "wps-dev@lists.dkrz.de" <wps-dev@lists.dkrz.de
...
Subject: Re: [API-users] birdhouse meets GBIF
Hi Tim
Thanks for the quick answer. If not via an OGC service, how can the download
API be called from outside GBIF?
Merci
Nils
On 02/06/2016 14:45, Tim Robertson wrote:
Hi Nils
We don’t have any OGC services, but there is an asynchronous download API
which can deliver CSVs.
Off the top of my head, the only way you can automate this at the moment
would be to periodically issue a download (e.g. daily), process it as you
see fit, and cache the result for your app.
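
The "download periodically and cache" idea can be sketched as follows. This
is a minimal Python 3 example, not part of any GBIF or birdhouse code: the
function name cached_download and the fetch callable are hypothetical, and
fetch stands in for whatever code actually requests and retrieves the
download.

```python
import os
import time

ONE_DAY = 24 * 60 * 60  # seconds


def cached_download(cache_path, fetch, max_age=ONE_DAY, now=None):
    """Return the path to a cached file, refreshing it via fetch() when stale.

    fetch is a hypothetical callable that returns the download as bytes.
    """
    now = time.time() if now is None else now
    is_fresh = (
        os.path.exists(cache_path)
        and now - os.path.getmtime(cache_path) < max_age
    )
    if not is_fresh:
        data = fetch()
        # Write to a temporary file first so readers never see a partial file.
        tmp_path = cache_path + ".tmp"
        with open(tmp_path, "wb") as handle:
            handle.write(data)
        os.replace(tmp_path, cache_path)
    return cache_path
```

The application then always reads from cache_path, and the download is only
requested again once the cached copy is older than max_age.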
In the download API you can get any size from 1 to 660 million records,
which is why it is asynchronous.  It’s used a lot by various applications
and communities.
I hope this helps,
Tim
From: API-users <api-users-bounces@lists.gbif.org> on behalf of Nils
Hempelmann <info@nilshempelmann.de>
Date: Thursday 2 June 2016 at 14:25
To: "api-users@lists.gbif.org" <api-users@lists.gbif.org>, "
wps-dev@lists.dkrz.de" <wps-dev@lists.dkrz.de>
Subject: [API-users] birdhouse meets GBIF
Dear all
Here is a current problem to solve :-) for the birdhouse WPS and GBIF
developers.
There is a species distribution process in the birdhouse WPS that processes
climate data based on occurrence coordinates from GBIF.
Currently the GBIF data are given to the process via a URL of the GBIF
CSV (which has to be generated manually beforehand).
I was implementing a direct occurrence search from birdhouse to GBIF with
pygbif. It worked fine, BUT there is a limit of 300 records, which is far too
few to train the species distribution model.
So here the question: Is there a way to request the species occurrence
coordinates somehow directly (e.g. like a WPS request)?
(The previous conversation follows.)
And some links for background information:
Birdhouse:
http://birdhouse.readthedocs.io/en/latest/
GBIF:
http://www.gbif.org/
Species distribution process:
http://flyingpigeon.readthedocs.io/en/latest/tutorials/sdm.html
Birdhouse architecture:
https://github.com/bird-house/birdhouse-docs/blob/master/slides/birdhouse-ar...
Merci
Nils
-------- Forwarded Message -------
Subject: Re: pygbif for occurrence coordinates
Date: Thu, 2 Jun 2016 09:00:49 -0300
From: Mauro Cavalcanti <maurobio@gmail.com>
To: Nils Hempelmann <info@nilshempelmann.de>
CC: wps-dev@lists.dkrz.de, Carsten Ehbrecht <ehbrecht@dkrz.de>,
Wolfgang.Falk@lwf.bayern.de, Scott Chamberlain <myrmecocystus@gmail.com>
Nils,
No, there is no other way to get a larger number of records at once using
the GBIF API (although one might achieve this by sequentially
querying the server in "batches" of 200 or 300 records each). As I said,
there are good operational reasons for the GBIF developers to have imposed
such a limit.
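
The "batches" approach could look roughly like the sketch below. It assumes
only what the session quoted further down in this thread shows: a search
function that accepts limit and offset and returns a dict with a 'results'
list. The helper name fetch_all is hypothetical, and the 'endOfRecords' flag
is assumed to be present in the response (it is ignored if missing).

```python
def fetch_all(search, page_size=300, max_records=None, **query):
    """Collect records by calling search() repeatedly with a growing offset.

    search is any callable taking the keyword arguments limit and offset and
    returning a dict that contains a 'results' list.
    """
    records = []
    offset = 0
    while max_records is None or len(records) < max_records:
        limit = page_size
        if max_records is not None:
            limit = min(page_size, max_records - len(records))
        page = search(limit=limit, offset=offset, **query)
        batch = page.get("results", [])
        records.extend(batch)
        offset += len(batch)
        # Stop on an empty or short page, or when the server says it is done.
        if len(batch) < limit or page.get("endOfRecords", False):
            break
    return records
```

With pygbif this would be called as fetch_all(occurrences.search,
taxonKey=key, max_records=3000). Since a request with limit=200001 is
rejected in the quoted session, very large result sets are probably better
served by the download API.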
As for your other question, it would be better put to the GBIF developers
themselves (on the discussion list, so that we can all benefit from the
answers! :-))
With warmest regards,
--
Dr. Mauro J. Cavalcanti
E-mail: maurobio@gmail.com
Web: http://sites.google.com/site/maurobio
On 02/06/2016 08:49, "Nils Hempelmann" <info@nilshempelmann.de> wrote:
...
Hi Mauro
Oh ...
Is there no way to set it to unlimited?
The 'manual via browser' option, followed by parsing the returned CSV, is
the current status.
Or is there any alternative to pygbif?
If I understood correctly, the GBIF database is organized as a web server,
so I guess there should be a way to connect it to the birdhouse WPS, am I
right?
(I have put the wps-dev list in copy.)
Merci
Nils
On 02/06/2016 12:40, Mauro Cavalcanti wrote:
Nils,
That's a limit imposed (for good operational reasons) by the GBIF API.
If you want a larger number of records, you'll have to download them
"manually" (that is, via the browser) and then parse the CSV file
returned from the GBIF server locally.
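
Parsing such a file locally could be done along these lines. This is a
minimal Python 3 sketch: the column names decimalLatitude and
decimalLongitude and the tab delimiter are assumptions about the downloaded
file, and the helper name read_coordinates is hypothetical.

```python
import csv
import io


def read_coordinates(handle, lat_field="decimalLatitude",
                     lon_field="decimalLongitude", delimiter="\t"):
    """Return (latitude, longitude) float pairs from an occurrence table.

    The column names and the delimiter are assumptions; adjust them to
    match the header of the file actually downloaded.
    """
    coordinates = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        try:
            lat = float(row[lat_field])
            lon = float(row[lon_field])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without usable coordinates
        coordinates.append((lat, lon))
    return coordinates


# Tiny in-memory sample standing in for a downloaded file.
sample = io.StringIO(
    "species\tdecimalLatitude\tdecimalLongitude\n"
    "Fagus sylvatica\t48.1\t11.6\n"
    "Fagus sylvatica\t\t\n"
)
print(read_coordinates(sample))
```

For a real file, pass an open file object instead of the in-memory sample,
for example open(path, newline="", encoding="utf-8").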
Hope this helps.
Best regards,
--
Dr. Mauro J. Cavalcanti
E-mail: maurobio@gmail.com
Web: http://sites.google.com/site/maurobio
On 02/06/2016 04:48, "Nils Hempelmann" <info@nilshempelmann.de>
wrote:
...
Hi Scott
It works fine. Thanks a lot. I just have a question about the search limits:
The maximum number of records seems to be limited to 300. Is that on purpose?
And requesting more than 200000 gives a 'bad request'.
Merci
Nils
In [68]: len( occurrences.search(taxonKey=key, limit=100)['results'])
Out[68]: 100
In [69]: len( occurrences.search(taxonKey=key, limit=300)['results'])
Out[69]: 300
In [70]: len( occurrences.search(taxonKey=key, limit=3000)['results'])
Out[70]: 300
In [71]: len( occurrences.search(taxonKey=key, limit=200000)['results'])
Out[71]: 300
In [72]: len( occurrences.search(taxonKey=key, limit=200001)['results'])
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call
last)
<ipython-input-72-2f7d7b4ccba0> in <module>()
----> 1 len( occurrences.search(taxonKey=key, limit=200001)['results'])
/home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/occurrences/search.pyc
in search(taxonKey, scientificName, country, publishingCountry,
hasCoordinate, typeStatus, recordNumber, lastInterpreted, continent,
geometry, recordedBy, basisOfRecord, datasetKey, eventDate, catalogNumber,
year, month, decimalLatitude, decimalLongitude, elevation, depth,
institutionCode, collectionCode, hasGeospatialIssue, issue, q, mediatype,
limit, offset, **kwargs)
    251         'collectionCode': collectionCode, 'hasGeospatialIssue':
hasGeospatialIssue,
    252         'issue': issue, 'q': q, 'mediatype': mediatype,
'limit': limit,
--> 253         'offset': offset}, **kwargs)
    254     return out
/home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/pygbif/gbifutils.pyc
in gbif_GET(url, args, **kwargs)
     17 def gbif_GET(url, args, **kwargs):
     18   out = requests.get(url, params=args, headers=make_ua(),
**kwargs)
---> 19   out.raise_for_status()
     20   stopifnot(out.headers['content-type'])
     21   return out.json()
/home/nils/.conda/envs/birdhouse/lib/python2.7/site-packages/requests/models.pyc
in raise_for_status(self)
    842
    843         if http_error_msg:
--> 844             raise HTTPError(http_error_msg, response=self)
    845
    846     def close(self):
HTTPError: 400 Client Error: Bad Request for url:
http://api.gbif.org/v1/occurrence/search?taxonKey=2882316&limit=200001&offset=0
In [73]:
On 02/06/2016 02:22, Scott Chamberlain wrote:
Fixes to the docs and a fix to download_get are up on GitHub now. I will push
a new version to PyPI soon. Let me know if something still doesn't work for
you after reinstalling from GitHub.
S
On Wed, Jun 1, 2016 at 4:51 PM Nils Hempelmann <info@nilshempelmann.de>
wrote:
...
Hi Scott and Mauro
Mauro sent me a snippet of code which worked fine:
https://github.com/bird-house/flyingpigeon/blob/develop/scripts/pygbif_occur...
I found the example here:
https://github.com/maurobio/pygbif#occurrence-data
and here:
https://github.com/sckott/pygbif#occurrences-module
Thanks a lot, this is great. That enables a lot ;-)
I'll keep you posted.
merci
Nils
On 02/06/2016 01:45, Scott Chamberlain wrote:
And where are those examples from exactly? I don't see them when
searching the repo (which includes all docs).
`pygbif.name_suggest` wouldn't work because `name_suggest` is a method
in the `species` module, so you'd have to do pygbif.species.name_suggest,
or `from pygbif import species`, then `species.name_suggest`
Looks like occ.get(taxonKey = 252408386) is a documentation bug;
that should be `key` instead of `taxonKey`, a copy-paste error. I will fix
that.
The `occ.download_get` call has a small bug; I will fix that too.
All other calls work for me.
S
On Wed, Jun 1, 2016 at 4:34 PM Scott Chamberlain <
myrmecocystus@gmail.com> wrote:
...
Hi,
What version of pygbif are you on? And what version of Python?
Best, Scott
On Wed, Jun 1, 2016 at 4:02 PM Nils Hempelmann <
info@nilshempelmann.de> wrote:
> Hi Mauro and Scott
>
> I was checking out pygbif. It seems to be a very useful tool.
>
> Can you help me with the syntax (or point me to the appropriate
> person ;-) )?
> The given snippets of code are outdated.
>
> I am basically just looking for the occurrence coordinates:
> here is my first try :
>
> import pygbif
> occ = pygbif.occurrences.search(scientificName='Fagus sylvatica')
> occ['count']
>
> ... and further? ;-)
>
> The examples in the docs are throwing errors:
>
> key = pygbif.name_suggest(q='Helianthus annuus', rank='species')['key']
> pygbif.search(taxonKey=key[0]['key'], limit=2)
>
> from pygbif import occurrences as occ
> occ.search(taxonKey = 3329049)
> occ.get(taxonKey = 252408386)
> occ.count(isGeoreferenced = True)
> occ.download_list(user = "sckott", limit = 5)
> occ.download_meta(key = "0000099-140929101555934")
> occ.download_get("0000099-140929101555934")
>
>
> Thanks
> Nils
>
>
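
For the original question in this thread (getting from a search response to
the occurrence coordinates), a minimal sketch could look like this. It
assumes that each record in the 'results' list may carry the keys
decimalLatitude and decimalLongitude; the helper name extract_coordinates
and the sample response are made up for illustration.

```python
def extract_coordinates(response):
    """Return (latitude, longitude) pairs from a search response dict.

    Records that lack either coordinate are skipped.
    """
    coordinates = []
    for record in response.get("results", []):
        lat = record.get("decimalLatitude")
        lon = record.get("decimalLongitude")
        if lat is not None and lon is not None:
            coordinates.append((lat, lon))
    return coordinates


# Hand-written stand-in for a real response, for illustration only.
fake_response = {
    "count": 2,
    "results": [
        {"species": "Fagus sylvatica", "decimalLatitude": 50.9, "decimalLongitude": 6.9},
        {"species": "Fagus sylvatica"},  # no coordinates recorded
    ],
}
print(extract_coordinates(fake_response))
```

With pygbif, the dict returned by
pygbif.occurrences.search(scientificName='Fagus sylvatica') would be passed
in place of fake_response.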
_______________________________________________
API-users mailing list
API-users@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/api-users