[API-users] Requesting Occurrence Data for Large List of Species

Wed Apr 17 15:32:41 CEST 2019

Hi Ben,

We've made some changes to our download system that raise the limit beyond 300 species. The exact limit is unclear, since it depends on the length of the query in characters. (Such long queries also run particularly slowly.)
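
Because the cutoff depends on the length of the query rather than on a fixed species count, one option is to batch a long list by serialized size. The sketch below is a hypothetical helper, not part of any GBIF tool. `MAX_PREDICATE_CHARS` is an assumed budget (the real limit is unknown), and it assumes the list holds taxon keys sent through a `TAXON_KEY` "in" predicate.

```python
import json

# Hypothetical character budget: the real limit is unknown and may change.
MAX_PREDICATE_CHARS = 4000


def predicate_length(keys):
    """Length in characters of the serialized "in" predicate for these keys."""
    predicate = {"type": "in", "key": "TAXON_KEY", "values": [str(k) for k in keys]}
    return len(json.dumps(predicate))


def batch_by_length(keys, max_chars=MAX_PREDICATE_CHARS):
    """Greedily group keys, in order, so each predicate stays within max_chars."""
    batches, current = [], []
    for key in keys:
        # Start a new batch if adding this key would exceed the budget.
        if current and predicate_length(current + [key]) > max_chars:
            batches.append(current)
            current = []
        current.append(key)
    if current:
        batches.append(current)
    return batches
```

Batching by length rather than by count adapts automatically to long names or keys; the budget itself still has to be found by trial.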

I've rerun your two failed downloads; you should have received an email notification for each of them. I can see this is probably too late and that you've already worked on splitting your request into multiple downloads. Apologies for the delay here.

Cheers,

Matt

On 08/04/2019 21:06, Benjamin Feinsilver wrote:
Thanks, Tim. I'll take another stab at it this week if I have time. I'm hesitant to try the wider search approach because the list of plant species I have is pretty diverse and I don't think it could conveniently be split into a few taxonomic groups. I don't think it would make sense to try to download all 250M plant occurrences at the kingdom level either.

On Mon, Apr 8, 2019 at 3:23 AM Tim Robertson <trobertson at gbif.org> wrote:
Hi Ben,

Thanks. Apparently even 300 is too many.

For background: the issues relate to 1) limits on the length allowed for an HTTP GET request (internally there is a GET call), and 2) a limit imposed by the workflow engine that manages the context for the download.
Because the download service is asynchronous, the request itself is accepted and the error only appears later; if you polled the API, you’d also see it there.
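
A minimal polling sketch, assuming the download metadata endpoint `https://api.gbif.org/v1/occurrence/download/{key}` returns JSON with a `status` field. The set of terminal status names is an assumption to check against the current API documentation, and `wait_for_download` is a hypothetical helper; its `fetch` and `sleep` parameters can be swapped out so the logic runs without network access.

```python
import json
import time
import urllib.request

# Assumed endpoint and terminal states; verify against the current GBIF API docs.
STATUS_URL = "https://api.gbif.org/v1/occurrence/download/{key}"
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "KILLED", "CANCELLED", "FILE_ERASED"}


def fetch_status(download_key):
    """GET the download's metadata and return its "status" field."""
    with urllib.request.urlopen(STATUS_URL.format(key=download_key)) as resp:
        return json.load(resp)["status"]


def wait_for_download(download_key, fetch=fetch_status, interval=60.0,
                      max_polls=1000, sleep=time.sleep):
    """Poll until the download reaches a terminal state and return that state."""
    for _ in range(max_polls):
        status = fetch(download_key)
        if status in TERMINAL_STATES:
            return status
        sleep(interval)  # still PREPARING/RUNNING (or similar): wait and retry
    raise TimeoutError(f"download {download_key} not finished after {max_polls} polls")
```

Checking for `FAILED` here surfaces the same error that otherwise only arrives by email.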

I’m afraid you either need to reduce the number of species per request, or take the approach I suggested: a wider search (e.g. on a higher taxon) followed by post-filtering.
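
The post-filtering step could look roughly like this, assuming an unzipped, tab-separated "simple" download whose header contains a `speciesKey` column, plus a list of wanted species keys. The column name and function names are assumptions for illustration. Matching on `speciesKey` rather than `taxonKey` is meant to keep records identified below species level (e.g. subspecies) under their parent species.

```python
def filter_lines(lines, wanted_keys, column="speciesKey"):
    """Yield the header line, then each data line whose `column` value is wanted."""
    wanted = {str(k) for k in wanted_keys}
    lines = iter(lines)
    header = next(lines, None)
    if header is None:  # empty input: nothing to yield
        return
    yield header
    # Raises ValueError if the assumed column is missing from the header.
    index = header.rstrip("\r\n").split("\t").index(column)
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if index < len(fields) and fields[index] in wanted:
            yield line


def filter_file(in_path, out_path, wanted_keys, column="speciesKey"):
    """Stream a tab-separated download into a filtered copy, line by line."""
    with open(in_path, encoding="utf-8", newline="") as src, \
         open(out_path, "w", encoding="utf-8", newline="") as dst:
        dst.writelines(filter_lines(src, wanted_keys, column))
```

Streaming line by line keeps memory use flat, which matters when the wider search returns far more records than are kept.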

I hope this helps.

Thanks,
Tim

From: Benjamin Feinsilver <benjamin.feinsilver at gmail.com>
Date: Monday, 8 April 2019 at 05.07
To: Tim Robertson <trobertson at gbif.org>
Cc: api-users at lists.gbif.org
Subject: Re: [API-users] Requesting Occurrence Data for Large List of Species

Hi Tim,

I received an error message (via email) when attempting to post 300 taxon keys:

"We are sorry, but an error has occurred processing your download."

Please see the attached query file.

Curl command:

curl --include --user username:password --header "Content-Type: application/json" --data @query_1.json http://api.gbif.org/v1/occurrence/download/request

I received an HTTP status code of "201 Created".
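
For reference, a rough standard-library Python equivalent of the curl command above. This is a sketch: it assumes HTTPS (the thread used plain http) and that the response body is the new download's key. Because the service is asynchronous, "201 Created" only means the request was accepted and queued; the download can still fail afterwards.

```python
import base64
import json
import urllib.request

# Same endpoint as the curl command, but over HTTPS (an assumption).
REQUEST_URL = "https://api.gbif.org/v1/occurrence/download/request"


def build_download_request(query, username, password, url=REQUEST_URL):
    """Build the POST that `curl --user ... --data @query_1.json` would send."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def submit(request):
    """Send the request and return (HTTP status, response body)."""
    with urllib.request.urlopen(request) as resp:
        # Assumption: the body is the download key, usable for status polling.
        return resp.status, resp.read().decode("utf-8").strip()
```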

Thanks,

Ben

On Wed, Apr 3, 2019 at 3:52 AM Tim Robertson <trobertson at gbif.org> wrote:
Hi Benjamin,

The Download API will be best.

However, there are limits, and you will not be able to push all 3,000 species into one request.
You could either split it into groups of e.g. 300, or use a higher taxon and then implement a post-filter to throw away those not in your list (the latter is how I would do it).
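
Splitting into groups of 300 is easy to script. A minimal sketch, assuming a list of taxon keys and the request fields from the example at the bottom of this thread; the function names and the `query_N.json` file pattern (modelled on `query_1.json` in the curl command above) are hypothetical.

```python
import json
from pathlib import Path


def chunks(items, size=300):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_queries(taxon_keys, size=300, creator="userName",
                  email="userName@example.org"):
    """One request body per group, each with an "in" predicate on TAXON_KEY."""
    return [
        {
            "creator": creator,
            "notification_address": [email],
            "predicate": {"type": "in", "key": "TAXON_KEY",
                          "values": [str(k) for k in group]},
        }
        for group in chunks(list(taxon_keys), size)
    ]


def write_queries(queries, out_dir="."):
    """Write query_1.json, query_2.json, ... for use with curl --data @file."""
    paths = []
    for i, query in enumerate(queries, start=1):
        path = Path(out_dir) / f"query_{i}.json"
        path.write_text(json.dumps(query, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```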

I am sorry for the nuisance. This is a known issue that we aim to address: https://github.com/gbif/portal-feedback/issues/1768

Thanks,
Tim

From: API-users <api-users-bounces at lists.gbif.org> on behalf of Benjamin Feinsilver <benjamin.feinsilver at gmail.com>
Date: Wednesday, 3 April 2019 at 09.33
To: api-users at lists.gbif.org
Subject: [API-users] Requesting Occurrence Data for Large List of Species

Hello,

If I have a list of around 3,000 species, and I would like to request occurrence data for each species, is it more efficient to use the Search or Download API?

If using the Download API, could I include the list of species in an external query file and use the "in" predicate? For example:

{
  "creator":"userName",
  "notification_address": ["userName@example.org"],
  "predicate":
  {
    "type":"in",
    "key":"SCIENTIFIC_NAME",
    "values":["cat1","cat2","cat3"]
  }
}
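
The curl command elsewhere in this thread sends its query from a file with `--data @query_1.json`, so the list can indeed live in an external file. A small sketch that turns a plain list of names (one per line) into such a body, mirroring the example above. `read_species_list`, `build_in_query` and `write_query` are hypothetical helpers; pass `key="TAXON_KEY"` when the list holds taxon keys instead of names.

```python
import json


def read_species_list(lines):
    """Strip each line, skip blanks and drop repeats while keeping the original order."""
    seen, names = set(), []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


def build_in_query(values, key="SCIENTIFIC_NAME", creator="userName",
                   email="userName@example.org"):
    """Request body mirroring the example above, with an "in" predicate."""
    return {
        "creator": creator,
        "notification_address": [email],
        "predicate": {"type": "in", "key": key, "values": list(values)},
    }


def write_query(query, path):
    """Write the body to a file that curl can send with --data @path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(query, f, indent=2)
```

For example, reading a hypothetical `species.txt` and passing the result through `build_in_query` and `write_query(..., "query.json")` produces a file for `curl --data @query.json`, subject to the length limits discussed in the replies.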

Thanks,

Ben

_______________________________________________
API-users mailing list
API-users at lists.gbif.org
https://lists.gbif.org/mailman/listinfo/api-users
