[API-users] Requesting Occurrence Data for Large List of Species

Benjamin Feinsilver benjamin.feinsilver at gmail.com
Wed Apr 17 18:08:04 CEST 2019


Thanks, Matt. I did end up running the queries last weekend in 50-species
chunks using the Python Requests library. The whole job (60 chunks) took
about 10 hours.
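
For anyone finding this thread later, the chunked approach described above can be sketched roughly as follows. The request body mirrors the "in"-predicate example quoted later in the thread; the use of TAXON_KEY, the placeholder credentials, and the helper names are my own assumptions for illustration, not code from this thread:

```python
# Sketch: split a long list of GBIF taxon keys into fixed-size chunks and
# build one occurrence-download request body per chunk.
import json

GBIF_DOWNLOAD_URL = "http://api.gbif.org/v1/occurrence/download/request"

def chunk(seq, size):
    """Yield successive slices of seq, each at most `size` items long."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def build_request(taxon_keys, creator="userName", email="userName@example.org"):
    """Build one download-request body using the 'in' predicate."""
    return {
        "creator": creator,
        "notification_address": [email],
        "predicate": {
            "type": "in",
            "key": "TAXON_KEY",
            "values": [str(k) for k in taxon_keys],
        },
    }

# Example: 3000 keys in 50-key chunks -> 60 request bodies.
keys = list(range(1, 3001))
bodies = [build_request(c) for c in chunk(keys, 50)]
print(len(bodies))  # 60

# Each body would then be POSTed (downloads are asynchronous, so you would
# poll or wait for the notification email before starting the next), e.g.:
# import requests
# requests.post(GBIF_DOWNLOAD_URL, json=bodies[0],
#               auth=("username", "password"),
#               headers={"Content-Type": "application/json"})
```

The POST itself is left commented out since it requires GBIF credentials and submits real download jobs.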

On Wed, Apr 17, 2019 at 9:32 AM Matthew Blissett <mblissett at gbif.org> wrote:

> Hi Ben,
>
> We've been able to make some changes to our download system, which have
> increased the limit to beyond 300 species. The actual limit is unclear,
> since it depends on the length of the query in characters. (Such long
> queries also run particularly slowly.)
>
> I've rerun your two failed downloads; you should have received an email
> notification for each of them. I can see this is probably too late and
> you've already worked on splitting into multiple downloads -- apologies for
> the delay here.
>
> Cheers,
>
> Matt
> On 08/04/2019 21:06, Benjamin Feinsilver wrote:
>
> Thanks, Tim. I'll take another stab at it this week if I have time. I'm
> hesitant to try the wider search approach because the list of plant species
> I have is pretty diverse and I don't think it could conveniently be split
> into a few taxonomic groups. I don't think it would make sense to try to
> download all 250M plant occurrences at the kingdom level either.
>
> On Mon, Apr 8, 2019 at 3:23 AM Tim Robertson <trobertson at gbif.org> wrote:
>
>> Hi Ben,
>>
>> Thanks. Apparently even 300 is too long.
>>
>> For background info, the issue relates to 1) limits on the length allowed
>> for an HTTP GET (internally there is a GET call) and 2) a limit imposed by
>> the workflow engine managing the context for the download.
>>
>> Since this is an asynchronous service, you'd also see the error if you
>> polled the API.
>>
>> I’m afraid you either need to reduce the size, or take the approach I
>> suggested of a wider search (e.g. a higher taxon) followed by post-filtering.
>>
>> I hope this helps.
>>
>> Thanks,
>>
>> Tim
>>
>> *From: *Benjamin Feinsilver <benjamin.feinsilver at gmail.com>
>> *Date: *Monday, 8 April 2019 at 05.07
>> *To: *Tim Robertson <trobertson at gbif.org>
>> *Cc: *"api-users at lists.gbif.org" <api-users at lists.gbif.org>
>> *Subject: *Re: [API-users] Requesting Occurrence Data for Large List of
>> Species
>>
>> Hi Tim,
>>
>> I received an error message (via email) when attempting to post 300 taxon
>> keys:
>>
>> "We are sorry, but an error has occurred processing your download."
>>
>> Please see attached query file.
>>
>> Curl command:
>>
>> curl --include --user username:password --header "Content-Type:
>> application/json" --data @query_1.json
>> http://api.gbif.org/v1/occurrence/download/request
>>
>> I received an HTTP status code of "201 Created."
>>
>> Thanks,
>>
>> Ben
>>
>> On Wed, Apr 3, 2019 at 3:52 AM Tim Robertson <trobertson at gbif.org> wrote:
>>
>> Hi Benjamin,
>>
>> Download will be best.
>>
>> However, there are limits, and you will not be able to push 3000 in.
>>
>> You could either split it into groups of e.g. 300, or use a higher taxon
>> and then implement a post-filter to throw away those not in your list (the
>> latter is how I would do it).
>>
>> I am sorry for this nuisance, and this is a known issue that we do aim to
>> address: https://github.com/gbif/portal-feedback/issues/1768
>>
>> Thanks,
>>
>> Tim
>>
>> *From: *API-users <api-users-bounces at lists.gbif.org> on behalf of
>> Benjamin Feinsilver <benjamin.feinsilver at gmail.com>
>> *Date: *Wednesday, 3 April 2019 at 09.33
>> *To: *"api-users at lists.gbif.org" <api-users at lists.gbif.org>
>> *Subject: *[API-users] Requesting Occurrence Data for Large List of
>> Species
>>
>> Hello,
>>
>> If I have a list of around 3,000 species and I would like to request
>> occurrence data for each species, is it more efficient to use the Search or
>> the Download API?
>>
>> If using the Download API, could I include the list of species in an
>> external query file and use the "in" predicate? For example:
>>
>> {
>>   "creator":"userName",
>>   "notification_address": ["userName at example.org"],
>>   "predicate":
>>   {
>>     "type":"in",
>>     "key":"SCIENTIFIC_NAME",
>>     "values":["cat1","cat2","cat3"]
>>   }
>> }
>>
>> Thanks,
>>
>> Ben
>>
>>
> _______________________________________________
> API-users mailing list
> API-users at lists.gbif.org
> https://lists.gbif.org/mailman/listinfo/api-users
>
>
