Requesting Occurrence Data for Large List of Species
Hello,
I have a list of around 3,000 species and would like to request occurrence data for each of them. Is it more efficient to use the Search API or the Download API?
If using the Download API, could I include the list of species in an external query file and use the "in" predicate? For example:
{
  "creator": "userName",
  "notification_address": ["userName@example.org"],
  "predicate": {
    "type": "in",
    "key": "SCIENTIFIC_NAME",
    "values": ["cat1", "cat2", "cat3"]
  }
}
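One way to produce such an external query file is to generate it from a plain-text species list. This is a minimal sketch, assuming one scientific name per line. The helper names and the files `species.txt` and `query.json` are hypothetical, and the request fields simply mirror the example above rather than a confirmed current API schema. The output file is suitable for posting with curl's `--data @file` option.

```python
import json
from pathlib import Path


def build_request(creator, email, names):
    """Build a download request body with an "in" predicate on SCIENTIFIC_NAME.

    Hypothetical helper: field names are copied from the example request
    above, not checked against the current GBIF API schema.
    """
    return {
        "creator": creator,
        "notification_address": [email],
        "predicate": {
            "type": "in",
            "key": "SCIENTIFIC_NAME",
            "values": list(names),
        },
    }


def read_species(path):
    """Read one scientific name per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def write_query_file(path, request):
    """Write the request body as JSON so curl can send it with --data @path."""
    Path(path).write_text(json.dumps(request, indent=2), encoding="utf-8")


# Usage (hypothetical file names):
#   names = read_species("species.txt")
#   write_query_file("query.json", build_request("userName", "userName@example.org", names))
```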
Thanks,
Ben
Hi Benjamin,
Download will be best.
However, there are limits, and you will not be able to push all 3,000 in. You could either split the list into groups of, e.g., 300, or query a higher taxon and then post-filter to throw away records for species not in your list (the latter is how I would do it).
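The first option, splitting the list into fixed-size groups, could look like this. A sketch only: `chunked` and `build_requests` are hypothetical helpers, the request shape is copied from Ben's example, and the group size of 300 is just the suggested starting point, not a documented limit.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_requests(creator, email, names, size=300):
    """Return one download request body per group of names (hypothetical helper)."""
    return [
        {
            "creator": creator,
            "notification_address": [email],
            "predicate": {"type": "in", "key": "SCIENTIFIC_NAME", "values": group},
        }
        for group in chunked(names, size)
    ]


# Example: 3,000 placeholder names in groups of 300 gives 10 request bodies,
# each of which could be written to its own query file and posted separately.
names = [f"species {i}" for i in range(3000)]
request_bodies = build_requests("userName", "userName@example.org", names)
```

Each body is independent, so a failed group can be resubmitted on its own without redoing the rest.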
I am sorry for this nuisance, and this is a known issue that we do aim to address: https://github.com/gbif/portal-feedback/issues/1768
Thanks, Tim
From: API-users <api-users-bounces@lists.gbif.org> on behalf of Benjamin Feinsilver <benjamin.feinsilver@gmail.com>
Date: Wednesday, 3 April 2019 at 09.33
To: "api-users@lists.gbif.org" <api-users@lists.gbif.org>
Subject: [API-users] Requesting Occurrence Data for Large List of Species
Hi Tim,
I received an error message (via email) when attempting to post 300 taxon keys:
"We are sorry, but an error has occurred processing your download."
Please see attached query file.
Curl command:
curl --include --user username:password --header "Content-Type: application/json" --data @query_1.json http://api.gbif.org/v1/occurrence/download/request
The request itself returned HTTP status code "201 Created."
Thanks,
Ben
On Wed, Apr 3, 2019 at 3:52 AM Tim Robertson trobertson@gbif.org wrote:
Hi Ben,
Thanks. Apparently even 300 is too long.
For background, the issues relate to 1) limits on the length allowed for an HTTP GET (internally there is a GET call) and 2) a limit imposed by the workflow engine that manages the context for the download. Because the service is asynchronous, the 201 response only means the request was accepted; if you polled the API you’d also see the error.
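Because the download service is asynchronous, a script can check the outcome itself instead of waiting for the email. This is a minimal standard-library sketch with several assumptions to verify against the GBIF API documentation: the request endpoint is taken from the curl command in this thread (switched to https), a successful POST is assumed to return the download key as plain text, `GET .../occurrence/download/{key}` is assumed to return JSON with a `status` field, and the status names in `FINISHED` are assumed to be the terminal states.

```python
import base64
import json
import time
import urllib.request

# Assumed base URL, adapted from the curl command in this thread.
API = "https://api.gbif.org/v1/occurrence/download"
# Assumed terminal status names; check them against the API documentation.
FINISHED = {"SUCCEEDED", "FAILED", "KILLED", "CANCELLED"}


def submit(query_path, user, password):
    """POST a query file and return the download key.

    A 201 response only means the request was accepted, not that the
    download will succeed. Assumes the key comes back as plain text.
    """
    with open(query_path, "rb") as f:
        body = f.read()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{API}/request",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode().strip()


def fetch_status(key):
    """GET the download's metadata and return its (assumed) status field."""
    with urllib.request.urlopen(f"{API}/{key}") as resp:
        return json.load(resp)["status"]


def wait_for_download(key, get_status=fetch_status, interval=60, sleep=time.sleep):
    """Poll until the download reaches a finished state, then return that state."""
    while True:
        status = get_status(key)
        if status in FINISHED:
            return status
        sleep(interval)
```

`get_status` and `sleep` are parameters so the polling loop can be exercised without network access; in normal use the defaults make real HTTP calls and wait between polls.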
I’m afraid you need to either reduce the size or take the approach I suggested: a wider search (e.g. a higher taxon) followed by post-filtering.
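The wider-search option, a download for a higher taxon followed by a post-filter, might be sketched like this. It assumes a tab-delimited file with a header row, such as GBIF's SIMPLE_CSV format, and filters on a `speciesKey` column; both the column name and the choice to match on keys rather than names are assumptions, so adjust `key_column` to the file you actually receive. Lines are split on tabs rather than parsed with `csv`, on the assumption that the file uses no quoting; matching lines are copied through unchanged.

```python
import io


def post_filter(src, dst, wanted_keys, key_column="speciesKey"):
    """Copy the header plus every row whose `key_column` value is in `wanted_keys`.

    Assumes tab-delimited text with a header row; `key_column` is an
    assumption about the download format. Returns the number of rows kept.
    """
    header = src.readline()
    columns = header.rstrip("\r\n").split("\t")
    idx = columns.index(key_column)  # raises ValueError if the column is missing
    dst.write(header)
    kept = 0
    for line in src:
        fields = line.rstrip("\r\n").split("\t")
        if idx < len(fields) and fields[idx] in wanted_keys:
            dst.write(line)
            kept += 1
    return kept


# Example with an in-memory sample (placeholder data, not real GBIF records):
sample = io.StringIO("gbifID\tspeciesKey\n1\t100\n2\t200\n")
filtered = io.StringIO()
post_filter(sample, filtered, {"100"})
```

Keys are compared as strings, so a list of numeric species keys would need converting with `str()` before building `wanted_keys`.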
I hope this helps.
Thanks, Tim
From: Benjamin Feinsilver <benjamin.feinsilver@gmail.com>
Date: Monday, 8 April 2019 at 05.07
To: Tim Robertson <trobertson@gbif.org>
Cc: "api-users@lists.gbif.org" <api-users@lists.gbif.org>
Subject: Re: [API-users] Requesting Occurrence Data for Large List of Species
Thanks, Tim. I'll take another stab at it this week if I have time. I'm hesitant to try the wider search approach because the list of plant species I have is pretty diverse and I don't think it could conveniently be split into a few taxonomic groups. I don't think it would make sense to try to download all 250M plant occurrences at the kingdom level either.
On Mon, Apr 8, 2019 at 3:23 AM Tim Robertson trobertson@gbif.org wrote:
Hi Ben,
We've made some changes to our download system that raise the limit beyond 300 species. The exact limit is unclear, since it depends on the length of the query in characters. (Such long queries also run particularly slowly.)
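Since the practical limit depends on the length of the query in characters rather than the number of species, grouping by serialized length may track it more closely than a fixed count. A sketch under stated assumptions: `MAX_CHARS` is a made-up budget (the thread gives no figure), and the length of the JSON-serialized predicate is only a proxy for whatever the service measures internally.

```python
import json

# Hypothetical character budget; the real limit is not stated in this thread.
MAX_CHARS = 10_000


def predicate_length(names):
    """Length in characters of the JSON-serialized "in" predicate for these names."""
    return len(json.dumps({"type": "in", "key": "SCIENTIFIC_NAME", "values": names}))


def chunk_by_length(names, max_chars=MAX_CHARS):
    """Greedily group names, in order, so each serialized predicate fits max_chars."""
    chunks, current = [], []
    for name in names:
        # Start a new group when adding this name would exceed the budget.
        if current and predicate_length(current + [name]) > max_chars:
            chunks.append(current)
            current = []
        current.append(name)
    if current:
        chunks.append(current)
    return chunks
```

A single name longer than the budget still ends up alone in its own over-long group, so it is worth checking the longest group before submitting.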
I've rerun your two failed downloads, and you should have received an email notification for each of them. I realize this is probably too late and that you've already worked on splitting the list into multiple downloads; apologies for the delay here.
Cheers,
Matt
On 08/04/2019 21:06, Benjamin Feinsilver wrote:
_______________________________________________
API-users mailing list
API-users@lists.gbif.org
https://lists.gbif.org/mailman/listinfo/api-users
Thanks, Matt. I did end up running the queries last weekend in 50-species chunks using the Python Requests library. The whole job (60 chunks) took about 10 hours.
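Ben doesn't share his script, so the following is only a hypothetical outline of a sequential run: submit one chunk, wait for it to finish, then move on, keeping track of which chunks need resubmitting. The `submit` and `wait` callables are placeholders for real HTTP calls (for example, functions that post a chunk's query and poll its status); running one download at a time is a conservative choice, not a documented requirement.

```python
def run_sequentially(chunks, submit, wait):
    """Submit chunks one at a time, waiting for each to finish before the next.

    `submit(chunk)` should return a download key and `wait(key)` the final
    status; both are caller-supplied placeholders. Returns a list of
    (chunk_index, key, status) tuples in submission order.
    """
    results = []
    for index, chunk in enumerate(chunks):
        key = submit(chunk)
        results.append((index, key, wait(key)))
    return results


def chunks_to_retry(chunks, results):
    """Return the chunks whose downloads did not report SUCCEEDED."""
    return [chunks[index] for index, _key, status in results if status != "SUCCEEDED"]
```

For scale: 3,000 species in 50-species chunks is 3000 / 50 = 60 downloads, and 10 hours over 60 chunks averages about 10 minutes per chunk.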
On Wed, Apr 17, 2019 at 9:32 AM Matthew Blissett mblissett@gbif.org wrote:
participants (3)
- Benjamin Feinsilver
- Matthew Blissett
- Tim Robertson