[API-users] Improvements to GBIF data downloads
Matthew Blissett
mblissett at gbif.org
Tue Aug 13 15:27:21 CET 2019
Hi all,
Some improvements to data downloads from GBIF.org are now ready for use:
• The "simple CSV" format download includes four additional columns:
verbatimScientificName, stateProvince, individualCount and
occurrenceStatus. The first makes it clear when GBIF processing has
replaced a scientific name, such as for occurrences using recently
published names. The last two help with filtering absence data.
Scripts accessing columns by number (rather than header) will probably
need to be updated.
• Downloading zip files now supports HTTP Range requests / partial
requests. This allows resuming an interrupted download (using CLI tools
like curl or wget, or some browser extensions). It may also help when
using Google's or Amazon's cloud tools.
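As a rough illustration of the two points above, here is a minimal Python sketch. It assumes the "simple CSV" file is tab-delimited with a header row, and that an interrupted download has left a partial zip file on disk. The function names and the sample row are hypothetical and not part of any GBIF tool.

```python
import csv
import io
import os


def read_columns_by_header(tsv_text, wanted):
    """Yield one dict per row, selecting columns by header name.

    Selecting by name keeps working when columns are added to the file.
    Assumption: the download is tab-delimited and unquoted.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        yield {name: row[name] for name in wanted}


def range_header(partial_path):
    """Return the HTTP header asking only for the bytes not yet on disk.

    An empty dict means "start from the beginning". A client should
    append to the partial file only if the server answers with status
    206 (Partial Content); a 200 response carries the whole file again.
    """
    already = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={already}-"} if already else {}
```

With curl or wget the same resume behaviour is available through "curl -C - -O <url>" or "wget -c <url>", so the header helper is only needed when scripting the download yourself.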
Due to limitations in the length of a URL, these two changes are only
available through API requests, not through the GBIF.org website:
• When using the API, it's now possible for a download query predicate
to include up to 101,000 terms — for example, 100,000 taxon keys,
catalogue numbers, collection codes or similar. It's best to use the
"in" predicate to make these requests:
  "predicate": {
    "type": "in",
    "key": "CATALOG_NUMBER",
    "values": ["cat1", "cat2", "cat3"]
  }
• It's also possible to query using more complex polygons, containing up
to 10,000 points. Some polygons are slow to process, so if we regularly
reach the capacity of our systems we may review this limit.
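The two API-only changes above could be used from a script along these lines. This is a sketch under assumptions, not a definitive client: the helper names are hypothetical, counting each value as one "term" is a guess at how the 101,000 limit is applied, and the "within" predicate with a WKT geometry is assumed from the general GBIF predicate conventions rather than stated in this message.

```python
MAX_TERMS = 101_000           # term limit quoted in this announcement
MAX_POLYGON_POINTS = 10_000   # polygon point limit quoted in this announcement


def build_in_predicate(key, values):
    """Build a single "in" predicate, as recommended for long value lists."""
    values = [str(v) for v in values]
    # Assumption: each value counts as one term towards the limit.
    if len(values) > MAX_TERMS:
        raise ValueError(f"{len(values)} values exceeds the {MAX_TERMS} term limit")
    return {"type": "in", "key": key, "values": values}


def within_predicate(points):
    """Build a polygon predicate from (longitude, latitude) pairs.

    Assumption: the API accepts {"type": "within", "geometry": <WKT>}.
    The ring is closed automatically, and the closing point is counted
    towards the limit to stay on the safe side.
    """
    ring = list(points)
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three points")
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    if len(ring) > MAX_POLYGON_POINTS:
        raise ValueError(f"{len(ring)} points exceeds the {MAX_POLYGON_POINTS} point limit")
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return {"type": "within", "geometry": f"POLYGON(({coords}))"}
```

The resulting dictionary would go under the "predicate" key of the JSON body sent to the download API. Checking the limits locally just avoids submitting a request that is likely to be rejected.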
If you spot any issues with these changes, please let us know — by mail
to this list, through the feedback button on the website, or to the
portal-feedback repository on GitHub:
https://github.com/gbif/portal-feedback/issues
Thanks,
Matt Blissett