Hi all,
Some improvements to data downloads from GBIF.org are now ready for use:
• The "simple CSV" download format includes four additional columns: verbatimScientificName, stateProvince, individualCount and occurrenceStatus. verbatimScientificName makes it clear when GBIF processing has replaced a scientific name, for example on occurrences using recently published names. individualCount and occurrenceStatus help with filtering out absence data. Scripts that access columns by position (rather than by header) will probably need to be updated.
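As a sketch of the header-based access suggested above: this reads rows by column name, so added columns don't break the script, and keeps only presence records. The sample data is made up for illustration, and it assumes the simple CSV download is tab-delimited (as current GBIF simple downloads are).

```python
import csv
import io

# Hypothetical fragment of a "simple CSV" download, trimmed to a few
# columns; real downloads contain many more.
sample = (
    "scientificName\tverbatimScientificName\tstateProvince\tindividualCount\toccurrenceStatus\n"
    "Abies alba Mill.\tAbies alba\tTirol\t3\tPRESENT\n"
    "Abies alba Mill.\tAbies alba\tTirol\t0\tABSENT\n"
)

def present_rows(text):
    """Read rows by column header (robust to new columns) and keep presences."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if row["occurrenceStatus"] == "PRESENT"]

rows = present_rows(sample)
print(len(rows))  # → 1
```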
• Downloads of zip files now support HTTP Range requests (partial requests). This allows resuming an interrupted download, using CLI tools like curl or wget, or some browser extensions. It may also help when using Google's or Amazon's cloud tools.
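A minimal sketch of how a script might use this: measure how much of the zip is already on disk and ask the server for only the remainder via a Range header. The helper name is hypothetical; from the command line, `curl -C - -O <download-url>` or `wget -c <download-url>` does the equivalent.

```python
import os

def resume_headers(partial_path):
    """Build an HTTP Range header to resume an interrupted download.

    Returns {} when nothing has been downloaded yet, otherwise a
    'Range: bytes=N-' header asking the server for the remaining bytes.
    """
    if not os.path.exists(partial_path):
        return {}
    offset = os.path.getsize(partial_path)
    return {"Range": f"bytes={offset}-"} if offset else {}
```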
Due to limits on the length of a URL, the following two changes are only available through API requests, not through the GBIF.org website:
• When using the API, it's now possible for a download query predicate to include up to 101,000 terms — for example, 100,000 taxon keys, catalogue numbers, collection codes or similar. It's best to use the "in" predicate to make these requests:
"predicate": {
  "type": "in",
  "key": "CATALOG_NUMBER",
  "values": ["cat1", "cat2", "cat3"]
}
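A sketch of assembling such a request body in Python, with a guard for the 101,000-term limit mentioned above. The body shape matches the usual GBIF download API (POSTed to /v1/occurrence/download/request with your credentials); the username and catalogue numbers here are placeholders.

```python
import json

def in_predicate_request(creator, key, values, fmt="SIMPLE_CSV"):
    """Build a download request body using the 'in' predicate."""
    if len(values) > 101_000:  # limit from the announcement
        raise ValueError("too many terms for one download predicate")
    return {
        "creator": creator,
        "format": fmt,
        "predicate": {"type": "in", "key": key, "values": list(values)},
    }

body = in_predicate_request("my_username", "CATALOG_NUMBER",
                            ["cat1", "cat2", "cat3"])
print(json.dumps(body, indent=2))
```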
• It's also possible to query using more complex polygons, containing up to 10,000 points. Some polygons are slow to process, so if we regularly reach the capacity of our systems we may review this limit.
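For the polygon queries, a hedged sketch of building a "within" predicate from a list of vertices. It assumes WKT geometry in (longitude latitude) order with a closed, anticlockwise ring, as GBIF's geometry searches expect; the 10,000-point check reflects the limit above.

```python
def within_predicate(points):
    """Build a 'within' predicate from (lon, lat) vertices as a WKT polygon.

    The ring is closed automatically (first vertex repeated at the end)
    if the caller hasn't done so.
    """
    if len(points) > 10_000:  # limit from the announcement
        raise ValueError("polygon exceeds the 10,000-point limit")
    points = list(points)
    if points[0] != points[-1]:
        points.append(points[0])
    ring = ", ".join(f"{lon} {lat}" for lon, lat in points)
    return {"type": "within", "geometry": f"POLYGON(({ring}))"}

# A small anticlockwise triangle over Belgium, purely for illustration.
pred = within_predicate([(4.0, 50.0), (5.0, 50.0), (5.0, 51.0)])
```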
If you spot any issues with these changes, please let us know — by mail to this list, through the feedback button on the website, or to the portal-feedback repository on GitHub: https://github.com/gbif/portal-feedback/issues
Thanks,
Matt Blissett