[API-users] Improvements to GBIF data downloads

Tue Aug 13 15:27:21 CET 2019

Hi all,

Some improvements to data downloads from GBIF.org are now ready for use:

• The "simple CSV" format download includes four additional columns: 
verbatimScientificName, stateProvince, individualCount and 
occurrenceStatus.  The first makes it clear when GBIF processing has 
replaced a scientific name, such as for occurrences using recently 
published names.  The last two help with filtering absence data.  
Scripts accessing columns by number (rather than header) will probably 
need to be updated.

• Zip file downloads now support HTTP Range requests (partial 
requests).  This allows resuming an interrupted download (using CLI tools 
like curl or wget, or some browser extensions). It may also help when 
using Google's or Amazon's cloud tools.
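
As a rough illustration of the two items above, the Python sketch below 
reads simple CSV columns by header name rather than by position, and builds 
the Range header a client would send to resume a partly downloaded zip.  
The helper names, the file name and the sample rows are made up for this 
example, and it assumes the simple CSV file is tab-delimited with a header 
row.

```python
import csv
import io
import os
import tempfile


def read_by_header(tsv_text, wanted):
    """Return one dict per row, holding only the wanted columns looked up by name."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return [{name: row[name] for name in wanted} for row in reader]


def resume_headers(partial_path):
    """Return request headers asking only for the bytes not yet on disk."""
    already = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    # "bytes=N-" asks for everything from byte offset N to the end of the file.
    # A server that honours it answers 206 Partial Content, and the client
    # appends the response body to the existing file.
    return {"Range": f"bytes={already}-"} if already else {}


# Made-up sample rows; only the column names come from the announcement.
SAMPLE = (
    "gbifID\tscientificName\tverbatimScientificName\t"
    "individualCount\toccurrenceStatus\n"
    "1\tPuma concolor (Linnaeus, 1771)\tFelis concolor\t2\tPRESENT\n"
    "2\tPuma concolor (Linnaeus, 1771)\tPuma concolor\t0\tABSENT\n"
)

rows = read_by_header(SAMPLE, ["verbatimScientificName", "occurrenceStatus"])
# Name-based access keeps working if more columns are added later.
present = [r for r in rows if r["occurrenceStatus"] != "ABSENT"]

with tempfile.TemporaryDirectory() as tmp:
    partial = os.path.join(tmp, "download.zip")  # hypothetical file name
    fresh_headers = resume_headers(partial)      # nothing on disk yet
    with open(partial, "wb") as fh:
        fh.write(b"\0" * 1024)                   # pretend 1024 bytes arrived
    retry_headers = resume_headers(partial)      # ask for the remainder
```

On the command line, "curl -C - -O <url>" or "wget -c <url>" send the same 
kind of Range header without any scripting.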

Due to limitations in the length of a URL, these two changes are only 
available through API requests, not through the GBIF.org website:

• When using the API, it's now possible for a download query predicate 
to include up to 101,000 terms — for example, 100,000 taxon keys, 
catalogue numbers, collection codes or similar.  It's best to use the 
"in" predicate to make these requests:

   "predicate": {
     "type":"in",
     "key":"CATALOG_NUMBER",
     "values":["cat1","cat2","cat3"]
   }

• It's also possible to query using more complex polygons, containing up 
to 10,000 points.  Some polygons are slow to process, so if we regularly 
reach the capacity of our systems we may review this limit.
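
For anyone generating these predicates from a script, here is a minimal 
Python sketch that builds an "in" predicate and a polygon predicate and 
checks them against the limits quoted above.  The helper names are 
invented for this example.  It assumes that polygons are sent as a WKT 
string in a "within" predicate, that several predicates can be combined 
with an "and" predicate, and that the limits are counted as the number of 
values and the number of points in the closed ring; treat those as guesses 
and check them against the API documentation.

```python
import json

MAX_TERMS = 101_000          # term limit quoted in this mail
MAX_POLYGON_POINTS = 10_000  # polygon point limit quoted in this mail


def in_predicate(key, values):
    """Build an "in" predicate, refusing lists longer than the quoted limit."""
    values = list(values)
    if len(values) > MAX_TERMS:
        raise ValueError(f"{len(values)} values exceeds {MAX_TERMS}")
    return {"type": "in", "key": key, "values": values}


def polygon_wkt(points):
    """Turn (longitude, latitude) pairs into a closed WKT polygon string."""
    ring = list(points)
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three points")
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings repeat the first point at the end
    if len(ring) > MAX_POLYGON_POINTS:
        raise ValueError(f"{len(ring)} points exceeds {MAX_POLYGON_POINTS}")
    coords = ",".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


def within_predicate(points):
    """Assumed shape of a polygon predicate: type "within" plus WKT geometry."""
    return {"type": "within", "geometry": polygon_wkt(points)}


# Hypothetical query: three catalogue numbers inside a small square.
predicate = {
    "type": "and",
    "predicates": [
        in_predicate("CATALOG_NUMBER", ["cat1", "cat2", "cat3"]),
        within_predicate([(10, 50), (11, 50), (11, 51), (10, 51)]),
    ],
}
body = json.dumps({"predicate": predicate}, indent=2)
```

The string in "body" only covers the predicate; a real download request 
also needs the other fields the API expects.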

If you spot any issues with these changes, please let us know — by mail 
to this list, through the feedback button on the website, or to the 
portal-feedback repository on GitHub: 
https://github.com/gbif/portal-feedback/issues

Thanks,

Matt Blissett