Hi all,
Some improvements to data downloads from GBIF.org are now ready for use:
• The "simple CSV" format download includes four additional columns:
verbatimScientificName, stateProvince, individualCount and
occurrenceStatus. The first makes it clear when GBIF processing has
replaced a scientific name, such as for occurrences using recently
published names. The last two help with filtering absence data.
Scripts accessing columns by number (rather than header) will probably
need to be updated.
• Downloading zip files now supports HTTP Range requests / partial
requests. This allows resuming an interrupted download (using CLI tools
like curl or wget, or some browser extensions). It may also help when
using Google or Amazon's cloud tools.
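Scripts that currently read the simple CSV by column number can switch to header-based access. The Python sketch below shows one way to do that. It makes three assumptions. First, the file is tab-delimited with no quoting, which is how GBIF's simple CSV files are usually read; adjust this if yours differs. Second, the two-row excerpt is made up. Third, the absence rule (an occurrenceStatus of ABSENT, or an individualCount of 0) is illustrative and may need tightening for your data.

```python
import csv
import io

# Hypothetical two-row excerpt; a real download has many more columns.
SAMPLE = (
    "gbifID\tscientificName\tverbatimScientificName\tstateProvince\t"
    "individualCount\toccurrenceStatus\n"
    "1\tPuma concolor (Linnaeus, 1771)\tFelis concolor\tAlberta\t2\tPRESENT\n"
    "2\tPuma concolor (Linnaeus, 1771)\tPuma concolor\tMontana\t0\tABSENT\n"
)


def read_rows(f):
    # DictReader keys each row by header name, so new or reordered
    # columns do not break lookups the way row[7] would.
    # Assumption: tab-delimited, no quote handling.
    return csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)


def is_absence(row):
    # Assumed rule: explicit ABSENT status, or a recorded count of 0.
    status = (row.get("occurrenceStatus") or "").strip().upper()
    if status == "ABSENT":
        return True
    count = (row.get("individualCount") or "").strip()
    return count == "0"


rows = list(read_rows(io.StringIO(SAMPLE)))
presences = [r for r in rows if not is_absence(r)]
for r in presences:
    # verbatimScientificName shows the name as originally supplied.
    print(r["gbifID"], r["verbatimScientificName"], "->", r["scientificName"])
```

Because lookups go through header names, the four new columns do not affect the existing ones.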
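curl (`-C -`) and wget (`-c`) already resume partial downloads. For a script, the Python sketch below does the same with the standard library only. It sends a Range header that starts at the size of the partial file and appends to the file if the server answers 206 Partial Content. The function names are hypothetical, and the sketch has no retry logic.

```python
import os
import urllib.request


def range_header(path):
    """Return headers asking only for the bytes not yet on disk."""
    have = os.path.getsize(path) if os.path.exists(path) else 0
    # "bytes=N-" requests everything from offset N to the end.
    return {"Range": f"bytes={have}-"} if have else {}


def resume_download(url, path, chunk_size=1 << 20):
    # Sketch only. If the file is already complete, the server should
    # reply 416, which urlopen raises as an HTTPError.
    req = urllib.request.Request(url, headers=range_header(path))
    with urllib.request.urlopen(req) as resp:
        # 206: the range was honoured, so append to the partial file.
        # 200: the whole file was sent, so start again from scratch.
        mode = "ab" if resp.status == 206 else "wb"
        with open(path, mode) as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```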
Due to limitations in the length of a URL, these two changes are only
available through API requests, not through the GBIF.org website:
• When using the API, it's now possible for a download query predicate
to include up to 101,000 terms — for example, 100,000 taxon keys,
catalogue numbers, collection codes or similar. It's best to use the
"in" predicate to make these requests:
  "predicate": {
    "type": "in",
    "key": "CATALOG_NUMBER",
    "values": ["cat1", "cat2", "cat3"]
  }
• It's also possible to query using more complex polygons, containing up
to 10,000 points. Some polygons are slow to process, so if we regularly
reach the capacity of our systems we may review this limit.
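The Python sketch below builds such a request body. It combines an "in" predicate with a polygon ("within") predicate. The limit checks use the numbers above, but how the server actually counts terms and points is an assumption here. The helper names and the placeholder creator and email are also hypothetical. The resulting JSON would be POSTed, with GBIF credentials, to the occurrence download request endpoint described in the GBIF API documentation.

```python
import json

# Limits from this announcement. Whether a polygon's closing point
# counts towards the 10,000 is an assumption.
MAX_TERMS = 101_000
MAX_POLYGON_POINTS = 10_000


def in_predicate(key, values):
    values = list(values)
    if len(values) > MAX_TERMS:
        raise ValueError(f"{len(values)} values exceeds {MAX_TERMS}")
    return {"type": "in", "key": key, "values": values}


def within_predicate(ring):
    # ring: list of (lon, lat) pairs. WKT polygons must be closed, so
    # repeat the first point at the end if needed.
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    if len(ring) > MAX_POLYGON_POINTS:
        raise ValueError(f"{len(ring)} points exceeds {MAX_POLYGON_POINTS}")
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return {"type": "within", "geometry": f"POLYGON(({coords}))"}


def download_request(predicate, creator, email):
    # Placeholder creator/email; format chosen to match this post.
    return {
        "creator": creator,
        "notificationAddresses": [email],
        "sendNotification": True,
        "format": "SIMPLE_CSV",
        "predicate": predicate,
    }


body = download_request(
    {"type": "and", "predicates": [
        in_predicate("CATALOG_NUMBER", ["cat1", "cat2", "cat3"]),
        # Ring listed anticlockwise; check GBIF's docs on orientation.
        within_predicate([(10, 50), (11, 50), (11, 51), (10, 51)]),
    ]},
    creator="your_gbif_username",
    email="you@example.org",
)
print(json.dumps(body, indent=2))
```

These checks only catch oversized requests early. The server still applies its own limits.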
If you spot any issues with these changes, please let us know — by mail
to this list, through the feedback button on the website, or to the
portal-feedback repository on GitHub:
https://github.com/gbif/portal-feedback/issues
Thanks,
Matt Blissett
Dear GBIF community,
Please be aware of this communication from DataCite, which will likely cause some disruption to GBIF services related to DOI issuing.
Many thanks,
Tim
From: Martin Fenner <martin.fenner(a)datacite.org>
Reply-To: "allusers(a)datacite.org" <allusers(a)datacite.org>
Date: Friday, 2 August 2019 at 08.22
To: DataCite Allmembers <allmembers(a)datacite.org>, DataCite Allusers <allusers(a)datacite.org>
Subject: [datacite-allusers] Delays in DOI registrations and updates
Dear DataCite members and users,
Processing of DOI registrations is currently delayed by up to 12 hours, affecting registration in the handle system and inclusion in the search index. This is caused by an unusually large number of maintenance jobs running in our system, and we expect it to be resolved by the end of the day. This is only a delay in processing: all DOI registrations and updates are successfully captured by our system. Requesting or searching DOI metadata is not affected. Please check https://status.datacite.org for updated status information, or subscribe to email updates at that site.
The very large number of maintenance jobs is due to two important updates: better handling of malformed dates in DOI metadata, and preparation for affiliation identifier support in the upcoming schema 4.3. The load these two tasks place on the system is much higher than expected, causing the delays described above. I sincerely apologize for the inconvenience this is causing, and going forward we will give better advance notice of scheduled maintenance in situations like this.
What is happening today is unrelated to the service incident we reported on July 24 (https://status.datacite.org/incidents/d00gnm05ps3k). That incident was caused by a very high number (> 1 million in 24 hours) of DOI registration requests from a single client, which temporarily overloaded the system. To prevent this from happening again, this Monday we implemented rate-limiting of 3,000 requests per IP address within 5 minutes, which translates to 10 requests per second or 36,000 requests per hour per IP address. Please reach out to support(a)datacite.org if you need higher rate limits, either temporarily or on an ongoing basis, and we can adjust the rate limits for your IP addresses. This is the first time we have implemented rate-limiting, but we had to take this step to ensure that DOI registrations for all users are not affected by high numbers of requests from a single user. We have also increased the number of servers and improved monitoring to better handle these kinds of situations in the future.
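The quoted figures agree: 3,000 requests in 300 s is 10 requests per second, or 36,000 per hour. A client that registers many DOIs can keep one process under that limit with a sliding-window throttle. The Python sketch below is only an illustration. DataCite has not said which windowing algorithm it uses, the class name is hypothetical, and the throttle does not coordinate several processes that share one IP address.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Block until another request fits in `limit` per `window` seconds."""

    def __init__(self, limit=3000, window=300.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests still inside the window

    def acquire(self):
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            # Wait until the oldest request leaves the window.
            self.sleep(self.sent[0] + self.window - now)
            now = self.clock()
            self.sent.popleft()
        self.sent.append(now)


# Usage: call acquire() before each registration request, e.g.
#   throttle = SlidingWindowThrottle()
#   for doi in dois:
#       throttle.acquire()
#       register(doi)  # hypothetical registration call
```

A sliding window never admits more than 3,000 requests in any 300-second interval. It therefore also stays under the limit if the server counts in fixed 5-minute windows instead.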
Please reach out to me via support(a)datacite.org if you have any comments or feedback.
Kind regards,
Martin Fenner
--
Martin Fenner
DataCite Technical Director
http://orcid.org/0000-0003-1419-2405