Here are the parts I extracted from the response to this example request:
{
"key": "10734a60-7ed1-11df-8c4a-0800200c9a66",
"installationKey": "86e4d50b-d77c-4731-99fb-b3e2a2a83163",
"publishingOrganizationKey": "def87a70-0837-11d9-acb2-b8a03c50a862",
(...)
"lockedForAutoUpdate": false,
"created": "2010-05-03T22:02:18.000+0000",
"modified": "2017-01-19T18:16:28.844+0000",
(...)
"machineTags": [
{
"key": 606300,
"name": "crawl_attempt",
"value": "45",
"created": "2017-01-19T18:17:29.063+0000"
},
(...)
],
(...)
"dataLanguage": "eng",
"pubDate": "2017-01-18T23:00:00.000+0000",
(...)
}
I want to know programmatically whether a given published dataset is up to date on the GBIF portal, compared with its current version on the IPT server.
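The comparison described here could be sketched in Python as follows. It works on the already-parsed JSON (e.g. the result of json.loads on the API response) rather than making a live call. It assumes that the newest crawl_attempt machine tag is the relevant one and that a crawl attempt starting after "modified" is a reasonable proxy for "up to date"; both are assumptions, and all function names are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Timestamp format seen in the excerpt, e.g. "2017-01-19T18:16:28.844+0000"
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_ts(value: str) -> datetime:
    """Parse a timestamp like '2017-01-19T18:16:28.844+0000' into an aware datetime."""
    return datetime.strptime(value, TS_FORMAT)


def latest_crawl_attempt(dataset: dict) -> Optional[datetime]:
    """Return the newest 'created' time among crawl_attempt machine tags, or None."""
    times = [
        parse_ts(tag["created"])
        for tag in dataset.get("machineTags", [])
        if tag.get("name") == "crawl_attempt"
    ]
    return max(times) if times else None


def crawled_after_modification(dataset: dict) -> bool:
    """Naive check: did the latest crawl attempt start at or after 'modified'?

    Assumes both timestamps are directly comparable (no clock skew).
    """
    crawl = latest_crawl_attempt(dataset)
    if crawl is None:
        return False
    return crawl >= parse_ts(dataset["modified"])


# Values copied from the excerpt above; other fields omitted.
example = {
    "modified": "2017-01-19T18:16:28.844+0000",
    "machineTags": [
        {"key": 606300, "name": "crawl_attempt", "value": "45",
         "created": "2017-01-19T18:17:29.063+0000"},
    ],
}

print(crawled_after_modification(example))
```

Note that a crawl attempt that started after the modification does not by itself show that the crawl succeeded.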
1) - In the example above, there is a difference of about 1 minute between these two values:
"modified":"DATETIME"
"machineTags":[{"name":"crawl_attempt","created":"DATETIME"}]
Does this mean the last dataset harvest began immediately after the IPT was updated?
(As I said, I don't see the "status" tag you mentioned.)
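For reference, the gap between the two timestamps in the excerpt can be computed directly; it comes out at 60.219 seconds.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

# "modified" and the crawl_attempt "created" value from the excerpt above
modified = datetime.strptime("2017-01-19T18:16:28.844+0000", FMT)
crawl_created = datetime.strptime("2017-01-19T18:17:29.063+0000", FMT)

gap = crawl_created - modified
print(gap.total_seconds())  # 60.219
```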
2) - I believe the "modified" datetime comes from the IPT server's clock, and the crawl_attempt "created" datetime comes from the crawler machine's clock. Is this correct?
If so, and the IPT server clock is not set correctly but runs a bit ahead, a script comparing the two datetimes could wrongly conclude that the information on the GBIF portal is outdated (because the reported crawl_attempt datetime would be slightly earlier than the modified datetime).
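The failure mode described here can be reproduced with a small hypothetical scenario: keep the crawl time from the excerpt, treat the excerpt's "modified" as the true modification time, and assume (purely for illustration) that the IPT clock runs 90 seconds fast. Any skew larger than the roughly 60-second gap in the example flips the naive comparison.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

# Crawl start as reported by the crawler (value from the excerpt).
crawl_created = datetime.strptime("2017-01-19T18:17:29.063+0000", FMT)

# Hypothetical: the real modification time is the excerpt's value,
# but the IPT clock runs 90 s fast, so it reports a later "modified".
true_modified = datetime.strptime("2017-01-19T18:16:28.844+0000", FMT)
ipt_clock_ahead = timedelta(seconds=90)  # hypothetical skew
reported_modified = true_modified + ipt_clock_ahead

# Naive rule: GBIF is outdated if the crawl started before "modified".
looks_outdated = crawl_created < reported_modified
actually_outdated = crawl_created < true_modified

print(looks_outdated, actually_outdated)  # True False
```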
So my question is: can I somehow request the crawler's current clock time through the API, to compare it with the IPT server's clock?
And likewise: can I somehow request the current clock time of a given IPT server?
If the two are not synchronized, the script could then take that difference into account when comparing the modified and crawl_attempt datetimes.
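Once the offset between the two clocks is known, the correction described here amounts to shifting "modified" onto the crawler's clock before comparing. The sketch below takes the offset as a parameter and deliberately leaves open how it is obtained, since that is exactly the open question; the function name is hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S.%f%z"


def crawl_is_newer(modified: str, crawl_created: str, ipt_offset: timedelta) -> bool:
    """Compare timestamps taken from two different clocks.

    ipt_offset: how far the IPT clock runs ahead of the crawler clock
    (negative if it runs behind). How to measure it is not specified here.
    """
    # Express "modified" in crawler-clock time by removing the IPT's offset.
    modified_on_crawler_clock = datetime.strptime(modified, FMT) - ipt_offset
    return datetime.strptime(crawl_created, FMT) >= modified_on_crawler_clock


# An IPT clock 90 s fast reports 18:17:58.844 for a change made at 18:16:28.844.
print(crawl_is_newer("2017-01-19T18:17:58.844+0000",
                     "2017-01-19T18:17:29.063+0000",
                     timedelta(seconds=90)))
```

Without the correction (an offset of zero), the same two values would wrongly suggest the crawl predates the change.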
3) - What does "pubDate" represent? It looks odd to me that it always shows the same time (23:00:00).
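One possible reading, offered only as an assumption to check: a constant 23:00:00 in UTC is what a bare publication date looks like if it is stored as local midnight in a UTC+1 time zone (such as CET in winter) and then rendered in UTC. The arithmetic for the excerpt's value:

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

pub_date_utc = datetime.strptime("2017-01-18T23:00:00.000+0000", FMT)

# Hypothetical: view the same instant in a UTC+1 zone (e.g. CET in winter).
utc_plus_1 = timezone(timedelta(hours=1))
local = pub_date_utc.astimezone(utc_plus_1)

print(local.isoformat())  # 2017-01-19T00:00:00+01:00
```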
4) - I am curious about the API request /dataset/{UUID}/crawl.
The API documentation says it "Schedules a new crawl of the dataset".
In which cases is this request supposed to be necessary, and which user credentials should be used? Could I see an example curl request?
I suppose it is not intended for crawling IPT servers, since I was under the impression that crawls start immediately when publishing through the IPT (barring unexpected delays caused by any necessary work on the data portal).
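For concreteness, this is roughly the request the question is about, built in Python but not sent. It assumes the endpoint is called with POST on the https://api.gbif.org/v1 base URL and uses HTTP Basic authentication; none of these is confirmed here, and the user name and password are placeholders, since which account is authorized is precisely what is being asked. The function name is hypothetical.

```python
import base64
import urllib.request

# Assumptions: base URL, HTTP method and Basic authentication.
API_BASE = "https://api.gbif.org/v1"  # assumed


def build_crawl_request(dataset_key: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a request to /dataset/{UUID}/crawl."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{API_BASE}/dataset/{dataset_key}/crawl",
        method="POST",  # assumed method
        headers={"Authorization": f"Basic {token}"},
    )


# Dataset key from the excerpt; credentials are placeholders.
req = build_crawl_request("10734a60-7ed1-11df-8c4a-0800200c9a66", "my_user", "my_password")
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), with real credentials.
```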
Many thanks in advance
David