I have a question about the results returned by the API when requesting the
download activity of a dataset:
/occurrence/download/dataset/{datasetKey}
<https://www.gbif.org/developer/occurrence#download>
Where can I find a description of the possible "status" values of a given
download? (I found at least 6 different values in our datasets' activity.)
I understand that KILLED, FAILED, RUNNING, PREPARING ... can all be
interpreted as "nobody has used that information (yet)",
whereas a "SUCCEEDED" download means the opposite.
But what about "FILE_ERASED"
<http://api.gbif.org/v1/occurrence/download/0000316-140429114108248>?
Does it mean the download had "SUCCEEDED" in the past, and some time later
the info was deleted? (When looking at old downloads, the percentage of
FILE_ERASED is much higher.)
If I want to evaluate the real usage of a dataset over the years, I should
use the sum "SUCCEEDED + FILE_ERASED", right?
Also, I understand that the sum of SUCCEEDED + FILE_ERASED downloads for a
given dataset and year should stay the same if somebody repeats the query in
the future (whereas the count of SUCCEEDED downloads alone will decrease).
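The counting described above could be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes each record returned by the endpoint nests a "download" object with a "status" string and an ISO-formatted "created" date, and that SUCCEEDED and FILE_ERASED are the two statuses that represent real usage (which is exactly the point being asked here). The function name and the sample records are hypothetical.

```python
from collections import Counter

# Assumption: these two statuses are the ones that represent real usage.
USED_STATUSES = {"SUCCEEDED", "FILE_ERASED"}


def usage_by_year(records):
    """Count downloads per year whose status is in USED_STATUSES.

    Each record is assumed to look like:
    {"download": {"status": "...", "created": "2016-03-01T10:00:00.000+0000"}}
    """
    counts = Counter()
    for record in records:
        download = record["download"]
        if download["status"] in USED_STATUSES:
            year = int(download["created"][:4])  # first four characters are the year
            counts[year] += 1
    return dict(counts)


# Hypothetical sample records, not real API output.
sample = [
    {"download": {"status": "SUCCEEDED", "created": "2016-03-01T10:00:00.000+0000"}},
    {"download": {"status": "FILE_ERASED", "created": "2016-05-12T08:30:00.000+0000"}},
    {"download": {"status": "KILLED", "created": "2016-07-20T14:00:00.000+0000"}},
    {"download": {"status": "SUCCEEDED", "created": "2017-01-03T09:15:00.000+0000"}},
]

print(usage_by_year(sample))
```

If the assumption about the statuses holds, a download that later moves from SUCCEEDED to FILE_ERASED stays in the same yearly total.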
Thanks a lot for your help
David
Hello.
I am using the API to retrieve information about downloads and to compare
the usage of my institution's datasets from year to year.
I use the last option in the occurrence downloads
<https://www.gbif.org/developer/occurrence#download> section, which lists
the download activity of a given dataset:
/occurrence/download/dataset/{datasetKey}
I have a couple of questions (a bug and a request)
(1) BUG:
For one of our datasets, I was paging through the results, waiting a few
seconds between page requests, until I got this error at a certain offset:
http://api.gbif.org/v1/occurrence/download/dataset/1c334170-7ed1-11df-8c4a-…
Oddly, if I increased or decreased the offset a little, I got no errors.
So it seemed to be a problem with one specific download event record.
I then adjusted the offset and limit and was able to isolate the single event
using this URL:
http://api.gbif.org/v1/occurrence/download/dataset/1c334170-7ed1-11df-8c4a-…
This download event happened on 2016-03-01 (the date reported by the events
just before and after that offset).
Unfortunately, by the time you read this message the number of download
events will already be higher, so you will need to change the above URL and
increase the offset value to see the same event record (because the API
offset count begins with the latest download event instead of beginning with
the first one).
I created this copy of the error page for you: http://archive.is/5UNAy
Or you can subtract 18918 from the current download count to
find the offset of the problematic record.
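The subtraction above can be written out as a small Python sketch. The value 18918 comes from this message; the function names and the example count of 20000 are hypothetical, and the dataset key is left out because it is truncated above.

```python
from urllib.parse import urlencode

# Value quoted in this message: current download count minus 18918 gives the offset.
ANCHOR = 18918


def offset_for_event(current_count, anchor=ANCHOR):
    """Return the newest-first offset of the problematic record."""
    if current_count < anchor:
        raise ValueError("current_count must be at least the anchor value")
    return current_count - anchor


def paging_query(current_count):
    """Build the query string that selects only that single record."""
    return urlencode({"limit": 1, "offset": offset_for_event(current_count)})


# Hypothetical example: if the dataset currently reports 20000 download events.
print(paging_query(20000))
```

The current count itself would have to be read from the API response first; it is passed in here as a plain number.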
(2) ENHANCEMENT:
I take this opportunity to encourage you to introduce a "sort" parameter in
this kind of API request, so that limit=1&offset=0&sort=asc returns
the oldest event (instead of the newest one). You can keep the current
sorting as the default (i.e. sort=desc) so you don't break any applications
that are already using this API.
This option would make the above URL more permanent (you would keep getting
the same event records when you repeat the same offset/limit values using
sort=asc).
It would also be much easier to track the historical usage of datasets: if I
store the offset of my last report, I can resume from that point the next time
I want to see the activity of this dataset (i.e., if the 2018 events begin at
offset 3333 and end at 3567, I will be able to start my next report
at offset 3568).
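Until such a parameter exists, the ascending offsets described above can be emulated on the client side. The following Python sketch is only an illustration: it assumes the API always returns events newest first and that the total event count is known, and the function name and the example total of 3600 are hypothetical.

```python
def asc_to_desc_window(total, asc_offset, limit):
    """Translate an oldest-first (offset, limit) window into the
    newest-first (offset, limit) window covering the same events.

    The events come back newest first, so the caller should reverse
    the returned page to read it in chronological order.
    """
    if asc_offset >= total:
        return (0, 0)  # nothing new since the stored offset
    end = min(asc_offset + limit, total)  # exclusive end in oldest-first order
    return (total - end, end - asc_offset)


# Example from this message: the 2018 events sit at ascending offsets 3333..3567,
# so the next report starts at 3568. The total of 3600 is a made-up number.
print(asc_to_desc_window(3600, 3568, 100))  # the 32 events after the last report
print(asc_to_desc_window(3600, 3333, 235))  # the 2018 events themselves
```

One weakness of this workaround is that the total can grow between reading the count and requesting the page, which shifts the window. A server-side sort=asc would not have that problem.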
Another option would be to let us filter events by date (BTW, the API
shows a "created" and a "modified" date, but I don't understand the meaning
of "modified" for a download event).
Thanks a lot in advance for your help
--
David