understanding download status (specially FILE_ERASED)
I have a question about the results returned by the API when requesting the download activity of a dataset: /occurrence/download/dataset/{datasetKey} (https://www.gbif.org/developer/occurrence#download)
Where can I find a description of the possible "status" values of a download? (I found at least 6 different values in our datasets' activity.)
I understand that KILLED, FAILED, RUNNING, PREPARING ... can all be interpreted as "nobody has used that information (yet)", whereas a SUCCEEDED download means the opposite.
But what about FILE_ERASED (for example http://api.gbif.org/v1/occurrence/download/0000316-140429114108248)? Does it mean the download had status SUCCEEDED in the past and the file was deleted some time later? (Among old downloads, the percentage of FILE_ERASED is much higher.)
If I want to evaluate real usage of a dataset over the years, I should use the sum SUCCEEDED + FILE_ERASED, right?
Also, I understand that the sum of SUCCEEDED + FILE_ERASED downloads for a given dataset and year should stay the same if somebody repeats the query in the future (whereas the count of SUCCEEDED downloads alone will decrease).
Thanks a lot for your help,
David
Hi David,
First of all, the status of a download request doesn't really say anything about actual usage. PREPARING and RUNNING are initial statuses that are followed by SUCCEEDED unless something goes wrong, in which case you may see FAILED, or KILLED if the download was cancelled.
Due to storage restrictions we have had to delete some older, very large downloads, which now have status FILE_ERASED.
If a download is used and the usage properly cited, say in a journal publication, we will flag this download by linking the citing article to the download. This information, however, is not available in the regular API. You may access this information through
https://www.gbif.org/api/resource/search?contentType=literature&gbifData...
or for a specific download
https://www.gbif.org/api/resource/search?contentType=literature&gbifDown...
If you'd like to discuss this further, feel free to reach out to me. I would be interested in learning more about what you're planning to do with this data.
Thanks, Daniel GBIF
From: API-users api-users-bounces@lists.gbif.org on behalf of Herbario SANT sant.herbarium@gmail.com
Sent: Monday, January 20, 2020 21:25
To: api-users@lists.gbif.org
Subject: [API-users] understanding download status (specially FILE_ERASED)
Hi David
I hope you are well!
Good question.
PREPARING: just submitted by user and awaiting processing (typically only a few seconds)
RUNNING: being created (typically 1-15 minutes)
FAILED: something unexpected went wrong
KILLED: user decided to abort the job while it was in PREPARING, RUNNING phase
SUCCEEDED: The download was created and the user informed
FILE_ERASED: The download was deleted according to the retention policy (please see https://www.gbif.org/faq?question=for-how-long-will-does-gbif-store-download... )
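The statuses listed above can be grouped by whether a download file was ever produced. This is a minimal Python sketch that assumes only the six status names mentioned in this thread (the API may define others). The names IN_PROGRESS, NOT_PRODUCED, FILE_PRODUCED and file_was_produced are hypothetical helpers for illustration, not part of any GBIF client.

```python
# Status names as listed in this thread; other values may exist in the API.
IN_PROGRESS = {"PREPARING", "RUNNING"}
NOT_PRODUCED = {"FAILED", "KILLED"}
FILE_PRODUCED = {"SUCCEEDED", "FILE_ERASED"}


def file_was_produced(status: str) -> bool:
    """Return True if a download with this status produced a file at some point."""
    if status in FILE_PRODUCED:
        return True
    if status in IN_PROGRESS or status in NOT_PRODUCED:
        return False
    # Unknown status: fail loudly rather than guess which group it belongs to.
    raise ValueError(f"unrecognised download status: {status!r}")
```

Raising on an unknown value is a deliberate choice here, so that a status not covered in this thread is noticed instead of being silently counted or dropped.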
So you are correct to sum SUCCEEDED + FILE_ERASED to calculate the number of times the data has been included in downloads, and correct that the SUCCEEDED count will decrease as the file system is cleaned up.
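The yearly tally described here could be sketched in Python as follows. The record layout is an assumption and should be checked against a real response from /occurrence/download/dataset/{datasetKey}: each record is taken to hold a nested "download" object with a "status" string and an ISO 8601 "created" timestamp. The function name and the sample records are made up for illustration.

```python
from collections import Counter

# Statuses that indicate a download file was produced at some point.
COUNTED = {"SUCCEEDED", "FILE_ERASED"}


def completed_downloads_per_year(records):
    """Count SUCCEEDED + FILE_ERASED downloads by year of creation.

    Assumed (hypothetical) layout: record["download"]["status"] and
    record["download"]["created"], the latter starting with a 4-digit year.
    """
    per_year = Counter()
    for record in records:
        download = record["download"]
        if download["status"] in COUNTED:
            year = int(download["created"][:4])
            per_year[year] += 1
    return dict(per_year)


# Made-up sample records, not real GBIF data.
sample = [
    {"download": {"status": "FILE_ERASED", "created": "2014-05-02T10:00:00.000+0000"}},
    {"download": {"status": "SUCCEEDED", "created": "2019-11-20T08:30:00.000+0000"}},
    {"download": {"status": "KILLED", "created": "2019-12-01T12:00:00.000+0000"}},
    {"download": {"status": "SUCCEEDED", "created": "2019-03-14T09:15:00.000+0000"}},
]
print(completed_downloads_per_year(sample))  # {2014: 1, 2019: 2}
```

Because FILE_ERASED downloads are counted together with SUCCEEDED ones, the per-year totals should not shrink when older files are cleaned up. The result is a count of downloads that included the dataset, not a measure of confirmed use.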
If you are exploring use, please also consider the citations, such as: https://www.gbif.org/resource/search?contentType=literature&gbifDatasetK...
I hope this helps, Tim
participants (3)
- Daniel Noesgaard
- Herbario SANT
- Tim Robertson