[IPT] Daily feeds and archive history

Tue Feb 19 08:38:22 CET 2019

While it would be great to have versioned datasets, I generally create a
snapshot of the data used in a paper and archive it in Zenodo. This gives
complete reproducibility without putting extra demands on the data
providers. I do, however, need to cite both the source and the snapshot.
Regards
Quentin

On Mon, 18 Feb 2019, 17:45 Tim Robertson <trobertson at gbif.org> wrote:

> Hi Jonathan
>
> (adding GBIF helpdesk to the CC)
>
> This is just a quick answer, which I expect will result in follow-up
> questions.
>
> In terms of citation, we use a DOI to identify the concept of a dataset,
> not a specific version, e.g. https://doi.org/10.15468/cup0nk
>
> If you start deleting copies of the data (e.g. through a background
> housekeeping task), what will break are the links to the versioned
> downloads on the IPT pages, e.g.
> https://ipt.huh.harvard.edu/ipt/resource?r=huh_all_records&v=1.3
>
> This may or may not be considered a problem for you.
>
> I think others might have contacted you with suggestions for improving
> the dataset titles being used. If not, I would suggest considering
> correctly formatted titles, as they are used in many places (e.g.
> https://www.gbif.org/dataset/4e4f97d2-4670-4b24-b982-261e0a450faf).
>
> I hope this helps as a start,
>
> Tim
>
> *From: *IPT <ipt-bounces at lists.gbif.org> on behalf of "Kennedy, Jonathan"
> <jonathan_kennedy at harvard.edu>
> *Date: *Monday, 18 February 2019 at 18.31
> *To: *"ipt at lists.gbif.org" <ipt at lists.gbif.org>
> *Subject: *[IPT] Daily feeds and archive history
>
>
>
> Hi All,
>
>
>
> I am finishing an upgrade to the Harvard University Herbaria IPT instance
> and have configured our feeds for daily auto-publish. The HUH has invested
> in a mass digitization workflow, and we are currently creating ~20,000 new
> vascular records per month (with minimal data), so we do have new records
> on a daily basis. However, our DwC archives are fairly large (100MB+), so
> we can’t keep the daily archive history. I am looking for guidance on how
> GBIF dataset citation will work if we do not preserve each daily
> archive. It seems problematic if a version of our dataset is used and cited
> but cannot be reconstructed.
>
>
>
> Best regards,
>
> Jonathan A. Kennedy
>
> Director of Biodiversity Informatics
>
> Harvard University Herbaria,
>
> Department of Organismic and Evolutionary Biology
> _______________________________________________
> IPT mailing list
> IPT at lists.gbif.org
> https://lists.gbif.org/mailman/listinfo/ipt
>