<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<meta name="Generator" content="Microsoft Word 15 (filtered medium)">

<style><!--

/* Font Definitions */

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

p.msonormal0, li.msonormal0, div.msonormal0

        {mso-style-name:msonormal;

        mso-margin-top-alt:auto;

        margin-right:0in;

        mso-margin-bottom-alt:auto;

        margin-left:0in;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;}

span.EmailStyle18

        {mso-style-type:personal;

        font-family:"Calibri",sans-serif;

        color:windowtext;}

span.EmailStyle19

        {mso-style-type:personal-reply;

        font-family:"Calibri",sans-serif;

        color:windowtext;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;}

@page WordSection1

        {size:8.5in 11.0in;

        margin:85.05pt 56.7pt 85.05pt 56.7pt;}

div.WordSection1

        {page:WordSection1;}

--></style><!--[if gte mso 9]><xml>

<o:shapedefaults v:ext="edit" spidmax="1026" />

</xml><![endif]--><!--[if gte mso 9]><xml>

<o:shapelayout v:ext="edit">

<o:idmap v:ext="edit" data="1" />

</o:shapelayout></xml><![endif]-->

</head>

<body lang="EN-US" link="blue" vlink="purple">

<div class="WordSection1">

<p class="MsoNormal">Thanks very much, this is helpful feedback. <o:p></o:p></p>

<p class="MsoNormal"><o:p> </o:p></p>

<p class="MsoNormal">On a related note, Harvard-IQSS created a platform called Dataverse (https://dataverse.org/about) around 2007 and one interesting element is that they published a method for hashing datasets. This is done for the purpose of creating a citation

 element that can be used to verify that you have downloaded the same data. Passing along in case this is of interest to the group:<o:p></o:p></p>

<p class="MsoNormal"><o:p> </o:p></p>

<p class="MsoNormal">http://best-practices.dataverse.org/data-citation/<o:p></o:p></p>

<p class="MsoNormal"><o:p> </o:p></p>

<p class="MsoNormal">Best regards,<o:p></o:p></p>

<div>

<p class="MsoNormal"><span style="font-size:10.5pt;color:black">Jonathan A. Kennedy<o:p></o:p></span></p>

<p class="MsoNormal"><span style="font-size:10.5pt;color:black">Director of Biodiversity Informatics<o:p></o:p></span></p>

<div>

<p class="MsoNormal"><span style="font-size:10.5pt;color:black">Harvard University Herbaria,<o:p></o:p></span></p>

</div>

</div>

<p class="MsoNormal"><span style="font-size:10.5pt;color:black">Department of Organismic and Evolutionary Biology</span><o:p></o:p></p>

<p class="MsoNormal"><o:p> </o:p></p>

<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">

<p class="MsoNormal"><b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">Daniel Noesgaard <dnoesgaard@gbif.org><br>

<b>Date: </b>Tuesday, February 19, 2019 at 3:22 AM<br>

<b>To: </b>Quentin Groom <quentin.groom@plantentuinmeise.be>, Tim Robertson <trobertson@gbif.org><br>

<b>Cc: </b>"Kennedy, Jonathan" <jonathan_kennedy@harvard.edu>, "ipt@lists.gbif.org list" <ipt@lists.gbif.org>, helpdesk <helpdesk@gbif.org><br>

<b>Subject: </b>Re: [IPT] Daily feeds and archive history<o:p></o:p></span></p>

</div>

<div>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

<p class="MsoNormal">I might also add that every download from GBIF.org–be it a single dataset or an aggregate–is archived and given a unique, persistent DOI for citation. And that citations of downloads count against all the datasets that contributed to that

 download.<o:p></o:p></p>

<p class="MsoNormal"> <o:p></o:p></p>

<p class="MsoNormal">-- <o:p></o:p></p>

<p class="MsoNormal" style="text-autospace:none">Daniel Noesgaard<o:p></o:p></p>

<p class="MsoNormal" style="text-autospace:none">Science Communications Coordinator<o:p></o:p></p>

<p class="MsoNormal" style="text-autospace:none">GBIF | Global Biodiversity Information Facility - Secretariat<o:p></o:p></p>

<p class="MsoNormal" style="text-autospace:none">Universitetsparken 15<o:p></o:p></p>

<p class="MsoNormal" style="text-autospace:none">DK-2100 Copenhagen, Denmark<o:p></o:p></p>

<p class="MsoNormal" style="text-autospace:none">E: <a href="mailto:dnoesgaard@gbif.org">

<span style="color:#0000E9">dnoesgaard@gbif.org</span></a><o:p></o:p></p>

<p class="MsoNormal" style="text-autospace:none">W: www.gbif.org<o:p></o:p></p>

<p class="MsoNormal">T: +45 35 32 08 74<o:p></o:p></p>

<p class="MsoNormal"> <o:p></o:p></p>

<p class="MsoNormal"> <o:p></o:p></p>

<p class="MsoNormal"> <o:p></o:p></p>

<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">

<p class="MsoNormal"><b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">Quentin Groom <quentin.groom@plantentuinmeise.be><br>

<b>Date: </b>Tuesday, 19 February 2019 at 08.38<br>

<b>To: </b>Tim Robertson <trobertson@gbif.org><br>

<b>Cc: </b>"Kennedy, Jonathan" <jonathan_kennedy@harvard.edu>, "ipt@lists.gbif.org list" <ipt@lists.gbif.org>, helpdesk <helpdesk@gbif.org>, Daniel Noesgaard <dnoesgaard@gbif.org><br>

<b>Subject: </b>Re: [IPT] Daily feeds and archive history</span><o:p></o:p></p>

</div>

<div>

<p class="MsoNormal"> <o:p></o:p></p>

</div>

<div>

<p class="MsoNormal">While it would be great to have versioned datasets I generally create a snapshot of the data used in a paper and archive this in Zenodo. This gives complete reproducibility without putting extra demands on the data providers. I do however

 need to cite the source and the snapshot. <o:p></o:p></p>

<div>

<p class="MsoNormal">Regards<o:p></o:p></p>

</div>

<div>

<p class="MsoNormal">Quentin<o:p></o:p></p>

</div>

</div>

<p class="MsoNormal"> <o:p></o:p></p>

<div>

<div>

<p class="MsoNormal">On Mon, 18 Feb 2019, 17:45 Tim Robertson <<a href="mailto:trobertson@gbif.org">trobertson@gbif.org</a> wrote:<o:p></o:p></p>

</div>

<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">

<div>

<div>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">Hi Jonathan</span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">(adding GBIF helpdesk to the CC)</span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">This is just a quick answer which I expect will result in follow up questions.</span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">In terms of citation, we use a DOI to identify the concept of a dataset, not the specific version. E.g.

<a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__doi.org_10.15468_cup0nk&d=DwMGaQ&c=WO-RGvefibhHBZq3fL85hQ&r=CdeDWKDCq4utpRBAQsRWPsFEuA9hFIpReg9XUuWRHOA&m=QAbsRjSWihrdVjG7RYt6giVaADF8smdKP1WZnbfukuc&s=gxEzg7QhLSvKKIBG7rDac6LWCKd-bjMirk5DHQx2y9I&e=" target="_blank">

https://doi.org/10.15468/cup0nk</a> </span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">If you start deleting copies of data (e.g. a background housekeeping task) what will break are links to the downloads in the IPT pages. 

<a href="https://ipt.huh.harvard.edu/ipt/resource?r=huh_all_records&v=1.3" target="_blank">

https://ipt.huh.harvard.edu/ipt/resource?r=huh_all_records&v=1.3</a></span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">This may or may not be considered a problem for you.</span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">I think others might have contacted you about suggestions for improving the dataset titles being used but if not I would suggest considering correctly formatted

 titles as they are used in  many places (<a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__www.gbif.org_dataset_4e4f97d2-2D4670-2D4b24-2Db982-2D261e0a450faf-29&d=DwMGaQ&c=WO-RGvefibhHBZq3fL85hQ&r=CdeDWKDCq4utpRBAQsRWPsFEuA9hFIpReg9XUuWRHOA&m=QAbsRjSWihrdVjG7RYt6giVaADF8smdKP1WZnbfukuc&s=hLi2fk3gePaQiOUHBC7Lb3KNPFmLiRKgOlK1tdYUqFA&e=" target="_blank">https://www.gbif.org/dataset/4e4f97d2-4670-4b24-b982-261e0a450faf)</a>.</span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">I hope this helps as a start,</span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">Tim</span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>

<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span lang="EN-GB" style="color:black">From:

</span></b><span lang="EN-GB" style="color:black">IPT <<a href="mailto:ipt-bounces@lists.gbif.org" target="_blank">ipt-bounces@lists.gbif.org</a>> on behalf of "Kennedy, Jonathan" <<a href="mailto:jonathan_kennedy@harvard.edu" target="_blank">jonathan_kennedy@harvard.edu</a>><br>

<b>Date: </b>Monday, 18 February 2019 at 18.31<br>

<b>To: </b>"<a href="mailto:ipt@lists.gbif.org" target="_blank">ipt@lists.gbif.org</a>" <<a href="mailto:ipt@lists.gbif.org" target="_blank">ipt@lists.gbif.org</a>><br>

<b>Subject: </b>[IPT] Daily feeds and archive history</span><o:p></o:p></p>

</div>

<div>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>

</div>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">Hi All,

</span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">I am finishing an upgrade to the Harvard University Herbaria IPT instance and have configured our feeds for daily auto-publish. The HUH has invested in a mass

 digitization workflow and we are currently creating ~20,000 new vascular records per month (with minimal data), so we do have new records on a daily basis. However, our DwC archives are fairly large (100MB+), so we can’t keep the daily archive history. I am

 looking for guidance on how it will work with GBIF dataset citation if we do not preserve each daily archive. It seems problematic if a version of our dataset is used and cited but cannot be reconstructed.

</span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">Best regards,</span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:10.5pt;color:black">Jonathan A. Kennedy</span><o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:10.5pt;color:black">Director of Biodiversity Informatics</span><o:p></o:p></p>

<div>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:10.5pt;color:black">Harvard University Herbaria,</span><o:p></o:p></p>

</div>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:10.5pt;color:black">Department of Organismic and Evolutionary Biology</span><o:p></o:p></p>

</div>

</div>

<p class="MsoNormal">_______________________________________________<br>

IPT mailing list<br>

<a href="mailto:IPT@lists.gbif.org" target="_blank">IPT@lists.gbif.org</a><br>

<a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.gbif.org_mailman_listinfo_ipt&d=DwMGaQ&c=WO-RGvefibhHBZq3fL85hQ&r=CdeDWKDCq4utpRBAQsRWPsFEuA9hFIpReg9XUuWRHOA&m=QAbsRjSWihrdVjG7RYt6giVaADF8smdKP1WZnbfukuc&s=DV0zFYttiKPqFg1nTOXbnwdsZXT8Zm3O1ZF1tTialnE&e=" target="_blank">https://lists.gbif.org/mailman/listinfo/ipt</a><o:p></o:p></p>

</blockquote>

</div>

</div>

</body>

</html>