Hi John, Paul,

Reporting back.

I did some investigation into what DataCite intends a list of rights to be used for. 

The DataCite Metadata Working Group has explained that a list of rights is intended to allow multiple licenses, each applying to the dataset as a whole. Here’s the answer provided to me on behalf of their chair:

"the Metadata Working Group has discussed your question and would like to say that the intention was (a) to allow for multiple licenses to be applied to a dataset. Moreover, we suggest that if different licenses apply to separable components of a dataset, those (various) components ought to have separate metadata records (and so also separate DOIs)."

From the DataCite mailing list, I’m told that datasets with multiple licenses are pretty common. For example, OpenAIRE sometimes applies multiple complementary licenses to its datasets [1].
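
To make the DataCite interpretation concrete, here is a minimal sketch of a rights list in which every entry applies to the whole dataset. It assumes the DataCite kernel's rightsList/rights elements with a rightsURI attribute; the helper name build_rights_list and the two example licenses are illustrative only and do not come from any existing tool.

```python
import xml.etree.ElementTree as ET


def build_rights_list(licenses):
    """Build a DataCite-style <rightsList> from (name, uri) pairs.

    Hypothetical helper: each entry is taken to apply to the dataset
    as a whole, following the Working Group's explanation.
    """
    rights_list = ET.Element("rightsList")
    for name, uri in licenses:
        rights = ET.SubElement(rights_list, "rights", {"rightsURI": uri})
        rights.text = name
    return rights_list


# Illustrative licenses only, not a recommendation.
example = build_rights_list([
    ("CC-BY 4.0", "https://creativecommons.org/licenses/by/4.0/"),
    ("CC-BY-NC 4.0", "https://creativecommons.org/licenses/by-nc/4.0/"),
])
print(ET.tostring(example, encoding="unicode"))
```

Under this reading, a dataset whose components carry different licenses would not use one list like this. Each component would get its own metadata record (and DOI) with its own list.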

As for EML, we investigated last year how it allows licenses to be expressed for datasets. We did discover the license [2] and licenseURL fields; however, they relate to software, not datasets. The EML mailing list was consulted for guidance on this topic, but unfortunately no answer was ever received [3]. Furthermore, the EML documentation includes no guidance for applying multiple licenses to a dataset (or its components) as far as I can see.

Nevertheless, EML does allow one free-text intellectualRights [4] element per dataset, and this is where GBIF expresses a license in its own metadata profile (based on EML). To make the license machine readable/parseable, we use the ulink [5] element inside intellectualRights to store the license name and license URL separately. Since we need to enforce the GBIF licensing policy [6], we only allow a single license to be expressed, though.
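
As a rough illustration of how such an intellectualRights element could be read by a machine, here is a minimal sketch. The XML fragment is an assumed example of the layout (a para containing one ulink with a url attribute and a citetitle child), not an authoritative copy of the GBIF profile, and extract_license is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed example of an intellectualRights element using ulink.
EML_FRAGMENT = """
<intellectualRights>
  <para>This work is licensed under a
    <ulink url="http://creativecommons.org/licenses/by/4.0/legalcode">
      <citetitle>Creative Commons Attribution (CC-BY) 4.0 License</citetitle>
    </ulink>.
  </para>
</intellectualRights>
"""


def extract_license(xml_text):
    """Return (license_name, license_url) from an intellectualRights element.

    Raises ValueError unless exactly one ulink is present, mirroring the
    single-license rule described above.
    """
    root = ET.fromstring(xml_text.strip())
    links = root.findall(".//ulink")
    if len(links) != 1:
        raise ValueError(f"expected exactly one ulink, found {len(links)}")
    link = links[0]
    name = link.findtext("citetitle", default="").strip()
    return name, link.get("url")


name, url = extract_license(EML_FRAGMENT)
print(name)
print(url)
```

Keeping the name and the URL in separate places (citetitle and the url attribute) is what lets a consumer compare the URL against a list of allowed licenses without parsing the free text.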

To sum up, it’s great we now have a clear recommendation from DataCite on how to apply licenses to datasets. To better integrate with DataCite we will benefit from adopting recommendations in line with theirs. 

With kind regards,

Kyle

[1] https://guidelines.openaire.eu/wiki/OpenAIRE_Guidelines:_For_Data_Archives#Access_rights_and_license_information
[2] https://knb.ecoinformatics.org/#external//emlparser/docs/eml-2.1.1/./eml-software.html#license
[3] https://github.com/peterdesmet/awesome-metadata/issues/2#issuecomment-62885616
[4] https://knb.ecoinformatics.org/#external//emlparser/docs/eml-2.1.1/./eml-resource.html#intellectualRights
[5] https://knb.ecoinformatics.org/#external//emlparser/docs/eml-2.1.1/./eml-text.html#ulink
[6] http://www.gbif.org/terms/licences

On 04 Mar 2015, at 07:01, John Wieczorek <tuco@berkeley.edu> wrote:

With the latter, I agree. Looking forward to the outcome of the first item. And thanks, Paul, for realizing this and bringing it forward to everyone's attention.

On Tue, Mar 3, 2015 at 3:30 PM, Tim Robertson <trobertson@gbif.org> wrote:
On Tue, 3 Mar 2015 18:55:35 +0100
Tim Robertson <trobertson@gbif.org> wrote:
Thanks Paul and Tuco for the feedback - useful food for thought and I
note that the DataCite metadata kernel also uses a list for rights,
not a single statement.  Allowing a collection of rights might be the
most applicable solution here.

That still leaves uncertainty: does the list CC-BY, CC-BY-NC mean that
the data set in its entirety is licensed under both CC-BY and
CC-BY-NC, or that there are parts under one license and parts under the
other?  At the data set level, a collection of rights statements could
easily be interpreted differently than a statement that rights are
stated at the record level.

Agreed.  We need to understand how this will relate to DataCite, EML, etc., and other networks we need to integrate with.
I am not yet sure if DataCite intends to use it to indicate dual licensing (e.g. the data can be used under either license) or if it indicates variance.
This needs further investigation and we’ll reply.

A point of clarity though - an image extension allows you to provide
metadata about an image that exists on a URL, but the image itself is
not part of the DwC-A / dataset.  One field of the image metadata is
the license applicable for the image but that should not be
transferred to the dataset being put out by the IPT.  Or are we not
in agreement on that?  E.g. the DwC-A can be available under CC0 but
contain links to online images that could be behind some far more
restrictive license.

This does seem fairly clear in an Audubon Core or other media extension,
where metadata about the rights associated with external media objects
is asserted together with the retrieval locations of those media
objects.  It seems less clear when dwc:associatedMedia is present in a
flat Occurrence record: is an assertion about rights made in the
dataset-level metadata to be taken to extend to the digital object at
the other end of a link found in dwc:associatedMedia?

That is one of the common questions - we always advise folk to use a more expressive model (i.e. extensions) where it is necessary to associate titles, rights statements etc.

More tomorrow.
Tim





Thanks,
Tim





On 03 Mar 2015, at 17:13, John Wieczorek <tuco@berkeley.edu> wrote:

I agree. This is particularly problematic in a resource that
includes a media extension, where the rights of the core records
may well differ from those of the media, and where the rights on
individual media vary within the extension. I think this creates an
unacceptable barrier. Instead, could the IPT allow a set of rights
at the dataset level or validate for rights at the record level?

On Tue, Mar 3, 2015 at 12:12 PM, Paul J. Morris <mole@morris.net> wrote:
On Tue, 3 Mar 2015 13:18:35 +0100
Kyle Braak <kbraak@gbif.org> wrote:
Best practice is that the license applied to the dataset should
not contradict the license(s) applied at the record level.

I think this imposes a requirement that the dataset level metadata
can have a value which indicates that rights are described at the
record level rather than at the dataset level.  Otherwise, it
imposes a requirement on data providers that they create a unique
resource for each separate rights statement.  This will be a problem
for any provider who has more than one rights assertion in their
data, and for intermediate aggregators who are combining data sets
from downstream providers and passing them on to other aggregators
upstream.

-Paul
--
Paul J. Morris
Biodiversity Informatics Manager
Harvard University Herbaria/Museum of Comparative Zoölogy
mole@morris.net  AA3SD  PGP public key available
_______________________________________________
IPT mailing list
IPT@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/ipt
