[IPT] Darwin Core Star Schema

Quentin Groom quentin.groom at plantentuinmeise.be
Wed Nov 11 20:48:44 CET 2015


Thanks Kyle, that's all useful information.
Regards
Quentin



Dr. Quentin Groom
(Botany and Information Technology)

Botanic Garden Meise
Domein van Bouchout
B-1860 Meise
Belgium

ORCID: 0000-0002-0596-5376

Landline; +32 (0) 226 009 20 ext. 364
FAX:      +32 (0) 226 009 45

E-mail:     quentin.groom at plantentuinmeise.be
Skype name: qgroom
Website:    www.botanicgarden.be


On 11 November 2015 at 19:12, Kyle Braak <kbraak at gbif.org> wrote:

> Dear Quentin,
>
> Thank you for your feedback.
>
> I answer your questions inline below.
>
> Best regards,
>
> Kyle
>
> On 11 Nov 2015, at 17:10, Quentin Groom <quentin.groom at plantentuinmeise.be>
> wrote:
>
> Having now experimented a little, I've had most success with a completely
> flat file.
>
> The validator works OK except for the following 3 errors, which are
> presumably due to the recent changes to Darwin Core.
>
>    - Unknown term http://rs.tdwg.org/dwc/terms/organismQuantity
>      mapped to column 9
>    - Unknown term http://purl.org/dc/terms/license
>      mapped to column 16
>    - Unknown term http://rs.tdwg.org/dwc/terms/organismQuantityType
>      mapped to column 8
>
>
> Exactly right. The validator needs to be updated to work with the latest
> extension versions, which use the latest Darwin Core terms.
>
>
> The IPT also seems to accept a flat file and I can get all the fields
> mapped. However, it's not clear to me why it shows zero records in this
> summary. Is this a problem?
> <image.png>
>
>
> I strongly suspect you are looking at a ‘preview’ page of the unpublished
> version 1.0. After you publish version 1.0, try viewing its homepage and
> the number of (core) records will show as you’d expect.
>
>
> Using Event as the core file and Occurrence as an extension, the validator
> works OK, but it does report the error "The extension data file contains
> references to core IDs that do not exist:". I think this has something to
> do with where it assumes the core IDs are in the file (i.e. not in the
> 1st column).
>
>
> The ID field links records from the two sources together. In this case,
> each occurrence record should link to an event record via its eventID. You
> can refer to the IPT User Manual's section on mapping [1] for more
> information. If you get stuck, feel free to write to me for direct
> assistance.
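
A minimal sketch of the check behind the "references to core IDs that do not exist" error, assuming an Event core and an Occurrence extension that both carry an eventID column. The sample rows, the helper names and the tab-delimited layout are illustrative assumptions, not the validator's actual implementation.

```python
import csv
import io

# Hypothetical tab-delimited sample data (not from the thread).
EVENTS = "eventID\teventDate\nE1\t2015-06-01\nE2\t2015-06-02\n"
OCCURRENCES = (
    "eventID\toccurrenceID\tscientificName\n"
    "E1\tO1\tBellis perennis\n"
    "E3\tO2\tUrtica dioica\n"
)


def read_tsv(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def orphan_core_ids(core_rows, ext_rows, core_id="eventID", ext_id="eventID"):
    """Return the extension IDs that match no core record, sorted."""
    known = {row[core_id] for row in core_rows}
    referenced = {row[ext_id] for row in ext_rows}
    return sorted(referenced - known)


events = read_tsv(EVENTS)
occurrences = read_tsv(OCCURRENCES)
print(orphan_core_ids(events, occurrences))  # ['E3']
```

If the ID column is mapped to the wrong position in either file, every extension row would show up as an orphan in a check like this, which may be what the validator message reflects.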
>
>
> I think I've also got it working in the IPT, though I'm not yet seeing the
> benefit of the star format over the flat format. Are the benefits of the
> Star Schema format only related to the size of the DWC-A, or are there
> other benefits? If it does just relate to the size then I think it would
> be best to recommend the flat file format for all but the big users.
>
>
> There are other benefits. The Star Schema is conceptually cleaner, easier
> to maintain, less prone to mistakes, etc.
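
One way to picture the maintenance argument is to flatten a small star layout by hand. The sketch below uses hypothetical rows and a hypothetical flatten helper; it only illustrates how event details end up repeated on every occurrence row in a flat file.

```python
# Hypothetical star layout: one event record, two occurrences linked to it.
events = {
    "E1": {"eventDate": "2015-06-01", "locality": "Plot A"},
}
occurrences = [
    {"eventID": "E1", "occurrenceID": "O1", "scientificName": "Bellis perennis"},
    {"eventID": "E1", "occurrenceID": "O2", "scientificName": "Urtica dioica"},
]


def flatten(events, occurrences):
    """Copy the linked event's fields onto each occurrence row."""
    return [{**occ, **events[occ["eventID"]]} for occ in occurrences]


flat = flatten(events, occurrences)
for row in flat:
    print(row["occurrenceID"], row["eventDate"], row["locality"])
```

In the flat result the date and locality appear once per occurrence, so a correction has to be made on every row, whereas in the star layout it is made once in the event record.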
>
> Please note that you can capture additional measurements or facts using
> the MeasurementOrFact extension [2]. Using Occurrence as the core, it’s only
> appropriate to capture measurements or facts relating to the species
> occurrences, whereas using Event as the core, you can capture measurements
> or facts relating to the sampling event (e.g., environmental measurements
> like sediment temperature and redox potential (Eh)).
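
As a rough illustration of event-level measurements, the sketch below groups MeasurementOrFact-style rows by the eventID of an Event core. The term names measurementType, measurementValue and measurementUnit are Darwin Core terms; the values, units and the helper function are made-up assumptions.

```python
from collections import defaultdict

# Hypothetical measurement rows, each linked to an event by eventID.
measurements = [
    {"eventID": "E1", "measurementType": "sediment temperature",
     "measurementValue": "11.5", "measurementUnit": "degrees Celsius"},
    {"eventID": "E1", "measurementType": "redox potential (Eh)",
     "measurementValue": "-120", "measurementUnit": "mV"},
    {"eventID": "E2", "measurementType": "sediment temperature",
     "measurementValue": "12.1", "measurementUnit": "degrees Celsius"},
]


def measurements_by_core_id(rows, core_id="eventID"):
    """Group (type, value, unit) tuples under the core ID they refer to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[core_id]].append(
            (row["measurementType"], row["measurementValue"], row["measurementUnit"])
        )
    return dict(grouped)


by_event = measurements_by_core_id(measurements)
for event_id, facts in by_event.items():
    print(event_id, facts)
```

With Occurrence as the core, the same rows would instead be keyed on the occurrence ID and would describe the occurrence rather than the sampling event.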
>
> [1]
> https://github.com/gbif/ipt/wiki/IPT2ManualNotes.wiki#darwin-core-mappings
> [2] http://rs.gbif.org/extension/dwc/measurements_or_facts.xml
>
> Regards
> Quentin
>
>
> On 11 November 2015 at 14:15, Tim Robertson <trobertson at gbif.org> wrote:
>
>> Thanks Quentin
>>
>> So if I understand correctly, the event file can be used as a Core, just
>> as Taxon and Occurrence can be Core files.
>>
>>
>> Yes, that’s correct
>>
>> Though as there can only be one Core ID I will still need to keep my
>> taxon information in the Occurrence file.
>> Although I don't think this is a problem, it can get a little confusing
>> in the documentation due to the crossover of terms between taxon and
>> occurrence files.
>>
>>
>> I’m afraid that is the kind of limitation I was alluding to about star
>> schemas… You have to denormalise things into a format which flattens what
>> you might otherwise model as 2 tables.
>>
>> Currently "Taxon" can’t be used as an extension, so you would need to
>> use Occurrence.  Adding Taxon as an option would be technically possible,
>> but that would be completely decoupled from the occurrences.  It would
>> however allow you to have:
>>
>> Core: Rows of sampling events, each documenting e.g. a square on the
>> ground sampled over a specific period
>>   Extension taxon: List of species observed within the sampling event
>>   Extension occurrence: Documented evidence of specimens collected or
>> observed
>>
>> At the moment though, you would have to express species lists as
>> occurrences, which might make some sense because they are effectively
>> observations.
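
The layout that works today (Event as core, Occurrence as extension) could be described in an archive descriptor roughly as follows. This is a sketch that builds a meta.xml-style document with Python's standard library; the file names, the choice of fields and the helper names are assumptions for illustration, and the IPT generates its own descriptor when it publishes.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"


def add_table(parent, tag, row_type, location, id_tag, terms):
    """Append a <core> or <extension> element describing one data file."""
    table = ET.SubElement(parent, tag, {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # written literally as \t in the XML
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
        "rowType": DWC + row_type,
    })
    files = ET.SubElement(table, "files")
    ET.SubElement(files, "location").text = location
    # <id> in the core and <coreid> in the extension both point at column 0.
    ET.SubElement(table, id_tag, {"index": "0"})
    for index, term in enumerate(terms):
        ET.SubElement(table, "field", {"index": str(index), "term": DWC + term})
    return table


def build_meta():
    """Build a descriptor with an Event core and an Occurrence extension."""
    archive = ET.Element("archive", {
        "xmlns": "http://rs.tdwg.org/dwc/text/",
        "metadata": "eml.xml",
    })
    # Hypothetical file names and field selections.
    add_table(archive, "core", "Event", "event.txt", "id",
              ["eventID", "eventDate", "samplingProtocol"])
    add_table(archive, "extension", "Occurrence", "occurrence.txt", "coreid",
              ["eventID", "occurrenceID", "scientificName"])
    return archive


meta = build_meta()
print(ET.tostring(meta, encoding="unicode"))
```

The species list for each sampling event lives in occurrence.txt here, with scientificName carried on each occurrence row, which matches the workaround of expressing species lists as occurrences.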
>>
>> I'm happy to be a Guinea pig. I'll experiment with the validator if you
>> think this should work and let you know how I get on.
>>
>>
>> Thanks for this,
>>
>> All the best,
>> Tim
>>
>>
>> Regards
>> Quentin
>>
>>
>> On 11 November 2015 at 11:58, Hannu Saarenmaa <
>> hannu.saarenmaa at helsinki.fi> wrote:
>>
>>> Quentin & Co
>>>
>>> It depends what you mean by "survey".   I would put each visit to a
>>> sampling location (such as a plot) in the event core, and put all the taxa
>>> that are observed in a non-core table.   The properties of the entire
>>> survey (project) would go to the EML metadata.
>>>
>>> Hannu
>>>
>>>
>>> On 2015-11-11 10:20, Quentin Groom wrote:
>>>
>>> I'm rather confused how the Darwin Core Star Schema is meant to work for
>>> survey data.
>>>
>>> Darwin Core can have one of two Core files, taxon or occurrence. The
>>> most appropriate for a survey would seem to be occurrence. So I imagine
>>> that in the star schema you could also have a related event file detailing
>>> the date and location of each survey and a non-core taxon file detailing
>>> the taxa that are observed.
>>>
>>> However, this does not seem to be possible. The DWC-A validator (
>>> http://tools.gbif.org/dwca-validator/), assumes only one core ID in the
>>> core file so you can't link an occurrence both to a taxon and to an event.
>>> This is also true in the Darwin Core Archive Assistant (
>>> http://tools.gbif.org/dwca-assistant/). The solution seems to be to put
>>> all the information from the taxon core file into the occurrence file, but
>>> keep the separate event file linked with the core occurrence id.
>>>
>>> Is this correct? It seems rather counterintuitive.
>>>
>>> Regards
>>> Quentin
>>>
>>>
>>> _______________________________________________
>>> IPT mailing list
>>> IPT at lists.gbif.org
>>> http://lists.gbif.org/mailman/listinfo/ipt
>>>
>>>
>>> --
>>>
>>> Hannu Saarenmaa, Research Director
>>> hannu.saarenmaa at uef.fi
>>> Mobile +358-50-4479668
>>>
>>> University of Eastern Finland
>>> Digitarium, SIB Labs, Joensuu Science Park
>>> Länsikatu 15 (P.O. Box 111)
>>> FI-80101 Joensuu
>>> www.digitarium.fi/en - Service Centre for High-Performance Digitisation
>>> www.eubon.eu - EU BON - GEO BON - Data Integration and Interoperability
>>>
>>>
>>
>>