I'm rather confused how the Darwin Core Star Schema is meant to work for survey data.
Darwin Core can have one of two Core files, taxon or occurrence. The most appropriate for a survey would seem to be occurrence. So I imagine that in the star schema you could also have a related event file detailing the date and location of each survey and a non-core taxon file detailing the taxa that are observed.
However, this does not seem to be possible. The DwC-A validator (http://tools.gbif.org/dwca-validator/) assumes only one core ID in the core file, so you can't link an occurrence both to a taxon and to an event. This is also true of the Darwin Core Archive Assistant (http://tools.gbif.org/dwca-assistant/). The solution seems to be to put all the information from the taxon core file into the occurrence file, but keep the separate event file linked with the core occurrence ID.
Is this correct? It seems rather counterintuitive.
Regards Quentin
Dr. Quentin Groom (Botany and Information Technology)
Botanic Garden Meise Domein van Bouchout B-1860 Meise Belgium
ORCID: 0000-0002-0596-5376
Landline; +32 (0) 226 009 20 ext. 364 FAX: +32 (0) 226 009 45
E-mail: quentin.groom@plantentuinmeise.be Skype name: qgroom Website: www.botanicgarden.be
Hi Quentin
In the latest IPT, we have introduced a third type of core which is the Sampling Event core to specifically accommodate this case. It is still a star schema, and therefore has all the known limitations, but I think it does provide a model to accommodate the specific limitation that you describe.
The sampling event core allows you to describe e.g. the time, place and protocol of the sample (i.e. it is *very* similar to the occurrence fields) but by having it as the core, you can attach the extensions of “occurrences observed / collected” in each sampling event and also things like lists of species present or absent.
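As a rough illustration of the layout described above (a minimal sketch, not taken from the IPT itself): the file contents, identifiers and values below are made up, and only the column names are Darwin Core terms. Each occurrence row points back to one sampling event through eventID.

```python
import csv
import io
from collections import defaultdict

# Hypothetical event core: one row per sampling event (a visit to a plot).
EVENT_CSV = """eventID,eventDate,samplingProtocol,decimalLatitude,decimalLongitude
plot1-2015-06-01,2015-06-01,1 m2 quadrat,50.93,4.33
plot2-2015-06-01,2015-06-01,1 m2 quadrat,50.94,4.32
"""

# Hypothetical occurrence extension: every row refers to the core via eventID.
OCCURRENCE_CSV = """eventID,occurrenceID,scientificName,occurrenceStatus
plot1-2015-06-01,occ-1,Bellis perennis,present
plot1-2015-06-01,occ-2,Poa annua,present
plot2-2015-06-01,occ-3,Bellis perennis,absent
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def occurrences_by_event(events, occurrences):
    """Attach the extension rows to the core row they refer to."""
    grouped = defaultdict(list)
    for occ in occurrences:
        grouped[occ["eventID"]].append(occ)
    return {ev["eventID"]: grouped.get(ev["eventID"], []) for ev in events}


events = read_rows(EVENT_CSV)
occurrences = read_rows(OCCURRENCE_CSV)
star = occurrences_by_event(events, occurrences)
for event_id, occs in star.items():
    print(event_id, [o["scientificName"] for o in occs])
```

The date, place and protocol are stated once per event, while presence and absence records hang off that event as extension rows.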
We are only just venturing into this new territory and are still working with early adopters to get reference datasets. Would you be keen to work on a pilot of this?
For information: The DwC-A validator should work, but it’s not been tested in anger with this new format, so we might run into some things we need to fix. The DwC-A assistant has, in truth, not been well maintained for years, having been developed externally by a contractor, and most likely will not work at all for this. That is not ideal, but we haven’t had the resources to maintain it, and it is in very little use. The IPT is the primary tool for publishing, along with the BioCASE and TAPIR tools, and there are enough hosted IPT installations now that if someone did not wish to run one, a viable host would be offered.
I hope this helps clarify things, and we are very willing to help work through issues you may encounter in the sampling event core. If you wish to simply play around, would you like an account on the demo site? ipt.gbif.org
Thanks, Tim
IPT mailing list IPT@lists.gbif.org http://lists.gbif.org/mailman/listinfo/ipt
Quentin & Co
It depends what you mean by "survey". I would put each visit to a sampling location (such as a plot) in the event core, and put all the taxa that are observed in a non-core table. The properties of the entire survey (project) would go to the EML metadata.
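One way to picture this split is the sketch below, which takes flat survey rows and separates them into an event table (one row per visit) and a non-core table of observed taxa. The choice of columns and all sample values are assumptions made for illustration; only the field names are Darwin Core terms.

```python
# Assumed column split; adjust to whatever terms the survey actually records.
EVENT_FIELDS = ("eventID", "eventDate", "locationID")
OCCURRENCE_FIELDS = ("eventID", "occurrenceID", "scientificName")


def split_flat(rows):
    """Split flat survey rows into (event core rows, occurrence extension rows)."""
    events = {}
    occurrences = []
    for row in rows:
        event = {field: row[field] for field in EVENT_FIELDS}
        # setdefault keeps the first description seen for each eventID.
        first_seen = events.setdefault(event["eventID"], event)
        if first_seen != event:
            raise ValueError(f"conflicting details for event {event['eventID']}")
        occurrences.append({field: row[field] for field in OCCURRENCE_FIELDS})
    return list(events.values()), occurrences


flat_rows = [
    {"eventID": "plotA-visit1", "eventDate": "2015-05-10", "locationID": "plotA",
     "occurrenceID": "o1", "scientificName": "Quercus robur"},
    {"eventID": "plotA-visit1", "eventDate": "2015-05-10", "locationID": "plotA",
     "occurrenceID": "o2", "scientificName": "Fagus sylvatica"},
    {"eventID": "plotA-visit2", "eventDate": "2015-09-02", "locationID": "plotA",
     "occurrenceID": "o3", "scientificName": "Quercus robur"},
]
event_core, occurrence_ext = split_flat(flat_rows)
print(len(event_core), "events,", len(occurrence_ext), "occurrences")
```

The conflict check is a design choice of this sketch: if two flat rows disagree about the same visit, the split fails loudly rather than silently picking one version.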
Hannu
So if I understand correctly, the event file can be used as a Core, just as Taxon and Occurrence can be Core files. Though as there can only be one Core ID I will still need to keep my taxon information in the Occurrence file. Although I don't think this is a problem, it can get a little confusing in the documentation due to the crossover of terms between taxon and occurrence files.
I'm happy to be a Guinea pig. I'll experiment with the validator if you think this should work and let you know how I get on.
Regards Quentin
--
Hannu Saarenmaa, Research Director
hannu.saarenmaa@uef.fi Mobile +358-50-4479668
University of Eastern Finland Digitarium, SIB Labs, Joensuu Science Park Länsikatu 15 (P.O. Box 111) FI-80101 Joensuu
www.digitarium.fi/en - Service Centre for High-Performance Digitisation
www.eubon.eu - EU BON - GEO BON - Data Integration and Interoperability
Thanks Quentin
So if I understand correctly, the event file can be used as a Core, just as Taxon and Occurrence can be Core files.
Yes, that’s correct
Though as there can only be one Core ID I will still need to keep my taxon information in the Occurrence file. Although I don't think this is a problem, it can get a little confusing in the documentation due to the crossover of terms between taxon and occurrence files.
I’m afraid that is the kind of limitation I was alluding to about star schemas… You have to denormalise things into a format which flattens what you might otherwise model as 2 tables.
Currently a "Taxon" can’t be used as an extension, so you would need to use Occurrence. Adding Taxon as an option would be technically possible, but that would be completely decoupled from the occurrences. It would however allow you to have:
Core: Rows of Sampling event documenting e.g. a square on the ground sampled in a specific period
Extension taxon: List of species observed within the sampling event
Extension occurrence: Documented evidence of specimens collected or observed
At the moment though, you would have to express species lists as occurrences, which might make some sense because they are effectively observations.
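To make the denormalisation concrete, here is a small sketch that flattens an event table and an occurrence table into single rows, repeating the event details on every occurrence. The function name, identifiers and values are hypothetical; the column names are Darwin Core terms.

```python
def flatten(events, occurrences):
    """Join each occurrence to its event, giving one flat row per occurrence."""
    events_by_id = {event["eventID"]: event for event in events}
    flat = []
    for occ in occurrences:
        event = events_by_id[occ["eventID"]]  # KeyError if the event is missing
        flat.append({**event, **occ})
    return flat


sample_events = [
    {"eventID": "sq1-2015-06-01", "eventDate": "2015-06-01"},
    {"eventID": "sq2-2015-06-01", "eventDate": "2015-06-01"},
]
sample_occurrences = [
    {"eventID": "sq1-2015-06-01", "occurrenceID": "o1", "scientificName": "Poa annua"},
    {"eventID": "sq1-2015-06-01", "occurrenceID": "o2", "scientificName": "Bellis perennis"},
]
flat_rows = flatten(sample_events, sample_occurrences)
for row in flat_rows:
    print(row)
```

In this sketch the second event produces no flat row at all, because nothing was recorded there: a flat table keyed on occurrences has nowhere to put an event without occurrences.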
I'm happy to be a Guinea pig. I'll experiment with the validator if you think this should work and let you know how I get on.
Thanks for this,
All the best, Tim
Having now experimented a little.
I've had most success with a completely flat file
The validator works OK except for the following 3 errors, which are presumably due to the recent changes to Darwin Core.
- Unknown term http://rs.tdwg.org/dwc/terms/organismQuantity mapped to column 9
- Unknown term http://purl.org/dc/terms/license mapped to column 16
- Unknown term http://rs.tdwg.org/dwc/terms/organismQuantityType mapped to column 8
The IPT also seems to accept a flat file, and I can get all the fields mapped. However, I'm not clear why it shows zero records in this summary. Is this a problem? [image]
Using Event as the core file and Occurrence as an extension, the validator works OK, but it does report the error "The extension data file contains references to core IDs that do not exist:". I think this has something to do with where it assumes the core IDs are in the file (i.e. not in the 1st column).
I think I've also got it working in the IPT, though I'm not yet seeing the benefit of the star format over the flat format. Are the benefits of the Star Schema format only related to the size of the DwC-A, or are there other benefits? If it does just relate to the size, then I think it would be best to recommend the flat file format for all but the big users.
Regards Quentin
Dear Quentin,
Thank you for your feedback.
I answer your questions inline below.
Best regards,
Kyle
On 11 Nov 2015, at 17:10, Quentin Groom quentin.groom@plantentuinmeise.be wrote:
Having now experimented a little.
I've had most success with a completely flat file
The validator works OK except for the following 3 errors, which are presumably due to the recent changes to Darwin Core.
- Unknown term http://rs.tdwg.org/dwc/terms/organismQuantity mapped to column 9
- Unknown term http://purl.org/dc/terms/license mapped to column 16
- Unknown term http://rs.tdwg.org/dwc/terms/organismQuantityType mapped to column 8
Exactly right. The validator needs to be updated to work with the latest extension versions, which use the latest Darwin Core terms.
The IPT also seems to accept a flat file, and I can get all the fields mapped. However, I'm not clear why it shows zero records in this summary. Is this a problem? [image]
I strongly suspect you are looking at a ‘preview’ page of the unpublished version 1.0. After you publish version 1.0, try viewing its homepage and the number of (core) records will show as you’d expect.
Using Event as the core file and Occurrence as an extension, the validator works OK, but it does report the error "The extension data file contains references to core IDs that do not exist:". I think this has something to do with where it assumes the core IDs are in the file (i.e. not in the 1st column).
The ID field links records from the two sources together. In this case, each occurrence record should link to an event record via its eventID. You can refer to the IPT User Manual's section on mapping [1] for more information. If you get stuck, feel free to write to me for direct assistance.
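A quick way to check this linkage before uploading is sketched below. It is not the validator's own logic, only an illustration of the rule that every eventID in the occurrence file must also exist in the event core; the function name and sample rows are hypothetical.

```python
def dangling_core_ids(core_rows, extension_rows, id_field="eventID"):
    """Return the extension IDs that have no matching row in the core."""
    core_ids = {row[id_field] for row in core_rows}
    extension_ids = {row[id_field] for row in extension_rows}
    return sorted(extension_ids - core_ids)


core = [{"eventID": "ev-1"}, {"eventID": "ev-2"}]
extension = [
    {"eventID": "ev-1", "scientificName": "Poa annua"},
    {"eventID": "ev-3", "scientificName": "Bellis perennis"},  # no such event
]
missing = dangling_core_ids(core, extension)
if missing:
    print("Extension rows refer to unknown core IDs:", missing)
```

An empty result means every occurrence row can be joined to an event. A non-empty one may point to a wrong column being mapped as the ID, or to identifiers that differ only in whitespace or case.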
I think I've also got it working in the IPT, though I'm not yet seeing the benefit of the star format over the flat format. Are the benefits of the Star Schema format only related to the size of the DwC-A, or are there other benefits? If it does just relate to the size, then I think it would be best to recommend the flat file format for all but the big users.
There are other benefits. The Star Schema is conceptually cleaner, easier to maintain, less prone to mistakes, etc.
Please note that you can capture additional measurements or facts using the MeasurementOrFact extension [2]. Using Occurrence as the core, it’s only appropriate to capture measurements or facts relating to the species occurrences, whereas using Event as the core, you can capture measurements or facts relating to the sampling event (e.g., environmental measurements like sediment temperature and redox potential (Eh)).
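As an illustration of event-level measurements (a sketch only, with invented identifiers and values), the rows below use the measurementType, measurementValue and measurementUnit terms and are keyed by eventID, so they describe the sampling event rather than any single occurrence.

```python
from collections import defaultdict

# Hypothetical MeasurementOrFact rows attached to events; all values are made up.
event_measurements = [
    {"eventID": "core-7", "measurementType": "sediment temperature",
     "measurementValue": "11.5", "measurementUnit": "degrees Celsius"},
    {"eventID": "core-7", "measurementType": "redox potential (Eh)",
     "measurementValue": "-120", "measurementUnit": "mV"},
    {"eventID": "core-8", "measurementType": "sediment temperature",
     "measurementValue": "12.1", "measurementUnit": "degrees Celsius"},
]


def measurements_by_event(rows):
    """Group measurements as {eventID: {measurementType: (value, unit)}}."""
    grouped = defaultdict(dict)
    for row in rows:
        value = float(row["measurementValue"])
        grouped[row["eventID"]][row["measurementType"]] = (value, row["measurementUnit"])
    return dict(grouped)


by_event = measurements_by_event(event_measurements)
print(by_event["core-7"]["redox potential (Eh)"])
```

With Occurrence as the core, the same extension rows would instead be keyed by the occurrence identifier, which is why environmental readings fit more naturally under an Event core.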
[1] https://github.com/gbif/ipt/wiki/IPT2ManualNotes.wiki#darwin-core-mappings
[2] http://rs.gbif.org/extension/dwc/measurements_or_facts.xml
Thanks Kyle, that's all useful information. Regards Quentin
participants (4): Hannu Saarenmaa, Kyle Braak, Quentin Groom, Tim Robertson