[IPT] [tdwg-content] Reverting the process of DwC standardization

Menashe' Eliezer menashe.eliezer at gmail.com
Thu Oct 29 15:05:47 CET 2015

Resending the same message due to a subscription problem.


2015-10-29 12:15 GMT+01:00 Menashe' Eliezer <menashe.eliezer at gmail.com>:

> Hello,
> Please see my updated suggestion at
> https://github.com/gbif/ipt/issues/1165
> IMHO Open Refine is not the right tool. One can simply use org.apache.poi
> in a Java application to read all the information from the different
> files inside the DwC-A, and create an ODS file with the combined matrix,
> which also takes any parentEventID into account. I'm sorry I
> don't have time to do it myself.
> I hope it's clear.
> --
> Menashè
> 2015-10-28 18:57 GMT+01:00 Shorthouse, David <david.shorthouse at umontreal.ca>:
>> All,
>> Is part of the issue being expressed here because the raw ecological data
>> sets we're discussing are small-ish matrices rather than occurrences, with
>> site codes as columns, taxa as rows and measures of density/abundance as
>> cells (and similar for environmental variables)? Such structures are often
>> used as input for software that executes e.g. ordinations, classification &
>> regression trees, species richness estimates. The shortcoming of such a
>> structure is the inherent idiosyncratic nature of "site codes", with
>> variable numbers of them, i.e. an arbitrary number of columns. I doubt it
>> was ever designed for ease of dataset integration, but rather for ease of
>> computation. Representing this structure as Event core requires significant
>> transposition, with potential for error if done manually. Open Refine is one
>> tool that could permit bi-directional transpositions (DwC -> matrix
>> and then matrix -> DwC), but it is still clunky and accommodation of
>> extensions is virtually non-existent. But perhaps Open Refine recipes and
>> guides get us one step closer to finding a balance between the need for
>> standardized representation & efficient transport (DwC) vs. end-users who
>> want matrices for ease of computation.
>> David P. Shorthouse
>> On Tue, Oct 27, 2015 at 7:36 AM, David Valentim Dias <dvdias at sibbr.gov.br> wrote:
>>> Hi again,
>>> I think the problem targets both. DwC, because it is a solution to one
>>> problem that creates another for researchers less "skilled" in table
>>> manipulation. Ecological data with occurrences results in three tables,
>>> and manipulating these gets harder with the number of cores or
>>> extensions used.
>>> Two possible solutions come to mind: create a new term describing the
>>> original layout of the columns (so we can use csvjoin as Menashe suggests)
>>> or give the IPT an option to store the original table associated with the
>>> resource.
>>> We can always use external links in the EML and save the file somewhere, but
>>> this means creating another service and managing more logins (i.e. resource
>>> costs and new problems).
>>> I think any solution will need IPT changes.
>>> 2015-10-27 9:08 GMT-02:00 Menashe' Eliezer <menashe.eliezer at gmail.com>:
>>>> Hi Tim,
>>>> I believe that the IPT feature I requested long ago could be helpful
>>>> for David: https://github.com/gbif/ipt/issues/1165
>>>> Neither consumers nor data providers have a DwC-A viewer, so
>>>> they need to join the separate CSV files to get one table in a
>>>> worksheet.
>>>> Web applications like the one on the OBIS website do let end users download
>>>> one big table.
>>>> Best regards,
>>>> Menashè
>>>> 2015-10-27 9:53 GMT+01:00 Tim Robertson <trobertson at gbif.org>:
>>>>> Hi David
>>>>> (CC’ing the IPT list as this might be an IPT specific thread -
>>>>> http://lists.gbif.org/mailman/listinfo/ipt)
>>>>> For clarification - is your question specific to the DwC-A standard,
>>>>> where this is possible as Alex says, or is it specific to the IPT tool?
>>>>> Do you imagine a scenario where you’d effectively map the same
>>>>> extension 2 times - once to interpreted and once to verbatim - or do you
>>>>> envisage a different data schema for each?
>>>>> Thanks,
>>>>> Tim
>>>>> On 23 Oct 2015, at 16:00, Alex Thompson <godfoder at acis.ufl.edu> wrote:
>>>>> David,
>>>>> It's certainly possible, within the context of a Darwin Core Archive,
>>>>> to include other files within the ZIP file that lie outside the schema of
>>>>> the archive. Both GBIF and iDigBio do this when generating downloads for
>>>>> various reasons (RIGHTS & LICENSE files, additional EML metadata, etc).
>>>>> However, I do not believe it is possible to do this within IPT. You might
>>>>> submit an issue on the IPT issue tracker (
>>>>> https://github.com/gbif/ipt/issues) for potential inclusion of this
>>>>> feature in a future version of IPT.
>>>>> There are workarounds you can use to include additional data in Darwin
>>>>> Core archives, but none of them will exactly match your old format. For
>>>>> instance, you could include an additional Occurrence file with the values as
>>>>> JSON in dynamicProperties, or in some other verbatim format in the
>>>>> occurrenceRemarks field. Both of those would at least give some method of
>>>>> single-row access (vs. joining multiple measurementOrFacts to a single
>>>>> eventID) if that is the primary concern, even if they would require
>>>>> additional parsing steps to be useful.
>>>>> Alex Thompson
>>>>> iDigBio Infrastructure
>>>>> On 10/23/2015 09:40 AM, David Valentim Dias wrote:
>>>>> Dear colleagues,
>>>>> Here at SiBBr we're using the new eventCore and measurementOrFacts, and
>>>>> after standardizing to DwC and publishing, we think some
>>>>> users/researchers will want the "original" table format for various
>>>>> reasons.
>>>>> Is it possible to have a verbatimTable or some place where we can store
>>>>> the original table/column format?
>>>>> Regards
>> _______________________________________________
>> IPT mailing list
>> IPT at lists.gbif.org
>> http://lists.gbif.org/mailman/listinfo/ipt
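
For illustration only, here is a minimal Python sketch of the DwC -> matrix direction discussed above. It joins an event file to an occurrence file on eventID, follows parentEventID when a child event carries no site of its own, and pivots the result into a taxa-by-site matrix. The sample rows, the helper names (read_rows, resolve_site, build_matrix, write_matrix), and the choice of locationID as the site code and organismQuantity as the abundance measure are all assumptions made for this example; none of this is provided by the IPT or proposed verbatim by anyone in the thread.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for the event and occurrence
# files of a Darwin Core Archive. E3 has no locationID of its own and
# inherits the site from its parent event E1.
EVENT_CSV = """eventID,parentEventID,locationID
E1,,SITE_A
E2,,SITE_B
E3,E1,
"""

OCCURRENCE_CSV = """occurrenceID,eventID,scientificName,organismQuantity
O1,E1,Apis mellifera,4
O2,E2,Apis mellifera,7
O3,E2,Bombus terrestris,2
O4,E3,Bombus terrestris,5
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def resolve_site(event_id, events_by_id):
    """Walk up parentEventID until an event with a locationID is found."""
    seen = set()  # guards against accidental parent cycles
    while event_id and event_id not in seen:
        seen.add(event_id)
        event = events_by_id.get(event_id)
        if event is None:
            return None
        if event["locationID"]:
            return event["locationID"]
        event_id = event["parentEventID"]
    return None


def build_matrix(events, occurrences):
    """Return (sites, matrix) with matrix[taxon][site] = summed quantity."""
    events_by_id = {e["eventID"]: e for e in events}
    site_of_event = {eid: resolve_site(eid, events_by_id) for eid in events_by_id}
    matrix = defaultdict(lambda: defaultdict(float))
    for occ in occurrences:
        site = site_of_event.get(occ["eventID"])
        if site is None:
            continue  # occurrence cannot be placed in any column
        matrix[occ["scientificName"]][site] += float(occ["organismQuantity"])
    sites = sorted({s for s in site_of_event.values() if s})
    return sites, matrix


def write_matrix(sites, matrix):
    """Serialise the matrix as CSV text: one row per taxon, one column per site."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["scientificName"] + sites)
    for taxon in sorted(matrix):
        writer.writerow([taxon] + [matrix[taxon].get(s, 0.0) for s in sites])
    return out.getvalue()


if __name__ == "__main__":
    sites, matrix = build_matrix(read_rows(EVENT_CSV), read_rows(OCCURRENCE_CSV))
    print(write_matrix(sites, matrix))
```

The sketch sums quantities when several occurrences of the same taxon fall on the same site. A real data set might need a different aggregation, or a separate pass over a measurementOrFact extension for the environmental variables.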