[IPT] How to add his own/custom extension

Julien Husson biology.info at gmail.com
Fri Mar 14 17:13:40 CET 2014


Hi Tim, thanks for your prompt answer.

It's not really what I expected. To be more specific:

First, I have to harvest a lot of data from multiple DBs (remote or not) with,
of course, different structures/models and data formats ... sometimes there is
no DB, just flat CSV/xsl files...
That's why I use an IPT to create the data mappings and to standardize the data
stream into the DwC standard.
(I'm talking about millions of specimens.)

Secondly, I build specific indexes of the harvested data with a custom
harvester using the canadensys-harvester lib (thanks to Christian). It's at
this point that things begin to get difficult with a denormalized view of the
data from the IPT DwC-A,
because I need to transform this denormalized view, or raw model, into a
normalized view that matches my big relational database model, which
becomes the new repository.
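
To make that re-normalization step concrete, here is a minimal sketch of what
it could look like. It assumes Python with the standard sqlite3 module and a
hypothetical target schema (location, event, specimen) whose column names
mirror the ones in Tim's example view; the sample rows are made up, and none of
this is the canadensys-harvester API or my real model.

```python
import sqlite3

# Denormalized rows as they might come out of a DwC-A Occurrence core.
# Keys are Darwin Core terms; the values are made up for illustration.
OCCURRENCES = [
    {"occurrenceID": "BC001", "scientificName": "Acer rubrum",
     "decimalLatitude": 45.5, "decimalLongitude": -73.6, "year": 2010},
    {"occurrenceID": "BC002", "scientificName": "Acer saccharum",
     "decimalLatitude": 45.5, "decimalLongitude": -73.6, "year": 2010},
    {"occurrenceID": "BC003", "scientificName": "Picea glauca",
     "decimalLatitude": 46.8, "decimalLongitude": -71.2, "year": 2012},
]

# Hypothetical normalized target schema (assumption, not a real model).
SCHEMA = """
CREATE TABLE location (
    id INTEGER PRIMARY KEY,
    latitude REAL NOT NULL,
    longitude REAL NOT NULL,
    UNIQUE (latitude, longitude)
);
CREATE TABLE event (
    id INTEGER PRIMARY KEY,
    location_id INTEGER NOT NULL REFERENCES location(id),
    year INTEGER NOT NULL,
    UNIQUE (location_id, year)
);
CREATE TABLE specimen (
    bar_code TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    event_id INTEGER NOT NULL REFERENCES event(id)
);
"""


def get_or_create(conn, table, columns, values):
    """Return the id of the row matching `values`, inserting it if absent."""
    where = " AND ".join(f"{c} = ?" for c in columns)
    row = conn.execute(f"SELECT id FROM {table} WHERE {where}", values).fetchone()
    if row:
        return row[0]
    placeholders = ", ".join("?" for _ in columns)
    cur = conn.execute(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        values,
    )
    return cur.lastrowid


def normalize(conn, occurrences):
    """Split each flat occurrence row into location, event and specimen rows."""
    for occ in occurrences:
        location_id = get_or_create(
            conn, "location", ["latitude", "longitude"],
            (occ["decimalLatitude"], occ["decimalLongitude"]),
        )
        event_id = get_or_create(
            conn, "event", ["location_id", "year"], (location_id, occ["year"]),
        )
        conn.execute(
            "INSERT INTO specimen (bar_code, name, event_id) VALUES (?, ?, ?)",
            (occ["occurrenceID"], occ["scientificName"], event_id),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
normalize(conn, OCCURRENCES)
```

Every flat row has to be looked up and split again on the way in, which is
the extra development step I would like to avoid.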

That's why I thought that custom extensions could make my life
easier.


> On Fri, Mar 14, 2014 at 4:32 PM, Tim Robertson [GBIF] <trobertson at gbif.org
> > wrote:
>
>> Hi Julien,
>>
>> Thanks for your question.  It really depends on what you are trying to
>> publish.  We can add extensions of course, but without knowing the
>> specifics it is difficult to comment.
>>
>> However, a "specimen, event, location" DB model would typically map to an
>> Occurrence core with no extensions required - this is the most common use
>> case of Darwin Core and the IPT.  An Occurrence core is basically a
>> denormalized view of the data.
>>
>> If I were the data manager, I would probably consider that I was
>> publishing a "DwC Occurrence view" of my more complex model and as such
>> would keep a view in the database along the lines of:
>>
>> CREATE VIEW view_dwc_occurrence AS
>> SELECT
>>   specimen.bar_code AS occurrenceID,
>>   specimen.name AS scientificName,
>>   location.latitude AS decimalLatitude,
>>   location.longitude AS decimalLongitude,
>>   event.year AS year
>> FROM
>>   specimen
>>   JOIN event ON ...
>>   JOIN location ON ...
>> WHERE
>>   <insert any conditions for record inclusion, such as non endangered etc>
>>
>> Then in my IPT I would simply do "SELECT * FROM view_dwc_occurrence".
>>  Here you are flattening the normalised model into a denormalized DwC view
>> of the data.
>>
>> Maintaining a view in the database layer as opposed to a custom mapping
>> in the IPT benefits you by:
>>   i) catching issues early with database schema changes since the DB will
>> likely stop you with an error
>>   ii) offering an easy mapping of DB table field names, to DwC terms in a
>> language I find very familiar (SQL)
>>   iii) a super simple IPT mapping, as all columns will map automatically
>> in the IPT since they are DwC recognised terms already
>>
>> Does that help in any way?  If not, could you please elaborate on the
>> model and what you are trying to achieve and we'll do all we can.
>>
>> Thanks,
>> Tim
>>
>>
>>
>> On 14 Mar 2014, at 16:06, Julien Husson <biology.info at gmail.com> wrote:
>>
>> Hi,
>>
>> I use DwC-A to feed my DB.
>>
>> We know the limits of the DwC star schema for representing a relational
>> database.
>>
>> For example, in the case of 1-n cardinality: specimen --- event/record
>> --- location.
>> If I understand the concept, I need to use the Darwin Core Occurrence
>> extension, denormalize my relational model into a big raw model, and then
>> transform / re-normalize this model to match my DB model.
>>
>> In order to reduce cost and dev effort and to optimize SQL statements, it
>> would be greatly appreciated if custom extensions could be added. In that
>> case, I could stay very close to my relational database model and avoid
>> multiple development steps.
>>
>> I discovered this link, but the explanation is now deprecated:
>>
>> http://dag-endresen.blogspot.fr/2009/06/adding-extension-for-germplasm-to-gbif.html
>>
>> Thanks,
>>
>> J.