Hi Tim, thanks for your prompt answer.
It's not really what i expected, to be more specific >
First, i have to harvest lot of data from multiple DB (remote or not) with,
of course, different structures/models and format of datas ... sometimes no
DB just flat CSV/xsl files...
That's why i use an IPT to create data mapping and to standardize the data
stream into Dwc standard.
(I'm talking about millions of specimen)
Secondly, i build specific indexes of harvested data with a custom
harvester using canadensys-harvester lib. ( Thanks to Christian). It's at
this point that it's begin to be difficult with a denormalized view of the
data from IPT Dwc-A.
'Cause i need to transform this denormalized view or raw model into a
normalize view that match with my big relational database model which
become the new repository.
That's why i thought that the custom extensions could be make my life
easier.
On Fri, Mar 14, 2014 at 5:12 PM, Julien Husson <biology.info(a)gmail.com>wrote:
> Hi Tim, thanks for your prompt answer.
>
> It's not really what i expected, to be more specific >
>
> First, i have to harvest lot of data from multiple DB (remote or not)
> with, of course, different structures/models and format of datas ...
> sometimes no DB just flat CSV/xsl files...
> That's why i use an IPT to create data mapping and to standardize the data
> stream into Dwc standard.
> (I'm talking about millions of specimen)
>
> Secondly, i build specific indexes of harvested data with a custom
> harvester using canadensys-harvester lib. ( Thanks to Christian). It's at
> this point that it's begin to be difficult with a denormalized view of the
> data from IPT Dwc-A.
> 'Cause i need to transform this denormalized view or raw model into a
> normalize view that match with my big relational database model which
> become the new repository.
>
> That's why i thought that the custom extensions could be make my life
> easier.
>
>
>
>
> On Fri, Mar 14, 2014 at 4:32 PM, Tim Robertson [GBIF] <trobertson(a)gbif.org
> > wrote:
>
>> Hi Julien,
>>
>> Thanks for your question. It really depends on what you are trying to
>> publish. We can add extensions of course, but without knowing the
>> specifics it is difficult to comment.
>>
>> However, a "specimen, event, location" DB model would typically map to an
>> Occurrence core with no extensions required - this is the most common use
>> case of Darwin Core and the IPT. An Occurrence core is basically a
>> denormalized view of the data.
>>
>> If I were the data manager, I would probably consider that I was
>> publishing a "DwC Occurrence view" of my more complex model and as such
>> would keep a view in the database along the lines of:
>>
>> CREATE VIEW view_dwc_occurrence AS
>> SELECT
>> specimen.bar_code AS occurrenceID,
>> specimen.name AS scientificName,
>> location.latitude AS decimalLatitude,
>> location.longitude AS decimalLongitude,
>> event.year AS year
>> FROM
>> specimen
>> JOIN event ON ...
>> JOIN location ON ...
>> WHERE
>> <insert any conditions for record inclusion, such as non endangered etc>
>>
>> Then in my IPT I would simply do "SELECT * FROM view_dwc_occurrence".
>> Here you are flattening the normalised model into a denormalized DwC view
>> of the data.
>>
>> Maintaining a view in the database layer as opposed to a custom mapping
>> in the IPT benefits you by:
>> i) catching issues early with database schema changes since the DB will
>> likely stop you with an error
>> ii) offering an easy mapping of DB table field names, to DwC terms in a
>> language I find very familiar (SQL)
>> iii) a super simple IPT mapping, as all columns will map automatically
>> in the IPT since they are DwC recognised terms already
>>
>> Does that help in any way? If not, could you please elaborate on the
>> model and what you are trying to achieve and we'll do all we can.
>>
>> Thanks,
>> Tim
>>
>>
>>
>> On 14 Mar 2014, at 16:06, Julien Husson <biology.info(a)gmail.com> wrote:
>>
>> Hi,
>>
>> I use Dwc-A to feed my BD.
>>
>> We know the limits of the Dwc star schema to represent a relationnal
>> database.
>>
>> For example in the case of 1-n cardinality : specimen --- event/record
>> --- location.
>> If i understand the concept, I need to use the Darwin *Core* *Occurrence*
>> *extension*, denormalize my relational model in a big raw model and
>> transform / re-normalize this model to match with my DB model.
>>
>> In order to reduce cost*, *dev and optimize sql statement, it will be
>> grandly appreciate to add custom extension. In this case, i can to be
>> very close of my relational database model and avoid multiple step of dev.
>>
>> I discovered this link but explanantion is now deprecated
>>
>> http://dag-endresen.blogspot.fr/2009/06/adding-extension-for-germplasm-to-g…
>>
>> Thanks,
>>
>> J.
>> _______________________________________________
>> IPT mailing list
>> IPT(a)lists.gbif.org
>> http://lists.gbif.org/mailman/listinfo/ipt
>>
>>
>>
>
Hi,
I use Dwc-A to feed my BD.
We know the limits of the Dwc star schema to represent a relationnal
database.
For example in the case of 1-n cardinality : specimen --- event/record ---
location.
If i understand the concept, I need to use the Darwin *Core* *Occurrence*
*extension*, denormalize my relational model in a big raw model and
transform / re-normalize this model to match with my DB model.
In order to reduce cost*, *dev and optimize sql statement, it will be
grandly appreciate to add custom extension. In this case, i can to be very
close of my relational database model and avoid multiple step of dev.
I discovered this link but explanantion is now deprecated
http://dag-endresen.blogspot.fr/2009/06/adding-extension-for-germplasm-to-g…
Thanks,
J.