Re: [IPT] How to add his own/custom extension

14 Mar 2014

      Hi Tim, thanks for your prompt answer.

It's not really what i expected, to be more specific >

First, i have to harvest lot of data from multiple DB (remote or not) with,
of course, different structures/models and format of datas ... sometimes no
DB just flat CSV/xsl files...
That's why i use an IPT to create data mapping and to standardize the data
stream into Dwc standard.
(I'm talking about millions of specimen)

Secondly, i build specific indexes of harvested data with a custom
harvester using canadensys-harvester lib. ( Thanks to Christian). It's at
this point that it's begin to be difficult with a denormalized view of the
data from IPT Dwc-A.
 'Cause i need to transform this denormalized view or raw model into a
normalize view that match with my big relational database model which
become the new repository.

That's why i thought that the custom extensions could be make my life
easier.

On Fri, Mar 14, 2014 at 5:12 PM, Julien Husson <biology.info@gmail.com>wrote:
...
Hi Tim, thanks for your prompt answer.
It's not really what i expected, to be more specific >
First, i have to harvest lot of data from multiple DB (remote or not)
with, of course, different structures/models and format of datas ...
sometimes no DB just flat CSV/xsl files...
That's why i use an IPT to create data mapping and to standardize the data
stream into Dwc standard.
(I'm talking about millions of specimen)
Secondly, i build specific indexes of harvested data with a custom
harvester using canadensys-harvester lib. ( Thanks to Christian). It's at
this point that it's begin to be difficult with a denormalized view of the
data from IPT Dwc-A.
 'Cause i need to transform this denormalized view or raw model into a
normalize view that match with my big relational database model which
become the new repository.
That's why i thought that the custom extensions could be make my life
easier.
On Fri, Mar 14, 2014 at 4:32 PM, Tim Robertson [GBIF] <trobertson@gbif.org
...
wrote:
...
Hi Julien,
Thanks for your question.  It really depends on what you are trying to
publish.  We can add extensions of course, but without knowing the
specifics it is difficult to comment.
However, a "specimen, event, location" DB model would typically map to an
Occurrence core with no extensions required - this is the most common use
case of Darwin Core and the IPT.  An Occurrence core is basically a
denormalized view of the data.
If I were the data manager, I would probably consider that I was
publishing a "DwC Occurrence view" of my more complex model and as such
would keep a view in the database along the lines of:
CREATE VIEW view_dwc_occurrence AS
SELECT
  specimen.bar_code AS occurrenceID,
  specimen.name AS scientificName,
  location.latitude AS decimalLatitude,
  location.longitude AS decimalLongitude,
  event.year AS year
FROM
  specimen
  JOIN event ON ...
  JOIN location ON ...
WHERE
  <insert any conditions for record inclusion, such as non endangered etc>
Then in my IPT I would simply do "SELECT * FROM view_dwc_occurrence".
 Here you are flattening the normalised model into a denormalized DwC view
of the data.
Maintaining a view in the database layer as opposed to a custom mapping
in the IPT benefits you by:
  i) catching issues early with database schema changes since the DB will
likely stop you with an error
  ii) offering an easy mapping of DB table field names, to DwC terms in a
language I find very familiar (SQL)
  iii) a super simple IPT mapping, as all columns will map automatically
in the IPT since they are DwC recognised terms already
Does that help in any way?  If not, could you please elaborate on the
model and what you are trying to achieve and we'll do all we can.
Thanks,
Tim
On 14 Mar 2014, at 16:06, Julien Husson <biology.info@gmail.com> wrote:
Hi,
I use Dwc-A to feed my BD.
We know the limits of the Dwc star schema to represent a relationnal
database.
For example in the case of 1-n cardinality :  specimen --- event/record
--- location.
If i understand the concept, I need to use the Darwin *Core* *Occurrence*
*extension*, denormalize my relational model in a big raw model and
transform / re-normalize this model to match with my DB model.
In order to reduce cost*, *dev and optimize sql statement, it will be
grandly appreciate to add custom extension. In this case, i can to be
very close of my relational database model and avoid multiple step of dev.
I discovered this link but explanantion is now deprecated
http://dag-endresen.blogspot.fr/2009/06/adding-extension-for-germplasm-to-gb...
Thanks,
J.
_______________________________________________
IPT mailing list
IPT@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/ipt