<div dir="ltr"><div><div><div>Hi Tim, thanks for your prompt answer.<br><br></div>It's not really what I expected. To be more specific:<br><br></div>First, I have to harvest a lot of data from multiple databases (remote or not) with, of course, different structures, models and data formats. Sometimes there is no database at all, just flat CSV/XLS files.<br>
That's why I use an IPT to create data mappings and to standardize the data stream into the DwC standard.<br>(<span lang="en"><span>I'm talking about</span> <span>millions of</span> <span>specimens)</span></span><br>
<br></div><div>Secondly, I build specific indexes of the harvested data with a custom harvester using the canadensys-harvester lib (thanks to Christian). This is the point where it becomes difficult to work with the denormalized view of the data from the IPT DwC-A.<br>
</div><div>That is because I need to transform this denormalized view, or raw model, into a normalized view that matches my big relational database model, which becomes the new repository.<br><br></div>That's why I thought that custom extensions could make my life easier.</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Mar 14, 2014 at 5:12 PM, Julien Husson <span dir="ltr"><<a href="mailto:biology.info@gmail.com" target="_blank">biology.info@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">
<div><div><div><br><br></div></div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Mar 14, 2014 at 4:32 PM, Tim Robertson [GBIF] <span dir="ltr"><<a href="mailto:trobertson@gbif.org" target="_blank">trobertson@gbif.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Hi Julien,<div><br></div><div>Thanks for your question. It really depends on what you are trying to publish. We can add extensions of course, but without knowing the specifics it is difficult to comment.</div>
<div><br></div><div>However, a “specimen, event, location” DB model would typically map to an Occurrence core with no extensions required - this is the most common use case of Darwin Core and the IPT. An Occurrence core is basically a denormalized view of the data.</div>
<div><br></div><div>If I were the data manager, I would probably consider that I was publishing a “DwC Occurrence view” of my more complex model and as such would keep a view in the database along the lines of:</div><div>
<br></div><div>CREATE VIEW view_dwc_occurrence AS</div><div>SELECT</div><div>  specimen.bar_code AS occurrenceID,</div><div>  specimen.name AS scientificName,</div><div>
location.latitude AS decimalLatitude,</div>
<div> location.longitude AS decimalLongitude,</div><div> event.year AS year</div><div>FROM </div><div> specimen </div><div> JOIN event ON … </div><div> JOIN location ON …</div><div>WHERE</div><div> <insert any conditions for record inclusion, such as non endangered etc></div>
<div><br></div><div>Then in my IPT I would simply do “SELECT * FROM view_dwc_occurrence”. Here you are flattening the normalised model into a denormalized DwC view of the data.</div><div><br></div><div>Maintaining a view in the database layer as opposed to a custom mapping in the IPT benefits you by:</div>
<div> i) catching issues early with database schema changes since the DB will likely stop you with an error</div><div> ii) offering an easy mapping of DB table field names, to DwC terms in a language I find very familiar (SQL)</div>
<div> iii) a super simple IPT mapping, as all columns will map automatically in the IPT since they are DwC recognised terms already</div><div><br></div><div>Does that help in any way? If not, could you please elaborate on the model and what you are trying to achieve and we’ll do all we can.</div>
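<div><br></div><div>As a minimal, runnable sketch of the view approach above: the snippet below builds a toy “specimen, event, location” schema in an in-memory SQLite database and creates the view on top of it. The join keys (event_id, location_id), the endangered flag and the sample rows are assumptions made up for illustration only; a real schema will differ.</div>

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: the key columns and the "endangered" flag are
# assumptions for illustration, not part of any real model.
conn.executescript("""
CREATE TABLE location (
    id INTEGER PRIMARY KEY,
    latitude REAL,
    longitude REAL
);
CREATE TABLE event (
    id INTEGER PRIMARY KEY,
    location_id INTEGER REFERENCES location(id),
    year INTEGER
);
CREATE TABLE specimen (
    bar_code TEXT PRIMARY KEY,
    name TEXT,
    event_id INTEGER REFERENCES event(id),
    endangered INTEGER
);

-- Made-up sample rows.
INSERT INTO location VALUES (1, 45.5, -73.6);
INSERT INTO event VALUES (1, 1, 2013);
INSERT INTO specimen VALUES ('MT0001', 'Acer saccharum', 1, 0);
INSERT INTO specimen VALUES ('MT0002', 'Acer rubrum', 1, 0);
INSERT INTO specimen VALUES ('MT0003', 'Panax quinquefolius', 1, 1);

-- The denormalized DwC view: column aliases are Darwin Core terms.
CREATE VIEW view_dwc_occurrence AS
SELECT
    specimen.bar_code AS occurrenceID,
    specimen.name AS scientificName,
    location.latitude AS decimalLatitude,
    location.longitude AS decimalLongitude,
    event.year AS year
FROM specimen
    JOIN event ON event.id = specimen.event_id
    JOIN location ON location.id = event.location_id
WHERE specimen.endangered = 0;
""")

# This is the only query the publishing side would need to run.
cursor = conn.execute("SELECT * FROM view_dwc_occurrence ORDER BY occurrenceID")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

<div>Because the column aliases are already DwC terms, the result set needs no further renaming, and the row flagged as endangered is filtered out by the view itself.</div>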
<div><br></div><div>Thanks,</div><div>Tim</div><div><br></div><div> </div><div><br><div><div><div><div>On 14 Mar 2014, at 16:06, Julien Husson <<a href="mailto:biology.info@gmail.com" target="_blank">biology.info@gmail.com</a>> wrote:</div>
<br></div></div><blockquote type="cite"><div><div><div dir="ltr"><div><div><div>Hi,<br><br></div><div>I use DwC-A to feed my DB.<br></div><div><br></div>We know the limits of the DwC star schema when it comes to representing a relational database.<br>
<br><span lang="en">For example, in the case of 1-n cardinality: specimen --- event/record --- location.<br>
</span><span lang="en">If I understand the concept, I need to use </span><span lang="en">the Darwin <b>Core</b> <b>Occurrence</b> <b>extension</b>, denormalize my relational model into one big raw model, and then transform / re-normalize this model to match my DB model</span>.<br>
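<div><br></div><div>To make the re-normalization step concrete, here is a rough sketch under stated assumptions: the table layout, the uniqueness rules (a location is identified by its coordinates, an event by its location and year) and the sample rows are all hypothetical, and get_or_create is an illustrative helper, not part of any library.</div>

```python
import sqlite3

# Made-up denormalized occurrence rows, keyed by Darwin Core terms.
occurrences = [
    {"occurrenceID": "MT0001", "scientificName": "Acer saccharum",
     "year": 2013, "decimalLatitude": 45.5, "decimalLongitude": -73.6},
    {"occurrenceID": "MT0002", "scientificName": "Acer rubrum",
     "year": 2013, "decimalLatitude": 45.5, "decimalLongitude": -73.6},
    {"occurrenceID": "MT0003", "scientificName": "Betula papyrifera",
     "year": 2014, "decimalLatitude": 45.5, "decimalLongitude": -73.6},
]

conn = sqlite3.connect(":memory:")

# Hypothetical normalized target model: specimen -> event -> location.
conn.executescript("""
CREATE TABLE location (
    id INTEGER PRIMARY KEY,
    latitude REAL,
    longitude REAL,
    UNIQUE (latitude, longitude)
);
CREATE TABLE event (
    id INTEGER PRIMARY KEY,
    location_id INTEGER NOT NULL REFERENCES location(id),
    year INTEGER,
    UNIQUE (location_id, year)
);
CREATE TABLE specimen (
    bar_code TEXT PRIMARY KEY,
    name TEXT,
    event_id INTEGER NOT NULL REFERENCES event(id)
);
""")


def get_or_create(conn, table, key_columns, values):
    """Return the id of the row matching the key, inserting it if absent.

    Illustrative helper: table and column names must come from trusted
    code, and NULL key values are not handled (SQL '=' never matches NULL).
    """
    where = " AND ".join(f"{column} = ?" for column in key_columns)
    row = conn.execute(f"SELECT id FROM {table} WHERE {where}", values).fetchone()
    if row is not None:
        return row[0]
    column_list = ", ".join(key_columns)
    placeholders = ", ".join("?" for _ in key_columns)
    cursor = conn.execute(
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})", values
    )
    return cursor.lastrowid


# Split each flat row back into its parent entities, reusing parents
# that already exist so the 1-n relationships are restored.
for occ in occurrences:
    location_id = get_or_create(
        conn, "location", ["latitude", "longitude"],
        (occ["decimalLatitude"], occ["decimalLongitude"]),
    )
    event_id = get_or_create(
        conn, "event", ["location_id", "year"], (location_id, occ["year"])
    )
    conn.execute(
        "INSERT INTO specimen (bar_code, name, event_id) VALUES (?, ?, ?)",
        (occ["occurrenceID"], occ["scientificName"], event_id),
    )
conn.commit()

counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("location", "event", "specimen")
}
print(counts)
```

<div>With these three sample rows, the shared coordinates collapse into a single location and the two distinct years into two events. For millions of records, a per-row lookup like this would likely be too slow, and a set-based approach (bulk load into a staging table, then INSERT ... SELECT DISTINCT) would be the more realistic choice.</div>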
<br><span lang="en">To reduce cost and development effort and to optimize SQL statements, it would be greatly appreciated if custom extensions could be added. That way, I could stay very close to my relational database model and avoid multiple development steps.<br>
</span></div></div><div><div><br>I found this link, but the explanation there is now outdated:<br>
<a href="http://dag-endresen.blogspot.fr/2009/06/adding-extension-for-germplasm-to-gbif.html" target="_blank">http://dag-endresen.blogspot.fr/2009/06/adding-extension-for-germplasm-to-gbif.html</a><br><br></div><div>Thanks,<br>
<br></div>
<div>J.<br></div></div></div></div></div>
_______________________________________________<br>IPT mailing list<br><a href="mailto:IPT@lists.gbif.org" target="_blank">IPT@lists.gbif.org</a><br><a href="http://lists.gbif.org/mailman/listinfo/ipt" target="_blank">http://lists.gbif.org/mailman/listinfo/ipt</a><br>
</blockquote></div><br></div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>