IPT

ipt@lists.gbif.org

March 2014

2 participants
2 discussions

Re: [IPT] How to add his own/custom extension
by Julien Husson 09 Apr '14

Hi Tim, thanks for your prompt answer.

It's not really what I expected. To be more specific:

First, I have to harvest a lot of data from multiple DBs (remote or not) with, of course, different structures/models and data formats ... sometimes there is no DB at all, just flat CSV/xsl files. That's why I use an IPT to create the data mappings and to standardize the data stream into the DwC standard. (I'm talking about millions of specimens.)

Secondly, I build specific indexes of the harvested data with a custom harvester using the canadensys-harvester lib (thanks to Christian). It's at this point that things become difficult with the denormalized view of the data from the IPT DwC-A, because I need to transform this denormalized view, or raw model, into a normalized view that matches my big relational database model, which becomes the new repository.

That's why I thought that custom extensions could make my life easier.

On Fri, Mar 14, 2014 at 4:32 PM, Tim Robertson [GBIF] <trobertson(a)gbif.org> wrote:

> Hi Julien,
>
> Thanks for your question. It really depends on what you are trying to
> publish. We can add extensions of course, but without knowing the
> specifics it is difficult to comment.
>
> However, a "specimen, event, location" DB model would typically map to an
> Occurrence core with no extensions required - this is the most common use
> case of Darwin Core and the IPT. An Occurrence core is basically a
> denormalized view of the data.
>
> If I were the data manager, I would probably consider that I was
> publishing a "DwC Occurrence view" of my more complex model and as such
> would keep a view in the database along the lines of:
>
>     CREATE VIEW view_dwc_occurrence AS
>     SELECT
>       specimen.bar_code AS occurrenceID,
>       specimen.name AS scientificName,
>       location.latitude AS decimalLatitude,
>       location.longitude AS decimalLongitude,
>       event.year AS year
>     FROM
>       specimen
>       JOIN event ON ...
>       JOIN location ON ...
>     WHERE
>       <insert any conditions for record inclusion, such as non endangered etc>
>
> Then in my IPT I would simply do "SELECT * FROM view_dwc_occurrence".
> Here you are flattening the normalised model into a denormalized DwC view
> of the data.
>
> Maintaining a view in the database layer as opposed to a custom mapping
> in the IPT benefits you by:
>
>   i) catching issues early with database schema changes, since the DB will
>      likely stop you with an error
>   ii) offering an easy mapping of DB table field names to DwC terms in a
>       language I find very familiar (SQL)
>   iii) a super simple IPT mapping, as all columns will map automatically
>        in the IPT since they are DwC recognised terms already
>
> Does that help in any way? If not, could you please elaborate on the
> model and what you are trying to achieve and we'll do all we can.
>
> Thanks,
> Tim
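The database view suggested in Tim's quoted reply can be tried end to end with a small, self-contained sketch. The reply leaves the JOIN conditions and the WHERE clause elided, so everything beyond the five mapped columns is an assumption made here for illustration: the `id`, `event_id` and `location_id` keys, the `endangered` flag, the sample rows, and the use of an in-memory SQLite database instead of whatever DBMS actually backs the IPT.

```python
import sqlite3

# Hypothetical normalized schema. Only bar_code, name, latitude, longitude and
# year come from the thread; the key columns and the "endangered" flag are
# assumptions made for this sketch.
SCHEMA = """
CREATE TABLE location (id INTEGER PRIMARY KEY, latitude REAL, longitude REAL);
CREATE TABLE event (id INTEGER PRIMARY KEY, year INTEGER,
                    location_id INTEGER REFERENCES location(id));
CREATE TABLE specimen (bar_code TEXT PRIMARY KEY, name TEXT, endangered INTEGER,
                       event_id INTEGER REFERENCES event(id));

CREATE VIEW view_dwc_occurrence AS
SELECT
  specimen.bar_code  AS occurrenceID,
  specimen.name      AS scientificName,
  location.latitude  AS decimalLatitude,
  location.longitude AS decimalLongitude,
  event.year         AS year
FROM specimen
JOIN event    ON event.id = specimen.event_id      -- assumed join key
JOIN location ON location.id = event.location_id   -- assumed join key
WHERE specimen.endangered = 0;                     -- assumed inclusion rule
"""


def build_demo_db():
    """Create an in-memory database with the schema, the view and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO location VALUES (1, 45.5, -73.6)")
    conn.execute("INSERT INTO event VALUES (1, 2013, 1)")
    conn.executemany(
        "INSERT INTO specimen VALUES (?, ?, ?, ?)",
        [
            ("MT0001", "Acer saccharum", 0, 1),
            ("MT0002", "Panax quinquefolius", 1, 1),  # filtered out by the view
        ],
    )
    return conn


def fetch_occurrences(conn):
    """Run the query the IPT would use as its SQL source."""
    cur = conn.execute("SELECT * FROM view_dwc_occurrence ORDER BY occurrenceID")
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


if __name__ == "__main__":
    columns, rows = fetch_occurrences(build_demo_db())
    print(columns)
    for row in rows:
        print(row)
```

Because the view's column names are already Darwin Core terms, the result set should need no manual renaming when it is used as a source; the exact auto-mapping behaviour depends on the IPT version in use.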

How to add his own/custom extension
by Julien Husson 14 Mar '14

Hi,

I use DwC-A to feed my DB.

We know the limits of the DwC star schema when it comes to representing a relational database. For example, in the case of 1-n cardinality: specimen --- event/record --- location.

If I understand the concept, I need to use the Darwin Core Occurrence extension, denormalize my relational model into one big raw model, and then transform / re-normalize this model to match my DB model.

In order to reduce cost and dev effort and to optimize SQL statements, it would be greatly appreciated to be able to add a custom extension. In that case, I could stay very close to my relational database model and avoid multiple dev steps.

I discovered this link, but the explanation is now deprecated:

http://dag-endresen.blogspot.fr/2009/06/adding-extension-for-germplasm-to-g…

Thanks,

J.
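For readers who want to see what the "transform / re-normalize" step described in this message might look like, here is a minimal sketch. It is not the canadensys-harvester API and not Julien's actual schema: the three target tables, their key columns, the deduplication rules (a location is identified by its coordinates, an event by its year and location) and the sample rows are all assumptions chosen to mirror the specimen --- event --- location example.

```python
import sqlite3

# Hypothetical target schema for the re-normalized data (assumed, not from the thread).
SCHEMA = """
CREATE TABLE location (id INTEGER PRIMARY KEY, latitude REAL, longitude REAL,
                       UNIQUE (latitude, longitude));
CREATE TABLE event (id INTEGER PRIMARY KEY, year INTEGER,
                    location_id INTEGER REFERENCES location(id),
                    UNIQUE (year, location_id));
CREATE TABLE specimen (bar_code TEXT PRIMARY KEY, name TEXT,
                       event_id INTEGER REFERENCES event(id));
"""

# Flat rows as they might come out of a DwC-A occurrence core (sample data).
DEMO_ROWS = [
    {"occurrenceID": "MT0001", "scientificName": "Acer saccharum",
     "decimalLatitude": 45.5, "decimalLongitude": -73.6, "year": 2013},
    {"occurrenceID": "MT0002", "scientificName": "Acer rubrum",
     "decimalLatitude": 45.5, "decimalLongitude": -73.6, "year": 2013},
    {"occurrenceID": "MT0003", "scientificName": "Acer rubrum",
     "decimalLatitude": 45.5, "decimalLongitude": -73.6, "year": 2014},
]


def get_or_create(conn, table, **values):
    """Return the id of the row matching `values`, inserting it if absent.

    `table` and the column names are trusted literals from this module,
    never user input, so building the statement with an f-string is safe here.
    """
    cols = list(values)
    params = list(values.values())
    where = " AND ".join(f"{c} IS ?" for c in cols)  # IS also matches NULLs in SQLite
    row = conn.execute(f"SELECT id FROM {table} WHERE {where}", params).fetchone()
    if row:
        return row[0]
    placeholders = ", ".join("?" for _ in cols)
    cur = conn.execute(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})", params
    )
    return cur.lastrowid


def renormalize(conn, occurrences):
    """Split flat occurrence rows into location, event and specimen rows."""
    for occ in occurrences:
        location_id = get_or_create(
            conn, "location",
            latitude=occ["decimalLatitude"], longitude=occ["decimalLongitude"],
        )
        event_id = get_or_create(conn, "event", year=occ["year"], location_id=location_id)
        conn.execute(
            "INSERT OR REPLACE INTO specimen (bar_code, name, event_id) VALUES (?, ?, ?)",
            (occ["occurrenceID"], occ["scientificName"], event_id),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    renormalize(conn, DEMO_ROWS)
    for table in ("location", "event", "specimen"):
        print(table, conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0])
```

With millions of specimens, a row-by-row lookup like this would probably be too slow; loading the flat rows into a staging table and deduplicating with set-based SQL is the more likely approach, but the splitting logic stays the same.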
