star schema functionality request
Markus, Tim, One concern for EOL's use of the IPT is that the star schema cannot attach an extension to another extension; every extension must link directly to the main DwC record. The denormalized workaround for this doesn't look good, and providers will not like it.
Another concern is the size of the recordset if we have to denormalize, repeating the main data object many times to accommodate the many references, agents, and audience types associated with it.
We will continue testing the IPT for EOL while trying to minimize the overhead of the denormalized approach, but it would really help to have functionality that makes the star schema more relational.
Thanks. Eli
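To make the size concern concrete, here is a minimal sketch of how row counts could grow under denormalization. Everything in it is an assumption for illustration: the field names, the sample values, and the choice of a full cross-product flattening (a worst case) are hypothetical and are not taken from the IPT or the EOL extension.

```python
from itertools import product

# Hypothetical data object with repeating child values (not real EOL data).
data_object = {
    "id": "obj-1",
    "title": "Species description",
    "references": ["ref-A", "ref-B", "ref-C"],
    "agents": ["agent-1", "agent-2"],
    "audiences": ["general public", "expert users"],
}

def denormalized_rows(obj):
    """Flatten into one table: one row per (reference, agent, audience) combination.

    Assumption: a worst-case cross product; the object's own fields are
    repeated on every row.
    """
    return [
        {"id": obj["id"], "title": obj["title"],
         "reference": ref, "agent": agent, "audience": audience}
        for ref, agent, audience in product(
            obj["references"], obj["agents"], obj["audiences"])
    ]

def normalized_rows(obj):
    """Keep one row for the object and one small row per child value."""
    return {
        "data_object": [{"id": obj["id"], "title": obj["title"]}],
        "reference": [{"object_id": obj["id"], "reference": r}
                      for r in obj["references"]],
        "agent": [{"object_id": obj["id"], "agent": a}
                  for a in obj["agents"]],
        "audience": [{"object_id": obj["id"], "audience": a}
                     for a in obj["audiences"]],
    }

flat = denormalized_rows(data_object)
tables = normalized_rows(data_object)

flat_count = len(flat)                                          # 3 * 2 * 2 rows
normalized_count = sum(len(rows) for rows in tables.values())   # 1 + 3 + 2 + 2 rows
title_copies_flat = sum(1 for row in flat if "title" in row)

print(flat_count, normalized_count, title_copies_flat)
```

In this sketch the flat table repeats the title on every row, while the normalized layout stores it once; the gap widens as the number of child values per object grows.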
Hi Eli,
A star schema for data transfer is indeed limited in scope, but it offers a lot more than is possible with a flat file, and it supports many use cases we face in the GBIF network for serving occurrence-oriented and taxon/species-oriented data. Because of the complexity of designing the user interfaces and output models needed for anything more relational, I don't foresee the IPT handling a more relational model than the star in the *near* future. Are you thinking of developing this functionality yourself using the IPT as a base platform? We can branch the source head if you produce a patch for this.
If you are looking to serve very normalised data, asking your providers to support a standard exchange format is probably the best option, and I believe that is what EOL does now. You might also evaluate whether TapirLink supports a more normalised format, which would give you a tool that providers can install and map to your output model.
Best wishes,
Tim
IPT mailing list IPT@lists.gbif.org http://lists.gbif.org/mailman/listinfo/ipt
____________________________________________________________ Tim Robertson Systems Architect Global Biodiversity Information Facility Secretariat (GBIF) Universitetsparken 15, 2100 Copenhagen Ø, Denmark http://www.gbif.org trobertson@gbif.org Phone: +45 35 32 14 87 (Office) Fax: +45 35 32 14 80 ____________________________________________________________
Eli, if you are certain that you will need to reproduce complex and highly normalised data, you might want to consider plain RDF. There are tools like D2RQ that map a relational database to RDF, but they have their own problems:
http://www4.wiwiss.fu-berlin.de/bizer/d2rq/
Markus
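As a rough illustration of the plain RDF suggestion, the sketch below writes one data object and its related items as N-Triples using only the Python standard library. It does not use D2RQ. The `http://example.org/` namespace, the property names (`title`, `hasReference`, `hasAgent`), and the sample record are all hypothetical and do not come from any EOL or GBIF vocabulary.

```python
# Hypothetical namespace for illustration only.
BASE = "http://example.org/"

def uri(local_name):
    """Wrap a local name as an N-Triples IRI in the hypothetical namespace."""
    return f"<{BASE}{local_name}>"

def literal(text):
    """Format a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_triples(obj):
    """Emit one triple per fact; the object's own fields are stated only once."""
    subject = uri(obj["id"])
    triples = [(subject, uri("title"), literal(obj["title"]))]
    for ref in obj["references"]:
        triples.append((subject, uri("hasReference"), uri(ref)))
    for agent in obj["agents"]:
        triples.append((subject, uri("hasAgent"), uri(agent)))
    return triples

# Hypothetical record, not real EOL data.
data_object = {
    "id": "obj-1",
    "title": "Species description",
    "references": ["ref-A", "ref-B"],
    "agents": ["agent-1"],
}

triples = to_triples(data_object)
ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```

Because each related item is its own triple, a reference or agent could in turn carry further triples of its own, which is the kind of nesting the star schema does not allow.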
Hi Tim, Markus, Thanks for all your suggestions. We will try to work out the best path for EOL.
Anyway, we've finished creating the EOL data object extension for the IPT. We kept working on it because the IPT can definitely still be used by the many EOL providers who have good technical support and are willing to go through the IPT process.
We can send you the extension today. Thanks. Eli
----- Original Message ----- From: "Tim Robertson (GBIF)" trobertson@gbif.org To: "Eli Agbayani" eagbayani@eol.org Cc: ipt@lists.gbif.org, "Patrick Leary" pleary@eol.org Sent: Sunday, May 10, 2009 5:16:03 AM (GMT-0500) America/New_York Subject: Re: [IPT] star schema functionality request
participants (3)
- Markus Döring (GBIF)
- Eli Agbayani
- Tim Robertson (GBIF)