Hi Julien,
If I understand correctly, you have a data flow/process with:
a) multiple input sources:
- local databases of various schemas
- remote databases of various schemas
- CSV files of various flavours
b) you use an IPT as a staging area, and map the input into DwC
c) you then harvest datasets from the IPT into a normalised database model
And you’re finding that the intermediary step in b) is causing issues due to the limited nature of the format (star schema and only a few extensions supported)?
Can I ask, please: are there other reasons you use the IPT? For example, are you publishing to Canadensys or GBIF? Or perhaps you use it for easier metadata authoring.
Some things perhaps worth considering:
- Is the IPT the best tool for this? Defining an intermediate format and scripting the transformation with SQL exports might be a better model. We have used Scriptella (http://scriptella.javaforge.com/) for several data migration projects and found it very good for transforming data across schemas.
- If you really do want to create extensions, can you please elaborate on what they would be? Extensions are generally intended to be reusable by a wider community (e.g. standardised publishing of images), and without knowing more it is hard to say whether extensions are a good way to achieve what you want, or whether you are really just looking for general-purpose “extract, transform, load” (ETL) tools, of which there are many on the web, such as Scriptella.
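The "intermediate format plus SQL exports" idea can be sketched as follows. This is a minimal illustration in Python using the standard-library sqlite3 module, not Scriptella (whose own configuration format is not shown here). The source table `specimens`, its columns, the `herb:`/`csv:` identifier prefixes and the single intermediate table are all hypothetical; only the column names occurrenceID, scientificName and eventDate are real Darwin Core terms. For the demo, the "local database" lives in the same in-memory connection as the intermediate table.

```python
import csv
import io
import sqlite3

# Hypothetical intermediate format: one table whose columns are Darwin Core terms.
INTERMEDIATE_DDL = """
CREATE TABLE IF NOT EXISTS dwc_occurrence (
    occurrenceID   TEXT PRIMARY KEY,
    scientificName TEXT,
    eventDate      TEXT
)
"""

# One SQL export per source schema; each renames local columns to DwC terms.
# The source table and column names here are hypothetical examples.
SOURCE_EXPORTS = {
    "herbarium_db": """
        SELECT 'herb:' || spec_no AS occurrenceID,
               sci_name           AS scientificName,
               collected_on       AS eventDate
        FROM specimens
    """,
}


def run_sql_export(conn, source_name):
    """Run a source's SQL export and append the result to the intermediate table."""
    conn.execute(
        "INSERT INTO dwc_occurrence (occurrenceID, scientificName, eventDate) "
        + SOURCE_EXPORTS[source_name]
    )


def load_csv_source(conn, text, column_map, id_prefix):
    """Map a flat CSV source into the intermediate table using a column map."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [
        (
            id_prefix + row[column_map["occurrenceID"]],
            row[column_map["scientificName"]],
            row[column_map["eventDate"]],
        )
        for row in reader
    ]
    conn.executemany("INSERT INTO dwc_occurrence VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute(INTERMEDIATE_DDL)

# Stand-in for a local source database with its own schema.
conn.execute("CREATE TABLE specimens (spec_no TEXT, sci_name TEXT, collected_on TEXT)")
conn.execute("INSERT INTO specimens VALUES ('1', 'Acer rubrum', '2010-05-01')")
run_sql_export(conn, "herbarium_db")

# Stand-in for a flat CSV source with different column names.
csv_text = "id,taxon,date\n7,Picea glauca,2011-06-02\n"
load_csv_source(
    conn,
    csv_text,
    {"occurrenceID": "id", "scientificName": "taxon", "eventDate": "date"},
    "csv:",
)

print(conn.execute("SELECT COUNT(*) FROM dwc_occurrence").fetchone()[0])  # 2
```

Each new source only needs its own export query or column map; everything downstream reads the one intermediate table.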
I hope these comments come across as constructive and not a nuisance.
Cheers,
Tim
Hi Tim, thanks for your prompt answer.
That's not quite what I expected; to be more specific:
First, I have to harvest a lot of data from multiple databases (remote or not), with, of course, different structures/models and data formats... Sometimes there is no database at all, just flat CSV/XLS files.
That's why I use an IPT to create the data mappings and to standardize the data stream into the DwC standard (I'm talking about millions of specimens).
Secondly, I build specific indexes of the harvested data with a custom harvester using the canadensys-harvester library (thanks to Christian). It is at this point that things begin to get difficult, because the IPT DwC-A gives me a denormalized view of the data.
I need to transform this denormalized view (or raw model) into a normalized view that matches my large relational database model, which becomes the new repository.
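The normalization step can be sketched as moving values that repeat on every flat DwC record into their own table and replacing them with a foreign key. This is a minimal Python/sqlite3 sketch, assuming records already parsed from the DwC-A core into dicts keyed by DwC term (it does not use the canadensys-harvester API) and a hypothetical target model with only `taxon` and `occurrence` tables; a real repository model would have many more entities (localities, collectors, collections...).

```python
import sqlite3

# Hypothetical normalized target model: each taxon is stored once and referenced by id.
SCHEMA = """
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL UNIQUE,
    family          TEXT
);
CREATE TABLE occurrence (
    occurrence_id TEXT PRIMARY KEY,
    taxon_id      INTEGER NOT NULL REFERENCES taxon(taxon_id),
    event_date    TEXT
);
"""


def get_or_create_taxon(conn, scientific_name, family):
    """Return the id of an existing taxon row, inserting it the first time it is seen."""
    row = conn.execute(
        "SELECT taxon_id FROM taxon WHERE scientific_name = ?", (scientific_name,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO taxon (scientific_name, family) VALUES (?, ?)",
        (scientific_name, family),
    )
    return cur.lastrowid


def normalize(conn, dwc_rows):
    """Split flat DwC occurrence records into taxon and occurrence rows."""
    for rec in dwc_rows:
        taxon_id = get_or_create_taxon(conn, rec["scientificName"], rec.get("family"))
        conn.execute(
            "INSERT INTO occurrence (occurrence_id, taxon_id, event_date) VALUES (?, ?, ?)",
            (rec["occurrenceID"], taxon_id, rec.get("eventDate")),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Denormalized input: the taxon fields are repeated on every occurrence record.
flat = [
    {"occurrenceID": "a1", "scientificName": "Acer rubrum", "family": "Sapindaceae", "eventDate": "2010-05-01"},
    {"occurrenceID": "a2", "scientificName": "Acer rubrum", "family": "Sapindaceae", "eventDate": "2010-06-11"},
    {"occurrenceID": "p1", "scientificName": "Picea glauca", "family": "Pinaceae", "eventDate": "2011-06-02"},
]
normalize(conn, flat)

print(conn.execute("SELECT COUNT(*) FROM taxon").fetchone()[0])       # 2
print(conn.execute("SELECT COUNT(*) FROM occurrence").fetchone()[0])  # 3
```

The lookup-before-insert keeps each taxon once; with millions of specimens, caching name-to-id in memory would avoid one SELECT per record.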
That's why I thought that custom extensions could make my life easier.
_______________________________________________
IPT mailing list
IPT@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/ipt