Tim,
 
How will the GBIF Indexer store the original DwC record? Is it kept in a text field in the index database, or in the original DwC archive?
 
Jörg


From: ipt-bounces@lists.gbif.org [mailto:ipt-bounces@lists.gbif.org] On Behalf Of Tim Robertson (GBIF)
Sent: Wednesday, 15 September 2010 13:29
To: ipt@lists.gbif.org
Subject: Re: [IPT] Functionality request: ADMIN checking data before GBIFregistration

Thanks all for the comments, which I will try to collate here with some resolutions:

Proposed as resolved:
a) The preference would be to allow the ADMIN to provide Registration privileges to individual MANAGERS

Outstanding issues:
b) Visualisations are important and will need discussion, and potentially the development of further IPT modules or the deployment of external services.
I know of one group already considering the development of DwC-A visualisations, and technologies like Google Fusion Tables make this kind of thing trivial.

c) Record resolution has been indicated as important.  This can be achieved in one of two ways:
  > implementation in the IPT, which requires research into technologies that perform satisfactorily
     Potential technologies might be
       - Berkeley DB
       - A relational database, such as MySQL or H2
       - Lucene indexing
  > reliance on a "stable cache" for record serving

The first release (for test purposes) of the revised IPT software will not have record-level serving, but while this is being developed, I would like to ask people to start discussing what kind of record-level serving is truly a requirement in the IPT, as opposed to a "nice to have".  We support DwC-A in the GBIF portal, and the intention is simply to re-serve the record that came from the DwC-A directly, unless the record indicates there is further information at a URL (e.g. if the record identifier is an LSID).  Would this strategy not be suitable for the likes of the BioCASe portals, as often there is no further information to redirect to?  In the case of the IPT, there is no extra information, and I propose that, should the source be a DwC-A, individual records be cached in the harvesting portal.  With this approach, there would be no need for individual record serving in the IPT.
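
As a rough illustration of the strategy described above, the following Python sketch shows how a harvesting portal might decide between serving its cached copy of a record and pointing to further information. It is only a sketch under stated assumptions: the names `CachedRecord`, `further_information_url`, `serve` and the resolver address `http://lsid.example.org/` are hypothetical and are not part of the IPT or the GBIF portal.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Hypothetical resolver address, used for illustration only.
LSID_RESOLVER = "http://lsid.example.org/"


@dataclass
class CachedRecord:
    """A record as it was harvested from a DwC-A and cached by the portal."""
    record_id: str
    fields: dict


def further_information_url(record_id: str) -> Optional[str]:
    """Return a URL holding further information, or None if there is none."""
    if record_id.startswith(("http://", "https://")):
        # The identifier is itself a resolvable URL.
        return record_id
    if record_id.lower().startswith("urn:lsid:"):
        # Assumption: LSIDs are resolved through a single HTTP proxy.
        return LSID_RESOLVER + record_id
    return None


def serve(record: CachedRecord) -> Tuple[str, Union[str, dict]]:
    """Point to the source if it offers more; otherwise serve the cached copy."""
    url = further_information_url(record.record_id)
    if url is not None:
        return ("redirect", url)
    return ("cached", record.fields)
```

With this arrangement, a record whose identifier is a plain catalogue number is always answered from the portal cache, so the publishing IPT never receives a per-record request.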

Ultimately, we might consider aiming for data owners offering single records on a resolvable URL, and conforming to Linked/Open Data requirements, along with a DwC-A effectively providing a single "index" view of the dataset.  The DwC-A records would reference the originals by resolvable ID so any search system would always be able to point back to the authoritative source.  This would effectively be distributed indexing, and not dissimilar to the sharing of sitemaps, but with extra information to enable better discovery. 

Thank you all for this feedback, and please correct any misunderstandings on my part

Tim




 
  

On Sep 15, 2010, at 12:03 PM, Mihail-Constantin Carausu wrote:

Dear Tim

I think the approach mentioned by the development team is a workable
solution to cover both requirements at this stage.
However, I think Hannu (and I) had in mind a kind of "basket of
approvals" functionality in the Admin's (owner of the provider)
administration section: when a Manager has published a dataset through
the IPT, this automatically triggers a request for approval, or submits
a yes/no event, in the Admin's administration section. The Admin must
then actively intervene and approve the dataset publication (e.g. by
checking an "Approved" check box in the basket-of-approvals list of
events/datasets in the administration section) at the latest possible
stage (e.g. just before GBIF starts to index it, or something like
that). Without this final approval the dataset will still be published
and visible through the IPT, but not visible/searchable on the GBIF data
portal. This approach does not necessarily contradict the Manager's
ability to autonomously publish datasets within the IPT; it only places
this ability under the control of the central administration section
when the dataset has to go to the GBIF data portal.
I think both solutions/approaches have obvious advantages and
disadvantages, while neither of them provides 100% protection against a
(test) user publishing something odd.
I have a small question regarding the development team's proposed
solution: is it not possible for the central Admin to enable the
publishing ability for some "trusted" managers and disable it for
others inside the same instance of the IPT?
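
The "basket of approvals" workflow described above could be sketched roughly as follows. This is a minimal Python illustration, not IPT code: the class names `Dataset` and `ApprovalBasket` and the function `visible_on_portal` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    name: str
    published: bool = False          # visible through the IPT itself
    approved_for_gbif: bool = False  # may appear on the GBIF data portal


@dataclass
class ApprovalBasket:
    """Hypothetical list of approval requests shown to the Admin."""
    pending: List[Dataset] = field(default_factory=list)

    def publish(self, dataset: Dataset) -> None:
        # A Manager publishes autonomously; this queues a request for the Admin.
        dataset.published = True
        self.pending.append(dataset)

    def approve(self, dataset: Dataset) -> None:
        # The Admin ticks the "Approved" check box for this dataset.
        self.pending.remove(dataset)
        dataset.approved_for_gbif = True


def visible_on_portal(dataset: Dataset) -> bool:
    """Only published and approved datasets reach the GBIF data portal."""
    return dataset.published and dataset.approved_for_gbif
```

In this sketch a dataset is visible through the IPT as soon as `publish` is called, but `visible_on_portal` stays false until the Admin calls `approve`.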

I have just seen that Hannu's new message has arrived; sorry if our
messages have crossed, but I will send this anyhow.

Best regards,
Mihail

----------------------------
Mihail Carausu
MSc.Eng., Informatics Manager
Danish Biodiversity Information Facility (DanBIF)
--------------------------------------------


-----Original Message-----
From: ipt-bounces@lists.gbif.org [mailto:ipt-bounces@lists.gbif.org] On
Behalf Of Tim Robertson (GBIF)
Sent: 15 September 2010 09:43
To: ipt@lists.gbif.org
Subject: [IPT] Functionality request: ADMIN checking data before
GBIFregistration

Hi all,

Hannu has raised a request for the following to be satisfied by the IPT:
"- Publishing a resource must be accepted by the owner of the  
provider.  It has happened that a test user publishes something odd  
which goes all the way to the data portal without anybody controlling  
it."

This contradicts the requests of others, and specifically  
those wishing to promote basic "data hosting centers", who request  
that a data MANAGER should be able to work autonomously.

After discussion with the developers the proposal is to implement the  
following, which we hope satisfies both requirements:
In the Administration section, an ADMIN can choose to enable or  
disable the ability for MANAGERS to register resources with GBIF.  By  
default MANAGERS can register a resource, but an ADMIN can disable  
this through this check box.
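
The proposed check box could be modelled roughly as in the Python sketch below. The names `IptSettings` and `can_register` are hypothetical, and the rule that an ADMIN can always register is an assumption made for this example rather than something stated in the proposal.

```python
from dataclasses import dataclass


@dataclass
class IptSettings:
    # Mirrors the proposed check box; enabled by default, as in the proposal.
    managers_can_register: bool = True


def can_register(role: str, settings: IptSettings) -> bool:
    """Decide whether a user with the given role may register a resource with GBIF."""
    if role == "ADMIN":
        # Assumption: the ADMIN is never blocked by the check box.
        return True
    if role == "MANAGER":
        return settings.managers_can_register
    # Any other role cannot register resources.
    return False
```

With the default settings a MANAGER can register a resource; once the ADMIN clears the check box, only the ADMIN can.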

If anyone has any concerns or comments on this approach, please can  
you raise them on this list?

Many thanks,
Tim






_______________________________________________
IPT mailing list
IPT@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/ipt