[IPT] coreid (lowercase "i") vs coreId in meta.xml - schema validation

Stoner, Dan F dstoner at acis.ufl.edu
Thu Dec 12 16:53:24 UTC 2019


Fantastic, I see the updated schema and the meta.xml validation error has gone away.

Thanks!

Dan Stoner
iDigBio / ACIS Laboratory
University of Florida


________________________________________
From: IPT <ipt-bounces at lists.gbif.org> on behalf of Matthew Blissett <mblissett at gbif.org>
Sent: Thursday, December 12, 2019 11:08 AM
To: ipt at lists.gbif.org
Subject: Re: [IPT] coreid (lowercase "i") vs coreId in meta.xml - schema validation

Hi Dan,

On 12/12/2019 16:30, Stoner, Dan F wrote:
> I found some oddities and I am not exactly sure where to go next.
>
> We are noticing the following while processing meta.xml in darwin core archives produced by IPT (and other servers):
>
> Schema validation failed, continuing unvalidated
> XMLSyntaxError: Element '{http://rs.tdwg.org/dwc/text/}coreid': This element is not expected. Expected is ( {http://rs.tdwg.org/dwc/text/}coreId )

There's some background to this on this issue:
https://github.com/tdwg/dwc/issues/143

The schema itself and the documentation were conflicting, and this was
fixed (in my and Tim's opinion) the wrong way, by changing the schema.

*I've just pushed a commit to fix it the right way,* i.e. reflecting 99%
of actual usage and leaving the schema as it was for almost a decade.

Although we do accept either, we still see only 31 datasets registered
in GBIF with "coreId" rather than "coreid".
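
As a rough illustration of what accepting either spelling can look like
on the consumer side, here is a minimal Python sketch. It is not the GBIF
or IPT implementation; the function name and the sample meta.xml are
hypothetical, and it only assumes the http://rs.tdwg.org/dwc/text/
namespace shown in the error message above.

```python
import xml.etree.ElementTree as ET

# Namespace of the Darwin Core text guide, as seen in the validation error.
DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"

# Hypothetical sample descriptor, not taken from a real archive.
SAMPLE_META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
  </core>
  <extension rowType="http://rs.tdwg.org/dwc/terms/MeasurementOrFact">
    <files><location>measurementorfact.txt</location></files>
    <coreid index="0"/>
  </extension>
</archive>
"""


def core_id_spellings(meta_xml):
    """Return the local name of every coreid/coreId element, in document order."""
    root = ET.fromstring(meta_xml)
    prefix = "{" + DWC_TEXT_NS + "}"
    found = []
    for element in root.iter():
        # ElementTree stores qualified names as "{namespace}localname".
        if not element.tag.startswith(prefix):
            continue
        local = element.tag[len(prefix):]
        # Compare case-insensitively so both spellings are accepted.
        if local.lower() == "coreid":
            found.append(local)
    return found


print(core_id_spellings(SAMPLE_META))
```

Returning the spelling actually found, rather than a boolean, makes it
easy to tally how many archives use each form.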

> It seems like most consumers are not actually validating meta.xml using the schema, and the producers are generating files out of compliance with the schema.
>
> Most of the Darwin Core archives I have manually inspected and tried to validate contain meta.xml with lowercase "i" in coreid despite the Standard indicating capital "I" in coreId.
>
>
> I poked at the GBIF Darwin Core Validator 3 code repo and found this:
>
> schema.meta=https://raw.githubusercontent.com/tdwg/dwc/master/standard/documents/text/tdwg_dwc_text.xsd,http://rs.tdwg.org/dwc/text/tdwg_dwc_text.xsd
>
>
> The first link leads to 404, the second leads to an xsd that contains the proper coreId.  So maybe the Validator is not being "strict" about validation against the schema?

I suspect the validator process has been running for so long that, when
it was originally started, both URLs were valid, and the two schemas
either both had coreid or had one of each.

Cheers,

Matt


_______________________________________________
IPT mailing list
IPT at lists.gbif.org
https://lists.gbif.org/mailman/listinfo/ipt