I saw something similar recently when I upgraded an IPT instance from 2.0 to 2.3.2. The old instance had resources that used an old Audubon Core media extension. The issue presented the same way as yours: an apparent failure at "the core ID field occurrenceID", with a NullPointerException behind it. The two versions of the Audubon Core media extension apparently pointed to different columns as the core ID, and the mapping shown in the UI did not appear to correspond to the mapping that was actually being used.
This is not the same as what you are seeing (I got a NullPointerException and no stack trace, instead of an IOException and a stack trace), but the initial presentation, an apparent failure while validating core ID values, was similar.
Here are some of my notes on the issue:
"Process creates temporary file occurrence.txt, then multimedia.txt, then shows publishing log messages about checking for IDs, then generates a sorted_occurrence.txt file, which ends up at the same size as occurrence.txt. The eml file and metadata are written to the temporary directory, and then the process fails with a message that mentions a NullPointerException, but doesn't give a source. Tomcat log files don't appear to contain the exception or a stack trace, though Tomcat does appear to be configured to log at that level."
"Increased logging through debug mode in IPT is not informative. Sort of occurrence.txt appears to succeed, then exception is thrown some time around generation of eml file, with no message in the log files."
"Cloned IPT source from GitHub, checked out the commit for the current 2.3.2 release, and got the release to build. (Had to edit the pom for com.lowagie:itext:4.2.2 for Maven to build; that pom includes a redirect to a newer source for itext, which doesn't work as rtf generation has been removed...) Once the build was working, located the source of the minimal human-readable failure message (ResourceManagerImpl.java line 750, "dwca.failed"). Added a log4j Logger to ResourceManagerImpl, then added log.error(e.getMessage(),e); to the catch (ExecutionException e) block at about line 745. Then tried to republish the main source. This threw the NullPointerException, now with an informative stack trace that pointed to the mapping of occurrenceID in the Audubon Core extension being incorrect (there may be a bug in the update of the extension). Deleted and remapped the Audubon Core mapping from the [failing] resources; publication of each was then successful. The list of installed extensions shows two installed Audubon Core media extensions [with the same name]."
"Audubon Core media extension update appeared to have the same name, but wasn't compatible with the installed version (it doesn't flatten the URIs of different types to allow mapping of both a thumbnail and a best quality image). Unmapped the older one from both resources [that were using it], deleted it, and remapped both resources onto the current Audubon Core extension."
-Paul
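
P.S. In case it helps anyone making the same logging change, here is a minimal sketch of the pattern. It is not the IPT code: the class name PublishLoggingSketch and the method runAndLog are made up for illustration, and java.util.logging stands in for log4j so that the example has no dependencies. I am also assuming the failing work runs on a worker thread, since ExecutionException is what Future.get() throws when a submitted task fails. The point is that passing the exception object to the logger, and not only its message, is what prints the "Caused by" stack trace.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in class; this is NOT the IPT's ResourceManagerImpl.
final class PublishLoggingSketch {

    // java.util.logging is an assumption made to keep this self-contained;
    // the change described in my notes used a log4j Logger.
    private static final Logger LOG =
            Logger.getLogger(PublishLoggingSketch.class.getName());

    /** Runs the task on a worker thread; returns the failure's cause, or null on success. */
    static Throwable runAndLog(Callable<Void> task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Void> future = executor.submit(task);
            future.get(); // a failure inside the task surfaces here, wrapped
            return null;
        } catch (ExecutionException e) {
            // Logging e itself (not just e.getMessage()) prints the full
            // stack trace, including the wrapped cause.
            LOG.log(Level.SEVERE, e.getMessage(), e);
            return e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return e;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        Throwable cause = runAndLog(() -> {
            String unmappedColumn = null; // simulates a missing core ID mapping
            unmappedColumn.length();      // throws NullPointerException
            return null;
        });
        System.out.println("Root cause: " + cause.getClass().getName());
    }
}
```

With only the message logged, a NullPointerException typically leaves nothing useful behind, which matches what I saw before the change.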
On Wed, 2 Mar 2016 15:59:39 +0100 Peter Desmet peter.desmet@inbo.be wrote:
Hi,
We're trying to publish version 45.4 of this dataset: http://data.inbo.be/ipt/resource?r=florabank1-occurrences, but the validation seems to fail. Here's the publication log: http://data.inbo.be/ipt/publicationlog.do?r=florabank1-occurrences
As far as I can tell, the validation fails on "the core ID field occurrenceID is always present and unique", but we have verified this in the generated dwca-45.4.zip file, and all records have a unique occurrenceID.
Any idea what might be going on? Possible causes:
- The dataset is quite big (3.5 million records)
- We've just solved this issue: https://github.com/LifeWatchINBO/data-publication/issues/104 by following Kyle Braak's instructions. The latest published version is now 45.3, the current (to be published) version is 45.4, so everything seems fine there.
- Even though the publication failed, the following files are created in /resource/florabank1-occurrences:
eml-45.4.xml dwca-45.4.zip florabank1-occurrences-45.4.rtf
Will those files be overwritten if I try to republish, or might they be causing the publication to fail?
Thanks,
Peter
_______________________________________________
IPT mailing list
IPT@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/ipt