Dear David,

Thank you very much for this thorough feedback, which will be logged as issues and addressed to improve the IPT.  We are in a testing phase for a 2.0.2 release which has minor enhancements, so I would anticipate the majority of these being fixed for the 2.0.3 release.
It is reassuring to hear that you feel the IPT was easy to use and you don't consider these serious issues.  

Thanks again,
Tim


On Apr 12, 2011, at 8:52 AM, David Remsen (GBIF) wrote:

Guys - I received this message from one of the checklist award recipients.  Fairly detailed set of IPT reviews.  Some of these are worthy of new issues.

DR

Begin forwarded message:

From: David Eades <dceades@illinois.edu>
Date: April 11, 2011 9:21:38 PM GMT+02:00
To: "'David Remsen (GBIF)'" <dremsen@gbif.org>
Subject: IPT feedback notes

Dear David,

This message is sent in response to the request for feedback about our use
of IPT.

Overall, the IPT 2.0.1-r3048 is easy to use and intuitive. It
does a good job of processing our tarball datasets and comes with a handy
interface for filling out the EML metadata.  The installation was done
independently by two persons in two countries. The following is a list of
the issues mentioned. None of the comments indicate serious problems with a
good product. This information is about three weeks old.  Some issues may
have already been fixed.

1. Updating resources from archives: We couldn't find a way to use an
updated archive dataset to replace the existing resource data. The only
available method using the IPT web interface seems to be uploading
updated versions of the TXT files one by one (using exactly the same
file names that were used when creating the resource for the first
time), and then pressing "publish". One alternative is deleting the
resource and re-creating it from the updated archive, but this option
isn't convenient for registered resources. The other alternative is to
replace the TXT files by writing to the file system directly
([data_directory]/resources/[resource_name]/sources/*.txt) and then
press "publish", but is this a supported method?

2. Unstable "main" administrator: When the IPT is set up for the first
time, there is an administrator user created and it is the one who
appears in messages like "If you don't have an account yet, please ask
your IPT administrator to create one for you.: admin_name
<mail_address>". However, if later a new user is added with (or an
existing user promoted to) an administration role and their mail address
alphabetically precedes that of the current "main" administrator, then this
new administrator starts to appear in the aforementioned messages
instead of the original one.

3. Failed processing of archives containing some empty TXT files: When an
archive is uploaded at the resource creation stage, the IPT immediately
begins processing it. However, as soon as an empty file is found, the
process stops without processing the remaining TXT files (but still
creates the resource with the files it was able to process), and
suggests running the archive through the validator (which processes
such files without complaint.) The workaround for this is to have
such files contain at least a newline character, but when the resource
is published, those files are generated completely empty once more,
and when the generated archive is submitted back to the IPT for
resource creation, it fails again (despite its generation by the IPT
itself.) We think such files should be accepted by the IPT, as they
should be considered empty (0 rows) tables. The reason some of our
datasets have empty extension files is that all of them are
generated from equally capable databases maintained by the same
software (SpeciesFile), but some of them don't have records for all
kinds of information yet (common names, for instance). In future
updates such records may begin to appear.

4. Sometimes the file sizes are incorrectly reported: When managing a
resource ( /manage/resource.do?r=resource_name ), some file sizes are
incorrectly reported. Example: "vernacular [file]  0 bytes, 1 rows, 3
columns." The file contains this one line only, terminated by a
newline character: "1";"webspinners";"English". This doesn't seem to
be really a problem, but we make the comment just in case it's
actually more than a "cosmetic" issue. (The published archive the IPT
generates still contains the example line.)

5. Validator (http://tools.gbif.org/dwca-validator/ ) and IPT out of
sync: This issue occurred when migrating from the release candidate 3
to the 2.0.1-r3048 version. Previously, the types and specimens
extension was of rowtype "http://rs.gbif.org/terms/1.0/Specimen",
however now it is no longer recognized by the IPT and
"http://rs.gbif.org/terms/1.0/TypesAndSpecimen" must be used instead
(which in turn is not recognized by the validator and this new
rowtype doesn't have the identificationRemarks term.) Also, with the
species profile extension we are using the livingPeriod term, but this
one is not recognized by the validator while it is accepted by the
IPT.

6. Explicitly set vocabularies in meta.xml are not preserved by the
IPT: For several of our columns we are using the vocabularies the IPT
comes with and we explicitly advertise that fact in our source
meta.xml (for example: <field index="6"
term="http://rs.tdwg.org/dwc/terms/taxonRank"
vocabulary="http://rs.gbif.org/vocabulary/gbif/rank.xml"/>.) However,
when the resource is published, the archive generated by the IPT
removes this information. We think that perhaps the IPT should not
remove the vocabularies from meta.xml and maybe for those vocabularies
the IPT is aware of it should either complain when a row violates the
vocabulary or else set the column value to NULL (like the automap option
of the value translation page does?)

7. When the id and a term in the core file are set to the same column
index the IPT generates a separate column with a duplicated value: Our
meta.xml defines the id and taxonID in the same way as in the example
at http://rs.tdwg.org/dwc/terms/guides/text/index.htm#implement (id
and taxonID both mapped to column 0.) However, when we publish the
resource, the IPT keeps the id at column 0, but also creates a new
column for taxonID containing the same value column 0 has. This is not
much of a problem, but in doing so it takes more space than necessary.

8. The documentation is heavily biased toward Linux (IPTServerPreparation).

No support of Windows Server environments was apparent.  We elected to
implement a standalone Linux server, rather than try to use Microsoft
interoperability tools under Windows Server.

It seems that the presumption is that the installer knows the
names and locations of the relevant files, which are often not explicitly
stated.

No step-by-step instruction is available.  The section related
to Tomcat comprises about 9 sentences.  The time it required to
research and implement the actual steps was several hours, shortened
by liberal use of a Linux-familiar collaborator.

9. No criteria for selection of the server infrastructure are
provided.  This appears to be oriented toward making a given
pre-existing web server installation adapt to the IPT.  Since in our
case, we created a server from the ground up, we selected an
Ubuntu/Tomcat approach for no better reason than we had some minimal
familiarity with them, and access to greater expertise.

10. Maintenance/version upgrade process (Starting Over) is
similarly sparsely detailed, and described in general terms.  Since
in-house expertise is oriented toward WS2008 and IIS 7, the management
and maintenance of this web application in the selected environment is
by no means obvious.  The initial setup eventually worked, but the
structure of the web server and the relationships among the programs and
files were essentially mysterious without reference material on, for
example, how web applications are managed under Tomcat.

11. Pictorial information is limited to post-installation data
management user operations.  Installation-related graphics would have
been useful.  A disclaimer emphasizing the value of Linux expertise
would prevent the mistaken assumption that this is a cookbook
installation.  This section of the documentation appears to be where
the effort was concentrated, but this was not the area of my activity
(setting up the server and site).
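
As an illustration of item 3, here is a minimal Python sketch of how the
empty data files could be spotted in an unpacked archive before it is
uploaded. The function name find_empty_data_files and the assumption
that all data files sit in one directory with a .txt extension are
illustrative only; this is not part of the IPT or the validator.

```python
from pathlib import Path


def find_empty_data_files(archive_dir):
    """Return the names of .txt files in archive_dir that hold no rows.

    A file counts as empty when it is zero bytes long or contains only
    whitespace, such as the single newline used as a workaround.
    """
    empty = []
    for path in sorted(Path(archive_dir).glob("*.txt")):
        # Reads the whole file as bytes, so the check does not depend on
        # the text encoding; acceptable for a sketch, wasteful for very
        # large files.
        if path.read_bytes().strip() == b"":
            empty.append(path.name)
    return empty
```

Calling find_empty_data_files on the directory holding the unpacked
archive returns the file names that would likely trigger the failure
described in item 3, so they can be reviewed before resource creation.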

Hope this information is useful.