Dear David,
Thank you very much for this thorough feedback, which will be logged as issues and addressed to improve the IPT. We are in a testing phase for a 2.0.2 release which has minor enhancements, so I would anticipate the majority of these being fixed for the 2.0.3 release. It is reassuring to hear that you feel the IPT was easy to use and you don't consider these serious issues.
Thanks again, Tim
On Apr 12, 2011, at 8:52 AM, David Remsen (GBIF) wrote:
Guys - I received this message from one of the checklist award recipients. It is a fairly detailed set of IPT reviews. Some of these are surely worthy of new issues.
DR
Begin forwarded message:
From: David Eades dceades@illinois.edu
Date: April 11, 2011 9:21:38 PM GMT+02:00
To: "'David Remsen (GBIF)'" dremsen@gbif.org
Subject: IPT feedback notes
Reply-To: dceades@illinois.edu
Dear David,
This message is sent in response to the request for feedback about our use of IPT.
Overall, the IPT 2.0.1-r3048 has an easy-to-use and intuitive interface. It does a good job of processing our tarball datasets and comes with a handy interface for filling out the EML metadata. The installation was done independently by two people in two countries. The following is a list of the issues mentioned. None of the comments indicate serious problems with a good product. This information is about three weeks old. Some issues may have already been fixed.
- Updating resources from archives: We couldn't find a way in which an updated archive dataset can be used to replace the existing resource data. The only available method using the IPT web interface seems to be uploading updated versions of the TXT files one by one (using exactly the same file names that were used when creating the resource for the first time), and then pressing "publish". One alternative is deleting the resource and re-creating it from the updated archive, but this option isn't convenient for registered resources. The other alternative is to just replace the TXT files by writing to the file system directly ([data_directory]/resources/[resource_name]/sources/ *.txt) and then press "publish", but is this a supported method?
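The direct file-system alternative could be sketched roughly as below. This is only an illustration of the idea, assuming the directory layout quoted above; the function name `replace_sources` and its parameters are hypothetical, and whether the IPT supports this approach is exactly the open question.

```python
import shutil
from pathlib import Path


def replace_sources(updated_dir, data_dir, resource_name):
    """Copy updated .txt files over the same-named source files of a resource.

    Assumes the layout [data_directory]/resources/[resource_name]/sources/.
    Returns the names of the files that were overwritten.
    """
    sources = Path(data_dir) / "resources" / resource_name / "sources"
    replaced = []
    for new_file in sorted(Path(updated_dir).glob("*.txt")):
        target = sources / new_file.name
        # Only overwrite files the resource already has, so file names stay
        # exactly as they were when the resource was first created.
        if target.exists():
            shutil.copyfile(new_file, target)
            replaced.append(new_file.name)
    return replaced
```

After running something like this, "publish" would still have to be pressed in the web interface.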
- Unstable "main" administrator: When the IPT is set up for the first time, an administrator user is created, and this is the one who appears in messages like "If you don't have an account yet, please ask your IPT administrator to create one for you.: admin_name <mail_address>". However, if a new user is later added with (or an existing user promoted to) an administration role and its mail address alphabetically precedes that of the current "main" administrator, then this new administrator starts to appear in the aforementioned messages instead of the original one.
- Failed processing of archives containing some empty TXT files: When an archive is uploaded at the resource creation stage, the IPT immediately begins processing it. However, as soon as an empty file is found, the process stops without finishing the remaining TXT files (but creates the resource with the files it was able to process), and suggests running the archive through the validator (which processes such files without complaint). The workaround for this is to have such files contain at least a newline character, but when the resource is published, those files are generated completely empty once more, and when the generated archive is submitted back to the IPT for resource creation, it fails again (despite having been generated by the IPT itself). We think such files should be accepted by the IPT, as they should be considered empty (0 rows) tables. The reason some of our datasets have empty extension files is that all of them are generated from equally capable databases, all maintained by the same software (SpeciesFile), but some of them don't have records for all kinds of information yet (like common names, for instance). However, in future updates such records may begin to appear.
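The newline workaround mentioned above can be applied to a directory of extracted TXT files before it is repackaged. The following is a minimal sketch, not part of the IPT; the function name `pad_empty_files` is hypothetical.

```python
from pathlib import Path


def pad_empty_files(directory):
    """Write a single newline into every zero-byte .txt file.

    Returns the names of the files that were padded.
    """
    padded = []
    for path in sorted(Path(directory).glob("*.txt")):
        if path.stat().st_size == 0:
            path.write_bytes(b"\n")
            padded.append(path.name)
    return padded
```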
- Sometimes the file sizes are incorrectly reported: When managing a resource ( /manage/resource.do?r=resource_name ), some file sizes are incorrectly reported. Example: "vernacular [file] 0 bytes, 1 rows, 3 columns." The file contains this one line only, terminated by a newline character: "1";"webspinners";"English". This doesn't really seem to be a problem, but we mention it in case it is actually more than a "cosmetic" issue. (The published archive the IPT generates still contains the example line.)
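For comparison, the figures for such a file can be computed independently. This is a small sketch that assumes a semicolon-delimited, double-quoted file like the example line; `describe_file` is a hypothetical helper, not an IPT function.

```python
import csv
from pathlib import Path


def describe_file(path, delimiter=";", quotechar='"'):
    """Return (size in bytes, number of rows, number of columns) for a text file."""
    size = Path(path).stat().st_size
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle, delimiter=delimiter, quotechar=quotechar))
    # Use the widest row as the column count; an empty file has zero columns.
    columns = max((len(row) for row in rows), default=0)
    return size, len(rows), columns
```

For a file holding only the example line and its newline, this returns (28, 1, 3), so a size of 0 bytes would not be expected.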
- Validator (http://tools.gbif.org/dwca-validator/ ) and IPT out of sync: This issue occurred when migrating from release candidate 3 to the 2.0.1-r3048 version. Previously, the types and specimens extension was of rowtype "http://rs.gbif.org/terms/1.0/Specimen"; however, it is no longer recognized by the IPT and "http://rs.gbif.org/terms/1.0/TypesAndSpecimen" must be used instead (which in turn is not recognized by the validator, and this new rowtype doesn't have the identificationRemarks term). Also, with the species profile extension we are using the livingPeriod term, but this one is not recognized by the validator while it is accepted by the IPT.
- Explicitly set vocabularies in meta.xml are not preserved by the IPT: For several of our columns we are using the vocabularies the IPT comes with and we explicitly advertise that fact in our source meta.xml (for example: <field index="6" term="http://rs.tdwg.org/dwc/terms/taxonRank" vocabulary="http://rs.gbif.org/vocabulary/gbif/rank.xml"/>.) However, when the resource is published, the archive generated by the IPT removes this information. We think that perhaps the IPT should not remove the vocabularies from meta.xml and maybe for those vocabularies the IPT is aware of it should either complain when a row violates the vocabulary or else set the column value to NULL (like the automap option of the value translation page does?)
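One way to see which vocabulary declarations are lost is to compare the source meta.xml with the one in the published archive. The sketch below uses only the Python standard library; the function names are hypothetical, and it reads the `term` and `vocabulary` attributes exactly as they appear in the example field above.

```python
import xml.etree.ElementTree as ET


def fields_with_vocabulary(meta_xml_text):
    """Map each field's term to its vocabulary, for fields that declare one."""
    root = ET.fromstring(meta_xml_text)
    found = {}
    for element in root.iter():
        # Compare the local tag name so this works with or without a namespace.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "field" and "vocabulary" in element.attrib:
            found[element.get("term")] = element.get("vocabulary")
    return found


def lost_vocabularies(source_meta, published_meta):
    """Return the term-to-vocabulary pairs present in the source but not in the published file."""
    source = fields_with_vocabulary(source_meta)
    published = fields_with_vocabulary(published_meta)
    return {term: vocab for term, vocab in source.items()
            if published.get(term) != vocab}
```

Both functions take the file contents as text; the files themselves would need to be read from the source and published archives first.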
- When the id and a term in the core file are set to the same column index, the IPT generates a separate column with a duplicated value: Our meta.xml defines the id and taxonID in the same way as in the example at http://rs.tdwg.org/dwc/terms/guides/text/index.htm#implement (id and taxonID both mapped to column 0). However, when we publish the resource, the IPT keeps the id at column 0, but also creates a new column for taxonID containing the same value as column 0. This is not much of a problem, but it takes more space than necessary.
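To check whether a given meta.xml maps a term to the same column as the id, something like the following could be used. It is a rough sketch under the assumption that each core or extension element holds an `id` (or `coreid`) child and `field` children with `index` attributes; `terms_sharing_id_column` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def terms_sharing_id_column(meta_xml_text):
    """List the terms whose column index equals the id index of their file."""
    root = ET.fromstring(meta_xml_text)
    shared = []
    for section in root:  # each core or extension element
        id_index = None
        fields = []
        for child in section:
            local_name = child.tag.rsplit("}", 1)[-1]
            if local_name in ("id", "coreid"):
                id_index = child.get("index")
            elif local_name == "field":
                fields.append(child)
        if id_index is not None:
            shared.extend(f.get("term") for f in fields
                          if f.get("index") == id_index)
    return shared
```

Running it on the source and on the published meta.xml would show whether the shared mapping survives publication.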
- The documentation is heavily biased toward Linux (IPTServerPreparation).
No support of Windows Server environments was apparent. We elected to implement a standalone Linux server, rather than try to use Microsoft interoperability tools under Windows Server.
It seems that the presumption is that the installer knows the names and locations of the relevant files, which are often not explicitly stated.
No step-by-step instruction is available. The section related to Tomcat comprises about 9 sentences. The time it required to research and implement the actual steps was several hours, shortened by liberal use of a Linux-familiar collaborator.
- No criteria for selection of the server infrastructure are provided. The documentation appears to be oriented toward making a given pre-existing web server installation adapt to the IPT. Since in our case we created a server from the ground up, we selected an Ubuntu/Tomcat approach for no better reason than that we had some minimal familiarity with them, and access to greater expertise.
- The maintenance/version upgrade process (Starting Over) is similarly sparsely detailed, and described in general terms. Since in-house expertise is oriented toward WS2008 and IIS 7, the management and maintenance of this web application in the selected environment is by no means obvious. The initial setup eventually worked, but the structure of the web server and the relationships among the programs and files were essentially mysterious without reference material such as how web applications are managed under Tomcat.
- Pictorial information is limited to post-installation data management user operations. Installation-related graphics would have been useful. A disclaimer emphasizing the value of Linux expertise would prevent the mistaken assumption that this is a cookbook installation. The data management section of the documentation appears to be where the effort was concentrated, but this was not the area of my activity (setting up the server and site).
Hope this information is useful.