[IPT] IPT feedback notes

Tim Robertson (GBIF) trobertson at gbif.org
Tue Apr 12 09:21:01 CEST 2011


Dear David,

Thank you very much for this thorough feedback, which will be logged  
as issues and addressed to improve the IPT.  We are in a testing phase  
for a 2.0.2 release which has minor enhancements, so I would  
anticipate the majority of these being fixed for the 2.0.3 release.
It is reassuring to hear that you feel the IPT was easy to use and you  
don't consider these serious issues.

Thanks again,
Tim


On Apr 12, 2011, at 8:52 AM, David Remsen (GBIF) wrote:

> Guys - I received this message from one of the checklist award
> recipients.  Fairly detailed set of IPT reviews.  Some of these are
> surely worthy of new issues.
>
> DR
>
> Begin forwarded message:
>
>> From: David Eades <dceades at illinois.edu>
>> Date: April 11, 2011 9:21:38 PM GMT+02:00
>> To: "'David Remsen \(GBIF\)'" <dremsen at gbif.org>
>> Subject: IPT feedback notes
>> Reply-To: <dceades at illinois.edu>
>>
>> Dear David,
>>
>> This message is sent in response to the request for feedback about our
>> use of IPT.
>>
>> Overall, the IPT 2.0.1-r3048 is an easy-to-use and intuitive interface. It
>> does a good job of processing our tarball datasets and comes with a handy
>> interface for filling out the EML metadata.  The installation was done
>> independently by two people in two countries. The following is a list of
>> the issues they mentioned. None of the comments indicates a serious
>> problem with a good product. This information is about three weeks old,
>> so some issues may have already been fixed.
>>
>> 1. Updating resources from archives: We couldn't find a way to use an
>> updated archive dataset to replace the existing resource data. The only
>> method available through the IPT web interface seems to be uploading
>> updated versions of the TXT files one by one (using exactly the same file
>> names that were used when the resource was first created) and then
>> pressing "publish". One alternative is deleting the resource and
>> re-creating it from the updated archive, but this option isn't convenient
>> for registered resources. The other alternative is to replace the TXT
>> files by writing to the file system directly
>> ([data_directory]/resources/[resource_name]/sources/*.txt) and then
>> press "publish", but is this a supported method?
>>
>> 2. Unstable "main" administrator: When the IPT is set up for the first
>> time, an administrator user is created, and it is this user who
>> appears in messages like "If you don't have an account yet, please ask
>> your IPT administrator to create one for you.: admin_name
>> <mail_address>". However, if a new user is later added with (or an
>> existing user is promoted to) an administration role, and that user's
>> mail address alphabetically precedes that of the current "main"
>> administrator, then the new administrator starts to appear in the
>> aforementioned messages instead of the original one.
>>
>> 3. Failed processing of archives containing some empty TXT files: When
>> an archive is uploaded at the resource creation stage, the IPT
>> immediately begins processing it. However, as soon as an empty file is
>> found, the process stops without processing the remaining TXT files
>> (though it creates the resource with the files it was able to process)
>> and suggests running the archive through the validator (which processes
>> such files without complaint). The workaround is to have such files
>> contain at least a newline character, but when the resource is
>> published, those files are generated completely empty once more, and
>> when the generated archive is submitted back to the IPT for resource
>> creation, it fails again (despite having been generated by the IPT
>> itself). We think such files should be accepted by the IPT, as they
>> should be considered empty (0 rows) tables. Some of our datasets have
>> empty extension files because all of them are generated from
>> equally capable databases maintained by the same software
>> (SpeciesFile), and some of those databases don't yet have records for
>> every kind of information (common names, for instance). Such records
>> may begin to appear in future updates.
>>
>> 4. Sometimes the file sizes are incorrectly reported: When managing a
>> resource ( /manage/resource.do?r=resource_name ), some file sizes are
>> incorrectly reported. Example: "vernacular [file]  0 bytes, 1 rows, 3
>> columns." The file contains this one line only, terminated by a
>> newline character: "1";"webspinners";"English". This doesn't really
>> seem to be a problem, but we mention it in case it is more than a
>> "cosmetic" issue. (The published archive the IPT generates still
>> contains the example line.)
>>
>> 5. Validator (http://tools.gbif.org/dwca-validator/) and IPT out of
>> sync: This issue occurred when migrating from release candidate 3
>> to version 2.0.1-r3048. Previously, the types and specimens
>> extension had the rowtype "http://rs.gbif.org/terms/1.0/Specimen",
>> but that is no longer recognized by the IPT, and
>> "http://rs.gbif.org/terms/1.0/TypesAndSpecimen" must be used instead
>> (which in turn is not recognized by the validator, and this new
>> rowtype doesn't have the identificationRemarks term). Also, with the
>> species profile extension we are using the livingPeriod term, which is
>> accepted by the IPT but not recognized by the validator.
>>
>> 6. Explicitly set vocabularies in meta.xml are not preserved by the
>> IPT: For several of our columns we use the vocabularies the IPT
>> comes with, and we explicitly advertise that fact in our source
>> meta.xml (for example: <field index="6"
>> term="http://rs.tdwg.org/dwc/terms/taxonRank"
>> vocabulary="http://rs.gbif.org/vocabulary/gbif/rank.xml"/>). However,
>> when the resource is published, the archive generated by the IPT
>> omits this information. We think the IPT should not remove the
>> vocabularies from meta.xml, and that for vocabularies it is aware of
>> it should either complain when a row violates the vocabulary or else
>> set the column value to NULL (as the automap option of the value
>> translation page does?).
>>
>> 7. When the id and a term in the core file are set to the same column
>> index, the IPT generates a separate column with a duplicated value: Our
>> meta.xml defines the id and taxonID in the same way as the example
>> at http://rs.tdwg.org/dwc/terms/guides/text/index.htm#implement (id
>> and taxonID both mapped to column 0). However, when we publish the
>> resource, the IPT keeps the id at column 0 but also creates a new
>> column for taxonID containing the same value as column 0. This is not
>> much of a problem, but it takes more space than necessary.
>>
>> 8. The documentation is heavily biased toward Linux
>> (IPTServerPreparation).
>>
>> No support for Windows Server environments was apparent.  We elected to
>> implement a standalone Linux server rather than try to use Microsoft
>> interoperability tools under Windows Server.
>>
>> The documentation seems to presume that the installer knows the names and
>> locations of the relevant files, which are often not explicitly stated.
>>
>> No step-by-step instructions are available.  The section related to Tomcat
>> comprises about 9 sentences.  Researching and implementing the actual
>> steps took several hours, shortened by liberal use of a Linux-familiar
>> collaborator.
>>
>> 9. No criteria for selection of the server infrastructure are
>> provided.  The documentation appears to be oriented toward making a
>> given pre-existing web server installation adapt to the IPT.  Since in
>> our case we created a server from the ground up, we selected an
>> Ubuntu/Tomcat approach for no better reason than that we had some
>> minimal familiarity with them, and access to greater expertise.
>>
>> 10. The maintenance/version upgrade process (Starting Over) is
>> similarly sparsely detailed and described only in general terms.  Since
>> in-house expertise is oriented toward WS2008 and IIS 7, the management
>> and maintenance of this web application in the selected environment is
>> by no means obvious.  The initial setup eventually worked, but the
>> structure of the web server and the relationships among the programs and
>> files were essentially mysterious without reference material on, for
>> example, how web applications are managed under Tomcat.
>>
>> 11. Pictorial information is limited to post-installation data
>> management user operations.  Installation-related graphics would have
>> been useful.  A disclaimer emphasizing the value of Linux expertise
>> would prevent the mistaken assumption that this is a cookbook
>> installation.  This section of the documentation appears to be where
>> the effort was concentrated, but this was not the area of my activity
>> (setting up the server and site).
>>
>> Hope this information is useful.
>>
>>
>
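
For item 3 in the forwarded report, a pre-upload check along the following
lines could flag the data files that would stop archive processing. This is
only a sketch: it assumes the archive has already been unpacked into a
directory of .txt files, and the function name find_empty_tables is
hypothetical (it is not part of the IPT or the validator).

```python
from pathlib import Path


def find_empty_tables(directory):
    """Return the sorted names of .txt files that contain no data rows.

    Hypothetical helper: a file counts as empty when it has zero bytes
    or holds nothing but whitespace/newline characters.
    """
    empty = []
    for path in sorted(Path(directory).glob("*.txt")):
        if path.read_text(encoding="utf-8").strip() == "":
            empty.append(path.name)
    return empty
```

Calling find_empty_tables on the unpacked archive directory before upload
would list the extension files that currently have 0 rows.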

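For items 6 and 7, the source meta.xml and the one in the published archive
could be compared with a small script such as the one below. It is a sketch
under stated assumptions: the function name inspect_core and the SAMPLE
document are made up for illustration (only the taxonRank field line is
taken from the report), and it reads nothing but the <core> element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the taxonRank field comes from the report.
SAMPLE = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files><location>taxa.txt</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/taxonID"/>
    <field index="6" term="http://rs.tdwg.org/dwc/terms/taxonRank"
           vocabulary="http://rs.gbif.org/vocabulary/gbif/rank.xml"/>
  </core>
</archive>"""


def local(tag):
    """Strip an XML namespace prefix such as '{uri}core' down to 'core'."""
    return tag.rsplit("}", 1)[-1]


def inspect_core(meta_xml):
    """Return (terms sharing the id column, {term: vocabulary}) for <core>."""
    root = ET.fromstring(meta_xml)
    core = next(el for el in root.iter() if local(el.tag) == "core")
    id_index = None
    fields = []
    for child in core:
        if local(child.tag) == "id":
            id_index = child.get("index")
        elif local(child.tag) == "field":
            fields.append(child)
    # Terms mapped to the same column index as the id (item 7).
    shared = [f.get("term") for f in fields
              if id_index is not None and f.get("index") == id_index]
    # Terms that declare an explicit vocabulary (item 6).
    vocabularies = {f.get("term"): f.get("vocabulary")
                    for f in fields if f.get("vocabulary")}
    return shared, vocabularies
```

Running inspect_core on both files and comparing the two results would show
whether a vocabulary attribute was dropped, or whether a term that shared
the id column has been moved to a column of its own.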