Thanks Marie. Just quick answers (I'm currently on leave):
1. BIE isn't required, but there should be a name-matching index on the biocache service machine in the usual place (/data/lucene/namematching). This will then be used for taxon resolution.
2. I'm surprised this causes an issue. That said, whitespace in those codes can be a problem.
3. Can you supply more detail? An NPE would suggest a bug or bad config. We index large datasets offline, using the "bulk-processor" option in the command line tool.
Dave