1. BIE isn't required, but there should be an index on the biocache service machine in the usual place (/data/lucene/namematching). This will then be used for taxon resolution.

2. I'm surprised that punctuation causes an issue, although whitespace in those codes can be a problem.

3. Can you supply more detail? An NPE would suggest a bug or bad config. For large datasets, we index offline using the "bulk-processor" option in the command line tool.
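
On point 2, one quick way to rule out stray whitespace or punctuation is to scan the provider codes before loading. The snippet below is only a rough sketch: `suspicious_codes` is a hypothetical helper, not part of any ALA tool, and it assumes the codes have already been exported as a plain list of strings.

```python
import string


def suspicious_codes(codes):
    """Return (code, reasons) pairs for codes that contain whitespace
    or ASCII punctuation. Hypothetical helper, not an ALA API."""
    flagged = []
    for code in codes:
        reasons = []
        # Whitespace at either end is easy to miss by eye.
        if code != code.strip():
            reasons.append("leading/trailing whitespace")
        # Whitespace inside the code itself.
        if any(ch.isspace() for ch in code.strip()):
            reasons.append("internal whitespace")
        # ASCII punctuation such as commas and dots; accented
        # letters are not punctuation and are not flagged.
        if any(ch in string.punctuation for ch in code):
            reasons.append("punctuation")
        if reasons:
            flagged.append((code, reasons))
    return flagged


if __name__ == "__main__":
    # Made-up sample codes, for illustration only.
    sample = ["MNHN", "MNHN ", "Univ. Paris", "Musée"]
    for code, reasons in suspicious_codes(sample):
        print(repr(code), "->", ", ".join(reasons))
```

Any code it flags is worth comparing character by character against the matching entry in the collections registry.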

From: Ala-portal <ala-portal-bounces@lists.gbif.org> on behalf of Marie Elise Lecoq <melecoq@gbif.fr>
Sent: 25 May 2016 03:36
To: ala-portal@lists.gbif.org
Subject: [Ala-portal] [Indexation] Questions

Hi all!

I have a few questions about indexing:

1. It seems that some occurrences are wrongly indexed. For example, if I search for "Pica pica", the first three results are not relevant (http://recherche.gbif.fr/occurrences/search?taxa=Pica+pica). Do I need to change something in the nameindexer? I don't have a BIE instance on our system; do I need to install one to fix this?

2. We have some provider codes with punctuation (e.g. comma, dot). It seems that the link between collection, institution and data resource is not made because of this. It works with accents.

3. I tried to index a data resource with more than 20 million occurrences and got a NullPointerException; it seems that the guid is not found. I can load data resources containing much less data, so I guess the problem comes from the data resource itself (its size?). Do you have a special way to deal with huge data resources?

Thanks in advance for your help :-)!

Cheers,

Marie