[Ala-portal] [Indexation] Questions
Marie Elise Lecoq
melecoq at gbif.fr
Wed May 25 23:55:02 CEST 2016
Thanks a lot Dave, especially since you are currently on leave :-)!
1. If I have understood correctly, this index should be Catalogue of Life.
Maybe I should create a new name index (using the nameindexer tool) with the
backbone taxonomy list from GBIF.
2. It works with other codes that contain whitespace. The only
difference that I can see between those codes is the punctuation.
3. Sorry that my first explanation was not really helpful! :-) Actually, I was
wrong, it's not an NPE.
The error takes place before the indexing itself: it happens when I try
to create the data resource (using the GBIF tool or by creating a
data resource directly and then uploading a ZIP file).
The DwC archive is downloaded and, directly after, I get the error (see error track
I think that the error comes from this function (
so I guess it is when the zip file is unzipped.
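
One way to test the unzip guess independently of the portal is to check the downloaded archive locally. The sketch below is only an illustration in Python, not the code the collectory runs: the function name `check_dwca` and the file name `dr172.zip` are assumptions, and it only checks that the zip opens, that no member is corrupt, and that a `meta.xml` descriptor sits at the archive root (which Darwin Core Archives normally have).

```python
import zipfile


def check_dwca(path):
    """Return a list of problems found in a Darwin Core Archive zip.

    An empty list means the basic checks passed. `path` is a local copy
    of the archive (hypothetical name in the example below).
    """
    # is_zipfile returns False for missing, unreadable or non-zip files.
    if not zipfile.is_zipfile(path):
        return ["not a readable zip file"]

    problems = []
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and returns the first corrupt name.
        bad_member = archive.testzip()
        if bad_member is not None:
            problems.append(f"corrupt member: {bad_member}")
        # The descriptor is expected at the root of the archive.
        if "meta.xml" not in archive.namelist():
            problems.append("meta.xml missing at archive root")
    return problems


if __name__ == "__main__":
    # Hypothetical file name, used for illustration only.
    print(check_dwca("dr172.zip"))
```

If this reports no problems, the archive itself is probably fine and the failure is more likely somewhere after the unzip step.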
2016-05-23 16:56:08,179 INFO [DataResourceController] Downloading file:
2016-05-23 16:56:37,965 INFO
[org.jasig.cas.services.DefaultServicesManagerImpl] - <Reloading registered
2016-05-23 16:56:37,976 DEBUG
[org.jasig.cas.services.DefaultServicesManagerImpl] - <Adding registered
2016-05-23 16:56:37,976 INFO
[org.jasig.cas.services.DefaultServicesManagerImpl] - <Loaded 1 services.>
2016-05-23 16:57:57,911 INFO [GbifService] dr172 null null
2016-05-23 16:57:58,155 ERROR [DataResourceController] JSONObject["guid"]
org.codehaus.groovy.grails.web.json.JSONException: JSONObject["guid"] not
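
The exception in the log is raised when a "guid" key is read from a JSON object that does not contain it. As a sketch of the kind of check that makes such a failure easier to diagnose, here is a small Python example (not the Grails code from the collectory; `extract_guid` and the sample payloads are assumptions for illustration). It reports which keys were actually present instead of failing on the missing one.

```python
import json


def extract_guid(payload):
    """Return the "guid" value from a JSON object given as a string.

    Raises ValueError with the keys that were present when "guid" is
    missing, so the unexpected response is visible in the error message.
    """
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    guid = data.get("guid")
    if guid is None:
        raise ValueError(f"response has no 'guid'; keys present: {sorted(data)}")
    return guid


if __name__ == "__main__":
    # Hypothetical payloads, used for illustration only.
    print(extract_guid('{"guid": "example-guid", "name": "example"}'))
    try:
        extract_guid('{"error": "timeout"}')
    except ValueError as exc:
        print(exc)
```

Logging the keys of the response in this way would show whether the service answered with an error object or an empty body rather than the expected record.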
On Wed, May 25, 2016 at 12:26 PM, <David.Martin at csiro.au> wrote:
> Thanks Marie. Just quick answers (I'm currently on leave).
> 1. BIE isn't required, but there should be an index on the biocache service
> machine in the usual place (/data/lucene/namematching). This will then be
> used for taxon resolution.
> 2. I'm surprised this causes an issue. Whitespace in those codes can be an
> 3. Can you supply more detail? An NPE would suggest a bug or bad config.
> The way we index large datasets is to use the offline method of indexing
> using the "bulk-processor" option in the command line tool.
> *From:* Ala-portal <ala-portal-bounces at lists.gbif.org> on behalf of Marie
> Elise Lecoq <melecoq at gbif.fr>
> *Sent:* 25 May 2016 03:36
> *To:* ala-portal at lists.gbif.org
> *Subject:* [Ala-portal] [Indexation] Questions
> Hi all!
> I have a few questions about indexing:
> 1. It seems that some occurrences are wrongly indexed. For example, if I
> search "Pica Pica", the first three results are not relevant (
> http://recherche.gbif.fr/occurrences/search?taxa=Pica+pica). Do I need to
> change something in the nameindexer? I don't have a BIE instance on our
> system; do I need to install one?
> 2. We have some provider codes with punctuation (e.g. comma, dot). It
> seems that the link between collection, institution and data resource is not
> made because of this. It works with accents.
> 3. I tried to index a data resource with more than 20 million occurrences
> and I got a NullPointerException; it seems that the guid is not found. I can
> upload data resources with much less data, so I guess the problem
> comes from the data resource itself (size?). Do you have a special way to
> deal with huge data resources?
> Thanks in advance for your help :-)!