[Ala-portal] [Indexation] Questions

Marie Elise Lecoq melecoq at gbif.fr
Wed May 25 23:55:02 CEST 2016

Thanks a lot Dave, especially since you are currently on leave :-)!

1. If I have understood correctly, this index should be the Catalogue of
Life. Maybe I should create a new name index (using the nameindexer tool)
with the backbone taxonomy list from GBIF.

2. It works with other codes that contain whitespace. The only
difference that I can see between those codes is the punctuation.

3. Sorry, my first explanation was not really helpful! :-) Actually, I was
wrong, it's not an NPE.
The error takes place before the indexing itself: it happens when I try
to create the data resource (using the GBIF tool, or directly by creating a
data resource and then uploading a ZIP file).
The DwC archive is downloaded and, directly after, I get the error (see error track
I think that the error comes from this function (
so I guess it happens when the zip file is unzipped.
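
One way to test the guess that the problem is in the unzip step is to check the archive locally before uploading it. The following is only a minimal sketch using the Python standard library. The function name `check_archive` is hypothetical, and looking for `meta.xml` at the archive root is an assumption about how the Darwin Core Archive is laid out; it is not taken from the collectory code.

```python
import zipfile


def check_archive(path):
    """Return a list of problems found in the zip at `path` (empty if none)."""
    # Not a zip at all (for example a truncated download or an HTML error page).
    if not zipfile.is_zipfile(path):
        return ["not a readable zip file"]

    problems = []
    with zipfile.ZipFile(path) as zf:
        # testzip() reads every member and returns the first corrupt name, or None.
        bad = zf.testzip()
        if bad is not None:
            problems.append(f"corrupt member: {bad}")
        # Assumption: the archive descriptor sits at the root of the zip.
        if "meta.xml" not in zf.namelist():
            problems.append("meta.xml not found at archive root")
    return problems
```

An empty list only means that the file is a readable zip with a descriptor at its root. It says nothing about whether the server-side upload will succeed.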


2016-05-23 16:56:08,179 INFO  [DataResourceController]  Downloading file:
2016-05-23 16:56:37,965 INFO
[org.jasig.cas.services.DefaultServicesManagerImpl] - <Reloading registered
2016-05-23 16:56:37,976 DEBUG
[org.jasig.cas.services.DefaultServicesManagerImpl] - <Adding registered
service ^(https?|imaps?)://.*>
2016-05-23 16:56:37,976 INFO
[org.jasig.cas.services.DefaultServicesManagerImpl] - <Loaded 1 services.>
2016-05-23 16:57:57,911 INFO  [GbifService]  dr172  null null
2016-05-23 16:57:58,155 ERROR [DataResourceController]  JSONObject["guid"]
not found.
org.codehaus.groovy.grails.web.json.JSONException: JSONObject["guid"] not
at com.brandseye.cors.CorsFilter.doFilter(CorsFilter.java:82)
at java.lang.Thread.run(Thread.java:745)


On Wed, May 25, 2016 at 12:26 PM, <David.Martin at csiro.au> wrote:

> Thanks Marie. Just quick answers (I'm currently on leave).
> 1. BIE isn't required, but there should be an index on the biocache service
> machine in the usual place (/data/lucene/namematching). This will then be
> used for taxon resolution.
> 2. I'm surprised this causes an issue. Whitespace in those codes can be an
> issue.
> 3. Can you supply more detail? An NPE would suggest a bug or bad config.
> The way we index large datasets is to use the offline method of indexing
> using the "bulk-processor" option in the command line tool.
> Dave
> ------------------------------
> *From:* Ala-portal <ala-portal-bounces at lists.gbif.org> on behalf of Marie
> Elise Lecoq <melecoq at gbif.fr>
> *Sent:* 25 May 2016 03:36
> *To:* ala-portal at lists.gbif.org
> *Subject:* [Ala-portal] [Indexation] Questions
> Hi all!
> I have a few questions about indexing:
> 1. It seems that some occurrences are wrongly indexed. For example, if I
> search for "Pica Pica", the first three results are not relevant (
> http://recherche.gbif.fr/occurrences/search?taxa=Pica+pica). Do I need to
> change something in the nameindexer? I don't have a BIE instance on our
> system; do I need to install one for this to work?
> 2. We have some provider codes with punctuation (e.g. comma, dot). It
> seems that the link between collection, institution and data resource is not
> made because of this. It works with accents.
> 3. I am trying to index a data resource with more than 20 million occurrences
> and I get a NullPointerException; it seems that guid is not found. I can
> upload data resources with much less data inside, so I guess the problem
> comes from the data resource itself (size?). Do you have a special way to
> deal with huge data resources?
> Thanks in advance for your help :-)!
> Cheers,
> Marie
