Thanks a lot Dave, especially if you are currently on leave :-)!
1. This index should be the Catalogue of Life, if I have understood correctly. Maybe I should create a new name index (using the nameindexer tool) from the GBIF backbone taxonomy list.
2. It works with other codes that contain whitespace. The only difference I can see between those codes is the punctuation.
3. Sorry, my first explanation was not really helpful! :-) Actually, I was wrong: it's not an NPE. The error occurs before the indexing itself; it happens when I try to create the data resource (either using the GBIF tool or by creating a data resource directly and then uploading a ZIP file). The DwC archive is downloaded, and right afterwards I get the error (see the stack trace below). I think the error comes from this function (https://github.com/AtlasOfLivingAustralia/collectory-plugin/blob/master/grai...), so I guess it happens when the ZIP file is unzipped.
-----------------------------------------------------
2016-05-23 16:56:08,179 INFO [DataResourceController] Downloading file: http://api.gbif.org/v1/occurrence/download/request/0007506-160118175350007.z...
2016-05-23 16:56:37,965 INFO [org.jasig.cas.services.DefaultServicesManagerImpl] - <Reloading registered services.>
2016-05-23 16:56:37,976 DEBUG [org.jasig.cas.services.DefaultServicesManagerImpl] - <Adding registered service ^(https?|imaps?)://.*>
2016-05-23 16:56:37,976 INFO [org.jasig.cas.services.DefaultServicesManagerImpl] - <Loaded 1 services.>
2016-05-23 16:57:57,911 INFO [GbifService] dr172 null null
2016-05-23 16:57:58,155 ERROR [DataResourceController] JSONObject["guid"] not found.
org.codehaus.groovy.grails.web.json.JSONException: JSONObject["guid"] not found.
    at au.org.ala.collectory.GbifService.createOrUpdateGBIFResource(GbifService.groovy:324)
    at au.org.ala.collectory.GbifService.createGBIFResourceFromArchiveURL(GbifService.groovy:294)
    at au.org.ala.collectory.ProviderGroupController$_closure23.doCall(ProviderGroupController.groovy:557)
    at grails.plugin.cache.web.filter.PageFragmentCachingFilter.doFilter(PageFragmentCachingFilter.java:198)
    at grails.plugin.cache.web.filter.AbstractFilter.doFilter(AbstractFilter.java:63)
    at com.brandseye.cors.CorsFilter.doFilter(CorsFilter.java:82)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
-----------------------------------------------------
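Looking at the trace, the exception matches the behaviour of the Grails JSONObject, whose get() throws when a requested key is absent. A minimal sketch of the failure mode and a possible guard (the payload below is made up for illustration, it is not a real GBIF response):
-----------------------------------------------------
// Minimal sketch of the failure at GbifService.groovy:324.
// The Grails JSONObject throws JSONException("JSONObject[\"guid\"] not found.")
// when get() is called for an absent key; the payload here is illustrative only.
import grails.converters.JSON

def json = JSON.parse('{"key": "0007506-160118175350007"}')   // no "guid" key

// Guarding with containsKey() avoids the exception seen in the stack trace.
def guid = json.containsKey('guid') ? json.get('guid') : null
if (!guid) {
    println 'Response contains no guid; resource creation fails at this point.'
}
-----------------------------------------------------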
Cheers, Marie
On Wed, May 25, 2016 at 12:26 PM, David.Martin@csiro.au wrote:
Thanks Marie. Just quick answers (I'm currently on leave):
- BIE isn't required, but there should be an index on the biocache service
machine in the usual place (/data/lucene/namematching). This will then be used for taxon resolution (see the sketch below these answers).
- I'm surprised this causes an issue. Whitespace in those codes can be an issue.
- Can you supply more detail? An NPE would suggest a bug or bad config.
The way we index large datasets is to use the offline indexing method, via the "bulk-processor" option in the command-line tool.
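For reference, you can test taxon resolution against that index directly with the ala-name-matching API. This is a minimal sketch (it assumes the ala-name-matching library is on the classpath and the index is in the usual location):
-----------------------------------------------------
// Minimal sketch: check taxon resolution against the local name-matching index.
// ALANameSearcher and searchForLSID come from the ala-name-matching library.
import au.org.ala.names.search.ALANameSearcher

def searcher = new ALANameSearcher('/data/lucene/namematching')

// Returns the LSID the biocache would resolve this name to, or throws a
// search exception if the name cannot be matched against the index.
def lsid = searcher.searchForLSID('Pica pica')
println "Pica pica -> ${lsid}"
-----------------------------------------------------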
Dave
*From:* Ala-portal ala-portal-bounces@lists.gbif.org on behalf of Marie Elise Lecoq melecoq@gbif.fr
*Sent:* 25 May 2016 03:36
*To:* ala-portal@lists.gbif.org
*Subject:* [Ala-portal] [Indexation] Questions
Hi all!
I have a few questions about indexing:
- It seems that some occurrences are wrongly indexed. For example, if I
search for "Pica pica", the first three results are not relevant (http://recherche.gbif.fr/occurrences/search?taxa=Pica+pica). Do I need to change something in the nameindexer? We don't have a BIE instance on our system; do I need to install one to help with this?
- We have some provider codes containing punctuation (e.g. commas, dots). It
seems that the link between collection, institution and data resource is not made because of this. It works fine with accents (see the sketch after this list).
- I am trying to index a data resource with more than 20 million occurrences,
and I get a NullPointerException; it seems that the guid is not found. I can upload data resources with much less data in them, so I guess the problem comes from the data resource itself (its size?). Do you have a special way of dealing with huge data resources?
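For the second point, here is a purely hypothetical illustration of what I suspect (this is not the collectory's actual matching logic): if code matching normalises whitespace and accents but compares punctuation literally, two codes differing only by a comma will never link:
-----------------------------------------------------
// Purely hypothetical illustration -- not the collectory's real lookup code.
// A normaliser that collapses whitespace and strips accents, but leaves
// punctuation untouched, links the first pair of codes but not the second.
import java.text.Normalizer

def normalise = { String code ->
    // Decompose accented characters, then drop the combining marks.
    def folded = Normalizer.normalize(code.trim(), Normalizer.Form.NFD).replaceAll(/\p{M}/, '')
    folded.replaceAll(/\s+/, ' ').toLowerCase()
}

assert normalise('Musée  Paris') == normalise('musee paris')   // accents/whitespace: links
assert normalise('MNHN, Paris')  != normalise('MNHN Paris')    // punctuation: no link
-----------------------------------------------------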
Thanks in advance for your help :-)! Cheers, Marie