[Ala-portal] [Indexation] Questions

Wed May 25 23:55:02 CEST 2016

Thanks a lot Dave, especially as you are currently on leave :-)!

1. If I have understood correctly, this index should be the Catalogue of
Life. Maybe I should create a new name index (using the nameindexer tool)
from the GBIF backbone taxonomy list.

2. It works with other codes that contain whitespace. The only difference
that I can see between those codes is the punctuation.
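
To narrow this down, something like the following could flag which codes
contain punctuation. It is only a minimal Python sketch: the helper name
and the sample codes are made up for illustration and are not taken from
our data. Accented letters are not counted as punctuation, which matches
the fact that accents work.

```python
import unicodedata


def punctuation_in(code):
    """Return the punctuation characters found in a provider code."""
    # Unicode general categories starting with "P" are punctuation;
    # accented letters ("L*") and spaces ("Zs") are not reported.
    return [ch for ch in code if unicodedata.category(ch).startswith("P")]


# Hypothetical sample codes, for illustration only.
codes = ["MNHN Paris", "Univ. Lyon", "Muséum", "Herbier, Nice"]
for code in codes:
    found = punctuation_in(code)
    status = "punctuation " + repr(found) if found else "no punctuation"
    print(repr(code) + ": " + status)
```

Running this over the real collection and institution codes would show
whether every unlinked code has punctuation in common.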

3. Sorry, my first explanation was not really helpful! :-) Actually, I was
wrong, it's not an NPE.
The error takes place before the indexing itself: it happens when I try
to create the data resource (using the GBIF tool, or directly by creating a
data resource and then uploading a ZIP file).
The DwC archive is downloaded and directly afterwards I get the error (see
the stack trace below).
I think that the error comes from this function (
https://github.com/AtlasOfLivingAustralia/collectory-plugin/blob/master/grails-app/services/au/org/ala/collectory/GbifService.groovy#L371)
so I guess it happens when the zip file is unzipped.

-----------------------------------------------------

2016-05-23 16:56:08,179 INFO  [DataResourceController]  Downloading file: http://api.gbif.org/v1/occurrence/download/request/0007506-160118175350007.zip
2016-05-23 16:56:37,965 INFO [org.jasig.cas.services.DefaultServicesManagerImpl] - <Reloading registered services.>
2016-05-23 16:56:37,976 DEBUG [org.jasig.cas.services.DefaultServicesManagerImpl] - <Adding registered service ^(https?|imaps?)://.*>
2016-05-23 16:56:37,976 INFO [org.jasig.cas.services.DefaultServicesManagerImpl] - <Loaded 1 services.>
2016-05-23 16:57:57,911 INFO  [GbifService]  dr172  null null
2016-05-23 16:57:58,155 ERROR [DataResourceController]  JSONObject["guid"] not found.
org.codehaus.groovy.grails.web.json.JSONException: JSONObject["guid"] not found.
    at au.org.ala.collectory.GbifService.createOrUpdateGBIFResource(GbifService.groovy:324)
    at au.org.ala.collectory.GbifService.createGBIFResourceFromArchiveURL(GbifService.groovy:294)
    at au.org.ala.collectory.ProviderGroupController$_closure23.doCall(ProviderGroupController.groovy:557)
    at grails.plugin.cache.web.filter.PageFragmentCachingFilter.doFilter(PageFragmentCachingFilter.java:198)
    at grails.plugin.cache.web.filter.AbstractFilter.doFilter(AbstractFilter.java:63)
    at com.brandseye.cors.CorsFilter.doFilter(CorsFilter.java:82)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
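
As far as I understand it, the exception only says that a "guid" key was
looked up on a JSON object that does not contain it. The following is a
minimal Python sketch of that situation, not the collectory code: the
helper name and both payloads are hypothetical. It shows a lookup that
returns nothing instead of failing, so that the caller can report the
missing guid explicitly.

```python
import json


def extract_guid(payload):
    """Return the 'guid' value from a JSON string, or None if it is absent."""
    data = json.loads(payload)
    # dict.get avoids the KeyError that a direct data["guid"] lookup raises
    # when the key is missing.
    return data.get("guid")


# Hypothetical payloads, for illustration only.
with_guid = '{"guid": "abc-123", "name": "example"}'
without_guid = '{"name": "example"}'

print(extract_guid(with_guid))
print(extract_guid(without_guid))
```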

Cheers,
Marie

On Wed, May 25, 2016 at 12:26 PM, <David.Martin at csiro.au> wrote:

> Thanks Marie. Just quick answers (I'm currently on leave).
>
>
> 1. BIE isn't required, but there should be an index on the biocache service
> machine in the usual place (/data/lucene/namematching). This will then be
> used for taxon resolution.
>
>
> 2. I'm surprised this causes an issue. Whitespace in those codes can be an
> issue.
>
>
> 3. Can you supply more detail? An NPE would suggest a bug or bad config.
> The way we index large datasets is to use the offline method of indexing
> using the "bulk-processor" option in the command line tool.
>
>
> Dave
>
>
> ------------------------------
> *From:* Ala-portal <ala-portal-bounces at lists.gbif.org> on behalf of Marie
> Elise Lecoq <melecoq at gbif.fr>
> *Sent:* 25 May 2016 03:36
> *To:* ala-portal at lists.gbif.org
> *Subject:* [Ala-portal] [Indexation] Questions
>
> Hi all!
>
> I have a few questions about indexing:
>
> 1. It seems that some occurrences are wrongly indexed. For example, if I
> search for "Pica pica", the first three results are not relevant (
> http://recherche.gbif.fr/occurrences/search?taxa=Pica+pica). Do I need to
> change something in the nameindexer? I don't have a BIE instance on our
> system; do I need to install one for this to work?
>
> 2. We have some provider codes with punctuation (e.g. comma, dot). It
> seems that the link between collection, institution and data resource is
> not made because of this. It works with accents.
>
> 3. I am trying to index a data resource with more than 20 million
> occurrences and I get a NullPointerException; it seems that guid is not
> found. I can upload data resources with much less data inside, so I guess
> the problem comes from the data resource itself (size?). Do you have a
> special way to deal with huge data resources?
>
> Thanks in advance for your help :-)!
> Cheers,
> Marie
>
>