Thanks a lot Dave, especially if you are currently on leave :-)!
1. This index should be the Catalogue of Life, if I have understood correctly. Maybe I should create a new name index (using the nameindexer tool) from the GBIF backbone taxonomy list.
2. It works with other codes that contain whitespace. The only difference I can see between those codes is the punctuation.
3. Sorry, my first explanation was not really helpful! :-) Actually, I was wrong: it's not an NPE. The error occurs before the indexing itself; it happens when I try to create the data resource (either using the GBIF tool or by creating a data resource directly and then uploading a ZIP file). The DwC archive is downloaded, and right afterwards I get the error (see the stack trace below). I think the error comes from this function (https://github.com/AtlasOfLivingAustralia/collectory-plugin/blob/master/grai...), so I guess it happens when the ZIP file is unzipped.
-----------------------------------------------------
2016-05-23 16:56:08,179 INFO [DataResourceController] Downloading file: http://api.gbif.org/v1/occurrence/download/request/0007506-160118175350007.z...
2016-05-23 16:56:37,965 INFO [org.jasig.cas.services.DefaultServicesManagerImpl] - <Reloading registered services.>
2016-05-23 16:56:37,976 DEBUG [org.jasig.cas.services.DefaultServicesManagerImpl] - <Adding registered service ^(https?|imaps?)://.*>
2016-05-23 16:56:37,976 INFO [org.jasig.cas.services.DefaultServicesManagerImpl] - <Loaded 1 services.>
2016-05-23 16:57:57,911 INFO [GbifService] dr172 null null
2016-05-23 16:57:58,155 ERROR [DataResourceController] JSONObject["guid"] not found.
org.codehaus.groovy.grails.web.json.JSONException: JSONObject["guid"] not found.
    at au.org.ala.collectory.GbifService.createOrUpdateGBIFResource(GbifService.groovy:324)
    at au.org.ala.collectory.GbifService.createGBIFResourceFromArchiveURL(GbifService.groovy:294)
    at au.org.ala.collectory.ProviderGroupController$_closure23.doCall(ProviderGroupController.groovy:557)
    at grails.plugin.cache.web.filter.PageFragmentCachingFilter.doFilter(PageFragmentCachingFilter.java:198)
    at grails.plugin.cache.web.filter.AbstractFilter.doFilter(AbstractFilter.java:63)
    at com.brandseye.cors.CorsFilter.doFilter(CorsFilter.java:82)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
-----------------------------------------------------
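Looking at the trace, the exception matches the behaviour of the Grails JSONObject, whose get() throws when a requested key is absent. A minimal sketch of the failure mode and a possible guard (the payload below is made up for illustration, it is not a real GBIF response):
-----------------------------------------------------
// Minimal sketch of the failure at GbifService.groovy:324.
// The Grails JSONObject throws JSONException("JSONObject[\"guid\"] not found.")
// when get() is called for an absent key; the payload here is illustrative only.
import grails.converters.JSON

def json = JSON.parse('{"key": "0007506-160118175350007"}')   // no "guid" key

// Guarding with containsKey() avoids the exception seen in the stack trace.
def guid = json.containsKey('guid') ? json.get('guid') : null
if (!guid) {
    println 'Response contains no guid; resource creation fails at this point.'
}
-----------------------------------------------------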
Cheers, Marie
On Wed, May 25, 2016 at 12:26 PM, David.Martin@csiro.au wrote:
Thanks Marie. Just quick answers (I'm currently on leave):
- BIE isn't required, but there should be an index on the biocache service
machine in the usual place (/data/lucene/namematching). This will then be used for taxon resolution (see the sketch below these answers).
- I'm surprised this causes an issue. Whitespace in those codes can be an issue.
- Can you supply more detail? An NPE would suggest a bug or bad config.
The way we index large datasets is to use the offline indexing method, via the "bulk-processor" option in the command-line tool.
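For reference, you can test taxon resolution against that index directly with the ala-name-matching API. This is a minimal sketch (it assumes the ala-name-matching library is on the classpath and the index is in the usual location):
-----------------------------------------------------
// Minimal sketch: check taxon resolution against the local name-matching index.
// ALANameSearcher and searchForLSID come from the ala-name-matching library.
import au.org.ala.names.search.ALANameSearcher

def searcher = new ALANameSearcher('/data/lucene/namematching')

// Returns the LSID the biocache would resolve this name to, or throws a
// search exception if the name cannot be matched against the index.
def lsid = searcher.searchForLSID('Pica pica')
println "Pica pica -> ${lsid}"
-----------------------------------------------------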
Dave
*From:* Ala-portal ala-portal-bounces@lists.gbif.org on behalf of Marie Elise Lecoq melecoq@gbif.fr
*Sent:* 25 May 2016 03:36
*To:* ala-portal@lists.gbif.org
*Subject:* [Ala-portal] [Indexation] Questions
Hi all!
I have a few questions about indexing:
- It seems that some occurrences are wrongly indexed. For example, if I
search for "Pica pica", the first three results are not relevant (http://recherche.gbif.fr/occurrences/search?taxa=Pica+pica). Do I need to change something in the nameindexer? We don't have a BIE instance on our system; do I need to install one to help with this?
- We have some provider codes containing punctuation (e.g. commas, dots). It
seems that the link between collection, institution and data resource is not made because of this. It works fine with accents (see the sketch after this list).
- I am trying to index a data resource with more than 20 million occurrences,
and I get a NullPointerException; it seems that the guid is not found. I can upload data resources with much less data in them, so I guess the problem comes from the data resource itself (its size?). Do you have a special way of dealing with huge data resources?
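For the second point, here is a purely hypothetical illustration of what I suspect (this is not the collectory's actual matching logic): if code matching normalises whitespace and accents but compares punctuation literally, two codes differing only by a comma will never link:
-----------------------------------------------------
// Purely hypothetical illustration -- not the collectory's real lookup code.
// A normaliser that collapses whitespace and strips accents, but leaves
// punctuation untouched, links the first pair of codes but not the second.
import java.text.Normalizer

def normalise = { String code ->
    // Decompose accented characters, then drop the combining marks.
    def folded = Normalizer.normalize(code.trim(), Normalizer.Form.NFD).replaceAll(/\p{M}/, '')
    folded.replaceAll(/\s+/, ' ').toLowerCase()
}

assert normalise('Musée  Paris') == normalise('musee paris')   // accents/whitespace: links
assert normalise('MNHN, Paris')  != normalise('MNHN Paris')    // punctuation: no link
-----------------------------------------------------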
Thanks in advance for your help :-)! Cheers, Marie