[Ala-portal] Problem generating a new name index

Tim Robertson trobertson at gbif.org
Thu Sep 4 15:25:32 CEST 2014


Hi Santiago

This is likely to need ALA folks, but since they are asleep, this might give you some ideas to explore before they come online.

I’ve logged the issue with a proposed fix: 
  https://github.com/AtlasOfLivingAustralia/ala-name-matching/issues/4 

What it fails on, though, is NULL names: the indexer is being handed records that have no name. Perhaps you can modify your input checklist so that it never contains NULL names?
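To find the offending records first, something along these lines might help. It is only a rough sketch: it assumes the checklist is a tab-delimited text file with a header row and a column called scientificName, and the function name rows_missing_name and the sample rows are made up for illustration.

```python
import csv
import io


def rows_missing_name(lines, name_column="scientificName", delimiter="\t"):
    """Return the 1-based data-row numbers whose name column is empty or absent."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    missing = []
    for row_number, row in enumerate(reader, start=1):
        value = row.get(name_column)
        # A short row yields None for the missing column; a blank cell yields "".
        if value is None or not value.strip():
            missing.append(row_number)
    return missing


# Hypothetical sample data; a real file would be passed as open(path, newline="").
sample = io.StringIO(
    "taxonID\tgenus\tscientificName\n"
    "1\tNepeta\tNepeta cataria\n"
    "2\tNepeta\t\n"
    "3\tMentha\tMentha spicata\n"
)
print(rows_missing_name(sample))  # [2]
```

If your file uses a different delimiter or column name, pass them in as arguments.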

You might for example use this kind of SQL or similar for whatever you are using to generate the names list:

-- "order" is a reserved word in SQL, so it has to be quoted
SELECT
  kingdom, phylum, class, "order", family, genus,
  COALESCE(name, genus, family, "order", class, phylum, kingdom) AS scientificName
FROM ...

COALESCE returns its first non-NULL argument, so whenever name is NULL, scientificName falls back to the genus, then the family, and so on up to the kingdom.
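If you want to see the fallback in action before touching your real data, here is a small self-contained sketch using Python's built-in sqlite3 module. The table name taxa and the three sample rows are invented for the demo, and your own database may quote identifiers differently. The NULLIF(name, '') wrapper is an extra that is not in the query above: it treats empty strings as NULL too, in case your names are blank rather than NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layout; "order" is quoted because it is a reserved word.
conn.execute(
    'CREATE TABLE taxa (kingdom TEXT, phylum TEXT, class TEXT, "order" TEXT, '
    "family TEXT, genus TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO taxa VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # Complete record: the name is used as-is.
        ("Plantae", "Tracheophyta", "Magnoliopsida", "Lamiales", "Lamiaceae", "Nepeta", "Nepeta cataria"),
        # NULL name: falls back to the genus.
        ("Plantae", "Tracheophyta", "Magnoliopsida", "Lamiales", "Lamiaceae", "Nepeta", None),
        # Empty name, no genus or family: falls back to the order.
        ("Plantae", "Tracheophyta", "Magnoliopsida", "Lamiales", None, None, ""),
    ],
)

query = """
SELECT COALESCE(NULLIF(name, ''), genus, family, "order", class, phylum, kingdom)
       AS scientificName
FROM taxa
ORDER BY rowid
"""
names = [row[0] for row in conn.execute(query)]
print(names)  # ['Nepeta cataria', 'Nepeta', 'Lamiales']
```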

I tried to build the project with the fix myself, but “mvn assembly:single” did not produce a fat jar for me, and the project README doesn’t say how they built it… sorry.

I hope this helps,
Tim



On 04 Sep 2014, at 14:50, Santiago Martinez de la Riva <sama at gbif.es> wrote:

> Hi all,
> 
> 
> I'm trying to create our own name index. I'm following the steps of the wiki in GitHub: https://github.com/AtlasOfLivingAustralia/documentation/wiki/Creating-a-name-index
> 
> Our dwca has the same structure as dwca-col-mammals, but when I try to generate the name index with the command: sudo nameindexer -dwca /...
> 
> I get the following exception:
> 
> vagrant@ala:/data/lucene/sources/dwca-spe2000-plantae$ sudo nameindexer -dwca /data/lucene/sources/dwca-spe2000-plantae
> 2014-09-04 12:04:26,093 INFO : [DwcaNameIndexer] - Generating loading index: true
> 2014-09-04 12:04:26,094 INFO : [DwcaNameIndexer] - Generating searching index: true
> 2014-09-04 12:04:26,094 INFO : [DwcaNameIndexer] - Using the  DwCA name file: /data/lucene/sources/dwca-spe2000-plantae
> 2014-09-04 12:04:26,094 INFO : [DwcaNameIndexer] - Using the default IRMNG name file: /data/lucene/sources/IRMNG_DWC_HOMONYMS
> 2014-09-04 12:04:26,095 INFO : [DwcaNameIndexer] - Using the default common name file: /data/lucene/sources/col_vernacular.txt
> 2014-09-04 12:04:26,182 INFO : [DwcaNameIndexer] - Starting to create the temporary loading index.
> 2014-09-04 12:08:10,283 INFO : [DwcaNameIndexer] - Finished creating the temporary load index with 1070805 concepts
> java.lang.NullPointerException
>        at au.org.ala.names.search.ALANameIndexer.isBlacklisted(ALANameIndexer.java:778)
>        at au.org.ala.names.search.ALANameIndexer.createALAIndexDocument(ALANameIndexer.java:788)
>        at au.org.ala.names.search.ALANameIndexer.createALAIndexDocument(ALANameIndexer.java:757)
>        at au.org.ala.names.search.DwcaNameIndexer.addIndex(DwcaNameIndexer.java:350)
>        at au.org.ala.names.search.DwcaNameIndexer.generateIndex(DwcaNameIndexer.java:281)
>        at au.org.ala.names.search.DwcaNameIndexer.create(DwcaNameIndexer.java:101)
>        at au.org.ala.names.search.DwcaNameIndexer.main(DwcaNameIndexer.java:527)
> 
> And when I try to search for a name, I get this other exception:
> 
> vagrant@ala:/data/lucene$ sudo nameindexer -testSearch "Nepeta Catarea"
> Search for name
> org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.NIOFSDirectory@/data/lucene/namematching/cb lockFactory=org.apache.lucene.store.NativeFSLockFactory@c22530: files: [write.lock]
>        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741)
>        at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
>        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:65)
>        at au.org.ala.names.search.ALANameSearcher.<init>(ALANameSearcher.java:122)
>        at au.org.ala.names.search.DwcaNameIndexer.main(DwcaNameIndexer.java:465)
> 
> 
> That is because the nameindexer didn't generate the necessary files.
> 
> Help meee!! xD
> 
> Cheers,
> SaMa
> 
> 
> ---------------------------------------------------------------------------------------
> Santiago Martínez de la Riva
> GBIF.ES, Unidad de Coordinación         Tel. +34 91 4203017 x 273
> Real Jardín Botánico - CSIC                     Fax +34 91 429 2405
> Plaza de Murillo, 2                                     sama at gbif.es
> 28014 Madrid, Spain                                 www.gbif.es
