Hi Santiago

This will likely need the ALA folks, but since they are asleep, here are some ideas to explore before they come online.

I’ve logged the issue with a proposed fix: 
  https://github.com/AtlasOfLivingAustralia/ala-name-matching/issues/4 

What it actually fails on, though, is NULL names in the input.  Perhaps you can modify your input checklist so that it never contains null names?
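
If it helps to find the offending records first, here is a minimal Python sketch. It assumes the names file is tab-delimited, has a header row, and has a column called scientificName; your archive may differ (check meta.xml), and the function name and sample rows are made up for illustration.

```python
import csv
import io


def rows_missing_name(lines, name_column="scientificName"):
    """Return (line_number, row) pairs whose name column is empty or missing.

    Assumes a tab-delimited file with a header row (an assumption, not
    something the nameindexer requires).
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = []
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        value = (row.get(name_column) or "").strip()
        # Treat blanks and common textual NULL markers as missing.
        if not value or value.upper() in {"NULL", "\\N"}:
            missing.append((line_number, row))
    return missing


if __name__ == "__main__":
    # Illustrative data only; replace with open("your-taxa-file.txt").
    sample = io.StringIO(
        "taxonID\tscientificName\tgenus\n"
        "1\tNepeta cataria\tNepeta\n"
        "2\t\tNepeta\n"
        "3\tNULL\tMentha\n"
    )
    for line_number, row in rows_missing_name(sample):
        print(line_number, row["taxonID"])
```

Any line it reports is a candidate for the NullPointerException, so those are the rows to fix or drop before running the indexer again.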

You might, for example, use SQL like this (or similar) in whatever you are using to generate the names list:

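-- "order" is a reserved word in SQL, so it is quoted here
-- (MySQL uses backticks by default instead of double quotes).
SELECT
  kingdom, phylum, class, "order", family, genus,
  COALESCE(name, genus, family, "order", class, phylum, kingdom) AS scientificName
FROM ...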

The COALESCE function will then set the name to the first non-NULL value.
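
To see the fallback in action, here is a small self-contained Python sketch using an in-memory SQLite database. The table name, the reduced set of columns, and the sample rows are assumptions for illustration only, not your actual schema.

```python
import sqlite3


def names_with_fallback(rows):
    """Load (kingdom, family, genus, name) rows and return the COALESCEd names."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; a real checklist would have more ranks.
    conn.execute(
        "CREATE TABLE taxa (kingdom TEXT, family TEXT, genus TEXT, name TEXT)"
    )
    conn.executemany("INSERT INTO taxa VALUES (?, ?, ?, ?)", rows)
    # COALESCE returns its first argument that is not NULL.
    result = [
        r[0]
        for r in conn.execute(
            "SELECT COALESCE(name, genus, family, kingdom) FROM taxa ORDER BY rowid"
        )
    ]
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("Plantae", "Lamiaceae", "Nepeta", "Nepeta cataria"),
        ("Plantae", "Lamiaceae", "Nepeta", None),   # falls back to genus
        ("Plantae", "Lamiaceae", None, None),       # falls back to family
    ]
    for name in names_with_fallback(sample):
        print(name)
```

One caveat: COALESCE only skips real NULLs. If your source stores missing names as empty strings, wrapping each column as NULLIF(name, '') would be needed for the fallback to kick in.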

I tried to build the project with the fix myself, but “mvn assembly:single” did not produce a fat jar for me, and the project README doesn’t say how they built it… sorry.

I hope this helps,
Tim



On 04 Sep 2014, at 14:50, Santiago Martinez de la Riva <sama@gbif.es> wrote:

Hi all,


I'm trying to create our own name index. I'm following the steps on the GitHub wiki: https://github.com/AtlasOfLivingAustralia/documentation/wiki/Creating-a-name-index

Our dwca has the same structure as dwca-col-mammals, but the problem is that when I try to generate the name index with the command: sudo nameindexer -dwca /...

I get the following exception:

vagrant@ala:/data/lucene/sources/dwca-spe2000-plantae$ sudo nameindexer -dwca /data/lucene/sources/dwca-spe2000-plantae
2014-09-04 12:04:26,093 INFO : [DwcaNameIndexer] - Generating loading index: true
2014-09-04 12:04:26,094 INFO : [DwcaNameIndexer] - Generating searching index: true
2014-09-04 12:04:26,094 INFO : [DwcaNameIndexer] - Using the  DwCA name file: /data/lucene/sources/dwca-spe2000-plantae
2014-09-04 12:04:26,094 INFO : [DwcaNameIndexer] - Using the default IRMNG name file: /data/lucene/sources/IRMNG_DWC_HOMONYMS
2014-09-04 12:04:26,095 INFO : [DwcaNameIndexer] - Using the default common name file: /data/lucene/sources/col_vernacular.txt
2014-09-04 12:04:26,182 INFO : [DwcaNameIndexer] - Starting to create the temporary loading index.
2014-09-04 12:08:10,283 INFO : [DwcaNameIndexer] - Finished creating the temporary load index with 1070805 concepts
java.lang.NullPointerException
       at au.org.ala.names.search.ALANameIndexer.isBlacklisted(ALANameIndexer.java:778)
       at au.org.ala.names.search.ALANameIndexer.createALAIndexDocument(ALANameIndexer.java:788)
       at au.org.ala.names.search.ALANameIndexer.createALAIndexDocument(ALANameIndexer.java:757)
       at au.org.ala.names.search.DwcaNameIndexer.addIndex(DwcaNameIndexer.java:350)
       at au.org.ala.names.search.DwcaNameIndexer.generateIndex(DwcaNameIndexer.java:281)
       at au.org.ala.names.search.DwcaNameIndexer.create(DwcaNameIndexer.java:101)
       at au.org.ala.names.search.DwcaNameIndexer.main(DwcaNameIndexer.java:527)

And when I try to search for a name, I get this other exception:

vagrant@ala:/data/lucene$ sudo nameindexer -testSearch "Nepeta Catarea"
Search for name
org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.NIOFSDirectory@/data/lucene/namematching/cb lockFactory=org.apache.lucene.store.NativeFSLockFactory@c22530: files: [write.lock]
       at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741)
       at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
       at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:65)
       at au.org.ala.names.search.ALANameSearcher.<init>(ALANameSearcher.java:122)
       at au.org.ala.names.search.DwcaNameIndexer.main(DwcaNameIndexer.java:465)


This is because the nameindexer didn't generate the necessary files.

Help meee!! xD

Cheers,
SaMa


---------------------------------------------------------------------------------------
Santiago Martínez de la Riva
GBIF.ES, Unidad de Coordinación         Tel. +34 91 4203017 x 273
Real Jardín Botánico - CSIC                     Fax +34 91 429 2405
Plaza de Murillo, 2                                     sama@gbif.es
28014 Madrid, Spain                                 www.gbif.es
_______________________________________________
Ala-portal mailing list
Ala-portal@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/ala-portal