[Ala-portal] I can't create the namematching index with the latest release of the nameindexer

Santiago Martinez de la Riva sama at gbif.es
Fri Sep 12 14:02:19 CEST 2014

Hi all,

We are having problems generating our own namematching index. Our DwC-A has the three required files (attached to this mail):

 - The meta.xml that points to spe_lamiacea_accepted.txt.
 - The eml.xml
 - spe_lamiaceae_accepted.txt, which contains only the accepted names of that family.
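
One thing that can be checked independently of the indexer is whether every file name declared in meta.xml is really present in the zip. The following is only a minimal sketch, assuming Python is available and that the archive uses the standard Darwin Core text namespace; the function name missing_data_files is made up for illustration and is not part of the nameindexer or ala-name-matching.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by Darwin Core Archive descriptors (meta.xml)
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def missing_data_files(archive_path):
    """Return the file names declared in meta.xml that are absent from the zip.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()
        # Compare base names so archives that wrap files in a folder still pass
        present = {name.rsplit("/", 1)[-1] for name in names}
        meta_name = next(
            (n for n in names if n.rsplit("/", 1)[-1] == "meta.xml"), None
        )
        if meta_name is None:
            raise FileNotFoundError("meta.xml not found in " + archive_path)
        root = ET.fromstring(zf.read(meta_name))

    # Every <location> element (core and extensions) names a data file
    declared = [
        loc.text.strip()
        for loc in root.iter(DWC_TEXT_NS + "location")
        if loc.text and loc.text.strip()
    ]
    return [name for name in declared if name not in present]
```

Calling missing_data_files("spe_lamiaceae_accepted.zip") returns an empty list when all declared files are present; any name it returns is referenced by meta.xml but not found in the archive.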

We also have our own vernacular names in vernacular_name.txt. The problem is that when I try to generate the namematching index, either by following the steps at https://github.com/AtlasOfLivingAustralia/ala-name-matching or by using the nameindexer installed by the latest ansible scripts, the names are never indexed.
The process prints this output:

2014-09-12 10:07:58,559 INFO : [DwcaNameIndexer] - Generating loading index: true
2014-09-12 10:07:58,559 INFO : [DwcaNameIndexer] - Generating searching index: true
2014-09-12 10:07:58,560 INFO : [DwcaNameIndexer] - Using the  DwCA name file: /data/lucene/sources/spe_lamiaceae_accepted
2014-09-12 10:07:58,560 INFO : [DwcaNameIndexer] - Using the default IRMNG name file: /data/lucene/sources/IRMNG_DWC_HOMONYMS
2014-09-12 10:07:58,560 INFO : [DwcaNameIndexer] - Using the default common name file: /data/lucene/sources/col_vernacular.txt
2014-09-12 10:07:58,642 INFO : [DwcaNameIndexer] - Starting to create the temporary loading index.
2014-09-12 10:07:59,365 INFO : [DwcaNameIndexer] - Finished creating the temporary load index with 881 concepts
2014-09-12 10:07:59,744 INFO : [ALANameIndexer] - Creating the IRMNG index from the DWCA /data/lucene/sources/IRMNG_DWC_HOMONYMS
2014-09-12 10:08:07,658 INFO : [DwcaNameIndexer] - Starting to load the common names
2014-09-12 10:08:07,699 INFO : [DwcaNameIndexer] - Finished processing 1000 common names with 0 added to index
2014-09-12 10:08:07,724 INFO : [DwcaNameIndexer] - Finished processing 2000 common names with 0 added to index


2014-09-12 10:08:09,516 INFO : [DwcaNameIndexer] - Finished processing 332000 common names with 0 added to index
2014-09-12 10:08:09,518 INFO : [DwcaNameIndexer] - Finished processing 332199 common names with 0 added to index

And when I try to search for a scientific name with: sudo nameindexer -testSearch "scientific_name",

I always obtain the same result:
No match for this "scientific_name"

At first we noticed that your col_dwc.txt contains only accepted names, so we assumed that this file may contain only accepted names. We have also tried versions of the file that include synonyms and non-accepted names, but we cannot create the namematching index in any case.

Is it possible that there is a bug in the indexer or in the ala-name-matching-2.1-distribution?
I think our zip has the same structure as your zip (http://biocache.ala.org.au/archives/dwca-col.zip), and we have run several tests with different kinds of name files. For this reason, we think a bug is a possibility.

On the other hand, I think the Brazilian team managed to generate their own namematching index. Could someone from that team (I think Allan did this) explain how they generated it?

Thanks a lot,

Santiago Martínez de la Riva
GBIF.ES, Unidad de Coordinación         Tel. +34 91 4203017 x 273
Real Jardín Botánico - CSIC                     Fax +34 91 429 2405
Plaza de Murillo, 2                                     sama at gbif.es
28014 Madrid, Spain                                 www.gbif.es
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spe_lamiaceae_accepted.zip
Type: application/zip
Size: 35188 bytes
Desc: spe_lamiaceae_accepted.zip
Url : http://lists.gbif.org/pipermail/ala-portal/attachments/20140912/add72c04/attachment-0001.zip 
