[Ala-portal] I don't get create the namematching with the last release of the nameindexer

David.Martin at csiro.au David.Martin at csiro.au
Mon Sep 15 07:24:00 CEST 2014

Thanks Santiago

The DWCA file that you use to generate the index must be complete.
What I mean by this is that the taxonomy must be complete all the way up
to kingdom.
So the DWCA needs a separate row for each kingdom, phylum etc. that may be
referred to by child taxa.

For the data you supplied, there isn't a row for Plantae.

As an example, have a look at the mammals dataset that is installed with
the ansible scripts.

Currently this name indexer doesn't give any warnings of any problems
(e.g. "No record found for referenced kingdom Plantae").
This would be a welcome addition if anyone wants to contribute to this
code base.
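
Until the indexer has such a warning, a check along these lines could be
run on the taxon file before indexing. This is only a rough sketch and is
not part of ala-name-matching. It assumes a tab-delimited file with a header
row, and it assumes Darwin Core style column names (scientificName, kingdom,
phylum, class, order, family, genus); the function names are made up for
illustration. It only compares names as text, which may differ from how the
indexer resolves parents.

```python
import csv
import io

# Assumed Darwin Core column names; adjust to the terms mapped in your meta.xml.
HIGHER_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def find_missing_higher_taxa(rows, name_column="scientificName"):
    """Return sorted (rank, name) pairs that are referenced but have no row of their own."""
    rows = list(rows)
    # Every name that has its own row in the file.
    known_names = {row[name_column].strip() for row in rows if row.get(name_column)}
    missing = set()
    for row in rows:
        for rank in HIGHER_RANKS:
            # Columns that are absent or empty are skipped.
            value = (row.get(rank) or "").strip()
            if value and value not in known_names:
                missing.add((rank, value))
    return sorted(missing)


def check_taxon_file(handle, delimiter="\t"):
    """Read a delimited taxon file with a header row and return warning strings."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [
        f"No record found for referenced {rank} {name}"
        for rank, name in find_missing_higher_taxa(reader)
    ]


if __name__ == "__main__":
    # Hypothetical two-row file: Plantae is referenced but has no row of its own.
    sample = (
        "taxonID\tscientificName\ttaxonRank\tkingdom\tfamily\n"
        "1\tLamiaceae\tfamily\tPlantae\t\n"
        "2\tSalvia officinalis\tspecies\tPlantae\tLamiaceae\n"
    )
    for warning in check_taxon_file(io.StringIO(sample)):
        print(warning)
```

For the two hypothetical rows above this reports Plantae as a missing
kingdom; adding a row whose scientificName is Plantae makes the warning go
away.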

Hope this helps,


On 12/09/2014 10:02 pm, "Santiago Martinez de la Riva" <sama at gbif.es>
wrote:

>Hi all,
>We are having problems generating our own name-matching index. Our DwC-A
>has the 3 required files (attached to this mail):
> - The meta.xml that points to spe_lamiacea_accepted.txt.
> - The eml.xml
> - spe_lamiaceae_accepted.txt, that contains only the accepted name of
>that family.
>We also have our own vernacular names in vernacular_name.txt. The
>problem is that when I try to generate the name-matching index following
>the steps at
>https://github.com/AtlasOfLivingAustralia/ala-name-matching or using the
>nameindexer that is installed by the latest ansible scripts, the names are never
>The process shows this text:
>2014-09-12 10:07:58,559 INFO : [DwcaNameIndexer] - Generating loading
>index: true
>2014-09-12 10:07:58,559 INFO : [DwcaNameIndexer] - Generating searching
>index: true
>2014-09-12 10:07:58,560 INFO : [DwcaNameIndexer] - Using the  DwCA name
>file: /data/lucene/sources/spe_lamiaceae_accepted
>2014-09-12 10:07:58,560 INFO : [DwcaNameIndexer] - Using the default
>IRMNG name file: /data/lucene/sources/IRMNG_DWC_HOMONYMS
>2014-09-12 10:07:58,560 INFO : [DwcaNameIndexer] - Using the default
>common name file: /data/lucene/sources/col_vernacular.txt
>2014-09-12 10:07:58,642 INFO : [DwcaNameIndexer] - Starting to create the
>temporary loading index.
>2014-09-12 10:07:59,365 INFO : [DwcaNameIndexer] - Finished creating the
>temporary load index with 881 concepts
>2014-09-12 10:07:59,744 INFO : [ALANameIndexer] - Creating the IRMNG
>index from the DWCA /data/lucene/sources/IRMNG_DWC_HOMONYMS
>2014-09-12 10:08:07,658 INFO : [DwcaNameIndexer] - Starting to load the
>common names
>2014-09-12 10:08:07,699 INFO : [DwcaNameIndexer] - Finished processing
>1000 common names with 0 added to index
>2014-09-12 10:08:07,724 INFO : [DwcaNameIndexer] - Finished processing
>2000 common names with 0 added to index
>2014-09-12 10:08:09,516 INFO : [DwcaNameIndexer] - Finished processing
>332000 common names with 0 added to index
>2014-09-12 10:08:09,518 INFO : [DwcaNameIndexer] - Finished processing
>332199 common names with 0 added to index
>And when I try to search for a scientific name with: sudo nameindexer
>-testSearch "scientific_name",
>I always get the same result:
>No match for this "scientific_name"
>At first we noticed that your col_dwc.txt only has accepted names, so we
>thought that this file can only contain accepted names. But we also have
>versions of this file with synonyms and with non-accepted names, and we
>cannot create the name-matching index in any case.
>Is it possible that there is some bug in the indexer or in the
>I think that our zip has the same structure as your zip
>(http://biocache.ala.org.au/archives/dwca-col.zip), and we have run
>several tests with different kinds of name files. For this reason, we
>think that this is a possibility.
>On the other hand, I think that the Brazilian team managed to generate
>their own name-matching index. Could someone from that team (I think
>Allan did this) explain how they generated it?
>Thanks a lot,
>Santiago Martínez de la Riva
>GBIF.ES, Unidad de Coordinación         Tel. +34 91 4203017 x 273
>Real Jardín Botánico - CSIC                     Fax +34 91 429 2405
>Plaza de Murillo, 2                                     sama at gbif.es
>28014 Madrid, Spain                                 www.gbif.es
