Re: [Ala-portal] I don't get create the namematching with the last release of the nameindexer

15 Sep 2014

Thanks Santiago,

The DWCA file that you use to generate the index must be complete.
What I mean by this is that the taxonomy must be complete all the way up
to kingdom.
So the DWCA needs a separate row for each kingdom, phylum, etc. that may be
referred to by child taxa.

For the data you supplied, there isn't a row for Plantae.
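To make the requirement concrete, here is a minimal Python sketch that looks for this kind of gap in a taxon file. It assumes (not confirmed in this thread) that the file is tab-separated with a header row and that the hierarchy is expressed through the Darwin Core terms taxonID, parentNameUsageID, scientificName and taxonRank. The function name check_hierarchy is hypothetical and is not part of the ALA name-matching code.

```python
import csv
import io


def check_hierarchy(taxon_tsv: str) -> list[str]:
    """Report rows whose parent has no row of its own, and roots that are not kingdoms.

    Assumes a tab-separated taxon file with a header row and the Darwin Core
    columns taxonID, parentNameUsageID, scientificName and taxonRank.
    """
    rows = list(csv.DictReader(io.StringIO(taxon_tsv), delimiter="\t",
                               quoting=csv.QUOTE_NONE))
    known_ids = {row["taxonID"] for row in rows}
    problems = []
    for row in rows:
        parent = (row.get("parentNameUsageID") or "").strip()
        rank = (row.get("taxonRank") or "").strip().lower()
        if parent and parent not in known_ids:
            # The child refers to a higher taxon that has no row in the file.
            problems.append(f"{row['scientificName']}: parent {parent} has no row")
        elif not parent and rank != "kingdom":
            # A chain that stops below kingdom means the taxonomy is incomplete.
            problems.append(f"{row['scientificName']}: root taxon is not a kingdom")
    return problems
```

Running it over the file's contents (for example `check_hierarchy(open("taxon.txt", encoding="utf-8").read())`, with a hypothetical file name) should return an empty list once every referenced parent, up to the kingdom, has its own row.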

As an example, have a look at the mammals dataset that is installed with
the ansible scripts.

Currently this name indexer doesn't give any warnings about such problems
(e.g. "No record found for referenced kingdom Plantae").
This would be a welcome addition if anyone wants to contribute to this
code base.
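Until the indexer does this itself, a rough pre-check can be run outside it. This sketch assumes the taxon file also carries classification columns named after the Darwin Core terms kingdom, phylum, class, order, family and genus, and it reports each referenced higher taxon that lacks a row of its own, using the warning wording suggested above. The name missing_higher_taxa is hypothetical. Names are compared as exact strings, so if scientificName includes authorship it would need normalising first.

```python
import csv
import io

# Darwin Core classification columns, highest rank first.
HIGHER_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def missing_higher_taxa(taxon_tsv: str) -> list[str]:
    """Warn about higher taxa named in classification columns but lacking a row."""
    rows = list(csv.DictReader(io.StringIO(taxon_tsv), delimiter="\t",
                               quoting=csv.QUOTE_NONE))
    # (rank, name) pairs that have a row of their own; exact string match.
    present = {((r.get("taxonRank") or "").strip().lower(),
                (r.get("scientificName") or "").strip()) for r in rows}
    warnings, reported = [], set()
    for r in rows:
        for rank in HIGHER_RANKS:
            name = (r.get(rank) or "").strip()
            key = (rank, name)
            if name and key not in present and key not in reported:
                reported.add(key)  # warn once per missing taxon
                warnings.append(f"No record found for referenced {rank} {name}")
    return warnings
```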

Hope this helps,

Dave

On 12/09/2014 10:02 pm, "Santiago Martinez de la Riva" <sama@gbif.es>
wrote:
...
Hi all,
We are having problems generating our own name matching index. Our DwC-A
has the 3 required files (attached to this mail):
- The meta.xml, which points to spe_lamiacea_accepted.txt.
- The eml.xml.
- spe_lamiaceae_accepted.txt, which contains only the accepted names of
that family.
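Because meta.xml is what tells a DwC-A reader which data files to open, one quick sanity check is that every `<location>` it lists actually exists alongside it; a misspelt file name would show up here. This is a minimal sketch assuming an unzipped archive directory (like the /data/lucene/sources/... paths in the log below) with meta.xml at its root. The helper missing_data_files is hypothetical; the namespace is the one used by Darwin Core text archives.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of the Darwin Core text (meta.xml) format.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def missing_data_files(archive_dir: str) -> list[str]:
    """Return data-file locations named in meta.xml that do not exist in archive_dir."""
    root_dir = Path(archive_dir)
    meta = ET.parse(root_dir / "meta.xml").getroot()
    # <location> elements appear under <files> in both <core> and <extension>.
    locations = [loc.text.strip() for loc in meta.iter(f"{DWC_TEXT_NS}location")
                 if loc.text]
    return [loc for loc in locations if not (root_dir / loc).is_file()]
```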
We also have our own vernacular names in vernacular_name.txt. The
problem is that when I try to generate the name matching index, either by
following the steps at
https://github.com/AtlasOfLivingAustralia/ala-name-matching or by using the
nameindexer installed by the latest ansible scripts, the names are never
indexed.
The process prints this output:
2014-09-12 10:07:58,559 INFO : [DwcaNameIndexer] - Generating loading index: true
2014-09-12 10:07:58,559 INFO : [DwcaNameIndexer] - Generating searching index: true
2014-09-12 10:07:58,560 INFO : [DwcaNameIndexer] - Using the  DwCA name file: /data/lucene/sources/spe_lamiaceae_accepted
2014-09-12 10:07:58,560 INFO : [DwcaNameIndexer] - Using the default IRMNG name file: /data/lucene/sources/IRMNG_DWC_HOMONYMS
2014-09-12 10:07:58,560 INFO : [DwcaNameIndexer] - Using the default common name file: /data/lucene/sources/col_vernacular.txt
2014-09-12 10:07:58,642 INFO : [DwcaNameIndexer] - Starting to create the temporary loading index.
2014-09-12 10:07:59,365 INFO : [DwcaNameIndexer] - Finished creating the temporary load index with 881 concepts
2014-09-12 10:07:59,744 INFO : [ALANameIndexer] - Creating the IRMNG index from the DWCA /data/lucene/sources/IRMNG_DWC_HOMONYMS
2014-09-12 10:08:07,658 INFO : [DwcaNameIndexer] - Starting to load the common names
2014-09-12 10:08:07,699 INFO : [DwcaNameIndexer] - Finished processing 1000 common names with 0 added to index
2014-09-12 10:08:07,724 INFO : [DwcaNameIndexer] - Finished processing 2000 common names with 0 added to index
......
2014-09-12 10:08:09,516 INFO : [DwcaNameIndexer] - Finished processing 332000 common names with 0 added to index
2014-09-12 10:08:09,518 INFO : [DwcaNameIndexer] - Finished processing 332199 common names with 0 added to index
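The log shows the default common name file (/data/lucene/sources/col_vernacular.txt) being loaded rather than vernacular_name.txt, with 0 names added. Assuming, as this thread does not confirm, that a common name is only added when its taxon identifier matches a concept already in the index, a simple diagnostic is to count how many vernacular rows point at a taxonID present in the core file. The function and its link_column default are hypothetical; in a DwC-A extension the linking column is whichever one meta.xml declares with `<coreid>`.

```python
import csv
import io


def count_linked_common_names(core_tsv: str, vernacular_tsv: str,
                              link_column: str = "taxonID") -> tuple[int, int]:
    """Return (vernacular rows whose link_column matches a core taxonID, total rows)."""
    def read(text: str) -> list[dict]:
        return list(csv.DictReader(io.StringIO(text), delimiter="\t",
                                   quoting=csv.QUOTE_NONE))

    core_ids = {row["taxonID"] for row in read(core_tsv)}
    vernacular = read(vernacular_tsv)
    # Rows that would find no concept to attach to are simply not counted.
    linked = sum(1 for row in vernacular if row.get(link_column) in core_ids)
    return linked, len(vernacular)
```

A result like (0, 332199) would be consistent with the log, though it would not by itself prove the cause.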
When I try to search for a scientific name with
sudo nameindexer -testSearch "scientific_name"
I always get the same result:
No match for this "scientific_name"
At first, we noticed that your col_dwc.txt contains only accepted names,
so we thought this file could only contain accepted names. However, we have
also tried this file with synonyms and with non-accepted names, and we
cannot create the name matching index in either case.
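When synonyms are included, one thing worth verifying is that every non-accepted row points, through acceptedNameUsageID, at a row that is itself accepted. A minimal sketch, assuming the same tab-separated layout and taxonomicStatus values such as "accepted" and "synonym"; ACCEPTED_STATUSES and dangling_synonyms are hypothetical names, and the status vocabulary should be adjusted to match the file.

```python
import csv
import io

# Hypothetical vocabulary for accepted names; adjust to the file's own values.
ACCEPTED_STATUSES = {"accepted", "accepted name"}


def dangling_synonyms(taxon_tsv: str) -> list[str]:
    """Return names of non-accepted rows whose acceptedNameUsageID is not an accepted row."""
    rows = list(csv.DictReader(io.StringIO(taxon_tsv), delimiter="\t",
                               quoting=csv.QUOTE_NONE))

    def status(row: dict) -> str:
        return (row.get("taxonomicStatus") or "").strip().lower()

    accepted_ids = {row["taxonID"] for row in rows if status(row) in ACCEPTED_STATUSES}
    problems = []
    for row in rows:
        if not status(row) or status(row) in ACCEPTED_STATUSES:
            continue  # accepted or unspecified: nothing to resolve
        target = (row.get("acceptedNameUsageID") or "").strip()
        if target not in accepted_ids:
            problems.append(row["scientificName"])
    return problems
```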
Is it possible that there is a bug in the indexer or in
ala-name-matching-2.1-distribution?
I think our zip has the same structure as yours
(http://biocache.ala.org.au/archives/dwca-col.zip), and we have run
several tests with different kinds of name files. That is why we
suspect a bug.
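One way to test the "same structure" assumption is to compare the Darwin Core terms declared for the core file in each archive's meta.xml. A minimal sketch, assuming both meta.xml files have been read as bytes (for example with `open(path, "rb").read()`); core_terms and compare_core_terms are hypothetical helpers. It only compares declared terms, not the data itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the Darwin Core text (meta.xml) format.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def core_terms(meta_xml: bytes) -> set[str]:
    """Return the term URIs declared by <field> elements of the <core> section."""
    core = ET.fromstring(meta_xml).find(f"{DWC_TEXT_NS}core")
    if core is None:
        return set()
    return {f.get("term") for f in core.findall(f"{DWC_TEXT_NS}field") if f.get("term")}


def compare_core_terms(ours: bytes, theirs: bytes) -> tuple[set[str], set[str]]:
    """Return (terms only in their core, terms only in ours)."""
    a, b = core_terms(ours), core_terms(theirs)
    return b - a, a - b
```

A term such as parentNameUsageID appearing only in the reference archive would be a structural difference worth looking at.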
On the other hand, I think the Brazilian team managed to generate their own
name matching index. Could someone from that team (I think Allan did this)
explain how they generated it?
Thanks a lot,
SaMa
---------------------------------------------------------------------------------------
Santiago Martínez de la Riva
GBIF.ES, Unidad de Coordinación         Tel. +34 91 4203017 x 273
Real Jardín Botánico - CSIC                     Fax +34 91 429 2405
Plaza de Murillo, 2                                     sama@gbif.es
28014 Madrid, Spain                                 www.gbif.es

David.Martin＠csiro.au