Thanks Santiago
The DWCA file that you use to generate the index must be complete. By this I mean that the taxonomy must be complete all the way up to kingdom, so the DWCA needs a separate row for each kingdom, phylum, etc. that may be referred to by child taxa.
For the data you supplied, there isn't a row for Plantae.
As an example, have a look at the mammals dataset that is installed with the ansible scripts.
Currently this name indexer doesn't give any warnings of any problems (e.g. "No record found for referenced kingdom Plantae"). This would be a welcome addition if anyone wants to contribute to this code base.
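Until the indexer reports this itself, a small pre-check on the taxon file can catch the gap. The following is only a rough sketch and not part of ala-name-matching: it assumes the core file is tab-delimited with a header row that uses Darwin Core term names (scientificName, taxonRank, kingdom, phylum, ...), and the function names and sample rows are made up for illustration. If your archive links taxa through parentNameUsageID instead, the same idea applies to the IDs.

```python
import csv
import io

# Higher-rank columns to check; names follow Darwin Core terms (assumed layout).
HIGHER_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def read_taxa(handle, delimiter="\t"):
    """Read a delimited taxon file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def find_missing_parents(rows):
    """Return sorted (rank, name) pairs that rows refer to but that have no row of their own."""
    # Every (rank, name) that has its own row in the file.
    defined = {
        ((row.get("taxonRank") or "").strip().lower(),
         (row.get("scientificName") or "").strip())
        for row in rows
    }
    missing = set()
    for row in rows:
        for rank in HIGHER_RANKS:
            name = (row.get(rank) or "").strip()
            if name and (rank, name) not in defined:
                missing.add((rank, name))
    return sorted(missing)


# Hypothetical two-row sample: the family and a species exist, the kingdom row does not.
SAMPLE = (
    "taxonID\tscientificName\ttaxonRank\tkingdom\tfamily\n"
    "1\tLamiaceae\tfamily\tPlantae\tLamiaceae\n"
    "2\tSalvia officinalis\tspecies\tPlantae\tLamiaceae\n"
)

for rank, name in find_missing_parents(read_taxa(io.StringIO(SAMPLE))):
    print(f"No record found for referenced {rank} {name}")
```

To run it against a real file, open the file with its actual encoding and pass the handle to read_taxa in place of the in-memory sample.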
Hope this helps,
Dave
On 12/09/2014 10:02 pm, "Santiago Martinez de la Riva" sama@gbif.es wrote:
Hi all,
We are having problems generating our own namematching. Our DwC-A has the 3 required files (attached to this mail):
- The meta.xml that points to spe_lamiacea_accepted.txt.
- The eml.xml
- spe_lamiaceae_accepted.txt, which contains only the accepted names of
that family.
We also have our own vernacular names in vernacular_name.txt. The problem is that when I try to generate the namematching, either by following the steps at https://github.com/AtlasOfLivingAustralia/ala-name-matching or by using the nameindexer installed by the latest ansible scripts, the names are never indexed. The process prints this output:
2014-09-12 10:07:58,559 INFO : [DwcaNameIndexer] - Generating loading index: true
2014-09-12 10:07:58,559 INFO : [DwcaNameIndexer] - Generating searching index: true
2014-09-12 10:07:58,560 INFO : [DwcaNameIndexer] - Using the DwCA name file: /data/lucene/sources/spe_lamiaceae_accepted
2014-09-12 10:07:58,560 INFO : [DwcaNameIndexer] - Using the default IRMNG name file: /data/lucene/sources/IRMNG_DWC_HOMONYMS
2014-09-12 10:07:58,560 INFO : [DwcaNameIndexer] - Using the default common name file: /data/lucene/sources/col_vernacular.txt
2014-09-12 10:07:58,642 INFO : [DwcaNameIndexer] - Starting to create the temporary loading index.
2014-09-12 10:07:59,365 INFO : [DwcaNameIndexer] - Finished creating the temporary load index with 881 concepts
2014-09-12 10:07:59,744 INFO : [ALANameIndexer] - Creating the IRMNG index from the DWCA /data/lucene/sources/IRMNG_DWC_HOMONYMS
2014-09-12 10:08:07,658 INFO : [DwcaNameIndexer] - Starting to load the common names
2014-09-12 10:08:07,699 INFO : [DwcaNameIndexer] - Finished processing 1000 common names with 0 added to index
2014-09-12 10:08:07,724 INFO : [DwcaNameIndexer] - Finished processing 2000 common names with 0 added to index
......
2014-09-12 10:08:09,516 INFO : [DwcaNameIndexer] - Finished processing 332000 common names with 0 added to index
2014-09-12 10:08:09,518 INFO : [DwcaNameIndexer] - Finished processing 332199 common names with 0 added to index
And when I try to search for a scientific name with: sudo nameindexer -testSearch "scientific_name",
I always get the same result: No match for this "scientific_name"
At first, we noticed that your col_dwc.txt only has accepted names, so we assumed that this file can only contain accepted names. But we have also tried the file with synonyms and with non-accepted names. We cannot create the namematching in either case.
Is it possible that there is a bug in the indexer or in the ala-name-matching-2.1-distribution? I think that our zip has the same structure as your zip (http://biocache.ala.org.au/archives/dwca-col.zip), and we have run several tests with different kinds of name files. For this reason, we think a bug is a possibility.
On the other hand, I think the Brazilian team managed to generate their own namematching. Could someone from that team (I think Allan did this) explain how they generated it?
Thanks a lot, SaMa
Santiago Martínez de la Riva
GBIF.ES, Unidad de Coordinación
Real Jardín Botánico - CSIC
Plaza de Murillo, 2
28014 Madrid, Spain
Tel. +34 91 4203017 x 273
Fax +34 91 429 2405
sama@gbif.es
www.gbif.es