Hello all,
Thanks again for your help. The 10th anniversary of GBIF France went well (I think); at least nobody told us it was bad, so...
I sent a lot of questions over the last few weeks, and I would like to share the solutions I found for my issues. They could help other people :-).
INDEXING:
1. About the wrong indexing, I found two bugs in the checklist used for name indexing:
   - some species don't have the full classification (e.g. http://www.gbif.org/species/4814179);
   - some of them throw a NullPointerException (see the error below) when I run the -testSearch command directly on the server.
   $ sudo nameindexer -testSearch "Canis familiaris Linnaeus, 1758"
   org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource: MMapIndexInput(path="/data/lucene/namematching/cb/segments.gen")): -3 (needs to be between -2 and -2)
       at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:722)
       at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
       at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:65)
       at au.org.ala.names.search.ALANameSearcher.<init>(ALANameSearcher.java:117)
       at au.org.ala.names.search.DwcaNameIndexer.main(DwcaNameIndexer.java:488)
2. I still have issues with punctuation and spaces in provider codes. My future work will focus on this.
3. I successfully uploaded my dataset of more than 20 million occurrences by following these steps:
   a. I uploaded a DwC-Archive with only 15 occurrences in order to create the dataset in the system. I had to do this because the zip library used in biocache-store can't open a file bigger than 1 GB.
   b. I copied the real DwC-Archive over the fake one in the /collectory/upload/ folder.
   c. I asked our system administrator to increase the RAM of our virtual machine (from 4 GB to 80 GB).
   d. I made some corrections to the collectory-plugin (see the email I sent on June 1st); after that, loading, processing and indexing worked well. It took ages, but it worked.
   e. Our data is now visible on our portal (http://metadonnee.gbif.fr/public/showDataResource/dr179).
   I'm not sure it's the right way to do it, but it works!
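Steps a and b amount to registering the dataset with a small placeholder archive and then swapping in the real one. Below is a minimal Python sketch of preparing that placeholder. The function names are hypothetical, the 1 GB threshold is taken from this message rather than from the biocache-store code, and the sketch assumes each data file in the archive has exactly one header row and one record per line.

```python
import zipfile
from pathlib import Path

# Threshold reported in this message; treat it as an assumption,
# the real limit depends on the zip library version in biocache-store.
ONE_GB = 1024 ** 3


def needs_placeholder(archive: Path, limit: int = ONE_GB) -> bool:
    """True if the archive is larger than the assumed upload limit."""
    return archive.stat().st_size > limit


def make_placeholder(src: Path, dst: Path, n_records: int = 15) -> None:
    """Hypothetical helper: write a small copy of a DwC-Archive.

    XML metadata (meta.xml, eml.xml) is copied unchanged; every other
    file is cut to its header line plus the first n_records lines.
    Assumes one header row and one record per line in each data file.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for name in zin.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            if name.lower().endswith(".xml"):
                zout.writestr(name, zin.read(name))  # metadata, copied as-is
            else:
                with zin.open(name) as f:
                    # header + n_records lines; readline() returns b"" at EOF
                    head = [f.readline() for _ in range(n_records + 1)]
                zout.writestr(name, b"".join(head))
```

For example, `make_placeholder(Path("dataset.zip"), Path("dataset-small.zip"))` would produce the small archive to upload first. The real archive would then be copied into /collectory/upload/, as in step b. Extension rows in the placeholder may not match its core rows, which should not matter if the placeholder is only used to create the dataset.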
SPATIAL:
I removed all the tools that use environmental layers, but I would be really interested in a training session on them so that I can install them :-)!
DATA:
For my error where the institution UID was shown instead of the institution name, I just set "caches.collections.enabled" to true in the biocache configuration file, and it works perfectly.
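If you need to apply the same change on several servers, a small script can flip the flag. Below is a minimal Python sketch, assuming the biocache configuration is a Java-style .properties file that uses "key=value" or "key: value" lines. The file path in the usage line is an assumption; this message does not name the file. The helper name is hypothetical.

```python
import re
from pathlib import Path


def set_property(path: Path, key: str, value: str) -> None:
    """Hypothetical helper: set key=value in a .properties file.

    Every uncommented line defining the key is replaced; if none exists,
    the setting is appended. Lines using a bare-space separator
    ("key value") are not recognised by this sketch.
    """
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*[=:]")
    lines = path.read_text(encoding="utf-8").splitlines()
    new_line = f"{key}={value}"
    found = False
    for i, line in enumerate(lines):
        if pattern.match(line):  # commented lines ("# key=...") do not match
            lines[i] = new_line
            found = True
    if not found:
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Usage, with an assumed path: `set_property(Path("/data/biocache/config/biocache-config.properties"), "caches.collections.enabled", "true")`.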
Thanks again!
Cheers,
Marie