[Ala-portal] DwC-A loading problems

Daniel Lins daniel.lins at gmail.com
Fri Jun 27 06:34:56 CEST 2014


Thanks David,

We will check these files.

Regards,

Daniel Lins da Silva
(Mobile) 55 11 96144-4050
Research Center on Biodiversity and Computing (Biocomp)
University of Sao Paulo, Brazil
daniellins at usp.br
daniel.lins at gmail.com


Daniel.


2014-06-27 1:31 GMT-03:00 <David.Martin at csiro.au>:

>   Thanks for resending Pedro.
>
>  We recommend using the Ansible scripts [1] for installation of
> components.
> If this isn't possible, it's worth checking out the templates in use by
> these scripts [2].
> There is a property at the bottom of that file that disables the API key
> check.
>
>  security.apikey.checkEnabled=false
>
>  That said, that error looks like a network issue between the machine
> that is running the collectory and the machine you are loading the data to.
> It's worth checking to see if the ports are open between these machines.
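>
>  e.g., a minimal check run from the machine doing the load (host and
> port taken from the registry.url quoted further down this thread, so
> substitute your own values):
>
>  nc -zv 192.168.15.132 8080
>
>  If the port is closed or filtered, the load will time out much as in
> the stack trace below.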
>
>  Cheers
>
>  Dave
>
>  [1] https://github.com/gbif/ala-install
> [2]
> https://github.com/gbif/ala-install/blob/master/ansible/roles/collectory/templates/config/collectory-config.properties
>
>     ------------------------------
> From: Daniel Lins [daniel.lins at gmail.com]
> Sent: 27 June 2014 13:54
> To: Martin, Dave (CES, Black Mountain)
> Cc: ala-portal at lists.gbif.org; Pedro Corrêa
> Subject: Re: [Ala-portal] DwC-A loading problems
>
>   Hi Dave,
>
> Did you see this mail? Do you think this issue could be related to the
> configuration of the api_key property?
>
>  Thanks.
>
>  Regards,
>
>
> 2014-06-25 2:14 GMT-03:00 Daniel Lins <daniel.lins at gmail.com>:
>
>> Hi Dave,
>>
>>  Thanks for the support.
>>
>> The data loading in the biocache is working properly now. But the
>> error continues during the update of the collectory (see below).
>>
>>  java.net.SocketTimeoutException: Read timed out
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>> at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1675)
>> at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1673)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1671)
>> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1244)
>> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>> at scalaj.http.Http$Request.liftedTree1$1(Http.scala:107)
>> at scalaj.http.Http$Request.process(Http.scala:103)
>> at scalaj.http.Http$Request.responseCode(Http.scala:120)
>> at au.org.ala.biocache.load.DataLoader$class.updateLastChecked(DataLoader.scala:354)
>> at au.org.ala.biocache.load.DwCALoader.updateLastChecked(DwCALoader.scala:74)
>> at au.org.ala.biocache.load.DwCALoader.load(DwCALoader.scala:103)
>> at au.org.ala.biocache.load.Loader.load(Loader.scala:75)
>> at au.org.ala.biocache.cmd.CMD$$anonfun$executeCommand$7.apply(CMD.scala:69)
>> at au.org.ala.biocache.cmd.CMD$$anonfun$executeCommand$7.apply(CMD.scala:69)
>> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>> at au.org.ala.biocache.cmd.CMD$.executeCommand(CMD.scala:69)
>> at au.org.ala.biocache.cmd.CommandLineTool$.main(CommandLineTool.scala:22)
>> at au.org.ala.biocache.cmd.CommandLineTool.main(CommandLineTool.scala)
>> Caused by: java.net.SocketTimeoutException: Read timed out
>> at java.net.SocketInputStream.socketRead0(Native Method)
>> at java.net.SocketInputStream.read(SocketInputStream.java:152)
>> at java.net.SocketInputStream.read(SocketInputStream.java:122)
>> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>> at scalaj.http.Http$Request$$anonfun$responseCode$1.apply(Http.scala:120)
>> at scalaj.http.Http$Request$$anonfun$responseCode$1.apply(Http.scala:120)
>> at scalaj.http.Http$Request.liftedTree1$1(Http.scala:104)
>> ... 13 more
>>
>>  In the external configuration file
>> (/data/biocache/config/biocache-config.properties) the property registry.url
>> is correct (registry.url=http://192.168.15.132:8080/collectory/ws), indicating
>> the URL of the collectory WS page.
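>>
>>  For reference, a minimal way to check that this URL is reachable from
>> the machine running the load (a sketch only; the dataResource path is an
>> assumption, not something from the original mails):
>>
>>  curl -v http://192.168.15.132:8080/collectory/ws/dataResource/dr0
>>
>>  A read timeout or connection refused here, rather than a JSON response,
>> would point to a network/firewall problem rather than the api_key setting.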
>>
>>  Could it be something related to permissions for external access? How
>> does this api_key property work in the collectory?
>>
>>  Thanks!
>>
>>  Regards,
>>
>>   Daniel Lins da Silva
>>  (Mobile) 55 11 96144-4050
>>  Research Center on Biodiversity and Computing (Biocomp)
>> University of Sao Paulo, Brazil
>>  daniellins at usp.br
>>  daniel.lins at gmail.com
>>
>>
>> 2014-06-20 2:26 GMT-03:00 <David.Martin at csiro.au>:
>>
>>>  Hi Daniel.
>>>
>>>  There is an updated version of biocache-store, 1.1.1, that helped fix
>>> some of the problems Burke spotted when loading Darwin Core archives
>>> downloaded from the GBIF portal. The symptoms were similar (only one record
>>> loaded for a dataset).
>>>
>>>  The exception in point 3) indicates the URL you have configured for
>>> the collectory (registry.url in biocache.properties) is either incorrect,
>>> or the collectory cannot be accessed for some reason. At the end of a data
>>> load, the collectory is updated to indicate the last loaded date for that
>>> dataset. This is done using a webservice.
>>>
>>>  One thing to mention - if you want to remove all data from your
>>> database, the easiest thing to do is use the cassandra-cli and run the
>>> command:
>>>
>>>  >> truncate occ;
>>>
>>>  This will remove all occurrence records from the database, but not
>>> from the index.
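>>>
>>>  A minimal session sketch (the occ keyspace and column family names are
>>> assumed from the cqlsh output in your earlier mail; adjust if yours
>>> differ):
>>>
>>>  $ cassandra-cli -h localhost
>>>  [default@unknown] use occ;
>>>  [default@occ] truncate occ;
>>>
>>>  Since the index is untouched, you'd still need to clear or rebuild it
>>> separately.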
>>>
>>>  The warnings you are seeing in the processing phase e.g.
>>>
>>>  2014-06-20 01:51:20,505 WARN : [ALANameSearcher] - Unable to parse
>>> Abaca bunchy top  (Babuvirus). Name of type virus unparsable: Abaca bunchy
>>> top  (Babuvirus)
>>>
>>>  are normal. This is referring to the sensitive species list in use.
>>>
>>>  Cheers
>>>
>>> Dave
>>>
>>>
>>>  ------------------------------
>>> From: Daniel Lins [daniel.lins at gmail.com]
>>> Sent: 20 June 2014 15:11
>>> To: Martin, Dave (CES, Black Mountain)
>>> Cc: ala-portal at lists.gbif.org; dos Remedios, Nick (CES, Black
>>> Mountain); Pedro Corrêa
>>> Subject: Re: [Ala-portal] DwC-A loading problems
>>>
>>>   Hi Dave, thanks for the information from the last email.
>>>
>>>  I'm following your advice and updating our test
>>> environment to biocache version 1.1. But I'm having some problems and I
>>> would like to know if you or anyone else has already hit this issue and
>>> knows a solution.
>>>
>>>  To update the biocache version I did the steps below (based on the
>>> Vagrant/Ansible installation process):
>>>
>>>  1. Cleaning of the database and index through the delete-resource
>>> function (delete-resource dr0 dr1 dr2 ...);
>>> 2. An update of the biocache config file
>>> (/data/biocache/config/biocache-config.properties) (copied from the Vagrant
>>> VM, with some configuration changes);
>>> 3. An update of the biocache build file (biocache.jar) (copied from
>>> the Vagrant VM - /usr/lib/biocache);
>>> 4. Deployment of the new biocache-service build (copied from the Vagrant
>>> VM - tomcat7/webapps/biocache-service.war);
>>> 5. An update of the Solr config files (schema.xml, solrconfig.xml)
>>> (copied from the Vagrant VM - /data/solr/biocache);
>>> 6. Removal of the index folder of the biocache Solr core
>>> (/data/solr/biocache/data).
>>>
>>>  Note 1 ** No change was made to the hubs-webapp or the collectory.
>>>
>>>  Note 2 ** The import of CSV files is working (using load-local-csv
>>> dr0 /<file_location>/xxx.csv).
>>>
>>>
>>>  I tried to import a Darwin Core Archive by following these steps:
>>>
>>>  1. Created a data resource (dr0);
>>>
>>>  2. Uploaded a DWC-A zip file into the DR using the "Upload File"
>>> option.
>>>
>>>  Protocol: DarwinCore archive
>>> Location URL:
>>> file:////data/collectory/upload/1403239521145/dwca-ocorrencias_lobo_guara_1.zip
>>> Automatically loaded: false
>>> DwC terms that uniquely identify a record: occurrenceID
>>> Strip whitespaces in key: false
>>> Incremental Load: false
>>>
>>>  3. Used the Command Line Tool (Biocache) to load (load dr0), process
>>> (process dr0) and index (index dr0) the data.
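>>>
>>>  That is, entered one at a time in the biocache command line tool:
>>>
>>>  biocache> load dr0
>>>  biocache> process dr0
>>>  biocache> index dr0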
>>>
>>>
>>>  During the data loading phase, the system generated these errors:
>>>
>>>  ...
>>>  2014-06-20 01:49:12,506 INFO : [DataLoader] - Finished DwC loader.
>>> Records processed: 32
>>> java.net.SocketTimeoutException: Read timed out
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1675)
>>> at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1673)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1671)
>>> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1244)
>>> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>>> at scalaj.http.Http$Request.liftedTree1$1(Http.scala:107)
>>> at scalaj.http.Http$Request.process(Http.scala:103)
>>> at scalaj.http.Http$Request.responseCode(Http.scala:120)
>>> at au.org.ala.biocache.load.DataLoader$class.updateLastChecked(DataLoader.scala:354)
>>> at au.org.ala.biocache.load.DwCALoader.updateLastChecked(DwCALoader.scala:74)
>>> at au.org.ala.biocache.load.DwCALoader.load(DwCALoader.scala:103)
>>> at au.org.ala.biocache.load.Loader.load(Loader.scala:75)
>>> at au.org.ala.biocache.cmd.CMD$$anonfun$executeCommand$7.apply(CMD.scala:69)
>>> at au.org.ala.biocache.cmd.CMD$$anonfun$executeCommand$7.apply(CMD.scala:69)
>>> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>> at au.org.ala.biocache.cmd.CMD$.executeCommand(CMD.scala:69)
>>> at au.org.ala.biocache.cmd.CommandLineTool$.main(CommandLineTool.scala:22)
>>> at au.org.ala.biocache.cmd.CommandLineTool.main(CommandLineTool.scala)
>>> Caused by: java.net.SocketTimeoutException: Read timed out
>>> at java.net.SocketInputStream.socketRead0(Native Method)
>>> at java.net.SocketInputStream.read(SocketInputStream.java:152)
>>> at java.net.SocketInputStream.read(SocketInputStream.java:122)
>>> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>>> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>>> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>>> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>>> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>>> at scalaj.http.Http$Request$$anonfun$responseCode$1.apply(Http.scala:120)
>>> at scalaj.http.Http$Request$$anonfun$responseCode$1.apply(Http.scala:120)
>>> at scalaj.http.Http$Request.liftedTree1$1(Http.scala:104)
>>> ... 13 more
>>>
>>>  And only one record was saved in Cassandra:
>>>
>>>
>>> cqlsh:occ> select * from occ;
>>>
>>>  key      | portalId | uuid
>>> ----------+----------+--------------------------------------
>>>  dr0|null |     null | 1b5b21fc-594a-46e6-b8db-cf37c50b8f7b
>>>
>>>
>>>  During the data processing phase, the system generated these
>>> additional errors:
>>>
>>>  ...
>>>  Jun 20, 2014 1:51:08 AM
>>> org.geotools.referencing.factory.epsg.ThreadedHsqlEpsgFactory
>>> createDataSource
>>> INFO: Building new data source for
>>> org.geotools.referencing.factory.epsg.ThreadedHsqlEpsgFactory
>>> Jun 20, 2014 1:51:08 AM
>>> org.geotools.referencing.factory.epsg.ThreadedHsqlEpsgFactory
>>> createBackingStore
>>> INFO: Building backing store for
>>> org.geotools.referencing.factory.epsg.ThreadedHsqlEpsgFactory
>>> 2014-06-20 01:51:20,505 WARN : [ALANameSearcher] - Unable to parse
>>> Abaca bunchy top  (Babuvirus). Name of type virus unparsable: Abaca bunchy
>>> top  (Babuvirus)
>>> 2014-06-20 01:51:20,509 WARN : [ALANameSearcher] - Unable to parse
>>> Abaca mosaic, sugarcane mosaic (Potyvirus). Name of type virus unparsable:
>>> Abaca mosaic, sugarcane mosaic (Potyvirus)
>>> 2014-06-20 01:51:21,210 WARN : [ALANameSearcher] - Unable to parse
>>> Acute bee paralysis  (Cripavirus). Name of type virus unparsable: Acute bee
>>> paralysis  (Cripavirus)
>>> 2014-06-20 01:51:21,255 WARN : [ALANameSearcher] - Unable to parse
>>> Agropyron mosaic  (Rymovirus). Name of type virus unparsable: Agropyron
>>> mosaic  (Rymovirus)
>>> 2014-06-20 01:51:21,289 WARN : [ALANameSearcher] - Unable to parse
>>> Alphacrytovirus vicia. Name of type virus unparsable: Alphacrytovirus vicia
>>> 2014-06-20 01:51:21,334 WARN : [ALANameSearcher] - Unable to parse
>>> American plum line pattern  (APLPV, Ilaravirus). Name of type virus
>>> unparsable: American plum line pattern  (APLPV, Ilaravirus)
>>> 2014-06-20 01:51:21,525 WARN : [ALANameSearcher] - Unable to parse Apis
>>> iridescent  (Iridovirus). Name of type virus unparsable: Apis iridescent
>>>  (Iridovirus)
>>> 2014-06-20 01:51:21,546 WARN : [ALANameSearcher] - Unable to parse
>>> Apricot ring pox  (Unassigned). Name of type blacklisted unparsable:
>>> Apricot ring pox  (Unassigned)
>>> 2014-06-20 01:51:21,549 WARN : [ALANameSearcher] - Unable to parse
>>> Arabis mosaic  (Nepovirus). Name of type virus unparsable: Arabis mosaic
>>>  (Nepovirus)
>>> 2014-06-20 01:51:21,623 WARN : [ALANameSearcher] - Unable to parse
>>> Artichoke Italian latent  (Nepovirus). Name of type virus unparsable:
>>> Artichoke Italian latent  (Nepovirus)
>>> 2014-06-20 01:51:21,640 WARN : [ALANameSearcher] - Unable to parse
>>> Asparagus   (Ilarvirus). Name of type virus unparsable: Asparagus
>>> (Ilarvirus)
>>> 2014-06-20 01:51:21,641 WARN : [ALANameSearcher] - Unable to parse
>>> Asparagus   (Potyvirus). Name of type virus unparsable: Asparagus
>>> (Potyvirus)
>>>  ...
>>>
>>>  During the last phase there were no errors. However, only one record
>>> was indexed.
>>>
>>>  2014-06-20 01:54:07,739 INFO : [SolrIndexDAO] - >>>>>>>>>>>>>
>>> Document count of index: 1
>>> 2014-06-20 01:54:07,741 INFO : [SolrIndexDAO] - Finalise finished.
>>>
>>>  I attached a file with the complete messages generated by Biocache
>>> during this test.
>>>
>>>
>>>  Thanks!
>>>
>>>  Cheers.
>>>
>>>   Daniel Lins da Silva
>>>  (Mobile) 55 11 96144-4050
>>>  Research Center on Biodiversity and Computing (Biocomp)
>>> University of Sao Paulo, Brazil
>>>  daniellins at usp.br
>>>  daniel.lins at gmail.com
>>>
>>>
>>>
>>> 2014-06-18 6:15 GMT-03:00 <David.Martin at csiro.au>:
>>>
>>>>  Hi Daniel,
>>>>
>>>>  From what you've said, I'm not clear on what customisations you have
>>>> made, so it's difficult to make a call on the impact of migrating to 1.1. We
>>>> also do not know what subversion revisions you started with.
>>>>
>>>>  We can tell you that functionally there wasn't a great deal of
>>>> difference between the later snapshots of 1.0 and 1.1.
>>>> The changes were largely structural, i.e. a clean-up of packages and
>>>> removal of redundant code. We did this largely because we needed to (this
>>>> code base is now over 5 years old) and we wanted to clean things up before
>>>> other projects started to work with the software.
>>>>
>>>>  Upgrading to biocache-service 1.1 and biocache-store shouldn't require
>>>> any changes to cassandra, but it may require an upgrade of SOLR. If this
>>>> is the case, you'll need to regenerate your index using the biocache
>>>> commandline tool. Upgrading to 1.1 shouldn't require any changes to
>>>> hubs-webapp if you've customised this component.
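>>>>
>>>>  If the SOLR upgrade does apply, the re-index is run per data resource
>>>> from the commandline tool, e.g. (a sketch):
>>>>
>>>>  biocache> index dr0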
>>>>
>>>>  I'd really recommend moving to 1.1 sooner rather than later, as it'll
>>>> give you a stable baseline to work against.
>>>>
>>>>  Hope this helps,
>>>>
>>>>  Dave Martin
>>>> ALA
>>>>
>>>>  ------------------------------
>>>> From: Daniel Lins [daniel.lins at gmail.com]
>>>> Sent: 18 June 2014 15:54
>>>>
>>>> To: Martin, Dave (CES, Black Mountain)
>>>> Cc: ala-portal at lists.gbif.org; dos Remedios, Nick (CES, Black
>>>> Mountain); Pedro Corrêa; Nicholls, Miles (CES, Black Mountain)
>>>> Subject: Re: [Ala-portal] DwC-A loading problems
>>>>
>>>>     Hi Dave,
>>>>
>>>>  How can I update Biocache-1.0-SNAPSHOT to version 1.1? I
>>>> updated the biocache-store (biocache.jar) and the config file
>>>> (/data/biocache/conf/config.properties-biocache) but I still have problems.
>>>> Which other steps do I need to do? Apparently this new version of the
>>>> biocache configuration file directly impacts my
>>>> Biocache-Services and Solr.
>>>>
>>>>  Will this update impact any other components?
>>>>
>>>>  I cannot use the installation process based on Vagrant/Ansible
>>>> because our environment is different and already has customizations. So I
>>>> would like to update the biocache with minimum impact, if possible. Afterwards
>>>> we will plan the update of the other components.
>>>>
>>>>  Can you advise me as to the best way forward?
>>>>
>>>>  Thanks!!
>>>>
>>>>  Regards,
>>>>
>>>>   Daniel Lins da Silva
>>>> (Mobile) 55 11 96144-4050
>>>>  Research Center on Biodiversity and Computing (Biocomp)
>>>> University of Sao Paulo, Brazil
>>>>  daniellins at usp.br
>>>> daniel.lins at gmail.com
>>>>
>>>>
>>>>
>>>> 2014-05-26 3:58 GMT-03:00 <David.Martin at csiro.au>:
>>>>
>>>>>  Thanks Daniel.
>>>>>
>>>>>  I'd recommend upgrading to 1.1 and I'd recommend installation with
>>>>> the ansible scripts. This will give you a baseline configuration.
>>>>> The scripts can be tested on a local machine using vagrant.
>>>>> The configuration between 1.0 and 1.1 changed significantly - removal
>>>>> of redundant, legacy properties and adoption of a standard format for
>>>>> property names.
>>>>> Here's the template used for the configuration file in the ansible
>>>>> scripts:
>>>>>
>>>>>
>>>>> https://github.com/gbif/ala-install/blob/master/ansible/roles/biocache-service/templates/config/biocache-config.properties
>>>>>
>>>>>  Cheers
>>>>>
>>>>>  Dave Martin
>>>>> ALA
>>>>>
>>>>>  ------------------------------
>>>>> From: Daniel Lins [daniel.lins at gmail.com]
>>>>> Sent: 26 May 2014 15:02
>>>>> To: Martin, Dave (CES, Black Mountain)
>>>>> Cc: ala-portal at lists.gbif.org; dos Remedios, Nick (CES, Black
>>>>> Mountain); Pedro Corrêa; Nicholls, Miles (CES, Black Mountain)
>>>>> Subject: Re: [Ala-portal] DwC-A loading problems
>>>>>
>>>>>   Hi Dave,
>>>>>
>>>>>  When I ran the ingest command (ingest dr0), the system showed errors
>>>>> like those below. However, after the error messages, I ran the index
>>>>> command (index dr0), and the data were published on the Portal.
>>>>>
>>>>>  2014-05-20 14:15:05,412 ERROR: [Grid] - cannot find GRID:
>>>>> /data/ala/data/layers/ready/diva/worldclim_bio_19
>>>>> 2014-05-20 14:15:05,414 ERROR: [Grid] - java.io.FileNotFoundException:
>>>>> /data/ala/data/layers/ready/diva/worldclim_bio_19.gri (No such file
>>>>> or directory)
>>>>> at java.io.RandomAccessFile.open(Native Method)
>>>>> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
>>>>> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:122)
>>>>> at org.ala.layers.intersect.Grid.getValues3(Grid.java:1017)
>>>>> at org.ala.layers.intersect.SamplingThread.intersectGrid(SamplingThread.java:112)
>>>>> at org.ala.layers.intersect.SamplingThread.sample(SamplingThread.java:97)
>>>>> at org.ala.layers.intersect.SamplingThread.run(SamplingThread.java:67)
>>>>>
>>>>>  2014-05-20 14:15:05,447 INFO : [Sampling] - ********* END - TEST
>>>>> BATCH SAMPLING FROM FILE ***************
>>>>> 2014-05-20 14:15:05,496 INFO : [Sampling] - Finished loading:
>>>>> /tmp/sampling-dr0.txt in 49ms
>>>>> 2014-05-20 14:15:05,496 INFO : [Sampling] - Removing temporary file:
>>>>> /tmp/sampling-dr0.txt
>>>>> 2014-05-20 14:15:05,553 INFO : [Consumer] - Initialising thread: 0
>>>>> 2014-05-20 14:15:05,575 INFO : [Consumer] - Initialising thread: 1
>>>>> 2014-05-20 14:15:05,575 INFO : [Consumer] - Initialising thread: 2
>>>>> 2014-05-20 14:15:05,577 INFO : [Consumer] - In thread: 0
>>>>> 2014-05-20 14:15:05,579 INFO : [Consumer] - Initialising thread: 3
>>>>> 2014-05-20 14:15:05,579 INFO : [ProcessWithActors] - Starting with
>>>>> dr0| endingwith dr0|~
>>>>> 2014-05-20 14:15:05,581 INFO : [Consumer] - In thread: 2
>>>>> 2014-05-20 14:15:05,581 INFO : [Consumer] - In thread: 1
>>>>> 2014-05-20 14:15:05,584 INFO : [Consumer] - In thread: 3
>>>>> 2014-05-20 14:15:05,592 INFO : [ProcessWithActors] - Initialised
>>>>> actors...
>>>>> 2014-05-20 14:15:05,647 INFO : [ProcessWithActors] - First rowKey
>>>>> processed: dr0|urn:lsid:icmbio.gov.br:icmbio.parnaso.occurrence:
>>>>> MA120999
>>>>> 2014-05-20 14:15:05,998 INFO : [ProcessWithActors] - Last row key
>>>>> processed: dr0|urn:lsid:icmbio.gov.br:icmbio.parnaso.occurrence:
>>>>> MA99991
>>>>> 2014-05-20 14:15:06,006 INFO : [ProcessWithActors] - Finished.
>>>>> 2014-05-20 14:15:06,015 INFO : [AttributionDAO] - Calling web service
>>>>> for dr0
>>>>> 2014-05-20 14:15:06,017 INFO : [Consumer] - Killing (Actor.act)
>>>>> thread: 3
>>>>> 2014-05-20 14:15:06,016 INFO : [Consumer] - Killing (Actor.act)
>>>>> thread: 2
>>>>> 2014-05-20 14:15:06,015 INFO : [Consumer] - Killing (Actor.act)
>>>>> thread: 1
>>>>> 2014-05-20 14:15:06,289 INFO : [AttributionDAO] - Looking up
>>>>> collectory web service for ICMBIO|PARNASO
>>>>> May 20, 2014 2:15:10 PM org.geotools.referencing.factory.epsg.ThreadedEpsgFactory
>>>>> <init>
>>>>> INFO: Setting the EPSG factory org.geotools.referencing.factory.epsg.DefaultFactory
>>>>> to a 1800000ms timeout
>>>>> May 20, 2014 2:15:10 PM org.geotools.referencing.factory.epsg.ThreadedEpsgFactory
>>>>> <init>
>>>>> INFO: Setting the EPSG factory org.geotools.referencing.factory.epsg.ThreadedHsqlEpsgFactory
>>>>> to a 1800000ms timeout
>>>>> May 20, 2014 2:15:10 PM org.geotools.referencing.factory.epsg.ThreadedHsqlEpsgFactory
>>>>> createDataSource
>>>>> INFO: Building new data source for
>>>>> org.geotools.referencing.factory.epsg.ThreadedHsqlEpsgFactory
>>>>> May 20, 2014 2:15:10 PM org.geotools.referencing.factory.epsg.ThreadedHsqlEpsgFactory
>>>>> createBackingStore
>>>>> INFO: Building backing store for
>>>>> org.geotools.referencing.factory.epsg.ThreadedHsqlEpsgFactory
>>>>> 2014-05-20 14:15:32,105 INFO : [Consumer] - Killing (Actor.act)
>>>>> thread: 0
>>>>> Indexing live with URL: null, and params: null&dataResource=dr0
>>>>> java.lang.NullPointerException
>>>>> at au.org.ala.util.CMD$.au$org$ala$util$CMD$$indexDataResourceLive$1(CommandLineTool.scala:371)
>>>>> at au.org.ala.util.CMD$$anonfun$executeCommand$2.apply(CommandLineTool.scala:90)
>>>>> at au.org.ala.util.CMD$$anonfun$executeCommand$2.apply(CommandLineTool.scala:86)
>>>>> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:105)
>>>>> at au.org.ala.util.CMD$.executeCommand(CommandLineTool.scala:86)
>>>>> at au.org.ala.util.CommandLineTool$.main(CommandLineTool.scala:26)
>>>>> at au.org.ala.util.CommandLineTool.main(CommandLineTool.scala)
>>>>>
>>>>>
>>>>>
>>>>>  Nowadays we use biocache-1.0-SNAPSHOT in our environment. But
>>>>> in your last mail you mentioned the version biocache-1.1-assembly.
>>>>>
>>>>>  I downloaded this newer version, but when I ran it (ingest dr0)
>>>>> in our environment the system showed many errors (see below).
>>>>>
>>>>>
>>>>>  log4j:WARN custom level class [org.ala.client.appender.RestLevel]
>>>>> not found.
>>>>>   Exception in thread "main" java.lang.ExceptionInInitializerError
>>>>> at au.org.ala.biocache.load.DataLoader$class.$init$(DataLoader.scala:28)
>>>>> at au.org.ala.biocache.load.Loader.<init>(Loader.scala:34)
>>>>> at au.org.ala.biocache.cmd.CMD$.executeCommand(CMD.scala:29)
>>>>> at au.org.ala.biocache.cmd.CommandLineTool$.main(CommandLineTool.scala:22)
>>>>> at au.org.ala.biocache.cmd.CommandLineTool.main(CommandLineTool.scala)
>>>>> Caused by: com.google.inject.CreationException: Guice creation errors:
>>>>>
>>>>>  1) No implementation for java.lang.Integer annotated with
>>>>> @com.google.inject.name.Named(value=cassandra.max.connections) was bound.
>>>>>   while locating java.lang.Integer annotated with
>>>>> @com.google.inject.name.Named(value=cassandra.max.connections)
>>>>>     for parameter 4 at
>>>>> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init>(CassandraPersistenceManager.scala:24)
>>>>>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>>>>>
>>>>>  2) No implementation for java.lang.Integer annotated with
>>>>> @com.google.inject.name.Named(value=cassandra.max.retries) was bound.
>>>>>   while locating java.lang.Integer annotated with
>>>>> @com.google.inject.name.Named(value=cassandra.max.retries)
>>>>>     for parameter 5 at
>>>>> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init>(CassandraPersistenceManager.scala:24)
>>>>>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>>>>>
>>>>>  3) No implementation for java.lang.Integer annotated with
>>>>> @com.google.inject.name.Named(value=cassandra.port) was bound.
>>>>>   while locating java.lang.Integer annotated with
>>>>> @com.google.inject.name.Named(value=cassandra.port)
>>>>>     for parameter 1 at
>>>>> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init>(CassandraPersistenceManager.scala:24)
>>>>>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>>>>>
>>>>>  4) No implementation for java.lang.Integer annotated with
>>>>> @com.google.inject.name.Named(value=thrift.operation.timeout) was bound.
>>>>>   while locating java.lang.Integer annotated with
>>>>> @com.google.inject.name.Named(value=thrift.operation.timeout)
>>>>>     for parameter 6 at
>>>>> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init>(CassandraPersistenceManager.scala:24)
>>>>>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>>>>>
>>>>>  5) No implementation for java.lang.String annotated with
>>>>> @com.google.inject.name.Named(value=cassandra.hosts) was bound.
>>>>>   while locating java.lang.String annotated with
>>>>> @com.google.inject.name.Named(value=cassandra.hosts)
>>>>>     for parameter 0 at
>>>>> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init>(CassandraPersistenceManager.scala:24)
>>>>>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>>>>>
>>>>>  6) No implementation for java.lang.String annotated with
>>>>> @com.google.inject.name.Named(value=cassandra.keyspace) was bound.
>>>>>   while locating java.lang.String annotated with
>>>>> @com.google.inject.name.Named(value=cassandra.keyspace)
>>>>>     for parameter 3 at
>>>>> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init>(CassandraPersistenceManager.scala:24)
>>>>>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>>>>>
>>>>>  7) No implementation for java.lang.String annotated with
>>>>> @com.google.inject.name.Named(value=cassandra.pool) was bound.
>>>>>   while locating java.lang.String annotated with
>>>>> @com.google.inject.name.Named(value=cassandra.pool)
>>>>>     for parameter 2 at
>>>>> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init>(CassandraPersistenceManager.scala:24)
>>>>>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>>>>>
>>>>>  8) No implementation for java.lang.String annotated with
>>>>> @com.google.inject.name.Named(value=exclude.sensitive.values) was bound.
>>>>>   while locating java.lang.String annotated with
>>>>> @com.google.inject.name.Named(value=exclude.sensitive.values)
>>>>>     for parameter 1 at
>>>>> au.org.ala.biocache.index.SolrIndexDAO.<init>(SolrIndexDAO.scala:28)
>>>>>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:164)
>>>>>
>>>>>  9) No implementation for java.lang.String annotated with
>>>>> @com.google.inject.name.Named(value=extra.misc.fields) was bound.
>>>>>   while locating java.lang.String annotated with
>>>>> @com.google.inject.name.Named(value=extra.misc.fields)
>>>>>     for parameter 2 at
>>>>> au.org.ala.biocache.index.SolrIndexDAO.<init>(SolrIndexDAO.scala:28)
>>>>>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:164)
>>>>>
>>>>>  10) No implementation for java.lang.String annotated with
>>>>> @com.google.inject.name.Named(value=solr.home) was bound.
>>>>>   while locating java.lang.String annotated with
>>>>> @com.google.inject.name.Named(value=solr.home)
>>>>>     for parameter 0 at
>>>>> au.org.ala.biocache.index.SolrIndexDAO.<init>(SolrIndexDAO.scala:28)
>>>>>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:164)
>>>>>
>>>>>  10 errors
>>>>> at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:354)
>>>>> at com.google.inject.InjectorBuilder.initializeStatically(InjectorBuilder.java:152)
>>>>> at com.google.inject.InjectorBuilder.build(InjectorBuilder.java:105)
>>>>> at com.google.inject.Guice.createInjector(Guice.java:92)
>>>>> at com.google.inject.Guice.createInjector(Guice.java:69)
>>>>> at com.google.inject.Guice.createInjector(Guice.java:59)
>>>>> at au.org.ala.biocache.Config$.<init>(Config.scala:24)
>>>>> at au.org.ala.biocache.Config$.<clinit>(Config.scala)
>>>>> ... 5 more
>>>>>
>>>>>
>>>>>  Related to these specific issues (data update and incremental load),
>>>>> will I need to upgrade the biocache version (to 1.1 or newer) or can I
>>>>> work with version 1.0-SNAPSHOT? If I update this version, will it be
>>>>> compatible with the other components? How should I proceed?
>>>>>
>>>>> Which layer files should I include in my environment to run these
>>>>> tests?
>>>>>
>>>>>  Thanks!
>>>>>
>>>>>  Regards,
>>>>>
>>>>>   Daniel Lins da Silva
>>>>> (Mobile) 55 11 96144-4050
>>>>>  Research Center on Biodiversity and Computing (Biocomp)
>>>>> University of Sao Paulo, Brazil
>>>>>  daniellins at usp.br
>>>>> daniel.lins at gmail.com
>>>>>
>>>>>
>>>>> 2014-05-09 2:14 GMT-03:00 <David.Martin at csiro.au>:
>>>>>
>>>>>>  Thanks Daniel.
>>>>>>
>>>>>>  I've spotted the problem:
>>>>>>
>>>>>>  java -cp .:biocache.jar au.org.ala.util.DwcCSVLoader dr0 -l
>>>>>> dataset-updated.csv -b true
>>>>>>
>>>>>>  this bypasses lookups against the collectory for the metadata.
>>>>>>
>>>>>>  To load this dataset, you can use the biocache commandline tool
>>>>>> like so:
>>>>>>
>>>>>>   $ java -cp /usr/lib/biocache:/usr/lib/biocache/biocache-store-1.1-assembly.jar
>>>>>> -Xms2g -Xmx2g au.org.ala.biocache.cmd.CommandLineTool
>>>>>>
>>>>>>
>>>>>>  ----------------------------
>>>>>>
>>>>>> | Biocache management tool |
>>>>>>
>>>>>> ----------------------------
>>>>>>
>>>>>> Please supply a command or hit ENTER to view command list.
>>>>>>
>>>>>> biocache> ingest dr8
>>>>>>
>>>>>>  This will:
>>>>>>
>>>>>>  1) Retrieve the metadata from the configured instance of the
>>>>>> collectory
>>>>>> 2) Load, process, sample (if there are layers configured and
>>>>>> available) and index
>>>>>>
>>>>>>  Cheers
>>>>>>
>>>>>>  Dave
>>>>>>
>>>>>>   ------------------------------
>>>>>> From: Daniel Lins [daniel.lins at gmail.com]
>>>>>> Sent: 09 May 2014 14:27
>>>>>>
>>>>>> To: Martin, Dave (CES, Black Mountain)
>>>>>> Cc: ala-portal at lists.gbif.org; dos Remedios, Nick (CES, Black
>>>>>> Mountain); Pedro Corrêa; Nicholls, Miles (CES, Black Mountain)
>>>>>> Subject: Re: [Ala-portal] DwC-A loading problems
>>>>>>
>>>>>>    David,
>>>>>>
>>>>>>  The dr0 configuration:
>>>>>>
>>>>>>  https://www.dropbox.com/s/lsy11jadwmyghjj/collectoryConfig1.png
>>>>>>
>>>>>>  Sorry, but this server doesn't have external access yet.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-05-09 1:06 GMT-03:00 <David.Martin at csiro.au>:
>>>>>>
>>>>>>>  As an example of what it should look like, see:
>>>>>>>
>>>>>>>
>>>>>>> http://ala-demo.gbif.org/collectory/dataResource/edit/dr8?page=contribution
>>>>>>>
>>>>>>>
>>>>>>>  ------------------------------
>>>>>>> From: Daniel Lins [daniel.lins at gmail.com]
>>>>>>>
>>>>>>> Sent: 09 May 2014 13:59
>>>>>>> To: Martin, Dave (CES, Black Mountain)
>>>>>>> Cc: ala-portal at lists.gbif.org; dos Remedios, Nick (CES, Black
>>>>>>> Mountain); Pedro Corrêa; Nicholls, Miles (CES, Black Mountain)
>>>>>>> Subject: Re: [Ala-portal] DwC-A loading problems
>>>>>>>
>>>>>>>    Thanks David,
>>>>>>>
>>>>>>>  We use the DwC term "occurrenceID" to identify the records. It's a
>>>>>>> unique key.
>>>>>>>
>>>>>>>  However, when I reload a dataset to update some DwC terms of the
>>>>>>> records, the system duplicates this data (keeps the old record and creates
>>>>>>> another with changes).
>>>>>>>
>>>>>>>  For instance (update of locality).
>>>>>>>
>>>>>>>  Load 1 ($ java -cp .:biocache.jar au.org.ala.util.DwcCSVLoader dr0
>>>>>>> -l dataset.csv -b true)
>>>>>>>
>>>>>>>  {OccurrenceID: 1, municipality: Sao Paulo, ...},
>>>>>>> {OccurrenceID: 2, municipality: Sao Paulo, ...}
>>>>>>>
>>>>>>>  Process 1 (biocache$ process dr0)
>>>>>>> Index 1 (biocache$ index dr0)
>>>>>>>
>>>>>>>  Load 2 (updated records and new records) ($ java -cp .:biocache.jar
>>>>>>> au.org.ala.util.DwcCSVLoader dr0 -l dataset-updated.csv -b true)
>>>>>>>
>>>>>>>  {OccurrenceID: 1, municipality: Rio de Janeiro, ...},
>>>>>>> {OccurrenceID: 2, municipality: Rio de Janeiro, ...},
>>>>>>>  {OccurrenceID: 3, municipality: Sao Paulo, ...}
>>>>>>>
>>>>>>>  Process 2 (biocache$ process dr0)
>>>>>>> Index 2 (biocache$ index dr0)
>>>>>>>
>>>>>>>  Results shown by ALA:
>>>>>>>
>>>>>>>  {OccurrenceID: 1, municipality: Sao Paulo, ...},
>>>>>>> {OccurrenceID: 2, municipality: Sao Paulo, ...},
>>>>>>>   {OccurrenceID: 1, municipality: Rio de Janeiro, ...},
>>>>>>> {OccurrenceID: 2, municipality: Rio de Janeiro, ...}
>>>>>>>  {OccurrenceID: 3, municipality: Sao Paulo, ...}
>>>>>>>
>>>>>>>  But I expected:
>>>>>>>
>>>>>>>  {OccurrenceID: 1, municipality: Rio de Janeiro, ...},
>>>>>>>  {OccurrenceID: 2, municipality: Rio de Janeiro, ...}
>>>>>>>  {OccurrenceID: 3, municipality: Sao Paulo, ...}
>>>>>>>
>>>>>>>  Do I need to delete existing data (the delete-resource function) before
>>>>>>> the reload? If not, what did I do wrong to generate this data duplication?
>>>>>>>
>>>>>>>  Thanks!
>>>>>>>
>>>>>>>
>>>>>>>  Regards,
>>>>>>>
>>>>>>>   Daniel Lins da Silva
>>>>>>>  (Mobile) 55 11 96144-4050
>>>>>>>  Research Center on Biodiversity and Computing (Biocomp)
>>>>>>> University of Sao Paulo, Brazil
>>>>>>>  daniellins at usp.br
>>>>>>>  daniel.lins at gmail.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-05-07 0:46 GMT-03:00 <David.Martin at csiro.au>:
>>>>>>>
>>>>>>>>   Thanks Daniel. Natasha has now left the ALA.
>>>>>>>>
>>>>>>>>  The uniqueness of records is determined by information stored in
>>>>>>>> the collectory. See screenshot [1].
>>>>>>>> By default, "catalogNumber" is used but you can change this to any
>>>>>>>> number of fields that should be stable in the data.
>>>>>>>> Using unstable fields for the ID isn't recommended (e.g.
>>>>>>>> scientificName).  To update the records, the process is to just
>>>>>>>> re-load the dataset.
>>>>>>>>
>>>>>>>>  Automatically loaded - this isn't in use and we may remove it from
>>>>>>>> the UI in future iterations.
>>>>>>>> Incremental Load - affects the sample/process/index steps to only
>>>>>>>> run these against the new records. Load is always incremental based on the
>>>>>>>> key field(s), but if the incremental load box isn't checked it runs the
>>>>>>>> sample/process/index steps against the whole data set. This can cause a
>>>>>>>> large processing overhead when there's a minor update to a large data set.
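>>>>>>>>
>>>>>>>>  For reference, the same key setting is carried on the data resource
>>>>>>>> as a connection parameter, e.g. (field name as it appears elsewhere in
>>>>>>>> this thread; catalogNumber is just the default mentioned above):
>>>>>>>>
>>>>>>>>  "termsForUniqueKey":["catalogNumber"]
>>>>>>>>
>>>>>>>>  Listing more than one term there makes the record key a composite of
>>>>>>>> those fields.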
>>>>>>>>
>>>>>>>>  Cheers
>>>>>>>>
>>>>>>>>  Dave Martin
>>>>>>>>  ALA
>>>>>>>>
>>>>>>>>  [1] http://bit.ly/1g72HFN
>>>>>>>>
>>>>>>>>  ------------------------------
>>>>>>>> From: Daniel Lins [daniel.lins at gmail.com]
>>>>>>>> Sent: 05 May 2014 15:39
>>>>>>>> To: Quimby, Natasha (CES, Black Mountain)
>>>>>>>> Cc: ala-portal at lists.gbif.org; dos Remedios, Nick (CES, Black
>>>>>>>> Mountain); Martin, Dave (CES, Black Mountain); Pedro Corrêa
>>>>>>>> Subject: Re: [Ala-portal] DwC-A loading problems
>>>>>>>>
>>>>>>>>     Hi Natasha,
>>>>>>>>
>>>>>>>>  I managed to import the DwC-A file following the steps reported
>>>>>>>> in the previous email. Thank you!
>>>>>>>>
>>>>>>>>  However, when I tried to update some metadata of an occurrence
>>>>>>>> record (already stored in the database), the system created a new record
>>>>>>>> with this duplicated information. So I started to have several records
>>>>>>>> with the same occurrenceID (I set the data resource
>>>>>>>> configuration to use "occurrenceID" to uniquely identify a record).
>>>>>>>>
>>>>>>>>  How can I update existing records in the database? For instance,
>>>>>>>> the location metadata of an occurrence record stored in my database?
>>>>>>>>
>>>>>>>>  I would also like to better understand the behavior of the
>>>>>>>> properties "Automatically loaded" and "Incremental Load".
>>>>>>>>
>>>>>>>>  Thanks!!
>>>>>>>>
>>>>>>>>  Regards,
>>>>>>>>
>>>>>>>>   Daniel Lins da Silva
>>>>>>>>  (Mobile) 55 11 96144-4050
>>>>>>>>  Research Center on Biodiversity and Computing (Biocomp)
>>>>>>>> University of Sao Paulo, Brazil
>>>>>>>>  daniellins at usp.br
>>>>>>>>  daniel.lins at gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-04-28 3:52 GMT-03:00 Daniel Lins <daniel.lins at gmail.com>:
>>>>>>>>
>>>>>>>>> Thanks Natasha!
>>>>>>>>>
>>>>>>>>>  I will try your recommendations. Once finished, I will contact
>>>>>>>>> you.
>>>>>>>>>
>>>>>>>>>  Regards
>>>>>>>>>
>>>>>>>>>  Daniel Lins da Silva
>>>>>>>>>  (Mobile) 55 11 96144-4050
>>>>>>>>>  Research Center on Biodiversity and Computing (Biocomp)
>>>>>>>>> University of Sao Paulo, Brazil
>>>>>>>>>  daniellins at usp.br
>>>>>>>>>  daniel.lins at gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  2014-04-28 3:26 GMT-03:00 <Natasha.Quimby at csiro.au>:
>>>>>>>>>
>>>>>>>>>  Hi Daniel,
>>>>>>>>>>
>>>>>>>>>>  When you specify a local DwC-A load, the archive needs to be
>>>>>>>>>> unzipped. Try unzipping 2f676abc-4503-489e-8f0c-fcb6e1bc554b.zip
>>>>>>>>>> and then running the following:
>>>>>>>>>>  sudo java -cp .:biocache.jar au.org.ala.util.DwCALoader dr7 -l
>>>>>>>>>> /data/collectory/upload/1398658607824/2f676abc-4503-489e-8f0c-fcb6e1bc554b
>>>>>>>>>>
>>>>>>>>>>  If you configure the collectory to provide the DwC-A, the biocache
>>>>>>>>>> automatically unzips the archive for you. You would need to configure dr7
>>>>>>>>>> with the following connection parameters:
>>>>>>>>>>
>>>>>>>>>>  "protocol":"DwCA"
>>>>>>>>>> "termsForUniqueKey":["occurrenceID"],
>>>>>>>>>> "url":"file:////data/collectory/upload/1398658607824/2f676abc-4503-489e-8f0c-fcb6e1bc554b.zip"
>>>>>>>>>>
>>>>>>>>>>  You could then load the resource with:
>>>>>>>>>>  sudo java -cp .:biocache.jar au.org.ala.util.DwCALoader dr7
>>>>>>>>>>
>>>>>>>>>>  If you continue to have issues please let us know.
>>>>>>>>>>
>>>>>>>>>>  Hope that this helps.
>>>>>>>>>>
>>>>>>>>>>  Regards
>>>>>>>>>> Natasha
>>>>>>>>>>
>>>>>>>>>>   From: Daniel Lins <daniel.lins at gmail.com>
>>>>>>>>>> Date: Monday, 28 April 2014 3:54 PM
>>>>>>>>>> To: "ala-portal at lists.gbif.org" <ala-portal at lists.gbif.org>, "dos
>>>>>>>>>> Remedios, Nick (CES, Black Mountain)" <Nick.Dosremedios at csiro.au>,
>>>>>>>>>> "Martin, Dave (CES, Black Mountain)" <David.Martin at csiro.au>
>>>>>>>>>> Subject: [Ala-portal] DwC-A loading problems
>>>>>>>>>>
>>>>>>>>>>   Hi Nick and Dave,
>>>>>>>>>>
>>>>>>>>>>  We are having some problems in Biocache during the upload of
>>>>>>>>>> DwC-A files.
>>>>>>>>>>
>>>>>>>>>>  As shown below, after running the method
>>>>>>>>>> "au.org.ala.util.DwCALoader", our system returns the error message "Exception
>>>>>>>>>> in thread "main" org.gbif.dwc.text.UnkownDelimitersException:
>>>>>>>>>> Unable to detect field delimiter".
>>>>>>>>>>
>>>>>>>>>>  I ran tests using DwC-A files with tab-delimited text
>>>>>>>>>> files and comma-delimited text files. In both cases the error was
>>>>>>>>>> the same.
>>>>>>>>>>
>>>>>>>>>>  What causes these problems? (** CSV Loader works great)
>>>>>>>>>>
>>>>>>>>>>  tab-delimited file test
>>>>>>>>>>
>>>>>>>>>>  poliusp at poliusp-VirtualBox:~/dev/biocache$ sudo java -cp
>>>>>>>>>> .:biocache.jar au.org.ala.util.DwCALoader dr7 -l
>>>>>>>>>> /data/collectory/upload/1398658607824/2f676abc-4503-489e-8f0c-fcb6e1bc554b.zip
>>>>>>>>>> 2014-04-28 01:44:02,837 INFO : [ConfigModule] - Loading
>>>>>>>>>> configuration from /data/biocache/config/biocache-config.properties
>>>>>>>>>> 2014-04-28 01:44:03,090 INFO : [ConfigModule] - Initialise SOLR
>>>>>>>>>> 2014-04-28 01:44:03,103 INFO : [ConfigModule] - Initialise name
>>>>>>>>>> matching indexes
>>>>>>>>>> 2014-04-28 01:44:03,605 INFO : [ConfigModule] - Initialise
>>>>>>>>>> persistence manager
>>>>>>>>>> 2014-04-28 01:44:03,606 INFO : [ConfigModule] - Configure
>>>>>>>>>> complete
>>>>>>>>>> Loading archive
>>>>>>>>>> /data/collectory/upload/1398658607824/2f676abc-4503-489e-8f0c-fcb6e1bc554b.zip
>>>>>>>>>> for resource dr7 with unique terms List(dwc:occurrenceID)
>>>>>>>>>> stripping spaces false incremental false testing false
>>>>>>>>>> Exception in thread "main"
>>>>>>>>>> org.gbif.dwc.text.UnkownDelimitersException: Unable to detect field
>>>>>>>>>> delimiter
>>>>>>>>>>         at org.gbif.file.CSVReaderFactory.buildArchiveFile(CSVReaderFactory.java:129)
>>>>>>>>>>         at org.gbif.file.CSVReaderFactory.build(CSVReaderFactory.java:46)
>>>>>>>>>>         at org.gbif.dwc.text.ArchiveFactory.readFileHeaders(ArchiveFactory.java:344)
>>>>>>>>>>         at org.gbif.dwc.text.ArchiveFactory.openArchive(ArchiveFactory.java:289)
>>>>>>>>>>         at au.org.ala.util.DwCALoader.loadArchive(DwCALoader.scala:129)
>>>>>>>>>>         at au.org.ala.util.DwCALoader.loadLocal(DwCALoader.scala:106)
>>>>>>>>>>         at au.org.ala.util.DwCALoader$.main(DwCALoader.scala:52)
>>>>>>>>>>         at au.org.ala.util.DwCALoader.main(DwCALoader.scala)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  comma-delimited file test
>>>>>>>>>>
>>>>>>>>>>  poliusp at poliusp-VirtualBox:~/dev/biocache$ sudo java -cp
>>>>>>>>>> .:biocache.jar au.org.ala.util.DwCALoader dr7 -l ./dwca-teste3.zip
>>>>>>>>>> 2014-04-28 01:56:04,683 INFO : [ConfigModule] - Loading
>>>>>>>>>> configuration from /data/biocache/config/biocache-config.properties
>>>>>>>>>> 2014-04-28 01:56:04,940 INFO : [ConfigModule] - Initialise SOLR
>>>>>>>>>> 2014-04-28 01:56:04,951 INFO : [ConfigModule] - Initialise name
>>>>>>>>>> matching indexes
>>>>>>>>>> 2014-04-28 01:56:05,437 INFO : [ConfigModule] - Initialise
>>>>>>>>>> persistence manager
>>>>>>>>>> 2014-04-28 01:56:05,438 INFO : [ConfigModule] - Configure
>>>>>>>>>> complete
>>>>>>>>>> Loading archive ./dwca-teste3.zip for resource dr7 with unique
>>>>>>>>>> terms List(dwc:occurrenceID) stripping spaces false incremental
>>>>>>>>>> false testing false
>>>>>>>>>> Exception in thread "main"
>>>>>>>>>> org.gbif.dwc.text.UnkownDelimitersException: Unable to detect field
>>>>>>>>>> delimiter
>>>>>>>>>>         at org.gbif.file.CSVReaderFactory.buildArchiveFile(CSVReaderFactory.java:129)
>>>>>>>>>>         at org.gbif.file.CSVReaderFactory.build(CSVReaderFactory.java:46)
>>>>>>>>>>         at org.gbif.dwc.text.ArchiveFactory.readFileHeaders(ArchiveFactory.java:344)
>>>>>>>>>>         at org.gbif.dwc.text.ArchiveFactory.openArchive(ArchiveFactory.java:289)
>>>>>>>>>>         at au.org.ala.util.DwCALoader.loadArchive(DwCALoader.scala:129)
>>>>>>>>>>         at au.org.ala.util.DwCALoader.loadLocal(DwCALoader.scala:106)
>>>>>>>>>>         at au.org.ala.util.DwCALoader$.main(DwCALoader.scala:52)
>>>>>>>>>>         at au.org.ala.util.DwCALoader.main(DwCALoader.scala)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  Thanks!
>>>>>>>>>>
>>>>>>>>>>  Regards.
>>>>>>>>>> --
>>>>>>>>>>  Daniel Lins da Silva
>>>>>>>>>> (Mobile) 55 11 96144-4050
>>>>>>>>>>  Research Center on Biodiversity and Computing (Biocomp)
>>>>>>>>>> University of Sao Paulo, Brazil
>>>>>>>>>>  daniellins at usp.br
>>>>>>>>>> daniel.lins at gmail.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  --
>>>>>>>>> Daniel Lins da Silva
>>>>>>>>> (Cel) 11 6144-4050
>>>>>>>>> daniel.lins at gmail.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  --
>>>>>>>> Daniel Lins da Silva
>>>>>>>> (Cel) 11 6144-4050
>>>>>>>> daniel.lins at gmail.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  --
>>>>>>> Daniel Lins da Silva
>>>>>>> (Cel) 11 6144-4050
>>>>>>> daniel.lins at gmail.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Daniel Lins da Silva
>>>>>> (Cel) 11 6144-4050
>>>>>> daniel.lins at gmail.com
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Daniel Lins da Silva
>>>>> (Cel) 11 6144-4050
>>>>> daniel.lins at gmail.com
>>>>>
>>>>
>>>>
>>>>
>>>>  --
>>>> Daniel Lins da Silva
>>>> (Cel) 11 6144-4050
>>>> daniel.lins at gmail.com
>>>>
>>>
>>>
>>>
>>>  --
>>> Daniel Lins da Silva
>>> (Cel) 11 6144-4050
>>> daniel.lins at gmail.com
>>>
>>
>>
>>
>>  --
>> Daniel Lins da Silva
>> (Cel) 11 6144-4050
>> daniel.lins at gmail.com
>>
>
>
>
>  --
> Daniel Lins da Silva
> (Cel) 11 6144-4050
> daniel.lins at gmail.com
>



-- 
Daniel Lins da Silva
(Cel) 11 6144-4050
daniel.lins at gmail.com