DwC-A Loading and Biocache Methods.
Hi guys,
I am having trouble loading DwC-A files into the Biocache store. The error occurs while the archive is being loaded (the error message is shown below).
$ sudo java -cp .:biocache.jar au.org.ala.util.DwCALoader dr3 -l dwca-teste3.zip
2014-03-25 02:40:50,769 INFO : [ConfigModule] - Loading configuration from /data/biocache/config/biocache-config.properties
2014-03-25 02:40:51,031 INFO : [ConfigModule] - Initialise SOLR
2014-03-25 02:40:51,035 INFO : [ConfigModule] - Initialise name matching indexes
2014-03-25 02:40:51,539 INFO : [ConfigModule] - Initialise persistence manager
2014-03-25 02:40:51,541 INFO : [ConfigModule] - Configure complete
Loading archive dwca-teste3.zip for resource dr3 with unique terms List(dwc:occurrenceID) stripping spaces false incremental false testing false
Exception in thread "main" org.gbif.dwc.text.UnkownDelimitersException: Unable to detect field delimiter
    at org.gbif.file.CSVReaderFactory.buildArchiveFile(CSVReaderFactory.java:129)
    at org.gbif.file.CSVReaderFactory.build(CSVReaderFactory.java:46)
    at org.gbif.dwc.text.ArchiveFactory.readFileHeaders(ArchiveFactory.java:344)
    at org.gbif.dwc.text.ArchiveFactory.openArchive(ArchiveFactory.java:289)
    at au.org.ala.util.DwCALoader.loadArchive(DwCALoader.scala:129)
    at au.org.ala.util.DwCALoader.loadLocal(DwCALoader.scala:106)
    at au.org.ala.util.DwCALoader$.main(DwCALoader.scala:52)
    at au.org.ala.util.DwCALoader.main(DwCALoader.scala)
I tried one DwC-A file with a comma delimiter and another with a tab delimiter, but both produced the same error.
During the tests, I replaced the biocache.jar file with the .jar that Natasha sent in a previous email (http://maven.ala.org.au/repository/au/org/ala/biocache-store/1.0-SNAPSHOT/bi...), but the problem persisted.
I also found that the Biocache list and describe commands produce errors.
These problems may be related to the registryURL property. When registryURL=http://192.168.15.132:8080/collectory, I get the errors below:
biocache> list
2014-03-25 03:09:24,305 INFO : [ConfigModule] - Loading configuration from /data/biocache/config/biocache-config.properties
2014-03-25 03:09:24,586 INFO : [ConfigModule] - Initialise SOLR
2014-03-25 03:09:24,591 INFO : [ConfigModule] - Initialise name matching indexes
2014-03-25 03:09:25,133 INFO : [ConfigModule] - Initialise persistence manager
2014-03-25 03:09:25,143 INFO : [ConfigModule] - Configure complete
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
    at sun.net.www.http.HttpClient.New(HttpClient.java:308)
    at sun.net.www.http.HttpClient.New(HttpClient.java:326)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
    at sun.net.www.protocol.http.HttpURLConnection.followRedirect(HttpURLConnection.java:2398)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1557)
    at java.net.URL.openStream(URL.java:1037)
    at scala.io.Source$.fromURL(Source.scala:140)
    at scala.io.Source$.fromURL(Source.scala:130)
    at au.org.ala.util.Loader.printResourceList(Loader.scala:52)
    at au.org.ala.util.CMD$.executeCommand(CommandLineTool.scala:78)
    at au.org.ala.util.CommandLineTool$.main(CommandLineTool.scala:26)
    at au.org.ala.util.CommandLineTool.main(CommandLineTool.scala)
biocache> describe dr3
UID: dr3
This data resource was last checked None
Protocol: DwCA
URL: http://192.168.15.132:8080/collectory/upload/1395634757631/dwca-teste3.zip
Unique terms: occurrenceID
url: http://192.168.15.132:8080/collectory/upload/1395634757631/dwca-teste3.zip
java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.String
    at au.org.ala.util.Loader$$anonfun$describeResource$1$$anonfun$apply$1.apply(Loader.scala:45)
    at au.org.ala.util.Loader$$anonfun$describeResource$1$$anonfun$apply$1.apply(Loader.scala:45)
    at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
    at au.org.ala.util.Loader$$anonfun$describeResource$1.apply(Loader.scala:45)
    at au.org.ala.util.Loader$$anonfun$describeResource$1.apply(Loader.scala:38)
    at scala.collection.immutable.List.foreach(List.scala:309)
    at au.org.ala.util.Loader.describeResource(Loader.scala:38)
    at au.org.ala.util.CMD$.executeCommand(CommandLineTool.scala:77)
    at au.org.ala.util.CommandLineTool$.main(CommandLineTool.scala:26)
    at au.org.ala.util.CommandLineTool.main(CommandLineTool.scala)
When registryURL=http://192.168.15.132:8080/collectory/ws, I get the errors below:
biocache> list
2014-03-25 03:22:52,278 INFO : [ConfigModule] - Loading configuration from /data/biocache/config/biocache-config.properties
2014-03-25 03:22:52,526 INFO : [ConfigModule] - Initialise SOLR
2014-03-25 03:22:52,531 INFO : [ConfigModule] - Initialise name matching indexes
2014-03-25 03:22:53,051 INFO : [ConfigModule] - Initialise persistence manager
2014-03-25 03:22:53,060 INFO : [ConfigModule] - Configure complete
------------------------------------------------------------------------------
| name     | uri                                                       | uid |
|----------------------------------------------------------------------------|
| Teste 10 | http://192.168.15.132:8080/collectory/ws/dataResource/dr4 | dr4 |
| Teste 11 | http://192.168.15.132:8080/collectory/ws/dataResource/dr0 | dr0 |
| Teste 12 | http://192.168.15.132:8080/collectory/ws/dataResource/dr3 | dr3 |
| Teste 2  | http://192.168.15.132:8080/collectory/ws/dataResource/dr2 | dr2 |
| Teste 3  | http://192.168.15.132:8080/collectory/ws/dataResource/dr1 | dr1 |
------------------------------------------------------------------------------
biocache> describe dr3
java.io.FileNotFoundException: http://192.168.15.132:8080/collectory/ws/ws/dataResource/dr3.json
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
    at java.net.URL.openStream(URL.java:1037)
    at scala.io.Source$.fromURL(Source.scala:140)
    at scala.io.Source$.fromURL(Source.scala:130)
    at au.org.ala.biocache.DataLoader$class.getDataResourceDetailsAsMap(dataimport.scala:99)
    at au.org.ala.util.Loader.getDataResourceDetailsAsMap(Loader.scala:31)
    at au.org.ala.biocache.DataLoader$class.retrieveConnectionParameters(dataimport.scala:116)
    at au.org.ala.util.Loader.retrieveConnectionParameters(Loader.scala:31)
    at au.org.ala.util.Loader$$anonfun$describeResource$1.apply(Loader.scala:39)
    at au.org.ala.util.Loader$$anonfun$describeResource$1.apply(Loader.scala:38)
    at scala.collection.immutable.List.foreach(List.scala:309)
    at au.org.ala.util.Loader.describeResource(Loader.scala:38)
    at au.org.ala.util.CMD$.executeCommand(CommandLineTool.scala:77)
    at au.org.ala.util.CommandLineTool$.main(CommandLineTool.scala:26)
    at au.org.ala.util.CommandLineTool.main(CommandLineTool.scala)
How can I load DwC-A files? Are these problems related?
Thanks!
Cheers,
Thanks Daniel.
Which svn revision of biocache-store are you working against? I think the bad-URL problems you are seeing are fixed on the current trunk. The registryUrl property in your external configuration shouldn't have a "/ws" suffix.
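To illustrate why the "/ws" suffix matters: the FileNotFoundException in your describe trace shows a request to .../collectory/ws/ws/dataResource/dr3.json, which suggests the tool appends "/ws/dataResource/<uid>.json" to whatever is configured. Below is a minimal Python sketch of that behaviour, assuming exactly this path layout. The functions normalise_registry_url and data_resource_url are hypothetical helpers for illustration only and are not part of biocache-store.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_registry_url(registry_url: str) -> str:
    """Drop trailing slashes and one trailing '/ws' segment from the registry URL."""
    parts = urlsplit(registry_url.strip())
    path = parts.path.rstrip("/")
    if path.endswith("/ws"):
        path = path[: -len("/ws")]
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def data_resource_url(registry_url: str, uid: str) -> str:
    """Build the JSON URL for a data resource.

    Assumption: the '/ws/dataResource/<uid>.json' layout is inferred from the
    URL in the stack trace, not taken from the biocache-store source.
    """
    return f"{normalise_registry_url(registry_url)}/ws/dataResource/{uid}.json"
```

With this, both http://192.168.15.132:8080/collectory and http://192.168.15.132:8080/collectory/ws produce the same dr3.json URL with a single "/ws", whereas plain string concatenation on the second value would give the doubled "/ws/ws" path seen in the trace.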
As I've mentioned in previous threads, we have some work to do at the ALA to have better versioning of components and we'll be posting an update on this to this list soon.
Regarding the DwC-A error - I'm not sure what the issue is here. I suggest posting the meta.xml. There's a tool here (that I haven't tested recently) for validating an archive:
http://tools.gbif.org/dwca-validator/
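If it helps before posting, here is a rough Python sketch for checking locally what meta.xml declares for the core file and what the first line of that file looks like. The element and attribute names (core, files/location, fieldsTerminatedBy, fieldsEnclosedBy, ignoreHeaderLines) come from the Darwin Core text guide; describe_core is a hypothetical helper, not part of any ALA or GBIF tool. It also assumes meta.xml should sit at the top level of the zip; a meta.xml nested inside a folder is one possibility worth ruling out, though I can't say it is the cause here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by Darwin Core Archive metafiles (meta.xml).
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def describe_core(archive):
    """Return what meta.xml declares for the core data file, plus its first line.

    `archive` is a path or a binary file-like object containing the zip.
    """
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        if "meta.xml" not in names:
            raise FileNotFoundError("meta.xml is not at the top level of the archive")
        core = ET.fromstring(zf.read("meta.xml")).find(f"{DWC_TEXT_NS}core")
        if core is None:
            raise ValueError("meta.xml has no <core> element")
        location = core.findtext(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location")
        first_line = None
        if location in names:
            # Read only the first line of the core data file.
            with zf.open(location) as fh:
                raw = fh.readline()
            first_line = raw.decode(core.get("encoding") or "UTF-8").rstrip("\r\n")
    return {
        "location": location,
        "fieldsTerminatedBy": core.get("fieldsTerminatedBy"),
        "fieldsEnclosedBy": core.get("fieldsEnclosedBy"),
        "ignoreHeaderLines": core.get("ignoreHeaderLines"),
        "first_line": first_line,
    }
```

Calling describe_core("dwca-teste3.zip") returns a dict you can print. Comparing fieldsTerminatedBy with the separators actually visible in first_line should show quickly whether the declared delimiter and the data agree.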
Cheers
Dave Martin ALA
________________________________
From: ala-portal-bounces@lists.gbif.org [ala-portal-bounces@lists.gbif.org] on behalf of Daniel Lins [daniel.lins@gmail.com]
Sent: 25 March 2014 17:42
To: ala-portal@lists.gbif.org
Subject: [Ala-portal] DwC-A Loading and Biocache Methods.
--
Daniel Lins da Silva
(Cell) 55 11 96144-4050
Research Center on Biodiversity and Computing (Biocomp)
University of Sao Paulo, Brazil
daniellins@usp.br
daniel.lins@gmail.com