[Ala-portal] DwC-A Loading and Biocache Methods.

David.Martin at csiro.au David.Martin at csiro.au
Tue Mar 25 09:29:24 CET 2014


Thanks Daniel.

Which svn revision of biocache-store are you working against? I think the problems you have with bad URLs are fixed on the current trunk. Note that the registryUrl property in your external configuration shouldn't have a "/ws" suffix.
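
To illustrate why the suffix matters, here is a minimal sketch in Python (it is not the biocache-store code). It assumes the resource URL is built by appending "/ws/dataResource/<uid>.json" to registryUrl, which is consistent with the ".../collectory/ws/ws/dataResource/dr3.json" URL in the describe error quoted below. Both function names are hypothetical.

```python
def data_resource_url(registry_url: str, uid: str) -> str:
    """Build the JSON URL for a data resource.

    Assumption: "/ws/dataResource/<uid>.json" is appended to the
    configured registry URL (inferred from the quoted error, not
    taken from the biocache-store source).
    """
    return registry_url.rstrip("/") + "/ws/dataResource/" + uid + ".json"


def normalise_registry_url(registry_url: str) -> str:
    """Drop a trailing slash and a trailing "/ws" from the configured value."""
    url = registry_url.rstrip("/")
    if url.endswith("/ws"):
        url = url[: -len("/ws")]
    return url


if __name__ == "__main__":
    with_suffix = "http://192.168.15.132:8080/collectory/ws"
    # The "/ws" segment is doubled when the suffix is already configured.
    print(data_resource_url(with_suffix, "dr3"))
    # A single "/ws" segment once the suffix is removed.
    print(data_resource_url(normalise_registry_url(with_suffix), "dr3"))
```

Under that assumption, the configured value with "/ws" produces the doubled path, and the value without it produces a URL with a single "/ws" segment.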

As I've mentioned in previous threads, we have some work to do at the ALA to improve the versioning of components, and we'll post an update on this to the list soon.

Regarding the DwC-A error: I'm not sure what the issue is here. I suggest posting the meta.xml. There's a tool here (which I haven't tested recently) for validating an archive:

http://tools.gbif.org/dwca-validator/
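
Before posting the meta.xml, it may also help to look at it locally. The following is a rough sketch using only the Python standard library. It checks whether meta.xml sits at the top level of the zip and prints the delimiter attributes declared for the core and extension files. The function name read_delimiters is hypothetical, and the sketch assumes the standard Darwin Core text namespace (http://rs.tdwg.org/dwc/text/). It does not replace the validator.

```python
import sys
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by meta.xml in Darwin Core Archives.
NS = "{http://rs.tdwg.org/dwc/text/}"


def read_delimiters(zip_file):
    """Return one dict per <core>/<extension> element declared in meta.xml."""
    with zipfile.ZipFile(zip_file) as archive:
        names = archive.namelist()
        if "meta.xml" not in names:
            # A meta.xml inside a sub-folder is reported separately.
            nested = [n for n in names if n.endswith("/meta.xml")]
            raise FileNotFoundError(
                "meta.xml is not at the archive root; nested copies: %s" % nested
            )
        root = ET.fromstring(archive.read("meta.xml"))

    entries = []
    for element in root:
        if element.tag not in (NS + "core", NS + "extension"):
            continue
        entries.append({
            "kind": element.tag[len(NS):],
            "location": element.findtext(NS + "files/" + NS + "location"),
            "fieldsTerminatedBy": element.get("fieldsTerminatedBy"),
            "linesTerminatedBy": element.get("linesTerminatedBy"),
            "fieldsEnclosedBy": element.get("fieldsEnclosedBy"),
            "ignoreHeaderLines": element.get("ignoreHeaderLines"),
        })
    return entries


if __name__ == "__main__" and len(sys.argv) > 1:
    for entry in read_delimiters(sys.argv[1]):
        print(entry)
```

Saved under a name of your choice (check_dwca.py here is only an example), it could be run as "python check_dwca.py dwca-teste3.zip". A missing or nested meta.xml, or delimiter attributes that do not match the data files, would be worth mentioning when you post.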

Cheers

Dave Martin
ALA

________________________________
From: ala-portal-bounces at lists.gbif.org [ala-portal-bounces at lists.gbif.org] on behalf of Daniel Lins [daniel.lins at gmail.com]
Sent: 25 March 2014 17:42
To: ala-portal at lists.gbif.org
Subject: [Ala-portal] DwC-A Loading and Biocache Methods.

Hi guys,

I am having trouble loading DwC-A files into the biocache store. The error occurs while the archive is being loaded (the error message is shown below).

$ sudo java -cp .:biocache.jar au.org.ala.util.DwCALoader dr3 -l dwca-teste3.zip
2014-03-25 02:40:50,769 INFO : [ConfigModule] - Loading configuration from /data/biocache/config/biocache-config.properties
2014-03-25 02:40:51,031 INFO : [ConfigModule] - Initialise SOLR
2014-03-25 02:40:51,035 INFO : [ConfigModule] - Initialise name matching indexes
2014-03-25 02:40:51,539 INFO : [ConfigModule] - Initialise persistence manager
2014-03-25 02:40:51,541 INFO : [ConfigModule] - Configure complete
Loading archive dwca-teste3.zip for resource dr3 with unique terms List(dwc:occurrenceID) stripping spaces false incremental false testing false
Exception in thread "main" org.gbif.dwc.text.UnkownDelimitersException: Unable to detect field delimiter
at org.gbif.file.CSVReaderFactory.buildArchiveFile(CSVReaderFactory.java:129)
at org.gbif.file.CSVReaderFactory.build(CSVReaderFactory.java:46)
at org.gbif.dwc.text.ArchiveFactory.readFileHeaders(ArchiveFactory.java:344)
at org.gbif.dwc.text.ArchiveFactory.openArchive(ArchiveFactory.java:289)
at au.org.ala.util.DwCALoader.loadArchive(DwCALoader.scala:129)
at au.org.ala.util.DwCALoader.loadLocal(DwCALoader.scala:106)
at au.org.ala.util.DwCALoader$.main(DwCALoader.scala:52)
at au.org.ala.util.DwCALoader.main(DwCALoader.scala)

I tried one DwC-A file with a comma delimiter and another with a tab delimiter, but both files produced the same error.

During the tests, I replaced the biocache.jar file with the other .jar file sent by Natasha in a previous email (http://maven.ala.org.au/repository/au/org/ala/biocache-store/1.0-SNAPSHOT/biocache-store-1.0-SNAPSHOT-assembly.jar), but I had the same problem.

I also found that the list and describe commands of the biocache tool produce errors.

These problems may be related to the registryURL property. With registryURL=http://192.168.15.132:8080/collectory, I get the errors below:

biocache> list
2014-03-25 03:09:24,305 INFO : [ConfigModule] - Loading configuration from /data/biocache/config/biocache-config.properties
2014-03-25 03:09:24,586 INFO : [ConfigModule] - Initialise SOLR
2014-03-25 03:09:24,591 INFO : [ConfigModule] - Initialise name matching indexes
2014-03-25 03:09:25,133 INFO : [ConfigModule] - Initialise persistence manager
2014-03-25 03:09:25,143 INFO : [ConfigModule] - Configure complete
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.followRedirect(HttpURLConnection.java:2398)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1557)
at java.net.URL.openStream(URL.java:1037)
at scala.io.Source$.fromURL(Source.scala:140)
at scala.io.Source$.fromURL(Source.scala:130)
at au.org.ala.util.Loader.printResourceList(Loader.scala:52)
at au.org.ala.util.CMD$.executeCommand(CommandLineTool.scala:78)
at au.org.ala.util.CommandLineTool$.main(CommandLineTool.scala:26)
at au.org.ala.util.CommandLineTool.main(CommandLineTool.scala)

biocache> describe dr3
UID: dr3
This data resource was last checked None
Protocol: DwCA
URL: http://192.168.15.132:8080/collectory/upload/1395634757631/dwca-teste3.zip
Unique terms: occurrenceID
url: http://192.168.15.132:8080/collectory/upload/1395634757631/dwca-teste3.zip
java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.String
at au.org.ala.util.Loader$$anonfun$describeResource$1$$anonfun$apply$1.apply(Loader.scala:45)
at au.org.ala.util.Loader$$anonfun$describeResource$1$$anonfun$apply$1.apply(Loader.scala:45)
at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at au.org.ala.util.Loader$$anonfun$describeResource$1.apply(Loader.scala:45)
at au.org.ala.util.Loader$$anonfun$describeResource$1.apply(Loader.scala:38)
at scala.collection.immutable.List.foreach(List.scala:309)
at au.org.ala.util.Loader.describeResource(Loader.scala:38)
at au.org.ala.util.CMD$.executeCommand(CommandLineTool.scala:77)
at au.org.ala.util.CommandLineTool$.main(CommandLineTool.scala:26)
at au.org.ala.util.CommandLineTool.main(CommandLineTool.scala)

With registryURL=http://192.168.15.132:8080/collectory/ws, I get the errors below:

biocache> list
2014-03-25 03:22:52,278 INFO : [ConfigModule] - Loading configuration from /data/biocache/config/biocache-config.properties
2014-03-25 03:22:52,526 INFO : [ConfigModule] - Initialise SOLR
2014-03-25 03:22:52,531 INFO : [ConfigModule] - Initialise name matching indexes
2014-03-25 03:22:53,051 INFO : [ConfigModule] - Initialise persistence manager
2014-03-25 03:22:53,060 INFO : [ConfigModule] - Configure complete
 -----------------------------------------------------------------------------------------------------------------------
 | name                                              | uri                                                       | uid |
 |---------------------------------------------------------------------------------------------------------------------|
 | Teste 10                                          | http://192.168.15.132:8080/collectory/ws/dataResource/dr4 | dr4 |
 | Teste 11                                          | http://192.168.15.132:8080/collectory/ws/dataResource/dr0 | dr0 |
 | Teste 12                                          | http://192.168.15.132:8080/collectory/ws/dataResource/dr3 | dr3 |
 | Teste 2                                           | http://192.168.15.132:8080/collectory/ws/dataResource/dr2 | dr2 |
 | Teste 3                                           | http://192.168.15.132:8080/collectory/ws/dataResource/dr1 | dr1 |
 -----------------------------------------------------------------------------------------------------------------------

biocache> describe dr3
java.io.FileNotFoundException: http://192.168.15.132:8080/collectory/ws/ws/dataResource/dr3.json
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
at java.net.URL.openStream(URL.java:1037)
at scala.io.Source$.fromURL(Source.scala:140)
at scala.io.Source$.fromURL(Source.scala:130)
at au.org.ala.biocache.DataLoader$class.getDataResourceDetailsAsMap(dataimport.scala:99)
at au.org.ala.util.Loader.getDataResourceDetailsAsMap(Loader.scala:31)
at au.org.ala.biocache.DataLoader$class.retrieveConnectionParameters(dataimport.scala:116)
at au.org.ala.util.Loader.retrieveConnectionParameters(Loader.scala:31)
at au.org.ala.util.Loader$$anonfun$describeResource$1.apply(Loader.scala:39)
at au.org.ala.util.Loader$$anonfun$describeResource$1.apply(Loader.scala:38)
at scala.collection.immutable.List.foreach(List.scala:309)
at au.org.ala.util.Loader.describeResource(Loader.scala:38)
at au.org.ala.util.CMD$.executeCommand(CommandLineTool.scala:77)
at au.org.ala.util.CommandLineTool$.main(CommandLineTool.scala:26)
at au.org.ala.util.CommandLineTool.main(CommandLineTool.scala)

How can I load DwC-A files? Are these problems related?

Thanks!

Cheers,


--
Daniel Lins da Silva
(Cell) 55 11 96144-4050
Research Center on Biodiversity and Computing (Biocomp)
University of Sao Paulo, Brazil
daniellins at usp.br
daniel.lins at gmail.com