[Ala-portal] DwC-A loading problems

Daniel Lins daniel.lins at gmail.com
Wed Jun 18 07:54:52 CEST 2014


Hi Dave,

How can I upgrade Biocache 1.0-SNAPSHOT to version 1.1? I updated
biocache-store (biocache.jar) and the config file
(/data/biocache/conf/config.properties-biocache), but I still have problems.
What other steps do I need to take? Apparently the new version of the
biocache configuration file directly impacts my Biocache-Services and Solr.
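
I noticed the file I edited is named differently from the one the tool
reports in the logs further down this thread ("Loading configuration from
/data/biocache/config/biocache-config.properties"), so part of my problem may
simply be that the 1.1 jar never sees my settings. This is the quick check I
used (paths as they appear in this thread; adjust for other hosts):

```shell
# Check which of the two config paths mentioned in this thread exist.
# Neither path may exist on another machine; adjust as needed.
OLD=/data/biocache/conf/config.properties-biocache
NEW=/data/biocache/config/biocache-config.properties
report=$(for f in "$OLD" "$NEW"; do
  if [ -f "$f" ]; then
    printf '%s: present\n' "$f"
  else
    printf '%s: missing\n' "$f"
  fi
done)
printf '%s\n' "$report"
```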

Will this update also impact other components?

I cannot use the Vagrant/Ansible-based installation process because our
environment is different and already has customizations. So I would like to
update the biocache with minimal impact, if possible. Afterwards we will
have to plan the update of the other components.

Can you advise me as to the best way forward?

Thanks!!

Regards,

Daniel Lins da Silva
(Mobile) 55 11 96144-4050
Research Center on Biodiversity and Computing (Biocomp)
University of Sao Paulo, Brazil
daniellins at usp.br
daniel.lins at gmail.com



2014-05-26 3:58 GMT-03:00 <David.Martin at csiro.au>:

>  Thanks Daniel.
>
>  I'd recommend upgrading to 1.1, and I'd recommend installing with the
> ansible scripts. This will give you a baseline configuration.
> The scripts can be tested on a local machine using vagrant.
> The configuration between 1.0 and 1.1 changed significantly - removal of
> redundant legacy properties and adoption of a standard format for property
> names.
> Here's the template used for the configuration file in the ansible scripts:
>
>
> https://github.com/gbif/ala-install/blob/master/ansible/roles/biocache-service/templates/config/biocache-config.properties
>
>  Cheers
>
>  Dave Martin
> ALA
>
>  ------------------------------
> *From:* Daniel Lins [daniel.lins at gmail.com]
> *Sent:* 26 May 2014 15:02
> *To:* Martin, Dave (CES, Black Mountain)
> *Cc:* ala-portal at lists.gbif.org; dos Remedios, Nick (CES, Black
> Mountain); Pedro Corrêa; Nicholls, Miles (CES, Black Mountain)
> *Subject:* Re: [Ala-portal] DwC-A loading problems
>
>   Hi Dave,
>
>  When I ran the ingest command (ingest dr0), the system showed errors
> like the ones below. However, after the error messages, I ran the index
> command (index dr0), and the data were published on the Portal.
>
>  2014-05-20 14:15:05,412 ERROR: [Grid] - cannot find GRID: /data/ala
> /data/layers/ready/diva/worldclim_bio_19
> 2014-05-20 14:15:05,414 ERROR: [Grid] - java.io.FileNotFoundException:
> /data/ala/data/layers/ready/diva/worldclim_bio_19.gri (No such file or
> directory)
> at java.io.RandomAccessFile.open(Native Method)
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:122)
> at org.ala.layers.intersect.Grid.getValues3(Grid.java:1017)
> at org.ala.layers.intersect.SamplingThread.intersectGrid(
> SamplingThread.java:112)
> at org.ala.layers.intersect.SamplingThread.sample(SamplingThread.java:97)
> at org.ala.layers.intersect.SamplingThread.run(SamplingThread.java:67)
>
>  2014-05-20 14:15:05,447 INFO : [Sampling] - ********* END - TEST BATCH
> SAMPLING FROM FILE ***************
> 2014-05-20 14:15:05,496 INFO : [Sampling] - Finished loading:
> /tmp/sampling-dr0.txt in 49ms
> 2014-05-20 14:15:05,496 INFO : [Sampling] - Removing temporary file:
> /tmp/sampling-dr0.txt
> 2014-05-20 14:15:05,553 INFO : [Consumer] - Initialising thread: 0
> 2014-05-20 14:15:05,575 INFO : [Consumer] - Initialising thread: 1
> 2014-05-20 14:15:05,575 INFO : [Consumer] - Initialising thread: 2
> 2014-05-20 14:15:05,577 INFO : [Consumer] - In thread: 0
> 2014-05-20 14:15:05,579 INFO : [Consumer] - Initialising thread: 3
> 2014-05-20 14:15:05,579 INFO : [ProcessWithActors] - Starting with dr0|
> endingwith dr0|~
> 2014-05-20 14:15:05,581 INFO : [Consumer] - In thread: 2
> 2014-05-20 14:15:05,581 INFO : [Consumer] - In thread: 1
> 2014-05-20 14:15:05,584 INFO : [Consumer] - In thread: 3
> 2014-05-20 14:15:05,592 INFO : [ProcessWithActors] - Initialised actors...
> 2014-05-20 14:15:05,647 INFO : [ProcessWithActors] - First rowKey
> processed: dr0|urn:lsid:icmbio.gov.br:icmbio.parnaso.occurrence:MA120999
> 2014-05-20 14:15:05,998 INFO : [ProcessWithActors] - Last row key
> processed: dr0|urn:lsid:icmbio.gov.br:icmbio.parnaso.occurrence:MA99991
> 2014-05-20 14:15:06,006 INFO : [ProcessWithActors] - Finished.
> 2014-05-20 14:15:06,015 INFO : [AttributionDAO] - Calling web service for
> dr0
> 2014-05-20 14:15:06,017 INFO : [Consumer] - Killing (Actor.act) thread: 3
> 2014-05-20 14:15:06,016 INFO : [Consumer] - Killing (Actor.act) thread: 2
> 2014-05-20 14:15:06,015 INFO : [Consumer] - Killing (Actor.act) thread: 1
> 2014-05-20 14:15:06,289 INFO : [AttributionDAO] - Looking up collectory
> web service for ICMBIO|PARNASO
> May 20, 2014 2:15:10 PM org.geotools.referencing.factory.epsg.ThreadedEpsgFactory
> <init>
> INFO: Setting the EPSG factory org.geotools.referencing.factory.epsg.DefaultFactory
> to a 1800000ms timeout
> May 20, 2014 2:15:10 PM org.geotools.referencing.factory.epsg.ThreadedEpsgFactory
> <init>
> INFO: Setting the EPSG factory org.geotools.referencing.factory.epsg.ThreadedHsqlEpsgFactory
> to a 1800000ms timeout
> May 20, 2014 2:15:10 PM org.geotools.referencing.factory.epsg.ThreadedHsqlEpsgFactory
> createDataSource
> INFO: Building new data source for org.geotools.referencing.factory.epsg.
> ThreadedHsqlEpsgFactory
> May 20, 2014 2:15:10 PM org.geotools.referencing.factory.epsg.ThreadedHsqlEpsgFactory
> createBackingStore
> INFO: Building backing store for org.geotools.referencing.factory.epsg.
> ThreadedHsqlEpsgFactory
> 2014-05-20 14:15:32,105 INFO : [Consumer] - Killing (Actor.act) thread: 0
> Indexing live with URL: null, and params: null&dataResource=dr0
> java.lang.NullPointerException
> at
> au.org.ala.util.CMD$.au$org$ala$util$CMD$$indexDataResourceLive$1(CommandLineTool.scala:371)
> at
> au.org.ala.util.CMD$$anonfun$executeCommand$2.apply(CommandLineTool.scala:90)
> at
> au.org.ala.util.CMD$$anonfun$executeCommand$2.apply(CommandLineTool.scala:86)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.
> scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:105)
> at au.org.ala.util.CMD$.executeCommand(CommandLineTool.scala:86)
> at au.org.ala.util.CommandLineTool$.main(CommandLineTool.scala:26)
> at au.org.ala.util.CommandLineTool.main(CommandLineTool.scala)
>
>
>
>  We currently use *biocache-1.0-SNAPSHOT* in our environment, but in
> your last mail you mentioned the version *biocache-1.1-assembly*.
>
>  I downloaded this newer version, but when I ran it (ingest dr0) in
> our environment the system showed many errors (see below).
>
>
>  log4j:WARN custom level class [org.ala.client.appender.RestLevel] not
> found.
>   Exception in thread "main" java.lang.ExceptionInInitializerError
> at au.org.ala.biocache.load.DataLoader$class.$init$(DataLoader.scala:28)
> at au.org.ala.biocache.load.Loader.<init>(Loader.scala:34)
> at au.org.ala.biocache.cmd.CMD$.executeCommand(CMD.scala:29)
> at au.org.ala.biocache.cmd.CommandLineTool$.main(CommandLineTool.scala:22)
> at au.org.ala.biocache.cmd.CommandLineTool.main(CommandLineTool.scala)
> Caused by: com.google.inject.CreationException: Guice creation errors:
>
>  1) No implementation for java.lang.Integer annotated with @com.google.
> inject.name.Named(value=cassandra.max.connections) was bound.
>   while locating java.lang.Integer annotated with @com.google.inject.name.
> Named(value=cassandra.max.connections)
>     for parameter 4 at
> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init
> >(CassandraPersistenceManager.scala:24)
>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>
>  2) No implementation for java.lang.Integer annotated with @com.google.
> inject.name.Named(value=cassandra.max.retries) was bound.
>   while locating java.lang.Integer annotated with @com.google.inject.name.
> Named(value=cassandra.max.retries)
>     for parameter 5 at
> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init
> >(CassandraPersistenceManager.scala:24)
>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>
>  3) No implementation for java.lang.Integer annotated with @com.google.
> inject.name.Named(value=cassandra.port) was bound.
>   while locating java.lang.Integer annotated with @com.google.inject.name.
> Named(value=cassandra.port)
>     for parameter 1 at
> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init
> >(CassandraPersistenceManager.scala:24)
>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>
>  4) No implementation for java.lang.Integer annotated with @com.google.
> inject.name.Named(value=thrift.operation.timeout) was bound.
>   while locating java.lang.Integer annotated with @com.google.inject.name.
> Named(value=thrift.operation.timeout)
>     for parameter 6 at
> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init
> >(CassandraPersistenceManager.scala:24)
>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>
>  5) No implementation for java.lang.String annotated with @com.google.
> inject.name.Named(value=cassandra.hosts) was bound.
>   while locating java.lang.String annotated with @com.google.inject.name.
> Named(value=cassandra.hosts)
>     for parameter 0 at
> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init
> >(CassandraPersistenceManager.scala:24)
>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>
>  6) No implementation for java.lang.String annotated with @com.google.
> inject.name.Named(value=cassandra.keyspace) was bound.
>   while locating java.lang.String annotated with @com.google.inject.name.
> Named(value=cassandra.keyspace)
>     for parameter 3 at
> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init
> >(CassandraPersistenceManager.scala:24)
>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>
>  7) No implementation for java.lang.String annotated with @com.google.
> inject.name.Named(value=cassandra.pool) was bound.
>   while locating java.lang.String annotated with @com.google.inject.name.
> Named(value=cassandra.pool)
>     for parameter 2 at
> au.org.ala.biocache.persistence.CassandraPersistenceManager.<init
> >(CassandraPersistenceManager.scala:24)
>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:184)
>
>  8) No implementation for java.lang.String annotated with @com.google.
> inject.name.Named(value=exclude.sensitive.values) was bound.
>   while locating java.lang.String annotated with @com.google.inject.name.
> Named(value=exclude.sensitive.values)
>     for parameter 1 at au.org.ala.biocache.index.SolrIndexDAO.<init
> >(SolrIndexDAO.scala:28)
>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:164)
>
>  9) No implementation for java.lang.String annotated with @com.google.
> inject.name.Named(value=extra.misc.fields) was bound.
>   while locating java.lang.String annotated with @com.google.inject.name.
> Named(value=extra.misc.fields)
>     for parameter 2 at au.org.ala.biocache.index.SolrIndexDAO.<init
> >(SolrIndexDAO.scala:28)
>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:164)
>
>  10) No implementation for java.lang.String annotated with @com.google.
> inject.name.Named(value=solr.home) was bound.
>   while locating java.lang.String annotated with @com.google.inject.name.
> Named(value=solr.home)
>     for parameter 0 at au.org.ala.biocache.index.SolrIndexDAO.<init
> >(SolrIndexDAO.scala:28)
>   at au.org.ala.biocache.ConfigModule.configure(Config.scala:164)
>
>  10 errors
> at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(
> Errors.java:354)
> at com.google.inject.InjectorBuilder.initializeStatically(
> InjectorBuilder.java:152)
> at com.google.inject.InjectorBuilder.build(InjectorBuilder.java:105)
> at com.google.inject.Guice.createInjector(Guice.java:92)
> at com.google.inject.Guice.createInjector(Guice.java:69)
> at com.google.inject.Guice.createInjector(Guice.java:59)
> at au.org.ala.biocache.Config$.<init>(Config.scala:24)
> at au.org.ala.biocache.Config$.<clinit>(Config.scala)
> ... 5 more
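>
>  If I read the ten "No implementation ... was bound" errors above
> correctly, these are the property keys my 1.1 config has to define. Is a
> fragment along these lines what the new format expects? The values below
> are placeholders, not our real settings; I assume the ansible template
> you linked carries the canonical names and defaults:

```properties
cassandra.hosts=localhost
cassandra.port=9160
cassandra.pool=biocache-store-pool
cassandra.keyspace=occ
cassandra.max.connections=-1
cassandra.max.retries=6
thrift.operation.timeout=8000
exclude.sensitive.values=false
extra.misc.fields=
solr.home=/data/solr/biocache
```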
>
>
>  Regarding these specific issues (data update and incremental load), will
> I need to upgrade the biocache (to 1.1 or newer), or can I keep working
> with version 1.0-SNAPSHOT? If I upgrade, will it remain compatible
> with the other components? How should I proceed?
>
> Which layer files should I include in my environment to run these tests?
>
>  Thanks!
>
>  Regards,
>
>   Daniel Lins da Silva
> (Mobile) 55 11 96144-4050
>  Research Center on Biodiversity and Computing (Biocomp)
> University of Sao Paulo, Brazil
>  daniellins at usp.br
> daniel.lins at gmail.com
>
>
> 2014-05-09 2:14 GMT-03:00 <David.Martin at csiro.au>:
>
>>  Thanks Daniel.
>>
>>  I've spotted the problem:
>>
>>  java -cp .:biocache.jar au.org.ala.util.DwcCSVLoader dr0 -l
>> dataset-updated.csv -b true
>>
>>  this bypasses lookups against the collectory for the metadata.
>>
>>  To load this dataset, you can use the biocache commandline tool like so:
>>
>>   $ java -cp /usr/lib/biocache:/usr/lib/biocache/biocache-store-1.1-assembly.jar \
>>       -Xms2g -Xmx2g au.org.ala.biocache.cmd.CommandLineTool
>>
>>
>>  ----------------------------
>>
>> | Biocache management tool |
>>
>> ----------------------------
>>
>> Please supply a command or hit ENTER to view command list.
>>
>> biocache> ingest dr8
>>
>>  This will:
>>
>>  1) Retrieve the metadata from the configured instance of the collectory
>> 2) Load, process, sample (if there are layers configured and available)
>> and index
>>
>>  Cheers
>>
>>  Dave
>>
>>   ------------------------------
>> *From:* Daniel Lins [daniel.lins at gmail.com]
>> *Sent:* 09 May 2014 14:27
>>
>> *To:* Martin, Dave (CES, Black Mountain)
>> *Cc:* ala-portal at lists.gbif.org; dos Remedios, Nick (CES, Black
>> Mountain); Pedro Corrêa; Nicholls, Miles (CES, Black Mountain)
>> *Subject:* Re: [Ala-portal] DwC-A loading problems
>>
>>    David,
>>
>>  The dr0 configuration:
>>
>>  https://www.dropbox.com/s/lsy11jadwmyghjj/collectoryConfig1.png
>>
>>  Sorry, but this server doesn't have external access yet.
>>
>>
>>
>>
>> 2014-05-09 1:06 GMT-03:00 <David.Martin at csiro.au>:
>>
>>>  As an example of what it should look like, see:
>>>
>>>
>>> http://ala-demo.gbif.org/collectory/dataResource/edit/dr8?page=contribution
>>>
>>>
>>>  ------------------------------
>>> *From:* Daniel Lins [daniel.lins at gmail.com]
>>>
>>> *Sent:* 09 May 2014 13:59
>>> *To:* Martin, Dave (CES, Black Mountain)
>>> *Cc:* ala-portal at lists.gbif.org; dos Remedios, Nick (CES, Black
>>> Mountain); Pedro Corrêa; Nicholls, Miles (CES, Black Mountain)
>>>  *Subject:* Re: [Ala-portal] DwC-A loading problems
>>>
>>>    Thanks David,
>>>
>>>  We use the DwC term "occurrenceID" to identify the records. It's a
>>> unique key.
>>>
>>>  However, when I reload a dataset to update some DwC terms of the
>>> records, the system duplicates the data (it keeps the old record and
>>> creates another with the changes).
>>>
>>>  For instance (update of locality).
>>>
>>>  Load 1 ($ java -cp .:biocache.jar au.org.ala.util.DwcCSVLoader dr0 -l
>>> dataset.csv -b true)
>>>
>>>  {OccurrenceID: 1, municipality: Sao Paulo, ...},
>>> {OccurrenceID: 2, municipality: Sao Paulo, ...}
>>>
>>>  Process 1 (biocache$ process dr0)
>>> Index 1 (biocache$ index dr0)
>>>
>>>  Load 2 (updated records and new records) (($ java -cp .:biocache.jar
>>> au.org.ala.util.DwcCSVLoader dr0 -l dataset-updated.csv -b true)
>>>
>>>  {OccurrenceID: 1, municipality: Rio de Janeiro, ...},
>>> {OccurrenceID: 2, municipality: Rio de Janeiro, ...},
>>>  {OccurrenceID: 3, municipality: Sao Paulo, ...}
>>>
>>>  Process 2 (biocache$ process dr0)
>>> Index 2 (biocache$ index dr0)
>>>
>>>  Results shown by ALA:
>>>
>>>  {OccurrenceID: 1, municipality: Sao Paulo, ...},
>>> {OccurrenceID: 2, municipality: Sao Paulo, ...},
>>>   {OccurrenceID: 1, municipality: Rio de Janeiro, ...},
>>> {OccurrenceID: 2, municipality: Rio de Janeiro, ...}
>>>  {OccurrenceID: 3, municipality: Sao Paulo, ...}
>>>
>>>  But I expected:
>>>
>>>  {OccurrenceID: 1, municipality: Rio de Janeiro, ...},
>>>  {OccurrenceID: 2, municipality: Rio de Janeiro, ...}
>>>  {OccurrenceID: 3, municipality: Sao Paulo, ...}
>>>
>>>  Do I need to delete the existing data (with the delete-resource
>>> function) before reloading? If not, what did I do wrong to cause this
>>> duplication?
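>>>
>>>  My working hypothesis: records seem to be stored under a row key built
>>> from the data resource uid plus the configured unique terms (something
>>> like dr0|<occurrenceID value>), so a reload only overwrites a record when
>>> that key comes out identical. The sketch below simulates that behaviour
>>> with a flat file standing in for the store; it is an illustration of my
>>> hypothesis, not the real biocache code:

```shell
# Simulate upsert-by-row-key with a flat file as the key-value store.
# The "<drUid>|<id>" key shape and all names here are illustrative only.
store=$(mktemp)
put() {
  grep -v "^$1 " "$store" > "$store.tmp" || true  # drop any existing row for this key
  mv "$store.tmp" "$store"
  echo "$1 $2" >> "$store"                        # write the new row
}
put "dr0|occ1" "municipality=SaoPaulo"
put "dr0|occ1" "municipality=RioDeJaneiro"   # same key: row is overwritten
put "dr0|cat123" "municipality=RioDeJaneiro" # key built from another term: extra row
rows=$(wc -l < "$store")
echo "rows in store: $rows"
rm -f "$store" "$store.tmp"
```

>>>  If that is right, the duplicates would mean the two loads did not build
>>> the key from the same terms.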
>>>
>>>  Thanks!
>>>
>>>
>>>  Regards,
>>>
>>>   Daniel Lins da Silva
>>>  (Mobile) 55 11 96144-4050
>>>  Research Center on Biodiversity and Computing (Biocomp)
>>> University of Sao Paulo, Brazil
>>>  daniellins at usp.br
>>>  daniel.lins at gmail.com
>>>
>>>
>>>
>>>
>>>
>>> 2014-05-07 0:46 GMT-03:00 <David.Martin at csiro.au>:
>>>
>>>>   Thanks Daniel. Natasha has now left the ALA.
>>>>
>>>>  The uniqueness of records is determined by information stored in the
>>>> collectory. See screenshot [1].
>>>> By default, "catalogNumber" is used but you can change this to any
>>>> number of fields that should be stable in the data.
>>>> Using unstable fields for the ID isn't recommended (e.g. scientificName).
>>>>  To update the records, the process is to just re-load the dataset.
>>>>
>>>>  Automatically loaded - this isn't in use and we may remove it from the
>>>> UI in future iterations.
>>>> Incremental Load - affects the sample/process/index steps to only run
>>>> these against the new records.  Load is always incremental based on the key
>>>> field(s) but if the incremental load box isn’t checked it runs the
>>>> sample/process/index steps against the whole data set. This can cause a
>>>> large processing overhead when there’s a minor update to a large data set.
>>>>
>>>>  Cheers
>>>>
>>>>  Dave Martin
>>>>  ALA
>>>>
>>>>  [1] http://bit.ly/1g72HFN
>>>>
>>>>  ------------------------------
>>>> *From:* Daniel Lins [daniel.lins at gmail.com]
>>>> *Sent:* 05 May 2014 15:39
>>>> *To:* Quimby, Natasha (CES, Black Mountain)
>>>> *Cc:* ala-portal at lists.gbif.org; dos Remedios, Nick (CES, Black
>>>> Mountain); Martin, Dave (CES, Black Mountain); Pedro Corrêa
>>>> *Subject:* Re: [Ala-portal] DwC-A loading problems
>>>>
>>>>     Hi Natasha,
>>>>
>>>>  I managed to import the DwC-A file following the steps reported in
>>>> the previous email. Thank you!
>>>>
>>>>  However, when I tried to update some metadata of an occurrence record
>>>> (already stored in the database), the system created a new record with
>>>> the duplicated information. So I started to have several records with the
>>>> same occurrenceID (I did configure the data resource to use
>>>> "occurrenceID" to uniquely identify a record).
>>>>
>>>>  How can I update existing records in the database? For instance, the
>>>> location metadata of an occurrence record?
>>>>
>>>>  I also would like to better understand the behavior of the properties
>>>> "Automatically loaded" and "Incremental Load".
>>>>
>>>>  Thanks!!
>>>>
>>>>  Regards,
>>>>
>>>>   Daniel Lins da Silva
>>>>  (Mobile) 55 11 96144-4050
>>>>  Research Center on Biodiversity and Computing (Biocomp)
>>>> University of Sao Paulo, Brazil
>>>>  daniellins at usp.br
>>>>  daniel.lins at gmail.com
>>>>
>>>>
>>>> 2014-04-28 3:52 GMT-03:00 Daniel Lins <daniel.lins at gmail.com>:
>>>>
>>>>> Thanks Natasha!
>>>>>
>>>>>  I will try your recommendations. Once finished, I will contact you.
>>>>>
>>>>>  Regards
>>>>>
>>>>>  Daniel Lins da Silva
>>>>>  (Mobile) 55 11 96144-4050
>>>>>  Research Center on Biodiversity and Computing (Biocomp)
>>>>> University of Sao Paulo, Brazil
>>>>>  daniellins at usp.br
>>>>>  daniel.lins at gmail.com
>>>>>
>>>>>
>>>>>
>>>>>  2014-04-28 3:26 GMT-03:00 <Natasha.Quimby at csiro.au>:
>>>>>
>>>>>  Hi Daniel,
>>>>>>
>>>>>>  When you specify a local DwC-A load, the archive needs to be
>>>>>> unzipped. Try unzipping *2f676abc-4503-489e-8f0c-fcb6e1bc554b.zip* and
>>>>>> then running the following:
>>>>>>  *sudo java -cp .:biocache.jar au.org.ala.util.DwCALoader dr7 -l
>>>>>> /data/collectory/upload/1398658607824/2f676abc-4503-489e-8f0c-fcb6e1bc554b*
>>>>>>
>>>>>>  If you configure the collectory to provide the DwC-A, the biocache
>>>>>> automatically unzips the archive for you. You would need to configure dr7
>>>>>> with the following connection parameters:
>>>>>>
>>>>>>  "protocol":"DwCA",
>>>>>> "termsForUniqueKey":["occurrenceID"],
>>>>>> "url":"file:////data/collectory/upload/1398658607824/2f676abc-4503-489e-8f0c-fcb6e1bc554b.zip"
>>>>>>
>>>>>>  You could then load the resource with:
>>>>>>  *sudo java -cp .:biocache.jar au.org.ala.util.DwCALoader dr7*
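>>>>>>
>>>>>>  If the delimiter error persists even with the collectory-provided
>>>>>> archive, it is worth checking that the archive's meta.xml declares the
>>>>>> field delimiter explicitly; the reader only has to guess ("sniff") the
>>>>>> delimiter when it is not declared. A minimal sketch for a tab-delimited
>>>>>> core file -- the file name and column list here are illustrative, not
>>>>>> taken from your data:

```xml
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n"
        fieldsEnclosedBy="" ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/municipality"/>
  </core>
</archive>
```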
>>>>>>
>>>>>>  If you continue to have issues please let us know.
>>>>>>
>>>>>>  Hope that this helps.
>>>>>>
>>>>>>  Regards
>>>>>> Natasha
>>>>>>
>>>>>>   From: Daniel Lins <daniel.lins at gmail.com>
>>>>>> Date: Monday, 28 April 2014 3:54 PM
>>>>>> To: "ala-portal at lists.gbif.org" <ala-portal at lists.gbif.org>, "dos
>>>>>> Remedios, Nick (CES, Black Mountain)" <Nick.Dosremedios at csiro.au>,
>>>>>> "Martin, Dave (CES, Black Mountain)" <David.Martin at csiro.au>
>>>>>> Subject: [Ala-portal] DwC-A loading problems
>>>>>>
>>>>>>   Hi Nick and Dave,
>>>>>>
>>>>>>  We are having some problems in Biocache during the upload of DwC-A
>>>>>> files.
>>>>>>
>>>>>>  As shown below, after running "au.org.ala.util.DwCALoader",
>>>>>> our system returns the error message "Exception in thread "main"
>>>>>> org.gbif.dwc.text.UnkownDelimitersException: Unable to detect field
>>>>>> delimiter".
>>>>>>
>>>>>>  I ran tests using DwC-A archives with both tab-delimited and
>>>>>> comma-delimited text files. In both cases the error was the same.
>>>>>>
>>>>>>  What causes these problems? (The CSV loader works great.)
>>>>>>
>>>>>>  *tab-delimited file test*
>>>>>>
>>>>>>  poliusp at poliusp-VirtualBox:~/dev/biocache$ *sudo java -cp
>>>>>> .:biocache.jar au.org.ala.util.DwCALoader dr7 -l
>>>>>> /data/collectory/upload/1398658607824/2f676abc-4503-489e-8f0c-fcb6e1bc554b.zip*
>>>>>> 2014-04-28 01:44:02,837 INFO : [ConfigModule] - Loading
>>>>>> configuration from /data/biocache/config/biocache-config.properties
>>>>>> 2014-04-28 01:44:03,090 INFO : [ConfigModule] - Initialise SOLR
>>>>>> 2014-04-28 01:44:03,103 INFO : [ConfigModule] - Initialise name
>>>>>> matching indexes
>>>>>> 2014-04-28 01:44:03,605 INFO : [ConfigModule] - Initialise
>>>>>> persistence manager
>>>>>> 2014-04-28 01:44:03,606 INFO : [ConfigModule] - Configure complete
>>>>>> Loading archive /data/collectory
>>>>>> /upload/1398658607824/2f676abc-4503-489e-8f0c-fcb6e1bc554b.zip for
>>>>>> resource dr7 with unique terms List(dwc:occurrenceID) stripping
>>>>>> spaces false incremental false testing false
>>>>>> *Exception in thread "main"
>>>>>> org.gbif.dwc.text.UnkownDelimitersException: Unable to detect field
>>>>>> delimiter*
>>>>>>         at org.gbif.file.CSVReaderFactory.buildArchiveFile(
>>>>>> CSVReaderFactory.java:129)
>>>>>>         at org.gbif.file.CSVReaderFactory.build(CSVReaderFactory.java
>>>>>> :46)
>>>>>>         at org.gbif.dwc.text.ArchiveFactory.readFileHeaders(
>>>>>> ArchiveFactory.java:344)
>>>>>>         at org.gbif.dwc.text.ArchiveFactory.openArchive(
>>>>>> ArchiveFactory.java:289)
>>>>>>         at
>>>>>> au.org.ala.util.DwCALoader.loadArchive(DwCALoader.scala:129)
>>>>>>         at au.org.ala.util.DwCALoader.loadLocal(DwCALoader.scala:106)
>>>>>>         at au.org.ala.util.DwCALoader$.main(DwCALoader.scala:52)
>>>>>>         at au.org.ala.util.DwCALoader.main(DwCALoader.scala)
>>>>>>
>>>>>>
>>>>>>  *comma-delimited file test*
>>>>>>
>>>>>>  poliusp at poliusp-VirtualBox:~/dev/biocache$ *sudo java -cp
>>>>>> .:biocache.jar au.org.ala.util.DwCALoader dr7 -l ./dwca-teste3.zip*
>>>>>> 2014-04-28 01:56:04,683 INFO : [ConfigModule] - Loading
>>>>>> configuration from /data/biocache/config/biocache-config.properties
>>>>>> 2014-04-28 01:56:04,940 INFO : [ConfigModule] - Initialise SOLR
>>>>>> 2014-04-28 01:56:04,951 INFO : [ConfigModule] - Initialise name
>>>>>> matching indexes
>>>>>> 2014-04-28 01:56:05,437 INFO : [ConfigModule] - Initialise
>>>>>> persistence manager
>>>>>> 2014-04-28 01:56:05,438 INFO : [ConfigModule] - Configure complete
>>>>>> Loading archive ./dwca-teste3.zip for resource dr7 with unique terms
>>>>>> List(dwc:occurrenceID) stripping spaces false incremental false
>>>>>> testing false
>>>>>> *Exception in thread "main"
>>>>>> org.gbif.dwc.text.UnkownDelimitersException: Unable to detect field
>>>>>> delimiter*
>>>>>>         at org.gbif.file.CSVReaderFactory.buildArchiveFile(
>>>>>> CSVReaderFactory.java:129)
>>>>>>         at org.gbif.file.CSVReaderFactory.build(CSVReaderFactory.java
>>>>>> :46)
>>>>>>         at org.gbif.dwc.text.ArchiveFactory.readFileHeaders(
>>>>>> ArchiveFactory.java:344)
>>>>>>         at org.gbif.dwc.text.ArchiveFactory.openArchive(
>>>>>> ArchiveFactory.java:289)
>>>>>>         at
>>>>>> au.org.ala.util.DwCALoader.loadArchive(DwCALoader.scala:129)
>>>>>>         at au.org.ala.util.DwCALoader.loadLocal(DwCALoader.scala:106)
>>>>>>         at au.org.ala.util.DwCALoader$.main(DwCALoader.scala:52)
>>>>>>         at au.org.ala.util.DwCALoader.main(DwCALoader.scala)
>>>>>>
>>>>>>
>>>>>>  Thanks!
>>>>>>
>>>>>>  Regards.
>>>>>> --
>>>>>>  Daniel Lins da Silva
>>>>>> (Mobile) 55 11 96144-4050
>>>>>>  Research Center on Biodiversity and Computing (Biocomp)
>>>>>> University of Sao Paulo, Brazil
>>>>>>  daniellins at usp.br
>>>>>> daniel.lins at gmail.com
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Daniel Lins da Silva
>>>>> (Cel) 11 6144-4050
>>>>> daniel.lins at gmail.com
>>>>>
>>>>
>>>>
>>>>
>>>>  --
>>>> Daniel Lins da Silva
>>>> (Cel) 11 6144-4050
>>>> daniel.lins at gmail.com
>>>>
>>>
>>>
>>>
>>>  --
>>> Daniel Lins da Silva
>>> (Cel) 11 6144-4050
>>> daniel.lins at gmail.com
>>>
>>
>>
>>
>>  --
>> Daniel Lins da Silva
>> (Cel) 11 6144-4050
>> daniel.lins at gmail.com
>>
>
>
>
>  --
> Daniel Lins da Silva
> (Cel) 11 6144-4050
> daniel.lins at gmail.com
>



-- 
Daniel Lins da Silva
(Cel) 11 6144-4050
daniel.lins at gmail.com