[Ala-portal] Names Generator Issues

Allan Koch allan.kv at gmail.com
Tue Mar 18 14:44:49 CET 2014


Hi Natasha,

That's great. Thank you very much.
We are waiting for the DwC-A with the CoL list so that we can merge our names.
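
For illustration only, here is a minimal Python sketch of how extra names
could be appended to the taxon file of such an archive. It assumes a
tab-delimited file with the Darwin Core columns taxonID, parentNameUsageID,
scientificName and taxonRank (the real layout will be whatever the archive's
meta.xml declares), and that the "parentIds" mentioned in the quoted message
correspond to parentNameUsageID. The function names and the sample rows are
hypothetical placeholders, not part of any ALA tool.

```python
import csv
import io

# Assumed column layout; the real one is defined by the archive's meta.xml.
FIELDS = ["taxonID", "parentNameUsageID", "scientificName", "taxonRank"]


def merge_taxa(base_rows, extra_rows):
    """Return base_rows followed by extra_rows.

    Raises ValueError if an extra row reuses an existing taxonID or points
    at a parent that is not present in the merged list so far.
    """
    merged = list(base_rows)
    known_ids = {row["taxonID"] for row in merged}
    for row in extra_rows:
        if row["taxonID"] in known_ids:
            raise ValueError(f"duplicate taxonID: {row['taxonID']}")
        if row["parentNameUsageID"] not in known_ids:
            raise ValueError(f"unknown parent: {row['parentNameUsageID']}")
        merged.append(row)
        known_ids.add(row["taxonID"])
    return merged


def to_tsv(rows):
    """Serialise rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder rows, not real taxa.
    base = [{"taxonID": "g1", "parentNameUsageID": "",
             "scientificName": "Placeholdergenus", "taxonRank": "genus"}]
    extra = [{"taxonID": "s1", "parentNameUsageID": "g1",
              "scientificName": "Placeholdergenus examplespecies",
              "taxonRank": "species"}]
    print(to_tsv(merge_taxa(base, extra)))
```

Checking the parent identifiers before writing is only a convenience: it
catches rows that would not attach to the CoL hierarchy before the index is
built.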

We are excited about this news.
Thank you again,

Allan Koch Veiga

Núcleo de Pesquisa em Biodiversidade e Computação - BioComp
Laboratório de Automação Agrícola - LAA
Depto. de Engenharia de Computação e Sistemas Digitais - PCS
Engenharia Elétrica - Escola Politécnica da USP
Celular: +55 11 8401-2277
Email: allan.kv at usp.br

"*Stay hungry, stay foolish.*" Stewart Brand


2014-03-18 3:03 GMT-03:00 <Natasha.Quimby at csiro.au>:

>  Hi Allan,
>
>  We would provide a DwCA with the Catalogue of Life species in it.  Yes,
> you will need to add your species to this file in the same format. If you
> want your species to be merged into the Catalogue of life hierarchy you
> will need to provide appropriate parentIds.
>
>  We would provide a tool within the ala-name-matching (available as a jar
> file in our maven repository) to generate a list based on a DwCA. You would
> need to run the tool pointing at your modified DwCA.
>
>  We will let you know when this is available.
>
>  Hope that this all makes sense.
>
>  Regards
> Natasha
>
>   From: Allan Koch <allan.kv at gmail.com>
> Date: Tuesday, 18 March 2014 7:34 AM
> To: Natasha Carter <natasha.quimby at csiro.au>
> Cc: "Martin, Dave (CES, Black Mountain)" <David.Martin at csiro.au>, "
> Ala-portal at lists.gbif.org" <Ala-portal at lists.gbif.org>
>
> Subject: Re: [Ala-portal] Names Generator Issues
>
>    Hi Natasha,
>
>  It would be great. We have 109 species that we need to include in the
> current name matching index.
>
>  It would be great if you could send me a DwC-A with the CoL names. I
> would just need to add these 109 species to that archive in the same
> format, right?
>
>  How will this proposed solution work?
> Will you provide source code or a compiled program (JAR), or will we send
> you the DwC-A so that you can generate the Lucene index?
>
>  Thank you very much for helping.
>
>  Best regards,
>
>  Allan Koch Veiga
>
>
>
> 2014-03-17 3:12 GMT-03:00 <Natasha.Quimby at csiro.au>:
>
>>  Hi Allan,
>>
>>  The ala-name-generator is useful if you want to use the Australian
>> National Species list as the main source for your namematching index.  We
>> would not suggest using this to supplement the name matching index with
>> additional species.
>>
>>  In order to support custom species lists we are planning an enhancement
>> to generate the namematching index from a DarwinCore Archive. We would
>> envision that all the species would be provided as a single DWCA with the
>> attached meta.xml.   We think that this could be achieved in the 1-2 week
>> window that you mentioned.  We could provide a DWCA which contains
>> Catalogue of Life  as a basis for you to start with. You can then add
>> additional names to the DWCA as you please. Do you think that this would
>> suit your needs?
>>
>>  Regards
>> Natasha
>>
>>  From: Allan Koch <allan.kv at gmail.com>
>> Date: Saturday, 15 March 2014 4:19 AM
>> To: "Martin, Dave (CES, Black Mountain)" <David.Martin at csiro.au>
>>
>> Cc: "Ala-portal at lists.gbif.org" <Ala-portal at lists.gbif.org>
>> Subject: Re: [Ala-portal] Names Generator Issues
>>
>>  Thank you for the quick answer, David and Tim.
>>
>> We have studied the process to create the namematching index based on the
>> National List of Australia and I see that reproducing the same process to
>> create a new National List is quite complex.
>>
>> But, for now, we only need to include some names that aren't
>> included in the current namematching index.
>>
>> If we understood correctly, we first need to run the Names Generator with
>> a set of CSVs from APNI, APC and AFD as input. Based on the output of the
>> Names Generator, we then run the Name Matching to create the Lucene index,
>> right?
>>
>> If that is right, it would be great if we could execute this same standard
>> process, but with modified input CSVs that include our set of names
>> (in the same format).
>>
>> In the future (after these 3 months) we can study the possibility of
>> generating our complete National List.
>>
>> But for now, we need to include a set of names in the namematching index.
>> Could this be done in a short time, in one or two weeks?
>>
>>  Best regards,
>>
>>  Allan Koch Veiga
>>
>>
>>
>> 2014-03-12 23:15 GMT-03:00 <David.Martin at csiro.au>:
>>
>>>  Thanks Allan, Paulo, Tim.
>>>
>>>  We appreciate your efforts in setting this software up locally, and
>>> thanks for emailing the list.
>>>
>>>  1) Versioning
>>>
>>>  While we are on the track of making this software re-usable by other
>>> projects/organisations, it is still very early days. Versioning and
>>> packaging are things that we need to tackle properly in the 3 month
>>> evaluation period [2] and we are working with GBIF on the best approach
>>> here (see Tim's email regarding Ansible). To date, the ALA environment
>>> itself is the only place these components are used in production and we
>>> manage these closely ourselves. We haven't had a need to tightly version
>>> components, but as other projects become reliant we need to do this
>>> properly. At this point in time, I'd recommend ignoring developments on
>>> branches within SVN.
>>>
>>>  2) ala-name-generator
>>>
>>>  We didn't anticipate that other projects would be using the
>>> ala-name-generator code at this stage (or at all), and instead would rely
>>> on the Catalogue of Life names lucene index we've produced [1]. The
>>> ala-name-generator code as it currently stands isn't suitable for use outside
>>> the Australian context. It is dealing with some of the quirks of Australian
>>> species lists and merging some elements from different sources. We should
>>> have marked wikis to that effect.
>>>
>>>  That said, we appreciate the need for other projects to use their own
>>> taxonomic checklists. This was something I'd hope we tackle in the 3 month
>>> evaluation period [2]. There are a few potential approaches here we are
>>> exploring and we'll email this list soon with some progress on this front.
>>> I suggest in the meantime, projects make use of the existing index [1].
>>>
>>>  Thanks again,
>>>
>>>  Dave Martin
>>> ALA
>>>
>>>  [1] http://biocache.ala.org.au/archives/col_namematching.tgz
>>> [2] See GBIF's email sent 21st Feb 2014 - "Biodiversity data portals:
>>> Using the ALA tooling"
>>>
>>>
>>>  From: "Tim Robertson [GBIF]" <trobertson at gbif.org>
>>> Date: Thursday, 13 March 2014 2:16 am
>>> To: Paulo André <pfilipak at gmail.com>
>>> Cc: "Ala-portal at lists.gbif.org" <Ala-portal at lists.gbif.org>
>>> Subject: Re: [Ala-portal] Names Generator Issues
>>>
>>>  Hi Paulo
>>>
>>>  Those are all good comments - I'll make sure the ALA dev team are
>>> following those issues.
>>> As this goes forward, it is clear that code releases are going to be
>>> needed, so we get immutable binaries in nexus and tagged SVN branches.
>>>  I'll try and raise this with Dave Martin.
>>>
>>>  I'll try and follow the resolutions for the issues you log, build an
>>> artifact and verify the same results.
>>> I'm not so much into scala, but IIRC I saw that issue with another
>>> artifact.  The solution was to run this before running the command line:
>>>   jar -xf ala-names-generator-1.0-SNAPSHOT-assembly.jar lib
>>>
>>>  I found this in the way they run the biocache command line tools in:
>>>
>>> https://ala-portal.googlecode.com/svn/trunk/biocache-install/ubuntu/install.sh
>>>
>>>  It may not be the solution, but worth trying.
>>>
>>>  I hope this helps,
>>> Tim
>>>
>>>
>>>
>>>  On Mar 12, 2014, at 3:57 PM, Paulo André wrote:
>>>
>>>  Tim
>>>
>>>  I have had several issues with
>>> https://code.google.com/p/ala-portal/source/browse/#svn%2Ftrunk%2Fala-names-generator
>>>
>>>  I logged them in Jira: http://dev.gbif.org/issues/browse/ALA
>>>
>>>  []'s
>>> Paulo Andre Filipak
>>>
>>>
>>> 2014-03-12 11:51 GMT-03:00 Tim Robertson [GBIF] <trobertson at gbif.org>:
>>>
>>>>  Hi Allan,
>>>>
>>>>  I am sure the ALA folks will comment when they wake up.  But...
>>>>
>>>>  It doesn't appear to be published as an artifact in the ALA maven
>>>> repository:
>>>>   http://maven.ala.org.au/repository/au/org/ala/
>>>>
>>>>  You could build from source from:
>>>> https://code.google.com/p/ala-portal/source/browse/#svn%2Ftrunk%2Fala-names-generator
>>>>
>>>>  I presume using something along the lines of "mvn clean
>>>> assembly:assembly"
>>>>
>>>>  I hope this helps provide some options,
>>>> Tim
>>>>
>>>>   On Mar 12, 2014, at 3:32 PM, Allan Koch wrote:
>>>>
>>>>    Does anyone know where I can download this jar:
>>>> *ala-names-generator-1.0-SNAPSHOT-assembly.jar*?
>>>>
>>>> I'm trying to generate a new Taxon Name List based on NSL for the
>>>> Biocache processing.
>>>> I have been following these instructions:
>>>>
>>>> http://code.google.com/p/ala-portal/wiki/UpgradeALANames
>>>>
>>>> According to the instructions, I need to run this command:
>>>>
>>>> java -Xmx1G -Xms1G -cp .:ala-names-generator-1.0-SNAPSHOT-assembly.jar
>>>> au.org.ala.names.NamesGenerator --all
>>>>
>>>> But I can't find this JAR.
>>>>
>>>>  We are trying to build the Scala project, but we are having some
>>>> trouble.
>>>> It would help me, for now, if I could run a prebuilt JAR.
>>>>
>>>>  Best regards,
>>>>
>>>>   Allan Koch Veiga
>>>>
>>>> Research Center on Biodiversity and Computing - BioComp
>>>> University of São Paulo
>>>>
>>>> Laboratório de Automação Agrícola - LAA
>>>> Depto. de Engenharia de Computação e Sistemas Digitais - PCS
>>>> Engenharia Elétrica - Escola Politécnica da USP
>>>> Celular: +55 11 98401-2277
>>>> Email: allan.kv at usp.br
>>>>
>>>> "*Stay hungry, stay foolish.*" Stewart Brand
>>>>     _______________________________________________
>>>> Ala-portal mailing list
>>>> Ala-portal at lists.gbif.org
>>>> http://lists.gbif.org/mailman/listinfo/ala-portal
>>>>
>>>>
>>>>
>>>> ----------------------------------------------------------------------------------------
>>>>  Tim Robertson - GBIF Head of Informatics - trobertson at gbif.org
>>>>  Global Biodiversity Information Facility http://www.gbif.org/
>>>>  GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
>>>>  Tel: +45 3532 1487  Mob: +45 2826 1487  Fax: +45 2875 1480
>>>> ----------------------------------------------------------------------------------------
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>