[Ala-portal] Names Generator Issues

Daniel Lins daniel.lins at gmail.com
Tue Mar 18 07:16:31 CET 2014


Hi Natasha,

Thanks for the great help. This solution will meet our needs perfectly.

Best Regards.

-- 
Daniel Lins da Silva
(Cel) 55 11 96144-4050
Research Center on Biodiversity and Computing (Biocomp)
University of Sao Paulo, Brazil
daniellins at usp.br
daniel.lins at gmail.com
  Hi Allan,

 We would provide a DwCA with the Catalogue of Life species in it.  Yes,
you will need to add your species to this file in the same format. If you
want your species to be merged into the Catalogue of Life hierarchy, you
will need to provide appropriate parentIds.
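
 As a rough illustration of adding species "in the same format" with
parent identifiers, here is a minimal Python sketch. It is not the ALA
tooling. It assumes the archive's core file is tab-delimited with a header
row and uses the Darwin Core terms taxonID and parentNameUsageID (the real
column layout is whatever the archive's meta.xml declares). The function name
merge_taxa and the sample rows are made up for the example.

```python
import csv
import io


def merge_taxa(core_text, extra_rows):
    """Append extra_rows to a tab-delimited taxon core and return the new text.

    Assumption: the core has a header row containing at least 'taxonID' and
    'parentNameUsageID'. Raises ValueError if a new row reuses an existing
    taxonID or points at a parent that is not present in the merged file.
    """
    reader = csv.DictReader(io.StringIO(core_text), delimiter="\t")
    rows = list(reader)
    fieldnames = reader.fieldnames
    known_ids = {row["taxonID"] for row in rows}

    # Register the new identifiers first, so that new taxa may also be
    # parents of other new taxa.
    for row in extra_rows:
        if row["taxonID"] in known_ids:
            raise ValueError("duplicate taxonID: " + row["taxonID"])
        known_ids.add(row["taxonID"])

    # Every non-empty parent reference must resolve to a known taxon.
    for row in extra_rows:
        parent = row.get("parentNameUsageID", "")
        if parent and parent not in known_ids:
            raise ValueError("unknown parentNameUsageID: " + parent)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows + extra_rows)
    return out.getvalue()


# Made-up sample data: one existing genus and one local species placed under it.
core = (
    "taxonID\tparentNameUsageID\tscientificName\ttaxonRank\n"
    "g1\t\tExamplegenus\tgenus\n"
)
local_species = [
    {
        "taxonID": "local-1",
        "parentNameUsageID": "g1",
        "scientificName": "Examplegenus nova",
        "taxonRank": "species",
    }
]
merged = merge_taxa(core, local_species)
```

 Checking the parent references before writing is a design choice for the
sketch: a species whose parent is missing would otherwise end up outside the
hierarchy without any warning.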

 We would provide a tool within the ala-name-matching (available as a jar
file in our maven repository) to generate a list based on a DwCA. You would
need to run the tool pointing at your modified DwCA.

 We will let you know when this is available.

 Hope that this all makes sense.

 Regards
Natasha

  From: Allan Koch <allan.kv at gmail.com>
Date: Tuesday, 18 March 2014 7:34 AM
To: Natasha Carter <natasha.quimby at csiro.au>
Cc: "Martin, Dave (CES, Black Mountain)" <David.Martin at csiro.au>,
"Ala-portal at lists.gbif.org" <Ala-portal at lists.gbif.org>
Subject: Re: [Ala-portal] Names Generator Issues

   Hi Natasha,

 That would be great. We have 109 species that we need to include in the
current name matching index.

 It would be great if you could send me a DwC-A with the CoL names. I would
then just need to add these 109 species to that archive in the same format,
right?

 How will this proposed solution work?
Will you provide source code or a compiled program (JAR), or will we send
you the DwC-A so that you can generate the Lucene index?

 Thank you very much for helping.

 Best regards,

 Allan Koch Veiga

Núcleo de Pesquisa em Biodiversidade e Computação - BioComp
Laboratório de Automação Agrícola - LAA
Depto. de Engenharia de Computação e Sistemas Digitais - PCS
Engenharia Elétrica - Escola Politécnica da USP
Celular: +55 11 8401-2277
Email: allan.kv at usp.br

"*Stay hungry, stay foolish.*" Stewart Brand


2014-03-17 3:12 GMT-03:00 <Natasha.Quimby at csiro.au>:

>  Hi Allan,
>
>  The ala-name-generator is useful if you want to use the Australian
> National Species list as the main source for your namematching index.  We
> would not suggest using this to supplement the name matching index with
> additional species.
>
>  In order to support custom species lists we are planning an enhancement
> to generate the namematching index from a DarwinCore Archive. We would
> envision that all the species would be provided as a single DWCA with the
> attached meta.xml.   We think that this could be achieved in the 1-2 week
> window that you mentioned.  We could provide a DWCA which contains the
> Catalogue of Life as a basis for you to start with. You can then add
> additional names to the DWCA as you please. Do you think that this would
> suit your needs?
>
>  Regards
> Natasha
>
>  From: Allan Koch <allan.kv at gmail.com>
> Date: Saturday, 15 March 2014 4:19 AM
> To: "Martin, Dave (CES, Black Mountain)" <David.Martin at csiro.au>
>
> Cc: "Ala-portal at lists.gbif.org" <Ala-portal at lists.gbif.org>
> Subject: Re: [Ala-portal] Names Generator Issues
>
>     Thank you for the quick answers, David and Tim.
>
> We have studied the process for creating the namematching index based on
> the National List of Australia, and I see that reproducing the same process
> to create a new National List is quite complex.
>
> But, for now, we only need to include some names that aren't in the
> current namematching index.
>
> If we have understood correctly, we first need to run the Names Generator
> with a set of CSVs from APNI, APC and AFD as input. Then, based on the
> output of the Names Generator, we run the Name Matching to create the
> Lucene index, right?
>
> If that is right, it would be great if we could run this same standard
> process, but with modified input CSVs that include our set of names (in
> the same format).
>
> In the future (after these 3 months) we can study the possibility of
> generating our complete National List.
>
> But for now, we need to include a set of names in the namematching index.
> Would it be possible to do this in a short time, in one or two weeks?
>
>  Best regards,
>
>  Allan Koch Veiga
>
>
> 2014-03-12 23:15 GMT-03:00 <David.Martin at csiro.au>:
>
>>  Thanks Allan, Paulo, Tim.
>>
>>  We appreciate your efforts in setting this software up locally, and
>> thanks for emailing the list.
>>
>>  1) Versioning
>>
>>  While we are on the track of making this software re-usable by other
>> projects/organisations, it is still very early days. Versioning and
>> packaging are things that we need to tackle properly in the 3 month
>> evaluation period [2] and we are working with GBIF on the best approach
>> here (see Tim's email regarding Ansible). To date, the ALA environment
>> itself is the only place these components are used in production and we
>> manage these closely ourselves. We haven't had a need to tightly version
>> components, but as other projects become reliant on them we need to do
>> this properly. At this point in time, I'd recommend ignoring developments
>> on branches within SVN.
>>
>>  2) ala-name-generator
>>
>>  We didn't anticipate that other projects would be using the
>> ala-name-generator code at this stage (or at all), and instead would rely
>> on the Catalogue of Life names Lucene index we've produced [1]. The
>> ala-name-generator code as it currently stands isn't suitable for use
>> outside the Australian context. It deals with some of the quirks of
>> Australian species lists and merges some elements from different sources.
>> We should have marked the wikis to that effect.
>>
>>  That said, we appreciate the need for other projects to use their own
>> taxonomic checklists. This was something I'd hoped we would tackle in the
>> 3 month evaluation period [2]. There are a few potential approaches here
>> that we are exploring and we'll email this list soon with some progress on
>> this front. In the meantime, I suggest projects make use of the existing
>> index [1].
>>
>>  Thanks again,
>>
>>  Dave Martin
>> ALA
>>
>>  [1] http://biocache.ala.org.au/archives/col_namematching.tgz
>> [2] See GBIF's email sent 21st Feb 2014 - "Biodiversity data portals:
>> Using the ALA tooling"
>>
>>
>>  From: "Tim Robertson [GBIF]" <trobertson at gbif.org>
>> Date: Thursday, 13 March 2014 2:16 am
>> To: Paulo André <pfilipak at gmail.com>
>> Cc: "Ala-portal at lists.gbif.org" <Ala-portal at lists.gbif.org>
>> Subject: Re: [Ala-portal] Names Generator Issues
>>
>>  Hi Paulo
>>
>>  Those are all good comments - I'll make sure the ALA dev team are
>> following those issues.
>> As this goes forward, it is clear that code releases are going to be
>> needed, so we get immutable binaries in nexus and tagged SVN branches.
>>  I'll try and raise this with Dave Martin.
>>
>>  I'll try and follow the resolutions for the issues you log, build an
>> artifact and verify the same results.
>> I'm not so much into Scala, but IIRC I saw that issue with another
>> artifact.  The solution was to run this before running the command line:
>>   jar -xf ala-names-generator-1.0-SNAPSHOT-assembly.jar lib
>>
>>  I found this in the way they run the biocache command line tools in:
>>
>> https://ala-portal.googlecode.com/svn/trunk/biocache-install/ubuntu/install.sh
>>
>>  It may not be the solution, but worth trying.
>>
>>  I hope this helps,
>> Tim
>>
>>
>>
>>  On Mar 12, 2014, at 3:57 PM, Paulo André wrote:
>>
>>  Tim
>>
>>  I have had several issues with
>> https://code.google.com/p/ala-portal/source/browse/#svn%2Ftrunk%2Fala-names-generator
>>
>>  I have logged them in Jira: http://dev.gbif.org/issues/browse/ALA
>>
>>  []'s
>> Paulo Andre Filipak
>>
>>
>> 2014-03-12 11:51 GMT-03:00 Tim Robertson [GBIF] <trobertson at gbif.org>:
>>
>>>  Hi Allan,
>>>
>>>  I am sure the ALA folks will comment when they wake up.  But...
>>>
>>>  It doesn't appear to be published as an artifact in the ALA maven
>>> repository:
>>>   http://maven.ala.org.au/repository/au/org/ala/
>>>
>>>  You could build from source from:
>>> https://code.google.com/p/ala-portal/source/browse/#svn%2Ftrunk%2Fala-names-generator
>>>
>>>  I presume using something along the lines of "mvn clean
>>> assembly:assembly"
>>>
>>>  I hope this helps provide some options,
>>> Tim
>>>
>>>   On Mar 12, 2014, at 3:32 PM, Allan Koch wrote:
>>>
>>>    Does anyone know where I can download this jar:
>>> *ala-names-generator-1.0-SNAPSHOT-assembly.jar*?
>>>
>>> I'm trying to generate a new Taxon Name List based on NSL for the
>>> Biocache processing.
>>> I have been following these instructions:
>>>
>>> http://code.google.com/p/ala-portal/wiki/UpgradeALANames
>>>
>>> According to the instructions, I need to run this command:
>>>
>>> java -Xmx1G -Xms1G -cp .:ala-names-generator-1.0-SNAPSHOT-assembly.jar
>>> au.org.ala.names.NamesGenerator --all
>>>
>>> But I can't find this JAR.
>>>
>>>  We are trying to build the Scala project, but we are having some
>>> trouble.
>>> It would help me, for now, if I could run a prebuilt JAR.
>>>
>>>  Best regards,
>>>
>>>   Allan Koch Veiga
>>>
>>> Research Center on Biodiversity and Computing - BioComp
>>> University of São Paulo
>>>
>>> Laboratório de Automação Agrícola - LAA
>>> Depto. de Engenharia de Computação e Sistemas Digitais - PCS
>>> Engenharia Elétrica - Escola Politécnica da USP
>>> Celular: +55 11 98401-2277
>>> Email: allan.kv at usp.br
>>>
>>> "*Stay hungry, stay foolish.*" Stewart Brand
>>>     _______________________________________________
>>> Ala-portal mailing list
>>> Ala-portal at lists.gbif.org
>>> http://lists.gbif.org/mailman/listinfo/ala-portal
>>>
>>>
>>>
>>> ----------------------------------------------------------------------------------------
>>>  Tim Robertson - GBIF Head of Informatics - trobertson at gbif.org
>>>  Global Biodiversity Information Facility http://www.gbif.org/
>>>  GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
>>>  Tel: +45 3532 1487  Mob: +45 2826 1487  Fax: +45 2875 1480
>>> ----------------------------------------------------------------------------------------
>>>
>>
>>
>
