[GloBI] Ontology of Biotic Interactions?

Jorrit Poelen jhpoelen at xs4all.nl
Wed Nov 28 00:41:05 CET 2018


Hi John:

Thanks for sharing the access database file. I was able to convert the 
file to tsv files without too much trouble and had a look at the 
examples you shared.

I appreciate how you normalized the life stages and sex of the 
interacting things. Also, the reverse interactions and the pairs make 
the data more intuitive to understand and parse.

Re: associationTypes - When looking at the association kinds (from 
tblAssocKinds), I realized that bodyPart (e.g., "leaf"), life stage 
(e.g., "nymph") and physiological state (e.g., "dead") are mixed into 
the interaction type (e.g., "feeding on"). I can see how this notation 
can be handy for data entry or for capturing writing on labels. However, 
I would tend to map the interaction phrases so that the different kinds 
of things end up in separate columns, just like you did with lifestage 
and sex. That said, I'd always want to keep the verbatim interaction 
phrase around to preserve the original language.
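
As a rough illustration of splitting a phrase into separate columns while 
keeping the verbatim text, here is a minimal Python sketch. The phrases, 
the field names and the curated lookup are all made up for the example; 
they are not taken from tblAssocKinds or from GloBI.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SplitPhrase:
    """One association phrase, split into separate columns."""
    verbatim: str                               # original wording, always kept
    interaction_type: Optional[str] = None
    body_part: Optional[str] = None
    life_stage: Optional[str] = None
    physiological_state: Optional[str] = None


def _key(phrase: str) -> str:
    # Case-insensitive, whitespace-insensitive lookup key.
    return " ".join(phrase.lower().split())


# Hypothetical, hand-curated entries (assumption: a person decides each split).
CURATED = {
    _key(p.verbatim): p
    for p in [
        SplitPhrase("feeding on leaf of", "feeding on", body_part="leaf"),
        SplitPhrase("feeding on dead nymph of", "feeding on",
                    life_stage="nymph", physiological_state="dead"),
    ]
}


def split_phrase(verbatim: str) -> SplitPhrase:
    """Look up a phrase; unknown phrases keep only their verbatim text."""
    match = CURATED.get(_key(verbatim))
    if match is None:
        return SplitPhrase(verbatim=verbatim)
    # Preserve the wording exactly as it was entered.
    return replace(match, verbatim=verbatim)
```

Phrases that are not in the curated lookup come back with empty columns, 
so nothing is guessed and the original language is never lost.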

Re: list of OBO relations terms - they are listed, but in specialized 
formats (e.g., OBO, OWL) at http://purl.obolibrary.org/obo/ro.obo and 
http://purl.obolibrary.org/obo/ro.owl respectively. I've opened an issue 
to remind myself to make it easier to provide a list (see 
<https://github.com/jhpoelen/eol-globi-data/issues/386>). For the time 
being, you might be inspired by a subset of the supported interaction 
types via 
<https://api.globalbioticinteractions.org/interactionTypes.csv?type=csv>. 
More on that later.
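
In the meantime, one possible way to get relation terms out of an OBO file 
and into a table is sketched below in Python. It is a simplified reader, 
not a full OBO parser: it only picks up the id, name, def and is_a tags, 
and it assumes relations are stored in [Typedef] stanzas. The sample text 
and the EX: identifiers are invented for the example and are not real RO 
terms.

```python
import csv
import io


def parse_obo_stanzas(text, stanza_type="Typedef"):
    """Return one dict per stanza of the requested type."""
    rows = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only keep the requested stanza type.
            current = None
            if line[1:-1] == stanza_type:
                current = {"id": "", "name": "", "definition": "", "parents": []}
                rows.append(current)
            continue
        if current is None or not line or line.startswith("!"):
            continue  # header lines, other stanza types, blanks, comments
        tag, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if tag == "id":
            current["id"] = value
        elif tag == "name":
            current["name"] = value
        elif tag == "def" and value.startswith('"'):
            # def: "definition text" [xrefs] -> keep the quoted text only
            current["definition"] = value[1:value.rfind('"')]
        elif tag == "is_a":
            # is_a: EX:0000001 ! parent name -> keep the identifier only
            current["parents"].append(value.split("!")[0].strip())
    return rows


def to_tsv(rows):
    """Render parsed stanzas as a tab-separated table."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "name", "definition", "parents"])
    for row in rows:
        writer.writerow([row["id"], row["name"], row["definition"],
                         "|".join(row["parents"])])
    return out.getvalue()


# Invented sample in OBO syntax (not copied from ro.obo).
SAMPLE = '''format-version: 1.2

[Typedef]
id: EX:0000001
name: interacts with
def: "A made-up parent relation." []

[Typedef]
id: EX:0000002
name: feeds on
is_a: EX:0000001 ! interacts with
'''

if __name__ == "__main__":
    print(to_tsv(parse_obo_stanzas(SAMPLE)))
```

The parents column holds the is_a identifiers, which is enough to rebuild 
the hierarchy in a relational table with a self-referencing key.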

Re: mapping interaction terms - I agree that automated mapping is tricky 
business. What I had in mind is more of a static translation table that 
records how one system's interaction terms (like yours) translate into 
another naming scheme (like the OBO Relations Ontology). In our case, an 
automated process would use the static translation table to link to the 
RO terms. So, no fancy methods here. An example of such a translation 
table can be found at 
https://github.com/globalbioticinteractions/inaturalist : 
https://github.com/globalbioticinteractions/inaturalist/blob/master/interaction_types.csv 
translates terms native to iNaturalist into RO, and 
https://github.com/globalbioticinteractions/inaturalist/blob/master/interaction_types_ignored.csv 
contains a list of terms that are explicitly ignored.
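
To make the idea concrete, here is a small Python sketch of a lookup 
driven by a static translation table plus an ignore list. The column 
names and the example rows are assumptions made for this sketch and do 
not reproduce the iNaturalist files; the RO identifiers shown should be 
verified against ro.obo before use.

```python
import csv
import io

# Hypothetical translation table: local phrase -> RO term.
TRANSLATIONS_CSV = """provided_label,mapped_id,mapped_label
feeding on,http://purl.obolibrary.org/obo/RO_0002470,eats
preying on,http://purl.obolibrary.org/obo/RO_0002439,preys on
"""

# Hypothetical list of local phrases that are deliberately not translated.
IGNORED_CSV = """provided_label
collected with
"""


def _key(label):
    # Case-insensitive, whitespace-insensitive lookup key.
    return " ".join(label.lower().split())


def load_translations(csv_text):
    reader = csv.DictReader(io.StringIO(csv_text))
    return {_key(r["provided_label"]): (r["mapped_id"], r["mapped_label"])
            for r in reader}


def load_ignored(csv_text):
    reader = csv.DictReader(io.StringIO(csv_text))
    return {_key(r["provided_label"]) for r in reader}


def translate(label, translations, ignored):
    """Return (status, mapping); status is 'mapped', 'ignored' or 'unmapped'."""
    key = _key(label)
    if key in ignored:
        return ("ignored", None)
    if key in translations:
        return ("mapped", translations[key])
    # Unknown phrases are reported, never guessed.
    return ("unmapped", None)


if __name__ == "__main__":
    table = load_translations(TRANSLATIONS_CSV)
    skip = load_ignored(IGNORED_CSV)
    for phrase in ["Feeding on", "collected with", "resting near"]:
        print(phrase, "->", translate(phrase, table, skip))
```

Because the table is just a file under the data owner's control, every 
mapping decision stays explicit and reviewable, and unmapped phrases show 
up as a to-do list rather than being silently translated.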

Re: next steps - in my experience, highly normalized data structures are 
important and useful when actively managing and curating data. However, 
when exporting data to other systems, often a denormalized format (aka 
"wide single table") really makes life a lot easier for moving snapshots 
of the data around . . . as long as it's automated and the identifiers 
are preserved. So, my suggested next step in our integration would be to 
figure out how to create a method that automatically generates a 
de-normalized table from your wealth of association data in a similar 
form as outlined in 
https://github.com/globalbioticinteractions/template-dataset and, more 
specifically in 
https://github.com/globalbioticinteractions/template-dataset/blob/master/interactions.tsv 
. Once the de-normalization is complete, terms (like assocKinds, but 
also lifestage, bodypart, physiological state) can be translated using 
static translation tables (see above).
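
A minimal Python sketch of such an automated de-normalization step 
follows, using an in-memory SQLite database as a stand-in for the Access 
database. The table names, column names and rows are invented for the 
example, and the output column names are only loosely modeled on the 
template dataset; they should be checked against interactions.tsv.

```python
import csv
import io
import sqlite3

# Hypothetical normalized schema with made-up demo rows.
SCHEMA_AND_DATA = """
CREATE TABLE taxon_names (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE assoc_kinds (id INTEGER PRIMARY KEY, phrase TEXT);
CREATE TABLE citations   (id INTEGER PRIMARY KEY, citation TEXT);
CREATE TABLE associations (
    id INTEGER PRIMARY KEY,
    left_name_id  INTEGER REFERENCES taxon_names(id),
    assoc_kind_id INTEGER REFERENCES assoc_kinds(id),
    right_name_id INTEGER REFERENCES taxon_names(id),
    citation_id   INTEGER REFERENCES citations(id)
);
INSERT INTO taxon_names VALUES (1, 'Neuropterid species A'),
                               (2, 'Hemipteran species B');
INSERT INTO assoc_kinds VALUES (10, 'preying on');
INSERT INTO citations   VALUES (100, 'Example citation, p. 12');
INSERT INTO associations VALUES (1000, 1, 10, 2, 100);
"""

# One wide row per association; local identifiers are carried along so
# that each exported row can be linked back to the source database.
WIDE_QUERY = """
SELECT a.id       AS associationId,
       l.id       AS sourceTaxonId,
       l.name     AS sourceTaxonName,
       k.id       AS interactionTypeId,
       k.phrase   AS interactionTypeName,
       r.id       AS targetTaxonId,
       r.name     AS targetTaxonName,
       c.id       AS referenceId,
       c.citation AS referenceCitation
FROM associations a
JOIN taxon_names l ON l.id = a.left_name_id
JOIN assoc_kinds k ON k.id = a.assoc_kind_id
JOIN taxon_names r ON r.id = a.right_name_id
JOIN citations   c ON c.id = a.citation_id
ORDER BY a.id
"""


def build_demo_db():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA_AND_DATA)
    return con


def export_wide_tsv(con):
    """Run the join and return the wide table as tab-separated text."""
    cur = con.execute(WIDE_QUERY)
    header = [col[0] for col in cur.description]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cur)
    return out.getvalue()


if __name__ == "__main__":
    print(export_wide_tsv(build_demo_db()))
```

The local terms are exported verbatim here; translating them with the 
static tables is a separate step that can run on the exported file.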

In short - in my view, an integration would preserve the autonomy of 
local data management, including terms, while automating a translation 
into a de-normalized format friendly for integration. This allows 
updates to flow easily through the system without manual intervention 
and without affecting the existing data management platform. Most 
importantly perhaps, the process contains sufficient information for 
GloBI (or others) to link back into your database as well as the 
original sources / specimen.

I hope this helps and I'd be interested to help document this 
integration between your Neuropterida database and GloBI as a use case 
to share with our peers.

Curious to hear your thoughts,

-jorrit


On 11/27/18 12:29 PM, John Oswald wrote:
>
> Hi Jorrit,
>
>      See below…
>
> I think that sharing the Access database file would be a great start.
>
> ---I have just shared a Dropbox folder with you that contains an 
> Access database that contains the several tables that I am currently 
> working with to try to formulate a strategy for capturing relationship 
> data. I’m happy to receive any comments/suggestions for improvements. 
> The three tables are briefly discussed below.
>
> tblAssocKinds (association kinds) – Contains a list of 544 
> “association kind” text strings. Many (ca. 200) of these originated 
> with an initial 2010 dataset of associations that had been extracted 
> from insect specimen labels by Norm Johnson of Ohio State University. 
> I subsequently tried to standardize the phrasing of the original 
> association strings, added additional strings, then tried to write 
> “reverse association” strings so that there were pairs of phrases that 
> could be used to state the relationship from the “opposite sides” of 
> the association. The pairings are held in tblAssocPairs. Not all 
> AssocKinds have reverse associations, so some are not currently 
> included in a pair. The AssocKinds are mostly taxon-to-taxon or 
> taxon-to-inanimate, but there are some outliers as I was experimenting 
> with different kinds of associations. Is this kind of “reverse 
> association” information useful in a broader context?
>
> tblAssocPairs (association pairs) – Contains pairings of values from 
> tblAssocKinds.
>
> frmFlexAssocPairs – A simple query that contains AssocPairs as both 
> AssocKind IDs and text strings (easier to read).
>
> tblAssociations – My current working table for capturing association 
> data from the literature. This table currently contains only ca. 150 
> records of test data entered from the literature (I want to make sure 
> that I get things set up optimally before extending the data capture 
> effort). The table is generally structured around a “left” associate and 
> a “right” associate, “separated” by an AssocPair ID that links to 
> tblAssocPairs. The table is currently structured for ease of data 
> capture from the literature, and includes some other kinds of 
> desirable data (e.g., sex, life stage, geography, literature source) 
> that would be captured from the same literature sources. As I get into 
> this though, it seems clear that many of those other data, which will 
> not be available for all associations, should probably be removed to 
> other relationally-linked tables in order to keep the data normalized. 
> In the current structure the “left” associate is assumed to be an 
> insect species of the superorder Neuropterida, which is specified by a 
> “combination object” ID (field LeftNidaCombObjID). For data entry 
> purposes this ID links to an episodically re-calculated lookup table 
> of >20,000 genus-group/species-group scientific name combinations 
> (essentially a master lookup table of almost every Neuropterida 
> combination that has ever been used in the literature). This 
> “combination” ID can be used to link into most of the related 
> taxonomic and nomenclatural data in my database. The link to the 
> literature source is specified in field BibObjPageCiteID 
> (Bibliographic Object Page Citation ID), which is an identifier that 
> specifies a particular page/plate in the neuropterid literature (from 
> which can be looked up all of the typical bibliographic information 
> about the literature source, plus other data linked to individual 
> literature pages, e.g., figures, chresonyms, and other annotations). 
> This links to my bibliographic dataset of 17,000+ published works that 
> contain information on the Neuropterida.
>
> I imagine that a first GloBI integration (or any other integration) 
> would preserve the existing system and implement an automated 
> translation or mapping procedures (e.g., scripts).
>
> ---To the extent possible I would prefer to not have to rely on 
> automated translations, which are prone to interpretation errors (or 
> maybe I’m misunderstanding what is automated here…). I would prefer to 
> “oversplit” the associations that I use in my database, then re-group 
> those associations into aggregate sets that correspond to other set(s) 
> of association types used by other projects. This would give me more 
> flexibility for defining associations that are useful for my purposes, 
> and more control over how those associations are mapped when used in 
> (potentially multiple) other contexts. We’re probably saying basically 
> the same thing here, but I would like to retain the ability to 
> control/influence the basic mapping of associations through 
> relationships defined in my database.
>
> One of such procedures would include a mapping from your internal 
> association types into another naming scheme such as OBO Relation 
> Ontology (http://obofoundry.org/ontology/ro.html): 
> the closer the terms are in meaning, the easier the mapping is.
>
> ---Right, see above. How can I get a complete listing of the relation 
> strings used in the Relation Ontology, their definitions (meanings), 
> and their hierarchical organization? I don’t know how to do that. Is 
> that something you can conveniently download, put into a table-based 
> format, then send to me so that I can incorporate it into my database 
> to experiment with?
>
> ---I expect that the relations in the Relation Ontology are fairly 
> general. Is there another ontology (or some other source of relation 
> terms) that deals more specifically (or at least includes) more 
> specific relations that would pertain between insect taxa and either 
> insect or non-insect taxa, and/or insect taxa and inanimate objects? 
> That is something that I would find useful. Does something like that 
> exist? Outside of the Relation Ontology, where do you source other 
> kinds of relations? Or, how does one go about contributing to the 
> development of a more specific ontology of relations that pertain to 
> insects? Would that be a useful thing?
>
> With such a translation / mapping method in place, others can more 
> easily find your data and you can more easily find similar projects. 
> Then, hopefully, over time, we'll learn from each other in discussion 
> forums, professional meetings or workshops and make it easier to share 
> and capture these complex datasets.
>
> ---I’m interested to know if others have already developed 
> well-defined sets of terms/phrases that describe 
> relations/associations among insects and other organisms and inanimate 
> objects. Can you make any recommendations for where I might look to 
> find such sets of terms/phrases? Do any of the other projects 
> involved in GloBI have such term/phrase sets that are available for 
> examination?
>
> Perhaps we'll even settle on some best practices!
>
> So, yes, please send a (complete/partial) copy of your native database 
> files or raw datasets, so I can get a sense of what your datasets 
> look like.
>
> ---You should have received an e-mail from Dropbox on this today, or 
> will soon.
>
> Cheers,
>
> John
>
> Curious to hear your thoughts,
>
> -jorrit
>
> On 11/21/18 11:32 AM, John Oswald wrote:
>
>     Hi Jorrit,
>
>         Thanks for responding. Deborah Paul mentioned to me at the
>     recent ESA meeting in Vancouver, BC, that GloBI would be one place
>     that I should look, so I hoped to make contact with you by posting
>     to the GloBI listserver. See more interleaved below…
>
>     ---oo0oo---
>
>     John D. Oswald
>
>     Professor of Entomology
>
>     Curator, Texas A&M University Insect Collection
>
>     Department of Entomology
>
>     Texas A&M University
>
>     College Station, TX  77843-2475
>
>     E-mail: j-oswald at tamu.edu <mailto:j-oswald at tamu.edu>
>
>     Phone: 1-979-862-3507
>
>     Lacewing Digital Library: http://lacewing.tamu.edu/
>
>     Bibliography of the Neuropterida: http://lacewing.tamu.edu/Biblio/Main
>
>     Neuropterida Species of the World:
>     http://lacewing.tamu.edu/SpeciesCatalog/Main
>
>     *From:* GloBI <globi-bounces at lists.gbif.org> *On Behalf Of* Jorrit Poelen
>     *Sent:* Tuesday, November 20, 2018 8:59 PM
>     *To:* globi at lists.gbif.org
>     *Subject:* Re: [GloBI] Ontology of Biotic Interactions?
>
>     Hey John -
>
>     Like Robert (hi Robert!) mentioned, GloBI is also using the OBO
>     Relations Ontology for defining biotic and abiotic interaction
>     types, just like many other projects.
>
>     ---As a newbie to ontologies (and an entomologist, not a computer
>     scientist) I’m still trying to wrap my head around ontologies. In
>     a nutshell, so far as I can determine, an ontology is at heart an
>     extended web of controlled vocabulary terms with relationships
>     defined between terms at different web nodes (terms), together
>     with facilities to link to other ontologies. Is that about right?
>
>     I've been attempting to collect my thoughts on data format and
>     models at
>     https://github.com/jhpoelen/globis-b-interactions/blob/master/text/on-species-interaction-data-models-and-formats.md
>     .
>
>     ---Thanks for the link. I have briefly looked this over just now,
>     but will go back for a detailed read later. There appears to be
>     much there that would be good for me to consider as I continue
>     development on my end.
>
>     In my experience, an effective way to figure out how to
>     capture/share your data is to use what you have today (as is!) and
>     try to integrate a subset of it with other projects (like GloBI,
>     GBIF).
>
>     ---For the present, as briefly explained in my initial e-mail, I’m
>     mostly trying to set up an efficient way to capture data on
>     interactions/associations of neuropterid species with other taxa,
>     inanimate objects, and concepts in my personal Neuropterida
>     research database on the three insect orders Neuroptera,
>     Megaloptera, and Raphidioptera. The Neuropterida database
>     (currently built in Access) is highly parsed (ca. 350 relationally
>     linked tables, including ca. 150 standardized ‘lookup’ tables
>     covering various subjects), and significantly normalized (ca. 300
>     tables normalized to 3NF or better). The core data are
>     bibliographic, nomenclatural, taxonomic, and ‘agent based’, but
>     there are extensions going off in many other directions. I
>     currently share parts of these data via episodic downloads to
>     support a variety of projects – particularly the Neuropterida
>     domain of the Catalogue of Life (used by GBIF and many other
>     projects globally), and the various modules of the Lacewing
>     Digital Library project (lacewing.tamu.edu). I am an insect
>     systematist/taxonomist by training, but I got into databasing in
>     the early 1990’s and have been capturing data on the Neuropterida
>     ever since. I am currently involved in a project whose primary
>     product will be a new module in the Lacewing Digital Library that
>     is devoted to interactions/associations (mostly predator/prey) of
>     neuropterid insects and hemipterous insects. Thus, my primary
>     motivation at the current time is to develop an effective and
>     efficient database subschema for capturing these kinds of data,
>     and to relationally link that subschema into my existing overall
>     database schema. I would like to do this in a fairly general way
>     so that I can (1) capture a wide variety of different kinds of
>     interaction/association data into the same subschema of my
>     database, (2) standardize the terminology/phrasing that I use so
>     that terms/phrases are based on explicit definitions and are
>     consistent with other similar controlled vocabularies for similar
>     projects (to the extent that this may be possible; thus my foray
>     here into ontologies…), and (3) be reasonably sure that there is a
>     fairly straightforward pathway through which whatever
>     interaction/association subschema I develop within my database
>     can be linked out to other projects that I might get
>     involved with in the future.
>
>     Your project sounds very similar to other projects that have
>     already been integrated into GloBI - a mix of specimen and
>     literature data with their own way of describing interaction terms.
>
>     ---Yes, I am sure that there are lots of other projects that are
>     capturing similar data. I’d like to learn from those projects so
>     that I can avoid any common pitfalls and ‘start off on the right
>     foot’ as I get into this. To the extent possible, as I get
>     started, I’d like to tap into any well-defined sets of interaction
>     terms that may already exist. My initial thought has been to build
>     out this subschema in my database starting from my current table
>     of ‘association kinds’. But I’m open to new ideas, and looking for
>     someone who might be interested to discuss such issues.
>
>     Do you have some samples that you can share so that I can get a
>     sense of what you currently have?
>
>     ---Sure. I can export a few tables to a simple Access database and
>     send that to you. Can you work with that? Please confirm. If so, I
>     can e-mail it to you or post it to you via Dropbox (depending on
>     size).
>
>     ---One of the difficulties that I am currently having is how one
>     might extract the “data items” (to me this would be the set of
>     controlled terms/phrases and their linking interaction/association
>     terms) from one or more ontologies so that those data can be
>     placed into and manipulated within a relational database
>     structure. Is there an easy way to do that?
>
>     Cheers,
>
>     John
>
>     thx,
>
>     -jorrit
>
>     https://globalbioticinteractions.org
>
>
>     On 11/19/18 6:21 PM, Bates, Robert P wrote:
>
>         Hi John,
>
>         We’ve been working with subsets of the OBO Relation Ontology
>         (which if I’m not mistaken is also what GloBI uses) to provide
>         the concepts for interaction relationships in our VERA
>         modeling system:
>
>         http://www.obofoundry.org/ontology/ro.html
>
>         -R
>
>         *Robert Bates*
>
>         Research Scientist
>
>         Design & Intelligence Lab
>
>         *Georgia Institute of Technology*
>
>         Technology Square Research Building, 85 5th Street NW,
>         Atlanta, GA 30308
>
>         e: rbates8 at gatech.edu <mailto:rbates8 at gatech.edu>
>
>         m: 770.713.8531
>
>         *From:* GloBI <globi-bounces at lists.gbif.org> on behalf of John Oswald
>         <j-oswald at tamu.edu>
>         *Date:* Monday, November 19, 2018 at 9:18 PM
>         *To:* "globi at lists.gbif.org" <globi at lists.gbif.org>
>         *Subject: *[GloBI] Ontology of Biotic Interactions?
>
>         I’m growing and extending interaction/association data in my
>         research database on the species of the superorder
>         Neuropterida (Insecta: orders Neuroptera, Megaloptera, and
>         Raphidioptera) of the world. The data is primarily drawn from
>         the published literature; some also from specimen labels. I’m
>         interested in standardizing the terminology that I use to
>         describe interactions and associations. I’m interested in
>         taxon to taxon interactions (e.g., species X eats species Y;
>         species X is phoretic on species Y), taxon to inanimate object
>         interactions/associations (e.g., species X oviposits on
>         substrate Y [say, rocks]), and taxon to concept associations
>         (e.g., species X exhibits behavior Y). Can anyone recommend
>         any good lists of standardized terms (with definitions) for
>         this sort of thing? Are there any good, well developed,
>         ontologies for general taxon-taxon and/or taxon-inanimate
>         interactions? I have a list of 500+ “association kinds”
>         (without well-standardized definitions) that I have scraped
>         together over the years. I’d like to plug these into (or
>         convert them into) something more standardized if something
>         more standard exists. Thanks for any suggestions on where I
>         might go next on this.
>
>         ---oo0oo---
>
>         John D. Oswald
>
>         Professor of Entomology
>
>         Curator, Texas A&M University Insect Collection
>
>         Department of Entomology
>
>         Texas A&M University
>
>         College Station, TX  77843-2475
>
>         E-mail: j-oswald at tamu.edu <mailto:j-oswald at tamu.edu>
>
>         Phone: 1-979-862-3507
>
>         Lacewing Digital Library: http://lacewing.tamu.edu/
>
>         Bibliography of the Neuropterida:
>         http://lacewing.tamu.edu/Biblio/Main
>
>         Neuropterida Species of the World:
>         http://lacewing.tamu.edu/SpeciesCatalog/Main
>
>
>
>
>         _______________________________________________
>
>         GloBI mailing list
>
>         GloBI at lists.gbif.org  <mailto:GloBI at lists.gbif.org>
>
>         https://lists.gbif.org/mailman/listinfo/globi
>