Hi all,

I am part of a group working on integrating molecular data into the Swedish equivalent of the Atlas of Living Australia, currently called Biodiversity Atlas Sweden (BAS). We are considering different strategies for doing this, and would like to ask for input from the ALA/GBIF community, especially from those of you who also deal with biodiversity data derived from metabarcoding and metagenomics.

Our main problem is that the current GBIF taxonomy backbone has insufficient coverage of many microbial organism groups, although we're hoping for substantial improvement from the announced collaboration with SILVA. But even if SILVA's cluster-based OTUs are integrated into the backbone, following the UNITE example, we think researchers will increasingly want to access and analyse data at the highest possible resolution, i.e. at Amplicon Sequence Variant (ASV) level. We see two alternative solutions to this problem in BAS:

1) We treat ASVs as an (unranked) taxonomic level and (locally, in BAS) try to merge the existing GBIF backbone with a checklist based on unique ASV IDs and taxon strings derived from an external reference database, e.g. SILVA. However, we don't know how to do this, and suspect that it would require more manual curation than we have resources for.

2) We use the existing taxonomy (hoping that SILVA integration will soon happen centrally at GBIF) and treat the ASV sequence as a property of the observation, e.g. using the GGBN extension or the Darwin Core dynamicProperties term. Accessing data at ASV level would then, however, require that sequences (and perhaps also primer names and sequences) are fully searchable and available for analysis and display.
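
To make option 2 a bit more concrete, here is a rough Python sketch of what we have in mind. It is only an illustration under our own assumptions, not an existing BAS, ALA or GBIF implementation: the helper names (asv_id, make_occurrence, find_by_sequence), the keys stored inside dynamicProperties, and the use of an MD5 hash of the sequence as ASV ID are all hypothetical, and the sequences and primers are made-up placeholders.

```python
import hashlib
import json


def asv_id(sequence: str) -> str:
    """Hypothetical ASV ID: MD5 hex digest of the upper-cased sequence."""
    return hashlib.md5(sequence.strip().upper().encode("ascii")).hexdigest()


def make_occurrence(scientific_name, sequence, fwd_primer, rev_primer):
    """Build an occurrence record that carries the ASV as a property.

    The taxon name comes from the existing backbone; the sequence and
    primers are packed as JSON into the dynamicProperties field.
    The JSON keys are our own assumption, not a standard.
    """
    props = {
        "asv_id": asv_id(sequence),
        "sequence": sequence.strip().upper(),
        "fwd_primer": fwd_primer,
        "rev_primer": rev_primer,
    }
    return {
        "scientificName": scientific_name,
        "basisOfRecord": "MaterialSample",
        "dynamicProperties": json.dumps(props, sort_keys=True),
    }


def find_by_sequence(records, sequence):
    """Return all records whose stored ASV ID matches the query sequence."""
    wanted = asv_id(sequence)
    return [
        r for r in records
        if json.loads(r["dynamicProperties"]).get("asv_id") == wanted
    ]


# Placeholder data: two observations assigned to the same taxon,
# but representing different ASVs.
records = [
    make_occurrence("Bacteria", "ACGTACGTAA", "ACGT", "TGCA"),
    make_occurrence("Bacteria", "ACGTACGTCC", "ACGT", "TGCA"),
]
hits = find_by_sequence(records, "acgtacgtaa")
```

The point of the sketch is that exact-match lookup at ASV level only works if the sequence (or an ID derived from it) is indexed and searchable, which is exactly the requirement we are unsure about.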

Does anyone have any thoughts or advice to offer regarding this?

Thanks in advance.
Kind regards,
Maria

--
Maria Prager, PhD, Research Engineer
Science for Life Laboratory
Department of Ecology, Environment and Plant Sciences (DEEP)
Stockholm University
Email: maria.prager@scilifelab.se