[API-users] Backbone data as SQL

Markus Döring mdoering at gbif.org
Wed Nov 16 12:02:21 CET 2016


Hi,

Based on the CLB data model (https://github.com/gbif/checklistbank/blob/master/docs/schema.pdf),
I have created a tab-delimited text dump (1.2 GB uncompressed) from the following statement:

SELECT u.id, u.parent_fk, u.basionym_fk, u.is_synonym, u.status, u.rank, u.nom_status, u.constituent_key, u.origin, u.source_taxon_key,
 u.kingdom_fk, u.phylum_fk, u.class_fk, u.order_fk, u.family_fk, u.genus_fk, u.species_fk,
 n.id as name_id, n.scientific_name, n.canonical_name,
 n.genus_or_above, n.specific_epithet, n.infra_specific_epithet, n.notho_type, n.authorship, n.year, n.bracket_authorship, n.bracket_year,
 cpi.citation as name_published_in, u.issues
FROM name_usage u
 JOIN name n ON u.name_fk=n.id
 LEFT JOIN citation cpi ON u.name_published_in_fk=cpi.id
WHERE u.dataset_key=nubKey() and u.deleted IS NULL;

The gzipped tab file is hosted here:
http://rs.gbif.org/datasets/backbone/backbone-current.txt.gz

You should be able to import it into most relational databases, and certainly into PostgreSQL, with a table DDL like this (also attached):

CREATE TABLE backbone (
 id int PRIMARY KEY,
 parent_key int,
 basionym_key int,
 is_synonym boolean,
 status text,
 rank text,
 nom_status text[],
 constituent_key text,
 origin text,
 source_taxon_key int,

 kingdom_key int,
 phylum_key int,
 class_key int,
 order_key int,
 family_key int,
 genus_key int,
 species_key int,

 name_id int,
 scientific_name text,
 canonical_name text,
 genus_or_above text,
 specific_epithet text,
 infra_specific_epithet text,
 notho_type text,
 authorship text,
 year text,
 bracket_authorship text,
 bracket_year text,

 name_published_in text,
 issues text[]
);
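
For readers who would rather inspect the dump before loading it, here is a minimal Python sketch that maps each line of the tab file onto the column names of the DDL above. It assumes the file follows PostgreSQL's text COPY conventions (tab separators, `\N` for NULL, no header line); those conventions are not confirmed in this thread, and the names `read_rows` and `read_backbone` are made up for illustration.

```python
import gzip

# Column order mirrors the DDL above (30 columns).
COLUMNS = [
    "id", "parent_key", "basionym_key", "is_synonym", "status", "rank",
    "nom_status", "constituent_key", "origin", "source_taxon_key",
    "kingdom_key", "phylum_key", "class_key", "order_key", "family_key",
    "genus_key", "species_key",
    "name_id", "scientific_name", "canonical_name", "genus_or_above",
    "specific_epithet", "infra_specific_epithet", "notho_type",
    "authorship", "year", "bracket_authorship", "bracket_year",
    "name_published_in", "issues",
]

# Assumption: NULLs are written as \N, as in PostgreSQL's text COPY format.
NULL_MARKER = "\\N"


def read_rows(lines):
    """Yield one dict per tab-delimited line, keyed by column name."""
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {number}: expected {len(COLUMNS)} fields, got {len(fields)}"
            )
        # Values are kept as strings; only the NULL marker is converted.
        yield {
            name: (None if value == NULL_MARKER else value)
            for name, value in zip(COLUMNS, fields)
        }


def read_backbone(path):
    """Stream rows from the gzipped dump without loading it into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from read_rows(handle)
```

Calling `read_backbone("backbone-current.txt.gz")` on a downloaded copy yields one dict per taxon. All values stay strings, so casting keys to integers or splitting the array columns is left to the caller.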



If this format is of interest to a wider community I will make it available for every new backbone version we produce.

Best,
Markus



On 16 Nov 2016, at 11:34, Markus Döring <mdoering at gbif.org> wrote:

Hi,
If there is demand for the backbone as an SQL file, we should maybe consider providing one directly from GBIF ChecklistBank's relational model.
Would a single table with all taxa be sufficient as a start, or are vernacular names, images and other additional information key?
I can probably supply an SQL dump for the core taxonomy quickly; let me try.

Markus


On 16 Nov 2016, at 11:01, Köhler Christian <C.Koehler at zfmk.de> wrote:

Hi,

I have already looked at both of the git projects. They look interesting. As I am more "pythonic", the idb-backend is easier for me to adapt.
Thanks for showing me. I was hoping that someone had already addressed the question, as it seemed quite obvious to me to provide the data in a widely used format like SQL. I have the feeling I am reinventing the wheel ;-)

The "B-HIT" tool seems to be a reasonable alternative for getting the desired data into an SQL database. In case you are interested, have a look here:
https://www.researchgate.net/publication/283543150_B-HIT_-_A_Tool_for_Harvesting_and_Indexing_Biodiversity_Data
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4636251/

Cheers
Chris


On 14.11.2016 at 17:40, Scott Chamberlain wrote:
Hi Chris, I started something a while back to automate building a SQLite version of the backbone taxonomy (https://github.com/ropensci/gbif-backbone-sql) but it's not quite done yet. The idea is to run it on Heroku (e.g., once a day), resulting in a fresh SQLite version of the backbone taxonomy on Amazon S3.

Scott

On Mon, Nov 14, 2016 at 7:50 AM Markus Döring <mdoering at gbif.org> wrote:
Hi Chris,
the latest GBIF backbone is always available as a Darwin Core archive. This is mostly a collection of tab delimited text files with the accepted and synonym names at its core.
You can find the latest and previous, archived versions here:
http://rs.gbif.org/datasets/backbone/
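
As a rough illustration of what "a collection of tab delimited text files" means in practice, the following Python sketch opens a Darwin Core archive (a zip file), looks up the core data file named in its meta.xml descriptor, and yields the rows. It assumes UTF-8 and tab-separated fields and does not skip header lines; a real reader should honour the encoding, fieldsTerminatedBy and ignoreHeaderLines attributes declared in meta.xml. The function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the Darwin Core text descriptor (meta.xml).
NAMESPACES = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def core_file_name(archive):
    """Return the name of the core data file declared in meta.xml."""
    root = ET.fromstring(archive.read("meta.xml"))
    location = root.find("dwc:core/dwc:files/dwc:location", NAMESPACES)
    if location is None or not location.text:
        raise ValueError("meta.xml declares no core file location")
    return location.text.strip()


def iter_core_rows(zip_path):
    """Yield each line of the core file as a list of field values."""
    with zipfile.ZipFile(zip_path) as archive:
        name = core_file_name(archive)
        with archive.open(name) as raw:
            # Assumption: UTF-8 text with tab-separated fields.
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                yield line.rstrip("\r\n").split("\t")
```

Because the rows are streamed straight from the zip, the archive does not need to be unpacked first.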

Best,
Markus


--
Markus Döring
Software Developer
Global Biodiversity Information Facility (GBIF)
mdoering at gbif.org
http://www.gbif.org





On 14 Nov 2016, at 16:42, Köhler Christian <C.Koehler at zfmk.de> wrote:

Hi,

we are developing an application to curate taxonomic and morphological
data for scientists. At the moment we are evaluating different taxonomic
backbones to be used within our application. The GBIF taxonomic backbone
seems to be a good choice with regard to quality, number of entries and
acceptance.

Due to the nature of our application, a web service to browse the
taxonomy will not fulfil our requirements. A local copy of the GBIF data
as SQL would be an ideal solution. I looked for a publicly available
copy of this data, to no avail. "Harvesting" the GBIF REST API does not
seem to be a good option. Are there plans to provide current taxonomic
backbone data in the future? Maybe the data is already available, but I
have failed to find it so far.

Regards
Chris

--
Christian Köhler
Tel.: 0228 9122-434

Zoologisches Forschungsmuseum Alexander Koenig
Leibniz-Institut für Biodiversität der Tiere
Adenauerallee 160, 53113 Bonn, Germany
www.zfmk.de

Stiftung des öffentlichen Rechts
Direktor: Prof. J. Wolfgang Wägele
Sitz: Bonn
_______________________________________________
API-users mailing list
API-users at lists.gbif.org
http://lists.gbif.org/mailman/listinfo/api-users





-------------- next part --------------
A non-text attachment was scrubbed...
Name: backbone-ddl.sql
Type: application/octet-stream
Size: 638 bytes
Desc: backbone-ddl.sql
URL: <http://lists.gbif.org/pipermail/api-users/attachments/20161116/024fb6bf/attachment-0001.obj>

