[GloBI] GloBI + spark + wikidata + taxon links

Jorrit Poelen jhpoelen at xs4all.nl
Fri Jan 5 22:30:45 CET 2018

Hey y'all - 

Just wanted to share some experiments I did using guoda / spark and
wikidata.

Today, after loading the wikidata archive into hdfs, I extracted all
taxon items (~2M) from wikidata, along with their associated taxon ids
(e.g., gbif, itis), in less than 5 minutes. You should be able to
reproduce this using https://jupyter.idigbio.org and my notes at
https://github.com/bio-guoda/guoda-datasets/tree/master/wikidata .
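
For readers who want a feel for the extraction step, here is a minimal
sketch in plain Python. It is not the code from the notes linked above.
It assumes the wikidata JSON dump layout (one entity per line, wrapped in
"[" and "]", with trailing commas), treats any item carrying a taxon name
(P225) as a taxon, and reads GBIF (P846) and ITIS (P815) ids from it.
The property ids are quoted from memory and worth double-checking on
wikidata.org. The helper names and the sample records (including the ids
"123" and "456") are made-up placeholders, not real data.

```python
import json

# Wikidata property ids (assumed from memory; verify on wikidata.org).
TAXON_NAME = "P225"
ID_PROPERTIES = {"P846": "GBIF", "P815": "ITIS"}


def first_string_value(claims, prop):
    """Return the first plain-string value of a property, or None."""
    for statement in claims.get(prop, []):
        datavalue = statement.get("mainsnak", {}).get("datavalue")
        if datavalue and isinstance(datavalue.get("value"), str):
            return datavalue["value"]
    return None


def parse_taxon(line):
    """Parse one dump line; return a dict for taxon items, else None."""
    text = line.strip().rstrip(",")
    if text in ("", "[", "]"):
        return None  # array brackets and blank lines carry no entity
    entity = json.loads(text)
    claims = entity.get("claims", {})
    name = first_string_value(claims, TAXON_NAME)
    if name is None:
        return None  # assumption: no taxon name means not a taxon item
    ids = {}
    for prop, prefix in ID_PROPERTIES.items():
        value = first_string_value(claims, prop)
        if value is not None:
            ids[prefix] = value
    return {"item": entity["id"], "name": name, "ids": ids}


def claim(value):
    """Build a stripped-down statement list for the sample records."""
    return [{"mainsnak": {"datavalue": {"value": value, "type": "string"}}}]


# Tiny stand-in for the dump; the external ids are placeholders.
sample_dump = [
    "[",
    json.dumps({"id": "Q140", "claims": {
        "P225": claim("Panthera leo"),
        "P846": claim("123"),
        "P815": claim("456"),
    }}) + ",",
    json.dumps({"id": "Q42", "claims": {}}),
    "]",
]

taxa = [t for t in map(parse_taxon, sample_dump) if t is not None]
print(taxa)
```

With pyspark, the same function could presumably be applied to the full
dump with something like
sc.textFile(path).map(parse_taxon).filter(lambda t: t is not None),
where path is whatever hdfs location holds the archive.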

The reason for my interest is to link GloBI to wikidata taxon items
(e.g., https://www.wikidata.org/wiki/Q140), to retrieve associated data
(e.g., images, common names), and to be able to share species
interaction data with wikidata using their native ids (e.g.,
https://github.com/jhpoelen/eol-globi-data/issues/209). I am sure other
projects have similar needs.

Needless to say, I would have been unable to do this data experiment
without the guoda systems that are up and running. 

Curious to hear your thoughts and hope you find this inspiring,
