[API-users] Is there any NEO4J or graph-based driver for this API ?

Juan M. Escamilla Molgora j.escamillamolgora at lancaster.ac.uk
Tue May 31 18:55:06 CEST 2016


Dear all,


Thank you very much for your valuable feedback!


I'll explain a bit what I'm doing just to clarify, sorry if this is spam to 
some.


I want to build a model for species assemblages based on co-occurrence 
of taxa within an arbitrary area. I'm building a 2D lattice in which, for 
each cell, I'm collapsing the data (the occurrences) into a taxonomic 
tree. To do this I first need to obtain the data from the GBIF API and 
then, based on the ids (or names) of each taxonomic level (from kingdom 
to occurrence), build a tree coupled to each cell.
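
A minimal sketch of the per-cell tree idea, not the actual implementation. It assumes each occurrence is a dict whose keys follow the field names of a GBIF occurrence record (decimalLongitude, decimalLatitude, key and the rank names); the cell size, the "_occurrences" leaf key and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Ranks from most general to most specific. The keys are assumed to match
# the field names of a GBIF occurrence record.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def cell_index(lon, lat, cell_size):
    """Return the (column, row) of the lattice cell containing a point."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def insert_occurrence(tree, occ):
    """Walk down the nested dict rank by rank and store the occurrence key at the leaf."""
    node = tree
    for rank in RANKS:
        name = occ.get(rank, "unknown")  # missing ranks are grouped under "unknown"
        node = node.setdefault(name, {})
    node.setdefault("_occurrences", []).append(occ["key"])


def build_lattice(occurrences, cell_size=1.0):
    """Group occurrences by lattice cell and collapse each group into a taxonomic tree."""
    lattice = defaultdict(dict)
    for occ in occurrences:
        cell = cell_index(occ["decimalLongitude"], occ["decimalLatitude"], cell_size)
        insert_occurrence(lattice[cell], occ)
    return dict(lattice)
```

Two taxa co-occur in a cell when both appear under the same cell key, so co-occurrence at any rank can be read off by comparing the branches of one cell's tree.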


The implementation uses PostgreSQL (PostGIS) for storing the raw GBIF 
data and Neo4j for storing the relation "is a member of the [species, 
genus, family, ...] [name/id]". The idea is to include data from 
different sources, similar to the project Matthew and Jennifer 
mentioned (which I'm very interested in and would like to hear more 
about), and traverse the network looking for significant merged information.
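
As a rough illustration of the membership relation, the sketch below turns one occurrence record into a chain of (child, relation, parent) triples that could be loaded into a graph database. The relation name IS_MEMBER_OF, the Taxon label, the Cypher statement and the function names are all assumptions made for this example, and nothing here talks to a live Neo4j server.

```python
# Ranks from most general to most specific (assumed GBIF-style field names).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

# Hypothetical Cypher statement; MERGE avoids creating duplicate nodes/edges.
MERGE_QUERY = (
    "MERGE (c:Taxon {rank: $child_rank, name: $child_name}) "
    "MERGE (p:Taxon {rank: $parent_rank, name: $parent_name}) "
    "MERGE (c)-[:IS_MEMBER_OF]->(p)"
)


def membership_edges(occ):
    """Return (child, 'IS_MEMBER_OF', parent) triples from the occurrence up to kingdom."""
    chain = [("occurrence", occ["key"])]
    for rank in reversed(RANKS):
        if occ.get(rank):  # skip ranks that are missing from the record
            chain.append((rank, occ[rank]))
    return [(child, "IS_MEMBER_OF", parent) for child, parent in zip(chain, chain[1:])]


def edge_params(edge):
    """Map one triple to the parameter dict expected by MERGE_QUERY."""
    (child_rank, child_name), _, (parent_rank, parent_name) = edge
    return {
        "child_rank": child_rank,
        "child_name": child_name,
        "parent_rank": parent_rank,
        "parent_name": parent_name,
    }
```

Each parameter dict could then be passed, together with MERGE_QUERY, to whichever Neo4j client is in use.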


One of the immediate problems I've found is importing big chunks of the 
GBIF data into my specification. Thanks to this thread I've found the 
tools most used by the community (pygbif, rgbif, and 
python-dwca-reader). I was using urllib2 and things like that.

I'll be happy to share any code or ideas with the people interested.


Btw, I've checked the TinkerPop project, which uses the Gremlin traversal 
language and is independent of the DBMS.

Perhaps it's possible to use it with Spark and GUODA as well?



Is GUODA working now?


Best wishes


Juan.







On 31/05/16 17:02, Collins, Matthew wrote:
>
> Jorrit pointed out this thread to us at iDigBio. Downloading and 
> importing data into a relational database will work great, especially 
> if as Jan said you can cut the data size down to a reasonable amount.
>
>
> Another approach we've been working on in a collaboration called GUODA 
> [1] is to build an Apache Spark environment with pre-formatted data 
> frames with common data sets in them for researchers to use. This 
> approach would offer a remote service where you could write arbitrary 
> Spark code, probably in Jupyter notebooks, to iterate over data. Spark 
> does a lot of cool stuff including GraphX which might be of 
> interest. This is definitely pre-alpha at this point and if anyone is 
> interested, I'd like to hear your thoughts. I'll also be at SPNHC 
> talking about this.
>
>
> One thing we've found in working on this is that importing data into a 
> structured data format isn't always easy. If you only want a few 
> columns, it'll be fine. But getting the data typing, format 
> standardization, and column name syntax of the whole width of an 
> iDigBio record right requires some code. I looked to see if EcoData 
> Retriever [2] had a GBIF data source and they have an eBird one that 
> perhaps you might find useful as a starting point if you wanted to try 
> to use someone else's code to download and import data.
>
>
> For other data structures like BHL, we're kind of making stuff up 
> since we're packaging a relational structure and not something nearly 
> as flat as GBIF and DWC stuff.
>
>
> [1] http://guoda.bio/
>
> [2] http://www.ecodataretriever.org/
>
>
> Matthew Collins
> Technical Operations Manager
> Advanced Computing and Information Systems Lab, ECE
> University of Florida
> 352-392-5414
> ------------------------------------------------------------------------
> *From:* jorrit poelen <jhpoelen at xs4all.nl>
> *Sent:* Monday, May 30, 2016 11:16 AM
> *To:* Collins, Matthew; Thompson, Alexander M; Hammock, Jennifer
> *Subject:* Fwd: [API-users] Is there any NEO4J or graph-based driver 
> for this API ?
> Hey y’all:
>
> Interesting request below on the GBIF mailing list - sounds like a 
> perfect fit for the GUODA use cases.
>
> Would it be too early to jump onto this thread and share our 
> efforts/vision?
>
> thx,
> -jorrit
>
>> Begin forwarded message:
>>
>> *From: *Jan Legind <jlegind at gbif.org>
>> *Subject: **Re: [API-users] Is there any NEO4J or graph-based driver 
>> for this API ?*
>> *Date: *May 30, 2016 at 5:48:51 AM PDT
>> *To: *Mauro Cavalcanti <maurobio at gmail.com>, "Juan M. Escamilla 
>> Molgora" <j.escamillamolgora at lancaster.ac.uk>
>> *Cc: *"api-users at lists.gbif.org" <api-users at lists.gbif.org>
>>
>> Dear Juan,
>> Unfortunately we have no tool for creating this kind of SQL-like 
>> query against the portal. I am sure you are aware that the filters in 
>> the occurrence search pages can be applied in combination in numerous 
>> ways. The API can go even further in this regard[1], but it is not well 
>> suited for retrieving occurrence records since there is a 200,000 
>> record ceiling, making it unfit for species exceeding this number.
>> There are going to be updates to the pygbif package[2] in the near future 
>> that will enable you to launch user downloads programmatically, where 
>> a whole list of different species can be used as a query parameter, as 
>> well as adding polygons.[3]
>> In the meantime, Mauro’s suggestion is excellent. If you can narrow 
>> your search down until it returns a manageable download (say less 
>> than 100 million records), importing this into a database should be 
>> doable. From there, you can refine using SQL queries.
>> Best,
>> Jan K. Legind, GBIF Data manager
>> [1] http://www.gbif.org/developer/occurrence#search
>> [2] https://github.com/sckott/pygbif
>> [3] https://github.com/jlegind/GBIF-downloads
>> *From:* API-users [mailto:api-users-bounces at lists.gbif.org] *On Behalf 
>> Of* Mauro Cavalcanti
>> *Sent:* 30 May 2016 14:06
>> *To:* Juan M. Escamilla Molgora
>> *Cc:* api-users at lists.gbif.org
>> *Subject:* Re: [API-users] Is there any NEO4J or graph-based driver 
>> for this API ?
>>
>> Hi,
>>
>> One solution I have successfully adopted for this is to download the 
>> records (either "manually" via browser or, yet better, using a Python 
>> script using the fine pygbif library), storing them into a MySQL or 
>> SQLite database and then perform the relational queries. I can 
>> provide examples if you are interested.
>>
>> Best regards,
>> 2016-05-30 8:59 GMT-03:00 Juan M. Escamilla Molgora 
>> <j.escamillamolgora at lancaster.ac.uk>:
>> Hola,
>>
>> Is there any API for making relational queries like taxonomy, 
>> location or timestamp?
>>
>> Thank you and best wishes
>>
>> Juan
>> _______________________________________________
>> API-users mailing list
>> API-users at lists.gbif.org
>> http://lists.gbif.org/mailman/listinfo/api-users
>>
>>
>>
>> --
>> Dr. Mauro J. Cavalcanti
>> E-mail: maurobio at gmail.com
>> Web: http://sites.google.com/site/maurobio
