[API-users] Is there any NEO4J or graph-based driver for this API ?

Juan M. Escamilla Molgora j.escamillamolgora at lancaster.ac.uk
Tue May 31 20:31:25 CEST 2016


Hi Tim,

The grid is made by selecting a square area and dividing it into n×n 
subsquares, which form a partition of the larger square.

Each grid is a table in PostGIS, and there is a mapping between this 
table and a Django model (class).

The class constructor has the attributes: id, cell, and neighbours (next 
release).

The cell is a polygon (a square) which, through GeoDjango, inherits the 
polygon properties of the osgeo module.
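The n×n partition itself is simple; here is a minimal sketch in plain 
Python (the function name and the bounds are illustrative, independent of 
the actual PostGIS tables and Django model):

```python
# Sketch: partition a square bounding box into an n x n grid of cells.
# Names and bounds are illustrative, not the actual Django classes.

def make_grid(xmin, ymin, xmax, ymax, n):
    """Return a list of (id, (x0, y0, x1, y1)) cell bounds forming
    an n x n partition of the bounding box."""
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    cells = []
    for i in range(n):          # rows (south to north)
        for j in range(n):      # columns (west to east)
            cell_id = i * n + j
            bounds = (xmin + j * dx, ymin + i * dy,
                      xmin + (j + 1) * dx, ymin + (i + 1) * dy)
            cells.append((cell_id, bounds))
    return cells

grid = make_grid(0.0, 0.0, 10.0, 10.0, 4)  # 16 cells of 2.5 x 2.5
```

In the real setup each of these cell bounds would become a polygon row in 
the PostGIS grid table.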

I've tried to use the CSV data (downloaded as a CSV request), but I 
couldn't find a way to obtain the global IDs for each taxonomic level 
(idspecies, idgenus, idfamily, etc.).

Do you know a way for obtaining these fields?
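For what it's worth, the GBIF backbone name match (exposed in pygbif as 
species.name_backbone) does return per-rank keys such as familyKey and 
genusKey. A minimal sketch of extracting them; the match dict below is an 
illustrative, trimmed response used so this runs offline, not live output:

```python
# Sketch: pull the per-rank backbone keys out of a GBIF name-match result.
# A live call would be:
#     from pygbif import species
#     match = species.name_backbone(name="Puma concolor")
# The dict below is an illustrative, trimmed response.

RANK_KEYS = ["kingdomKey", "phylumKey", "classKey", "orderKey",
             "familyKey", "genusKey", "speciesKey"]

def hierarchy_keys(match):
    """Return only the per-rank backbone keys present in a match result."""
    return {k: match[k] for k in RANK_KEYS if k in match}

match = {
    "usageKey": 2435099,
    "scientificName": "Puma concolor (Linnaeus, 1771)",
    "kingdomKey": 1, "phylumKey": 44, "classKey": 359, "orderKey": 732,
    "familyKey": 9703, "genusKey": 2435098, "speciesKey": 2435099,
}
keys = hierarchy_keys(match)
```

Joining these keys back onto the CSV rows by scientific name would give 
each occurrence its global IDs at every rank.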


Thank you for your email and best wishes,


Juan


On 31/05/16 19:03, Tim Robertson wrote:
> Hi Juan
>
> That sounds like a fun project!
>
> Can you please describe your grid / cells?
>
> Most likely your best bet will be to use the download API (as CSV 
> data) and ingest that. The other APIs will likely hit limits (e.g. you 
> can't page through indefinitely).
>
> Thanks,
> Tim
>
> On 31 May 2016, at 18:55, Juan M. Escamilla Molgora 
> <j.escamillamolgora at lancaster.ac.uk 
> <mailto:j.escamillamolgora at lancaster.ac.uk>> wrote:
>
>> Dear all,
>>
>>
>> Thank you very much for your valuable feedback!
>>
>>
>> I'll explain a bit of what I'm doing just to clarify; sorry if this is 
>> spam to some.
>>
>>
>> I want to build a model for species assemblages based on 
>> co-occurrence of taxa within an arbitrary area. I'm building a 2D 
>> lattice in which, for each cell, I collapse the data (the occurrences) 
>> into a taxonomic tree. To do this I first need to obtain the data from 
>> the GBIF API and later, based on the ids (or names) of each taxonomic 
>> level (from kingdom to occurrence), build a tree coupled to each cell.
>>
>>
>> The implementation is done with postgresql (PostGIS) for storing the 
>> raw GBIF data and neo4j for storing the relation "being a member of 
>> the [species, genus, family, ...] [name/id]". The idea is to include 
>> data from different sources, similar to the project Matthew and 
>> Jennifer mentioned (which I'm very interested in and would like to 
>> hear more about), and to traverse the network looking for significant 
>> merged information.
>>
>>
>> One of the immediate problems I've found is importing big chunks of 
>> the GBIF data into my specification. Thanks to this thread I've found 
>> the tools most used by the community (pygbif, rgbif, and 
>> python-dwca-reader). I was using urllib2 and things like that.
>>
>> I'll be happy to share any code or ideas with the people interested.
>>
>>
>> Btw, I've checked the TinkerPop project, which uses the Gremlin 
>> traversal language and is independent of the DBMS.
>>
>> Perhaps it's possible to use it with Spark and GUODA as well?
>>
>>
>>
>> Is GUODA working now?
>>
>>
>> Best wishes
>>
>>
>> Juan.
>>
>>
>>
>>
>>
>>
>>
>> On 31/05/16 17:02, Collins, Matthew wrote:
>>>
>>> Jorrit pointed out this thread to us at iDigBio. Downloading and 
>>> importing data into a relational database will work great, 
>>> especially if as Jan said you can cut the data size down to a 
>>> reasonable amount.
>>>
>>>
>>> Another approach we've been working on in a collaboration called 
>>> GUODA [1] is to build an Apache Spark environment with pre-formatted 
>>> data frames with common data sets in them for researchers to use. 
>>> This approach would offer a remote service where you could write 
>>> arbitrary Spark code, probably in Jupyter notebooks, to iterate over 
>>> data. Spark does a lot of cool stuff including GraphX which might be 
>>> of interest. This is definitely pre-alpha at this point and if 
>>> anyone is interested, I'd like to hear your thoughts. I'll also be 
>>> at SPNHC talking about this.
>>>
>>>
>>> One thing we've found in working on this is that importing data into 
>>> a structured data format isn't always easy. If you only want a few 
>>> columns, it'll be fine. But getting the data typing, format 
>>> standardization, and column name syntax of the whole width of an 
>>> iDigBio record right requires some code. I looked to see if EcoData 
>>> Retriever [2] had a GBIF data source; they have an eBird one, which 
>>> you might find useful as a starting point if you want to use someone 
>>> else's code to download and import data.
>>>
>>>
>>> For other data structures like BHL, we're kind of making stuff up 
>>> since we're packaging a relational structure and not something 
>>> nearly as flat as GBIF and DWC stuff.
>>>
>>>
>>> [1] http://guoda.bio/
>>>
>>> [2] http://www.ecodataretriever.org/
>>>
>>>
>>> Matthew Collins
>>> Technical Operations Manager
>>> Advanced Computing and Information Systems Lab, ECE
>>> University of Florida
>>> 352-392-5414
>>> ------------------------------------------------------------------------
>>> *From:* jorrit poelen <jhpoelen at xs4all.nl>
>>> *Sent:* Monday, May 30, 2016 11:16 AM
>>> *To:* Collins, Matthew; Thompson, Alexander M; Hammock, Jennifer
>>> *Subject:* Fwd: [API-users] Is there any NEO4J or graph-based driver 
>>> for this API ?
>>> Hey y’all:
>>>
>>> Interesting request below on the GBIF mailing list - sounds like a 
>>> perfect fit for the GUODA use cases.
>>>
>>> Would it be too early to jump onto this thread and share our 
>>> efforts/vision?
>>>
>>> thx,
>>> -jorrit
>>>
>>>> Begin forwarded message:
>>>>
>>>> *From: *Jan Legind <jlegind at gbif.org>
>>>> *Subject: **Re: [API-users] Is there any NEO4J or graph-based 
>>>> driver for this API ?*
>>>> *Date: *May 30, 2016 at 5:48:51 AM PDT
>>>> *To: *Mauro Cavalcanti <maurobio at gmail.com>, "Juan M. Escamilla 
>>>> Molgora" <j.escamillamolgora at lancaster.ac.uk>
>>>> *Cc: *"api-users at lists.gbif.org <mailto:api-users at lists.gbif.org>" 
>>>> <api-users at lists.gbif.org <mailto:api-users at lists.gbif.org>>
>>>>
>>>> Dear Juan,
>>>> Unfortunately we have no tool for creating these kinds of SQL-like 
>>>> queries to the portal. I am sure you are aware that the filters on 
>>>> the occurrence search pages can be applied in combination in 
>>>> numerous ways. The API can go even further in this regard[1], but it 
>>>> is not well suited for retrieving occurrence records, since there is 
>>>> a 200,000-record ceiling making it unfit for species exceeding that 
>>>> number.
>>>> There are going to be updates to the pygbif package[2] in the near 
>>>> future that will enable you to launch user downloads 
>>>> programmatically, where a whole list of different species can be 
>>>> used as a query parameter, as well as adding polygons.[3]
>>>> In the meantime, Mauro's suggestion is excellent. If you can narrow 
>>>> your search down until it returns a manageable download (say, less 
>>>> than 100 million records), importing it into a database should be 
>>>> doable. From there, you can refine using SQL queries.
>>>> Best,
>>>> Jan K. Legind, GBIF Data manager
>>>> [1]http://www.gbif.org/developer/occurrence#search
>>>> [2]https://github.com/sckott/pygbif
>>>> [3]https://github.com/jlegind/GBIF-downloads
>>>> *From:* API-users [mailto:api-users-bounces at lists.gbif.org] *On 
>>>> Behalf Of* Mauro Cavalcanti
>>>> *Sent:* 30 May 2016 14:06
>>>> *To:* Juan M. Escamilla Molgora
>>>> *Cc:* api-users at lists.gbif.org
>>>> *Subject:* Re: [API-users] Is there any NEO4J or graph-based driver 
>>>> for this API ?
>>>>
>>>> Hi,
>>>>
>>>> One solution I have successfully adopted for this is to download 
>>>> the records (either "manually" via browser or, better yet, with a 
>>>> Python script using the fine pygbif library), store them in a 
>>>> MySQL or SQLite database, and then perform the relational queries. 
>>>> I can provide examples if you are interested.
>>>>
>>>> Best regards,
>>>> 2016-05-30 8:59 GMT-03:00 Juan M. Escamilla Molgora 
>>>> <j.escamillamolgora at lancaster.ac.uk 
>>>> <mailto:j.escamillamolgora at lancaster.ac.uk>>:
>>>> Hola,
>>>>
>>>> Is there any API for making relational queries like taxonomy, 
>>>> location or timestamp?
>>>>
>>>> Thank you and best wishes
>>>>
>>>> Juan
>>>> _______________________________________________
>>>> API-users mailing list
>>>> API-users at lists.gbif.org <mailto:API-users at lists.gbif.org>
>>>> http://lists.gbif.org/mailman/listinfo/api-users
>>>>
>>>>
>>>>
>>>> --
>>>> Dr. Mauro J. Cavalcanti
>>>> E-mail: maurobio at gmail.com
>>>> Web: http://sites.google.com/site/maurobio
>>>
>>>
>>>
>>
