[API-users] Is there any NEO4J or graph-based driver for this API ?

Nils Hempelmann info at nilshempelmann.de
Wed Jun 1 11:09:54 CEST 2016


Hi Juan et al

Thanks a lot for triggering this discussion.
I am currently working on a Web Processing Service 
(http://birdhouse.readthedocs.io/en/latest/) that includes a species 
distribution model based on GBIF data (and climate model data). A 
good connection to the GBIF database is still missing, so all the 
hints were quite useful!

If you want to share code:
https://github.com/bird-house/flyingpigeon/blob/master/flyingpigeon/processes/wps_sdm.py 


Merci
Nils

On 31/05/2016 22:08, Juan M. Escamilla Molgora wrote:
>
> Hi Tim,
>
> Thank you! specially for the DwC-A hint.
>
> The cells are by default in decimal degrees (WGS84), but the 
> functions for generating them are general enough to use any projection 
> supported by GDAL via PostGIS. The grid can be built "on the fly" or 
> stored on the server side.
>
> I was thinking (daydreaming) about a standard way of encoding unique 
> but universal grids (similar to geohash or Open Location Code), but I 
> didn't find anything fast and ready. Maybe later :)
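As an illustration of the kind of encoding Juan describes (a hypothetical sketch, not part of biospytial), a geohash-style scheme can interleave longitude/latitude bisections into a quadtree cell ID whose prefixes identify coarser ancestor cells:

```python
def encode_cell(lon, lat, depth=8):
    """Encode a point as a quadtree cell ID: at each level, split the
    current bounding box into four quadrants and record which one (0-3)
    contains the point. Prefixes of the ID name coarser ancestor cells."""
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    digits = []
    for _ in range(depth):
        mid_lon = (west + east) / 2.0
        mid_lat = (south + north) / 2.0
        quadrant = 0
        if lon >= mid_lon:
            quadrant |= 1      # eastern half
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:
            quadrant |= 2      # northern half
            south = mid_lat
        else:
            north = mid_lat
        digits.append(str(quadrant))
    return "".join(digits)

# Nearby points share a long prefix; distant points diverge at digit one.
print(encode_cell(-99.1, 19.4))   # near Mexico City
print(encode_cell(-99.2, 19.5))   # a neighbouring point
print(encode_cell(151.2, -33.9))  # Sydney
```

Because the IDs are prefix-hierarchical, coarser grids fall out for free by truncating the string.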
>
> I only use open-source software: Python, Django, GDAL, NumPy, PostGIS, 
> Conda, py2neo, ete2, among others.
>
> Currently I don't have an official release; the project is quite 
> immature and unstable, and the installation can be non-trivial. 
> I'm fixing all these issues, but it will take some time. Sorry for this.
>
> The github repository is:
>
> https://github.com/molgor/biospytial.git
>
> And there's some very old documentation here:
>
> http://test.holobio.me/modules/gbif_taxonomy_class.html
>
> Please feel free to follow!
>
>
> Best wishes
>
>
> Juan
>
> P.s. The functions for generating the grid are in: 
> biospytial/SQL_functions
>
>
>
>
>
> On 31/05/16 19:47, Tim Robertson wrote:
>> Thanks Juan
>>
>> You're quite right - you need the DwC-A download format to get those 
>> IDs.
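For readers hitting the same question: the interpreted occurrence file inside a GBIF DwC-A download carries per-rank backbone key columns that the simple CSV lacks. A minimal sketch of pulling them out with the standard csv module (the two sample rows and all key values below are made up):

```python
import csv
import io

# Made-up rows mimicking the interpreted occurrence.txt of a DwC-A
# download (tab-separated; real files have many more columns).
sample = (
    "taxonKey\tkingdomKey\tfamilyKey\tgenusKey\tspeciesKey\tspecies\n"
    "101\t1\t222\t111\t101\tPuma concolor\n"
    "102\t1\t224\t112\t102\tCanis latrans\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
# Collect the per-rank IDs the thread is after, one tuple per record.
keys = [(row["species"], row["genusKey"], row["familyKey"]) for row in reader]
```

On a real archive, python-dwca-reader can iterate the rows the same way without unzipping by hand.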
>>
>> Are the cells decimal degrees, and then partitioned into smaller 
>> units, or equal area cells or maybe UTM grids or something else 
>> perhaps? I am just curious.
>>
>> Are you developing this as OSS? I'd like to follow progress if possible.
>>
>> Thanks,
>> Tim,
>>
>> On 31 May 2016, at 20:31, Juan M. Escamilla Molgora 
>> <j.escamillamolgora at lancaster.ac.uk> wrote:
>>
>>> Hi Tim,
>>>
>>> The grid is made by selecting a square area and dividing it into 
>>> n×n subsquares, which form a partition of the bigger square.
>>>
>>> Each grid is a table in PostGIS, and there's a mapping between this 
>>> table and a Django model (class).
>>>
>>> The class constructor has the attributes id, cell and neighbours 
>>> (next release).
>>>
>>> The cell is a polygon (square) and, through GeoDjango, inherits the 
>>> polygon properties of the osgeo module.
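The n×n partition Juan describes can be sketched in plain Python (an illustrative toy; biospytial builds the actual grid with PostGIS SQL functions and maps it to Django models):

```python
def make_grid(xmin, ymin, xmax, ymax, n):
    """Partition a square bounding box into n x n subsquare cells.
    Each cell is (id, (west, south, east, north)); together the cells
    tile the bigger square with no gaps or overlaps."""
    dx = (xmax - xmin) / float(n)
    dy = (ymax - ymin) / float(n)
    cells = []
    for row in range(n):
        for col in range(n):
            west = xmin + col * dx
            south = ymin + row * dy
            cells.append((row * n + col, (west, south, west + dx, south + dy)))
    return cells

# 4x4 grid over an arbitrary 10-degree square.
grid = make_grid(-105.0, 15.0, -95.0, 25.0, 4)
```

Each tuple here plays the role of the id/cell pair on the Django model; neighbours of cell k are then simple index arithmetic (k±1, k±n).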
>>>
>>> I've tried to use the CSV data (downloaded as a CSV request), but I 
>>> couldn't find a way to obtain the global IDs for each taxonomic 
>>> level (idspecies, idgenus, idfamily, etc.).
>>>
>>> Do you know a way for obtaining these fields?
>>>
>>>
>>> Thank you for your email and best wishes,
>>>
>>>
>>> Juan
>>>
>>>
>>> On 31/05/16 19:03, Tim Robertson wrote:
>>>> Hi Juan
>>>>
>>>> That sounds like a fun project!
>>>>
>>>> Can you please describe your grid / cells?
>>>>
>>>> Most likely your best bet will be to use the download API (as CSV 
>>>> data) and ingest that. The other APIs will likely hit limits (e.g. 
>>>> you can't page through indefinitely).
>>>>
>>>> Thanks,
>>>> Tim
>>>>
>>>> On 31 May 2016, at 18:55, Juan M. Escamilla Molgora 
>>>> <j.escamillamolgora at lancaster.ac.uk> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>>
>>>>> Thank you very much for your valuable feedback!
>>>>>
>>>>>
>>>>> I'll explain a bit of what I'm doing just to clarify; sorry if 
>>>>> this is spam to some.
>>>>>
>>>>>
>>>>> I want to build a model of species assemblages based on 
>>>>> co-occurrence of taxa within an arbitrary area. I'm building a 2D 
>>>>> lattice in which, for each cell, I collapse the occurrence data 
>>>>> into a taxonomic tree. To do this I first need to obtain the data 
>>>>> from the GBIF API and then, based on the IDs (or names) of each 
>>>>> taxonomic level (from kingdom to occurrence), build a tree coupled 
>>>>> to each cell.
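The per-cell collapse Juan describes can be sketched with nested dicts (an illustrative toy, not biospytial's actual data structure):

```python
def collapse(occurrences):
    """Collapse occurrence records into a taxonomic tree: each record is
    a path of rank names (kingdom, ..., species); records that share a
    prefix merge into the same branch."""
    tree = {}
    for path in occurrences:
        node = tree
        for rank_name in path:
            node = node.setdefault(rank_name, {})
    return tree

# Two toy occurrences falling in the same lattice cell.
cell_occurrences = [
    ("Animalia", "Chordata", "Mammalia", "Carnivora", "Felidae", "Puma concolor"),
    ("Animalia", "Chordata", "Mammalia", "Carnivora", "Canidae", "Canis latrans"),
]
tree = collapse(cell_occurrences)
```

The shared kingdom-to-order prefix becomes one branch, and the cell's assemblage is the set of leaves under it.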
>>>>>
>>>>>
>>>>> The implementation is done with PostgreSQL (PostGIS) for storing 
>>>>> the raw GBIF data and Neo4j for storing the relation
>>>>>
>>>>> "is a member of the [species, genus, family, ...] [name/id]". The 
>>>>> idea is to include data from different sources, similar to the 
>>>>> project Matthew and Jennifer mentioned (which I'm very interested 
>>>>> in and would like to hear more about), and to traverse the network 
>>>>> looking for significant merged information.
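That membership relation could be written in Cypher roughly as follows (a hypothetical sketch: the `Species`/`Genus` labels, the `IS_MEMBER_OF` relationship type, and the helper function are invented for illustration; in practice the statement would be sent through a driver such as py2neo):

```python
def membership_cypher(child_rank, child_name, parent_rank, parent_name):
    """Build a Cypher MERGE statement linking a taxon node to its parent
    rank with an IS_MEMBER_OF relationship. Names travel as parameters
    rather than being spliced into the query string."""
    query = (
        "MERGE (c:%s {name: $child}) "
        "MERGE (p:%s {name: $parent}) "
        "MERGE (c)-[:IS_MEMBER_OF]->(p)" % (child_rank, parent_rank)
    )
    return query, {"child": child_name, "parent": parent_name}

query, params = membership_cypher("Species", "Puma concolor", "Genus", "Puma")
```

MERGE (rather than CREATE) keeps the graph idempotent when the same taxon appears in many cells.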
>>>>>
>>>>>
>>>>> One of the immediate problems I've found is importing big chunks 
>>>>> of the GBIF data into my specification. Thanks to this thread I've 
>>>>> found the tools most used by the community (pygbif, rgbif, and 
>>>>> python-dwca-reader). I had been using urllib2 and things like that.
>>>>>
>>>>> I'll be happy to share any code or ideas with the people interested.
>>>>>
>>>>>
>>>>> Btw, I've checked the TinkerPop project, which uses the Gremlin 
>>>>> traversal language and is independent of the DBMS.
>>>>>
>>>>> Perhaps it's possible to use it with Spark and GUODA as well?
>>>>>
>>>>>
>>>>>
>>>>> Is GUODA working now?
>>>>>
>>>>>
>>>>> Best wishes
>>>>>
>>>>>
>>>>> Juan.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 31/05/16 17:02, Collins, Matthew wrote:
>>>>>>
>>>>>> Jorrit pointed out this thread to us at iDigBio. Downloading and 
>>>>>> importing data into a relational database will work great, 
>>>>>> especially if, as Jan said, you can cut the data size down to a 
>>>>>> reasonable amount.
>>>>>>
>>>>>>
>>>>>> Another approach we've been working on in a collaboration called 
>>>>>> GUODA [1] is to build an Apache Spark environment with 
>>>>>> pre-formatted data frames with common data sets in them for 
>>>>>> researchers to use. This approach would offer a remote service 
>>>>>> where you could write arbitrary Spark code, probably in Jupyter 
>>>>>> notebooks, to iterate over data. Spark does a lot of cool stuff 
>>>>>> including GraphX which might be of interest. This is definitely 
>>>>>> pre-alpha at this point and if anyone is interested, I'd like to 
>>>>>> hear your thoughts. I'll also be at SPNHC talking about this.
>>>>>>
>>>>>>
>>>>>> One thing we've found in working on this is that importing data 
>>>>>> into a structured data format isn't always easy. If you only want 
>>>>>> a few columns, it'll be fine. But getting the data typing, format 
>>>>>> standardization, and column name syntax of the whole width of an 
>>>>>> iDigBio record right requires some code. I looked to see if 
>>>>>> EcoData Retriever [2] had a GBIF data source and they have an 
>>>>>> eBird one that perhaps you might find useful as a starting point 
>>>>>> if you wanted to try to use someone else's code to download and 
>>>>>> import data.
>>>>>>
>>>>>>
>>>>>> For other data structures like BHL, we're kind of making stuff up 
>>>>>> since we're packaging a relational structure and not something 
>>>>>> nearly as flat as GBIF and DWC stuff.
>>>>>>
>>>>>>
>>>>>> [1] http://guoda.bio/
>>>>>>
>>>>>> [2] http://www.ecodataretriever.org/
>>>>>>
>>>>>>
>>>>>> Matthew Collins
>>>>>> Technical Operations Manager
>>>>>> Advanced Computing and Information Systems Lab, ECE
>>>>>> University of Florida
>>>>>> 352-392-5414
>>>>>> ------------------------------------------------------------------------
>>>>>> *From:* jorrit poelen <jhpoelen at xs4all.nl>
>>>>>> *Sent:* Monday, May 30, 2016 11:16 AM
>>>>>> *To:* Collins, Matthew; Thompson, Alexander M; Hammock, Jennifer
>>>>>> *Subject:* Fwd: [API-users] Is there any NEO4J or graph-based 
>>>>>> driver for this API ?
>>>>>> Hey y’all:
>>>>>>
>>>>>> Interesting request below on the GBIF mailing list - sounds like 
>>>>>> a perfect fit for the GUODA use cases.
>>>>>>
>>>>>> Would it be too early to jump onto this thread and share our 
>>>>>> efforts/vision?
>>>>>>
>>>>>> thx,
>>>>>> -jorrit
>>>>>>
>>>>>>> Begin forwarded message:
>>>>>>>
>>>>>>> *From: *Jan Legind <jlegind at gbif.org>
>>>>>>> *Subject: **Re: [API-users] Is there any NEO4J or graph-based 
>>>>>>> driver for this API ?*
>>>>>>> *Date: *May 30, 2016 at 5:48:51 AM PDT
>>>>>>> *To: *Mauro Cavalcanti <maurobio at gmail.com>, "Juan M. Escamilla 
>>>>>>> Molgora" <j.escamillamolgora at lancaster.ac.uk>
>>>>>>> *Cc: *"api-users at lists.gbif.org 
>>>>>>> <mailto:api-users at lists.gbif.org>" <api-users at lists.gbif.org>
>>>>>>>
>>>>>>> Dear Juan,
>>>>>>> Unfortunately we have no tool for creating these kinds of 
>>>>>>> SQL-like queries against the portal. I am sure you are aware that 
>>>>>>> the filters on the occurrence search pages can be applied in 
>>>>>>> combination in numerous ways. The API can go even further in 
>>>>>>> this regard [1], but it is not well suited for retrieving occurrence 
>>>>>>> records, since there is a 200,000-record ceiling, making it unfit 
>>>>>>> for species exceeding this number.
>>>>>>> There are going to be updates to the pygbif package [2] in the 
>>>>>>> near future that will enable you to launch user downloads 
>>>>>>> programmatically, with a whole list of different species usable 
>>>>>>> as a query parameter, as well as polygons [3].
>>>>>>> In the meantime, Mauro's suggestion is excellent. If you can 
>>>>>>> narrow your search down until it returns a manageable download 
>>>>>>> (say, less than 100 million records), importing it into a 
>>>>>>> database should be doable. From there, you can refine using SQL 
>>>>>>> queries.
>>>>>>> Best,
>>>>>>> Jan K. Legind, GBIF Data manager
>>>>>>> [1] http://www.gbif.org/developer/occurrence#search
>>>>>>> [2] https://github.com/sckott/pygbif
>>>>>>> [3] https://github.com/jlegind/GBIF-downloads
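For context, the occurrence download API that these tools wrap takes a JSON predicate; below is a sketch of combining a list of taxon keys with a WKT polygon (illustrative only: the keys are example values, and the authoritative predicate schema is in the GBIF download API documentation):

```python
import json

def download_request(taxon_keys, polygon_wkt):
    """Build a GBIF occurrence-download predicate that ORs several
    taxon keys and ANDs the result with a within-polygon constraint."""
    return {
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "or",
                 "predicates": [{"type": "equals", "key": "TAXON_KEY",
                                 "value": str(k)} for k in taxon_keys]},
                {"type": "within", "geometry": polygon_wkt},
            ],
        }
    }

# Example taxon keys and an arbitrary bounding polygon.
req = download_request([2435099, 5219404],
                       "POLYGON((-105 15,-95 15,-95 25,-105 25,-105 15))")
payload = json.dumps(req)  # body for the download request
```

The same structure is what a programmatic species-list download would post on the user's behalf.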
>>>>>>> *From:* API-users [mailto:api-users-bounces at lists.gbif.org] 
>>>>>>> *On Behalf Of* Mauro Cavalcanti
>>>>>>> *Sent:* 30 May 2016 14:06
>>>>>>> *To:* Juan M. Escamilla Molgora
>>>>>>> *Cc:* api-users at lists.gbif.org
>>>>>>> *Subject:* Re: [API-users] Is there any NEO4J or graph-based 
>>>>>>> driver for this API ?
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> One solution I have successfully adopted for this is to download 
>>>>>>> the records (either "manually" via browser or, better yet, with a 
>>>>>>> Python script using the fine pygbif library), store them in a 
>>>>>>> MySQL or SQLite database, and then perform the relational 
>>>>>>> queries. I can provide examples if you are interested.
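Mauro's workflow can be sketched end to end with the Python standard library (illustrative; the rows below are made up, with column names in the style of a GBIF CSV download):

```python
import csv
import io
import sqlite3

# Made-up tab-separated rows in the shape of a GBIF CSV download.
data = (
    "species\tdecimallatitude\tdecimallongitude\tyear\n"
    "Puma concolor\t19.4\t-99.1\t2014\n"
    "Puma concolor\t19.5\t-99.2\t2015\n"
    "Canis latrans\t25.7\t-100.3\t2015\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occ (species TEXT, lat REAL, lon REAL, year INT)")
rows = csv.DictReader(io.StringIO(data), delimiter="\t")
conn.executemany(
    "INSERT INTO occ VALUES (?, ?, ?, ?)",
    [(r["species"], float(r["decimallatitude"]),
      float(r["decimallongitude"]), int(r["year"])) for r in rows])

# Relational query over the imported records: occurrences per species/year.
counts = conn.execute(
    "SELECT species, year, COUNT(*) FROM occ GROUP BY species, year"
).fetchall()
```

For a real download the same pattern applies, just streaming the file from disk instead of an in-memory string.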
>>>>>>>
>>>>>>> Best regards,
>>>>>>> 2016-05-30 8:59 GMT-03:00 Juan M. Escamilla Molgora 
>>>>>>> <j.escamillamolgora at lancaster.ac.uk>:
>>>>>>> Hola,
>>>>>>>
>>>>>>> Is there any API for making relational queries like taxonomy, 
>>>>>>> location or timestamp?
>>>>>>>
>>>>>>> Thank you and best wishes
>>>>>>>
>>>>>>> Juan
>>>>>>> _______________________________________________
>>>>>>> API-users mailing list
>>>>>>> API-users at lists.gbif.org <mailto:API-users at lists.gbif.org>
>>>>>>> http://lists.gbif.org/mailman/listinfo/api-users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Dr. Mauro J. Cavalcanti
>>>>>>> E-mail: maurobio at gmail.com
>>>>>>> Web: http://sites.google.com/site/maurobio
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>
>
>


