[API-users] Is there any NEO4J or graph-based driver for this API ?

Juan M. Escamilla Molgora j.escamillamolgora at lancaster.ac.uk
Wed Jun 1 16:06:37 CEST 2016


Hi Nils,

Thank you for sharing!

What is Phoenix about? Does it connect to the ESGF network? It's the 
first time I've read about it. It looks very, very interesting!


Thanks, everybody, for all this valuable feedback.

Best wishes


Juan



On 01/06/16 10:09, Nils Hempelmann wrote:
> Hi Juan et al
>
> Thanks a lot for triggering this discussion.
> I am currently working on a Web Processing Service 
> (http://birdhouse.readthedocs.io/en/latest/) including a species 
> distribution model based on GBIF data (and climate model data). A 
> good connection to the GBIF database is still missing, and all hints 
> were quite useful!!
>
> If you want to share code:
> https://github.com/bird-house/flyingpigeon/blob/master/flyingpigeon/processes/wps_sdm.py 
>
>
> Thanks
> Nils
>
> On 31/05/2016 22:08, Juan M. Escamilla Molgora wrote:
>>
>> Hi Tim,
>>
>> Thank you! Especially for the DwC-A hint.
>>
>> The cells are in decimal degrees (WGS84) by default, but the 
>> functions for generating them are general enough to use any 
>> projection supported by GDAL through PostGIS. This can be done "on the 
>> fly" or stored on the server side.
>>
>> I was thinking (daydreaming) about a standard way of encoding unique 
>> but universal grids (similar to Geohash or Open Location Code), but I 
>> didn't find anything fast and ready. Maybe later :)
>>
>> I only use open source software: Python, Django, GDAL, NumPy, 
>> PostGIS, Conda, Py2Neo, and ete2, among others.
>>
>> Currently I don't have an official release; the project is quite 
>> immature and unstable, and the installation can be non-trivial. 
>> I'm fixing all these issues, but it will take some time. Sorry for this.
>>
>> The github repository is:
>>
>> https://github.com/molgor/biospytial.git
>>
>> And there's some very old documentation here:
>>
>> http://test.holobio.me/modules/gbif_taxonomy_class.html
>>
>> Please feel free to follow!
>>
>>
>> Best wishes
>>
>>
>> Juan
>>
>> P.S. The functions for generating the grid are in: 
>> biospytial/SQL_functions
>>
>>
>>
>>
>>
>> On 31/05/16 19:47, Tim Robertson wrote:
>>> Thanks Juan
>>>
>>> You're quite right - you need the DwC-A download format to get those 
>>> IDs.
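>>>
>>> For what it's worth, pulling those keys out of a DwC-A download with 
>>> python-dwca-reader looks roughly like this. A sketch only: I'm quoting 
>>> the GBIF term URIs from memory, so verify them against the meta.xml 
>>> inside your archive.
>>>
>>> from dwca.read import DwCAReader
>>>
>>> # GBIF-specific term URIs for the interpreted taxonomic keys
>>> # (assumed -- check the meta.xml in the downloaded archive)
>>> GBIF = 'http://rs.gbif.org/terms/1.0/'
>>> DWC = 'http://rs.tdwg.org/dwc/terms/'
>>>
>>> with DwCAReader('gbif_download.zip') as dwca:
>>>     for row in dwca:
>>>         # collect the global id for each taxonomic level of this record
>>>         keys = dict((rank, row.data.get(GBIF + rank + 'Key'))
>>>                     for rank in ('kingdom', 'phylum', 'class', 'order',
>>>                                  'family', 'genus', 'species'))
>>>         print(row.data.get(DWC + 'scientificName'), keys)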
>>>
>>> Are the cells in decimal degrees and then partitioned into smaller 
>>> units, or equal-area cells, or maybe UTM grids, or perhaps something 
>>> else? I am just curious.
>>>
>>> Are you developing this as OSS? I'd like to follow its progress if possible.
>>>
>>> Thanks,
>>> Tim,
>>>
>>> On 31 May 2016, at 20:31, Juan M. Escamilla Molgora 
>>> <j.escamillamolgora at lancaster.ac.uk> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> The grid is made by selecting a square area and dividing it into n x n 
>>>> subsquares, which form a partition of the bigger square.
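>>>>
>>>> In plain Python the construction is essentially this (a sketch using 
>>>> shapely just for illustration; the real functions do it in PostGIS):
>>>>
>>>> from shapely.geometry import box
>>>>
>>>> def make_grid(minx, miny, maxx, maxy, n):
>>>>     """Partition the bounding square into an n x n grid of squares."""
>>>>     dx = (maxx - minx) / float(n)
>>>>     dy = (maxy - miny) / float(n)
>>>>     return [box(minx + i * dx, miny + j * dy,
>>>>                 minx + (i + 1) * dx, miny + (j + 1) * dy)
>>>>             for j in range(n) for i in range(n)]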
>>>>
>>>> Each grid is a table in PostGIS, and there's a mapping from this 
>>>> table to a Django model (class).
>>>>
>>>> The class constructor has the attributes id, cell, and neighbours 
>>>> (next release).
>>>>
>>>> The cell is a polygon (a square) and, through GeoDjango, inherits the 
>>>> polygon functionality of the osgeo module.
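>>>>
>>>> The model looks more or less like this (a simplified sketch; the 
>>>> names are illustrative, not the exact ones in the repository):
>>>>
>>>> from django.contrib.gis.db import models
>>>>
>>>> class Cell(models.Model):
>>>>     # One PostGIS table per grid, each mapped to a model like this.
>>>>     cell = models.PolygonField(srid=4326)  # the square itself
>>>>     # neighbours (next release) would be a self-referencing relation:
>>>>     # neighbours = models.ManyToManyField('self')
>>>>
>>>>     objects = models.GeoManager()  # enables spatial lookups on 'cell'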
>>>>
>>>> I've tried to use the CSV data (downloaded as a CSV request), but I 
>>>> couldn't find a way to obtain the global IDs for each taxonomic 
>>>> level (idspecies, idgenus, idfamily, etc.).
>>>>
>>>> Do you know a way for obtaining these fields?
>>>>
>>>>
>>>> Thank you for your email and best wishes,
>>>>
>>>>
>>>> Juan
>>>>
>>>>
>>>> On 31/05/16 19:03, Tim Robertson wrote:
>>>>> Hi Juan
>>>>>
>>>>> That sounds like a fun project!
>>>>>
>>>>> Can you please describe your grid / cells?
>>>>>
>>>>> Most likely your best bet will be to use the download API (as CSV 
>>>>> data) and ingest that. The other APIs will likely hit limits (e.g. 
>>>>> you can't page through indefinitely).
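>>>>>
>>>>> The download request itself is just an authenticated POST with a JSON 
>>>>> predicate, along these lines (a sketch; I'm writing the field names 
>>>>> from memory, so check them against the occurrence download docs):
>>>>>
>>>>> import requests
>>>>>
>>>>> # field names assumed from http://www.gbif.org/developer/occurrence
>>>>> body = {
>>>>>     'creator': 'my_gbif_user',                 # hypothetical account
>>>>>     'notificationAddresses': ['me@example.org'],
>>>>>     'format': 'DWCA',
>>>>>     'predicate': {'type': 'equals', 'key': 'TAXON_KEY', 'value': '212'},
>>>>> }
>>>>> resp = requests.post('http://api.gbif.org/v1/occurrence/download/request',
>>>>>                      json=body, auth=('my_gbif_user', 'my_password'))
>>>>> print(resp.text)  # a download key you can poll and fetch when ready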
>>>>>
>>>>> Thanks,
>>>>> Tim
>>>>>
>>>>> On 31 May 2016, at 18:55, Juan M. Escamilla Molgora 
>>>>> <j.escamillamolgora at lancaster.ac.uk> wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>>
>>>>>> Thank you very much for your valuable feedback!
>>>>>>
>>>>>>
>>>>>> I'll explain a bit of what I'm doing, just to clarify; sorry if this 
>>>>>> is spam to some.
>>>>>>
>>>>>>
>>>>>> I want to build a model of species assemblages based on the 
>>>>>> co-occurrence of taxa within an arbitrary area. I'm building a 2D 
>>>>>> lattice in which, for each cell, I collapse the occurrence data into 
>>>>>> a taxonomic tree. To do this I first need to 
>>>>>> obtain the data from the GBIF API and later, based on the IDs (or 
>>>>>> names) of each taxonomic level (from kingdom down to occurrence), 
>>>>>> build a tree coupled to each cell.
>>>>>>
>>>>>>
>>>>>> The implementation uses PostgreSQL (PostGIS) for storing 
>>>>>> the raw GBIF data and Neo4j for storing the relation
>>>>>>
>>>>>> "is a member of the [species, genus, family, ...] [name/id]". The 
>>>>>> idea is to include data from different sources, similar to the 
>>>>>> project Matthew and Jennifer mentioned (which I'm very 
>>>>>> interested in and would like to hear more about), and traverse the 
>>>>>> network looking for significant merged information.
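>>>>>>
>>>>>> With Py2Neo the membership relation comes out roughly like this (a 
>>>>>> sketch; the labels, property names and relationship type are my own, 
>>>>>> and real code should MERGE on the GBIF id to avoid duplicate nodes):
>>>>>>
>>>>>> from py2neo import Graph, Node, Relationship
>>>>>>
>>>>>> graph = Graph()  # assumes a local Neo4j with default settings
>>>>>>
>>>>>> def store_lineage(occurrence_id, lineage):
>>>>>>     """lineage: [(rank, gbif_id), ...] ordered species -> kingdom."""
>>>>>>     child = Node('Occurrence', gbif_id=occurrence_id)
>>>>>>     for rank, gbif_id in lineage:
>>>>>>         parent = Node(rank.capitalize(), gbif_id=gbif_id)
>>>>>>         # creates both endpoints and the edge in one call
>>>>>>         graph.create(Relationship(child, 'IS_MEMBER_OF', parent))
>>>>>>         child = parent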
>>>>>>
>>>>>>
>>>>>> One of the immediate problems I've found is importing big chunks 
>>>>>> of the GBIF data into my own schema. Thanks to this thread 
>>>>>> I've found the tools most used by the community 
>>>>>> (pygbif, rgbif, and python-dwca-reader). I was using urllib2 and 
>>>>>> things like that.
>>>>>>
>>>>>> I'll be happy to share any code or ideas with the people interested.
>>>>>>
>>>>>>
>>>>>> Btw, I've checked out the TinkerPop project, which uses the Gremlin 
>>>>>> traversal language independently of the underlying DBMS.
>>>>>>
>>>>>> Perhaps it's possible to use it with Spark and GUODA as well?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Is GUODA working now?
>>>>>>
>>>>>>
>>>>>> Best wishes
>>>>>>
>>>>>>
>>>>>> Juan.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 31/05/16 17:02, Collins, Matthew wrote:
>>>>>>>
>>>>>>> Jorrit pointed out this thread to us at iDigBio. Downloading and 
>>>>>>> importing data into a relational database will work great, 
>>>>>>> especially if, as Jan said, you can cut the data size down to a 
>>>>>>> reasonable amount.
>>>>>>>
>>>>>>>
>>>>>>> Another approach we've been working on, in a collaboration called 
>>>>>>> GUODA [1], is to build an Apache Spark environment with 
>>>>>>> pre-formatted data frames containing common data sets 
>>>>>>> for researchers to use. This approach would offer a remote service 
>>>>>>> where you could write arbitrary Spark code, probably in Jupyter 
>>>>>>> notebooks, to iterate over data. Spark does a lot of cool stuff, 
>>>>>>> including GraphX, which might be of interest. This is definitely 
>>>>>>> pre-alpha at this point, and if anyone is interested, I'd like to 
>>>>>>> hear your thoughts. I'll also be at SPNHC talking about this.
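>>>>>>>
>>>>>>> To give a flavour, working against one of those pre-formatted 
>>>>>>> frames from a notebook might look like this (purely illustrative; 
>>>>>>> the dataset path and column names below are made up):
>>>>>>>
>>>>>>> from pyspark import SparkContext
>>>>>>> from pyspark.sql import SQLContext
>>>>>>>
>>>>>>> # in a hosted notebook these contexts would already be provided
>>>>>>> sc = SparkContext()
>>>>>>> sqlContext = SQLContext(sc)
>>>>>>>
>>>>>>> df = sqlContext.read.parquet('/guoda/data/idigbio.parquet')
>>>>>>> (df.filter(df.genus == 'Quercus')        # hypothetical column
>>>>>>>    .groupBy('stateprovince')
>>>>>>>    .count()
>>>>>>>    .orderBy('count', ascending=False)
>>>>>>>    .show(10))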
>>>>>>>
>>>>>>>
>>>>>>> One thing we've found in working on this is that importing data 
>>>>>>> into a structured data format isn't always easy. If you only 
>>>>>>> want a few columns, it'll be fine. But getting the data typing, 
>>>>>>> format standardization, and column name syntax right for the whole 
>>>>>>> width of an iDigBio record requires some code. I looked to 
>>>>>>> see if EcoData Retriever [2] had a GBIF data source; they 
>>>>>>> have an eBird one that you might find useful as a 
>>>>>>> starting point if you want to use someone else's code 
>>>>>>> to download and import data.
>>>>>>>
>>>>>>>
>>>>>>> For other data structures like BHL, we're kind of making stuff 
>>>>>>> up since we're packaging a relational structure and not 
>>>>>>> something nearly as flat as GBIF and DWC stuff.
>>>>>>>
>>>>>>>
>>>>>>> [1] http://guoda.bio/
>>>>>>>
>>>>>>> [2] http://www.ecodataretriever.org/
>>>>>>>
>>>>>>>
>>>>>>> Matthew Collins
>>>>>>> Technical Operations Manager
>>>>>>> Advanced Computing and Information Systems Lab, ECE
>>>>>>> University of Florida
>>>>>>> 352-392-5414
>>>>>>> ------------------------------------------------------------------------
>>>>>>> From: jorrit poelen <jhpoelen at xs4all.nl>
>>>>>>> Sent: Monday, May 30, 2016 11:16 AM
>>>>>>> To: Collins, Matthew; Thompson, Alexander M; Hammock, Jennifer
>>>>>>> Subject: Fwd: [API-users] Is there any NEO4J or graph-based 
>>>>>>> driver for this API ?
>>>>>>> Hey y’all:
>>>>>>>
>>>>>>> Interesting request below on the GBIF mailing list - sounds like 
>>>>>>> a perfect fit for the GUODA use cases.
>>>>>>>
>>>>>>> Would it be too early to jump onto this thread and share our 
>>>>>>> efforts/vision?
>>>>>>>
>>>>>>> thx,
>>>>>>> -jorrit
>>>>>>>
>>>>>>>> Begin forwarded message:
>>>>>>>>
>>>>>>>> From: Jan Legind <jlegind at gbif.org>
>>>>>>>> Subject: Re: [API-users] Is there any NEO4J or graph-based 
>>>>>>>> driver for this API ?
>>>>>>>> Date: May 30, 2016 at 5:48:51 AM PDT
>>>>>>>> To: Mauro Cavalcanti <maurobio at gmail.com>, "Juan M. Escamilla 
>>>>>>>> Molgora" <j.escamillamolgora at lancaster.ac.uk>
>>>>>>>> Cc: api-users at lists.gbif.org
>>>>>>>>
>>>>>>>> Dear Juan,
>>>>>>>> Unfortunately we have no tool for creating these kinds of SQL-like 
>>>>>>>> queries against the portal. I am sure you are aware that the 
>>>>>>>> filters on the occurrence search pages can be applied in 
>>>>>>>> combination in numerous ways. The API can go even further in 
>>>>>>>> this regard [1], but it is not well suited for retrieving large 
>>>>>>>> sets of occurrence records, since there is a 200,000-record 
>>>>>>>> ceiling, making it unfit for species exceeding this number.
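>>>>>>>> For modest result sets you can still page the search API with 
>>>>>>>> pygbif, e.g. (a sketch; the ceiling applies to offset + limit):
>>>>>>>> from pygbif import occurrences
>>>>>>>> records = []
>>>>>>>> offset = 0
>>>>>>>> while True:
>>>>>>>>     resp = occurrences.search(taxonKey=2435099, limit=300,
>>>>>>>>                               offset=offset)
>>>>>>>>     records.extend(resp['results'])
>>>>>>>>     if resp['endOfRecords'] or offset >= 200000:
>>>>>>>>         break
>>>>>>>>     offset += 300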
>>>>>>>> There are going to be updates to the pygbif package [2] in the near 
>>>>>>>> future that will enable you to launch user downloads 
>>>>>>>> programmatically, where a whole list of different species can be 
>>>>>>>> used as a query parameter, as well as adding polygons [3].
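>>>>>>>> The planned interface should make that as simple as something like 
>>>>>>>> this (hypothetical until the release lands):
>>>>>>>> from pygbif import occurrences
>>>>>>>> # hypothetical upcoming usage: a list of predicate strings, with
>>>>>>>> # your GBIF credentials passed in or read from the environment
>>>>>>>> occurrences.download(['taxonKey = 2435099', 'hasCoordinate = TRUE'],
>>>>>>>>                      user='my_user', pwd='my_password',
>>>>>>>>                      email='me@example.org')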
>>>>>>>> In the meantime, Mauro’s suggestion is excellent. If you can 
>>>>>>>> narrow your search down until it returns a manageable download 
>>>>>>>> (say less than 100 million records), importing this into a 
>>>>>>>> database should be doable. From there, you can refine using SQL 
>>>>>>>> queries.
>>>>>>>> Best,
>>>>>>>> Jan K. Legind, GBIF Data manager
>>>>>>>> [1] http://www.gbif.org/developer/occurrence#search
>>>>>>>> [2] https://github.com/sckott/pygbif
>>>>>>>> [3] https://github.com/jlegind/GBIF-downloads
>>>>>>>> From: API-users [mailto:api-users-bounces at lists.gbif.org] On 
>>>>>>>> Behalf Of Mauro Cavalcanti
>>>>>>>> Sent: 30 May 2016 14:06
>>>>>>>> To: Juan M. Escamilla Molgora
>>>>>>>> Cc: api-users at lists.gbif.org
>>>>>>>> Subject: Re: [API-users] Is there any NEO4J or graph-based 
>>>>>>>> driver for this API ?
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> One solution I have successfully adopted for this is to 
>>>>>>>> download the records (either "manually" via a browser or, better 
>>>>>>>> yet, with a Python script using the fine pygbif library), 
>>>>>>>> store them in a MySQL or SQLite database, and then perform 
>>>>>>>> the relational queries. I can provide examples if you are 
>>>>>>>> interested.
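>>>>>>>>
>>>>>>>> For instance, something along these lines (a quick sketch that 
>>>>>>>> keeps only a few columns; the column choice is up to you):
>>>>>>>>
>>>>>>>> import sqlite3
>>>>>>>> from pygbif import occurrences
>>>>>>>>
>>>>>>>> conn = sqlite3.connect('gbif.db')
>>>>>>>> conn.execute('CREATE TABLE IF NOT EXISTS occ '
>>>>>>>>              '(key INTEGER PRIMARY KEY, species TEXT, '
>>>>>>>>              'lat REAL, lon REAL)')
>>>>>>>> resp = occurrences.search(scientificName='Puma concolor', limit=300)
>>>>>>>> rows = [(r['key'], r.get('species'), r.get('decimalLatitude'),
>>>>>>>>          r.get('decimalLongitude')) for r in resp['results']]
>>>>>>>> conn.executemany('INSERT OR REPLACE INTO occ VALUES (?, ?, ?, ?)',
>>>>>>>>                  rows)
>>>>>>>> conn.commit()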
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> 2016-05-30 8:59 GMT-03:00 Juan M. Escamilla Molgora 
>>>>>>>> <j.escamillamolgora at lancaster.ac.uk>:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Is there any API for making relational queries on taxonomy, 
>>>>>>>> location, or timestamp?
>>>>>>>>
>>>>>>>> Thank you and best wishes
>>>>>>>>
>>>>>>>> Juan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Dr. Mauro J. Cavalcanti
>>>>>>>> E-mail: maurobio at gmail.com
>>>>>>>> Web: http://sites.google.com/site/maurobio
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
>>
>>
>
