[API-users] Is there any NEO4J or graph-based driver for this API ?

Nils Hempelmann info at nilshempelmann.de
Wed Jun 1 16:48:33 CEST 2016


Hi Juan et al

Yes, Phoenix has a search interface to ESGF data (but you can use 
other climate data archives as well).

Here are some preliminary screenshots:
http://flyingpigeon.readthedocs.io/en/latest/tutorials/sdm.html

Best
Nils


On 01/06/2016 16:06, Juan M. Escamilla Molgora wrote:
>
> Hi Nils,
>
> Thank you for sharing!
>
> What is Phoenix about? Does it connect to the ESGF network? It's the 
> first time I've read about this. Looks very interesting!
>
>
> Thanks everybody for this valuable feedback.
>
> Best wishes
>
>
> Juan
>
>
>
> On 01/06/16 10:09, Nils Hempelmann wrote:
>> Hi Juan et al
>>
>> Thanks a lot for triggering this discussion.
>> I am currently working on a Web Processing Service 
>> (http://birdhouse.readthedocs.io/en/latest/) including a species 
>> distribution model based on GBIF data (and climate model data). A 
>> good connection to the GBIF database is still missing, so all hints 
>> are quite useful!
>>
>> If you want to share code:
>> https://github.com/bird-house/flyingpigeon/blob/master/flyingpigeon/processes/wps_sdm.py 
>>
>>
>> Merci
>> Nils
>>
>> On 31/05/2016 22:08, Juan M. Escamilla Molgora wrote:
>>>
>>> Hi Tim,
>>>
>>> Thank you! Especially for the DwC-A hint.
>>>
>>> The cells are in decimal degrees (WGS84) by default, but the 
>>> functions for generating them are general enough to use any 
>>> projection supported by GDAL via PostGIS. This could be done "on 
>>> the fly" or stored on the server side.
>>>
>>> I was thinking (daydreaming) about a standard way of encoding unique 
>>> but universal grids (similar to Geohash or Open Location Code), but 
>>> didn't find anything fast and ready. Maybe later :)
>>>
>>> I only use open source software: Python, Django, GDAL, NumPy, 
>>> PostGIS, Conda, Py2Neo, and ete2, among others.
>>>
>>> Currently I don't have an official release; the project is quite 
>>> immature and unstable, and the installation can be non-trivial. 
>>> I'm fixing all these issues, but it will take some time; sorry 
>>> for this.
>>>
>>> The github repository is:
>>>
>>> https://github.com/molgor/biospytial.git
>>>
>>> And there's some very old documentation here:
>>>
>>> http://test.holobio.me/modules/gbif_taxonomy_class.html
>>>
>>> Please feel free to follow!
>>>
>>>
>>> Best wishes
>>>
>>>
>>> Juan
>>>
>>> P.S. The functions for generating the grid are in: 
>>> biospytial/SQL_functions
>>>
>>>
>>>
>>>
>>>
>>> On 31/05/16 19:47, Tim Robertson wrote:
>>>> Thanks Juan
>>>>
>>>> You're quite right - you need the DwC-A download format to get 
>>>> those IDs.
>>>>
>>>> Are the cells decimal degrees, and then partitioned into smaller 
>>>> units, or equal area cells or maybe UTM grids or something else 
>>>> perhaps? I am just curious.
>>>>
>>>> Are you developing this as OSS? I'd like to follow progress if 
>>>> possible?
>>>>
>>>> Thanks,
>>>> Tim,
>>>>
>>>> On 31 May 2016, at 20:31, Juan M. Escamilla Molgora 
>>>> <j.escamillamolgora at lancaster.ac.uk> wrote:
>>>>
>>>>> Hi Tim,
>>>>>
>>>>> The grid is made by selecting a square area and dividing it into 
>>>>> n x n subsquares, which form a partition of the bigger square.
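A minimal sketch of that partition in plain Python, assuming cells are kept as simple coordinate tuples rather than the PostGIS geometries the project actually uses:

```python
# Sketch of the n x n grid partition described above: divide a bounding
# square into n*n equal subsquares, each identified by an integer id.
# Cell geometry is kept as (xmin, ymin, xmax, ymax); illustrative only.

def make_grid(xmin, ymin, xmax, ymax, n):
    """Partition the bounding square into n*n subsquare cells."""
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    cells = []
    for row in range(n):
        for col in range(n):
            cell_id = row * n + col
            cells.append((cell_id,
                          (xmin + col * dx, ymin + row * dy,
                           xmin + (col + 1) * dx, ymin + (row + 1) * dy)))
    return cells
```

In the real setup each such cell would become a row in a PostGIS table mapped to a Django model, as described below.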
>>>>>
>>>>> Each grid is a table in PostGIS, and there's a mapping from this 
>>>>> table to a Django model (class).
>>>>>
>>>>> The class has the attributes: id, cell, and neighbours 
>>>>> (next release).
>>>>>
>>>>> The cell is a polygon (square) and, through GeoDjango, inherits 
>>>>> the properties of the osgeo module for polygons.
>>>>>
>>>>> I've tried to use the CSV data (downloaded as a CSV request), but 
>>>>> I couldn't find a way to obtain the global IDs for each taxonomic 
>>>>> level (idspecies, idgenus, idfamily, etc.).
>>>>>
>>>>> Do you know a way for obtaining these fields?
>>>>>
>>>>>
>>>>> Thank you for your email and best wishes,
>>>>>
>>>>>
>>>>> Juan
>>>>>
>>>>>
>>>>> On 31/05/16 19:03, Tim Robertson wrote:
>>>>>> Hi Juan
>>>>>>
>>>>>> That sounds like a fun project!
>>>>>>
>>>>>> Can you please describe your grid / cells?
>>>>>>
>>>>>> Most likely your best bet will be to use the download API (as CSV 
>>>>>> data) and ingest that. The other APIs will likely hit limits 
>>>>>> (e.g. You can't page through indefinitely).
>>>>>>
>>>>>> Thanks,
>>>>>> Tim
>>>>>>
>>>>>> On 31 May 2016, at 18:55, Juan M. Escamilla Molgora 
>>>>>> <j.escamillamolgora at lancaster.ac.uk> wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>>
>>>>>>> Thank you very much for your valuable feedback!
>>>>>>>
>>>>>>>
>>>>>>> I'll explain a bit of what I'm doing just to clarify; sorry if 
>>>>>>> this is spam to some.
>>>>>>>
>>>>>>>
>>>>>>> I want to build a model for species assemblages based on 
>>>>>>> co-occurrence of taxa within an arbitrary area. I'm building a 
>>>>>>> 2D lattice in which, for each cell, I collapse the occurrence 
>>>>>>> data into a taxonomic tree. To do this I first need to obtain 
>>>>>>> the data from the GBIF API and later, based on the IDs (or 
>>>>>>> names) of each taxonomic level (from kingdom to occurrence), 
>>>>>>> build a tree coupled to each cell.
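A hedged sketch of that per-cell collapse, assuming occurrences arrive as dicts keyed by taxonomic level (the level names and counting scheme are illustrative, not the actual schema):

```python
# Collapse the occurrences inside one grid cell into a taxonomic tree.
# Each prefix of the (kingdom, ..., species) path gets a count, so
# internal nodes accumulate the totals of their descendants.
from collections import defaultdict

LEVELS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")

def collapse_cell(occurrences):
    """Count occurrences along each taxonomic path within a cell."""
    tree = defaultdict(int)
    for occ in occurrences:
        path = tuple(occ.get(level) for level in LEVELS)
        for depth in range(1, len(path) + 1):
            tree[path[:depth]] += 1
    return tree
```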
>>>>>>>
>>>>>>>
>>>>>>> The implementation is done with PostgreSQL (PostGIS) for storing 
>>>>>>> the raw GBIF data and Neo4j for storing the relation "is a 
>>>>>>> member of the [species, genus, family, ...] [name/id]". The 
>>>>>>> idea is to include data from different sources, similar to the 
>>>>>>> project Matthew and Jennifer mentioned (which I'm very 
>>>>>>> interested in and would like to hear more about), and traverse 
>>>>>>> the network looking for significant merged information.
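That "member of" relation could be traversed with a Cypher query along these lines; the `Occurrence` label and `IS_MEMBER_OF` relationship name are invented for illustration and may differ from the actual Biospytial schema:

```python
# Hypothetical Cypher builder for the "is a member of" hierarchy.
# Label and relationship names are illustrative, not Biospytial's schema.

TAXON_LEVELS = {"Species", "Genus", "Family", "Order",
                "Class", "Phylum", "Kingdom"}

def member_of_query(level, name):
    """Match all occurrences reachable from a taxon node of the given
    level via a chain of IS_MEMBER_OF edges; returns (query, params)
    suitable for a Neo4j driver such as py2neo."""
    if level not in TAXON_LEVELS:
        raise ValueError("unknown taxonomic level: %s" % level)
    cypher = ("MATCH (o:Occurrence)-[:IS_MEMBER_OF*]->"
              "(t:%s {name: $name}) RETURN o" % level)
    return cypher, {"name": name}
```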
>>>>>>>
>>>>>>>
>>>>>>> One of the immediate problems I've found is importing big chunks 
>>>>>>> of GBIF data into my specification. Thanks to this thread I've 
>>>>>>> found the tools most used by the community (pygbif, rgbif, and 
>>>>>>> python-dwca-reader). I was using urllib2 and things like that.
>>>>>>>
>>>>>>> I'll be happy to share any code or ideas with the people interested.
>>>>>>>
>>>>>>>
>>>>>>> Btw, I've checked the TinkerPop project, which uses the Gremlin 
>>>>>>> traversal language independently of the DBMS.
>>>>>>>
>>>>>>> Perhaps it's possible to use it with Spark and GUODA as well?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Is GUODA working now?
>>>>>>>
>>>>>>>
>>>>>>> Best wishes
>>>>>>>
>>>>>>>
>>>>>>> Juan.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 31/05/16 17:02, Collins, Matthew wrote:
>>>>>>>>
>>>>>>>> Jorrit pointed out this thread to us at iDigBio. Downloading 
>>>>>>>> and importing data into a relational database will work great, 
>>>>>>>> especially if, as Jan said, you can cut the data size down to a 
>>>>>>>> reasonable amount.
>>>>>>>>
>>>>>>>>
>>>>>>>> Another approach we've been working on in a collaboration 
>>>>>>>> called GUODA [1] is to build an Apache Spark environment with 
>>>>>>>> pre-formatted data frames with common data sets in them for 
>>>>>>>> researchers to use. This approach would offer a remote service 
>>>>>>>> where you could write arbitrary Spark code, probably in Jupyter 
>>>>>>>> notebooks, to iterate over data. Spark does a lot of cool stuff 
>>>>>>>> including GraphX which might be of interest. This is definitely 
>>>>>>>> pre-alpha at this point and if anyone is interested, I'd like 
>>>>>>>> to hear your thoughts. I'll also be at SPNHC talking about this.
>>>>>>>>
>>>>>>>>
>>>>>>>> One thing we've found in working on this is that importing data 
>>>>>>>> into a structured data format isn't always easy. If you only 
>>>>>>>> want a few columns, it'll be fine. But getting the data typing, 
>>>>>>>> format standardization, and column name syntax right for the 
>>>>>>>> whole width of an iDigBio record requires some code. I looked 
>>>>>>>> to see whether EcoData Retriever [2] had a GBIF data source; 
>>>>>>>> they have an eBird one that you might find useful as a starting 
>>>>>>>> point if you want to use someone else's code to download and 
>>>>>>>> import data.
>>>>>>>>
>>>>>>>>
>>>>>>>> For other data structures like BHL, we're kind of making stuff 
>>>>>>>> up since we're packaging a relational structure and not 
>>>>>>>> something nearly as flat as GBIF and DWC stuff.
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] http://guoda.bio/
>>>>>>>>
>>>>>>>> [2] http://www.ecodataretriever.org/
>>>>>>>>
>>>>>>>>
>>>>>>>> Matthew Collins
>>>>>>>> Technical Operations Manager
>>>>>>>> Advanced Computing and Information Systems Lab, ECE
>>>>>>>> University of Florida
>>>>>>>> 352-392-5414 <callto:352-392-5414>
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>> *From:* jorrit poelen <jhpoelen at xs4all.nl>
>>>>>>>> *Sent:* Monday, May 30, 2016 11:16 AM
>>>>>>>> *To:* Collins, Matthew; Thompson, Alexander M; Hammock, Jennifer
>>>>>>>> *Subject:* Fwd: [API-users] Is there any NEO4J or graph-based 
>>>>>>>> driver for this API ?
>>>>>>>> Hey y’all:
>>>>>>>>
>>>>>>>> Interesting request below on the GBIF mailing list - sounds 
>>>>>>>> like a perfect fit for the GUODA use cases.
>>>>>>>>
>>>>>>>> Would it be too early to jump onto this thread and share our 
>>>>>>>> efforts/vision?
>>>>>>>>
>>>>>>>> thx,
>>>>>>>> -jorrit
>>>>>>>>
>>>>>>>>> Begin forwarded message:
>>>>>>>>>
>>>>>>>>> *From: *Jan Legind <jlegind at gbif.org>
>>>>>>>>> *Subject: *Re: [API-users] Is there any NEO4J or graph-based 
>>>>>>>>> driver for this API ?*
>>>>>>>>> *Date: *May 30, 2016 at 5:48:51 AM PDT
>>>>>>>>> *To: *Mauro Cavalcanti <maurobio at gmail.com>, "Juan M. 
>>>>>>>>> Escamilla Molgora" <j.escamillamolgora at lancaster.ac.uk>
>>>>>>>>> *Cc: *api-users at lists.gbif.org
>>>>>>>>>
>>>>>>>>> Dear Juan,
>>>>>>>>> Unfortunately we have no tool for creating this kind of SQL-like 
>>>>>>>>> query against the portal. I am sure you are aware that the 
>>>>>>>>> filters on the occurrence search pages can be applied in 
>>>>>>>>> combination in numerous ways. The API can go even further in 
>>>>>>>>> this regard [1], but it is not well suited for retrieving 
>>>>>>>>> occurrence records, since there is a 200,000-record ceiling 
>>>>>>>>> making it unfit for species exceeding this number.
>>>>>>>>> There are going to be updates to the pygbif package [2] in the 
>>>>>>>>> near future that will enable you to launch user downloads 
>>>>>>>>> programmatically, where a whole list of different species can 
>>>>>>>>> be used as a query parameter, as well as adding polygons. [3]
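For reference, the occurrence download API takes a JSON predicate; a hedged sketch of combining a species-key list with a polygon might look like this (field names follow the public GBIF download documentation, but the username and keys below are placeholders):

```python
# Build the JSON body for a GBIF occurrence download request that
# combines a list of taxon keys with a WKT polygon. Illustrative only;
# check the current GBIF download API docs before use.
import json

def download_predicate(taxon_keys, wkt_polygon):
    """Return a download-request body restricting results to the given
    taxa within the given polygon."""
    return {
        "creator": "your-gbif-username",  # placeholder
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "in", "key": "TAXON_KEY",
                 "values": [str(k) for k in taxon_keys]},
                {"type": "within", "geometry": wkt_polygon},
            ],
        },
    }

body = download_predicate([2480946, 5229208],
                          "POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))")
payload = json.dumps(body)  # POST this to the downloads endpoint
```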
>>>>>>>>> In the meantime, Mauro’s suggestion is excellent. If you can 
>>>>>>>>> narrow your search down until it returns a manageable download 
>>>>>>>>> (say less than 100 million records), importing this into a 
>>>>>>>>> database should be doable. From there, you can refine using 
>>>>>>>>> SQL queries.
>>>>>>>>> Best,
>>>>>>>>> Jan K. Legind, GBIF Data manager
>>>>>>>>> [1]http://www.gbif.org/developer/occurrence#search
>>>>>>>>> [2]https://github.com/sckott/pygbif
>>>>>>>>> [3]https://github.com/jlegind/GBIF-downloads
>>>>>>>>> *From:* API-users [mailto:api-users-bounces at lists.gbif.org] 
>>>>>>>>> *On Behalf Of* Mauro Cavalcanti
>>>>>>>>> *Sent:* 30 May 2016 14:06
>>>>>>>>> *To:* Juan M. Escamilla Molgora
>>>>>>>>> *Cc:* api-users at lists.gbif.org
>>>>>>>>> *Subject:* Re: [API-users] Is there any NEO4J or graph-based 
>>>>>>>>> driver for this API ?
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> One solution I have successfully adopted for this is to 
>>>>>>>>> download the records (either "manually" via the browser or, 
>>>>>>>>> better yet, with a Python script using the fine pygbif 
>>>>>>>>> library), store them in a MySQL or SQLite database, and then 
>>>>>>>>> perform the relational queries. I can provide examples if you 
>>>>>>>>> are interested.
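A minimal sketch of that workflow using SQLite from the Python standard library; the table layout is illustrative, not the exact GBIF CSV header:

```python
# Store downloaded occurrence records in SQLite and query them
# relationally. Column names are illustrative placeholders.
import sqlite3

def load_and_query(records):
    """Load (species, lat, lon, year) tuples and count records per
    species observed since 2000."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE occurrence
                    (species TEXT, lat REAL, lon REAL, year INTEGER)""")
    conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?, ?)", records)
    return conn.execute(
        """SELECT species, COUNT(*) FROM occurrence
           WHERE year >= 2000 GROUP BY species ORDER BY species""").fetchall()
```

In practice the records would come from a GBIF download (CSV) parsed with the csv module rather than hand-built tuples.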
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> 2016-05-30 8:59 GMT-03:00 Juan M. Escamilla Molgora 
>>>>>>>>> <j.escamillamolgora at lancaster.ac.uk>:
>>>>>>>>> Hola,
>>>>>>>>>
>>>>>>>>> Is there any API for making relational queries like taxonomy, 
>>>>>>>>> location or timestamp?
>>>>>>>>>
>>>>>>>>> Thank you and best wishes
>>>>>>>>>
>>>>>>>>> Juan
>>>>>>>>> _______________________________________________
>>>>>>>>> API-users mailing list
>>>>>>>>> API-users at lists.gbif.org
>>>>>>>>> http://lists.gbif.org/mailman/listinfo/api-users
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Dr. Mauro J. Cavalcanti
>>>>>>>>> E-mail: maurobio at gmail.com
>>>>>>>>> Web: http://sites.google.com/site/maurobio
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>>
>>>
>>
>


