[API-users] Is there any NEO4J or graph-based driver for this API ?
Juan M. Escamilla Molgora
j.escamillamolgora at lancaster.ac.uk
Wed Jun 1 16:06:37 CEST 2016
Hi Nils,
Thank you for sharing!
What is Phoenix about? Does it connect to the ESGF network? This is the
first time I've read about it. It looks very interesting!
Thanks, everybody, for this valuable feedback.
Best wishes
Juan
On 01/06/16 10:09, Nils Hempelmann wrote:
> Hi Juan et al
>
> Thanks a lot for triggering this discussion.
> I am currently working on a Web processing service
> (http://birdhouse.readthedocs.io/en/latest/) including a species
> distribution model based on GBIF data (and climate model data). A
> good connection to the GBIF database is still missing, and all the
> hints were quite useful!!
>
> If you want to share code:
> https://github.com/bird-house/flyingpigeon/blob/master/flyingpigeon/processes/wps_sdm.py
>
>
> Thanks
> Nils
>
> On 31/05/2016 22:08, Juan M. Escamilla Molgora wrote:
>>
>> Hi Tim,
>>
>> Thank you! Especially for the DwC-A hint.
>>
>> The cells are by default in decimal degrees (WGS84), but the
>> functions for generating them are general enough to use any
>> projection supported by GDAL using PostGIS. It could be done "on the
>> fly" or stored on the server side.
>>
>> I was thinking (daydreaming) about a standard way to encode unique but
>> universal grids (similar to geohash or Open Location Code), but I
>> couldn't find anything fast and ready. Maybe later :)
>>
>> I only use open-source software: Python, Django, GDAL, NumPy,
>> PostGIS, Conda, Py2Neo, and ete2, among others.
>>
>> Currently I don't have an official release; the project is quite
>> immature and unstable, and the installation can be non-trivial.
>> I'm fixing all these issues, but it will take some time. Sorry for this.
>>
>> The github repository is:
>>
>> https://github.com/molgor/biospytial.git
>>
>> And there's some very old documentation here:
>>
>> http://test.holobio.me/modules/gbif_taxonomy_class.html
>>
>> Please feel free to follow!
>>
>>
>> Best wishes
>>
>>
>> Juan
>>
>> P.s. The functions for generating the grid are in:
>> biospytial/SQL_functions
>>
>>
>>
>>
>>
>> On 31/05/16 19:47, Tim Robertson wrote:
>>> Thanks Juan
>>>
>>> You're quite right - you need the DwC-A download format to get those
>>> IDs.
>>>
>>> Are the cells decimal degrees, and then partitioned into smaller
>>> units, or equal area cells or maybe UTM grids or something else
>>> perhaps? I am just curious.
>>>
>>> Are you developing this as OSS? I'd like to follow progress if possible?
>>>
>>> Thanks,
>>> Tim,
>>>
>>> On 31 May 2016, at 20:31, Juan M. Escamilla Molgora
>>> <j.escamillamolgora at lancaster.ac.uk> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> The grid is made by selecting a square area and dividing it into
>>>> n×n subsquares, which form a partition of the bigger square.
>>>>
>>>> Each grid is a table in PostGIS, and there's a mapping from this
>>>> table to a Django model (class).
>>>>
>>>> The class constructor has the attributes id, cell, and neighbours
>>>> (next release).
>>>>
>>>> The cell is a polygon (square) and, with GeoDjango, inherits the
>>>> properties of the osgeo module for polygons.
>>>>
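A minimal plain-Python sketch of such an n×n partition (illustrative only — the actual functions in biospytial/SQL_functions are SQL/PostGIS, and the cell and id layout here is an assumption):

```python
# Hypothetical sketch: partition a bounding box into an n x n grid of
# square cells, each identified by an integer id in row-major order.
def make_grid(xmin, ymin, xmax, ymax, n):
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    cells = []
    for row in range(n):
        for col in range(n):
            x0, y0 = xmin + col * dx, ymin + row * dy
            # Each cell is stored as (id, (x0, y0, x1, y1)).
            cells.append((row * n + col, (x0, y0, x0 + dx, y0 + dy)))
    return cells

# A 4 x 4 partition of the whole WGS84 extent -> 16 cells.
grid = make_grid(-180.0, -90.0, 180.0, 90.0, 4)
```

In a real deployment each tuple would become a PostGIS polygon row rather than a Python tuple.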
>>>> I've tried to use the CSV data (downloaded as a CSV request), but I
>>>> couldn't find a way to obtain the global IDs for each taxonomic
>>>> level (idspecies, idgenus, idfamily, etc.).
>>>>
>>>> Do you know a way for obtaining these fields?
>>>>
>>>>
>>>> Thank you for your email and best wishes,
>>>>
>>>>
>>>> Juan
>>>>
>>>>
>>>> On 31/05/16 19:03, Tim Robertson wrote:
>>>>> Hi Juan
>>>>>
>>>>> That sounds like a fun project!
>>>>>
>>>>> Can you please describe your grid / cells?
>>>>>
>>>>> Most likely your best bet will be to use the download API (as CSV
>>>>> data) and ingest that. The other APIs will likely hit limits (e.g.
>>>>> You can't page through indefinitely).
>>>>>
>>>>> Thanks,
>>>>> Tim
>>>>>
>>>>> On 31 May 2016, at 18:55, Juan M. Escamilla Molgora
>>>>> <j.escamillamolgora at lancaster.ac.uk> wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>>
>>>>>> Thank you very much for your valuable feedback!
>>>>>>
>>>>>>
>>>>>> I'll explain a bit of what I'm doing just to clarify; sorry if
>>>>>> this is spam to some.
>>>>>>
>>>>>>
>>>>>> I want to build a model for species assemblages based on
>>>>>> co-occurrence of taxa within an arbitrary area. I'm building a 2D
>>>>>> lattice in which, for each cell, I'm collapsing the data (the
>>>>>> occurrences) into a taxonomic tree. To do this I first need to
>>>>>> obtain the data from the GBIF API and later, based on the IDs (or
>>>>>> names) of each taxonomic level (from kingdom to occurrence),
>>>>>> build a tree coupled to each cell.
>>>>>>
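A hedged sketch of that per-cell collapsing step, using plain dictionaries (the rank names and record fields are illustrative, not the exact GBIF column names):

```python
# Hypothetical sketch: collapse the occurrence records of one grid cell
# into a nested taxonomic tree, counting occurrences at every level.
def collapse(occurrences, ranks=("kingdom", "family", "species")):
    tree = {}
    for occ in occurrences:
        node = tree
        for rank in ranks:
            name = occ.get(rank, "unknown")
            # Each tree node tracks its own count and its children.
            node = node.setdefault(name, {"_count": 0, "_children": {}})
            node["_count"] += 1
            node = node["_children"]
    return tree

cell_tree = collapse([
    {"kingdom": "Animalia", "family": "Felidae", "species": "Lynx lynx"},
    {"kingdom": "Animalia", "family": "Canidae", "species": "Canis lupus"},
])
```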
>>>>>>
>>>>>> The implementation is done with PostgreSQL (PostGIS) for storing
>>>>>> the raw GBIF data and Neo4j for storing the relation
>>>>>>
>>>>>> "being a member of the [species, genus, family, ...] [name/id]".
>>>>>> The idea is to include data from different sources, similar to
>>>>>> the project Matthew and Jennifer mentioned (which I'm very
>>>>>> interested in and would like to hear more about), and traverse
>>>>>> the network looking for significant merged information.
>>>>>>
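One way that membership relation could be expressed in Neo4j is as Cypher MERGE statements. This stdlib-only sketch just builds the statement strings (the labels, relation name, and quoting are my assumptions, and a real pipeline would run them via Py2Neo against a server with proper parameter binding rather than naive string formatting):

```python
# Hypothetical sketch: emit Cypher MERGE statements linking each
# taxonomic level to its parent with an IS_MEMBER_OF relation.
def membership_statements(path):
    # path: list of (rank, name) ordered from kingdom down to species.
    stmts = []
    for (r1, n1), (r2, n2) in zip(path[1:], path[:-1]):
        # Naive quoting for illustration only; real code must
        # parameterize to handle apostrophes and injection.
        stmts.append(
            "MERGE (a:%s {name: '%s'}) "
            "MERGE (b:%s {name: '%s'}) "
            "MERGE (a)-[:IS_MEMBER_OF]->(b)" % (r1, n1, r2, n2)
        )
    return stmts

stmts = membership_statements(
    [("Kingdom", "Animalia"), ("Family", "Felidae"), ("Species", "Lynx lynx")]
)
```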
>>>>>>
>>>>>> One of the immediate problems I've found is importing big chunks
>>>>>> of the GBIF data into my specification. Thanks to this thread
>>>>>> I've found the tools most used by the community
>>>>>> (pygbif, rgbif, and python-dwca-reader). I was using urllib2 and
>>>>>> things like that.
>>>>>>
>>>>>> I'll be happy to share any code or ideas with the people interested.
>>>>>>
>>>>>>
>>>>>> Btw, I've checked the TinkerPop project, which uses the Gremlin
>>>>>> traversal language independently of the DBMS.
>>>>>>
>>>>>> Perhaps it's possible to use it with spark and Guoda as well?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Is GUODA working now?
>>>>>>
>>>>>>
>>>>>> Best wishes
>>>>>>
>>>>>>
>>>>>> Juan.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 31/05/16 17:02, Collins, Matthew wrote:
>>>>>>>
>>>>>>> Jorrit pointed out this thread to us at iDigBio. Downloading and
>>>>>>> importing data into a relational database will work great,
>>>>>>> especially if as Jan said you can cut the data size down to a
>>>>>>> reasonable amount.
>>>>>>>
>>>>>>>
>>>>>>> Another approach we've been working on in a collaboration called
>>>>>>> GUODA [1] is to build an Apache Spark environment with
>>>>>>> pre-formatted data frames with common data sets in them for
>>>>>>> researchers to use. This approach would offer a remote service
>>>>>>> where you could write arbitrary Spark code, probably in Jupyter
>>>>>>> notebooks, to iterate over data. Spark does a lot of cool stuff
>>>>>>> including GraphX which might be of interest. This is definitely
>>>>>>> pre-alpha at this point and if anyone is interested, I'd like to
>>>>>>> hear your thoughts. I'll also be at SPNHC talking about this.
>>>>>>>
>>>>>>>
>>>>>>> One thing we've found in working on this is that importing data
>>>>>>> into a structured data format isn't always easy. If you only
>>>>>>> want a few columns, it'll be fine. But getting the data typing,
>>>>>>> format standardization, and column name syntax of the whole
>>>>>>> width of an iDigBio record right requires some code. I looked to
>>>>>>> see if EcoData Retriever [2] had a GBIF data source and they
>>>>>>> have an eBird one that perhaps you might find useful as a
>>>>>>> starting point if you wanted to try to use someone else's code
>>>>>>> to download and import data.
>>>>>>>
>>>>>>>
>>>>>>> For other data structures like BHL, we're kind of making stuff
>>>>>>> up since we're packaging a relational structure and not
>>>>>>> something nearly as flat as GBIF and DWC stuff.
>>>>>>>
>>>>>>>
>>>>>>> [1] http://guoda.bio/
>>>>>>>
>>>>>>> [2] http://www.ecodataretriever.org/
>>>>>>>
>>>>>>>
>>>>>>> Matthew Collins
>>>>>>> Technical Operations Manager
>>>>>>> Advanced Computing and Information Systems Lab, ECE
>>>>>>> University of Florida
>>>>>>> 352-392-5414
>>>>>>> ------------------------------------------------------------------------
>>>>>>> *From:* jorrit poelen <jhpoelen at xs4all.nl>
>>>>>>> *Sent:* Monday, May 30, 2016 11:16 AM
>>>>>>> *To:* Collins, Matthew; Thompson, Alexander M; Hammock, Jennifer
>>>>>>> *Subject:* Fwd: [API-users] Is there any NEO4J or graph-based
>>>>>>> driver for this API ?
>>>>>>> Hey y’all:
>>>>>>>
>>>>>>> Interesting request below on the GBIF mailing list - sounds like
>>>>>>> a perfect fit for the GUODA use cases.
>>>>>>>
>>>>>>> Would it be too early to jump onto this thread and share our
>>>>>>> efforts/vision?
>>>>>>>
>>>>>>> thx,
>>>>>>> -jorrit
>>>>>>>
>>>>>>>> Begin forwarded message:
>>>>>>>>
>>>>>>>> *From:* Jan Legind <jlegind at gbif.org>
>>>>>>>> *Subject:* Re: [API-users] Is there any NEO4J or graph-based
>>>>>>>> driver for this API ?
>>>>>>>> *Date:* May 30, 2016 at 5:48:51 AM PDT
>>>>>>>> *To:* Mauro Cavalcanti <maurobio at gmail.com>, "Juan M. Escamilla
>>>>>>>> Molgora" <j.escamillamolgora at lancaster.ac.uk>
>>>>>>>> *Cc:* "api-users at lists.gbif.org" <api-users at lists.gbif.org>
>>>>>>>>
>>>>>>>> Dear Juan,
>>>>>>>> Unfortunately, we have no tool for creating these kinds of
>>>>>>>> SQL-like queries against the portal. I am sure you are aware
>>>>>>>> that the filters on the occurrence search pages can be applied
>>>>>>>> in combination in numerous ways. The API can go even further in
>>>>>>>> this regard[1], but it is not well suited for retrieving
>>>>>>>> occurrence records, since there is a 200,000-record ceiling
>>>>>>>> making it unfit for species exceeding this number.
>>>>>>>> There are going to be updates to the pygbif package[2] in the
>>>>>>>> near future that will enable you to launch user downloads
>>>>>>>> programmatically, where a whole list of different species can
>>>>>>>> be used as a query parameter, as well as adding polygons.[3]
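For reference, the occurrence download service accepts a JSON predicate body. This sketch only builds that JSON locally — the key names follow the documented predicate format as I understand it, but verify against the current GBIF developer docs before relying on it:

```python
import json

# Hedged sketch: assemble the JSON body for a GBIF occurrence download
# request combining a taxon-key list with a WKT polygon filter.
def download_request(creator, taxon_keys, wkt_polygon):
    return json.dumps({
        "creator": creator,
        "predicate": {
            "type": "and",
            "predicates": [
                # Match any of several taxa at once...
                {"type": "in", "key": "TAXON_KEY",
                 "values": [str(k) for k in taxon_keys]},
                # ...restricted to records inside the polygon.
                {"type": "within", "geometry": wkt_polygon},
            ],
        },
    })

body = download_request(
    "juan", [212, 359], "POLYGON((-3 54,-2 54,-2 55,-3 55,-3 54))"
)
```

The resulting string would be POSTed to the download request endpoint (or handed to pygbif once the mentioned update lands).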
>>>>>>>> In the meantime, Mauro’s suggestion is excellent. If you can
>>>>>>>> narrow your search down until it returns a manageable download
>>>>>>>> (say less than 100 million records), importing this into a
>>>>>>>> database should be doable. From there, you can refine using SQL
>>>>>>>> queries.
>>>>>>>> Best,
>>>>>>>> Jan K. Legind, GBIF Data manager
>>>>>>>> [1]http://www.gbif.org/developer/occurrence#search
>>>>>>>> [2]https://github.com/sckott/pygbif
>>>>>>>> [3]https://github.com/jlegind/GBIF-downloads
>>>>>>>> *From:* API-users [mailto:api-users-bounces at lists.gbif.org]
>>>>>>>> *On Behalf Of* Mauro Cavalcanti
>>>>>>>> *Sent:* 30 May 2016 14:06
>>>>>>>> *To:* Juan M. Escamilla Molgora
>>>>>>>> *Cc:* api-users at lists.gbif.org
>>>>>>>> *Subject:* Re: [API-users] Is there any NEO4J or graph-based
>>>>>>>> driver for this API ?
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> One solution I have successfully adopted for this is to
>>>>>>>> download the records (either "manually" via the browser or,
>>>>>>>> better yet, with a Python script using the fine pygbif
>>>>>>>> library), store them in a MySQL or SQLite database, and then
>>>>>>>> perform the relational queries. I can provide examples if you
>>>>>>>> are interested.
>>>>>>>>
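A minimal sketch of that workflow with the stdlib sqlite3 module — an in-memory database and made-up rows stand in for the downloaded CSV, and the column names are illustrative rather than the exact GBIF header:

```python
import sqlite3

# Hedged sketch: load occurrence rows into SQLite, then query relationally.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE occurrence (
    gbifid INTEGER PRIMARY KEY,
    species TEXT,
    decimallatitude REAL,
    decimallongitude REAL)""")

# In practice these rows would come from csv.reader over the download.
rows = [
    (1, "Lynx lynx", 54.0, -2.8),
    (2, "Canis lupus", 52.5, -1.9),
    (3, "Lynx lynx", 53.1, -2.1),
]
conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?, ?)", rows)

# Example relational query: occurrences per species.
counts = dict(conn.execute(
    "SELECT species, COUNT(*) FROM occurrence GROUP BY species"))
```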
>>>>>>>> Best regards,
>>>>>>>> 2016-05-30 8:59 GMT-03:00 Juan M. Escamilla Molgora
>>>>>>>> <j.escamillamolgora at lancaster.ac.uk>:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Is there any API for making relational queries by taxonomy,
>>>>>>>> location, or timestamp?
>>>>>>>>
>>>>>>>> Thank you and best wishes
>>>>>>>>
>>>>>>>> Juan
>>>>>>>> _______________________________________________
>>>>>>>> API-users mailing list
>>>>>>>> API-users at lists.gbif.org <mailto:API-users at lists.gbif.org>
>>>>>>>> http://lists.gbif.org/mailman/listinfo/api-users
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Dr. Mauro J. Cavalcanti
>>>>>>>> E-mail: maurobio at gmail.com
>>>>>>>> Web: http://sites.google.com/site/maurobio
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
>>
>>
>