[API-users] Is there any NEO4J or graph-based driver for this API ?

Wed Jun 1 13:57:35 CEST 2016

Nils,

Really great... 🙂 Thanks for sharing!

Salud!

2016-06-01 6:09 GMT-03:00 Nils Hempelmann <info at nilshempelmann.de>:

> Hi Juan et al
>
> Thanks a lot for triggering this discussion.
> I am currently working on a Web processing service (
> http://birdhouse.readthedocs.io/en/latest/) including a species
> distribution model based on the GBIF data (and climate model data). A good
> connection to GBIF database is still missing and all hints were quite
> useful!!
>
> If you want to share code:
>
> https://github.com/bird-house/flyingpigeon/blob/master/flyingpigeon/processes/wps_sdm.py
>
> Merci
> Nils
>
> On 31/05/2016 22:08, Juan M. Escamilla Molgora wrote:
>
> Hi Tim,
>
> Thank you! specially for the DwC-A hint.
>
> The cells are by default in decimal degrees, (wgs84 ) but the functions
> for generating them are general enough to use any projection supported by
> gdal using postgis. It could be done "on the fly" or stored on the server
> side,
>
> I was thinking (day dreaming) in a standard way for coding unique but
> universal grids (similar to geohash or open location code), but didn't find
> something fast and ready. Maybe later :)
>
> I only use Open Source Software, Python, Django, GDAL, Numpy, Postgis,
> Conda, Py2Neo, ete2 among others.
>
> Currently I don't have an official release and the project is quite
> inmature, unstable as well as the installation could be non trivial.  I'm
> fixing all these issues but will take some time,sorry for this.
>
> The github repository is:
>
> https://github.com/molgor/biospytial.git
>
> An there's a very old documentation here:
>
> http://test.holobio.me/modules/gbif_taxonomy_class.html
>
> Please feel free to follow!
>
>
> Best wishes
>
>
> Juan
>
> P.s. The functions for generating the grid are in: biospytial/SQL_functions
>
>
>
>
>
> On 31/05/16 19:47, Tim Robertson wrote:
>
> Thanks Juan
>
> You're quite right - you need the DwC-A download format to get those IDs.
>
> Are the cells decimal degrees, and then partitioned into smaller units, or
> equal area cells or maybe UTM grids or something else perhaps? I am just
> curious.
>
> Are you developing this as OSS? I'd like to follow progress if possible?
>
> Thanks,
> Tim,
>
> On 31 May 2016, at 20:31, Juan M. Escamilla Molgora <
> <j.escamillamolgora at lancaster.ac.uk>j.escamillamolgora at lancaster.ac.uk>
> wrote:
>
> Hi Tim,
>
> The grid is made by selecting a square area and divide it in nxn
> subsquares which form a partition on the bigger square.
>
> Each grid is a table in postgis and there's a mapping between this table
> to a django model (class).
>
> The class constructor have attributes: id, cell and neighbours (next
> release).
>
> The cell is a polygon (square) and with geodjango inherits the properties
> of the osgeo module for polygons.
>
> I've tried to use the CSV data (downloaded as a CSV request ) but I
> couldn't find a way to obtain the global id's for each taxonomic level
> (idspecies, idgenus, idfamily, etc).
>
> Do you know a way for obtaining these fields?
>
>
> Thank you for your email and best wishes,
>
>
> Juan
>
> On 31/05/16 19:03, Tim Robertson wrote:
>
> Hi Juan
>
> That sounds like a fun project!
>
> Can you please describe your grid / cells?
>
> Most likely your best bet will be to use the download API (as CSV data)
> and ingest that. The other APIs will likely hit limits (e.g. You can't page
> through indefinitely).
>
> Thanks,
> Tim
>
> On 31 May 2016, at 18:55, Juan M. Escamilla Molgora <
> <j.escamillamolgora at lancaster.ac.uk>j.escamillamolgora at lancaster.ac.uk>
> wrote:
>
> Dear all,
>
>
> Thank you very much for your valuable feedback!
>
>
> I'll explain a bit what I'm doing just to clarify, sorry if this spam to
> some.
>
>
> I want to build a model for species assemblages based on co-occurrence of
> taxa within an arbitrary area. I'm building a 2D lattice in which for each
> cell I'm collapsing the data into a taxonomic tree (the occurrences). For
> doing this I need first to obtain the data from the gbif api and later,
> based on the ids (or names) of each taxonomic level (from kingdom to
> occurrence) build a tree coupled to each cell.
>
>
> The implementation is done with postgresql (postgis) for storing the raw
> gbif data and neo4j for storing the relation
>
> "Being a member of the [ specie, genus, family,,,] [name/id]" The idea is
> to include data from different sources similar to the project Matthew and
> Jennifer had mentioned (which I'm very interested and like to hear more)
> and traverse the network looking for significant merged information.
>
>
> One of the immediate problems I've found is to import big chunks of the
> gbif data into my specification. Thanks to this thread I've found the tools
> that are the most used by the community (pygbif,rgbif, and
> python-dwca-reader). I was using urlib2 and things like that.
>
> I'll be happy to share any code or ideas with the people interested.
>
>
> Btw, I've checked the tinkerpop project which uses the Gremlin traversal
> language as independent from the DBMS.
>
> Perhaps it's possible to use it with spark and Guoda as well?
>
>
>
> Does GOuda is working now?
>
>
> Best wishes
>
>
> Juan.
>
>
>
>
>
>
>
> On 31/05/16 17:02, Collins, Matthew wrote:
>
> Jorrit pointed out this thread to us at iDigBio. Downloading and importing
> data into a relational database will work great, especially if as Jan said
> you can cut the data size down to a reasonable amount.
>
>
> Another approach we've been working on in a collaboration called GUODA
> [1] is to build an Apache Spark environment with pre-formatted data frames
> with common data sets in them for researchers to use. This approach would
> offer a remote service where you could write arbitrary Spark code,
> probably in Jupyter notebooks, to iterate over data. Spark does a lot of
> cool stuff including GraphX which might be of interest. This is definitely
> pre-alpha at this point and if anyone is interested, I'd like to hear your
> thoughts. I'll also be at SPNHC talking about this.
>
>
> One thing we've found in working on this is that importing data into a
> structured data format isn't always easy. If you only want a few columns,
> it'll be fine. But getting the data typing, format standardization, and
> column name syntax of the whole width of an iDigBio record right requires
> some code. I looked to see if EcoData Retriever [2] had a GBIF data
> source and they have an eBird one that perhaps you might find useful as a
> starting point if you wanted to try to use someone else's code to download
> and import data.
>
>
> For other data structures like BHL, we're kind of making stuff up since
> we're packaging a relational structure and not something nearly as flat as
> GBIF and DWC stuff.
>
>
> [1] http://guoda.bio/
>
> [2] http://www.ecodataretriever.org/
>
>
> Matthew Collins
> Technical Operations Manager
> Advanced Computing and Information Systems Lab, ECE
> University of Florida
> 352-392-5414 <callto:352-392-5414>
> ------------------------------
> *From:* jorrit poelen <jhpoelen at xs4all.nl> <jhpoelen at xs4all.nl>
> *Sent:* Monday, May 30, 2016 11:16 AM
> *To:* Collins, Matthew; Thompson, Alexander M; Hammock, Jennifer
> *Subject:* Fwd: [API-users] Is there any NEO4J or graph-based driver for
> this API ?
>
> Hey y’all:
>
> Interesting request below on the GBIF mailing list - sounds like a perfect
> fit for the GUODA use cases.
>
> Would it be too early to jump onto this thread and share our
> efforts/vision?
>
> thx,
> -jorrit
>
> Begin forwarded message:
>
> *From: *Jan Legind < <jlegind at gbif.org>jlegind at gbif.org>
> *Subject: **Re: [API-users] Is there any NEO4J or graph-based driver for
> this API ?*
> *Date: *May 30, 2016 at 5:48:51 AM PDT
> *To: *Mauro Cavalcanti <maurobio at gmail.com>, "Juan M. Escamilla Molgora" <
> <j.escamillamolgora at lancaster.ac.uk>j.escamillamolgora at lancaster.ac.uk>
> *Cc: *"api-users at lists.gbif.org" <api-users at lists.gbif.org>
>
> Dear Juan,
>
> Unfortunately we have no tool for creating these kind of SQL like queries
> to the portal. I am sure you are aware that the filters in the occurrence
> search pages can be applied in combination in numerous ways. The API can go
> even further in this regard[1], but it not well suited for retrieving
> occurrence records since there is a 200.000 records ceiling making it unfit
> for species exceeding this number.
>
> There is going be updates to the pygbif package[2] in the near future that
> will enable you to launch user downloads programmatically where a whole
> list of different species can be used as a query parameter as well as
> adding polygons.[3]
>
> In the meantime, Mauro’s suggestion is excellent. If you can narrow your
> search down until it returns a manageable download (say less than 100
> million records), importing this into a database should be doable. From
> there, you can refine using SQL queries.
>
> Best,
> Jan K. Legind, GBIF Data manager
>
> [1]  <http://www.gbif.org/developer/occurrence#search>
> http://www.gbif.org/developer/occurrence#search
> [2]  <https://github.com/sckott/pygbif>https://github.com/sckott/pygbif
> [3]  <https://github.com/jlegind/GBIF-downloads>
> https://github.com/jlegind/GBIF-downloads
>
> *From:* API-users [mailto:api-users-bounces at lists.gbif.org
> <api-users-bounces at lists.gbif.org>] *On Behalf Of *Mauro Cavalcanti
> *Sent:* 30. maj 2016 14:06
> *To:* Juan M. Escamilla Molgora
> *Cc:*  <api-users at lists.gbif.org>api-users at lists.gbif.org
> *Subject:* Re: [API-users] Is there any NEO4J or graph-based driver for
> this API ?
>
>
> Hi,
>
> One solution I have successfully adopted for this is to download the
> records (either "manually" via browser or, yet better, using a Python
> script using the fine pygbif library), storing them into a MySQL or SQLite
> database and then perform the relational queries. I can provide examples if
> you are interested.
> Best regards,
>
> 2016-05-30 8:59 GMT-03:00 Juan M. Escamilla Molgora <
> <j.escamillamolgora at lancaster.ac.uk>j.escamillamolgora at lancaster.ac.uk>:
> Hola,
>
> Is there any API for making relational queries like taxonomy, location or
> timestamp?
>
> Thank you and best wishes
>
> Juan
> _______________________________________________
> API-users mailing list
> API-users at lists.gbif.org
> http://lists.gbif.org/mailman/listinfo/api-users
>
>
>
> --
> Dr. Mauro J. Cavalcanti
> E-mail:  <maurobio at gmail.com>maurobio at gmail.com
> Web:  <http://sites.google.com/site/maurobio>
> http://sites.google.com/site/maurobio
> _______________________________________________
> API-users mailing list
> API-users at lists.gbif.org
> http://lists.gbif.org/mailman/listinfo/api-users
>
>
>
>
> _______________________________________________
> API-users mailing listAPI-users at lists.gbif.orghttp://lists.gbif.org/mailman/listinfo/api-users
>
>
> _______________________________________________
> API-users mailing list
> API-users at lists.gbif.org
> http://lists.gbif.org/mailman/listinfo/api-users
>
>
>
>
>
> _______________________________________________
> API-users mailing listAPI-users at lists.gbif.orghttp://lists.gbif.org/mailman/listinfo/api-users
>
>
>
> _______________________________________________
> API-users mailing list
> API-users at lists.gbif.org
> http://lists.gbif.org/mailman/listinfo/api-users
>
>

-- 
Dr. Mauro J. Cavalcanti
E-mail: maurobio at gmail.com
Web: http://sites.google.com/site/maurobio
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gbif.org/pipermail/api-users/attachments/20160601/df2166d4/attachment-0001.html>