[API-users] API change proposal: supporting ranges in occurrence eventDate

Matthew Blissett mblissett at gbif.org
Tue Jan 17 14:28:33 UTC 2023


Dear GBIF API users,

/You might prefer to read this email on either GitHub or the community 
forum, as the formatting is probably better:/


GitHub issue discussion: 
https://github.com/gbif/gbif-api/issues/4#issuecomment-1385497157


Community forum discussion: 
https://discourse.gbif.org/t/gbif-api-supporting-ranges-in-occurrence-eventdate/3804


A longstanding issue with the GBIF API is the interpretation and 
formatting of the Darwin Core term "eventDate".

*Summary: instead of GBIF changing published |eventDate| values like 
|2009-03-18/2009-04-13| and |2010| to |2009-03-18| and |2010-01-01| 
respectively, we propose returning the values |2009-03-18/2009-04-13| 
and |2010| in the occurrence API and in downloads. Existing code/scripts 
that use the |eventDate| value may need to be updated.*
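
For scripts that read |eventDate|, here is a minimal sketch in Python of 
one way to cope with both single values and ranges. It assumes the value 
is either a single date or two ISO 8601 parts joined by "/", and that an 
abbreviated end (as in |2023-01-13/14|) omits only leading components 
it shares with the start. The function name |split_event_date| is 
hypothetical and not part of any GBIF library.

```python
def split_event_date(value: str) -> tuple[str, str]:
    """Split an eventDate into (start, end) strings.

    A single value such as "2010" is returned as ("2010", "2010").
    Assumption: an abbreviated end such as the "14" in "2023-01-13/14"
    omits only the leading characters it shares with the start.
    """
    if "/" not in value:
        return value, value
    start, end = value.split("/", 1)
    if len(end) < len(start):
        # Borrow the omitted leading characters from the start value.
        end = start[: len(start) - len(end)] + end
    return start, end
```

For example, |split_event_date("2023-01-13/14")| gives 
|("2023-01-13", "2023-01-14")|, while |"2010"| gives |("2010", "2010")|.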

The recommended best practice for the term is "use a date that conforms 
to ISO 8601-1:2019" (see https://dwc.tdwg.org/terms/#dwc:eventDate).

ISO 8601-1:2019 supports date ranges, and some publishers provide them. 
Examples include |2000-05| (the whole of May 2000) and 
|2007-11-13/2007-11-15|. GBIF's current interpretation replaces such 
ranges with the first possible day in the range (|2000-05-01| and 
|2007-11-13|).
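
A value such as |2000-05| implicitly covers every day of May 2000. The 
short sketch below computes the first and last day that such a value 
covers; the current interpretation keeps only the first of the two. The 
helper name |date_bounds| is hypothetical, and it handles only |YYYY|, 
|YYYY-MM| and |YYYY-MM-DD| values (no times, week or ordinal dates).

```python
import calendar
from datetime import date


def date_bounds(value: str) -> tuple[date, date]:
    """Return the first and last day covered by YYYY, YYYY-MM or YYYY-MM-DD."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        year = parts[0]
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:
        year, month = parts
        # monthrange() returns (weekday of day 1, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    year, month, day = parts
    return date(year, month, day), date(year, month, day)
```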

At least 64 million occurrences are affected.



    Change to date interpretation

We propose changing the eventDate field in the GBIF API to support ISO 
8601-1 date ranges. A range will be returned where one was provided by 
the publisher, either directly as a range in the |eventDate| field, or 
through a combination of the |year|, |month|, |day|, |startDayOfYear| 
and |endDayOfYear| fields.

The data quality checks on dates will be improved to check for 
consistency between these fields: |eventDate|, |year|, |month|, |day|, 
|startDayOfYear| and |endDayOfYear|. The |year|, |month| and |day| 
fields will only be populated if they are constant for the whole range 
of dates: a range spanning several days in January 2020 will have 
|year=2020|, |month=1| (January) and a blank |day|.

|startDayOfYear| and |endDayOfYear| will also be present if the range is 
precise to the day.

Examples:

published eventDate    interpreted eventDate  year  month  day  startDayOfYear  endDayOfYear
-------------------    ---------------------  ----  -----  ---  --------------  ------------
2023-01-13             2023-01-13             2023  1      13   13              13
2023-01                2023-01                2023  1
2023                   2023                   2023
2023-01-13/2023-01-14  2023-01-13/2023-01-14  2023  1           13              14
2023-01-13/14          2023-01-13/14          2023  1           13              14
2023-01/2023-02        2023-01/2023-02        2023
2023-01/02             2023-01/02             2023
2023/2024              2023/2024
2023-01-01/2023-12-31  2023-01-01/2023-12-31  2023              1               365

(Empty cells are fields that are not populated.)
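
The rules in the table can be sketched in Python. This is an 
illustration under assumptions, not GBIF's implementation: it handles 
only date values (no times), expands an abbreviated end by borrowing 
leading characters from the start, and fills |startDayOfYear| and 
|endDayOfYear| only when both ends are precise to the day and fall in the 
same year. The names |interpret| and |_bounds| are hypothetical.

```python
import calendar
from datetime import date


def _bounds(value):
    # First and last day covered by YYYY, YYYY-MM or YYYY-MM-DD,
    # plus whether the value is precise to the day.
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        return date(parts[0], 1, 1), date(parts[0], 12, 31), False
    if len(parts) == 2:
        y, m = parts
        return date(y, m, 1), date(y, m, calendar.monthrange(y, m)[1]), False
    d = date(*parts)
    return d, d, True


def interpret(event_date):
    """Derive year/month/day/startDayOfYear/endDayOfYear from an eventDate."""
    start_text, _, end_text = event_date.partition("/")
    if not end_text:
        end_text = start_text
    elif len(end_text) < len(start_text):
        # Expand an abbreviated end such as "2023-01-13/14".
        end_text = start_text[: len(start_text) - len(end_text)] + end_text
    start, _, start_precise = _bounds(start_text)
    _, end, end_precise = _bounds(end_text)

    result = {"eventDate": event_date, "year": None, "month": None,
              "day": None, "startDayOfYear": None, "endDayOfYear": None}
    # Each field is only populated if it is constant over the whole range.
    if start.year == end.year:
        result["year"] = start.year
        if start.month == end.month:
            result["month"] = start.month
            if start.day == end.day:
                result["day"] = start.day
        # Day-of-year values only when both ends are precise to the day.
        if start_precise and end_precise:
            result["startDayOfYear"] = start.timetuple().tm_yday
            result["endDayOfYear"] = end.timetuple().tm_yday
    return result
```

Each row of the table above can be checked by calling |interpret| on the 
published value, e.g. |interpret("2023-01-01/2023-12-31")| gives year 
2023 with day-of-year values 1 and 365.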

Other cases where we can unambiguously determine a date or date range 
will also be handled, for example a record with a |year| and |month| but 
no |eventDate|, or non-ISO dates like |January 2023|.
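
For records without an |eventDate|, a date or range can sometimes be 
built from the separate fields. The following is a hypothetical sketch: 
giving the day-of-year fields precedence is an assumption, and it 
neither checks the fields for consistency nor parses non-ISO text such 
as |January 2023|.

```python
from datetime import date, timedelta


def event_date_from_parts(year, month=None, day=None,
                          start_day_of_year=None, end_day_of_year=None):
    """Build an ISO 8601 date or range from separate date fields.

    Returns None when the fields cannot determine a date.
    Assumption: startDayOfYear/endDayOfYear take precedence over month/day.
    """
    if year is None:
        return None
    if start_day_of_year is not None and end_day_of_year is not None:
        # Day-of-year values are 1-based: day 1 is 1 January.
        jan_1 = date(year, 1, 1)
        start = jan_1 + timedelta(days=start_day_of_year - 1)
        end = jan_1 + timedelta(days=end_day_of_year - 1)
        if start == end:
            return start.isoformat()
        return f"{start.isoformat()}/{end.isoformat()}"
    if month is None:
        return f"{year:04d}"
    if day is None:
        return f"{year:04d}-{month:02d}"
    return date(year, month, day).isoformat()
```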



      API example:

This record <https://api.gbif.org/v1/occurrence/1234530937> (portal link 
<https://www.gbif.org/occurrence/1234530937>) is published with 
|eventDate=2009-03-18/2009-04-13|, |year=2009|, |month=3|, |day=18|. 
Currently, we replace the |eventDate| with the first day of the range:

"year":2009,
"month":3,
"day":18,
"eventDate":"2009-03-18T00:00:00",

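With this proposal, we would preserve the |eventDate| but remove |month| 
and |day|, as the event spans several days across March and April. As 
the range is precise to the day, |startDayOfYear| and |endDayOfYear| 
would also be returned:

"year":2009,
"eventDate":"2009-03-18/2009-04-13",
"startDayOfYear":77,
"endDayOfYear":103,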

This record <https://api.gbif.org/v1/occurrence/2382954724> (portal link 
<https://www.gbif.org/occurrence/2382954724>) is published with 
|eventDate=2019-04-06T20:00:00/2019-04-10T05:00:00| and no separate 
|day|, |month| or |year| values. Currently, we interpret it as:

"year":2019,
"month":4,
"day":6,
"eventDate":"2019-04-06T20:00:00",

Instead, we propose returning this:

"year":2019,
"month":4,
"eventDate":"2019-04-06T20:00:00/2019-04-10T05:00:00",
"startDayOfYear":96,
"endDayOfYear":100,
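
The day-of-year values above can be checked with Python's standard 
library, where |tm_yday| counts 1 January as day 1. This sketch assumes 
both ends of the range are written out in full and ignores time zones; 
|day_of_year_range| is a hypothetical helper name.

```python
from datetime import datetime


def day_of_year_range(event_date):
    """Return (startDayOfYear, endDayOfYear) for a range written as start/end."""
    start_text, end_text = event_date.split("/")
    start = datetime.fromisoformat(start_text)
    end = datetime.fromisoformat(end_text)
    # tm_yday is 1-based: 1 January is day 1.
    return start.timetuple().tm_yday, end.timetuple().tm_yday


print(day_of_year_range("2019-04-06T20:00:00/2019-04-10T05:00:00"))  # (96, 100)
```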



    Searching

The search and download APIs will be affected by this change.

Occurrences will be returned if the occurrence date/date range is 
*completely within* the query date or date range.

Search: eventDate=2023-01-11
    Record: eventDate=2023-01-11     -- included
    Record: eventDate=2023-01        -- EXCLUDED
    Record: eventDate=2023-01-11/12  -- EXCLUDED

Search: eventDate=2023-01-11,2023-01-12
    Record: eventDate=2023-01-11     -- included
    Record: eventDate=2023-01        -- EXCLUDED
    Record: eventDate=2023-01-11/12  -- included

Search: eventDate=*,2023-01  (meaning "Before end of January 2023")
    Record: eventDate=2023-01-11     -- included
    Record: eventDate=2023-01        -- included
    Record: eventDate=2023-01-11/12  -- included

Search: eventDate=2023-01,2023-01  (meaning "After start of January 2023 AND before end of January 2023")
Search: eventDate=2023-01  (same meaning)
    Record: eventDate=2023-01-11     -- included
    Record: eventDate=2023-01        -- included
    Record: eventDate=2023-01-11/12  -- included
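
The "completely within" rule can be sketched as an interval containment 
test over whole days. This illustrates the semantics described above, 
not how the search index implements them; the helper names are 
hypothetical, only date values (no times) are handled, and |*| is 
treated as an open bound.

```python
import calendar
from datetime import date


def _bounds(value):
    # First and last day covered by YYYY, YYYY-MM or YYYY-MM-DD.
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        y, m = parts
        return date(y, m, 1), date(y, m, calendar.monthrange(y, m)[1])
    d = date(*parts)
    return d, d


def record_interval(event_date):
    """Closed interval of days covered by a record's eventDate."""
    start_text, _, end_text = event_date.partition("/")
    end_text = end_text or start_text
    if len(end_text) < len(start_text):
        # Expand an abbreviated end such as "2023-01-11/12".
        end_text = start_text[: len(start_text) - len(end_text)] + end_text
    return _bounds(start_text)[0], _bounds(end_text)[1]


def query_interval(param):
    """Closed interval for a search value such as "2023-01" or "*,2023-01"."""
    low, _, high = param.partition(",")
    high = high or low
    start = date.min if low == "*" else _bounds(low)[0]
    end = date.max if high == "*" else _bounds(high)[1]
    return start, end


def matches(query, event_date):
    """True if the record's date range lies completely within the query range."""
    q_start, q_end = query_interval(query)
    r_start, r_end = record_interval(event_date)
    return q_start <= r_start and r_end <= q_end
```

With this rule, a broad record such as |2010/2021| does not match a 
query for |2015|, since the record's range extends outside that year.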

This avoids returning occurrences with broad eventDates such as 
"2010/2021" in queries for a narrower period. (There are millions of 
occurrences with large ranges like this.)



    Density maps

There is a year filter for the density/pixel maps. With a filter for 
2023, an occurrence from 2023-01 will be included, but an occurrence 
with an eventDate spanning more than a single year (like 
2022-12-31/2023-01-01) will no longer be included.



    Quarterly analytics, global/regional trends

The quarterly analytics include calculations based on the individual 
dwc:year, dwc:month and dwc:day fields. The statistics will be affected 
where these values change or become blank.



    rGBIF, PyGBIF

Both libraries will be updated as necessary to support eventDate values 
containing a date range.



    Feedback

We have delayed addressing this issue for a long time, primarily due to 
concerns about changing the existing behaviour of the API. However, it's 
also one of the most frequently requested improvements to GBIF's 
interpretation.

If you are aware of software or systems which would have problems 
adapting to the proposed change, please let us know, either on this 
mailing list, the GitHub issue, the community forum or by email to me.

We will alert users in the same places when the change is ready to be 
tested on the test system at api.gbif-uat.org, and when the change is to 
be made live on api.gbif.org.
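
Once the change is on the test system, a search can be built in the 
usual way. A sketch using only Python's standard library: the 
|/v1/occurrence/search| path on api.gbif-uat.org is assumed to mirror 
the production API, and |search_url| and |fetch_results| are 
hypothetical helper names. Note that |urlencode| percent-encodes the 
comma as |%2C|.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the test system uses the same path as api.gbif.org.
UAT_SEARCH = "https://api.gbif-uat.org/v1/occurrence/search"


def search_url(event_date, limit=20):
    """Build an occurrence search URL filtered on eventDate."""
    return f"{UAT_SEARCH}?{urlencode({'eventDate': event_date, 'limit': limit})}"


def fetch_results(url):
    """Fetch one page of search results (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)["results"]


print(search_url("2023-01,2023-01"))
```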

Thank you,

Matt
