[API-users] API change proposal: supporting ranges in occurrence eventDate

Matthew Blissett mblissett at gbif.org
Tue Sep 26 11:53:14 UTC 2023


Dear GBIF API users,

/You might prefer to read this email on either GitHub or the community 
forum, as the formatting is probably better:/

GitHub issue discussion: 
https://github.com/gbif/gbif-api/issues/4#issuecomment-1735378954

Community forum discussion: 
https://discourse.gbif.org/t/gbif-api-supporting-ranges-in-occurrence-eventdate/3804

*Event dates — upcoming API change*

Early this year we announced a plan to change the way we handle the 
"eventDate" Darwin Core term.  Date ranges formatted using the ISO 8601 
standard, recommended by Darwin Core, will retain their meaning, and the 
API will return values like "2000-05" or "2007-11-13/2007-11-15", rather 
than the current behaviour of changing these values to "2000-05-01" and 
"2007-11-13".

These changes are now visible on GBIF's test system, GBIF-UAT.org.  To 
allow time for you to test this change against any existing software and 
scripts you have, we will not implement these changes on GBIF.org before 
early November.

*API users*

Users of the occurrence API will need to decide how to handle an 
eventDate like "1880/1889", "1910", "2000-05", "1999-11/2000-03", 
"2007-11-13/2007-11-15" or "2023-09-22T05:17:10/2023-09-22T12:17:10": 
taking the earliest, latest or middle value, randomizing within the 
range, excluding them, etc.  To make parsing easier, ranges will always be 
formatted using the full form and never abbreviated: always 
"2007-11-13/2007-11-15" and never "2007-11-13/15".
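
As a rough illustration of the options listed above, here is a minimal 
Python sketch that turns an eventDate into its earliest and latest 
possible moments. The function names (event_date_bounds, midpoint) are 
made up for this example and are not part of any GBIF library, and the 
sketch assumes timestamps have second precision and no zone offset, as in 
the examples in this email.

```python
import calendar
from datetime import datetime


def _bounds(value):
    """Return the (earliest, latest) datetimes covered by one ISO 8601 value."""
    if "T" in value:
        # Assumption: second precision and no zone offset, as in the examples.
        moment = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
        return moment, moment
    fields = [int(field) for field in value.split("-")]
    year = fields[0]
    # A bare year covers January to December; otherwise the month is fixed.
    first_month, last_month = (fields[1], fields[1]) if len(fields) > 1 else (1, 12)
    if len(fields) > 2:
        first_day = last_day = fields[2]
    else:
        # No day given: cover the first to the last day of the month(s).
        first_day, last_day = 1, calendar.monthrange(year, last_month)[1]
    return (datetime(year, first_month, first_day),
            datetime(year, last_month, last_day, 23, 59, 59))


def event_date_bounds(event_date):
    """Split an eventDate on "/" and return its overall (earliest, latest)."""
    start, _, end = event_date.partition("/")
    earliest, _ = _bounds(start)
    _, latest = _bounds(end or start)
    return earliest, latest


def midpoint(event_date):
    """Return the moment halfway between the earliest and latest bounds."""
    earliest, latest = event_date_bounds(event_date)
    return earliest + (latest - earliest) / 2
```

With this, event_date_bounds("2000-05") gives 2000-05-01 00:00:00 and 
2000-05-31 23:59:59. Software that wants the old behaviour can keep only 
the first element of the pair.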

It may be easier to use the individual "year", "month" and "day" fields, 
which will be present if the year/month/day is constant for the whole 
range of the eventDate. For example, eventDate=2010-11-25/2010-12-03 will 
have year=2010, month=NULL, day=NULL, as only the year is constant. 
(However, note that a date like 2022-12-31/2023-01-01 covers just 2 days, 
but as it spans two different years the "year" field will be blank.)
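
The rule for these fields can be sketched as follows. This is only an 
illustration of the behaviour described above, not GBIF's implementation. 
The function name constant_fields is hypothetical, and the sketch assumes 
that a field is only filled when all coarser fields are also constant.

```python
from datetime import date


def constant_fields(start, end):
    """Return year/month/day, each set only if constant over start..end."""
    year = start.year if start.year == end.year else None
    # Assumption: month is only meaningful when the year is constant,
    # and day only when the month is constant.
    month = start.month if year is not None and start.month == end.month else None
    day = start.day if month is not None and start.day == end.day else None
    return {"year": year, "month": month, "day": day}


# eventDate=2010-11-25/2010-12-03: only the year is constant.
print(constant_fields(date(2010, 11, 25), date(2010, 12, 3)))

# eventDate=2022-12-31/2023-01-01: two days, but nothing is constant.
print(constant_fields(date(2022, 12, 31), date(2023, 1, 1)))
```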

When searching using a range (e.g. eventDate=2005-01,2005-03) only 
occurrences with eventDates *entirely within* the range will be returned.
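
To make "entirely within" concrete, here is a small sketch of the 
comparison, assuming the query eventDate=2005-01,2005-03 covers 2005-01-01 
to 2005-03-31 inclusive. The function name is_within is made up for this 
example.

```python
from datetime import date


def is_within(record_start, record_end, query_start, query_end):
    """True if the record's whole date range lies inside the query range."""
    return query_start <= record_start and record_end <= query_end


# Query eventDate=2005-01,2005-03 (assumed to include all of March).
query = (date(2005, 1, 1), date(2005, 3, 31))

# Record eventDate=2005-02: the whole month is inside the query range.
print(is_within(date(2005, 2, 1), date(2005, 2, 28), *query))   # True

# Record eventDate=2004-12-20/2005-01-10: starts before the query range.
print(is_within(date(2004, 12, 20), date(2005, 1, 10), *query))  # False

# Record eventDate=2005: the year extends beyond the end of March.
print(is_within(date(2005, 1, 1), date(2005, 12, 31), *query))   # False
```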

*Download users*

The "eventDate" column in CSV, Darwin Core and Parquet (cloud snapshot) 
downloads will contain the same value as in the API, for example 
"2023-09-22T12:17:10", "2023-09-22", "1880/1889", "1910", "2000-05", 
"1999-11/2000-03", "2007-11-13/2007-11-15" or 
"2023-09-22T05:17:10/2023-09-22T12:17:10".

As with the search API, when filtering using a range (e.g. 
eventDate=2005-01,2005-03) only occurrences with eventDates *entirely 
within* the range will be returned.

*Data interpretation (for data publishers)*

Eight Darwin Core terms record information on when an occurrence was 
collected or observed:

- year
- month
- day
- eventDate
- eventTime
- startDayOfYear
- endDayOfYear
- verbatimEventDate

Some records will have conflicting information in these fields. Detailed 
documentation on how we handle the various cases is being prepared, but 
the general approach is to remove parts of the date that conflict, 
adding a RECORDED_DATE_MISMATCH issue in this case.  For example, 
"eventDate=2005-06-01", "year=2005", "month=6" and "day=NULL" would have 
eventDate changed to "2005-06" and the issue added.

Occurrences published with only one/some fields will have the other 
fields filled in automatically, where possible. We will not add an issue 
flag for this.

All existing datasets will be reprocessed with the new algorithms as the 
change to the API is made for GBIF.org.

*Example dataset*

A dataset of test occurrences is here: 
https://www.gbif-uat.org/occurrence/search?dataset_key=d6167827-973d-429a-a00c-8ea294d62d80 
providing many examples of consistent and conflicting event date fields. 
The scientificName is set to a summary of what the eventDate is, and the 
eventRemarks field has more explanation.
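
For anyone who wants to check their scripts against this dataset, the 
following Python sketch builds a search URL for the test system and lists 
the eventDate of each result. It assumes the API host api.gbif-uat.org and 
the usual occurrence search parameters (datasetKey, eventDate, limit). The 
helper names search_url and fetch_event_dates are made up for this 
example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the test system's API mirrors the path used on api.gbif.org.
UAT_SEARCH = "https://api.gbif-uat.org/v1/occurrence/search"
TEST_DATASET = "d6167827-973d-429a-a00c-8ea294d62d80"


def search_url(dataset_key=TEST_DATASET, event_date=None, limit=20):
    """Build an occurrence search URL, optionally filtered by eventDate."""
    params = {"datasetKey": dataset_key, "limit": limit}
    if event_date:
        params["eventDate"] = event_date
    return f"{UAT_SEARCH}?{urlencode(params)}"


def fetch_event_dates(url):
    """Return (eventDate, scientificName) pairs from one page of results."""
    with urlopen(url) as response:
        results = json.load(response)["results"]
    return [(r.get("eventDate"), r.get("scientificName")) for r in results]


if __name__ == "__main__":
    for event_date, summary in fetch_event_dates(search_url()):
        print(event_date, "|", summary)
```

Passing event_date="2005-01,2005-03" to search_url adds a range filter 
to the same request.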

*Feedback*

Feedback is welcome on the GitHub issue, here on the mailing list, or on 
the Discourse forum.

Thanks,

Matt


On 17/01/2023 15:28, Matthew Blissett via API-users wrote:
>
> Dear GBIF API users,
>
> /You might prefer to read this email on either GitHub or the community 
> forum, as the formatting is probably better:/
>
> GitHub issue discussion: 
> https://github.com/gbif/gbif-api/issues/4#issuecomment-1385497157
>
> Community forum discussion: 
> https://discourse.gbif.org/t/gbif-api-supporting-ranges-in-occurrence-eventdate/3804
>
>
> A longstanding issue with the GBIF API is the interpretation and 
> formatting of the Darwin Core term "eventDate".
>
> *Summary: instead of GBIF changing published |eventDate| values like 
> |2009-03-18/2009-04-13| and |2010| to |2009-03-18| and |2010-01-01| 
> respectively, we propose returning the values |2009-03-18/2009-04-13| 
> and |2010| in the occurrence API and in downloads. Existing 
> code/scripts that use the |eventDate| value may need to be updated.*
>
> The recommended best practice for the term is "use a date that 
> conforms to ISO 8601-1:2019" (see 
> https://dwc.tdwg.org/terms/#dwc:eventDate).
>
> ISO 8601-1:2019 supports date ranges, and some publishers provide 
> these. Examples are |2000-05|, or |2007-11-13/2007-11-15|. GBIF's 
> current interpretation changes date ranges like this to the first 
> possible day in the range (|2000-05-01| and |2007-11-13|).
>
> At least 64 million occurrences are affected.
>
>
>     Change to date interpretation
>
> We propose changing the eventDate field in the GBIF API to support ISO 
> 8601-1 date ranges. A range will be returned where one was provided by 
> the publisher, either directly as a range in the |eventDate| field, or 
> through a combination of the |year|, |month|, |day|, |startDayOfYear| 
> and |endDayOfYear| fields.
>
> The data quality checks on dates will be improved to check for 
> consistency between these fields: |eventDate|, |year|, |month|, |day|, 
> |startDayOfYear| and |endDayOfYear|. These fields will only be 
> populated if they are constant for the whole range of dates — a range 
> spanning several days in January 2020 will have |year=2020|, 
> |month=January| and |day=(Blank)|.
>
> |startDayOfYear| and |endDayOfYear| will also be present if the range 
> is accurate to days.
>
> Examples:
>
> published event date   interpreted eventDate  int. year  int. month  int. day  int. sdoy  int. edoy
> 2023-01-13             2023-01-13             2023       1           13        13         13
> 2023-01                2023-01                2023       1
> 2023                   2023                   2023
> 2023-01-13/2023-01-14  2023-01-13/2023-01-14  2023       1                     13         14
> 2023-01-13/14          2023-01-13/14          2023       1                     13         14
> 2023-01/2023-02        2023-01/2023-02        2023
> 2023-01/02             2023-01/02             2023
> 2023/2024              2023/2024
> 2023-01-01/2023-12-31  2023-01-01/2023-12-31  2023                               1          365
>
> Other cases where we can unambiguously determine a date or date range 
> will also be handled, for example a record with a |year| and |month| 
> but no |eventDate|, or non-ISO dates like |January 2023|.
>
>
>       API example:
>
> This record <https://api.gbif.org/v1/occurrence/1234530937> (portal 
> link <https://www.gbif.org/occurrence/1234530937>) is published with 
> |eventDate=2009-03-18/2009-04-13|, |year=2009|, |month=3|, |day=18|. 
> We currently change the |eventDate|:
>
> "year":2009,
> "month":3,
> "day":18,
> "eventDate":"2009-03-18T00:00:00",
>
> With this proposal, we would preserve the |eventDate| but remove 
> |day|, as the event crosses several days:
>
> "year":2009,
> "month":3,
> "eventDate":"2009-03-18/2009-04-13",
>
> This record <https://api.gbif.org/v1/occurrence/2382954724> (portal 
> link <https://www.gbif.org/occurrence/2382954724>) is published with 
> |eventDate=2019-04-06T20:00:00/2019-04-10T05:00:00| and no separate 
> |day|, |month| or |year| values. Currently, we process it to this:
>
> "year":2019,
> "month":4,
> "day":6,
> "eventDate":"2019-04-06T20:00:00",
>
> Instead, we propose returning this:
>
> "year":2019,
> "month":4,
> "eventDate":"2019-04-06T20:00:00/2019-04-10T05:00:00",
> "startDayOfYear":96,
> "endDayOfYear":100,
>
>
>     Searching
>
> The search and download APIs will be affected by this change.
>
> Occurrences will be returned if the occurrence date/date range is 
> *completely within* the query date or date range.
>
> Search: eventDate=2023-01-11
>   Record: eventDate=2023-01-11 -- included
>   Record: eventDate=2023-01 -- EXCLUDED
>   Record: eventDate=2023-01-11/12 -- EXCLUDED
>
> Search: eventDate=2023-01-11,2023-01-12
>   Record: eventDate=2023-01-11 -- included
>   Record: eventDate=2023-01 -- EXCLUDED
>   Record: eventDate=2023-01-11/12 -- included
>
> Search: eventDate=*,2023-01 (meaning "Before end of January 2023")
>   Record: eventDate=2023-01-11 -- included
>   Record: eventDate=2023-01 -- included
>   Record: eventDate=2023-01-11/12 -- included
>
> Search: eventDate=2023-01,2023-01 (meaning "After start of January 2023
>         AND before end of January 2023")
> Search: eventDate=2023-01 (same meaning)
>   Record: eventDate=2023-01-11 -- included
>   Record: eventDate=2023-01 -- included
>   Record: eventDate=2023-01-11/12 -- included
>
> This implementation will avoid returning occurrences with eventDates 
> like "2010/2021" in many queries. (There are millions of occurrences 
> with large ranges like this.)
>
>
>     Density maps
>
> There is a year filter for the density/pixel maps. An occurrence from 
> 2023-01 will be included, but an occurrence with an eventDate spanning 
> more than a single year (like 2022-12-31/2023-01-01) will no longer be 
> included.
>
>
>     Quarterly analytics, global/regional trends
>
> The quarterly analytics include calculations based on the individual 
> dwc:year, dwc:month and dwc:day fields. The statistics will be 
> affected where these values change or become blank.
>
>
>     rGBIF, PyGBIF
>
> Both libraries will be updated as necessary to support eventDate 
> values containing a date range.
>
>
>     Feedback
>
> We have delayed addressing this issue for a long time, primarily due 
> to concerns about changing the existing behaviour of the API. However, 
> it's also one of the most frequently requested improvements to GBIF's 
> interpretation.
>
> If you are aware of software or systems which would have problems 
> adapting to the proposed change, please let us know, either on this 
> mailing list, the GitHub issue, the community forum or by email to me.
>
> We will alert users in the same places when the change is ready to be 
> tested on the test system at api.gbif-uat.org, and when the change is 
> to be made live on api.gbif.org.
>
> Thank you,
>
> Matt
>