Dear GBIF API users,
You might prefer to read this email on either GitHub or the community forum, as the formatting is probably better:
GitHub issue discussion: https://github.com/gbif/gbif-api/issues/4#issuecomment-1735378954
Community forum discussion: https://discourse.gbif.org/t/gbif-api-supporting-ranges-in-occurrence-eventdate/3804
Event dates — upcoming API change
Early this year we announced a plan to change the way we handle
the "eventDate" Darwin Core term. Date ranges formatted using the
ISO 8601 standard, recommended by Darwin Core, will retain their
meaning, and the API will return values like "2000-05" or
"2007-11-13/2007-11-15", rather than the current behaviour of
changing these values to "2000-05-01" and "2007-11-13".
These changes are now visible on GBIF's test system,
GBIF-UAT.org. To allow time for you to test this change against
any existing software and scripts you have, we will not implement
these changes on GBIF.org before early November.
API users
Users of the occurrence API will need to decide how to handle an
eventDate like "1880/1889", "1910", "2000-05", "1999-11/2000-03",
"2007-11-13/2007-11-15" or
"2023-09-22T05:17:10/2023-09-22T12:17:10". Options include taking
the earliest, latest or middle value, randomizing within the
range, or excluding such records. To make parsing easier, ranges
will always be formatted using the full form and never
abbreviated: always "2007-11-13/2007-11-15" and never
"2007-11-13/15".
It may be easier to use the individual "year", "month" and "day"
fields, which will be present if the year/month/day is constant
for the whole range of the eventDate. For example,
eventDate=2010-11-25/2010-12-03 will have year=2010, month=NULL,
day=NULL as only the year is constant. (However, note that a date
like 2022-12-31/2023-01-01 covers just 2 days, but as it spans two
different years the "year" field will be blank.)
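As an illustration of the two approaches described above, here is a
minimal Python sketch. It is not GBIF code: the function names
(event_date_range, midpoint, constant_fields) are hypothetical, and
it assumes eventDate values use only the full forms listed above,
with timestamps that carry no time zone or all share the same one.

```python
import calendar
from datetime import datetime


def _bounds(part):
    """Earliest and latest instant covered by one date or timestamp."""
    if "T" in part:
        # A full timestamp is treated as a single instant.
        instant = datetime.fromisoformat(part)
        return instant, instant
    fields = [int(f) for f in part.split("-")]
    year = fields[0]
    first_month, last_month = (fields[1], fields[1]) if len(fields) > 1 else (1, 12)
    if len(fields) > 2:
        first_day = last_day = fields[2]
    else:
        first_day, last_day = 1, calendar.monthrange(year, last_month)[1]
    return (datetime(year, first_month, first_day),
            datetime(year, last_month, last_day, 23, 59, 59))


def event_date_range(event_date):
    """Return (earliest, latest) for a single value or a full A/B range."""
    start, _, end = event_date.partition("/")
    return _bounds(start)[0], _bounds(end or start)[1]


def midpoint(event_date):
    """Middle of the range, one of several possible ways to pick a value."""
    earliest, latest = event_date_range(event_date)
    return earliest + (latest - earliest) / 2


def constant_fields(event_date):
    """Year/month/day that stay the same across the whole range, else None."""
    start, _, end = event_date.partition("/")
    end = end or start

    def ymd(part):
        # Drop any time component, then pad missing month/day with None.
        fields = [int(f) for f in part.split("T")[0].split("-")]
        return fields + [None] * (3 - len(fields))

    result = {}
    still_constant = True
    for name, a, b in zip(("year", "month", "day"), ymd(start), ymd(end)):
        # A finer field only counts if every coarser field was constant too.
        still_constant = still_constant and a is not None and a == b
        result[name] = a if still_constant else None
    return result
```

For example, constant_fields("2010-11-25/2010-12-03") gives
{"year": 2010, "month": None, "day": None}, and
constant_fields("2022-12-31/2023-01-01") gives None for all three
fields, matching the cases described above.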
When searching using a range (e.g. eventDate=2005-01,2005-03) only
occurrences with eventDates *entirely within* the range will be
returned.
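A range search like the one above could be built as follows. This is
only a sketch: it assumes the test system exposes the occurrence
search at the same /v1/occurrence/search path as production, and the
helper name search_url is hypothetical. It only builds the URL and
makes no network request.

```python
from urllib.parse import urlencode

# Assumption: the UAT API mirrors the production occurrence search path.
UAT_SEARCH = "https://api.gbif-uat.org/v1/occurrence/search"


def search_url(start, end, limit=5):
    """Build a search URL for eventDates entirely within start..end."""
    params = {"eventDate": f"{start},{end}", "limit": limit}
    # Keep the comma unescaped so the range stays readable in the URL.
    return f"{UAT_SEARCH}?{urlencode(params, safe=',')}"


print(search_url("2005-01", "2005-03"))
```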
Download users
The "eventDate" column in CSV, Darwin Core and Parquet (cloud
snapshot) downloads will contain the same value as in the API, for
example "2023-09-22T12:17:10", "2023-09-22", "1880/1889", "1910",
"2000-05", "1999-11/2000-03", "2007-11-13/2007-11-15" or
"2023-09-22T05:17:10/2023-09-22T12:17:10".
As with the search API, when filtering using a range (e.g.
eventDate=2005-01,2005-03) only occurrences with eventDates
*entirely within* the range will be returned.
Data interpretation (for data publishers)
Eight Darwin Core terms record information on when an occurrence
was collected or observed:
- year
- month
- day
- eventDate
- eventTime
- startDayOfYear
- endDayOfYear
- verbatimEventDate
Some records will have conflicting information in these fields.
Detailed documentation on how we handle the various cases is being
prepared, but the general approach is to remove parts of the date
that conflict, adding a RECORDED_DATE_MISMATCH issue in this
case. For example, "eventDate=2005-06-01", "year=2005", "month=6"
and "day=NULL" would have eventDate changed to "2005-06" and the
issue added.
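The single example above can be reproduced with the sketch below.
It is one possible reading of that example only, not GBIF's
interpretation code (the detailed rules are not yet documented): it
handles a single-day eventDate, keeps the leading parts that agree
with the supplied year/month/day, and the function name reconcile is
hypothetical.

```python
def reconcile(event_date, year=None, month=None, day=None):
    """Trim a YYYY-MM-DD eventDate to the parts matching year/month/day."""
    if year is None and month is None and day is None:
        # Nothing to compare against: keep the eventDate, no issue.
        return event_date, []
    published = [int(f) for f in event_date.split("-")]
    kept = []
    for from_event_date, from_field in zip(published, (year, month, day)):
        if from_field != from_event_date:
            break  # first disagreement: drop this part and everything finer
        kept.append(from_event_date)
    issues = ["RECORDED_DATE_MISMATCH"] if len(kept) < len(published) else []
    # Re-assemble as zero-padded YYYY, YYYY-MM or YYYY-MM-DD.
    trimmed = "-".join(f"{v:0{w}d}" for v, w in zip(kept, (4, 2, 2)))
    return trimmed, issues


print(reconcile("2005-06-01", year=2005, month=6))
```

What should happen when even the year disagrees is not stated above;
this sketch simply returns an empty string in that case.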
Occurrences published with only one or some of these fields will
have the other fields filled in automatically, where possible. We
will not add an issue flag for this.
All existing datasets will be reprocessed with the new algorithms
as the change to the API is made for GBIF.org.
Example dataset
A dataset of test occurrences is here:
https://www.gbif-uat.org/occurrence/search?dataset_key=d6167827-973d-429a-a00c-8ea294d62d80
providing many examples of consistent and conflicting event date
fields. The scientificName is set to a summary of what the
eventDate is, and the eventRemarks field has more explanation.
Feedback
Feedback is welcome on the GitHub issue, here on the mailing list,
or on the Discourse forum.
Thanks,
Matt
Dear GBIF API users,
You might prefer to read this email on either GitHub or the community forum, as the formatting is probably better:
GitHub issue discussion: https://github.com/gbif/gbif-api/issues/4#issuecomment-1385497157
Community forum discussion: https://discourse.gbif.org/t/gbif-api-supporting-ranges-in-occurrence-eventdate/3804
A longstanding issue with the GBIF API is the interpretation and formatting of the Darwin Core term "eventDate".
Summary: instead of GBIF changing published eventDate values like 2009-03-18/2009-04-13 and 2010 to 2009-03-18 and 2010-01-01 respectively, we propose returning the values 2009-03-18/2009-04-13 and 2010 in the occurrence API and in downloads. Existing code/scripts that use the eventDate value may need to be updated.
The recommended best practice for the term is "use a date that conforms to ISO 8601-1:2019" (see https://dwc.tdwg.org/terms/#dwc:eventDate).
ISO 8601-1:2019 supports date ranges, and some publishers provide these. Examples are 2000-05 or 2007-11-13/2007-11-15. GBIF's current interpretation changes date ranges like this to the first possible day in the range (2000-05-01 and 2007-11-13).
At least 64 million occurrences are affected.
Change to date interpretation
We propose changing the eventDate field in the GBIF API to support ISO 8601-1 date ranges. A range will be returned where one was provided by the publisher, either directly as a range in the eventDate field, or through a combination of the year, month, day, startDayOfYear and endDayOfYear fields.
The data quality checks on dates will be improved to check for consistency between these fields: eventDate, year, month, day, startDayOfYear and endDayOfYear. These fields will only be populated if they are constant for the whole range of dates: a range spanning several days in January 2020 will have year=2020, month=January and day=(Blank).
startDayOfYear and endDayOfYear will also be present if the range is accurate to days.
Examples:
published event date   interpreted eventDate  int. year  int. month  int. day  int. sdoy  int. edoy
2023-01-13             2023-01-13             2023       1           13        13         13
2023-01                2023-01                2023       1
2023                   2023                   2023
2023-01-13/2023-01-14  2023-01-13/2023-01-14  2023       1                     13         14
2023-01-13/14          2023-01-13/14          2023       1                     13         14
2023-01/2023-02        2023-01/2023-02        2023
2023-01/02             2023-01/02             2023
2023/2024              2023/2024
2023-01-01/2023-12-31  2023-01-01/2023-12-31  2023                             1          365
Other cases where we can unambiguously determine a date or date range will also be handled, for example a record with a year and month but no eventDate, or non-ISO dates like January 2023.
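The startDayOfYear and endDayOfYear values for day-accurate dates can be reproduced with Python's standard library. This is a minimal sketch, not GBIF's implementation: the function name day_of_year_range is hypothetical, it expects full-form ranges only, and it does not address ranges that cross a year boundary, which the examples here do not cover.

```python
from datetime import date


def day_of_year_range(event_date):
    """Return (startDayOfYear, endDayOfYear), or None if not accurate to days."""
    start, _, end = event_date.partition("/")
    end = end or start
    days = []
    for part in (start, end):
        day_part = part.split("T")[0]  # ignore any time component
        if len(day_part) != 10:        # not in YYYY-MM-DD form
            return None
        days.append(date.fromisoformat(day_part).timetuple().tm_yday)
    return tuple(days)


print(day_of_year_range("2023-01-13/2023-01-14"))
```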
API example:
One record is published with eventDate=2009-03-18/2009-04-13, year=2009, month=3, day=18. We currently change the eventDate:
  "year": 2009,
  "month": 3,
  "day": 18,
  "eventDate": "2009-03-18T00:00:00",
With this proposal, we would preserve the eventDate but remove day, as the event crosses several days:
  "year": 2009,
  "month": 3,
  "eventDate": "2009-03-18/2009-04-13",
Another record is published with eventDate=2019-04-06T20:00:00/2019-04-10T05:00:00 and no separate day, month or year values. Currently, we process it to this:
  "year": 2019,
  "month": 4,
  "day": 6,
  "eventDate": "2019-04-06T20:00:00",
Instead, we propose returning this:
  "year": 2019,
  "month": 4,
  "eventDate": "2019-04-06T20:00:00/2019-04-10T05:00:00",
  "startDayOfYear": 96,
  "endDayOfYear": 100,
Searching
The search and download APIs will be affected by this change.
Occurrences will be returned if the occurrence date/date range is completely within the query date or date range.
Search: eventDate=2023-01-11
  Record: eventDate=2023-01-11 -- included
  Record: eventDate=2023-01 -- EXCLUDED
  Record: eventDate=2023-01-11/12 -- EXCLUDED
Search: eventDate=2023-01-11,2023-01-12
  Record: eventDate=2023-01-11 -- included
  Record: eventDate=2023-01 -- EXCLUDED
  Record: eventDate=2023-01-11/12 -- included
Search: eventDate=*,2023-01 (meaning "Before end of January 2023")
  Record: eventDate=2023-01-11 -- included
  Record: eventDate=2023-01 -- included
  Record: eventDate=2023-01-11/12 -- included
Search: eventDate=2023-01,2023-01 (meaning "After start of January 2023 AND before end of January 2023")
Search: eventDate=2023-01 (same meaning)
  Record: eventDate=2023-01-11 -- included
  Record: eventDate=2023-01 -- included
  Record: eventDate=2023-01-11/12 -- included
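The "completely within" rule in the cases above can be expressed as a containment check on day bounds. The following is a sketch for client-side reasoning only, not the search implementation: the names bounds and matches are hypothetical, it works at day granularity (no timestamps), and it expects record ranges in the full form, so the record 2023-01-11/12 is written as 2023-01-11/2023-01-12.

```python
import calendar
from datetime import date


def bounds(value):
    """First and last day covered by YYYY, YYYY-MM, YYYY-MM-DD or a full A/B range."""
    start, _, end = value.partition("/")
    end = end or start

    def expand(part, last):
        fields = [int(f) for f in part.split("-")]
        year = fields[0]
        month = fields[1] if len(fields) > 1 else (12 if last else 1)
        if len(fields) > 2:
            day = fields[2]
        else:
            day = calendar.monthrange(year, month)[1] if last else 1
        return date(year, month, day)

    return expand(start, last=False), expand(end, last=True)


def matches(query, record):
    """True if the record's eventDate lies entirely within the query."""
    lower, _, upper = query.partition(",")
    upper = upper or lower  # a single value is both lower and upper bound
    record_start, record_end = bounds(record)
    after_lower = lower == "*" or bounds(lower)[0] <= record_start
    before_upper = upper == "*" or record_end <= bounds(upper)[1]
    return after_lower and before_upper


for record in ("2023-01-11", "2023-01", "2023-01-11/2023-01-12"):
    print(record, matches("2023-01-11,2023-01-12", record))
```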
This implementation will avoid returning occurrences with eventDates like "2010/2021" in many queries. (There are millions of occurrences with large ranges like this.)
Density maps
There is a year filter for the density/pixel maps. An occurrence from 2023-01 will be included, but an occurrence with an eventDate spanning more than a single year (like 2022-12-31/2023-01-01) will no longer be included.
Quarterly analytics, global/regional trends
The quarterly analytics include calculations based on the individual dwc:year, dwc:month and dwc:day fields. The statistics will be affected where these values change or become blank.
rGBIF, PyGBIF
Both libraries will be updated as necessary to support eventDate values containing a date range.
Feedback
We have delayed addressing this issue for a long time, primarily due to concerns about changing the existing behaviour of the API. However, it's also one of the most frequently requested improvements to GBIF's interpretation.
If you are aware of software or systems which would have problems adapting to the proposed change, please let us know, either on this mailing list, the GitHub issue, the community forum or by email to me.
We will alert users in the same places when the change is ready to be tested on the test system at api.gbif-uat.org, and when the change is to be made live on api.gbif.org.
Thank you,
Matt
_______________________________________________ API-users mailing list API-users@lists.gbif.org https://lists.gbif.org/mailman/listinfo/api-users