Unstable COL identifiers in latest October release
Dear friends of COL,
we recently released the monthly October version, which contains some bad identifiers: https://doi.org/10.48580/df7lv
Some taxa have temporary identifiers in the UUID format, for example Apis mellifera: https://www.catalogueoflife.org/data/taxon/bc15e6bb-fe11-4a8c-ac6f-a1f7d9197...
The old identifier FN46 has been deleted. Luckily we provide tombstone pages for these so you can still resolve them and find their latest use: https://www.catalogueoflife.org/data/taxon/FN46
We are working hard to find the problem and fix it in the next release, where these temporary identifiers will be replaced with their previous versions again. Please do not store the temporary UUIDs, as they will expire soon. Once this is resolved, I will inform you all on this list. Meanwhile, progress on this problem can be followed on GitHub: https://github.com/CatalogueOfLife/general/issues/100
We are very sorry for the inconvenience; we only realised this today.
Regards, Markus
-- Markus Döring GBIF & Catalogue of Life Technical lead ChecklistBank.org
m.doering@gbif.org http://www.checklistbank.org https://www.catalogueoflife.org
Hello,
I am extremely glad to inform you that we have just published the new November edition of the Catalogue of Life Checklist, which has reverted its identifiers to use the previous stable and short ones as we used to know them: https://preview.catalogueoflife.org/2023/11/24/release
Happy Thanksgiving!
Markus
On 27. Oct 2023, at 16:28, Markus Döring via COL-Users col-users@lists.gbif.org wrote:
[...]
On 11/24/23 at 16:28, Markus Döring via COL-Users wrote:
Hello,
I am extremely glad to inform you that we have just published the new November edition of the Catalogue of Life Checklist, which has reverted its identifiers to use the previous stable and short ones as we used to know them: https://preview.catalogueoflife.org/2023/11/24/release
Hi Markus,
The usual structure of files, with Taxon.tsv as the main one, is not in this download, and this new file looks quite a bit smaller, suspiciously so. Is this as expected?
-rw-r--r-- 1 xxx xxx 1.3G Aug 17 21:41 20230817/Taxon.tsv
-rw-r--r-- 1 xxx xxx 1.3G Sep 14 15:47 20230914/Taxon.tsv
-rw-r--r-- 1 xxx xxx 325M Nov 24 12:25 20231124/dataset-278910.tsv
(files as always from https://download.checklistbank.org/col/)
Thank you,
Erikjan Rijkers
I noticed this as well. If you ask for a DwCA (or at least this was true a few days ago) of Amphibia in COL19 you get a file without a meta.xml (which to me says not a DwCA), and with only 7 columns, far fewer than previously. -Jonathan
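A quick way to tell the reduced download from the complete one before loading it is to look at the archive's member list. The sketch below is only illustrative: the function name `describe_archive` is made up here, and it assumes the download is a zip file in which a full Darwin Core Archive carries a `meta.xml` descriptor and the main table is called `Taxon.tsv`, as in the messages above.

```python
import io
import zipfile


def describe_archive(source):
    """Report whether a zip archive has the files this thread expected.

    `source` is a path or a file-like object. The expected names
    (meta.xml, Taxon.tsv) are taken from the messages above.
    """
    with zipfile.ZipFile(source) as archive:
        # Compare on base names so files inside a sub-folder still match.
        names = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    return {
        "has_meta_xml": "meta.xml" in names,
        "has_taxon_tsv": "Taxon.tsv" in names,
    }


if __name__ == "__main__":
    # Build a tiny in-memory archive that mimics the reduced download.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("dataset-278910.tsv", "ID\tscientificName\n")
    buffer.seek(0)
    print(describe_archive(buffer))
```

If both flags come back false, the file is probably the reduced export rather than the complete archive.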
On 11/24/23 11:06, Erikjan Rijkers via COL-Users wrote:
[...]
Hi Erikjan,
Indeed, the linked files were wrongly using the “simple” ColDP and DwC-A formats, which contain only the core data. That is suitable for many use cases and has become the default format for downloads. You need to specify extended=true in the API to get the former complete archive with vernacular names and the rest. A button to select the extended format is still missing in the UI, so as of today it only works in the API, but we’ll provide a patch shortly.
I have relinked the COL ColDP & DwC-A downloads to the complete files, so please download them again if you need them.
Thanks, Markus
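For readers who script their downloads, a request carrying the extended flag might be prepared roughly as follows. This is a hedged sketch, not the documented API: only the `extended=true` setting comes from the message above. The base URL, the `/dataset/{key}/export` path, the `format` field, the helper name `build_export_request`, and the use of 278910 as the dataset key (taken from the file name earlier in the thread) are all assumptions, and a real request may also need authentication. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: base URL and path are illustrative, not taken from the thread.
API_BASE = "https://api.checklistbank.org"


def build_export_request(dataset_key, data_format, extended=True):
    """Prepare (but do not send) a request asking for an extended export."""
    # Only `extended` comes from the message above; `format` is an assumed field name.
    payload = {"format": data_format, "extended": extended}
    return urllib.request.Request(
        url=f"{API_BASE}/dataset/{dataset_key}/export",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_export_request(278910, "DWCA")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Keeping the request construction separate from sending makes it easy to inspect the payload before anything goes over the network.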
Thank you very much. There are still a few UUID-ish values, but so few that it doesn't bother me (0.003% of 5 million records, as opposed to 25% in the October release).
May I take the opportunity to say that I am very happy that these files are available. I query the data daily.
Thanks!
Erikjan Rijkers
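A count like the one above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `is_uuid_like` and `temporary_id_share` and the column name `taxonID` are hypothetical, the UUID in the sample is an invented placeholder rather than a real COL identifier, and "UUID-ish" is taken to mean the canonical hyphenated form.

```python
import csv
import io
import uuid


def is_uuid_like(identifier):
    """Return True if the identifier is a canonical hyphenated UUID."""
    try:
        return str(uuid.UUID(identifier)) == identifier.lower()
    except ValueError:
        # Short identifiers such as FN46 are not valid UUIDs.
        return False


def temporary_id_share(rows, id_column):
    """Count UUID-like identifiers in an iterable of dict rows."""
    total = temporary = 0
    for row in rows:
        total += 1
        if is_uuid_like(row[id_column]):
            temporary += 1
    share = temporary / total if total else 0.0
    return temporary, total, share


if __name__ == "__main__":
    # Two made-up rows: one short stable ID, one placeholder UUID.
    sample = (
        "taxonID\tscientificName\n"
        "FN46\tApis mellifera\n"
        "123e4567-e89b-12d3-a456-426614174000\tExample name\n"
    )
    rows = csv.DictReader(io.StringIO(sample), delimiter="\t")
    print(temporary_id_share(rows, "taxonID"))
```

For a real file, the same function can be fed a `csv.DictReader` opened on the TSV, with the column name adjusted to match the file's actual header.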
On 11/27/23 at 12:12, Markus Döring wrote:
[...]
Thanks, Erikjan. There are 91 names in there with temporary identifiers. These are all bad names that we don’t want to assign stable IDs to. I hope they will be cleared in future releases.
Markus
On 28. Nov 2023, at 06:43, Erikjan Rijkers er@xs4all.nl wrote:
[...]
participants (3)
- Erikjan Rijkers
- Jonathan A Rees
- Markus Döring