Question about Catalogue of Life (COL) Data Package file format
Dear COL team,
I'm trying to download the ColDP archive of the COL Checklist version 2021-09-21 from this page: https://www.catalogueoflife.org/data/download. However, the actual download link, https://download.catalogueoflife.org/col/monthly/2021-09-21_coldp.zip, is not working and returns a 404. I was able to find other data files here: https://download.catalogueoflife.org/col/. The most recent one, I believe, is https://download.catalogueoflife.org/col/latest_coldp.zip, with a timestamp of 2021-08-27 11:00.
According to the description of the ColDP file format at https://www.catalogueoflife.org/about/colusage#data-formats, the ZIP archive is supposed to bundle the following delimited text files:
* Name: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#name
* NameRelation: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#namerelation
* Taxon: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#taxon
* Synonym: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#synonym
* NameUsage: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#nameusage
* Reference: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#reference
* TypeMaterial: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#typematerial
* Distribution: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#distribution
* VernacularName: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#vernacularname
* Media: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#media
* SpeciesInteraction: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#speciesinteraction
* TaxonConceptRelation: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#taxonconceptrelation
* SpeciesEstimate: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#speciesestimate
* Treatments: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#treatment
However, the latest_coldp.zip I downloaded contains something different.
$ zipinfo -1 latest_coldp.zip
* NameUsage.tsv
* NameRelation.tsv
* TypeMaterial.tsv
* VernacularName.tsv
* Distribution.tsv
* Media.tsv
* SpeciesEstimate.tsv
* SpeciesInteraction.tsv
* TaxonConceptRelation.tsv
* Reference.tsv
* reference.json (<- this is new)
Some important files are missing:
* Name
* Taxon
* Synonym
* Treatments
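The comparison above can be reproduced programmatically. What follows is only a minimal Python sketch: the expected entity names are taken from the list quoted above (with "Treatment" in the singular, as in the linked README anchor), the helper names `missing_entities` and `missing_in_archive` are hypothetical, and matching files to entities by case-insensitive file stem is an assumption rather than anything the ColDP specification prescribes.

```python
import zipfile

# Entity names as listed in the ColDP description quoted above.
EXPECTED_ENTITIES = [
    "Name", "NameRelation", "Taxon", "Synonym", "NameUsage", "Reference",
    "TypeMaterial", "Distribution", "VernacularName", "Media",
    "SpeciesInteraction", "TaxonConceptRelation", "SpeciesEstimate",
    "Treatment",
]


def missing_entities(member_names, expected=EXPECTED_ENTITIES):
    """Return the expected entities that have no matching file.

    Assumption: a file matches an entity when its stem (file name without
    directory and extension) equals the entity name, ignoring case.
    """
    stems = set()
    for name in member_names:
        base = name.rsplit("/", 1)[-1]    # drop any directory prefix
        stem = base.split(".", 1)[0]      # drop the extension
        stems.add(stem.lower())
    return [entity for entity in expected if entity.lower() not in stems]


def missing_in_archive(path):
    """Open a ZIP archive and report which expected entities are absent."""
    with zipfile.ZipFile(path) as archive:
        return missing_entities(archive.namelist())


# Usage, assuming the archive has been downloaded to the working directory:
#     print(missing_in_archive("latest_coldp.zip"))
```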
Could you please help me understand this? Thanks.
Best regards,
Tiejun CHENG, Ph.D.
---------------------------
National Center for Biotechnology Information (NCBI)
National Library of Medicine (NLM)
National Institutes of Health (NIH)
Bldg. 38A, Rm. 8S816A
8600 Rockville Pike
Bethesda, MD 20894
Phone: 301-402-9527
Email: chengt2@ncbi.nlm.nih.gov
Dear Tiejun,
Thanks for the notification. I have now fixed the download links in the portal.
ColDP comes in two flavors. The simple version merges Taxon, Synonym and Name into a single NameUsage entity, similar to how DwC does it: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#nameusage
This is how we prepare the COL downloads. The other flavor splits the name usages into three distinct entities: Name, Taxon and Synonym.
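To illustrate the relationship between the two flavors, here is a minimal Python sketch that reads a merged NameUsage table and separates accepted taxa from synonyms. It rests on several assumptions that should be checked against the header of the actual file: that the status column is named `col:status`, that synonym rows carry one of the status values listed in `SYNONYM_STATUSES`, and that the file is plain tab-separated text without quoting. The function name `split_name_usages` and the sample rows are made up for illustration.

```python
import csv
import io

# Assumed status values that mark a row as a synonym rather than a taxon.
SYNONYM_STATUSES = {"synonym", "ambiguous synonym", "misapplied"}


def split_name_usages(lines, status_col="col:status"):
    """Split NameUsage rows into (taxa, synonyms) based on the status column.

    `lines` is any iterable of text lines, e.g. an open NameUsage.tsv file.
    Rows whose status is not a synonym status are treated as taxa.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    taxa, synonyms = [], []
    for row in reader:
        status = (row.get(status_col) or "").strip().lower()
        if status in SYNONYM_STATUSES:
            synonyms.append(row)
        else:
            taxa.append(row)
    return taxa, synonyms


# Illustrative sample only; the real file has many more columns.
sample = (
    "col:ID\tcol:parentID\tcol:status\tcol:scientificName\n"
    "1\t\taccepted\tPuma concolor\n"
    "2\t1\tsynonym\tFelis concolor\n"
)
taxa, synonyms = split_name_usages(io.StringIO(sample))
print(len(taxa), "taxa,", len(synonyms), "synonyms")
```

With a real download, the same function could be called on a file opened with `open("NameUsage.tsv", encoding="utf-8", newline="")`.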
Treatments are missing because we don't currently have them in COL. They will be added once we deal with Plazi articles over the next month.
Best, Markus
On 29. Sep 2021, at 17:29, Cheng, Tiejun (NIH/NLM/NCBI) [E] chengt2@ncbi.nlm.nih.gov wrote:
COL-Users mailing list COL-Users@lists.gbif.org https://lists.gbif.org/mailman/listinfo/col-users
Hi Markus,
Thanks again for the info. Sorry about asking the same question twice ;-)
Best, Tiejun
-----Original Message-----
From: Markus Döring mdoering@gbif.org
Sent: Wednesday, September 29, 2021 3:50 PM
To: Catalogue of Life user announcements and discussion col-users@lists.gbif.org
Subject: Re: [COL-Users] Question about Catalogue of Life (COL) Data Package file format