[COL-Users] Question about Catalogue of Life (COL) Data Package file format

Markus Döring mdoering at gbif.org
Wed Sep 29 19:49:39 UTC 2021


Dear Tiejun,

Thanks for the notification. I have now fixed the download links in the portal.

ColDP comes in two flavors. The simple version merges Taxon, Synonym and Name into a single NameUsage entity similar to how DwC does it: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#nameusage

The COL downloads are prepared in this simple, merged form.
The other flavor keeps Name, Taxon and Synonym as three distinct entities.
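
If it helps, here is a minimal Python sketch for checking which flavor a given archive uses. It only looks at the file names inside the ZIP. The helper name coldp_flavor and its return labels are made up for illustration and are not part of any COL tooling; the entity names are the ones mentioned in this thread.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Entity names as mentioned in this thread; the helper itself is hypothetical.
SPLIT_ENTITIES = {"name", "taxon", "synonym"}


def coldp_flavor(archive):
    """Return 'nameusage', 'split', or 'unknown' for a ColDP ZIP archive.

    `archive` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        # Compare lower-cased base names without extension,
        # e.g. "NameUsage.tsv" becomes "nameusage".
        stems = {PurePosixPath(n).stem.lower() for n in zf.namelist()}
    if "nameusage" in stems:
        return "nameusage"
    if SPLIT_ENTITIES <= stems:
        return "split"
    return "unknown"
```

Calling coldp_flavor("latest_coldp.zip") on a downloaded file should tell you whether to read NameUsage or the three separate files. The check is deliberately loose: it ignores the file extension, so it does not depend on the files being .tsv.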

Treatments are missing because we don't currently have them in COL. They will be added once we deal with Plazi articles over the next month.


Best,
Markus



> On 29. Sep 2021, at 17:29, Cheng, Tiejun (NIH/NLM/NCBI) [E] <chengt2 at ncbi.nlm.nih.gov> wrote:
> 
> Dear COL team,
>  
> I’m trying to download the ColDP Archive of THE COL CHECKLIST VERSION 2021-09-21 on this page: https://www.catalogueoflife.org/data/download. However, the actual download link https://download.catalogueoflife.org/col/monthly/2021-09-21_coldp.zip is not working and returns 404. I am able to find another data file here: https://download.catalogueoflife.org/col/. The most recent one I believe is https://download.catalogueoflife.org/col/latest_coldp.zip with a timestamp of 2021-08-27 11:00.
>  
> According to the description of the ColDP file format here: https://www.catalogueoflife.org/about/colusage#data-formats, the ZIP archive is supposed to bundle the following delimited text files:
> 	• Name
> 	• NameRelation
> 	• Taxon
> 	• Synonym
> 	• NameUsage
> 	• Reference
> 	• TypeMaterial
> 	• Distribution
> 	• VernacularName
> 	• Media
> 	• SpeciesInteraction
> 	• TaxonConceptRelation
> 	• SpeciesEstimate
> 	• Treatments
> 
> However, in the latest_coldp.zip I downloaded, I got something different.
> 
> $ zipinfo -1 latest_coldp.zip
> 	• NameUsage.tsv
> 	• NameRelation.tsv
> 	• TypeMaterial.tsv
> 	• VernacularName.tsv
> 	• Distribution.tsv
> 	• Media.tsv
> 	• SpeciesEstimate.tsv
> 	• SpeciesInteraction.tsv
> 	• TaxonConceptRelation.tsv
> 	• Reference.tsv
> 	• reference.json (<- this is new)
>  
> Some important files are missing.
> 	• Name
> 	• Taxon
> 	• Synonym
> 	• Treatments
>  
> Could you please help me understand this? Thanks.
>  
> Best regards,
>  
> Tiejun CHENG, Ph.D.
> ---------------------------
> National Center for Biotechnology Information (NCBI)
> National Library of Medicine (NLM)
> National Institutes of Health (NIH)
>  
> Bldg. 38A, Rm. 8S816A
> 8600 Rockville Pike
> Bethesda, MD 20894
>  
> Phone: 301-402-9527
> Email: chengt2 at ncbi.nlm.nih.gov
>  
> _______________________________________________
> COL-Users mailing list
> COL-Users at lists.gbif.org
> https://lists.gbif.org/mailman/listinfo/col-users


