IPT fields mapped to GBIF API download fields?
I'm new to the list, and though I've looked at the archives I find them hard to navigate (very nested; no way to just 'search on page').
Can anyone point me to a reference/lookup table that maps the IPT field names and content to the names and content of the fields that are currently within a GBIF download (.xlsx example is attached, with 241 columns)?
Thanks if you can, and may you be well,
Annie Simpson, biologist and information scientist
BioFoundational Data Team
Science Analytics & Synthesis Program
U.S. Geological Survey
12201 Sunrise Valley Dr. Mailstop 302, Reston VA 20192
asimpson@usgs.gov | +1 703-648-4281
https://orcid.org/0000-0001-8338-5134
Hi Annie,
The GBIF IPT and API mailing lists are also available through mail-archive.com (http://mail-archive.com), which offers a search across the entire archive:
https://www.mail-archive.com/ipt@lists.gbif.org/maillist.html
https://www.mail-archive.com/api-users@lists.gbif.org/maillist.html
As for a mapping, I am not aware of one. But the IPT maps to Darwin Core (DwC) terms, and the GBIF API mostly uses DwC terms too, with additions and slight modifications in some areas.
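One quick way to see the overlap is to list the field names the API returns for a single occurrence record. A minimal sketch (the search endpoint is the public one; which fields appear varies per record):

```python
# Print the field names GBIF's occurrence API returns for one record.
# Many are plain DwC terms (scientificName, decimalLatitude, ...), while
# others are GBIF additions (gbifID, datasetKey, taxonKey, issues, ...).
import requests

resp = requests.get("https://api.gbif.org/v1/occurrence/search", params={"limit": 1})
resp.raise_for_status()
record = resp.json()["results"][0]
for field in sorted(record):
    print(field)
```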
Best, Markus
Colleagues:
What is the easiest or most popular way to send large datasets to GBIF, ones that are too large for the IPT software (I think that is more than 100 MB zipped, or 10+ million records)? Does one modify one's IPT instance, and how? Or is there another process that is preferred?
We currently have IPT Version 2.3.6-r3985b6a installed and plan to upgrade to 2.4.0 soon.
A technical answer is what I seek (on behalf of our technical team).
Again my apologies if the answer to my question is easily found and I'm just not finding it.
Annie Simpson, BISON product owner (she/her/hers)
BioFoundational Data Team
Science Analytics & Synthesis Program
U.S. Geological Survey
12201 Sunrise Valley Dr. Mailstop 302, Reston VA 20192
asimpson@usgs.gov | +1 703-648-4281
https://orcid.org/0000-0001-8338-5134
Biodiversity Information Serving Our Nation (BISON): https://bison.usgs.gov/
Hi Annie,
If your data is in a database (MySQL, Oracle, etc.), you can make a database connection within the IPT instead of uploading a file: https://github.com/gbif/ipt/wiki/IPT2ManualManageResources.wiki#database-as-...
Or (and this is a hack) you can create a file with the same name but only a few records, upload it, and map it. Then take the identically named file with all the records and copy it over the uploaded file in the IPT data folder on the server. Go back to the IPT, edit the data source, click the Analyze button, Save, and then publish. Someone with access to the directories on the server would need to do this.
Best,
Laura
Laura Anne Russell
Programme Officer for Participation and Engagement
Global Biodiversity Information Facility (GBIF) Secretariat
larussell@gbif.org (email) | laura.anne.russell (Skype) | @pagodarose (Twitter)
#CiteTheDOI @GBIF
+45 35 33 35 51 (office, direct line)
GBIF, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
I could also mention that it is possible to script the creation of the Darwin Core Archives and then use the GBIF Registry API for the connections with GBIF. Symbiota, PlutoF and some others are successfully doing this. It does require some initial coordination with our Product Team on how to set up and coordinate the registration process and potentially with our Informatics Team.
Best,
Laura
Thank you, Laura, for your replies.
The datasets have been exported from databases and cleaned; they are generally UTF-8 tab-delimited files. So it seems that the GBIF Registry API would be the correct solution.
We currently have 8 of these large datasets, only 2 of which will not be updated in the future. Do you have names of GBIF Product Team members whom my technical team should contact to begin this process? Is there "how to" documentation you can point me to that they should read first?
Annie
Hi Annie,
With additional RAM allocated, the IPT can publish proportionally larger datasets. However, this can be inefficient (or expensive in terms of RAM), especially if the dataset has extensions.
To construct the DWCA outside an IPT you will need:
- Data files. It's a good idea to check them for common errors, such as an incorrect number of columns or duplicate occurrenceIDs (the IPT does several checks like this); see the first sketch after this list.
- A meta.xml data description file, linking columns to Darwin Core terms. This can be written by hand, generated programmatically (the second sketch below), or, often easiest if the process isn't to be repeated, produced by using an IPT to make a suitable mapping and extracting the resulting file.
- An eml.xml metadata file, describing the dataset. The same applies here: the IPT provides a useful UI for writing this metadata, especially if all 8 datasets are similar.
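For the first point, checks along these lines are worth running; the file and column names below are placeholders for your own export:

```python
# Pre-flight checks on a tab-delimited data file: consistent column counts
# and unique occurrenceIDs. "occurrence.txt" is a placeholder file name.
import csv

with open("occurrence.txt", encoding="utf-8", newline="") as f:
    # IPT-style tab files are unquoted, hence QUOTE_NONE.
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    id_col = header.index("occurrenceID")
    seen = set()
    for lineno, row in enumerate(reader, start=2):
        if len(row) != len(header):
            print(f"line {lineno}: expected {len(header)} columns, got {len(row)}")
        elif row[id_col] in seen:
            print(f"line {lineno}: duplicate occurrenceID {row[id_col]!r}")
        else:
            seen.add(row[id_col])
```

And for the meta.xml, a sketch that generates a minimal occurrence-core mapping; the column list is a placeholder, and the output is worth comparing against what an IPT produces:

```python
# Write a minimal meta.xml mapping the columns of occurrence.txt to DwC terms.
DWC = "http://rs.tdwg.org/dwc/terms/"
columns = ["occurrenceID", "scientificName", "eventDate",
           "decimalLatitude", "decimalLongitude"]  # placeholder column order

fields = "\n".join(
    f'    <field index="{i}" term="{DWC}{c}"/>' for i, c in enumerate(columns)
)
meta = f"""<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="{DWC}Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
{fields}
  </core>
</archive>
"""
with open("meta.xml", "w", encoding="utf-8") as f:
    f.write(meta)
```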
Once the DWCA exists, it should be copied to a webserver.
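Assembling the archive itself is then just a zip of the data file and the two XML files (assuming eml.xml has been written separately, e.g. extracted from an IPT):

```python
# Bundle the DWCA; the archive name is arbitrary.
import zipfile

with zipfile.ZipFile("my-dataset.zip", "w", zipfile.ZIP_DEFLATED) as z:
    for name in ("occurrence.txt", "meta.xml", "eml.xml"):
        z.write(name)
```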
Note that using the registry API is not strictly necessary, and a publisher with a small, unchanging number of datasets outside the IPT need not use it. They can simply give the helpdesk a URL for each dataset's DWCA file, and update the DWCA files at those URLs as necessary.
Using the API is useful for adding additional datasets, making changes (e.g. changing the URL) to the existing 8, or prompting GBIF to reprocess a dataset. To use the API, the technical team should create a suitable username (e.g. "usgs" or "bison") on both gbif.org and gbif-uat.org; the latter is our test system. They should then contact helpdesk@gbif.org to ask for permission for that account to make changes under the USGS publisher (https://www.gbif.org/publisher/c3ad790a-d426-4ac1-8e32-da61f81f0117), or whichever publishers are appropriate. This will only be on the test system at first.
It's then possible to register a new dataset under that publisher, following the example here: https://github.com/gbif/registry/tree/master/registry-examples/src/test/scri... and see the result.
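In outline, the registration looks something like this sketch against the test registry; the installation key, credentials and endpoint URL are placeholders, and the exact payload should be confirmed against the linked example scripts:

```python
# Register a dataset and point GBIF at its DWCA, on the test system.
import requests

API = "https://api.gbif-uat.org/v1"  # test registry; api.gbif.org in production
AUTH = ("your-gbif-username", "your-password")  # account authorised by helpdesk

dataset = {
    "title": "Example large occurrence dataset",
    "type": "OCCURRENCE",
    # USGS publisher key from above; the installation key is a placeholder.
    "publishingOrganizationKey": "c3ad790a-d426-4ac1-8e32-da61f81f0117",
    "installationKey": "00000000-0000-0000-0000-000000000000",
    "language": "eng",
}

# The registry responds with the new dataset's UUID.
resp = requests.post(f"{API}/dataset", json=dataset, auth=AUTH)
resp.raise_for_status()
dataset_key = resp.json()

# Add a DWC_ARCHIVE endpoint so GBIF can crawl the archive on your webserver.
endpoint = {"type": "DWC_ARCHIVE",
            "url": "https://example.org/dwca/my-dataset.zip"}
resp = requests.post(f"{API}/dataset/{dataset_key}/endpoint", json=endpoint, auth=AUTH)
resp.raise_for_status()
```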
For general questions on this, the GBIF API mailing list is probably most appropriate: https://lists.gbif.org/mailman/listinfo/api-users
If you have problems or errors with a specific dataset, helpdesk@gbif.org will be the best contact. (They also read both mailing lists.)
Cheers
Matt
participants (4)
- Laura Anne Russell
- Markus Döring
- Matthew Blissett
- Simpson, Annie