Populating portal for a country
Hi all,
I'm trying to set up an experimental instance of ALA for Belgium. The idea is to present all data published by our Belgian institutions, in order to give those institutions / collections / datasets more visibility.
What's the recommended way to load this data (everything at GBIF published by Belgian institutions)? I used the "Add all GBIF resource for a country" option from the Collectory admin interface, but:
1) I encountered some errors. All datasets were detected correctly, but only 10 were loaded; for the rest I get the message "Failed. Please check your authentication credentials are valid". The credentials are indeed correct, since I entered them only once (so the same ones were used for the datasets that worked). When looking at my downloads on the GBIF website, I can see no trace of those "failed" datasets. Can I get debugging information? Should I file a bug report? In the ala-collectory repository, or somewhere else?
2) Also, the /collectory/manage/gbifLoadCountry page states that "This is a simple method of bootstrapping an installation with data provided by GBIF web services. This is not intended for long-term production use.". So maybe I should use another method to load all the data for my country.
3) How is the (dataset) metadata populated when using this method? It seems the basic metadata is there, but not much more. Should we do it by hand? Should we write a tool to do it? How about updates?
Thanks a lot!
Thanks Nicolas.
1) My understanding is that the GBIF API has been bombarded recently and that GBIF has now limited the number of concurrent downloads to 3. The collectory currently attempts 10 downloads at a time, hence our problem. We’ll try to push out an update to fix this very soon.
2) ALA initially wrote this bootstrapping tool to demonstrate the system (the system isn’t much good without data :) ). I think it has now reached some level of maturity and we should remove that text.
3) The metadata that comes across is pretty limited, I agree, and we’d welcome contributions to improve the metadata that is retrieved and pulled into the collectory for display. The metadata (retrieved from the EML documents in the DwC-A files) is currently updated each time a dataset is reloaded from GBIF. We use the UUIDs that GBIF assigns to datasets to avoid creating duplicate resources, so you can reload datasets without issues.
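To illustrate the idea in point 3, here is a minimal sketch of a UUID-keyed "create or refresh" step. It is not the collectory's actual code: the `DataResource` and `ResourceRegistry` names and the in-memory storage are hypothetical, chosen only to show why reloading the same GBIF dataset does not produce a duplicate.

```python
from dataclasses import dataclass, field


@dataclass
class DataResource:
    """Hypothetical stand-in for a collectory data resource record."""
    gbif_uuid: str
    name: str
    metadata: dict = field(default_factory=dict)


class ResourceRegistry:
    """Illustrative in-memory registry keyed by GBIF dataset UUID."""

    def __init__(self):
        self._by_uuid = {}

    def upsert(self, gbif_uuid, name, metadata):
        """Create the resource on first load, refresh it on later reloads."""
        existing = self._by_uuid.get(gbif_uuid)
        if existing is None:
            self._by_uuid[gbif_uuid] = DataResource(gbif_uuid, name, dict(metadata))
            return "created"
        # Same UUID seen again: overwrite metadata instead of adding a record.
        existing.name = name
        existing.metadata = dict(metadata)
        return "updated"

    def get(self, gbif_uuid):
        """Return the stored resource for this UUID, or None."""
        return self._by_uuid.get(gbif_uuid)

    def __len__(self):
        return len(self._by_uuid)
```

Calling `upsert` twice with the same UUID leaves a single record holding the most recent metadata, which matches the reload behaviour described above.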
Hope this helps,
Dave Martin ALA
On 10 Jun 2015, at 1:59 pm, Nicolas Noé <n.noe@biodiversity.be> wrote:
[...]
_______________________________________________
Ala-portal mailing list
Ala-portal@lists.gbif.org
http://lists.gbif.org/mailman/listinfo/ala-portal
I’d like to explain why the limit on concurrent downloads was rushed out. It was a necessary step because GBIF.org was being exposed to (often inadvertent) DoS attacks. In addition, the GBIF R library now makes it trivial to download from GBIF, and users can easily write a loop that triggers a huge number of downloads, which would block the system for others.
We’ll continue to tune the values by monitoring operations, and will probably introduce higher per-user limits so that we can, for example, increase the concurrency for known ALA accounts.
Please contact me if this causes any issues, Nico.
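For readers scripting their own loads, a client can stay under such a limit by capping how many downloads run at once. The sketch below is only an illustration under assumptions: `run_downloads` and `ConcurrencyProbe` are hypothetical names, the default of 3 is the value mentioned in this thread and may change, and the real download call is replaced by a placeholder that just sleeps.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Limit mentioned in this thread; treat it as an assumption, it may be tuned.
MAX_CONCURRENT_DOWNLOADS = 3


def run_downloads(dataset_keys, download_one, max_concurrent=MAX_CONCURRENT_DOWNLOADS):
    """Call download_one(key) for every key, at most max_concurrent at a time.

    Returns a dict mapping each key to the value download_one returned.
    """
    keys = list(dataset_keys)
    # The pool size bounds how many calls can be in flight simultaneously.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(download_one, keys))
    return dict(zip(keys, results))


class ConcurrencyProbe:
    """Placeholder 'download' that records the peak number of parallel calls."""

    def __init__(self, delay=0.01):
        self._lock = threading.Lock()
        self._delay = delay
        self.active = 0
        self.peak = 0

    def __call__(self, key):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(self._delay)  # stands in for the real download work
        with self._lock:
            self.active -= 1
        return f"done:{key}"
```

Passing a `ConcurrencyProbe` instance as `download_one` is a quick way to check that the peak never exceeds the chosen limit before pointing the loop at a real service.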
Hi Nico,
For point 3, I wrote some scripts to add data using the GBIF API. They help me a lot, especially for contacts and institutions. I will clean them up and test them a little more; then I can put them on my GitHub if that helps.
Cheers from Madagascar ! Marie
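As a rough idea of what pulling contacts from the GBIF API could look like, here is a minimal sketch. It is not Marie's code. It assumes the public GBIF API base URL `https://api.gbif.org/v1`, and it assumes a dataset record carries a `contacts` list whose entries may have `firstName`, `lastName`, `type`, `organization` and `email` fields; check these names against a real response before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the public GBIF API.
GBIF_API = "https://api.gbif.org/v1"


def dataset_url(dataset_uuid):
    """Build the URL of a single dataset record."""
    return f"{GBIF_API}/dataset/{dataset_uuid}"


def fetch_dataset(dataset_uuid, timeout=30):
    """Fetch one dataset record as a dict (performs a network request)."""
    with urllib.request.urlopen(dataset_url(dataset_uuid), timeout=timeout) as resp:
        return json.load(resp)


def summarize_contacts(dataset):
    """Reduce a dataset record to a list of simple contact dicts.

    Field names are assumptions about the response shape; missing fields
    are tolerated and come back as None or empty values.
    """
    contacts = []
    for contact in dataset.get("contacts", []):
        name_parts = (contact.get("firstName"), contact.get("lastName"))
        emails = contact.get("email") or []
        if isinstance(emails, str):  # accept a single string as well as a list
            emails = [emails]
        contacts.append({
            "name": " ".join(part for part in name_parts if part),
            "role": contact.get("type"),
            "organization": contact.get("organization"),
            "emails": list(emails),
        })
    return contacts
```

Keeping the parsing in `summarize_contacts` separate from `fetch_dataset` means the mapping into collectory fields can be tested on saved JSON without calling GBIF at all.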
On Thu, Jun 11, 2015 at 10:33 AM, Tim Robertson trobertson@gbif.org wrote:
[...]
Thanks to the three of you for your complementary answers; they are really helpful.
Dave, I confirm that we are very interested in an improved version that triggers downloads more gently, to avoid setting off GBIF's defense mechanism. And Marie-Elise, we are also very interested in your scripts. Don't hesitate to ask if we can be of any help to either of you!
Have a great weekend,
Nico
On 11/06/15 09:42, Marie Elise Lecoq wrote:
[...]
participants (4)
- David.Martin@csiro.au
- Marie Elise Lecoq
- Nicolas Noé
- Tim Robertson