wrong faceted API results/counts (at least in our datasets) and lowercased facet values (everywhere)
Hello.
I am trying out faceted results from the occurrence API, but the returned values look very odd, IMHO.
Perhaps I am misunderstanding how faceting should work, or there might be a problem with the indexing of these particular datasets. I am pretty lost. This is what I found:
*(1) RESULTS COUNT NOT MATCHING SUM OF ALL FACET COUNTS*
Here is a simple example in which everything is returned in one page.
http://api.gbif.org/v1/occurrence/search?facet=ScientificName&limit=10&a...
The count value is 4, the number of results is 4. But the number of facets is 1, and its count is 2.
The faceted term ScientificName is a mandatory field, so no null values should occur; I would expect every occurrence to have a value for it. The number of distinct values is small, so everything is returned in one request (no paging needed). In such a case, shouldn't the sum of the facet counts equal the number of results? Why is the count of the faceted name not 4?
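A minimal sketch of the check described above, comparing the total `count` with the sum of the facet counts. The response layout (`count`, `facets[].field`, `facets[].counts[].count`) is an assumption about the JSON returned by the search, and `sample` is a hand-written stand-in that only mirrors the numbers reported here (4 results, one facet with a count of 2); the facet name in it is a placeholder.

```python
def facet_count_gap(response, field):
    """Return the total result count minus the sum of facet counts for `field`.

    A gap of 0 means every result is accounted for by some facet value.
    Field names are compared ignoring case and underscores, since the
    response may spell them differently from the request (assumption).
    """
    wanted = field.replace("_", "").lower()
    for facet in response.get("facets", []):
        if facet["field"].replace("_", "").lower() == wanted:
            return response["count"] - sum(c["count"] for c in facet["counts"])
    raise KeyError(field)


# Hand-written sample mirroring the numbers in the message, not real API output.
sample = {
    "count": 4,
    "facets": [
        {
            "field": "SCIENTIFIC_NAME",
            "counts": [{"name": "generic_name specific_name", "count": 2}],
        }
    ],
}

gap = facet_count_gap(sample, "ScientificName")
print(gap)
```

A non-zero gap on a mandatory field with no paging involved is the inconsistency being reported.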
*(2) LOWERCASE FACETS (facet values not matching result values):*
Look at the same API request above (plant names):
results: Scientificname: "Generic_name specific_name (Basionym_Authors) Name_Authors"
facets: name: "generic_name specific_name (basionym_authors) name_authors"
Why are the facet names always in lowercase? I would say that is an error that shouldn't happen.
But I reported it some days ago and got no answer, so I wonder if this is the intended API behaviour.
http://dev.gbif.org/issues/browse/PF-2758
Not only scientific names are lowercased. This also happens to collectionCode in the next question.
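Until the facets preserve the original casing, one possible client-side workaround is to map each lowercased facet value back to a value seen in the results. This is only a sketch under that assumption; `restore_case` is a hypothetical helper, not part of any API.

```python
def restore_case(facet_names, original_values):
    """Map each lowercased facet name to an original-cased value, if one is known.

    When two originals differ only by case, the first one seen wins; that
    ambiguity cannot be undone once the values have been lowercased.
    """
    by_lower = {}
    for value in original_values:
        by_lower.setdefault(value.lower(), value)
    return {name: by_lower.get(name.lower()) for name in facet_names}


# Values taken from the collection codes mentioned in this message.
originals = ["SANT-Lich", "SANT-Algae"]
restored = restore_case(["sant-lich", "sant-bryo"], originals)
print(restored)
```

Facet values with no matching original (here `sant-bryo`) map to `None`, which makes the gaps easy to spot.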
*(3) FACETING COLLECTIONCODE VALUES of a single institution fails depending on filtering parameter used to match the institution (code or uuid):*
Our institution (uuid=def87a70-0837-11d9-acb2-b8a03c50a862, institutionCode=SANT) serves datasets from 4 collections, which should sum up to more than 100,000 records.
Why do I get only 2 of our 4 datasets faceted in the following request, which uses our publishingOrg uuid? (uuid should be the preferred option to do this, as code might not be unique for our institution)
http://api.gbif.org/v1/occurrence/search?facet=collectionCode&limit=0&am...
Why do I get 4 of 4 if I filter the request using institutionCode instead? (fortunately, nobody else uses the same institutionCode yet, so numbers are correct)
http://api.gbif.org/v1/occurrence/search?facet=collectionCode&limit=0&am...
And why do counts differ for the same facet value (sant-lich) in those two requests? (9960 in the 1st request, 10007 in the 2nd one)
Why are facet values lowercase again? ("sant-lich" instead of "SANT-Lich")
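The comparison between the two requests can be sketched as follows. The links above are truncated in the archive, so the URLs built here only approximate them from the parameters named in the text (`facet`, `limit`, `publishingOrg`, `institutionCode`); the helper names are hypothetical, and the only counts used are the two reported for sant-lich.

```python
from urllib.parse import urlencode

BASE = "http://api.gbif.org/v1/occurrence/search"


def search_url(**params):
    """Build an occurrence search URL from keyword parameters."""
    return BASE + "?" + urlencode(params)


def count_differences(first, second):
    """Return {facet value: (count in first, count in second)} where they differ.

    A value missing from one side is reported with None on that side.
    """
    diffs = {}
    for name in sorted(set(first) | set(second)):
        a, b = first.get(name), second.get(name)
        if a != b:
            diffs[name] = (a, b)
    return diffs


url_by_uuid = search_url(
    facet="collectionCode",
    limit=0,
    publishingOrg="def87a70-0837-11d9-acb2-b8a03c50a862",
)
url_by_code = search_url(facet="collectionCode", limit=0, institutionCode="SANT")

# Only the sant-lich counts are given in the message; other values are omitted.
diffs = count_differences({"sant-lich": 9960}, {"sant-lich": 10007})
print(url_by_uuid)
print(url_by_code)
print(diffs)
```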
*(4) FACETING SCIENTIFICNAME FAILS FOR SOME DATASETS, but works as expected for others:*
More than 1000 faceted scientific names are returned for our SANT-Lich and SANT-Algae collections. Both results look correct:
http://api.gbif.org/v1/occurrence/search?facet=ScientificName&limit=0&am...
http://api.gbif.org/v1/occurrence/search?facet=ScientificName&limit=0&am...
But no facets are returned for SANT-Bryo (which contains several hundred distinct scientificName values):
http://api.gbif.org/v1/occurrence/search?facet=ScientificName&limit=0&am...
And only 7 facets are returned for SANT scientific names (there should be over 10 thousand, as this is by far our largest dataset):
http://api.gbif.org/v1/occurrence/search?facet=ScientificName&limit=0&am...
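The full links quoted later in this thread use `ScientificName.facetLimit` and `ScientificName.facetOffset` with a single large limit. A hedged alternative is to page through the facet values in smaller steps, sketched below. `fetch` is a hypothetical callable that sends the parameters to the API and returns the list of facet count entries; it is replaced by a fake here so the example runs without network access.

```python
def iter_facet_pages(fetch, field, page_size):
    """Yield facet count entries page by page until a short page is returned."""
    offset = 0
    while True:
        params = {
            "facet": field,
            "limit": 0,  # no occurrence records, facets only
            f"{field}.facetLimit": page_size,
            f"{field}.facetOffset": offset,
        }
        counts = fetch(params)
        yield from counts
        if len(counts) < page_size:
            break
        offset += page_size


# Fake data and fetcher standing in for real API calls (placeholders only).
names = [{"name": f"name {i}", "count": 1} for i in range(5)]


def fake_fetch(params):
    start = params["ScientificName.facetOffset"]
    return names[start:start + params["ScientificName.facetLimit"]]


collected = list(iter_facet_pages(fake_fetch, "ScientificName", 2))
print(len(collected))
```

Counting the collected entries per collection gives the number of distinct facet values, which is the figure that looks wrong for SANT-Bryo and SANT.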
Other than the lowercase facets issue (2), I couldn't reproduce issues 1, 3 and 4 in other institutions' datasets. So I wonder if all this is somehow related to a wrong indexing of our IPT.
Has anyone else detected these problems?
Thanks a lot in advance for your help
David
Hello David,
thanks for pointing out these faceting issues very clearly. At first glance I agree with all your points and the behaviour is awkward or simply buggy in some cases. We will need some time to investigate further.
Many thanks, Markus
David,
(1) RESULTS COUNT NOT MATCHING SUM OF ALL FACET COUNTS
It seems that something went wrong when indexing some SANT datasets. Next week we’ll rebuild our search index and hopefully some of those issues will disappear. For the moment I’d suggest faceting on species_key, which seems to give the correct number of records: http://api.gbif.org/v1/occurrence/search?facet=species_key&limit=10&...
(2) LOWERCASE FACETS (facets values not matching results values):
This was an implementation decision to deal with cases where values were reported using different casing standards, which mostly applies to fields like locality and authors but not really to scientific name, collection code, record number, etc. We’ll test a new version of the search index that preserves the original values and we’ll report the results back to you.
(3) FACETING COLLECTIONCODE VALUES of a single institution fails depending on filtering parameter used to match the institution (code or uuid):
It looks to me that, just like in (1), something went wrong when indexing some datasets. As you can see in the following queries, you get a different number of datasets published by ’SANT’ vs. its corresponding organisation UUID:
http://api.gbif.org/v1/occurrence/search?facet=collectionCode&limit=0&am... -> 2
http://api.gbif.org/v1/occurrence/search?facet=collectionCode&limit=0&am... -> 4
(4) FACETING SCIENTIFICNAME FAILS FOR SOME DATASETS, but works as expected for others:
I’d recommend that, for now, you facet on species_key. We’ll test all these inconsistencies later next week. We are in the process of building a new GBIF Backbone (http://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c), and that will require reprocessing all the occurrence records; while doing that we will try to fix some of the issues reported here.
Thanks, Federico Mendez.
Thanks a lot Federico & Markus.
I am just looking for a quick and easy summary of the names of specimens stored in our collections.
Faceting on their keys might be easy, but does not help in terms of speed. We would still need to launch thousands of api requests (one per key) to achieve what I was trying to find out in just one http request.
So I prefer to wait until faceting ScientificNames (and collection/institution codes/uuids) works as expected.
Is there any risk if we publish any updates on our IPT while you are fixing the wrong indexing of our datasets?
On the other hand, as you mentioned keys, I wonder if faceting is always limited to facet-count pairs. Perhaps there might be triads in some particular cases?
I would find it helpful if faceting ScientificNames could return the faceted names, their counts, PLUS their keys (instead of just names & counts). That would facilitate further connections to other api requests which are based on keys.
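To illustrate the kind of triad meant here, this sketch builds (name, count, key) tuples on the client side from a key-based facet. It assumes that the facet `name` holds the key as a string and that `resolve_name` is a hypothetical lookup (for example one extra request per key, which is exactly the cost mentioned above). The keys and names below are placeholders, not real taxa.

```python
def build_triads(key_counts, resolve_name):
    """Turn key-based facet counts into (name, count, key) triads.

    `key_counts` is a list of {"name": "<key>", "count": <int>} entries;
    `resolve_name` maps an integer key to a display name.
    """
    triads = []
    for entry in key_counts:
        key = int(entry["name"])
        triads.append((resolve_name(key), entry["count"], key))
    return triads


# Placeholder lookup table standing in for per-key API requests.
lookup = {1: "Placeholder name A", 2: "Placeholder name B"}
facet_entries = [{"name": "1", "count": 3}, {"name": "2", "count": 1}]

triads = build_triads(facet_entries, lookup.__getitem__)
print(triads)
```

Having the API return such triads directly would remove the need for the per-key lookups.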
Thanks a lot for your help
David
On 14 February 2017 at 11:23, Federico Mendez fmendez@gbif.org wrote:
David,
*(1) RESULTS COUNT NOT MATCHING SUM OF ALL FACET COUNTS*
It seems that something went wrong indexing some SANT datasets, next week we’ll rebuild our search index and hopefully some of those issues will disappear. By the moment I’d suggest to facet on species_key which seems to give the correct number of records: http://api.gbif.org/v1/occurrence/search?facet=species_key&limit=10& collectionCode=SANT-Lich&genusKey=2581943
*(2) LOWERCASE FACETS (facets values not matching results values):*
This was an implementation decision to deal with cases which values were reported using different casing standards, which mostly applies for fields like locality and authors but not really for scientific name, collection code, record number, etc. We’ll test a new version of the search index that preserve the original values and we’ll report the results back to you.
*(3) FACETING COLLECTIONCODE VALUES of a single institution fails depending on filtering parameter used to match the institution (code or uuid):*
It looks to me that just like in (1) something went wrong indexing some datasets, as you can see in the following queries you get a difference in amount of datasets published by ’SANT’ vs its correspondent organisation UUID: http://api.gbif.org/v1/occurrence/search?facet=collectionCode&limit=0&am... publishingOrg=def87a70-0837-11d9-acb2-b8a03c50a862&facet=dataset_key —> 2 http://api.gbif.org/v1/occurrence/search?facet=collectionCode&limit=0&am... institutionCode=SANT&facet=dataset_key —> 4
*(4) FACETING SCIENTIFICNAME FAILS FOR SOME DATASETS, but works as expected for others: *
I’d recommend that, by now, you should try to facet on species_key, we’ll test all these inconsistencies later next week, we are in the processing of building a new GBIF Backbone (http://www.gbif.org/dataset/ d7dddbf4-2cf0-4f39-9b2a-bb099caae36c) and that will require to reprocess all the occurrence records, while doing that we try to fix some the issues reported here.
Thanks, Federico Mendez.
From: API-users api-users-bounces@lists.gbif.org on behalf of Herbario SANT sant.herbarium@gmail.com Date: Friday 10 February 2017 at 22:24 To: "api-users@lists.gbif.org" api-users@lists.gbif.org Cc: helpdesk helpdesk@gbif.org Subject: [API-users] wrong faceted API results/counts (at least in our datasets) and lowercased facet values (everywhere)
Hello.
I am trying to play with faceted results from the occurrence api, but returned values are very odd IMHO.
Perhaps I am misunderstanding how faceting should work? Or there might be some problem with the indexing of these particular datasets. I am pretty lost. This is what I found:
*(1) RESULTS COUNT NOT MATCHING SUM OF ALL FACET COUNTS*
I put a simple example so everything is returned in one page.
http://api.gbif.org/v1/occurrence/search?facet=ScientificName&limit=10&a... collectionCode=SANT-Lich&genusKey=2581943
The count value is 4, the number of results is 4. But the number of facets is 1, and its count is 2.
The faceted term ScientificName is a mandatory field, so no null values should happen. I would expect every occurrence having a value for it. And the number of values is short, so everything is returned in one request (no paging needed). So, in such a case shouldn't the sum of facet counts be equal to the number of results? Why the count of the faceted name is not 4?
*(2) LOWERCASE FACETS (facets values not matching results values): * Look at the same api request above (plant names)
results: Scientificname: "Generic_name specific_name (Basionym_Authors) Name_Authors"
facets: name: "generic_name specific_name (basionym_authors) name_authors"
Why are the facets names always in lowercase? I would say that is an error which shouldn't happen.
But I reported it some days ago and got no answer, so I wonder if this is the intended api behaviour.
http://dev.gbif.org/issues/browse/PF-2758
Not only scientific names are lowercased. This also happens to collectionCode in the next question.
*(3) FACETING COLLECTIONCODE VALUES of a single institution fails depending on filtering parameter used to match the institution (code or uuid):*
Our institution (uuid= def87a70-0837-11d9-acb2-b8a03c50a862 , institutionCode=SANT) serves datasets from 4 collections, which should sum up more than 100000 records.
Why do I get only 2 of our 4 datasets faceted in the following request, which uses our publishingOrg uuid? (uuid should be the preferred option to do this, as code might not be unique for our institution)
http://api.gbif.org/v1/occurrence/search?facet=collectionCode&limit=0&am... publishingOrg=def87a70-0837-11d9-acb2-b8a03c50a862
Why do I got 4 of 4 if I filter the request using institutionCode instead? (fortunately, nobody else uses the same institutionCode yet, so numbers are correct)
http://api.gbif.org/v1/occurrence/search?facet=collectionCode&limit=0&am... institutionCode=SANT
And why do counts differ for the same facet value (sant-lich) in those two requests? (9960 in the 1st request, 10007 in the 2nd one)
Why are facet values lowercase again? ("sant-lich" instead of "SANT-Lich")
*(4) FACETING SCIENTIFICNAME FAILS FOR SOME DATASETS, but works as expected for others: *
More than 1000 faceted Scientificnames returned for our SANT-Lich and SANT-Algae collections. Both of them look correct results:
http://api.gbif.org/v1/occurrence/search?facet=ScientificName&limit=0&am... collectionCode=SANT-Lich&ScientificName.facetLimit=50000&ScientificName. facetOffset=0
http://api.gbif.org/v1/occurrence/search?facet=ScientificName&limit=0&am... collectionCode=SANT-Algae&ScientificName.facetLimit=50000&ScientificName. facetOffset=0
But no facets returned for SANT-Bryo (which contains several hundred distinct scientificname values):
http://api.gbif.org/v1/occurrence/search?facet=ScientificName&limit=0&am... collectionCode=SANT-Bryo&ScientificName.facetLimit=50000&ScientificName. facetOffset=0
And only 7 facets for SANT scientificnames (should be over 10 thousand, as this is by far our largest dataset):
http://api.gbif.org/v1/occurrence/search?facet=ScientificName&limit=0&am... collectionCode=SANT&ScientificName.facetLimit=50000&ScientificName. facetOffset=0
Other than the lowercase facets issue (2), I couldn't reproduce issues 1,3,4 in other institutions datasets. So I wonder if all this is somehow related to a wrong indexing of our IPT.
Has anyone else detected these problems?
Thanks a lot in advance for your help
David
-- David García San León Herbario SANT Universidade de Santiago de Compostela
David, it’s OK to publish through the IPT; just be aware that your changes will not be available in the portal until next week.
Federico Mendez.
From: Herbario SANT <sant.herbarium@gmail.com> Date: Tuesday 14 February 2017 at 12:15 To: Federico Mendez <fmendez@gbif.org> Cc: "api-users@lists.gbif.org" <api-users@lists.gbif.org>, helpdesk <helpdesk@gbif.org> Subject: Re: [API-users] wrong faceted API results/counts (at least in our datasets) and lowercased facet values (everywhere)
Thanks a lot Federico & Markus.
I am just looking for a quick and easy summary of the names of specimens stored in our collections.
Faceting on their keys might be easy, but does not help in terms of speed. We would still need to launch thousands of api requests (one per key) to achieve what I was trying to find out in just one http request.
So I prefer to wait until faceting ScientificNames (and collection/instution codes/uuids) works as expected.
Is there any risk if we publish any updates on our IPT while you are fixing wrong indexing of our datasets?
On the other hand, as you mentioned keys, I wonder if faceting is always limited to facet-count pairs. Perhaps there might be triads in some particular cases?
I would find it helpful if faceting ScientificNames could return the faceted names, their counts, PLUS their keys (instead of just names & counts). That would facilitate further connections to other api requests which are based on keys.
Thanks a lot for your help
David
participants (3)
- Federico Mendez
- Herbario SANT
- Markus Döring