[API-users] wrong faceted API results/counts (at least in our datasets) and lowercased facet values (everywhere)
sant.herbarium at gmail.com
Fri Feb 10 22:24:36 CET 2017
I am experimenting with faceted results from the occurrence API, but the
returned values look very odd to me.
Perhaps I am misunderstanding how faceting should work? Or there might be
some problem with the indexing of these particular datasets.
I am pretty lost. This is what I found:
*(1) RESULTS COUNT NOT MATCHING SUM OF ALL FACET COUNTS*
I put a simple example so everything is returned in one page.
The count value is 4, the number of results is 4.
But the number of facets is 1, and its count is 2.
The faceted term ScientificName is a mandatory field, so no null values
should occur. I would expect every occurrence to have a value for it.
And the number of distinct values is small, so everything is returned in
one request (no paging needed).
So, in such a case, shouldn't the sum of facet counts be equal to the number
of results?
Why is the count of the faceted name not 4?
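
To make the expectation in (1) concrete, here is a minimal sketch of the
check I have in mind. It works on an already decoded JSON response and makes
no network call. The payload below is hand-written to mirror the numbers
quoted above (count 4, one facet with count 2); the key names "count",
"facets", "field", "counts" and the field label "scientificName" are
assumptions about the response shape, and facet_count_gap is a hypothetical
helper, not part of any client library.

```python
def facet_count_gap(response, field):
    """Return the total result count minus the sum of the facet counts
    reported for `field`. Zero means the two figures agree."""
    total = response["count"]
    for facet in response.get("facets", []):
        # Compare field labels case-insensitively, since the exact casing
        # used by the API is an assumption here.
        if facet["field"].casefold() == field.casefold():
            return total - sum(entry["count"] for entry in facet["counts"])
    raise KeyError(f"no facet block for field {field!r}")


# Hand-written payload (assumed shape) with the numbers from this report.
sample = {
    "count": 4,
    "facets": [
        {
            "field": "scientificName",
            "counts": [
                {
                    "name": "generic_name specific_name (basionym_authors) name_authors",
                    "count": 2,
                }
            ],
        }
    ],
}

gap = facet_count_gap(sample, "scientificName")
print(gap)
```

A non-zero gap on a mandatory field with no paging involved is exactly the
situation described above.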
*(2) LOWERCASE FACETS (facets values not matching results values):*
Look at the same api request above (plant names)
Scientificname: "Generic_name specific_name (Basionym_Authors) Name_Authors"
name: "generic_name specific_name (basionym_authors) name_authors"
Why are the facet names always in lowercase?
I would say that is an error which shouldn't happen.
But I reported it some days ago and got no answer, so I wonder whether this
is the intended API behaviour.
Not only scientific names are lowercased. This also happens to
collectionCode in the next question.
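
As a client-side workaround for (2), the lowercased facet names can be
matched back to the original-case values found in the results. This is only
a sketch: restore_case is a hypothetical helper, and it assumes that the
facet name differs from the stored value by letter case alone.

```python
def restore_case(facet_names, result_values):
    """Map each lowercased facet name to the first original-case value
    from the results that equals it when case is ignored. Names with no
    match map to None."""
    by_folded = {}
    for value in result_values:
        # Keep the first spelling seen for each case-folded key.
        by_folded.setdefault(value.casefold(), value)
    return {name: by_folded.get(name.casefold()) for name in facet_names}


restored = restore_case(
    ["sant-lich", "generic_name specific_name (basionym_authors) name_authors"],
    ["SANT-Lich", "Generic_name specific_name (Basionym_Authors) Name_Authors"],
)
print(restored["sant-lich"])
```

This does not explain why the facets are lowercased; it only lets a client
display them with their original spelling.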
*(3) FACETING COLLECTIONCODE VALUES of a single institution fails depending
on filtering parameter used to match the institution (code or uuid):*
Our institution (uuid= def87a70-0837-11d9-acb2-b8a03c50a862 ,
institutionCode=SANT) serves datasets from 4 collections, which should sum
up more than 100000 records.
Why do I get only 2 of our 4 datasets faceted in the following request,
which uses our publishingOrg uuid? (uuid should be the preferred option to
do this, as code might not be unique for our institution)
Why do I get 4 of 4 if I filter the request using institutionCode instead?
(fortunately, nobody else uses the same institutionCode yet, so numbers are
And why do counts differ for the same facet value (sant-lich) in those two
requests? (9960 in the 1st request, 10007 in the 2nd one)
Why are facet values lowercase again? ("sant-lich" instead of "SANT-Lich")
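
The comparison in (3) can be automated once the facet counts of each request
have been collected into a plain name-to-count mapping. The sketch below
assumes that has already been done; diff_facets is a hypothetical helper,
and the sample only contains the one value whose counts are quoted above.

```python
def diff_facets(first, second):
    """Compare two {facet_name: count} mappings and return the names whose
    counts differ as {name: (count_in_first, count_in_second)}. A facet
    missing from one request is reported with None on that side."""
    differences = {}
    for name in sorted(set(first) | set(second)):
        a, b = first.get(name), second.get(name)
        if a != b:
            differences[name] = (a, b)
    return differences


# Counts quoted in this report: filtered by publishing organisation uuid
# (1st request) versus filtered by institutionCode (2nd request).
by_uuid = {"sant-lich": 9960}
by_code = {"sant-lich": 10007}

mismatches = diff_facets(by_uuid, by_code)
print(mismatches)
```

Collections that appear in only one of the two requests show up with None
on the other side, which covers the "2 of 4" versus "4 of 4" case as well.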
*(4) FACETING SCIENTIFICNAME FAILS FOR SOME DATASETS, but works as expected
for others: *
More than 1000 faceted scientificName values are returned for our SANT-Lich
and SANT-Algae collections. Both results look correct:
But no facets returned for SANT-Bryo (which contains several hundred
distinct scientificname values):
And only 7 facets for SANT scientificnames (should be over 10 thousand, as
this is by far our largest dataset):
Other than the lowercase facets issue (2), I couldn't reproduce issues
1, 3 and 4 in other institutions' datasets.
So I wonder if all this is somehow related to a wrong indexing of our IPT.
Has anyone else detected these problems?
Thanks a lot in advance for your help
David García San León
Universidade de Santiago de Compostela