<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi Jorrit,<br>
</div>
<blockquote
cite="mid:2911AEC7-9919-4530-AB09-CD0B118F93D9@xs4all.nl"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<div class="">Ok. Sounds like we are on the same page. What do you
think would be the most effective way to document this content
issue?</div>
</blockquote>
Collecting a set of links to API responses that contain mangled
characters looks like a good option to me.<br>
<br>
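To help pick out which responses are worth collecting, a rough
heuristic could flag names that look like UTF-8 text decoded as
windows-1252. This is only a sketch: the function name and the sample
list are made up for illustration, and the check will miss some cases
(for instance when the mangled text contains bytes that windows-1252
leaves undefined).<br>
<pre>
```python
def looks_mangled(text):
    """Heuristic: True if text looks like UTF-8 that was decoded as cp1252."""
    try:
        # Undo the suspected mistake: back to bytes, then decode as UTF-8.
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False          # not a clean UTF-8-as-cp1252 round trip
    return repaired != text   # pure ASCII round-trips unchanged


# Illustrative sample: one mangled name, one correct name, one ASCII name.
names = ["Wintergr\u00c3\u00bcn", "Wintergr\u00fcn", "Wintergreen"]
flagged = [name for name in names if looks_mangled(name)]

print([ascii(name) for name in flagged])
```
</pre>
Correctly encoded names with umlauts are not flagged, because their
windows-1252 bytes are not valid UTF-8.<br>
<br>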
Also, you might want to follow the links to the datasets and their
providers, and all the way back to the dataset source pages (about
three links in all), to see whether the mangled characters also show
up on the pages of the original data providers.<br>
<br>
If they do, it is likely the providers' responsibility
to fix the data. If not, there might be an issue somewhere along the
transfer route between the original providers and GBIF.<br>
<br>
Just a thought,<br>
Guido<br>
<br class="">
<blockquote
cite="mid:2911AEC7-9919-4530-AB09-CD0B118F93D9@xs4all.nl"
type="cite">
<div>
<blockquote type="cite" class="">
<div class="">On Nov 23, 2015, at 3:35 PM, Guido Sautter &lt;<a
moz-do-not-send="true" href="mailto:sautter@ipd.uka.de"
class="">sautter@ipd.uka.de</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type" class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<div class="moz-cite-prefix">Hi Jorrit,<br class="">
</div>
<blockquote
cite="mid:A8D664E7-7024-492F-9020-4645484374A3@xs4all.nl"
type="cite" class="">Thanks for your reply.</blockquote>
welcome as can be.<br class="">
<br class="">
<blockquote
cite="mid:A8D664E7-7024-492F-9020-4645484374A3@xs4all.nl"
type="cite" class="">
<div class="">Thanks for confirming that there’s a
character conversion issue happening somewhere. </div>
<div class=""><br class="">
</div>
<div class="">Since the mangled characters appear in
both the HTML and the JSON provided by GBIF, I’d say it is
probably a GBIF issue.</div>
</blockquote>
Well, what we can say at this point is that GBIF _has_
mangled characters ... which doesn't mean the mangling
necessarily happened at their facilities.<br class="">
<br class="">
<blockquote
cite="mid:A8D664E7-7024-492F-9020-4645484374A3@xs4all.nl"
type="cite" class="">
<div class="">Is there a way to find out whether the
invalid character handling occurs in a data provider
or within GBIF itself?</div>
</blockquote>
Sorry to say, no. That's why I stated that characters got
mangled "at some point". All we can say is that it
happened upstream from GBIF's API.<br class="">
<br class="">
Best,<br class="">
Guido<br class="">
<br class="">
<blockquote
cite="mid:A8D664E7-7024-492F-9020-4645484374A3@xs4all.nl"
type="cite" class="">
<div class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Nov 23, 2015, at 3:14 PM, Guido
Sautter &lt;<a moz-do-not-send="true"
href="mailto:sautter@ipd.uka.de" class="">sautter@ipd.uka.de</a>&gt;
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type" class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<div class="moz-cite-prefix">That usually
happens when, at some point, UTF-8 encoded
text is read as ANSI (a single-byte code page such as
windows-1252). It only happens if the
text contains characters with code points above 127 (0x7F),
i.e. non-ASCII characters, however.<br class="">
<br class="">
Hope that helps,<br class="">
Guido<br class="">
<br class="">
</div>
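<div class="">A minimal sketch of that misreading in Python, assuming
"ANSI" means windows-1252 (cp1252 in Python); the variable names are
only for illustration:<br class="">
<pre class="" wrap="">
```python
# Sketch: reproduce the mangling by decoding UTF-8 bytes with the wrong codec.
# Using cp1252 as the "ANSI" code page is an assumption.
original = "Wintergr\u00fcn"            # u-umlaut is code point 252, above 127
utf8_bytes = original.encode("utf-8")   # u-umlaut becomes two bytes: C3 BC
mangled = utf8_bytes.decode("cp1252")   # each byte is read as one character

print(ascii(mangled))

# Pure ASCII text survives the same mistake unchanged.
ascii_text = "Wintergreen"
unchanged = ascii_text.encode("utf-8").decode("cp1252")
```
</pre>
The two bytes of the umlaut come out as two separate characters, which
is the pattern seen in the reported name.<br class="">
</div>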
<blockquote
cite="mid:62A4D2F9-1172-4E2A-A84D-BC3929211EA5@xs4all.nl"
type="cite" class="">
<meta http-equiv="Content-Type"
content="text/html; charset=windows-1252"
class="">
Hey y’all:
<div class=""><br class="">
</div>
<div class="">I am noticing some funny
characters (e.g. “WintergrÃ¼n”) for
species available here:</div>
<div class=""><br class="">
</div>
<div class=""><a moz-do-not-send="true"
href="http://www.gbif.org/species/2882753/vernaculars"
class="">http://www.gbif.org/species/2882753/vernaculars</a></div>
<div class=""><br class="">
</div>
<div class="">Same is observed using the
api:</div>
<div class=""><br class="">
</div>
<div class=""><a moz-do-not-send="true"
href="http://api.gbif.org/v1/species/2882753/vernacularNames"
class="">http://api.gbif.org/v1/species/2882753/vernacularNames</a></div>
<div class=""><br class="">
</div>
<div class="">I am assuming that the actual
common name should be something like
“Wintergrün”.</div>
<div class=""><br class="">
</div>
<div class="">While I was looking into this,
I also noticed that no character set is
specified in the HTTP response headers.</div>
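<div class=""><br class="">
</div>
<div class="">For illustration only, a small Python sketch of checking
a Content-Type value for a charset parameter. The helper name and the
header values are made-up examples, not actual GBIF responses:<br class="">
<pre class="" wrap="">
```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# Hypothetical header values, not taken from actual GBIF responses.
with_charset = charset_of("application/json; charset=UTF-8")
without_charset = charset_of("application/json")

print(with_charset, without_charset)
```
</pre>
When the parameter is absent, a client has to guess the encoding or
fall back to a default.</div>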
<div class=""><br class="">
</div>
<div class="">Please confirm that this is
expected behavior. </div>
<div class=""><br class="">
</div>
<div class="">thx,</div>
<div class="">-jorrit</div>
<div class=""><br class="">
</div>
<br class="">
<fieldset class="mimeAttachmentHeader"></fieldset>
<br class="">
<pre class="" wrap="">_______________________________________________
API-users mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:API-users@lists.gbif.org">API-users@lists.gbif.org</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://lists.gbif.org/mailman/listinfo/api-users">http://lists.gbif.org/mailman/listinfo/api-users</a>
</pre>
</blockquote>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</blockquote>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</blockquote>
<br>
</body>
</html>