<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">Hi Jorrit,<br>
    </div>
    <blockquote
      cite="mid:2911AEC7-9919-4530-AB09-CD0B118F93D9@xs4all.nl"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html;
        charset=windows-1252">
      <div class="">Ok. Sounds like we are on the same page. What do you
        think would be the most effective way to document this content
        issue?</div>
    </blockquote>
    Collecting a bunch of links to API responses that include mangled
    characters looks like a good option to me.<br>
    <br>
    Also, you might want to follow the links to the datasets and their
    providers, all the way back to the dataset source pages (about
    three links to follow), and see whether the mangled characters also
    show up on the pages of the original data providers.<br>
    <br>
    If they do, it's likely the providers' responsibility
    to fix the data. If not, there might be an issue along the transfer
    routes between the original providers and GBIF.<br>
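    <br>
    A rough sketch of how collected names could be screened for this kind
    of mangling, assuming Python 3 and assuming the damage is UTF-8 text
    that was read as Windows-1252. The helper name repair_if_mangled and
    the sample names are made up for illustration; this is a heuristic,
    not a definitive test, and it can miss strings containing bytes that
    Windows-1252 leaves undefined.<br>
<pre class="" wrap="">
```python
# Heuristic sketch: reverse the suspected mis-decoding and see whether
# the result is valid UTF-8 that differs from the input.

def repair_if_mangled(text):
    """Return the repaired text if it looks mangled, otherwise None."""
    try:
        # Undo "UTF-8 bytes decoded as Windows-1252" (assumed cause).
        candidate = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable that way, so probably not this kind of mangling.
        return None
    return candidate if candidate != text else None


# Hypothetical sample values; real input would come from the API responses.
names = ["Wintergr\u00c3\u00bcn", "Wintergr\u00fcn", "wintergreen"]

suspects = {}
for name in names:
    repaired = repair_if_mangled(name)
    if repaired is not None:
        suspects[name] = repaired

for mangled, repaired in suspects.items():
    print(repr(mangled), "looks mangled; possible original:", repr(repaired))
```
</pre>
    Running the same check against the provider's own pages would show
    whether the name is already mangled at the source.<br>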
    <br>
    Just a thought,<br>
    Guido<br>
    <br class="">
    <blockquote
      cite="mid:2911AEC7-9919-4530-AB09-CD0B118F93D9@xs4all.nl"
      type="cite">
      <div>
        <blockquote type="cite" class="">
          <div class="">On Nov 23, 2015, at 3:35 PM, Guido Sautter &lt;<a
              moz-do-not-send="true" href="mailto:sautter@ipd.uka.de"
              class="">sautter@ipd.uka.de</a>&gt; wrote:</div>
          <br class="Apple-interchange-newline">
          <div class="">
            <meta content="text/html; charset=windows-1252"
              http-equiv="Content-Type" class="">
            <div bgcolor="#FFFFFF" text="#000000" class="">
              <div class="moz-cite-prefix">Hi Jorrit,<br class="">
              </div>
              <blockquote
                cite="mid:A8D664E7-7024-492F-9020-4645484374A3@xs4all.nl"
                type="cite" class="">Thanks for your reply.</blockquote>
              welcome as can be.<br class="">
              <br class="">
              <blockquote
                cite="mid:A8D664E7-7024-492F-9020-4645484374A3@xs4all.nl"
                type="cite" class="">
                <div class="">Thanks for confirming that there’s a
                  character conversion issue happening somewhere. </div>
                <div class=""><br class="">
                </div>
                <div class="">Since the mangled characters appear in
                  both the HTML and the JSON provided by GBIF, I’d say it
                  is probably a GBIF issue.</div>
              </blockquote>
              Well, what we can say at this point is that GBIF _has_
              mangled characters ... which doesn't mean the mangling
              necessarily happened at their facilities.<br class="">
              <br class="">
              <blockquote
                cite="mid:A8D664E7-7024-492F-9020-4645484374A3@xs4all.nl"
                type="cite" class="">
                <div class="">Is there a way to find out whether the
                  invalid character handling occurs in a data provider
                  or within GBIF itself?</div>
              </blockquote>
              Sorry to say, no. That's why I stated that characters got
              mangled "at some point". All we can say is that it
              happened upstream from GBIF's API.<br class="">
              <br class="">
              Best,<br class="">
              Guido<br class="">
              <br class="">
              <blockquote
                cite="mid:A8D664E7-7024-492F-9020-4645484374A3@xs4all.nl"
                type="cite" class="">
                <div class="">
                  <div class="">
                    <blockquote type="cite" class="">
                      <div class="">On Nov 23, 2015, at 3:14 PM, Guido
                        Sautter &lt;<a moz-do-not-send="true"
                          href="mailto:sautter@ipd.uka.de" class="">sautter@ipd.uka.de</a>&gt;
                        wrote:</div>
                      <br class="Apple-interchange-newline">
                      <div class="">
                        <meta content="text/html; charset=windows-1252"
                          http-equiv="Content-Type" class="">
                        <div bgcolor="#FFFFFF" text="#000000" class="">
                          <div class="moz-cite-prefix">That usually
                            happens when, at some point, UTF-8 encoded
                            text is read as ANSI (a single-byte encoding
                            such as Windows-1252). It only happens if the
                            text contains characters above 127 (0x7F),
                            however.<br class="">
                            <br class="">
                            Hope that helps,<br class="">
                            Guido<br class="">
                            <br class="">
                          </div>
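                          <div class="">A minimal sketch of the effect
                            described above, assuming Python 3 and
                            assuming that "ANSI" means Windows-1252
                            (Python codec "cp1252"). The function and
                            variable names are illustrative only.</div>
<pre class="" wrap="">
```python
# Sketch: reproduce the mangling by decoding UTF-8 bytes with the wrong code page.
# Assumption: the reader used Windows-1252 instead of UTF-8.

def misread_utf8_as_cp1252(text):
    """Encode text as UTF-8, then decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


original = "Wintergr\u00fcn"                # u-umlaut is code point 252, above 127
mangled = misread_utf8_as_cp1252(original)  # its two UTF-8 bytes become two characters
ascii_only = misread_utf8_as_cp1252("wintergreen")  # code points up to 127 are unchanged

print(repr(original), len(original))
print(repr(mangled), len(mangled))
print(repr(ascii_only))
```
</pre>
                          <div class="">Characters up to 127 are encoded
                            as the same single byte in both encodings,
                            which is why plain ASCII names are not
                            affected.</div>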
                          <blockquote
                            cite="mid:62A4D2F9-1172-4E2A-A84D-BC3929211EA5@xs4all.nl"
                            type="cite" class="">
                            <meta http-equiv="Content-Type"
                              content="text/html; charset=windows-1252"
                              class="">
                            Hey y’all:
                            <div class=""><br class="">
                            </div>
                            <div class="">I am noticing some funny
                              characters (e.g. “WintergrÃ¼n”) for
                              species available here:</div>
                            <div class=""><br class="">
                            </div>
                            <div class=""><a moz-do-not-send="true"
                                href="http://www.gbif.org/species/2882753/vernaculars"
                                class="">http://www.gbif.org/species/2882753/vernaculars</a></div>
                            <div class=""><br class="">
                            </div>
                            <div class="">Same is observed using the
                              api:</div>
                            <div class=""><br class="">
                            </div>
                            <div class=""><a moz-do-not-send="true"
                                href="http://api.gbif.org/v1/species/2882753/vernacularNames"
                                class="">http://api.gbif.org/v1/species/2882753/vernacularNames</a></div>
                            <div class=""><br class="">
                            </div>
                            <div class="">I am assuming that the actual
                              common name should be something like
                              “Wintergrün”.</div>
                            <div class=""><br class="">
                            </div>
                            <div class="">While I was looking into this,
                              I also noticed that no character set is
                              specified in the HTTP response headers.</div>
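                            <div class=""><br class="">
                            </div>
                            <div class="">A small sketch of how the
                              missing character set could be checked,
                              assuming Python 3 and its standard
                              email.message header parsing. The helper
                              name charset_from_content_type is made up
                              for illustration, and the header values
                              are examples, not actual GBIF responses.</div>
<pre class="" wrap="">
```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# Example header values (assumed for illustration only).
print(charset_from_content_type("application/json"))
print(charset_from_content_type("application/json; charset=UTF-8"))

# Against a live response, the same lookup is available on the headers
# object returned by urllib (not executed here):
#   import urllib.request
#   with urllib.request.urlopen(url) as response:
#       print(response.headers.get_content_charset())
```
</pre>
                            <div class="">A result of None means the
                              client has to guess the encoding, which is
                              one place where a wrong decoding can creep
                              in.</div>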
                            <div class=""><br class="">
                            </div>
                            <div class="">Please confirm that this is
                              expected behavior. </div>
                            <div class=""><br class="">
                            </div>
                            <div class="">thx,</div>
                            <div class="">-jorrit</div>
                            <div class=""><br class="">
                            </div>
                            <pre class="" wrap="">_______________________________________________
API-users mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:API-users@lists.gbif.org">API-users@lists.gbif.org</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://lists.gbif.org/mailman/listinfo/api-users">http://lists.gbif.org/mailman/listinfo/api-users</a>
</pre>
                          </blockquote>
                          <br class="">
                        </div>
                      </div>
                    </blockquote>
                  </div>
                  <br class="">
                </div>
              </blockquote>
              <br class="">
            </div>
          </div>
        </blockquote>
      </div>
      <br class="">
    </blockquote>
    <br>
  </body>
</html>