[IPT] IPT Digest, Vol 30, Issue 5

Stewart, Aimee Marian astewart at ku.edu
Wed Aug 17 21:08:09 CEST 2011

Sent from 785-331-8952.

"ipt-request at lists.gbif.org" <ipt-request at lists.gbif.org> wrote:

Today's Topics:

   1. Re: GBIF Case 1773: UTF8 (Mickael Graf)


Message: 1
Date: Tue, 16 Aug 2011 14:38:53 +0200
From: Mickael Graf <Mickael.Graf at nrm.se>
Subject: Re: [IPT] GBIF Case 1773: UTF8
To: "Burke Chih-Jen Ko (GBIF)" <bko at gbif.org>
Cc: GBIF IPT mailing list <ipt at lists.gbif.org>, Johan Dunfalk
        <Johan.Dunfalk at nrm.se>, GBIF Helpdesk <helpdesk at gbif.org>
Message-ID: <9F4FCAEFF4CBCB4CA1AF9E8BDABF4B8FA3629854B0 at saruman>
Content-Type: text/plain; charset="iso-8859-1"

> 1. Did you use mysqldump to create the script that you sent me earlier?

Yes, for the table definition and the data. The script for the view is hand written.

> 2. From the script, I can see the database stores data in UTF-8, is it correct?

Unfortunately no. The server's default settings are latin1, and so is the database. The data itself is (well, should be...) utf8, and the table definition has utf8 as its character set.

> 3. Since the characters in the SQL dump are already broken, could you check whether the same dump contains the correct accented letters if you temporarily change the connection charset to utf-8? Or please try --default-character-set=latin1 as one of your dump options.

I just did a dump with --default-character-set=latin1 and I can read the accented letters in less, emacs and firefox (where I need to specify that it's utf8).

> 4. Since all charset settings on your side appear to be latin1, are all databases hosted on the MySQL server using UTF8 as the encoding? Including the one that serves TapirLink?

Most of my databases use latin1, but two of them were created using utf8. For all of them, though, the accented letters are displayed correctly with TapirLink.
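
One way to check a dump file without relying on how less, emacs or a browser guesses the encoding is to test the raw bytes directly. The following is a minimal Python sketch, not something from the thread; the function name classify_dump_bytes and the file name in the usage note are hypothetical.

```python
def classify_dump_bytes(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8' or 'not-utf-8'.

    'ascii' means the sample contains no accented letters at all,
    so it says nothing about the encoding of the rest of the data.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # A lone latin1 byte such as 0xE4 (a-umlaut) is not valid UTF-8.
        return "not-utf-8"
    if all(byte < 0x80 for byte in data):
        return "ascii"
    return "utf-8"


# Small in-memory samples standing in for lines from a dump file.
utf8_sample = "Närke".encode("utf-8")
latin1_sample = "Närke".encode("latin-1")

print(classify_dump_bytes(utf8_sample))
print(classify_dump_bytes(latin1_sample))
```

For a real dump, the bytes would come from something like open("dump.sql", "rb").read(). Note that a file can be valid UTF-8 and still contain double-encoded text, so this only rules out plain latin1 content.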


I am trying to reproduce your environment here.



On Aug 16, 2011, at 9:26 AM, Mickael Graf wrote:

> Hi Burke,
> I tried UTF-8, Latin1 and Windows 1252, but the result always looks the same. It looks like this setting has no influence on the final result, at least here.
> /Mickaël
> ________________________________________
> From: Burke Chih-Jen Ko (GBIF) [bko at gbif.org]
> Sent: Monday, August 15, 2011 4:24 PM
> To: Mickael Graf
> Cc: Johan Dunfalk; GBIF IPT mailing list
> Subject: Re: GBIF Case 1773: UTF8
> Hi Mickaël,
> Have you tried using Latin 1 as the character encoding in the source data editing page of IPT?
> Burke
> On Aug 15, 2011, at 3:19 PM, Mickael Graf wrote:
>> Hi Burke,
>> Changing my.cnf breaks everything, so I reverted it. I need to study how to correctly migrate my data to a complete utf8 system.
>> MySQL still comes with latin1 as default (I just checked on my Ubuntu 11.04!)
>> How can I test Tim's suggestion?
>> Cheers,
>> Mickaël
>> ________________________________________
>> From: Burke Chih-Jen Ko (GBIF) [bko at gbif.org]
>> Sent: Friday, August 12, 2011 4:30 PM
>> To: Mickael Graf
>> Cc: Johan Dunfalk; GBIF IPT mailing list
>> Subject: Re: GBIF Case 1773: UTF8
>> Hi Mickaël,
>> I can see from your script that the database is created using UTF-8, but it could be the connection character set that interprets the UTF-8 information as iso-8859-1. Forcing a UTF-8 text file containing Närke to open with the latin1 charset does indeed render the text as NÃ¤rke.
>> In the [mysqld] section of /etc/my.cnf, you can instruct the server to start with preferred characterset and collation:
>> character_set_server=utf8
>> default-character-set=utf8
>> character_set_client=utf8
>> collation_server=utf8_general_ci
>> skip-character-set-client-handshake
>> The last line forces the connection charset to be the one specified for the server.
>> So I suggest some steps:
>> 1. Add the lines above to your my.cnf.
>> 2. Restart MySQL.
>> 3. First see if things still look the same on TapirLink.
>> 4. Try exporting the same sample script you gave us earlier; if Närke shows as it should, then it should be fine on IPT.
>> Let me know if this setting works.
>> But if this change breaks TapirLink and others, you'll need to decide whether to reconfigure all of them or see if Tim's suggestion works. (jdbc:mysql://localhost:3306/specimen_collections?autoReconnect=true&useUnicode=true&characterEncoding=UTF8&characterSetResults=UTF8)
>> Cheers,
>> Burke
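
The effect Burke describes, UTF-8 text being read with a latin1 charset, can be reproduced in a few lines of Python. This is a rough sketch only: it assumes the garbled word in the thread is the Swedish place name Närke, the function names are hypothetical, and Python's latin-1 codec only approximates MySQL's latin1 (which behaves like Windows-1252 for some bytes).

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were latin1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled: str) -> str:
    """Undo misread_as_latin1: recover the bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = misread_as_latin1("Närke")
print(garbled)                   # the a-umlaut shows up as two characters
print(repair_mojibake(garbled))  # back to the original spelling
```

The two-byte UTF-8 sequence for the accented letter becomes two separate latin1 characters, which is why one accented letter turns into two odd-looking ones. If a connection charset mismatch is the cause, the stored bytes may be fine and only the interpretation on the way out is wrong.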


End of IPT Digest, Vol 30, Issue 5
