Hi,
I am still stuck with this issue. I tested different encodings with the data, and I am pretty sure the issue lies in the IPT/Tomcat area (TapirLink/Apache is fine).
Burke, did you manage to get a correct result with the data I sent you? If yes, what did you do and what is your configuration?
Cheers,
Mickaël

________________________________________
From: Burke Chih-Jen Ko (GBIF) [bko@gbif.org]
Sent: Tuesday, August 16, 2011 10:07 AM
To: Mickael Graf
Cc: Johan Dunfalk; GBIF IPT mailing list; GBIF Helpdesk
Subject: Re: GBIF Case 1773: UTF8
Hi Mickaël,
I'd like to learn some details about encoding settings on your side.
1. Did you use mysqldump to create the script that you sent me earlier?
2. From the script, I can see that the database stores data in UTF-8. Is that correct?
3. Since the characters in the SQL dump are already broken, could you try temporarily changing the connection charset to UTF-8 and check whether the resulting dump contains the correct accented letters? Alternatively, please try --default-character-set=latin1 as one of your dump options.
4. Since all charset settings on your side appear to be latin1, do all databases hosted on the MySQL server use UTF-8 as their encoding, including the one that serves TapirLink?
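If the accented letters in a dump are broken because UTF-8 bytes were read through a latin1 connection at some point, the damage is often reversible, which also tells you whether the stored bytes are still intact UTF-8. Below is a minimal Java sketch of that check, assuming the common "UTF-8 read as latin1" mix-up; this is a general technique, not something confirmed for this database, and `Latin1Repair` / `repair` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class Latin1Repair {
    // Assumption: the text was produced by decoding UTF-8 bytes as ISO-8859-1 (latin1).
    // Turn each char back into its latin1 byte, then decode those bytes strictly as UTF-8.
    // Returns empty when the text does not look like misread UTF-8.
    static Optional<String> repair(String garbled) {
        // A char above 0xFF cannot come from a latin1 decode; getBytes would silently turn it into '?'.
        for (int i = 0; i < garbled.length(); i++) {
            if (garbled.charAt(i) > 0xFF) {
                return Optional.empty();
            }
        }
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        try {
            String decoded = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of inserting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
            return Optional.of(decoded);
        } catch (CharacterCodingException e) {
            // The bytes are not valid UTF-8, so the text was probably not misread UTF-8.
            return Optional.empty();
        }
    }
}
```

For example, `repair("N\u00C3\u00A4rke")` gives back "Närke", while the already-correct "Närke" is rejected because its latin1 bytes are not valid UTF-8.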
I am trying to reproduce your environment here.
Cheers,
Burke
On Aug 16, 2011, at 9:26 AM, Mickael Graf wrote:
Hi Burke,
I tried UTF-8, Latin1 and Windows-1252, but the result always looks the same. This setting seems to have no influence on the final result, at least here.
/Mickaël
From: Burke Chih-Jen Ko (GBIF) [bko@gbif.org]
Sent: Monday, August 15, 2011 4:24 PM
To: Mickael Graf
Cc: Johan Dunfalk; GBIF IPT mailing list
Subject: Re: GBIF Case 1773: UTF8
Hi Mickaël,
Have you tried using Latin 1 as the character encoding in the source data editing page of IPT?
Burke
On Aug 15, 2011, at 3:19 PM, Mickael Graf wrote:
Hi Burke,
Changing my.cnf breaks everything, so I reverted the change. I need to study how to correctly migrate my data to a fully UTF-8 system. MySQL still ships with latin1 as the default (I just checked on my Ubuntu 11.04!)
How can I test Tim's suggestion?
Cheers, Mickaël
From: Burke Chih-Jen Ko (GBIF) [bko@gbif.org]
Sent: Friday, August 12, 2011 4:30 PM
To: Mickael Graf
Cc: Johan Dunfalk; GBIF IPT mailing list
Subject: Re: GBIF Case 1773: UTF8
Hi Mickaël,
I can see from your script that the database was created using UTF-8, but it could be the connection character set that interprets the UTF-8 data as ISO-8859-1. Indeed, forcing a UTF-8 text file containing "Närke" to open with the latin1 charset renders the text as "NÃ¤rke".
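The effect described above is easy to reproduce. Here is a small Java sketch (the class and method names are made up for illustration) that encodes "Närke" as UTF-8 and then decodes those bytes as ISO-8859-1, which is roughly what happens when a latin1 connection reads UTF-8 data.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode the text as UTF-8, then (wrongly) decode the bytes as ISO-8859-1 (latin1).
    static String misreadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "N\u00E4rke";               // "Närke", written with an escape to avoid source-encoding issues
        String garbled = misreadAsLatin1(original);   // "N\u00C3\u00A4rke", i.e. "NÃ¤rke"
        System.out.println(garbled.length());         // 6: the single ä became two characters
    }
}
```

The ä is stored as two bytes in UTF-8 (0xC3 0xA4), and latin1 maps each byte to its own character (Ã and ¤). Windows-1252 maps these two bytes to the same characters, so it gives the same result for this word.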
In the [mysqld] section of /etc/my.cnf, you can instruct the server to start with the preferred character set and collation:
    character_set_server=utf8
    default-character-set=utf8
    character_set_client=utf8
    collation_server=utf8_general_ci
    skip-character-set-client-handshake
The last line forces the connection charset to be the one specified for the server.
So I suggest some steps:
- Add the lines above to your my.cnf.
- Restart MySQL.
- First, check whether things still look the same in TapirLink.
- Try exporting the same sample script you gave us earlier. If Närke shows up as it should, it should also be fine in IPT.
Let me know if this setting works.
But if this change breaks TapirLink and other applications, you'll need to decide whether to reconfigure all of them, or see whether Tim's suggestion works instead:
jdbc:mysql://localhost:3306/specimen_collections?autoReconnect=true&useUnicode=true&characterEncoding=UTF8&characterSetResults=UTF8
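To see what a connection opened with that URL actually negotiates, a Java sketch like the one below could help. It assumes MySQL Connector/J is on the classpath when connecting. The class name and the user/password arguments are placeholders, not part of the setup described in this thread. `SHOW VARIABLES LIKE 'character_set%'` lists the server and session charset variables.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Utf8Connection {
    // Host, port and database name are the ones quoted in the thread.
    static final String BASE = "jdbc:mysql://localhost:3306/specimen_collections";

    // Appends the connection parameters from Tim's suggestion.
    static String buildUrl() {
        return BASE + "?autoReconnect=true&useUnicode=true"
                + "&characterEncoding=UTF8&characterSetResults=UTF8";
    }

    // Connects (requires the Connector/J driver at runtime) and prints the charset
    // variables the server reports for this session.
    static void printCharsetVariables(String user, String password) throws SQLException {
        try (Connection conn = DriverManager.getConnection(buildUrl(), user, password);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'character_set%'")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(buildUrl());
        if (args.length == 2) {
            // Placeholder credentials passed on the command line: user, password.
            printCharsetVariables(args[0], args[1]);
        }
    }
}
```

Run without arguments, it only prints the URL. With a user and password, it connects and lists the variables. If the URL parameters take effect, you would expect the client, connection and results entries to read utf8.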
Cheers,
Burke