Hi Burke,
Thank you for your help. It helped me narrow down the problem, and I think I am now close to publishing the resource.
For the record, I did a mysqldump of the table using --default-character-set=latin1 (dumping to two files, one for the structure and one for the data), converted the files with the forceUTF8 library, and replaced occurrences of "latin1" with "utf8".
The resulting data is displayed correctly with both IPT and TapirLink, whether the MySQL server character set is latin1 or utf8. That is on my own computer, though; testing on the server with MySQL/latin1 still gives me errors with TapirLink. Never mind, I'll create a temporary database for the duration of the migration.
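For anyone following along, the conversion step can be approximated in a few lines of Python. This is only a sketch of the idea, not what forceUTF8 does internally: it assumes the whole dump really is in MySQL's latin1 (which corresponds to Windows-1252), and the function name and file paths are hypothetical.

```python
def convert_dump(src_path, dst_path):
    """Re-encode a latin1 (cp1252) SQL dump as UTF-8 and rename its charset declarations."""
    # Assumption: the dump is entirely latin1. MySQL's "latin1" is effectively
    # Windows-1252, so decode with cp1252. Decoding is strict on purpose:
    # an undefined byte raises an error instead of silently corrupting data.
    with open(src_path, "r", encoding="cp1252", newline="") as f:
        text = f.read()

    # Plain textual replacement, as described above. Note that this also
    # rewrites collation names (latin1_swedish_ci -> utf8_swedish_ci) and any
    # row data that happens to contain the literal string "latin1".
    text = text.replace("latin1", "utf8")

    # newline="" keeps the original line endings untouched.
    with open(dst_path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

It would be called once per file, e.g. convert_dump("structure.sql", "structure_utf8.sql"), where both file names are placeholders.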
Again, thanks a lot.
Cheers,
Mickaël
________________________________________
From: Burke Chih-Jen Ko (GBIF) [bko@gbif.org]
Sent: Friday, August 26, 2011 11:06 AM
To: Mickael Graf
Cc: Johan Dunfalk; GBIF IPT mailing list; GBIF Helpdesk
Subject: Re: GBIF Case 1773: UTF8
Hi Mickaël,
Yes, this time I have correct data to test. The only changes I made to your script were to re-save the file in latin1 and to add the drop-table script from your previous dump. The file had been saved in UTF-8 even though the SQL settings in the script are all latin1. The refined file is attached.
Then I reproduced the environment with the following steps:
1) Adjust the MySQL server and client encoding settings to match yours. The server and client connection show:
   Server characterset: latin1
   Db characterset: latin1
   Client characterset: latin1
   Conn. characterset: latin1
2) Create a database from the attached script; the database encoding is latin1.
3) Follow the normal procedure to create a SQL source in IPT. See "settings" screen shot.
4) Since the JDBC driver detects the source encoding automatically, the encoding setting in the bottom-left doesn't matter for an SQL source. However, we're thinking about forcing the encoding as described on the MySQL JDBC connector page [1].
5) The preview result on my side is attached as the result.png image. Närke is rendered correctly, whether your browser encoding is latin1 or UTF-8.
Since we assume everything on your side is latin1, if it still doesn't work, you can change a line in the jdbc.properties file of a *deployed* IPT, to force jdbc encoding:
6) In [Tomcat root]/webapps/ipt/WEB-INF/classes you will find jdbc.properties. Line 7 reads:
mysql.url=jdbc:mysql://{host}/{database}
7) Add the encoding setting to the connection, so that it reads:
mysql.url=jdbc:mysql://{host}/{database}?characterEncoding=Cp1252
The encoding name used by the JDBC driver is slightly different from the one MySQL uses [1, again].
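If it helps, the edit in steps 6 and 7 can be sketched as a small Python helper. This is only an illustration: the function and dictionary names are made up, and the mapping holds just the two MySQL-to-Java name pairs relevant to this thread, as listed on the connector page [1].

```python
# MySQL charset name -> Java encoding name, per the Connector/J charsets page [1].
# Only the two pairs discussed in this thread are included (assumed sufficient here).
MYSQL_TO_JAVA = {"latin1": "Cp1252", "utf8": "UTF-8"}

def force_encoding(properties_text, mysql_charset):
    """Return properties_text with characterEncoding appended to the mysql.url line."""
    java_name = MYSQL_TO_JAVA[mysql_charset]
    out = []
    for line in properties_text.splitlines():
        # Only touch the mysql.url entry, and only if no encoding is set yet.
        if line.startswith("mysql.url=") and "characterEncoding=" not in line:
            # Use "&" if the URL already carries other connection parameters.
            sep = "&" if "?" in line else "?"
            line = f"{line}{sep}characterEncoding={java_name}"
        out.append(line)
    return "\n".join(out)

print(force_encoding("mysql.url=jdbc:mysql://{host}/{database}", "latin1"))
# prints: mysql.url=jdbc:mysql://{host}/{database}?characterEncoding=Cp1252
```

Editing the line by hand is just as good; the point is only that the JDBC URL takes the Java name (Cp1252), not the MySQL name (latin1).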
Let me know if you get a different result after this change. Otherwise, I suspect that UTF-8 encoding was involved at some step while you were setting up the database, so you might want to consider a clean start, using a small set of data or the script you gave me:
1. Export your whole database as an SQL script, making sure the client you use (phpMyAdmin?) also honours latin1 in every step.
2. Check that the file encoding is right and that the contents were exported correctly.
3. Import the SQL and try from IPT again.
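For step 2, one rough way to check the file encoding is to try a strict UTF-8 decode. latin1 text containing accented characters is almost never valid UTF-8, so a decode failure is a good hint that the file is in a single-byte encoding such as latin1. The sketch below is only a heuristic, and the function name is hypothetical.

```python
def guess_dump_encoding(path):
    """Return 'ascii', 'utf-8', or 'not utf-8' for the file at path."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        # Pure ASCII reads the same in latin1 and UTF-8, so it tells us nothing.
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Bytes above 0x7F that do not form valid UTF-8 sequences:
        # probably a single-byte encoding such as latin1.
        return "not utf-8"
```

A dump that is supposed to be latin1 but comes back as "utf-8" was most likely re-encoded somewhere along the way.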
Hope this helps. Do let us know if your problem is resolved.
Thanks,
Burke
[1] http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html