[IPT] GBIF Case 1773: UTF8

Mickael Graf Mickael.Graf at nrm.se
Fri Sep 2 15:30:45 CEST 2011

Hi Burke,

Thank you for your help. It helped me narrow down the problem, and I think I am now close to publishing the resource.

For the record, I did a mysqldump of the table using --default-character-set=latin1 (dumping into two files, one for the structure, the other for the data), converted the files with the forceUTF8 library, and replaced occurrences of "latin1" with "utf8".
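For anyone following the same route, the conversion step can be sketched in Python. This is only an illustration and not the forceUTF8 library: it assumes the dump is consistently encoded in a single charset, and the function name `convert_dump` and its `source_encoding` parameter are made up for the example.

```python
from pathlib import Path


def convert_dump(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-encode a latin1 SQL dump as UTF-8 and update its charset declarations."""
    # MySQL's "latin1" is based on cp1252, hence the default. Decoding raises
    # UnicodeDecodeError on the few bytes cp1252 leaves undefined; pass
    # "latin-1" (ISO-8859-1) instead if that happens.
    text = Path(src).read_bytes().decode(source_encoding)
    # Plain substring replacement, as described in the message. It rewrites
    # declarations such as "DEFAULT CHARSET=latin1", but it would also touch
    # any data row that happens to contain the word "latin1".
    text = text.replace("latin1", "utf8")
    Path(dst).write_bytes(text.encode("utf-8"))
```

A call such as `convert_dump("data_latin1.sql", "data_utf8.sql")` (hypothetical file names) would be run once for the structure file and once for the data file.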

The resulting data is displayed correctly with both IPT and TapirLink, whether the MySQL server character set is latin1 or utf8. That is on my own computer, though; testing on the server with MySQL/latin1 gives me errors with TapirLink. Never mind, I'll create a temporary database for the duration of the migration.

Again, thanks a lot.

From: Burke Chih-Jen Ko (GBIF) [bko at gbif.org]
Sent: Friday, August 26, 2011 11:06 AM
To: Mickael Graf
Cc: Johan Dunfalk; GBIF IPT mailing list; GBIF Helpdesk
Subject: Re: GBIF Case 1773: UTF8

Hi Mickaël,

Yes, this time I have the correct data to test. The only changes I made to your script were to re-save the file in latin1 and to add the drop-table statements from your previous dump. The file had been saved in UTF-8 even though the SQL settings in the script are all latin1. The refined file is attached.

Then I reproduced the environment with the following steps:

1) Adjust MySQL server and client encoding settings to match yours:
The server and client connection show:
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    latin1
Conn.  characterset:    latin1

2) Create a database from the attached script; the database encoding is latin1.

3) Follow the normal procedure to create a SQL source in IPT. See "settings" screen shot.

4) Since the JDBC driver detects the source encoding automatically, the encoding setting in the bottom-left doesn't matter for a SQL source. However, we're thinking about forcing the encoding as instructed. Please refer to the MySQL JDBC connector page [1].

5) The preview result on my side is attached as the result.png image. Närke is rendered correctly, whether your browser encoding is latin1 or UTF-8.

Since we assume everything on your side is latin1, if it still doesn't work, you can change a line in the jdbc.properties file of a *deployed* IPT, to force jdbc encoding:

6) In [Tomcat root]/webapps/ipt/WEB-INF/classes, you have jdbc.properties, at line 7, you have


7) add the encoding setting to the connection, so it reads as

The encoding names used by the JDBC driver differ slightly from MySQL's (see [1] again).
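As a rough illustration of forcing the encoding on a connection, the sketch below appends Connector/J's `useUnicode` and `characterEncoding` properties to a MySQL JDBC URL. The URL is a placeholder and not the actual line from IPT's jdbc.properties, the helper name `with_forced_encoding` is invented for the example, and the two-entry name mapping is an assumption that should be checked against the table in [1].

```python
from urllib.parse import urlencode

# Assumed mapping from MySQL charset names to the Java-style names that
# Connector/J expects; verify against the table in [1] before relying on it.
MYSQL_TO_JAVA = {"latin1": "Cp1252", "utf8": "UTF-8"}


def with_forced_encoding(jdbc_url: str, mysql_charset: str) -> str:
    """Append useUnicode/characterEncoding to a MySQL JDBC URL."""
    params = urlencode({
        "useUnicode": "true",
        "characterEncoding": MYSQL_TO_JAVA[mysql_charset],
    })
    # Use "&" if the URL already carries a query string, "?" otherwise.
    separator = "&" if "?" in jdbc_url else "?"
    return f"{jdbc_url}{separator}{params}"


# Hypothetical URL; host, port and database name are placeholders.
print(with_forced_encoding("jdbc:mysql://localhost:3306/mydb", "latin1"))
```

The point of going through a mapping is that the value given to `characterEncoding` is the Java-style name, not the MySQL one.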

Let me know if this gives you a different result. Otherwise, I suspect that UTF-8 encoding was involved at some step while you were setting up the database, so you might want to consider a clean start, using a small set of data or the script you gave me, as follows:

1. Export your whole database as a SQL script, and make sure the client you use (phpMyAdmin?) also honours latin1 in every step.
2. Check that the file encoding is correct and that the contents were exported correctly.
3. Import the SQL and try from IPT again.
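
The encoding check in step 2 could be done with a small script along these lines. It is only a heuristic sketch with an invented function name: it cannot prove that a file is latin1, it can only tell whether the bytes are pure ASCII, valid UTF-8, or neither. A latin1 file whose accented bytes happen to form valid UTF-8 sequences would be misclassified, although that is rare in practice.

```python
def classify_dump_bytes(data: bytes) -> str:
    """Roughly classify a dump as 'ascii', 'utf-8', or 'not utf-8'."""
    try:
        data.decode("ascii")
        return "ascii"  # no accented characters at all, so nothing to go wrong
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"  # non-ASCII bytes that form valid UTF-8 sequences
    except UnicodeDecodeError:
        return "not utf-8"  # consistent with latin1, but not proof of it


# "Närke" as a latin1 export should look, and as a UTF-8 export:
print(classify_dump_bytes(b"N\xe4rke"))
print(classify_dump_bytes(b"N\xc3\xa4rke"))
```

For a real export, the bytes would come from something like `open("dump.sql", "rb").read()` (hypothetical file name). A dump that is supposed to be latin1 but is reported as 'utf-8' would point to a re-encoding somewhere along the way.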

Hope this helps. Do let us know if your problem is resolved.



[1] http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html
