Monday, January 03, 2005

Unicode Localizations

Most Struts/JSTL developers know that to localize a web application for different regions and languages, you must first extract the locale-sensitive messages (e.g., field labels and warning messages) from your JSP and Java code and place them in an application resource .properties file. Then you create versions of this file whose name suffix reflects the language and region (e.g., _fr_CA for Canadian French) and whose values contain the locale-specific messages. At runtime the application framework picks the resource file that matches the user's locale and serves its messages.
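To make the naming convention concrete, here is a small sketch that lists the file names the JDK's own ResourceBundle machinery would try for a French Canadian user, from most to least specific. The base name ApplicationResources and the class name are assumptions for illustration, and Struts performs its own lookup, which may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleNameDemo {

    // Returns the bundle names the JDK would consider for a locale,
    // ordered from most specific to the default (unsuffixed) bundle.
    static List<String> candidateBundleNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        // "ApplicationResources" is a hypothetical base name used for illustration.
        Locale frenchCanada = Locale.forLanguageTag("fr-CA");
        for (String name : candidateBundleNames("ApplicationResources", frenchCanada)) {
            System.out.println(name + ".properties");
        }
    }
}
```

The last name in the list has no suffix; that file acts as the fallback when no locale-specific version exists.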

I had always assumed that a UTF-8 encoded properties file would allow me to use accented characters for words like français. But today I discovered that while I can encode my files in UTF-8, the Struts framework loads them with the java.util.Properties class, which quite inconveniently reads its input as ISO 8859-1 rather than UTF-8. From the Javadocs:

When saving properties to a stream or loading them from a stream, the ISO 8859-1 character encoding is used.

So you must use a goofy little utility that ships with the JDK called native2ascii, which converts a UTF-8 encoded properties file into an ASCII file that represents each non-ASCII character as a Unicode escape sequence (e.g., \u00e7 for ç). You then use this generated file in your Struts application. Fortunately, Ant comes with an optional task to run this utility.
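The problem and the workaround can be reproduced in a few lines. The sketch below loads a UTF-8 encoded line through Properties.load(InputStream), which garbles the accented character, and then loads an escaped version of the same line, which comes back intact. The escapeNonAscii helper is a hypothetical stand-in that only approximates what native2ascii does; it is not the tool itself, and the class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {

    // Loads a properties "file" from raw bytes via the InputStream overload,
    // which decodes the bytes as ISO 8859-1, and returns the value for key.
    static String loadValue(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    // Hypothetical helper: replaces every non-ASCII UTF-16 code unit with a
    // backslash-u escape, roughly the conversion native2ascii performs.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal below uses a source-level escape for c-cedilla.
        String line = "language=fran\u00e7ais\n";

        // UTF-8 bytes read as ISO 8859-1: the two-byte sequence becomes two characters.
        String garbled = loadValue(line.getBytes(StandardCharsets.UTF_8), "language");

        // Escaped ASCII bytes: Properties decodes the escape back to one character.
        String escapedLine = escapeNonAscii(line);
        String intact = loadValue(escapedLine.getBytes(StandardCharsets.US_ASCII), "language");

        // Values are printed in escaped form so the console encoding does not matter.
        System.out.println("from UTF-8 file:   " + escapeNonAscii(garbled));
        System.out.println("from escaped file: " + escapeNonAscii(intact));
    }
}
```

Because the escaped file is pure ASCII, it reads identically under ISO 8859-1 and UTF-8, which is why the converted file loads correctly no matter how the original was encoded.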

To me this all seems rather silly, especially when you consider that Java has built-in support for Unicode. It raises the question: why is the Properties class built like this?
