Charsets; I hate them :(
So the project I am working on uses a number of technologies including JSTL and FreeMarker.
Somebody had entered the following character: "è", which displayed perfectly in JSP, but FreeMarker was replacing it with "?".
Of course I thought it must be FreeMarker's fault :) I was meticulously careful never to convert a byte[] into a String using the String constructor unless the charset was specified, but I never realised you had to do the same in the other direction as well :( When you call getBytes() with no arguments, the encoding the String was originally decoded from is completely ignored and the platform default charset is used. Why?
So the following will do bad things:
new String("some string with a funny character è".getBytes(), encoding);
lame.
The problem was also a little bit more interesting because the result depends on the platform default charset. On our Windows machines the default could represent "è", on our Solaris machines it apparently could not, so the problem only showed up on Solaris.
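For what it's worth, here is a minimal sketch of the round trip going wrong and going right. The class and method names (CharsetRoundTrip, roundTrip) are made up for illustration, and US-ASCII is only my stand-in for "a platform default that cannot represent è"; I have not checked what the Solaris default actually was. It also assumes Java 7 or later for StandardCharsets; on older JVMs you would pass a charset name such as "UTF-8" instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Encode text to bytes with one charset, then decode those bytes with another.
    public static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // \u00e8 is "è"; the escape keeps the source file's own encoding out of the picture.
        String original = "some string with a funny character \u00e8";

        // This is the charset the no-argument getBytes() silently uses on this machine.
        System.out.println("platform default: " + Charset.defaultCharset());

        // Explicit charset in both directions: the character survives.
        String safe = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(safe.equals(original)); // true

        // Assumed stand-in for a default charset without è: it is replaced by '?' on encoding.
        String lossy = roundTrip(original, StandardCharsets.US_ASCII, StandardCharsets.UTF_8);
        System.out.println(lossy.endsWith("?")); // true
    }
}
```

The point is that the damage happens at getBytes() time: once the character has been encoded as "?", no amount of care when decoding brings it back.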
I have (I am ashamed to say) never really delved into the joy of charsets and text encoding, instead preferring to stick my head in the sand. Luckily Chris May sits at the desk next to me :)
There is also getBytes(charset), don't know if that's any use.
27 Mar 2006, 13:58
I think it's because Java Strings are always 16-bit Unicode, rather than storing the encoding you created it with, and so when you do getBytes you have to tell it again what encoding the bytes should be in.
27 Mar 2006, 14:01
Hi Nick,
Yeah, getBytes(charset) is effectively what I have to do.
It all "feels" far more difficult than it needs to be though :(
27 Mar 2006, 14:53