March 27, 2006

Charsets; I hate them :(

So the project I am working on uses a number of technologies including JSTL and FreeMarker.

Somebody had entered the character "è", which displayed perfectly in JSP, but FreeMarker was replacing it with "?".

Of course I thought it must be FreeMarker's fault :) I was meticulously careful never to convert a byte[] into a String using the String constructor unless the charset was specified, but I never realised you had to do the same in the other direction as well :( When calling getBytes() with no argument, the encoding the String was decoded with is completely ignored and the platform default charset is used…. Why?

So the following will do bad things:


  // bytes: a byte[] holding "some string with a funny character è" in the given encoding
  new String(new String(bytes, encoding).getBytes(), encoding);

lame.
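For what it's worth, here is a minimal sketch of the difference, not the project's actual code. The class and method names are made up for illustration, and US-ASCII is only standing in for a platform default that cannot represent "è". StandardCharsets needs Java 7 or later; on older JVMs Charset.forName("UTF-8") would do the same job.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Encode and decode with the same explicit charset, so the
    // platform default never gets a say.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    // Mimics the bug: the bytes are produced with one charset (here a
    // stand-in for the platform default) and decoded with another.
    static String mismatchedRoundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "caf\u00e8"; // "cafè"

        // Explicit charset both ways: the text survives.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original)); // true

        // US-ASCII cannot represent "è", so getBytes substitutes "?".
        System.out.println(mismatchedRoundTrip(original,
                StandardCharsets.US_ASCII, StandardCharsets.UTF_8)); // caf?
    }
}
```

The point is simply that every getBytes call gets the same explicit charset as the matching String constructor.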

The problem was also a little bit more interesting because the platform default charset on our Windows machines could represent "è", while the one on Solaris couldn't, so the problem only exhibited itself on Solaris.
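A quick way to see what a given machine will do is to ask the JVM for its default charset and whether that charset can encode the character in question. This is only a sketch: the class and helper names are hypothetical, and the output depends entirely on the machine and JVM it runs on.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // True if every character in the text can be encoded by the charset.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The charset that a bare getBytes() call will use on this JVM.
        Charset platformDefault = Charset.defaultCharset();
        System.out.println("Platform default: " + platformDefault.name());
        System.out.println("Can encode \u00e8: " + canEncode(platformDefault, "\u00e8"));
    }
}
```

Running this on each platform would have shown the difference straight away.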

I have (I am ashamed to say) never really delved into the joy of charsets and text encoding, instead preferring to stick my head in the sand. Luckily Chris May sits at the desk next to me :)


- 3 comments by 1 or more people

  1. There is also getBytes(charset), don't know if that's any use.

    27 Mar 2006, 13:58

  2. I think it's because Java Strings are always 16-bit Unicode, rather than storing the encoding you created it with, and so when you do getBytes you have to tell it again what encoding the bytes should be in.

    27 Mar 2006, 14:01

  3. Hi Nick,

    Yeah, getBytes(charset) is effectively what I have to do.

    It all "feels" far more difficult than it needs to be though :(

    27 Mar 2006, 14:53

