January 03, 2008

I'm leaving Warwick

I don’t write here much any more, but I felt I should add one last entry…

After 3 years as a student and 7.5 years working here, I’m leaving Warwick. I’m going to work for a small IT consultancy company in Leamington Spa called Black Pepper Software. They are a great bunch of people and I’m really looking forward to starting (on Monday 14th Jan), but I’m obviously really going to miss it here after being here for so long.

My last day is next Friday (11th January), so after that this rarely read and even more rarely written blog will still be here, but will be made inactive.

Byeeeeeeeee!


July 16, 2007

My mad dog

Writing about web page http://www.youtube.com/watch?v=3ySTY7HCLXM

Took Gizmo to the woods and discovered that he likes to swim…and then go mad.

Turn up the volume to hear me swear when he almost knocks me over half way through :)


May 30, 2007

Amazing Microsoft Surface Computing

Writing about web page http://www.microsoft.com/surface/

I’ve not been inspired to blog for a while, but something has really caught my imagination over the last couple of days as it has a bunch of other people.

What’s really amazed me is the Microsoft (yes…Microsoft…) Surface Computing system.

It’s basically a flat touch sensitive computer screen, but they have implemented it so elegantly it has to be seen to be believed. The best demo I’ve seen is this one:

First Look: Microsoft Surface Computing




March 26, 2007

Java UTF–8 international character support with Tomcat and Oracle

Introduction 

I've spent the last few days looking at getting proper international character support working in our Files.Warwick application.

At E-Lab we've never been that great at doing internationalisation support. BlogBuilder does a pretty good job of internationalisation as can be seen by quite a lot of our bloggers writing in Chinese/Korean/Japanese.

However, it's a bit of a kludge and doesn't work everywhere.

It didn't take long for someone to upload a file to Files.Warwick with an "é" in the file name. Due to our previous lack of thought in this area, this swiftly turned into a ? :(

So...how do you get your app to support international characters throughout?

What is international character support?

You'll hear all sorts of jargon regarding internationalisation support. Here is a little explanation of what it is all about.

What I do NOT mean is i18n support, which is making the application support multiple languages in the interface so that you can read help pages and admin links in French or Chinese. What I mean by international character support is being able to accept user input in any language or character set.

Tim Bray has a really good explanation of some of the issues surrounding ASCII/Unicode/UTF-8.

UTF-8 all the way through the stack

We need to look at UTF-8 support in the following areas:

  1. URLs
  2. Apache
  3. HTML
  4. Javascript
  5. POST data
  6. File download (Content-Disposition)
  7. JSPs
  8. Java code
  9. Tomcat
  10. Oracle
  11. File system

I'll go through each of these areas and explain how well they are supported by default and what changes you might need to make to support UTF-8 in each area.

URLs 

URLs should only contain ASCII characters. The ASCII character set is quite restrictive if you want to use Chinese characters for instance, so there is some encoding needed here. So if you've got a file with a Chinese character and you want to link to it, you need to do this:

"中.doc" ->  "%E4%B8%AD.doc"

Thankfully this can be done with a bit of Java:

java.net.URLEncoder.encode("中.doc","UTF-8");

So, whenever you need to generate something for the address bar, a redirect or something like that, you must URL encode the data. You don't have to detect non-ASCII names first, as it doesn't hurt to do this for links which are just plain old ASCII: letters, digits and dots don't get changed, as you can see with the ".doc" ending in the above example.
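As a rough sketch of the encoding step above, here is a small self-contained helper. The class and method names are made up for illustration, and it assumes Java 10 or later for the Charset overload of URLEncoder.encode (on older JDKs, pass the string "UTF-8" as in the line above). One thing to watch: URLEncoder is designed for form data, so it writes a space as "+", which this sketch swaps for "%20" so the result is usable in a URL path.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class UrlPathEncoder {
    // Hypothetical helper: encodes ONE path segment (not a whole URL) as UTF-8.
    static String encodeSegment(String segment) {
        // URLEncoder writes a space as "+" (form encoding); a URL path wants "%20".
        // A literal "+" in the input has already become "%2B", so this replace is safe.
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(encodeSegment("\u4E2D.doc"));  // the Chinese example from the text
        System.out.println(encodeSegment("report.doc"));  // plain ASCII passes through
        System.out.println(encodeSegment("my file.doc")); // space handling
    }
}
```

Encode each path segment separately; running a full URL through this would also escape the slashes and the colon.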

Apache

Generally you don't need to worry about Apache as it shouldn't be messing with your HTML or URLs. However, if you are doing some proxying with mod_proxy then you might need to have a think about this. We use mod_proxy to do proxying from Apache through to Tomcat. If you've got encoded characters in a URL that you need to convert into a query string for your underlying app then you're going to have a strange little problem.

If you have a URL coming into Apache that looks like this:

http://mydomain/%E4%B8%AD.doc

and you have a mod_rewrite/proxy rule like this:

RewriteRule ^/(.*) http://mydomain:8080/filedownload/?filename=$1 [QSA,L,P]

Unfortunately the $1 is going to get mangled during the rewrite. QSA (query string append) actually deals with these characters just fine and will send an existing query string through untouched. But when you grab a bit of the URL, such as my $1 here, the characters get mangled because Apache does some unescaping of its own, treating the bytes as ISO-8859-1 when they are really UTF-8. So, to keep our special characters in UTF-8, we'll escape it back again.

RewriteMap escape int:escape
RewriteRule ^/(.*) http://mydomain:8080/filedownload/?filename=${escape:$1} [QSA,L,P]

Take a look at your rewrite logs to see if this is working.

HTML 

HTML support for UTF-8 is good; you just need to make sure you set the character encoding properly on your pages. This should be as simple as a bit of code in the HEAD of your page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 

You should be able to write out UTF-8 characters for real into the page without any special encoding. 

Javascript 

Javascript supports UTF-8 characters very well, so as long as you don't use escape(), your users' characters shouldn't get mangled when they enter them. We also use AJAX to do some functions in our application so you need to think about that as well, but again, it should just work.

All of the above only holds true if you set the character encoding right on your surrounding HTML.

POST data

Getting POST data from the user in the right format is simple too. As long as your HTML has the right encoding then you should be ok.

File download (Content-Disposition) 

If you want to serve files for download from your app, as we obviously do with Files.Warwick then you'll need to understand how browsers deal with non ASCII characters in file names when downloading. Unfortunately the standard is not exactly well defined as no one really thought about UTF-8 file names until recently.

Internet Explorer supports URL encoded file names, but Firefox supports a rather strange Base64 encoded value for high byte file names (the "=?UTF-8?B?...?=" encoded-word syntax borrowed from MIME email headers), so something like this should do the job:


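// node is the file being served; Base64 here is the Commons Codec class
String userAgent = request.getHeader("User-Agent");
String encodedFileName = null;

// Some clients send no User-Agent header at all, so guard against null
if (userAgent != null && (userAgent.contains("MSIE") || userAgent.contains("Opera"))) {
    // IE and Opera: URL encoded file name
    encodedFileName = URLEncoder.encode(node.getName(), "UTF-8");
} else {
    // Everything else: Base64 encoded-word form
    encodedFileName = "=?UTF-8?B?" + new String(Base64.encodeBase64(node.getName().getBytes("UTF-8")), "UTF-8") + "?=";
}

response.setHeader("Content-Disposition", "attachment; filename=\"" + encodedFileName + "\"");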

Obviously you can tweak the user agent detection to be a bit smarter than this. 
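For anyone who wants to play with the two encodings outside a servlet, here is a minimal sketch using only the JDK. It assumes Java 10 or later and swaps in java.util.Base64 for the Commons Codec class used above. The class and method names are invented for the example, and the user agent check is just as naive as the original.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DownloadFileNames {
    // Hypothetical helper: picks an encoding for the file name based on the browser.
    static String encodeForBrowser(String userAgent, String fileName) {
        boolean urlStyle = userAgent != null
                && (userAgent.contains("MSIE") || userAgent.contains("Opera"));
        if (urlStyle) {
            // URL encoded form, as described for Internet Explorer
            return URLEncoder.encode(fileName, StandardCharsets.UTF_8);
        }
        // Base64 encoded-word form, as described for Firefox
        String base64 = Base64.getEncoder()
                .encodeToString(fileName.getBytes(StandardCharsets.UTF_8));
        return "=?UTF-8?B?" + base64 + "?=";
    }

    // Builds the full header value that would go into Content-Disposition.
    static String contentDisposition(String userAgent, String fileName) {
        return "attachment; filename=\"" + encodeForBrowser(userAgent, fileName) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(contentDisposition("Mozilla/4.0 (compatible; MSIE 6.0)", "\u4E2D.doc"));
        System.out.println(contentDisposition("Mozilla/5.0 Firefox/2.0", "\u4E2D.doc"));
    }
}
```

Keeping the encoding in a plain static method like this makes it easy to unit test without a request or response object.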

JSPs 

UTF-8 support in JSPs is pretty much a one liner.

<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8" %>

Include that at the top of every single JSP, perhaps in a prelude.jsp file, and you're away.

Java code

As long as your source strings are properly encoded then generally you can rely on Java to keep your UTF-8 encoded input. However, be careful what String functions you perform on your UTF-8 data. Be sure to do things like this:

myStr.getBytes("UTF-8") rather than just myStr.getBytes()

If you don't then you'll get bytes in the JVM's default platform encoding, which most likely means ISO-8859-1 instead. If for some reason you cannot get your input data to be UTF-8, and it is coming in as UTF-8 bytes that have been read as a different encoding, you could do something like this to convert it to UTF-8:

String myUTF8 = new String(my8859.getBytes("ISO-8859-1"), "UTF-8");
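To see why that conversion trick works, here is a small sketch (class and method names are made up for the example). It takes the three UTF-8 bytes of the Chinese character used earlier, deliberately reads them as ISO-8859-1, and then repairs the result. It relies on ISO-8859-1 mapping every byte to exactly one character, so no information is lost on the way through. It uses StandardCharsets, which assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingRoundTrip {
    // Always name the charset explicitly rather than relying on the platform default.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Repairs a string whose UTF-8 bytes were wrongly decoded as ISO-8859-1.
    static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D";
        byte[] utf8 = utf8Bytes(zhong);
        System.out.println(Arrays.toString(utf8)); // three bytes for one character

        // Simulate the bug: UTF-8 bytes read as if they were ISO-8859-1
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length());              // three junk characters
        System.out.println(repair(wrong).equals(zhong)); // repaired back to one character
    }
}
```

This only rescues data that was mis-decoded exactly once as ISO-8859-1; if a character has already been turned into a literal ?, it is gone for good.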

Debugging can be fun with high byte characters as generally logging to a console isn't going to show you the characters you are expecting. If you did this:

System.out.println(new String(new byte[] { -28, -72, -83 }, "UTF-8"));

Then you'd probably just see a ? rather than the Chinese character that it really should be. However, you can make log4j log UTF-8 messages. Just add 

<param name="Encoding" value="UTF-8"/>

To the appender in your log4j.xml config. Or this:

log4j.appender.myappender.Encoding=UTF-8

To your log4j.properties file. You might still only see the UTF-8 data properly if you view the log file in an editor/viewer that can view UTF-8 data (Windows notepad is ok for instance).

Tomcat

By default Tomcat will encode everything in ISO-8859-1. You can in theory override this by setting the incoming encoding of the HttpServletRequest to be UTF-8, but once some of the request is read, then the encoding is set, so chances are you might not be able to manually do:

request.setCharacterEncoding("UTF-8")

early enough to have an effect. So instead you can tell Tomcat you want it to run in UTF-8 mode by default. Just add the following to the Connector you want UTF-8 on in your server.xml config file in Tomcat.

URIEncoding="UTF-8"

Not doing this has the fun quirk that if you have a request like this:

/test.htm?highByte=%E4%B8%AD

If you did request.getQueryString() you'd get the raw String, that is "highByte=%E4%B8%AD", but if you did request.getParameter("highByte") then you'd get the value decoded as ISO-8859-1 instead, which would not be right. Sigh.
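You can reproduce the effect of that quirk without Tomcat at all. This sketch (names invented, Java 10 or later assumed for the Charset overload) decodes the same raw parameter value both ways with java.net.URLDecoder. It only imitates what the container does to parameters; it is not Tomcat's own code.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QueryDecodingDemo {
    // Decodes a raw, percent-encoded parameter value with the given charset.
    static String decode(String rawValue, Charset charset) {
        return URLDecoder.decode(rawValue, charset);
    }

    public static void main(String[] args) {
        String raw = "%E4%B8%AD";

        // What you want: the three bytes read as one UTF-8 character
        String right = decode(raw, StandardCharsets.UTF_8);

        // What an ISO-8859-1 default gives you: one junk character per byte
        String wrong = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(right.length());
        System.out.println(wrong.length());
    }
}
```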

Oracle

You could just URL encode all of your data and put it into the database in ASCII like you always used to. However, that doesn't make for very readable data. There are two options here although I've only tried the one.

  1. Set the default character encoding of your Oracle database to be UTF-8. However, it is set on a per server basis, not a per schema basis so your whole server would be affected.
  2. Use NVARCHAR2 fields instead of VARCHAR2 fields and you can store real UTF-8 data.

We went for option 2 as we have a shared Oracle server. First of all, convert all fields that you want to store UTF-8 data in from VARCHAR2s to NVARCHAR2s. Be careful as I don't think you can change back!

You then need to tell your JDBC code somehow that it needs to send data that the NVARCHAR2 fields can understand. There are a couple of ways of doing this too:

  1. Set the defaultNChar property on the connection to true.
  2. Use the setFormOfUse() method that is an Oracle specific extension to the PreparedStatement.

I went for option 1 as the problem with option 2 is that you have to somehow get at the Oracle specific connection or prepared statement within your Java code. This is not fun as you'll often be using a connection pool that will hide away these details.
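As a rough illustration of option 1, here is how the property might be put together in plain JDBC. The class and method names are hypothetical, and the property key is taken straight from the description above; I believe some driver versions document it as "oracle.jdbc.defaultNChar", so treat the exact key as an assumption and check it against your driver's documentation.

```java
import java.util.Properties;

class OracleNCharProperties {
    // Hypothetical helper: builds connection properties with NCHAR handling switched on.
    static Properties connectionProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Assumption: key name as given in the text; verify for your driver version.
        props.setProperty("defaultNChar", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = connectionProperties("files_user", "secret");
        System.out.println(props.getProperty("defaultNChar"));
    }
}
```

The resulting Properties object can be handed to DriverManager.getConnection(url, props), and most connection pools have a way of passing the same properties through to the driver.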

File system

File system support of UTF-8 characters is again pretty good, but you are sometimes going to have issues with viewing the file listings. I just couldn't get a UTF-8 file name to display properly over a putty SSH connection. Through a simple Java test program, I could write and read back a UTF-8 file name on our Solaris 10 box, but all I could ever actually read when doing an "ls" was ?????.doc. So for the sake of maintainability of the file system I went for a URL encoded version of the file. This isn't ideal, but it works.
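The URL encoded file name approach can be sketched as a pair of helpers, one for the name written to disk and one to get the original name back for display. The names are made up and this is not the actual Files.Warwick code; it assumes Java 10 or later for the Charset overloads.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class StoredFileNames {
    // Name used on disk: pure ASCII, so "ls" over SSH shows something readable.
    static String toDiskName(String displayName) {
        return URLEncoder.encode(displayName, StandardCharsets.UTF_8);
    }

    // Name shown to the user: the original text, recovered from the disk name.
    static String toDisplayName(String diskName) {
        return URLDecoder.decode(diskName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "my caf\u00E9 \u4E2D.doc";
        String onDisk = toDiskName(original);
        System.out.println(onDisk);
        System.out.println(toDisplayName(onDisk).equals(original));
    }
}
```

A side effect of this scheme is that a "/" in an uploaded name is escaped too, so it can never be mistaken for a directory separator.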

Conclusion

As you can see, there is quite a lot of work involved in supporting UTF-8 throughout. A lot of my time was spent researching as my understanding of encoding issues wasn't great. Now that I've put together this guide, I hope all of our apps can start to work towards full UTF-8 support.

Of course the above guide is quite specific to my experience in the app I was dealing with and the environment I work in so your experiences might be more or less painful :) 


December 15, 2006

New car!

After owning my Audi A4 for 3.5 years, it was time for a change. With the arrival of Dog #2 it was a bit of a squeeze in the back of Steph’s Polo for the two of them, so I needed a car with a bigger boot.

After just 4 days on Autotrader, my Audi was gone…a sad day indeed. But just a few days later and I’ve got my replacement. I would have liked an Audi A3 Sportback or even the Volvo V50, but I just can’t justify it really so I went for a top of the line (except the ST as I really don’t want a 20mpg, group 20 insurance car) year and a half old Focus…which works out a hell of a lot cheaper than the A3 or V50 (unsurprisingly).

Focus

So far I’m really happy with it, it drives surprisingly well, the 2.0 turbo diesel really flies and it handles brilliantly. I must admit that I do miss the refinement of the Audi, but I don’t think I’ll miss it in the long run. Now I just have to see if I can get the dogs in the back!


November 24, 2006

Files project update

Follow-up to Spring and Hypersonic/Hibernate tests from Kieran's blog

It’s been a while since my last update on this project. Unfortunately we’ve not done as much as I would have liked. Both Sarah and I have had holiday and we’ve been busy on other projects.

We’re back in the swing of things now and we’re moving forward a lot quicker now that most of the underlying infrastructure is in place.

To give an idea of the size of the project already (as far as numbers alone can tell you anything):

  • 75 classes and interfaces
  • 22 test classes with 50 tests
  • 15 hibernate mapping files
  • 5 database tables (we are mapping quite a few classes to a single table in quite a few places)
  • 14 jsps (not many as we’ve not got loads of interfaces to some of the underlying code yet)

So, what does this code do then?

We’ve recently added quotas and the ability to email a file to someone. This basically sends an email with a link in it to a unique download URL that lasts a week and lets the person who sent the file keep track of downloads of that file (they get notified by email when it is downloaded and can see the download count on a web interface within their account).

We’ve also got the permissions system in now so that you can give view/edit/admin permissions to a person or a group of people (as usual there is no interface for this yet…just the code).

Everything is still pretty ugly as we’ve not done any graphic design work, so I’m not going to post any screenshots!


November 13, 2006

Gizmo's first 36 hours

Follow-up to Yet another new member of the family…Zeno/Gizmo from Kieran's blog

We’ve had Gizmo now for almost 36 hours and he’s been absolutely fantastic.

Gizmo in his bed Gizmo in the garden
Maggie and Gizmo in the garden Me and Gizmo

Even though he did pee in the house a few times yesterday and first thing this morning, he is a lot better now, is getting the right idea and has been good all day.

I spent the day at home and kept an eye on him and Maggie. As expected, Maggie is not yet his biggest fan and has growled at him quite a lot. However, they’ve not actually had a fight and don’t look like they will. I even managed to leave them both alone at home for an hour today and they were fine…I got a great double bouncy greeting when I got home :)

Gizmo is also really responsive to training already and is already being really obedient, he’ll be all trained up and into his routine in no time.

All in all Steph and I are really pleased with him and couldn’t have picked a nicer dog!


November 10, 2006

Trying to get hold of people

Writing about web page http://www.dilbert.com/


November 08, 2006

DropSend for sale

Writing about web page http://www.barenakedapp.com/dropsend/dropsend-monthly-profit

Carson Systems are selling DropSend so that they can concentrate on their new Amigo project. What makes this interesting is that they are keeping up their usual openness during the sale and are posting what would usually be regarded as trade secrets on their blog:


How much profit does DropSend bring in each month?
  • Revenue: $9,041.81 per month (and growing by 8.6% per month)
  • Costs: $2,100 per month (Servers at 365main.com + maintenance)
  • Profit: $6,941.81 per month

Looks like a lot of people have this sending/storing large files problem.


November 05, 2006

Yet another new member of the family…Zeno/Gizmo

Follow-up to New member of the family from Kieran's blog

After having a dog for almost 2 years…it’s time for another one!

We’ve had Maggie for almost 2 years now (2 years in January) and we’ve occasionally been thinking it’d be nice to have another dog, but we’ve never been that serious about it.

After going on holiday to visit my parents and yet again seeing Gina and Tiro and what good mates they are and seeing how well Maggie had got on with Steph’s Dad’s dog whilst we were away, it made us think again about another dog. This time, it seems right so we went and had a look around the Dogs Trust the last two weeks and yesterday we found the one :)

He’s currently called Zeno and was rescued as a stray from Ireland. He’s a Black Lab cross of some kind (possibly Border Collie) and about a year old. He’s a really nice boy, very friendly and seems to get on really well with Maggie when we took her to meet him both yesterday and today. We’re going to change his name to Gizmo as Zeno is just a made up kennel name and he doesn’t respond to it at all. With a bit of luck we’ll pick him up next Sunday!

It’s going to be hard work having a second dog, but I think it’s going to be really good for us as well as really good company for Maggie.

Maggie
Maggie close up

Best friends, Gina and Tiro
South Africa 2006 32

Zeno/Gizmo
Zeno/Gizmo and Maggie Zeno/Gizmo Zeno/Gizmo, me and Maggie Zeno/Gizmo

