All 29 entries tagged Programming


March 26, 2007

Java UTF-8 international character support with Tomcat and Oracle

Introduction 

I've spent the last few days looking at getting proper international character support working in our Files.Warwick application.

At E-Lab we've never been that great at doing internationalisation support. BlogBuilder does a pretty good job of it, as can be seen from the number of our bloggers writing in Chinese, Korean and Japanese.

However, it's a bit of a kludge and doesn't work everywhere.

It didn't take long for someone to upload a file to Files.Warwick with an "é" in the file name. Due to our previous lack of thought in this area, this swiftly turned into a ? :(

So...how do you get your app to support international characters throughout?

What is international character support?

You'll hear all sorts of jargon regarding internationalisation support. Here is a little explanation of what it is all about.

What I do NOT mean is i18n support, which is making the application support multiple languages in the interface so that you can read help pages and admin links in French or Chinese. What I mean by internationalisation support is being able to accept user input in any language or character set.

Tim Bray has a really good explanation of some of the issues surrounding ASCII/Unicode/UTF-8.

UTF-8 all the way through the stack

We need to look at UTF-8 support in the following areas:

  1. URLs
  2. Apache
  3. HTML
  4. Javascript
  5. POST data
  6. File download (Content-Disposition)
  7. JSPs
  8. Java code
  9. Tomcat
  10. Oracle
  11. File system

I'll go through each of these areas and explain how well they are supported by default and what changes you might need to make to support UTF-8 in each area.

URLs 

URLs should only contain ASCII characters. That is quite restrictive if you want to use Chinese characters, for instance, so some encoding is needed: each non-ASCII character is turned into its UTF-8 bytes, and each byte is written as %XX in hex. So if you've got a file with a Chinese character in its name and you want to link to it, you need to do this:

"中.doc" ->  "%E4%B8%AD.doc"

Thankfully this can be done with a bit of Java:

java.net.URLEncoder.encode("中.doc","UTF-8");

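So, whenever you need to generate something for the address bar, a redirect or something like that, you must URL encode the data. You don't have to detect whether it's needed, as encoding does no harm to plain ASCII names: letters, digits and a few characters such as "." and "-" pass through unchanged, as you can see with the ".doc" ending in the example above. Note that URLEncoder actually implements HTML form encoding, so it turns spaces into "+" and also encodes "/", which means you should encode each path segment separately.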
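A minimal sketch of a link-building helper, assuming the file name is used as a single URL path segment. The class and method names are hypothetical. The Chinese character is written as the Unicode escape \u4e2d so the source compiles the same whatever encoding javac assumes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlPathEncoding {

    // Hypothetical helper: percent-encodes one file name (a single path segment) as UTF-8.
    public static String encodePathSegment(String name) {
        try {
            // URLEncoder does HTML form encoding: a space becomes "+" and a literal "+" becomes "%2B",
            // so any "+" left in the result stands for a space and can safely be turned into "%20".
            return URLEncoder.encode(name, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this should never happen.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodePathSegment("\u4e2d.doc"));    // %E4%B8%AD.doc
        System.out.println(encodePathSegment("my report.doc")); // my%20report.doc
    }
}
```

The "+" to "%20" swap matters because in a URL path a "+" is just a literal plus sign; only form-encoded query strings treat it as a space.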

Apache

Generally you don't need to worry about Apache, as it shouldn't be messing with your HTML or URLs. However, if you are doing some proxying with mod_proxy then you might need to have a think about this. We use mod_proxy to do proxying from Apache through to Tomcat. If you've got encoded characters in a URL that you need to convert into a query string for your underlying app, then you're going to have a strange little problem.

If you have a URL coming into Apache that looks like this:

http://mydomain/%E4%B8%AD.doc

and you have a mod_rewrite/proxy rule like this:

RewriteRule ^/(.*) http://mydomain:8080/filedownload/?filename=$1 [QSA,L,P]

Unfortunately the $1 is going to get mangled during the rewrite. QSA (query string append) copies the original query string across untouched, so encoded characters in there survive just fine. But mod_rewrite matches against the URL path after Apache has unescaped it, so when you capture a bit of it, such as my $1 here, you get raw bytes rather than %XX sequences, and they get mangled on the way through: Apache treats them as ISO-8859-1, but they're UTF-8, so it doesn't work properly. So, to keep our special characters in UTF-8, we escape them back again:

RewriteMap escape int:escape
RewriteRule ^/(.*) http://mydomain:8080/filedownload/?filename=${escape:$1} [QSA,L,P]

Take a look at your rewrite logs to see if this is working.

HTML 

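HTML support for UTF-8 is good; you just need to make sure you set the character encoding properly on your pages. This should be as simple as a bit of markup in the HEAD of your page (if your server also sends a charset in the Content-Type HTTP header, that takes precedence, so make sure the two agree):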

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 

You should be able to write out UTF-8 characters for real into the page without any special encoding. 

Javascript 

Javascript handles Unicode strings very well, so as long as you don't use escape() (which produces non-standard %uXXXX sequences for characters outside Latin-1; encodeURIComponent() percent-encodes as UTF-8 instead), your users' input shouldn't get mangled. We also use AJAX to do some functions in our application, so you need to think about that as well, but again, it should just work.

All of the above only holds true if you set the character encoding right on your surrounding HTML.

POST data

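Getting POST data from the user in the right format is simple too. As long as your HTML page has the right encoding, the browser will submit form data as UTF-8; the server then has to read it as UTF-8 as well, which is a Tomcat issue covered below.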

File download (Content-Disposition) 

If you want to serve files for download from your app, as we obviously do with Files.Warwick, then you'll need to understand how browsers deal with non-ASCII characters in file names when downloading. Unfortunately the standard is not well defined, as no one really thought about UTF-8 file names until recently.

Internet Explorer supports URL-encoded file names, but Firefox supports a rather strange Base64-encoded value for high-byte file names (the "encoded-word" syntax from RFC 2047, as used in email headers), so something like this should do the job:


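String userAgent = request.getHeader("User-Agent");
String encodedFileName;

// The User-Agent header is optional, so guard against null before sniffing it
if (userAgent != null && (userAgent.contains("MSIE") || userAgent.contains("Opera"))) {
    // URLEncoder turns spaces into "+", which would otherwise show up literally, so use %20
    encodedFileName = URLEncoder.encode(node.getName(), "UTF-8").replace("+", "%20");
} else {
    // RFC 2047 encoded-word; Base64.encodeBase64 here matches Apache Commons Codec's API
    encodedFileName = "=?UTF-8?B?" + new String(Base64.encodeBase64(node.getName().getBytes("UTF-8")), "US-ASCII") + "?=";
}

response.setHeader("Content-Disposition", "attachment; filename=\"" + encodedFileName + "\"");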

Obviously you can tweak the user agent detection to be a bit smarter than this. 
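The same browser split can be pulled out into a pure function so it can be tested without a servlet container. This is a sketch rather than the Files.Warwick code: the class and method names are hypothetical, it swaps Commons Codec for java.util.Base64 (Java 8+), and the user-agent strings are only illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ContentDispositionHeaders {

    // Builds the Content-Disposition header value for a download of fileName.
    public static String attachmentHeader(String userAgent, String fileName) {
        String encoded;
        if (userAgent != null && (userAgent.contains("MSIE") || userAgent.contains("Opera"))) {
            // IE/Opera: percent-encoded UTF-8, with spaces as %20 rather than "+"
            try {
                encoded = URLEncoder.encode(fileName, "UTF-8").replace("+", "%20");
            } catch (UnsupportedEncodingException e) {
                throw new AssertionError(e); // UTF-8 is always available
            }
        } else {
            // Everyone else (Firefox at the time): RFC 2047 encoded-word, =?charset?B?base64?=
            byte[] utf8 = fileName.getBytes(StandardCharsets.UTF_8);
            encoded = "=?UTF-8?B?" + Base64.getEncoder().encodeToString(utf8) + "?=";
        }
        return "attachment; filename=\"" + encoded + "\"";
    }

    public static void main(String[] args) {
        String name = "\u4e2d.doc";
        System.out.println(attachmentHeader("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", name));
        System.out.println(attachmentHeader("Mozilla/5.0 (Windows; U) Gecko/20070309 Firefox/2.0.0.3", name));
    }
}
```

In a servlet you would then call response.setHeader("Content-Disposition", ContentDispositionHeaders.attachmentHeader(request.getHeader("User-Agent"), node.getName())).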

JSPs 

UTF-8 support in JSPs is pretty much a one-liner.

<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8" %>

Include that at the top of every single JSP, perhaps in a prelude.jsp file, and you're away.

Java code

As long as your source strings were decoded properly, you can generally rely on Java to keep your input intact, since Java strings hold Unicode characters internally. However, be careful whenever you convert between strings and bytes. Be sure to do things like this:

myStr.getBytes("UTF-8") rather than just myStr.getBytes()

If you don't, you'll get bytes in the platform's default encoding, which on many servers is ISO-8859-1, and any character that encoding can't represent is silently lost. If for some reason your input arrives mangled because its UTF-8 bytes were decoded as ISO-8859-1 somewhere along the way, you can repair it like this:

String myUTF8 = new String(my8859.getBytes("ISO-8859-1"), "UTF-8");
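A small stdlib sketch of what goes wrong and why that repair trick works, using the same character as the other examples. It uses java.nio.charset.StandardCharsets (Java 7+) instead of charset-name strings; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class EncodingRepair {

    // Simulates the bug: UTF-8 bytes read back as if they were ISO-8859-1.
    public static String misDecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The repair: ISO-8859-1 maps every byte to exactly one char and back,
    // so re-encoding recovers the original bytes, which are then decoded as UTF-8.
    public static String repair(String misDecoded) {
        byte[] original = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    // Encoding with a charset that can't represent the text loses data:
    // unmappable characters are replaced with '?'.
    public static String throughLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d";
        String broken = misDecode(chinese);
        System.out.println(broken.length());                // 3 junk characters instead of 1
        System.out.println(repair(broken).equals(chinese)); // true
        System.out.println(throughLatin1(chinese));         // ?
    }
}
```

The repair relies on ISO-8859-1 mapping all 256 byte values to characters, so it only helps if the string hasn't been altered since it was mis-decoded.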

Debugging can be fun with high-byte characters, as logging to a console generally isn't going to show you the characters you are expecting. If you did this:

System.out.println(new String(new byte[] { -28, -72, -83 }, "UTF-8"));

Then you'd probably just see a ? rather than the Chinese character that it really should be. However, you can make log4j log UTF-8 messages. Just add 

<param name="Encoding" value="UTF-8"/>

To the appender in your log4j.xml config. Or this:

log4j.appender.myappender.Encoding=UTF-8

To your log4j.properties file. You will still only see the UTF-8 data properly if you view the log file in an editor/viewer that understands UTF-8 (Windows Notepad is OK, for instance).
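If you just want System.out to emit UTF-8 while debugging, one option (a sketch, not something this setup is known to have used) is to wrap standard output in a PrintStream with an explicit encoding. The helper name is hypothetical, and the terminal or viewer still has to interpret the bytes as UTF-8.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {

    // Wraps any output stream in an autoflushing PrintStream that encodes text as UTF-8.
    public static PrintStream utf8PrintStream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Replace System.out so println writes UTF-8 bytes instead of the platform default encoding.
        System.setOut(utf8PrintStream(new FileOutputStream(FileDescriptor.out)));
        System.out.println("\u4e2d");
    }
}
```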

Tomcat

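By default Tomcat decodes request data as ISO-8859-1. You can in theory override this by setting the character encoding of the HttpServletRequest to UTF-8, but it only has an effect if it happens before any of the request parameters or the body have been read, so chances are you might not be able to call:

request.setCharacterEncoding("UTF-8");

early enough to have an effect. For the URL and query string, you can instead tell Tomcat to decode them as UTF-8 by default. Just add the following to the Connector you want UTF-8 on in your server.xml config file in Tomcat:

URIEncoding="UTF-8"

Note that URIEncoding only covers the request URI and query string. Form data in a POST body is still decoded using the request's character encoding, so for POST data you do still need request.setCharacterEncoding("UTF-8") to run before anything reads the parameters. The usual way to guarantee that is a servlet filter mapped in front of everything else; Spring ships one as org.springframework.web.filter.CharacterEncodingFilter.

Not setting URIEncoding has the fun quirk that if you have a request like this: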

/test.htm?highByte=%E4%B8%AD

If you called request.getQueryString() you'd get the raw string "highByte=%E4%B8%AD", but if you called request.getParameter("highByte") you'd get the three bytes decoded as ISO-8859-1, i.e. three junk characters rather than 中. Sigh.
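If you can't change the Connector configuration, one workaround is to take the raw string from getQueryString() and decode it yourself as UTF-8. A minimal sketch with a hypothetical helper built on java.net.URLDecoder; it only handles the query string, not POST bodies, and URLDecoder throws IllegalArgumentException on malformed %XX sequences.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryStringDecoding {

    // Hypothetical helper: parses a raw query string (as returned by getQueryString())
    // and decodes names and values with the given charset.
    public static Map<String, String> parse(String rawQuery, String charset) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (rawQuery == null) {
            return params;
        }
        try {
            for (String pair : rawQuery.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String name = URLDecoder.decode(eq >= 0 ? pair.substring(0, eq) : pair, charset);
                String value = URLDecoder.decode(eq >= 0 ? pair.substring(eq + 1) : "", charset);
                // Like getParameter(), keep the first value when a name is repeated.
                if (!params.containsKey(name)) {
                    params.put(name, value);
                }
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
        return params;
    }

    public static void main(String[] args) {
        String raw = "highByte=%E4%B8%AD";
        // Decoded as ISO-8859-1 (Tomcat's default): three junk characters
        System.out.println(parse(raw, "ISO-8859-1").get("highByte").length()); // 3
        // Decoded as UTF-8: the single intended character
        System.out.println(parse(raw, "UTF-8").get("highByte").equals("\u4e2d")); // true
    }
}
```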

Oracle

You could just URL encode all of your data and put it into the database as ASCII like you always used to. However, that doesn't make for very readable data. There are two options here, although I've only tried one.

  1. Set the database character set of your Oracle database to UTF-8 (AL32UTF8 in Oracle's naming). However, this is set for the whole database rather than per schema, so everything else on that server would be affected.
  2. Use NVARCHAR2 fields instead of VARCHAR2 fields. These use the database's national character set, which is always a Unicode encoding, so you can store real Unicode data.

We went for option 2 as we have a shared Oracle server. First of all, convert all fields that you want to store UTF-8 data in from VARCHAR2s to NVARCHAR2s. Be careful as I don't think you can change back!

You then need to tell your JDBC code somehow that it needs to send data that the NVARCHAR2 fields can understand. There are a couple of ways of doing this too:

  1. Set the defaultNChar property on the connection to true.
  2. Use the setFormOfUse() method, an Oracle-specific extension on OraclePreparedStatement.

I went for option 1 as the problem with option 2 is that you have to somehow get at the Oracle specific connection or prepared statement within your Java code. This is not fun as you'll often be using a connection pool that will hide away these details.
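A sketch of option 1 with plain JDBC, assuming the Oracle thin driver is on the classpath. The class name is hypothetical, and any URL or credentials you pass in are placeholders. With a connection pool you would typically pass defaultNChar through the pool's connection-properties setting instead; the Oracle driver also reads an oracle.jdbc.defaultNChar system property.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class OracleNCharConnection {

    // Connection properties with defaultNChar on, so the driver binds strings as national
    // character set (NCHAR) data, which is what NVARCHAR2 columns expect.
    public static Properties nCharProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("defaultNChar", "true");
        return props;
    }

    // Needs the Oracle JDBC driver on the classpath and a reachable database.
    public static Connection connect(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, nCharProperties(user, password));
    }
}
```

Usage would look something like OracleNCharConnection.connect("jdbc:oracle:thin:@//dbhost:1521/ORCL", "files", "secret"), where the host, service name and credentials are made up.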

File system

File system support for UTF-8 characters is again pretty good, but you are sometimes going to have issues viewing file listings. I just couldn't get a UTF-8 file name to display properly over a PuTTY SSH connection. Using a simple Java test program, I could write and read back a UTF-8 file name on our Solaris 10 box, but all I could ever see when doing an "ls" was ?????.doc. So, for the sake of maintainability of the file system, I went for a URL-encoded version of the file name. This isn't ideal, but it works.
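A sketch of the URL-encoded naming scheme, assuming names are encoded with java.net.URLEncoder as UTF-8; the class and method names are hypothetical, and the exact mapping Files.Warwick used isn't stated. The stored names stay pure ASCII, so ls over SSH shows something readable like %E4%B8%AD.doc.

```java
import java.io.File;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class EncodedFileNames {

    // Maps a user-supplied name to an ASCII-only name for the file system.
    // Names such as ".." pass through unchanged, so they still need separate validation.
    public static String toStorageName(String userFileName) {
        try {
            // "*" is left alone by URLEncoder but isn't allowed in Windows file names, so escape it too
            return URLEncoder.encode(userFileName, "UTF-8").replace("*", "%2A");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always available
        }
    }

    // Reverses toStorageName when listing files back to users.
    public static String fromStorageName(String storedName) {
        try {
            return URLDecoder.decode(storedName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        File stored = new File(dir, toStorageName("\u4e2d.doc"));
        stored.createNewFile();
        System.out.println(stored.getName());                                      // %E4%B8%AD.doc
        System.out.println(fromStorageName(stored.getName()).equals("\u4e2d.doc")); // true
        stored.delete();
    }
}
```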

Conclusion

As you can see, there is quite a lot of work involved in supporting UTF-8 throughout. A lot of my time was spent researching as my understanding of encoding issues wasn't great. Now that I've put together this guide, I hope all of our apps can start to work towards full UTF-8 support.

Of course the above guide is quite specific to my experience in the app I was dealing with and the environment I work in so your experiences might be more or less painful :) 


November 24, 2006

Files project update

Follow-up to Spring and Hypersonic/Hibernate tests from Kieran's blog

It’s been a while since my last update on this project. Unfortunately we’ve not done as much as I would have liked. Both Sarah and I have had holiday and we’ve been busy on other projects.

We’re back in the swing of things now and we’re moving forward a lot quicker now that most of the underlying infrastructure is in place.

To give an idea of the size of the project already (as far as numbers alone can tell you anything):

  • 75 classes and interfaces
  • 22 test classes with 50 tests
  • 15 Hibernate mapping files
  • 5 database tables (we are mapping quite a few classes to a single table in quite a few places)
  • 14 JSPs (not many, as we’ve not got interfaces to a lot of the underlying code yet)

So, what does this code do then?

We’ve recently added quotas and the ability to email a file to someone. This sends an email containing a link to a unique download URL that lasts a week. The person who sent the file can keep track of downloads: they get notified by email when it is downloaded and can see the download count on a web interface within their account.
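A hypothetical sketch of how such a link could work: an unguessable random token, an expiry one week after creation, and a download counter. The names are made up and everything is kept in memory for brevity; a real version would store the token, expiry and count in the database, not least because the app servers are meant to be stateless.

```java
import java.security.SecureRandom;

public class DownloadLink {

    public static final long ONE_WEEK_MILLIS = 7L * 24 * 60 * 60 * 1000;
    private static final SecureRandom RANDOM = new SecureRandom();

    private final String token;
    private final long expiresAtMillis;
    private int downloadCount;

    // Creates a link that stays valid for one week from createdAtMillis.
    public DownloadLink(long createdAtMillis) {
        byte[] bytes = new byte[16]; // 128 random bits, so tokens can't be guessed
        RANDOM.nextBytes(bytes);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        this.token = hex.toString();
        this.expiresAtMillis = createdAtMillis + ONE_WEEK_MILLIS;
    }

    public String getToken() {
        return token;
    }

    public boolean isValidAt(long nowMillis) {
        return nowMillis < expiresAtMillis;
    }

    // Counts a download if the link hasn't expired; this is where the sender could be notified.
    public synchronized boolean recordDownload(long nowMillis) {
        if (!isValidAt(nowMillis)) {
            return false;
        }
        downloadCount++;
        return true;
    }

    public synchronized int getDownloadCount() {
        return downloadCount;
    }
}
```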

We’ve also got the permissions system in now so that you can give view/edit/admin permissions to a person or a group of people (as usual there is no interface for this yet…just the code).

Everything is still pretty ugly as we’ve not done any graphic design work, so I’m not going to post any screenshots!


November 01, 2006

Spring and Hypersonic/Hibernate tests

Follow-up to Files project dev server from Kieran's blog

Having been away on holiday for 2 weeks and having quite a bit of catching up to do with other stuff, we’ve not made huge leaps in the last few weeks. However, we’re building up steam again now and, after a few restarts, have finally sorted out the domain model we’re going to go for around the key aspects of accounts, files, folders, etc…

Up until now, for speed of prototyping, we’ve been working with Spring but haven’t yet involved a database, as we can quite easily just talk directly to the file system for now. However, now is the time to start getting more complex, and we need somewhere to store all kinds of metadata about the files and accounts.

As usual, we’ll try to do the Hibernate mappings incrementally and build up the database schema as we go. To do this quickly we’ll be building against some tests and a Hypersonic database to start with. Spring provides the handy “AbstractTransactionalDataSourceSpringContextTests” class, which allows easy binding of Spring objects and also a simple way to plug transactional capabilities into your tests.

By coupling these tests with the Hypersonic database, which can be built and torn down in memory in just milliseconds, we can prototype the database very quickly.

Hibernate session-factory config

<session-factory>
    <property name="dialect">org.hibernate.dialect.HSQLDialect</property>
    <property name="use_outer_join">true</property>
    <property name="hbm2ddl.auto">create-drop</property>

    <mapping resource="......hbm.xml"/>
</session-factory>

Spring sessionfactory and datasource


<bean id="sessionFactory" class="org.springframework.orm.hibernate3.LocalSessionFactoryBean">
    <property name="configLocation"><value>hypersonic-hibernate.cfg.xml</value></property>
    <property name="dataSource" ref="dataSource"/>
</bean>

<bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName">
        <value>org.hsqldb.jdbcDriver</value>
    </property>
    <property name="url">
        <value>jdbc:hsqldb:.</value>
    </property>
    <property name="username">
        <value>sa</value>
    </property>
    <property name="password">
        <value></value>
    </property>
</bean>

So based on your mappings files, the database schema gets created in a new hypersonic database for each test giving you a working and clear schema to test against. Magic.


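import org.springframework.test.AbstractTransactionalDataSourceSpringContextTests;

// Abstract: it only supplies the shared Spring config and has no tests of its own
public abstract class HypersonicTests extends AbstractTransactionalDataSourceSpringContextTests {

    protected String[] getConfigLocations() {
        return new String[] { "file:apps/webinterface/src/applicationContext.xml", "file:apps/webinterface/test-src/hypersonic-db-context.xml" };
    }

}

import java.util.List;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
// (plus imports for the project's own Account and AccountImpl classes)

public class DbConnectionTests extends HypersonicTests {

    private SessionFactory _sessionFactory;

    public final void testDbConnection() throws Exception {

        Session session = getSessionFactory().openSession();
        try {
            session.save(new AccountImpl(null, null, "Test", null));
            session.flush();

            List<?> accounts = session.createCriteria(Account.class).list();
            assertEquals(1, accounts.size());
        } finally {
            // openSession() hands us a session we own, so close it ourselves
            session.close();
        }

    }

    public SessionFactory getSessionFactory() {
        return _sessionFactory;
    }

    // Injected from the Spring context (autowired by type)
    public void setSessionFactory(final SessionFactory sessionFactory) {
        _sessionFactory = sessionFactory;
    }

}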

October 05, 2006

Files project dev server

Follow-up to Getting a project up and running from Kieran's blog

One of the important things to try and get ready as early as possible is a test/pre-production system that is fairly close to what you expect your live environment to be. This is so that you don’t spend the whole time developing on a single JBoss instance on a single processor Windows box with local storage and then deploy on a multi-processor, multi-JBoss and remote storage box and discover that nothing works!

We are now starting to run Solaris 10, which gives us the great Zones feature. Our sys admins have set up a zone on one of our new boxes that is a test/pre-prod environment for the files project. We’ll run something like this:

  • Single Apache 1.3x instance with HAProxy to load balance between the JBoss instances
  • Two JBoss instances both running live rather than a live and a standby
  • For now local storage, but eventually we’ll use our NetApp

The twin live JBoss instances mean that our application will have to be completely stateless. This is a good thing for scalability, but it makes multi-step processes within the application a bit harder as we won’t have a session to store data in. This is usually not a problem for simple applications, but working on something like a multi-step zip upload could be tricky.

The other advantage of having a test instance up and running is that you can start to point very early test users at it (rather than a local instance on your own machine). This gets you some good early feedback/ideas/bug-spotting.


September 25, 2006

Getting a project up and running

Follow-up to New online files project from Kieran's blog

Starting a new project is quite intimidating as you start with absolutely nothing. Before you really get going you’ve got to get the following stuff together:

  1. A JIRA project (this is our great bug tracking software from Atlassian)
  2. A CVS project (gotta backup that code)
  3. A basic project structure in Eclipse (need to ensure you can easily build multiple distributions from a single code base)
  4. An Ant build.xml file to build the project…even though there’s not really much to build yet…there will be soon
  5. All the basic Jar files you’re going to need, such as Spring and the like

Once you’ve got the basic project infrastructure in place, you might actually be able to write some code. Some people might say that you’ll have to write a spec first, but that’s not how we do things. We are very keen to get things out the door, because neither we nor our users really know what they want until they start using stuff. This works well for us as we’re pretty good at being responsive to our users’ needs and can keep the project nice and easy to refactor and change as we go along.

Being a good boy, I’m making sure that I’ve got lots of tests right from the start. This kind of project is basically all about files so the key thing that it would be nice to get right first time is how to model the file system. It is worth spending a bit of time on the really key parts of the system as you could refactor this later on, but you really wouldn’t want to.

Whilst this very early coding and infrastructure work is going on, it is quite hard to have more than one person working on the project. Once there is a bit more meat to the project someone else can start to get a bit more involved and start something like the file download part of the project. In the meantime it’s worth doing some things that can be done in parallel. A couple of things we have going on in the background are:

  1. The visual/graphic design work is starting to be looked at by Hannah
  2. Looking into how we might implement certain file system protocols is being done by Sarah

Although it’s only a couple of days in, I already have some reasonably good code for basic file management and file upload, but not much in the way of a web interface for it yet, except a basic file upload and file listings page.


September 20, 2006

New online files project

After working on Single Sign On and BlogBuilder and various other smaller projects on and off over the last couple of years, I have a big new project to get my teeth into.

Basically we (Sarah and I) are reworking how members of the University can get at their files over the web, and send and receive large files, given the restrictions and problems with emailing large files.

The full scope of the project is not yet known so I couldn’t just list all of the features that the system will have. However, our basic goals are:

  • Upload files to a web based file store (and of course then download them so that you can get at them at home easily)
  • Set permissions on those files (based around our SSO and WebGroups system)
  • Be able to send other users files that you’ve uploaded so that they get a link to the file to download over the web
  • Allow non-Warwick users to send you large files that you won’t be able to get over email

There is a lot more possible detail in these features that we’ve had a think about already, but a lot of the finer decisions are yet to be taken.

We like to do things in a fairly agile way so that we get out working software quite quickly and then rapidly improve it based on testing and user feedback. This means hopefully there’ll be something to see relatively quickly (but don’t expect miracles) and it’ll improve with new versions all the time.

I’ll be writing about our progress here and giving some insights into how projects like this get built here at E-Lab.


September 13, 2006

Implementing the Atom Publishing Protocol

Writing about web page http://www.ietf.org/html.charters/atompub-charter.html

Yesterday I did a deploy of BlogBuilder that includes support for the Atom Publishing Protocol (APP). What this essentially means is that you can use a desktop client, a web service or your own programming to create, edit and delete entries from a Warwick Blog.

We chose the APP because the other blogging APIs out there are all a bit horrible really and the APP is new and shiny and relatively easy to understand and program for.

The implementation was not without its difficulties. For a start, reference clients and servers are very thin on the ground at the moment as the spec is not actually 100% complete (although almost there). This meant a fair bit of time just getting my head around how it worked and sniffing traffic to spot what a working client and server actually did when they talked to each other.

In the end, with the help of the Atomic Client, Tim Bray’s Atom Protocol Exerciser and Elias Torres’s public server implementation I was able to get it all working…and here’s how I did it :)

We already used Rome in places for our Atom/RSS feed creation and parsing. It’s slightly tricky to get it to do everything you need to create an Atom Protocol server as you don’t always need to send and receive whole feeds. Sometimes you’ll just want to parse/create a single entry. This leads to code which manually creates or strips the feed around the single entry so that Rome can then parse it properly… blah.

As we’re a Spring shop here, I used a single MultiActionController to do all of the GET/POST/PUT/DELETE functionality that the Atom Protocol needs. You can easily map incoming requests to the right method with something like this MethodNameResolver:

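import java.util.Map;

import javax.servlet.http.HttpServletRequest;

import org.springframework.web.servlet.mvc.multiaction.MethodNameResolver;
import org.springframework.web.servlet.mvc.multiaction.NoSuchRequestHandlingMethodException;

// Maps the HTTP method (GET, POST, PUT, DELETE) to a handler method name on a MultiActionController
public class HttpMethodTypeMethodResolver implements MethodNameResolver {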

    private Map<String, String> methodMappings;

    public final String getHandlerMethodName(final HttpServletRequest request) throws NoSuchRequestHandlingMethodException {

        String method = request.getMethod().toUpperCase();

        if (methodMappings.containsKey(method)) {
            return methodMappings.get(method);
        }

        throw new NoSuchRequestHandlingMethodException(request);

    }

    public final String[] getSupportedMethods() {
        return methodMappings.keySet().toArray(new String[] {});
    }

    public final Map<String, String> getMethodMappings() {
        return methodMappings;
    }

    public final void setMethodMappings(final Map<String, String> methods) {
        this.methodMappings = methods;
    }

}

Then you just set this up to map your DELETE -> deleteEntry() method and PUT -> updateEntry() method and so on…

Another little thing with Rome is that you have to use the Atom-specific parts of the API; you can’t just use the generic SyndFeed/SyndEntry classes, as they do not convert to/from the exact Atom markup you need. So instead you need to use the Feed and Entry classes in the com.sun.syndication.feed.atom package.

So far I’ve only tested our implementation against Tim Bray’s APE and the Atomic client (which doesn’t send authentication headers when doing POST/PUT/DELETE operations, so we could only really try it out with the GET stuff).

We’ll be trying it out against the new Office 2007 Atom client code later on today, so it’ll be interesting to see if we need to tweak it a bit more (as I think everyone interprets the not-yet-finished standard a little differently).


August 14, 2006

New web sign–on change password screen

Writing about web page https://websignon.warwick.ac.uk/origin/changepassword.htm

I've recently been working on a new system to allow easy, secure and informative password changes on the web.

At the moment, if you are a main Warwick user, you can change your main ITS account password (from our NDS directory) via the managed desktop or on the web via the my.insite portal. In an effort to improve the usability and availability of password management, we decided to create a new single page that sits within the web sign–on project that would allow any user, not just central Warwick users, to change their passwords.

We have a model whereby users that login to web sign–on can come from a variety of sources:

  • Central NDS directory
  • Warwick Alumni service run externally
  • WBS Alumni service
  • WBS NDS directory
  • External user database for Warwick related users

A user does not have to worry about which of these types of user they are, they just login and the system works out where they are from and authenticates them securely at that source. Each of these sources can now optionally incorporate a password change interface that we are plugging into.

In the first instance the page will only allow central NDS users to change their passwords, but over the coming weeks we will add in as many of the other sources as we can.

Changing a password is actually a pretty boring thing really, however, we've made it a bit more interesting by giving some nice visual feedback about the strength of your password so that you can judge how strong your password is and understand why we are not letting you have a password of "letmein".

Change password screenshot

This is done through a fair bit of javascript and a bunch of AJAX calls back to the server to work out if your password is strong enough. Once all of the criteria are met, the "Change password" button is activated and it allows you to change your password.

The required password strength will probably take people a little while to get used to, as it is fairly strict. From the University-approved new password policy:

4.1 Choice of passwords
Passwords should:

  • Be at least 8 characters long.
  • Contain at least three of the following four types of character: letters in lower, letters in upper case, numbers, and symbols (e.g. “£$%^&*).
  • Be changed every six months for a new password (more often for systems requiring greater security).
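A sketch of the length and character-type rules from the policy above (the six-monthly change rule can't be checked this way). This is not the web sign-on implementation, which did its checks via AJAX calls to the server; the class name is hypothetical, and treating every character that isn't a letter, digit or whitespace as a symbol is an assumption.

```java
public class PasswordPolicy {

    public static final int MIN_LENGTH = 8;

    // At least 8 characters, and at least three of: lower-case letters,
    // upper-case letters, digits and symbols.
    public static boolean meetsPolicy(String password) {
        if (password == null || password.codePointCount(0, password.length()) < MIN_LENGTH) {
            return false;
        }
        boolean lower = false, upper = false, digit = false, symbol = false;
        for (int i = 0; i < password.length(); ) {
            int cp = password.codePointAt(i);
            if (Character.isLowerCase(cp)) {
                lower = true;
            } else if (Character.isUpperCase(cp)) {
                upper = true;
            } else if (Character.isDigit(cp)) {
                digit = true;
            } else if (!Character.isLetter(cp) && !Character.isWhitespace(cp)) {
                // Anything that isn't a letter, digit or whitespace counts as a symbol, e.g. $ % ^ & *
                symbol = true;
            }
            // Letters without case (e.g. Chinese) and whitespace only count towards the length.
            i += Character.charCount(cp);
        }
        int types = (lower ? 1 : 0) + (upper ? 1 : 0) + (digit ? 1 : 0) + (symbol ? 1 : 0);
        return types >= 3;
    }

    public static void main(String[] args) {
        System.out.println(meetsPolicy("letmein"));   // false: too short
        System.out.println(meetsPolicy("password1")); // false: only two types
        System.out.println(meetsPolicy("Passw0rd"));  // true: upper, lower, digit
    }
}
```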

In the long run we hope that this will mean that the average password strength is going to go up and this will raise people's awareness of what makes a stronger password and why it is important.


July 24, 2006

Shibboleth and Question Mark Perception

Writing about web page http://www.questionmark.com/uk/perception/index.htm

I spent a fair bit of Thursday and Friday integrating our installation of Question Mark Perception (newly upgraded to version 4) with our Single Sign On system.

Perception does not support Shibboleth out of the box, but it does have a web integration layer called QMWise that allows external systems to push users and other data into the system bypassing Perception's internal authentication.

A project done at Leeds and funded by JISC has created a Java layer (QMShibb) that sits on top of QMWise, which you can then easily protect with Shibboleth or, in fact, any Single Sign On system you like.

QMShibb - Shibboleth enabling Questionmark Perception
QMShibb - Installing Tomcat with IIS 6
QMShibb - Installation, configuration and testing

With the help of a Perception consultant we got this working with our own Shibboleth based Single Sign On system sitting on top of the QMShibb java layer. So, hopefully we'll very soon be able to do a lot more with Perception as the old user management overhead has now more or less gone away.

To truly minimise the admin overhead, more work should be done with QMWise so that user groupings get pushed into Perception, but the removal of username/password issues will be a great time saver.


June 19, 2006

SpringOne conference

Writing about web page http://www.springone.com/

I spent last Wednesday, Thursday and Friday in Antwerp attending the SpringOne conference.

It's the first conference dedicated to the Spring Framework, which is what we use to develop the majority of our applications here in e-lab.

Chris blogged every session he went to so I won't go over those again.

For me the most interesting parts that came out of it were:

  1. New features of Spring 2.0, explained a lot better than in the documentation (which is improving, but I don't think it covers Spring 2.0 well enough yet)
  2. Discussion of rich domain models was interesting (which we've tried to do for a while), but everyone was talking like it was all new. There are a couple of new methods for achieving this with AOP or Spring 2.0's new @Configurable annotation
  3. Productivity improvements with Spring MVC, primarily the move towards convention over configuration (a big nod to Ruby)
  4. New namespace support in Spring 2.0 that will hopefully greatly reduce the reams and reams of XML that you always have to create and maintain
  5. AOP/AspectJ support: it was there in Spring before 2.0, but it is much more powerful and better documented now. It was great to see real-world examples, even if they did sometimes take it too far
  6. Voca's presentation about how they had up to 500 people working on their new Spring based application that basically runs the UK's bank transfer systems, 100,000,000 transactions in 4 hours!!!


Powered by BlogBuilder
© MMXXIII