All 13 entries tagged Web


June 25, 2008

hCalendar annoyances


The BBC have announced that they’re going to stop using the hCalendar microformat in their markup, because of accessibility issues with the way that hCal uses the <abbr> HTML element.

Back up a minute. The what now?

hCalendar is a microformat: a way of embedding machine-readable data into HTML with a minimum of fuss.

So, I might start with a bit of HTML like this:

<h2>Exciting Meeting</h2>
<span> July 12th, 10am-11am</span>
<span> Meeting room 3</span>

... and then I add a little bit of extra markup, which doesn’t affect the display of the element*, but just adds some supplementary information:

* ahem. As we’ll see later.

<div class="vevent"> 
<h2 class="summary">Exciting Meeting</h2>
<span><abbr class="dtstart" title="2008-07-12T10:00:00+01:00"> July 12th, 10am</abbr>-<abbr class="dtend" title="2008-07-12T11:00:00+01:00">11am</abbr></span>
<span class="location"> Meeting room 3</span>
</div>

Now, I’ve got something which can be machine-read and unambiguously converted into an event. There’s a nice little Firefox plugin called Operator which can spot this kind of markup and offer to add it to Google Calendar (or other calendars). There are similar microformats for contact information, reviews, licenses, and a bunch of other stuff.

Now, this is exactly the kind of low-ceremony, small-s-semantic-web technology that I like. It’s easy to get into, and you can bung it onto all of your pages. It’s easy to consume; you can write parsers for it in a few lines of code. It doesn’t really matter if only a few people use it, because there’s almost no cost to implementing it.
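To illustrate just how little code a consumer needs, here’s a rough Java sketch that pulls the summary and start time out of a chunk of hCalendar markup with regular expressions. The class and method names are my own invention, and a real consumer would use a proper HTML parser rather than regexes; this is just to show the scale of the job.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HCalSketch {
    // Naive patterns: good enough for the markup above, not for arbitrary HTML
    private static final Pattern SUMMARY =
            Pattern.compile("class=\"summary\"[^>]*>([^<]+)<");
    private static final Pattern DTSTART =
            Pattern.compile("class=\"dtstart\"\\s+title=\"([^\"]+)\"");

    public static String summary(String html) {
        Matcher m = SUMMARY.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static String dtstart(String html) {
        Matcher m = DTSTART.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<div class=\"vevent\">"
                + "<h2 class=\"summary\">Exciting Meeting</h2>"
                + "<abbr class=\"dtstart\" title=\"2008-07-12T10:00:00+01:00\">"
                + "July 12th, 10am</abbr></div>";
        System.out.println(summary(html));   // Exciting Meeting
        System.out.println(dtstart(html));   // 2008-07-12T10:00:00+01:00
    }
}
```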


The use of the <abbr> element to hold machine-readable dates is contentious, to say the least. It’s technically correct (or at least arguable) to say that “11am” is an abbreviation of “2008-07-12T11:00:00+01:00”, but there are two downsides: one big and one small.

The small downside is that most browsers will render the title attribute as a pop-up when you hover over the <abbr>. Since the content of the title is pretty much gibberish to your average user, this is unhelpful.
A much bigger deal, though, is that many screen readers will read out the title as a big string of numbers. Here’s the market leader, JAWS, on IE7 reading the markup

<abbr class="dtstart" title="20070312T1700-06">
 March 12, 2007 at 5 PM, Central Standard Time
</abbr>

(taken from hAccessibility)

- and this, AIUI, is what has driven the BBC away from hCalendar. Embedding hCal might be beneficial for the small number of users that have Operator installed, but for anyone using JAWS it’s a massive inconvenience.

Now, we use hCalendar ourselves; if you go to a Sitebuilder Calendar page (like this one) and choose “Agenda view” from the little drop-down on the RHS, you’ll see Operator spring into life and spot all the events.

“Why do I have to choose ‘Agenda View?’” I hear you ask – and this exposes another quirk of hCalendar. hCalendar is supposed to add markup to visible content, and since our grid-based calendars don’t display the date of an event (dur, that’s what the grid’s for) there’s not enough markup for Operator to latch onto. Annoyingly, there is complete hCalendar markup for each event in the page’s HTML, but it’s hidden (set to display:none) and only shows as a pop-up. Operator, by default, ignores non-visible markup, and isn’t smart enough to re-parse when it becomes visible. (There’s a preference setting to display hidden events, but it’s off by default).

So, I’m now asking myself: should we consider dropping our hCalendar support? I view the BBC’s web standards as a good measure of pragmatism vs. idealism, and since we’re both kind of in the same space (we’re both non-profit-making institutions, supposedly acting for the benefit of all), I generally hold that what’s good for them is good for us too. I’ve never had a complaint from a JAWS user that they couldn’t use our calendars, but then I’ve never had feedback from an Operator user that they loved our embedded hCal either. I personally find it very useful, but then if I designed all our systems only to meet my needs, they’d be rather different (and I’d be looking for a job…)

Hopefully, the microformats community will work out a suitable solution. This problem has been known about for well over a year now, and largely ignored, but perhaps the BBC’s stance will prove to be enough to bring about an improvement. If not, I fear that until HTML5 and the <time> element become widespread, hCalendar is not going to gain much acceptance :-(

October 05, 2005

How are newlines represented in an HTML textarea ?


OK, this is just a blatant attempt at stuffing a result into Google, since it took me a while to find this out. I hope someone else finds it useful.

If you have an HTML text area, and you want to parse the content of said textarea as a list of lines, you might be wondering what character you need to look for to indicate a line break. \n? \r\n? That weird thing that Mac OS 9 does?

Contrary to what you might expect, the answer is not "it depends on the platform". The merciful W3C (blessed be their spec.) decreed it to be constant regardless of what hare-brained operating system choice your user has made.

If your form's enctype is application/x-www-form-urlencoded (the default), then the answer is '%0D%0A', i.e. a URL-encoded CRLF. If the enctype is multipart/form-data, then it's an unencoded CRLF. This behaviour, it turns out, is inherited from the MIME specification.

Since any binding library worth its salt will decode the URL encoding for you, the string that you should look for is CR-LF. This is not the same as the default line separator (LF) on any unix box – if you use that, you'll end up with a spurious \r character at the end of each line of parsed text.
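To make that concrete, here's a minimal Java sketch (the class and method names are mine): split the decoded value on CRLF, and note what goes wrong if you split on a bare LF instead.

```java
public class TextareaLines {
    // Textarea values arrive with CRLF line breaks regardless of platform,
    // so split on "\r\n"; the limit of -1 keeps any trailing empty line
    public static String[] lines(String textareaValue) {
        return textareaValue.split("\r\n", -1);
    }

    public static void main(String[] args) {
        String submitted = "first\r\nsecond\r\nthird";

        for (String line : lines(submitted)) {
            System.out.println("[" + line + "]");
        }

        // Splitting on "\n" alone leaves a spurious \r on each line:
        String first = submitted.split("\n")[0];
        System.out.println(first.endsWith("\r"));   // true
    }
}
```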

October 04, 2005

Http Digest Authentication

You may want to skip this entry; unless you're interested in HTTP it won't be terribly interesting. I'm testing out the theory that writing something down is a good way to understand it, since this is about the third time I've tried to get Digest Auth to stick in my head.

So, here goes:

Digest authentication (spec) is one of the standardised HTTP authentication mechanisms. It was designed to protect against some of HTTP Basic's more egregious failings (such as the fact that it passes the user credentials in plain text).

At its most basic, it works as follows:

  • Client requests a resource which is protected.

  • Server responds with HTTP 401, and the header line
        WWW-Authenticate: Digest realm="Some realm", domain="/urlspace",nonce="long_random_string" 
  • Client re-submits the request, this time with an additional header
       Authorization: Digest username="user",
realm="some realm",
response="md5 hash of username, pwd, uri, method, and nonce"
  • Server verifies hash and serves response.

The optional qop (Quality of Protection) parameter in the WWW-Authenticate header specifies which additional safeguards are to be used. If the server specifies qop (either auth or auth-int), the client must also return a cnonce (client nonce) value, which is mixed into the hash; this stops a malicious server or proxy from choosing a nonce value designed to make cracking the hash easier. qop=auth-int additionally includes a hash of the request body, giving a degree of integrity protection.

Most implementations refine this by using a nonce that varies with time. However, there are some performance issues to consider here: if you vary the nonce on every request then parallel (pipelined) requests become impossible, and since virtually every browser now supports pipelining, this would have a fairly serious impact. The nc (nonce-count) attribute in the Authorization header lets a nonce be reused safely – the server can detect replays by checking that the count always increases – while the server can still periodically supply a fresh nonce for further use.
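For concreteness, here's a Java sketch of the hash in step 3, in its original pre-qop form (with qop, the cnonce and nonce-count get mixed in too). The credentials and nonce are purely illustrative:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestResponse {
    // Hex-encoded MD5, as Digest auth requires
    static String md5Hex(String s) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(s.getBytes(StandardCharsets.UTF_8));
        return String.format("%032x", new BigInteger(1, digest));
    }

    // Original (no-qop) calculation:
    //   response = MD5( MD5(user:realm:password) ":" nonce ":" MD5(method:uri) )
    static String response(String user, String realm, String password,
                           String nonce, String method, String uri)
            throws NoSuchAlgorithmException {
        String ha1 = md5Hex(user + ":" + realm + ":" + password);
        String ha2 = md5Hex(method + ":" + uri);
        return md5Hex(ha1 + ":" + nonce + ":" + ha2);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Illustrative values only
        System.out.println(response("user", "Some realm", "secret",
                "long_random_string", "GET", "/urlspace/page"));
    }
}
```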

Digest authentication more or less completely disables proxy caching, unless the response is marked as 'Cache-Control: public' or 'Cache-Control: must-revalidate' (in the latter case, the cache must revalidate with the origin server before serving the response to the client).

Whilst digest authentication does provide reasonably good protection of user credentials, and (with a sufficiently short-lived nonce) can also prevent replaying of requests, it does nothing to protect against packet-sniffing to extract content. For this, HTTPS is required. (In which case, many of the motivations for not using Basic go away.)

Client support for digest authentication is good in modern browsers, but pretty shonky in V4 and earlier user-agents.

So, in summary:

  • HTTP Basic BAD
  • Digest Better
  • HTTPS + Basic good
  • HTTPS + Digest not appreciably better.

September 26, 2005

Logging request durations in jboss/tomcat

Step 1: Find the tomcat config file. For tomcat 4 this is in


for tomcat 5 it's


Step 2: change this

<Valve className="org.apache.catalina.valves.AccessLogValve"
                        prefix="localhost_access" suffix=".log"

to this

<Valve className="org.apache.catalina.valves.AccessLogValve"
                        prefix="localhost_access" suffix=".log"
                        pattern='%h %l %u %t %D "%r" %s %b' directory="${jboss.server.home.dir}/log"/>

n.b. in jboss 3.2.7 you'll need to uncomment the valve as it's disabled by default.

Easy. A few lines of {insert scripting language of choice} can then give you a 'maximum resource consumer' type report.
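As a sketch of such a report in Java, assuming the pattern above – where %D, the request duration in milliseconds, is the first token after the ']' that closes the %t timestamp. The class name and log filename are mine:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class SlowestRequest {
    // Pull %D out of a line logged with '%h %l %u %t %D "%r" %s %b':
    // it's the first token after the "] " closing the timestamp
    static long durationMillis(String line) {
        int bracket = line.indexOf("] ");
        if (bracket < 0) return -1;
        String rest = line.substring(bracket + 2);
        try {
            return Long.parseLong(rest.split(" ", 2)[0]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        String slowest = null;
        long max = -1;
        try (BufferedReader in = new BufferedReader(
                new FileReader("localhost_access.log"))) {
            String line;
            while ((line = in.readLine()) != null) {
                long d = durationMillis(line);
                if (d > max) {
                    max = d;
                    slowest = line;
                }
            }
        }
        System.out.println("Slowest request (" + max + "ms): " + slowest);
    }
}
```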


So, first day of term. I won't be getting too much coding done today. Instead, I'll be spending the day with one eye stuck to the green-screen (our application performance monitor) and the other on ganglia (the server monitor), to see how the uni web server stands up to the first day of term. This is the first time we've served the home pages from Sitebuilder, so there's a lot of extra load compared to previous years.
So far, not too bad. Request times are a bit slower than usual for logged-in users, but we're handling about 1200 page impressions (about 3500 hits) per minute at the moment, and it seems to be OK.
By about Wednesday, if past experience is anything to go by, I'll have regained my faith in the server enough to concentrate on other stuff for more than 15 minutes at a time (assuming it doesn't break in the meantime) …

September 07, 2005

How to watch HTTP headers

I suspect I am the only person who will ever need this, but I'm sufficiently pleased with it to make a permanent record here:

will print the HTTP headers for each request/response pair, without any of the HTML guff (which, let's face it, is rarely interesting).

The awk trickery came from this useful reference page

postscript: textile doesn't seem to be making a very good job of rendering the hat characters. I'll see if I can entify them or something…

post-postscript: HAH! I spurn textile. from now on I'm going to make all my posts in .png format. maybe.

September 06, 2005

Reducing turnaround time in web-app development

So, inspired by a brief dalliance with Ruby On Rails, I started to think about how I could cut down the turnaround time for making changes to web apps written in Java. Now I start to think about it, it's really a pretty bad situation.

Take the new search application, for instance. If I make a change to a class, then it takes about 10 seconds to run the ant build, another 10 for the app. to deploy (basically, for the .war file to unpack), another 10 for the app. to start (spring config parsing and hibernate initialisation time), and another 3–4 seconds for the JSPs to compile. Same for a change to a JSP, a config file, or in fact more or less anything inside the .war. Add to that the fact that, since Spring+Hibernate+Jboss leaks memory on a redeploy, every 5 re-deploys needs a complete restart of the app server (about 1 minute). So for 10 changes that's 10 * 35 seconds + 2 * 1 minute – just about 8 minutes of thumb-twiddling. Not Good Enough.

Now, an awful lot of this is just unnecessary, and has arisen largely from inertia. So, step 1: get rid of JBoss, install Tomcat. Server restart time goes from 1 minute to about 10 seconds, and the server seems to leak much less memory – it only needs a restart every 20 or so redeploys.

Step 1.1: Fiddle about for ages trying to work out how the frig you're supposed to get a datasource to work and register in JNDI. For future reference, this is how to do it with tomcat 5.5

– create a file in ${CATALINA_HOME}/conf/Catalina/localhost/ called {context}.xml

– make it look like this

<Context path="/search" docBase="${catalina.home}/webapps/search" 
        debug="5" reloadable="false" crossContext="true">
    <Resource name="jdbc/searchDS" auth="Container" type="javax.sql.DataSource"/>
</Context>
(n.b. this makes the datasource available at java:/comp/env/jdbc/searchDS)

Step 2: Tweak the project structure slightly so that within the project there's a deployable exploded war file. This isn't too hard, using eclipse's ability to specify multiple build directories – just make sure the class files etc. are properly excluded from CVS. Symlink the war directory into tomcat's webapps dir.

– I now have the ability to make changes to JSPs and the like and have them immediately update (+/- 5 seconds for recompilation) and we can redeploy the entire app. using the manager web application in about 10 seconds (no need to run ant or unpack the war file)

Step 3: Run tomcat in debug mode from within eclipse. You can set this up by hand, or use the Sysdeo tomcat plugin. I did the latter because it's less configuration effort. This allows you to use the JVM's HotSwap support, via eclipse, to make some kinds of changes to classes on the fly, without needing a redeploy. You still have to redeploy for a change to a config file, or for an incompatible class change (basically, adding new fields or non-private methods), but it cuts out about 30% of the redeploys. Incidentally, it's important to mark your context as reloadable="false", else tomcat will watch for class changes and re-deploy the whole app (10 seconds) every time it sees a change, rather than allowing it to be hotswapped (which takes no discernible time).

So, the result is that my 10 changes are down from 8 minutes to about 1 minute – a worthwhile saving, I think, and it's probably already saved me one of the two hours it took to set up (mostly sorting out the wretched datasource).

The next challenge, though, is to take an application that's a bit more complicated than search (which is, after all, just a single web app. with half-a-dozen persistent classes). I'd really like to get the same kinds of benefit for Sitebuilder, because there's much more to gain there. A typical sitebuilder change requires a 2–3 minute ant build (inc. XDoclet re-generation of EJBs, checkstyle tests, and 200-odd unit tests) and deployment of 1 JMX SAR, 1 EJB JAR, and 1 WAR - about 1 minute from start to end, plus it requires an app server restart every few deploys. Sucky. As you can imagine, we rely a lot more on unit tests rather than redeploying the app constantly, but it's not an ideal solution. In the medium term, the in-progress sitebuilder rewrite will fix this (by ditching all the EJB crap), but in the short term we could save a lot of work by

– a smarter build that skipped unnecessary steps (like XDoclet regeneration when the sources haven't changed)
– deploying exploded packages, and fixing the classloader configs so that we can selectively redeploy just the .WAR, or just the .WAR + ejb-jar
– symlinking the exploded .WAR back into the project to allow JSP changes on-the-fly
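The first of those is straightforward with Ant's <uptodate> task; something like the following (the target names and stamp file are my own invention) would skip the XDoclet step whenever no source has changed since the last generation:

```xml
<!-- Set xdoclet.uptodate if no source is newer than the generation marker -->
<target name="check-xdoclet">
    <uptodate property="xdoclet.uptodate" targetfile="build/.xdoclet-stamp">
        <srcfiles dir="src" includes="**/*.java"/>
    </uptodate>
</target>

<!-- Only runs when xdoclet.uptodate was NOT set by the check above -->
<target name="xdoclet" depends="check-xdoclet" unless="xdoclet.uptodate">
    <!-- ... existing XDoclet invocation ... -->
    <touch file="build/.xdoclet-stamp"/>
</target>
```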

update For my own future enlightenment, here's how to do the equivalent datasource stuff in tomcat 5.0.x

<Context path="/sitebuilder2" docBase="${catalina.home}/webapps/sitebuilder2"
        debug="5" reloadable="false" crossContext="true">
    <Resource name="jdbc/CmsDS" auth="Container" type="javax.sql.DataSource"/>
    <ResourceParams name="jdbc/CmsDS">
    </ResourceParams>
</Context>

September 01, 2005

Can you do big applications in little languages?

Follow-up to Ruby vs Java from Secret Plans and Clever Tricks

In a comment on my previous post, Jon said

My overall impression of Ruby on Rails is that it might be good for getting things going quickly, but it's bad for building large, stable, maintainable systems

I think this is interesting enough to examine in more detail. Though before I start, I'd better add a disclaimer: I'm a Java programmer, and whatever I might say in the rest of this entry, I'm likely to remain one for the foreseeable (I hope!)

Anyway, on with the show. I think you can take two approaches to the statement above. Approach 1 is the easy one: Sure, you wouldn't write an airline reservation system in RoR, any more than you'd use J2EE to munge the output of top, but how large is large? Does any UK university have a bespoke web system which is too big to manage in RoR ? Sometimes, Java programmers can be guilty of treating every application as if it was the flight control system for a 747, when it's really just CRUD for 3 database tables.

…which leads me to approach 2, the more interesting approach. What are the limiting factors for an app. written in RoR? Or PHP+{Cake/Biscuit/Mojavi/etc.}, since it shares many of the same characteristics ?

I think that scalability in terms of performance is a complete red herring. There are any number of mahooosive apps running on LAMP architectures – tens or hundreds of millions of hits a day. There are fewer for Rails, in part because it's much newer technology, but I don't see anything there that makes me think it wouldn't scale in the same way.

Similarly stability, at least in terms of uptime. PHP and Rails' shared-nothing, sessionless architectures actually (ISTM) make it easier to provide resilience in the form of load-balanced servers, and the periodic cycling of httpd workers makes worries about memory leaks and the like much less of a big deal. Again, looking at the real world there are loads of LAMP sites whose application uptime is up above 99.99%; I'd contend that there are very few web applications with an uptime requirement that couldn't be met with a scripting language-based architecture.

So we're left with questions of maintainability, which is where it gets interesting. There's absolutely no doubt in my mind that there are some awful bits of PHP out there running stuff on the internet. Not least because I've seen and had to clear up some of it. But there's some bloody terrible Java too. And when you consider Ruby, the picture gets even muddier; Ruby is at least as OO as Java, if not more. There's nothing inherent in Ruby that's any more likely to make you write crap than there is in Java.

At the end of the day, maintainability is, ISTM, a people issue and not a language issue. And this may be one area where Java scores. Because the barrier to entry for Java apps is higher, (a) the average coding skill level is higher, and (b) the average team size (and thereby the probability of at least one good coder being involved in the project) is higher. But that just suggests that the same team ought to be able to produce equally maintainable code, regardless of platform – so they should choose whichever one makes the job easier.

There are a few things that do weigh heavily in Java's favour though. Decent tooling (IDEs, build tools, etc.) and, arguably more importantly, high quality libraries for core stuff like socket malarkey, threading, unicode handling and XML parsing.

Nonetheless, I'm skeptical of the claim that a scripting language "can't do" big complicated applications. It feels a little bit like something that (to paraphrase Cal Henderson) "is said to be true because it would be good if it was true".

Still, I'm sticking with Java. It's like a favourite jumper; sure it might be a bit scratchy, and some of those new t-shirts the cool kids are wearing sure look good, but I can't quite bring myself to risk getting caught out in the cold :-)

August 31, 2005

Ruby vs Java


Actually, it's not a Ruby vs. Java post as such, if you want a language p*ssing contest you can look here.

However, following on from last week's Flickr event, I've been devoting a bit of time to thinking about the alternatives to our current J2EE development environment, and whether we can learn anything from them. I couldn't quite bring myself to try PHP, but Ruby seemed like a suitable point of comparison.

So… Language-wise, Ruby is quite nice. It's properly OO, dynamically typed, with a reasonable exception system. Using begin/end instead of { and } makes my toes curl a bit, but at least it's optional.

Rails is to Ruby as (approximately) JSP, Spring & Hibernate (or JDO & JSF) is to Java: an MVC-ish framework, a templating language and a persistence framework. It's really easy to do basic CRUD in; the framework does most of the work for you and there are code-generators to get you started. However, if you want that sort of thing in Java you can have it, with something like appfuse.

The (apparent) lack of a decent IDE is aggravating; I've got pretty used to just banging on ctrl-. ('fill the next bit in') and ctrl-1 ('fix this error') in eclipse, and having to go back to vi was a bit of a slap in the face. ISTM that this is one of the big disadvantages of a dynamically typed language. But the tradeoff is the instant deploy: change code, hit refresh, view results. I'd forgotten how efficient that makes things; I must try and get that working in eclipse again. This is especially a problem with Spring and Hibernate, both of which take ages to post-process a deploy for various reasons.

For my next trick, I'm going to try and do something which isn't quite standard CRUD, to see if Rails is trading off flexibility for ease-of-use or not.

August 25, 2005

Deploy every 30 minutes: Redux

Follow-up to Release once every half hour from Secret Plans and Clever Tricks

So, it's true, they really do deploy (up to) every half an hour. Pretty cool. In answer to the two questions from the previous entry:

  • Why do they do it?

Because they can. Because, for them at least, 'Release early, Release often' doesn't become any less effective, for any value of often. The smaller and quicker the releases, the less chance of regression, the faster features get to users, and the sooner feedback comes back to the team. Basically, they release pretty much every feature and bug-fix as soon as it's complete – they don't really bother with 'batching' releases like we do.

  • How do they do it?

There are a number of tricks:

– A true one-click build and deploy.

For most types of changes, the implementation is nothing more than a simple rsync from CVS HEAD onto the server farm. Sometimes an apachectl graceful, a squid/memcached flush or a restart of the various daemons is also required, but it's a zero-downtime thing even under peak loads.

– A similar one-click rollback to any previous version if it all goes wrong.

– A good separation of layers in the application to minimise the likelihood of collateral damage from a change.

– A small team with pretty complete knowledge of the code that they're updating. (2 java programmers, 2 PHPers, a designer and a front-end guy)

– A component-based application infrastructure with clear interfaces between components. Most application logic is coded in PHP, with cacheing provided via Squid and memcached, and key long-running daemon processes in java. Individual components can be redeployed with comparatively little risk to other components, so long as interfaces aren't modified.

Surprisingly, there's not much weight put on automated testing. There are automated test suites for some key black-box components (the email parser, for instance) and a few functional tests (written using Perl's WWW::Mechanize), but mostly they rely on developers testing on the staging server (though they now also have the services of Yahoo's 'surfers' user-testing department).

Well. Quite frankly I'm jealous. Not that I don't think we could do it too, but it would be a lot of work to get there from where we are now. ISTM, also, that the benefits of this set-up are a bit like doing XP; until you actually get everything working together you don't see much benefit, but once you do, suddenly it changes all the rules.
