All entries for Friday 16 July 2010

July 16, 2010

HTML5 History API – an accident waiting to happen?

Writing about web page http://www.w3.org/TR/html5/history.html

HTML5 has a very fancy new feature that allows pages to manipulate the browser’s history via JavaScript.

This is designed as a replacement for all those horrible hacks that use #fragments at the end of URLs to hold state on pages that use AJAX to load content. However, as currently implemented, it looks like a bit of a disaster for sites with a lot of editors.

window.history.replaceState(data, title [, url ] ) 

allows you to replace the current entry in the history with an arbitrary URL. Flickr use this in their Lightbox interface: if you go to http://www.flickr.com/photos/chrismay/4797065896/in/photostream/lightbox/ in a browser that supports it (Chrome or Safari), you can see that “next/previous” change the URL bar, but if you look at the network traffic you’ll see that there’s never a request for the new URLs, nor is there any kind of loading event fired.
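As a rough illustration of the pattern, here is a minimal sketch of lightbox-style navigation. It is not Flickr’s actual code: the function name showPhoto, the render callback and the /photos/example/ path are all made up for this example, and the history object is passed in as a parameter so the sketch can be exercised outside a browser.

```javascript
// Sketch of lightbox-style navigation: update the page, then rewrite the
// address bar without asking the server for the new URL.
// `historyLike` is expected to behave like window.history; `render` is a
// hypothetical callback that swaps the visible photo.
function showPhoto(historyLike, render, photoId) {
  // Hypothetical URL scheme, chosen only for illustration.
  const url = '/photos/example/' + encodeURIComponent(photoId) + '/lightbox/';
  render(photoId);
  // Replaces the current history entry in place; the browser does not
  // navigate, so no request is made and no load event fires.
  historyLike.replaceState({ photoId: photoId }, '', url);
  return url;
}

// In a browser that supports the API, usage might look like:
//   showPhoto(window.history, function (id) { /* swap the image */ }, '4797065896');
```

The only call that touches the browser is replaceState; everything else is ordinary page scripting, which is why the URL change is invisible in the network traffic.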

What this means is that if I can write JavaScript on a webpage, then I can change the apparent URL of that webpage to whatever I want, whether or not that page really exists.

Per the spec, I can change the path, query string and fragment of the URL to whatever I like, so long as the new URL stays on the same origin (the same protocol, host and port).
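The constraint above (same host and protocol) can be approximated by comparing origins. The following is a sketch rather than the spec’s exact algorithm: isAllowedHistoryUrl is a hypothetical helper name, the blog entry URL is invented, and the comparison relies on the standard URL class, treating “same origin” as a stand-in for the browser’s own check.

```javascript
// Approximates the browser's check on the URL argument of replaceState():
// the new URL must share the current page's origin (scheme, host and port).
function isAllowedHistoryUrl(current, candidate) {
  const base = new URL(current);
  let target;
  try {
    // Relative URLs are resolved against the current page.
    target = new URL(candidate, base);
  } catch (e) {
    return false; // not a parseable URL at all
  }
  return target.origin === base.origin;
}

// Hypothetical blog entry URL on the same host as the fake login page.
const page = 'http://blogs.warwick.ac.uk/example/entry/some_post/';

isAllowedHistoryUrl(page, 'http://blogs.warwick.ac.uk/login');  // true
isAllowedHistoryUrl(page, 'https://websignon.warwick.ac.uk/');  // false
```

So the check stops a page from impersonating a different host, but it does nothing to stop one author’s page from borrowing any other path on the same site.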

So, if I wanted to, I could quite easily write a blog entry which did the following:
  • Publish a tinyURL link to a blog post I’d written
  • Change the URL on that post to “http://blogs.warwick.ac.uk/login”
  • Replace the page’s content with something that looked a bit like Web Sign-on, and said “Sorry, your session has expired. Please re-enter your password”
  • Collect usernames and passwords, and do evil stuff.

Now, smart people like you and me would spot that all logins to WarwickBlogs happen via https://websignon.warwick.ac.uk, and thereby smell a rat. But the vast majority of users wouldn’t pick up on this; an official-looking URL and a familiar-looking page would be enough to catch them.

Or how about this: I make a page on the university website that looks like the official news page, and says “University closed today due to zombie attack”. Then I use history manipulation to change its URL to http://www2.warwick.ac.uk/newsflash, and put out a Twitter message along the lines of “RT @warwickuni: University closed today http://is.gd/uowzombies”. No-one comes to work and the university is in trouble.

It’s clear to me that this API is a potential problem for sites like ours, where we have large numbers of contributors (about 5000 active editors in our case) able to create content that includes JavaScript. At present, the only defence is a policy-based one; anyone who chose to take advantage of these features would rapidly find themselves with a great deal of free time to re-read our A.U.P. and Computer Misuse regulations*. But if I were in charge of the HTML5 spec, I’d like to add in a few additional constraints:

  • I’d like a way to turn off history-manipulation on a domain-by-domain basis. Some kind of HTTP header would be fine, or alternatively a file under the root a la robots.txt (“features.txt” if you will) that could offer advice to browsers about whether or not they should enable features like this one.
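To make the “features.txt” idea a little more concrete, here is a sketch of how a browser might read such a file. Everything in it is an assumption: no such file or format exists, the “Disable:” directive and the feature name “history-manipulation” are invented for this example, and parseFeaturesTxt and isFeatureEnabled are hypothetical helper names.

```javascript
// Parses a purely hypothetical features.txt. Assumed format (invented here):
//   # comment
//   Disable: history-manipulation
// Returns a Set of lower-cased feature names the site has asked to disable.
function parseFeaturesTxt(text) {
  const disabled = new Set();
  for (const rawLine of text.split(/\r?\n/)) {
    // Strip trailing comments and surrounding whitespace.
    const line = rawLine.replace(/#.*$/, '').trim();
    if (line === '') continue;
    const match = /^disable\s*:\s*(\S+)$/i.exec(line);
    if (match) disabled.add(match[1].toLowerCase());
  }
  return disabled;
}

// A feature is enabled unless the site has explicitly disabled it.
function isFeatureEnabled(disabled, feature) {
  return !disabled.has(feature.toLowerCase());
}

const policy = parseFeaturesTxt('# Site policy\nDisable: history-manipulation\n');
isFeatureEnabled(policy, 'history-manipulation'); // false
```

Defaulting to “enabled” mirrors robots.txt, where the absence of a rule means no restriction; a site like ours would opt out explicitly.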

Most egregious of all is that this breaks a very well-established principle of browser behaviour: that you can’t change the URL without reloading the page. A lot of sites have based a lot of security assumptions around this principle, and since Chrome and Safari implement these APIs against any doctype (not just HTML5), those assumptions are all now broken.

Ah well. I look forward to seeing how this plays out on large sites like ours. Who’s going to get caught out first?

* You Have Been Warned. Don’t be silly.

