New Search Interface
Writing about web page http://search.warwick.ac.uk
For the past month or so I’ve been working on a redesign and reworking of the University of Warwick’s search engine at search.warwick.ac.uk. This has involved a complete rewrite of the view model for the search engine, a whole lot of CSS and HTML, and banging my head against a brick wall trying to fathom IE’s bizarre rendering engine (I don’t even see it as a bad rendering engine anymore, just a strange one that doesn’t seem to do anything the way you’d expect).
Away from the new shiny monsters that work on the surface of Planet Search, there have been a few conceptual changes that we’ve worked in to try to make the service more useful to users:
One search to rule them all
When search was originally released, as a replacement for an Atomz search service we used to run, it indexed two types of content – blog entries from Warwick Blogs and static pages from Warwick’s web site. This was an improvement on the Atomz search engine because it allowed us to personalise search results for users – users were searching a personalised result set of the content that they could see on the web site and Warwick Blogs, rather than what the whole world could see.
However, the two search pages were exposed as different services (although they were combined into a single index under the hood), with different visual styles, and each was confined to its own search target. As part of the new design, we wanted to move this concept towards a hub of information – if someone is searching for cycling on the University’s web site, for example, it is massively beneficial for them to also be able to see the extensive entries on Warwick Blogs about cycling.
Since we now also index people at Warwick and exam papers, this has other big wins when people search from the top-right search box on Sitebuilder (the Warwick web site) or another enabled application. When people search from the blogs homepage, a lot of the time they’re not searching for a particular entry but for a particular person’s blog. Searching for Mat Mannion from the blogs homepage, for example, shows that there is a single search result in “People” as well, which then links to my blog. All are happy.
Help people get to what they should be finding
The Nielsen Norman Group’s (NN/g) report on search usability suggests that a search engine should create “custom results pages” if the query:
- occurs very frequently
- is a general and vague term that indicates the user needs an introduction
- generates too many results
As part of the new design we’ve added a lot of “helper” content at the top of results for most of our popular terms. In general, if you search for a department you will find that we now place links to our new Interactive campus map at the top of results, on the basis that a lot of people looking for the department will want to know how to get there. We also leverage some of the content that we have indexed for a while but don’t expose very well, for example when you search for the arts centre you not only get a map and a search box to search the Arts Centre’s website directly, but also a list of what’s on at the arts centre today (which we leverage from an RSS feed the arts centre publishes).
Sometimes this can be a real benefit when there isn’t really a page that satisfies the user’s needs, or the most popular use cases aren’t handled well by the page. A good example of this is the Health Centre – the vast, vast majority of users who search for the health centre want to know either:
- where it is,
- what the opening hours are, or
- what the contact details are
By providing this information right at the top as the first thing the user sees, the user doesn’t need to go to the Health Centre website, which is obviously one for the win column. Another example of this is a search for the panorama room (where a lot of exams happen, so very popular at this time of year), where we give a hint as to where the Panorama Room is and a map showing the location of the building.
Leverage our increasing array of audio/visual content
One of the things we’ve tried to do in the new design is to try and showcase the array of audio (mp3) and video (flv) files we have uploaded to the Warwick web site. We have two professionally made sources of audio and video content in Warwick Podcasts and Warwick iCast and it’d be really nice to showcase it. Video on the web is obviously big at the moment, so we’ve made a big effort to present it in a nice way!
Technical Guff
For those who are interested, some nice technical stuff that came out of the project…
Weight recently updated documents higher
Since there is a lot of old content on Sitebuilder, we wanted to try and ensure that the more recently updated content appeared higher in search results (although this is only a portion of the scoring, so even the oldest content in all the land will appear at the top if it is what the user is specifically looking for). While it is fairly easy to do this linearly (i.e. to score between 0 and 1 between the oldest content and the newest content, with content that is 70% of the way from oldest to newest scoring 0.7), we decided it would be beneficial to take a slightly more weighted approach, where documents that were updated, say, within the last year didn’t have their score reduced from 1.0 too much.
Enter, cosines!

We used a FunctionQuery for this, comme ça:
FunctionQuery fq = new FunctionQuery(
    new ReciprocalFloatFunction(
        new ReverseCosineFieldSource(DocumentFields.LAST_MOD_DATE), 2, 500, 500));
fq.setBoost(20.0f);
masterQuery.add(fq, Occur.SHOULD);
(You’ll have to play with the boost and check the output of searcher.explain(query, doc) to find the sweet spot.)
Code: FunctionQuery Lucene Extension
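To make the difference between the linear and the cosine-shaped weighting a bit more concrete, here is a rough sketch. It is not the code behind ReverseCosineFieldSource (that isn't shown in this post); the class name, the method names and the quarter-wave cosine shape are all assumptions made for illustration. The reciprocal method follows the a / (m * x + b) form that Solr documents for ReciprocalFloatFunction, but what x actually is in the real query depends on the field source, which is not reproduced here.

```java
/** Sketch only: compares linear and cosine-shaped recency weighting. */
final class RecencyWeightSketch {

    /**
     * Linear weighting. "newness" is 0.0 for the oldest content and 1.0 for
     * the newest, so content that is 70% "new" scores 0.7.
     */
    static double linear(double newness) {
        return clamp(newness);
    }

    /**
     * Cosine weighting (assumed shape): a quarter of a cosine wave, which is
     * nearly flat close to the newest content and falls away faster for old content.
     */
    static double cosine(double newness) {
        double age = 1.0 - clamp(newness);
        return Math.cos(age * Math.PI / 2.0);
    }

    /** The reciprocal shape a / (m * x + b). */
    static double reciprocal(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    /** Keeps the input inside the range 0.0 to 1.0. */
    private static double clamp(double value) {
        return Math.max(0.0, Math.min(1.0, value));
    }

    public static void main(String[] args) {
        // Print both curves side by side for a few sample points.
        for (int step = 0; step <= 4; step++) {
            double newness = step / 4.0;
            System.out.printf("newness=%.2f linear=%.3f cosine=%.3f%n",
                    newness, linear(newness), cosine(newness));
        }
    }
}
```

With a curve like this, fairly recent content keeps most of its score while the linear version has already knocked a good chunk off, which is the behaviour we were after.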
Getting a preview image of an FLV
This one was a toughie, simply because getting the right incantation to pass into FFmpeg required a lot of fannying about. Thankfully, in the end, it proved quite easy…
1. Get the total length and size of the video:
ffmpeg -i (source file) 2>&1
This will actually be an ffmpeg failure, but it will inspect the file and give you a response that you can use, something like:
Seems stream 0 codec frame rate differs from container frame rate: 1000.00 (1000/1) -> 25.00 (25/1)
Input #0, flv, from 'feedback.flv':
Duration: 00:07:49.9, start: 0.000000, bitrate: 96 kb/s
Stream #0.0: Video: vp6f, yuv420p, 400x225, 25.00 fps(r)
Stream #0.1: Audio: mp3, 44100 Hz, stereo, 96 kb/s
Must supply at least one output file
It’s the duration that matters here. A little bit of regex goodness and a lot of fannying about, and you can get a string representing a third of the duration (in this case, about 2 and a half minutes, so 00:02:30.0). You’ll also wanna grab the size out of it, in this case 400×225. You’ll need to sanitise your size so that each dimension is a multiple of two (so 400×225 becomes 400×224).
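For the curious, the regex step might look something like the sketch below. This is not the code we actually run; the class name, method names and patterns are made up for illustration, and the patterns are only written against the sample output shown above, so other FFmpeg versions may well print something slightly different. The size is rounded down so that each dimension is even, which is what turns 400x225 into the 400x224 used in the example command.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: pulls the duration and frame size out of "ffmpeg -i" output. */
final class FfmpegInfoSketch {

    // Matches e.g. "Duration: 00:07:49.9"
    private static final Pattern DURATION =
            Pattern.compile("Duration: (\\d+):(\\d{2}):(\\d{2}(?:\\.\\d+)?)");

    // Matches the first WIDTHxHEIGHT on a "Video:" stream line, e.g. "400x225"
    private static final Pattern SIZE =
            Pattern.compile("Video:.*?\\b(\\d{2,5})x(\\d{2,5})\\b");

    /** Total duration in seconds, read from the "Duration:" line. */
    static double durationSeconds(String ffmpegOutput) {
        Matcher m = DURATION.matcher(ffmpegOutput);
        if (!m.find()) {
            throw new IllegalArgumentException("No Duration line found");
        }
        return Integer.parseInt(m.group(1)) * 3600
                + Integer.parseInt(m.group(2)) * 60
                + Double.parseDouble(m.group(3));
    }

    /** Formats seconds as HH:MM:SS.t, rounded to the nearest tenth of a second. */
    static String toTimestamp(double seconds) {
        long tenths = Math.round(seconds * 10);
        long hours = tenths / 36000;
        long minutes = (tenths % 36000) / 600;
        long secs = (tenths % 600) / 10;
        return String.format(Locale.ROOT, "%02d:%02d:%02d.%d",
                hours, minutes, secs, tenths % 10);
    }

    /** Frame size with each dimension rounded down to an even number. */
    static String evenSize(String ffmpegOutput) {
        Matcher m = SIZE.matcher(ffmpegOutput);
        if (!m.find()) {
            throw new IllegalArgumentException("No video size found");
        }
        int width = Integer.parseInt(m.group(1)) & ~1;  // clear the lowest bit
        int height = Integer.parseInt(m.group(2)) & ~1;
        return width + "x" + height;
    }
}
```

Calling toTimestamp(durationSeconds(output) / 3) on the sample output gives the exact third of the duration, 00:02:36.6, rather than the rounded 00:02:30.0 used in the example; either lands comfortably inside the video.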
2. Output the preview image as a jpg
ffmpeg -i (source file) -vcodec mjpeg -an -ss (third of duration) -t 00:00:00.1 -r 1 -y -f rawvideo -s (size) (output location) 2>&1
or, for example:
ffmpeg -i feedback.flv -vcodec mjpeg -an -ss 00:02:30.0 -t 00:00:00.1 -r 1 -y -f rawvideo -s 400x224 feedback.jpg 2>&1
et voilà:

Alternatively (and with a lot less hassle/regexing) you can just take the first frame of the video:
ffmpeg -i (source file) -vcodec mjpeg -vframes 1 -an -f rawvideo -s (size) (output location) 2>&1
Basically FFmpeg will do anything you want, providing you can coax it…
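If you are calling FFmpeg from Java, as you would from a Lucene-based application, one way of doing the coaxing is sketched below. The class and method names are hypothetical and this is not our production code; the flags are just the ones from the preview command in this post, with the repeated -an given once. Passing the arguments as a list avoids shell quoting problems, and redirectErrorStream(true) plays the part of the 2>&1.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Sketch only: builds and (optionally) runs the preview-image command. */
final class FfmpegPreviewSketch {

    /** Builds the argument list for grabbing a single frame as a JPEG. */
    static List<String> previewCommand(String source, String timestamp,
                                       String size, String output) {
        return Arrays.asList(
                "ffmpeg", "-i", source,
                "-vcodec", "mjpeg", "-an",
                "-ss", timestamp, "-t", "00:00:00.1",
                "-r", "1", "-y",
                "-f", "rawvideo", "-s", size,
                output);
    }

    /** Runs the command, echoing FFmpeg's output to this process's console. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)  // merge stderr into stdout, like 2>&1
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> command =
                previewCommand("feedback.flv", "00:02:30.0", "400x224", "feedback.jpg");
        System.out.println(String.join(" ", command));
        // Uncomment to actually run it; needs ffmpeg on the PATH and feedback.flv present.
        // System.exit(run(command));
    }
}
```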




Mathew Mannion

Nick Howes
Though I had nothing to do with the new Search stuff, I will claim the cosine idea as mine!
22 May 2008, 12:29
Mathew Mannion
Since I have a hand-drawn graph on my desk, I’m claiming at least partial ownership…
22 May 2008, 13:21
Chris May
That’s a very unflattering frame capture. He looks as if the Alien is just about to burst from his chest.
22 May 2008, 15:35
Mathew Mannion
As it turns out, a third of the way through is almost always an unflattering pose… That said, everyone must have or experience their John Hurt Moment
22 May 2008, 15:46
Steven Carpenter
FFMPEG won’t quite do the works for us yet, like convert our recordings to H.264, but I know Mat Will Make It Happen. :-)
22 May 2008, 16:39