May 22, 2008

New Search Interface

Writing about web page http://search.warwick.ac.uk

New Design for search.warwick.ac.uk

For the past month or so I’ve been working on a redesign and reworking of the University of Warwick’s search engine at search.warwick.ac.uk. This has involved a complete rewrite of the view model for the search engine, a whole lot of CSS and HTML, and banging my head against a brick wall trying to fathom IE’s bizarre rendering engine (I don’t even see it as a bad rendering engine anymore, just a strange one that doesn’t seem to do anything the way you’d expect).

Away from the new shiny monsters that work on the surface of Planet Search, there have been a few conceptual changes that we’ve worked in to try to make the service more useful to users:

One search to rule them all

Search results (old design; Sitebuilder) Search results (old design; Blogs)

When search was originally released, as a replacement for an Atomz search service we used to run, it indexed two types of content – blog entries from Warwick Blogs and static pages from Warwick’s web site. This was an improvement on the Atomz search engine because it allowed us to personalise search results for users – users were searching a personalised result set of the content that they could see on the web site and Warwick Blogs, rather than what the whole world could see.

However, the two search pages were exposed as different services (although they were combined into a single index under the hood), with different visual styles, and each was confined to its own search target. As part of the new design, we wanted to move towards a hub of information: if someone is searching for cycling on the University’s web site, for example, it is a big win for them to also see the extensive entries on Warwick Blogs about cycling.

Since we now also index people at Warwick and exam papers, this has other big wins when people search from the top-right search box on Sitebuilder (the Warwick web site) or another enabled application. When people search from the blogs homepage, a lot of the time they’re not searching for a particular entry but for a particular person’s blog. Searching for Mat Mannion from the blogs homepage, for example, shows that there is also a single search result in “People”, which then links to my blog. All are happy.

Help people to find what they should be finding

The Nielsen Norman Group’s (NN/g) report on search usability suggests that a search engine should create “custom results pages” if the query:

  • occurs very frequently
  • is a general and vague term that indicates the user needs an introduction
  • generates too many results

As part of the new design we’ve added a lot of “helper” content at the top of results for most of our popular terms. In general, if you search for a department you will find that we now place links to our new Interactive campus map at the top of results, on the basis that a lot of people looking for the department will want to know how to get there. We also make use of some of the content that we have indexed for a while but don’t expose very well. For example, when you search for the arts centre you not only get a map and a search box to search the Arts Centre’s website directly, but also a list of what’s on at the arts centre today (which we pull from an RSS feed the arts centre publishes).

Sometimes this can be a real benefit when there isn’t really a page that satisfies the user’s needs, or the most popular use cases aren’t handled well by the page. A good example of this is the Health Centre – the vast, vast majority of users who search for the health centre want to know either:

  • where it is,
  • what the opening hours are, or
  • what the contact details are

By providing this information right at the top as the first thing the user sees, the user doesn’t need to go to the Health Centre website, which is obviously one for the win column. Another example of this is a search for the panorama room (where a lot of exams happen, so very popular at this time of year), where we give a hint as to where the Panorama Room is and a map to the location of the building.
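The post doesn’t say how the helper content is wired up behind the scenes, so the following is only a minimal sketch of one way it could work: a lookup from a normalised query to a helper panel. The class name, the method names and the panel text are all made up for illustration and are not taken from the real search application.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: map popular queries to "helper" content. */
final class HelperContent {

    // Normalised query -> helper panel text (illustrative values only)
    private final Map<String, String> helpers = new HashMap<>();

    /** Lower-cases, trims and collapses whitespace so variants share one key. */
    static String normalise(String query) {
        return query.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    void register(String query, String helper) {
        helpers.put(normalise(query), helper);
    }

    /** Returns the helper panel for a query, if one has been registered. */
    Optional<String> lookup(String query) {
        return Optional.ofNullable(helpers.get(normalise(query)));
    }

    public static void main(String[] args) {
        HelperContent content = new HelperContent();
        content.register("health centre", "Location, opening hours, contact details");
        System.out.println(content.lookup("  Health   Centre ").orElse("(no helper)"));
        System.out.println(content.lookup("cycling").orElse("(no helper)"));
    }
}
```

Anything without a registered helper simply falls through to the normal results list.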

Leverage our increasing array of audio/visual content

Search results (med school, audio/video)

One of the things we’ve tried to do in the new design is showcase the array of audio (mp3) and video (flv) files we have uploaded to the Warwick web site. We have two professionally made sources of audio and video content in Warwick Podcasts and Warwick iCast, and it seemed a shame not to show them off. Video on the web is obviously big at the moment, so we’ve made a big effort to present it in a nice way!

Technical Guff

For those who are interested, some nice technical stuff that came out of the project…

Weight recently updated documents higher

Since there is a lot of old content on Sitebuilder, we wanted to try and ensure that the more recently updated content appeared higher in search results (although this is only a portion of the scoring, so even the oldest content in all the land will appear at the top if it is what the user is specifically looking for). While it is fairly easy to do this linearly (i.e. to score between 0 and 1 between the oldest content and the newest content, with 70% “new” content scoring 0.7), we decided it would be beneficial to take a slightly more weighted approach, where documents updated within, say, the last year don’t have their score reduced too far below 1.0.

Enter, cosines!

age over time (cosine 0 to pi/2)
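To make the difference between the two curves concrete, here is a minimal sketch of the idea. It is not the ReverseCosineFieldSource implementation: the class and method names are invented, and I’m assuming age is expressed as a fraction between 0 (the newest document) and 1 (the oldest).

```java
/** Sketch: compare linear and cosine recency weights for a document. */
final class RecencyWeight {

    /**
     * Fraction of the index's time span since the document was last
     * modified: 0.0 for the newest document, 1.0 for the oldest.
     */
    static double ageFraction(long lastModified, long oldest, long newest) {
        if (newest <= oldest) {
            return 0.0; // degenerate index: treat everything as new
        }
        double fraction = (double) (newest - lastModified) / (newest - oldest);
        return Math.max(0.0, Math.min(1.0, fraction));
    }

    /** Linear weight: 70% "new" content scores 0.7. */
    static double linearWeight(double ageFraction) {
        return 1.0 - ageFraction;
    }

    /** Cosine weight over 0..pi/2: stays close to 1.0 for recent content. */
    static double cosineWeight(double ageFraction) {
        return Math.cos(ageFraction * Math.PI / 2.0);
    }

    public static void main(String[] args) {
        for (double age = 0.0; age <= 1.0; age += 0.25) {
            System.out.printf("age=%.2f linear=%.3f cosine=%.3f%n",
                    age, linearWeight(age), cosineWeight(age));
        }
    }
}
```

With these definitions, content that is 70% “new” (an age fraction of 0.3) scores 0.7 linearly but about 0.89 on the cosine curve, so recent-ish pages are barely penalised while the oldest ones still drop towards zero.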

We used a FunctionQuery for this, comme ça:

FunctionQuery fq = new FunctionQuery(new ReciprocalFloatFunction(new ReverseCosineFieldSource(DocumentFields.LAST_MOD_DATE),2, 500, 500));
fq.setBoost(20.0f);
masterQuery.add(fq, Occur.SHOULD);

(You’ll have to play with the boost and searcher.explain(query) to get the sweet spot).

Code: FunctionQuery Lucene Extension

Getting a preview image of an FLV

This one was a toughie, simply because getting the right incantation to pass into FFmpeg required a lot of fannying about. Thankfully, in the end, it proved quite easy…

1. Get the total length and size of the video:

ffmpeg -i (source file) 2>&1

ffmpeg will actually exit with an error here, because no output file has been given, but it inspects the file first and prints a response that you can use, something like:

Seems stream 0 codec frame rate differs from container frame rate: 1000.00 (1000/1) -> 25.00 (25/1)
Input #0, flv, from 'feedback.flv':
  Duration: 00:07:49.9, start: 0.000000, bitrate: 96 kb/s
  Stream #0.0: Video: vp6f, yuv420p, 400x225, 25.00 fps(r)
  Stream #0.1: Audio: mp3, 44100 Hz, stereo, 96 kb/s
Must supply at least one output file

It’s the duration that matters here. A little bit of regex goodness and a lot of fannying about, and you can get a string representing a third of the duration (in this case, about 2 and a half minutes, so 00:02:30.0). You’ll also wanna grab the size out of it, in this case 400×225. You’ll need to sanitise your size so that both dimensions are multiples of two (so 400×225 becomes 400×224).
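For the curious, the regexing might look something like the sketch below. This isn’t the code from the search application: the class name, method names and patterns are my own, written against the sample output above, and other ffmpeg versions may format these lines differently. It works out an exact third of the duration (the post rounds this to about two and a half minutes) and rounds each dimension down to an even number, which is what the 400x224 in the example command amounts to.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: pull the seek time and frame size out of ffmpeg's output text. */
final class FfmpegInfo {

    // Matches e.g. "Duration: 00:07:49.9"
    private static final Pattern DURATION =
            Pattern.compile("Duration: (\\d+):(\\d+):(\\d+(?:\\.\\d+)?)");

    // Matches the first WIDTHxHEIGHT on the video stream line, e.g. "400x225"
    private static final Pattern SIZE =
            Pattern.compile("Video:.*?(\\d+)x(\\d+)");

    /** Returns a third of the reported duration, formatted as HH:MM:SS.s. */
    static String thirdOfDuration(String ffmpegOutput) {
        Matcher m = DURATION.matcher(ffmpegOutput);
        if (!m.find()) {
            throw new IllegalArgumentException("no Duration line found");
        }
        double seconds = Integer.parseInt(m.group(1)) * 3600
                + Integer.parseInt(m.group(2)) * 60
                + Double.parseDouble(m.group(3));
        // Work in whole tenths of a second to keep the formatting simple
        long tenths = Math.round(seconds / 3.0 * 10.0);
        return String.format(Locale.ROOT, "%02d:%02d:%02d.%d",
                tenths / 36000, (tenths % 36000) / 600,
                (tenths % 600) / 10, tenths % 10);
    }

    /** Returns the video size with each dimension rounded down to even. */
    static String evenSize(String ffmpegOutput) {
        Matcher m = SIZE.matcher(ffmpegOutput);
        if (!m.find()) {
            throw new IllegalArgumentException("no video size found");
        }
        int width = Integer.parseInt(m.group(1));
        int height = Integer.parseInt(m.group(2));
        return (width - width % 2) + "x" + (height - height % 2);
    }

    public static void main(String[] args) {
        String sample = "Input #0, flv, from 'feedback.flv':\n"
                + "  Duration: 00:07:49.9, start: 0.000000, bitrate: 96 kb/s\n"
                + "  Stream #0.0: Video: vp6f, yuv420p, 400x225, 25.00 fps(r)\n"
                + "  Stream #0.1: Audio: mp3, 44100 Hz, stereo, 96 kb/s\n";
        System.out.println(thirdOfDuration(sample));
        System.out.println(evenSize(sample));
    }
}
```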

2. Output the preview image as a jpg

ffmpeg -i (source file) -vcodec mjpeg -an -ss (third of duration) -t 00:00:00.1 -r 1 -y -f rawvideo -s (size) (output location) 2>&1

or, for example:

ffmpeg -i feedback.flv -vcodec mjpeg -an -ss 00:02:30.0 -t 00:00:00.1 -r 1 -y -f rawvideo -s 400x224 feedback.jpg 2>&1

et voila:

Feedback FLV preview image
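If you’re calling FFmpeg from Java rather than a shell, a rough sketch of the same preview command might look like this. The class and method names are mine, it assumes ffmpeg is on the PATH, and each flag is passed once. redirectErrorStream(true) does the job of 2>&1.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

/** Sketch: build and run the preview-image command from Java. */
final class PreviewCommand {

    /** Builds the argument list for grabbing one frame as a JPEG. */
    static List<String> build(String source, String seek, String size, String output) {
        return Arrays.asList(
                "ffmpeg", "-i", source,
                "-vcodec", "mjpeg", "-an",
                "-ss", seek, "-t", "00:00:00.1",
                "-r", "1", "-y",
                "-f", "rawvideo", "-s", size,
                output);
    }

    /** Runs the command, discards its output and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout, like 2>&1
                .start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            while (reader.readLine() != null) {
                // drain the output so ffmpeg never blocks on a full pipe
            }
        }
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call run(...) to actually invoke ffmpeg
        List<String> command = build("feedback.flv", "00:02:30.0", "400x224", "feedback.jpg");
        System.out.println(String.join(" ", command));
    }
}
```

Passing the arguments as a list avoids any shell quoting problems with awkward file names.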

Alternatively (and with a lot less hassle/regexing) you can just take the first frame of the video:

ffmpeg -i (source file) -vcodec mjpeg -vframes 1 -an -f rawvideo -s (size) (output location) 2>&1

Basically FFmpeg will do anything you want, providing you can coax it…


5 comments by 4 or more people

  1. Nick Howes

    Though I had nothing to do with the new Search stuff, I will claim the cosine idea as mine!

    22 May 2008, 12:29

  2. Mathew Mannion

    Since I have a hand-drawn graph on my desk, I’m claiming at least partial ownership…

    22 May 2008, 13:21

  3. Chris May

    That’s a very unflattering frame capture. He looks as if the Alien is just about to burst from his chest.

    22 May 2008, 15:35

  4. Mathew Mannion

    As it turns out, a third of the way through is almost always an unflattering pose… That said, everyone must have or experience their John Hurt Moment

    22 May 2008, 15:46

  5. Steven Carpenter

    FFMPEG won’t quite do the works for us yet, like convert our recordings to H.264, but I know Mat Will Make It Happen. :-)

    22 May 2008, 16:39



I’m a Web Developer in e-lab, part of IT Services at the University of Warwick.
