All 5 entries tagged Search
October 05, 2008
Writing about web page http://www.jingproject.com
- Write a script first
- Do your screencast at the size you want to embed it at (I did it at 1024x768 but I really should have done it at 800x600 or even 640x480)
- Cocking up 30 seconds before the end twice means you have to re-record the whole thing. Jing saves as a SWF, so you can't do any editing
- The iMac microphone makes me sound like a douchebag
- Don't use a squeaky chair
September 29, 2008
September 24, 2008
I made a little API so that if you add "Warwick Search" to your Firefox as a search engine, it now gives you some search suggestions when you start typing.
That shit's not funny, yo. (It gets the suggestions from the most popular searches previously done)
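For the curious, the idea of suggesting from popular past searches can be sketched in a few lines. This is only a guess at the general approach, not the actual Warwick Search code: the function name, the in-memory list of past searches and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def suggest(prefix, past_searches, limit=5):
    """Return up to `limit` past searches starting with `prefix`, most popular first.

    Hypothetical helper: the real service presumably reads popularity
    from its own search logs rather than an in-memory list.
    """
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    # Count each distinct query, ignoring case and surrounding whitespace.
    counts = Counter(query.strip().lower() for query in past_searches)
    matches = [(query, n) for query, n in counts.items() if query.startswith(prefix)]
    # Most frequent first; alphabetical order breaks ties.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in matches[:limit]]
```

Calling `suggest("lib", log)` on a list of logged queries returns the most frequently searched terms that begin with "lib".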
September 15, 2008
As part of working on Warwick's search engine, in the revamp we've been trying to leverage a lot of data that extends above and beyond just web pages and blog entries. This means we've been indexing a lot of data that is then stored in a database and refreshed periodically (typically, every 5 minutes). This is then included from a number of sources, such as our interactive map of campus and the XMPP/Jabber bot that we run at email@example.com (try adding it to your Warwick IM account!) - but it all comes down to exposing the data in the end.
But hey, it's all about sharing, so knock yourself out with the data...
You can get search results for sitebuilder or blogs (or both) from the endpoint, for example at http://search.warwick.ac.uk/search/query.json?q=accommodation&indexSection=sitebuilder (replace the sitebuilder with blogs, or remove the indexSection altogether to get amalgamated results).
Example usage: A live search, or search results from the XMPP bot.
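If you want to build these URLs in code, something like the following should do. It is a minimal sketch: the endpoint and the `q` and `indexSection` parameters come from the example URL above, while the function name is made up, and the shape of the JSON that comes back isn't described here, so the sketch stops at building the URL.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in the post.
SEARCH_ENDPOINT = "http://search.warwick.ac.uk/search/query.json"


def build_search_url(query, index_section=None):
    """Build a search URL; leave index_section as None for amalgamated results.

    Hypothetical helper. index_section would be "sitebuilder" or "blogs".
    """
    params = {"q": query}
    if index_section is not None:
        params["indexSection"] = index_section
    # urlencode escapes spaces and other special characters in the query.
    return SEARCH_ENDPOINT + "?" + urlencode(params)
```

`build_search_url("accommodation", "sitebuilder")` reproduces the example URL above, and dropping the second argument gives the amalgamated search.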
Free seats in workareas
ITS provides a service at seats.warwick.ac.uk that shows where free seats are in workareas, and where the workareas are.
Example usage: Showing where there are free seats in work areas near to your department; results in interactive map; embedded results in search results; querying the Ask bot.
Live Bus Tracking
Travel West Midlands have started to track the positions of buses in relation to some bus stops in real time (presumably by some radio or satellite system) and are publishing the information to customers at their mobile website. For the stops in and around campus, this is useful information to some, so we've indexed the information (only including those buses that are being tracked in real time).
By amalgamating results from a number of sources (the Students' Union, Warwick Arts Centre, and the internal and external News and Events pages) into a single database, it's easy to query information from it and return a fairly comprehensive set of information about what's on on a particular day.
Example usage: Providing a feed of what's on today on campus; showing what's on in relevant searches; Ask bot results.
Where is this room?
Thanks to Emily in e-lab's tireless work collecting and collating the data, we also now have a database of information on what rooms there are on campus and where they are - 3018 rooms in total! As a former student myself, I've experienced the frustration of receiving a timetable with rooms on and not knowing where they are. Hopefully this will be a help!
The JSON feed requires a query parameter, for example:
Example usage: Linking room data to a map; results on interactive map; injection into search results.
What this exercise has taught me, if nothing else, is that everything would be a whole lot easier if every bit of data provided me an API to access it!
May 22, 2008
Writing about web page http://search.warwick.ac.uk
For the past month or so I’ve been working on a redesign and reworking of the University of Warwick’s search engine at search.warwick.ac.uk. This has involved a complete rewrite of the view model for the search engine and a whole lot of CSS and HTML and banging my head against a brick wall trying to fathom IE’s bizarre rendering engine (I don’t even see it as a bad rendering engine anymore, just a strange one that doesn’t seem to do anything as you’d expect).
Away from the new shiny monsters that work on the surface of Planet Search, there have been a few conceptual changes that we’ve worked in to try to make the service more useful to users:
One search to rule them all
When search was originally released, as a replacement for an Atomz search service we used to run, it indexed two types of content – blog entries from Warwick Blogs and static pages from Warwick’s web site. This was an improvement on the Atomz search engine because it allowed us to personalise search results for users – users were searching a personalised result set of the content that they could see on the web site and Warwick Blogs, rather than what the whole world could see.
However, the two search pages were exposed as different services (although they were combined into a single index under the hood) with different visual styles and confined only to their own search targets. As part of the new design, we wanted to change this concept more towards a hub of information – if someone is searching for cycling on the University’s web site, for example, it is massively beneficial for the user to also be able to see the extensive entries on Warwick Blogs about cycling.
Since we now also index people at Warwick and exam papers, this has other big wins when people search from the top-right search box on Sitebuilder (the Warwick web site) or another enabled application – when people search from the blogs homepage, a lot of the time they’re not searching for a particular entry but for a particular person’s blog. Searching for Mat Mannion from the blogs homepage for example, shows that there is a single search result in “People” as well, which then links to my blog. All are happy.
Help people to what they should be finding
The Nielsen Norman Group’s (NN/g) report on search usability suggests that a search engine should create “custom results pages” if the query:
- occurs very frequently
- is a general and vague term that indicates the user needs an introduction
- generates too many results
As part of the new design we’ve added a lot of “helper” content at the top of results for most of our popular terms. In general, if you search for a department you will find that we now place links to our new Interactive campus map at the top of results, on the basis that a lot of people looking for the department will want to know how to get there. We also leverage some of the content that we have indexed for a while but don’t expose very well, for example when you search for the arts centre you not only get a map and a search box to search the Arts Centre’s website directly, but also a list of what’s on at the arts centre today (which we leverage from an RSS feed the arts centre publishes).
Sometimes this can be a real benefit when there isn’t really a page that satisfies the user’s needs, or the most popular use cases aren’t handled well by the page. A good example of this is the Health Centre – the vast, vast majority of users who search for the health centre want to know either:
- where it is,
- what the opening hours are, or
- what the contact details are
By providing this information right at the top as the first thing the user sees, the user doesn’t need to go to the Health Centre website, which is obviously one for the win column. Another example of this is a search for the panorama room (where a lot of exams happen, so very popular at this time of year) – where we give a hint to where the Panorama Room is and a map to the location of the building.
Leverage our increasing array of audio/visual content
One of the things we’ve tried to do in the new design is to showcase the array of audio (mp3) and video (flv) files we have uploaded to the Warwick web site. We have two professionally made sources of audio and video content in Warwick Podcasts and Warwick iCast and it’d be really nice to show them off. Video on the web is obviously big at the moment, so we’ve made a big effort to present it in a nice way!
For those who are interested, some nice technical stuff that came out of the project…
Weight recently updated documents higher
Since there is a lot of old content on Sitebuilder, we wanted to try and ensure that the more recently updated content appeared higher in search results (although this is only a portion of the scoring, so even the oldest content in all the land will appear at the top if it is what the user is specifically looking for). While it is fairly easy to do this linearly (i.e. to score between 0 and 1 between the oldest content and the newest content, with 70% “new” content scoring 0.7) we decided it would be beneficial to take a slightly more weighted approach, where documents that were, say, within the last year, didn’t reduce the score from 1.0 too much.
We used a FunctionQuery for this, comme ça:
FunctionQuery fq = new FunctionQuery(
    new ReciprocalFloatFunction(
        new ReverseCosineFieldSource(DocumentFields.LAST_MOD_DATE), 2, 500, 500));
fq.setBoost(20.0f);
masterQuery.add(fq, Occur.SHOULD);
(You’ll have to play with the boost and searcher.explain(query) to get the sweet spot).
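To get a feel for the numbers before fiddling with the boost, here is a small sketch of the two scoring curves. Solr's ReciprocalFloatFunction(source, m, a, b) evaluates a / (m*x + b), so the arguments 2, 500, 500 above give 500 / (2x + 500). How the field source turns a last-modified date into x isn't shown in the snippet, so here x is simply assumed to be a non-negative number that is 0 for the newest document and grows as documents get older. The function names are made up for illustration.

```python
def reciprocal_score(x, m=2.0, a=500.0, b=500.0):
    """a / (m*x + b), the formula ReciprocalFloatFunction evaluates.

    x is assumed to be a non-negative "age" value (0 = newest);
    the defaults mirror the constants used in the query above.
    """
    return a / (m * x + b)


def linear_score(x, x_max):
    """The simple linear alternative: 1.0 for the newest, 0.0 for the oldest."""
    return 1.0 - x / x_max


if __name__ == "__main__":
    # Print the reciprocal score at a few sample ages.
    for x in (0, 50, 250, 1000):
        print(x, round(reciprocal_score(x), 3))
```

With these constants the score starts at 1.0 for x = 0 and halves by x = 250; changing m, a and b moves that point around, which is where searcher.explain(query) comes in handy.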
Getting a preview image of an FLV
This one was a toughie, simply because getting the right incantation to pass into FFmpeg required a lot of fannying about. Thankfully, in the end, it proved quite easy…
1. Get the total length and size of the video:
ffmpeg -i (source file) 2>&1
This will actually be an ffmpeg failure, but it will inspect the file and give you a response that you can use, something like:
Seems stream 0 codec frame rate differs from container frame rate: 1000.00 (1000/1) -> 25.00 (25/1)
Input #0, flv, from 'feedback.flv':
  Duration: 00:07:49.9, start: 0.000000, bitrate: 96 kb/s
  Stream #0.0: Video: vp6f, yuv420p, 400x225, 25.00 fps(r)
  Stream #0.1: Audio: mp3, 44100 Hz, stereo, 96 kb/s
Must supply at least one output file
It’s the duration that matters here. A little bit of regex goodness and a lot of fannying about, and you can get a string representing a third of the duration (in this case, about 2 and a half minutes, so 00:02:30.0). You’ll also wanna grab the size out of it, in this case 400×225. You’ll need to sanitise your size so that both dimensions are multiples of two (here, 400×224).
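The regex part can be sketched roughly like this. It is not the code we actually run, just a minimal illustration; the function names are made up, and the patterns are tuned to the sample output above, so other FFmpeg versions may well need different ones.

```python
import re


def parse_duration_seconds(ffmpeg_output):
    """Pull 'Duration: HH:MM:SS.s' out of ffmpeg's output and return seconds."""
    match = re.search(r"Duration: (\d+):(\d{2}):(\d{2}(?:\.\d+)?)", ffmpeg_output)
    if match is None:
        raise ValueError("no Duration line found")
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def format_timestamp(total_seconds):
    """Format seconds as HH:MM:SS.s, the style used for -ss in the post."""
    tenths = int(round(total_seconds * 10))
    hours, rest = divmod(tenths, 36000)
    minutes, rest = divmod(rest, 600)
    seconds, tenth = divmod(rest, 10)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{tenth}"


def parse_even_size(ffmpeg_output):
    """Find WIDTHxHEIGHT on the Video stream line and round each down to even."""
    match = re.search(r"Video: .*?\b(\d{2,})x(\d{2,})\b", ffmpeg_output)
    if match is None:
        raise ValueError("no video size found")
    width, height = int(match.group(1)), int(match.group(2))
    return f"{width - width % 2}x{height - height % 2}"
```

For the sample output, `format_timestamp(parse_duration_seconds(output) / 3)` gives 00:02:36.6 (an exact third, a touch later than the rounded 00:02:30.0 used here) and `parse_even_size(output)` gives 400x224.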
2. Output the preview image as a jpg
ffmpeg -i (source file) -vcodec mjpeg -an -ss (third of duration) -t 00:00:00.1 -r 1 -y -f rawvideo -s (size) (output location) 2>&1
or, for example:
ffmpeg -i feedback.flv -vcodec mjpeg -an -ss 00:02:30.0 -t 00:00:00.1 -r 1 -y -f rawvideo -s 400x224 feedback.jpg 2>&1
Alternatively (and with a lot less hassle/regexing) you can just take the first frame of the video:
ffmpeg -i (source file) -vcodec mjpeg -vframes 1 -an -f rawvideo -s (size) (output location) 2>&1
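If you're calling FFmpeg from code rather than a shell, both variants can be put together as argument lists. A minimal sketch: the flags are copied from the commands above (with -an given once) and may behave differently on newer FFmpeg builds, and the function names are made up for illustration.

```python
def seek_preview_command(source, timestamp, size, output):
    """Argument list for grabbing a frame at `timestamp` (e.g. '00:02:30.0')."""
    return [
        "ffmpeg", "-i", source,
        "-vcodec", "mjpeg", "-an",
        "-ss", timestamp, "-t", "00:00:00.1",
        "-r", "1", "-y",
        "-f", "rawvideo", "-s", size,
        output,
    ]


def first_frame_command(source, size, output):
    """Argument list for the simpler 'first frame only' variant."""
    return [
        "ffmpeg", "-i", source,
        "-vcodec", "mjpeg", "-vframes", "1", "-an",
        "-f", "rawvideo", "-s", size,
        output,
    ]
```

Either list can be handed to `subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)`, which folds stderr into stdout much as `2>&1` does.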
Basically FFmpeg will do anything you want, providing you can coax it…