September 16, 2004

Semantic querying and the future of search engines

Follow-up to Emergent semantics (from Deleuze) and the semantic web from Transversality - Robert O'Toole

Prof. Wendy Hall of Southampton made this important distinction between two types of web query:

  • A query that aims to retrieve a specific piece of information – the answer to a closed question. She calls this an 'uh…uh' query.
  • A query that aims to go from a semantic map of one document to other documents with a similar map, the aim being that the second document will expand on and illuminate the first. This she calls an 'er…er' query.

People typically do the second, more open type of query by identifying a set of keywords from the first document and typing them all into a plain Google search. In creating their own keyword list they are creating a semantic map of the document, a map that makes sense to them. Perhaps a useful development in browser and search engine design would be to add functionality that makes this easier. A user could select words and phrases from the source document to create the map, and then search for documents with related maps. Perhaps these maps could even be represented diagrammatically?
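As a rough sketch of what this might look like under the hood, a document's "semantic map" could be modelled as a bag of its most frequent informative words, and "related maps" found by cosine similarity between those word counts. All function names, the stop-word list, and the example corpus below are my own illustration, not any real search engine's workings:

```python
import math
from collections import Counter

def semantic_map(text, top_n=5):
    """Build a crude 'semantic map': the document's most frequent
    informative words (tiny hypothetical stop-word list, illustration only)."""
    stop = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "it", "for"}
    words = [w.strip(".,;:'\"!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in stop)
    return dict(counts.most_common(top_n))

def cosine(map_a, map_b):
    """Cosine similarity between two keyword-frequency maps."""
    shared = set(map_a) & set(map_b)
    dot = sum(map_a[w] * map_b[w] for w in shared)
    norm = (math.sqrt(sum(v * v for v in map_a.values()))
            * math.sqrt(sum(v * v for v in map_b.values())))
    return dot / norm if norm else 0.0

def find_related(source, corpus):
    """Rank corpus documents by how closely their maps match the source's map."""
    src = semantic_map(source)
    return sorted(corpus, key=lambda d: cosine(src, semantic_map(d)), reverse=True)
```

For example, given a query document about search engines and a small corpus, `find_related` ranks the on-topic documents above the off-topic ones. A real system would of course need weighting (e.g. TF-IDF) and a far better notion of "informative" than a stop-word list, but the shape of the idea – map the document, then match maps – is the same.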

As you can see, I'm just starting to get at the relationship between Deleuze's work on semantics and creativity, the work in AI that I did at Sussex, current work on the semantic web, and e-learning.


- 2 comments

  1. Chris May

    Back when I used to work for CGEY I did quite a bit of work with a search engine called Autonomy which did the kind of thing you describe – you would give it a paragraph, or a whole document, and it would search for other documents about the same ideas. It worked fantastically well on relatively constrained sets of documents. One of the demos we used quite a lot was the entire Times in electronic form, for the last 10 years or so, and if you gave it one news story it would pull back an amazingly relevant set of 'see alsos'. But with a broader set of documents (such as the results of crawling a cross-section of the web) it was much less impressive – it seemed to be unable to build a coherent map of the corpus.

    Mind, this was about 4–5 years ago, and I dare say they've improved it since then. It used to be fantastically expensive (half-a-million pounds or so) too. There are other products in the same space – Hummingbird used to have one called Fusion, although I can't find a mention of it now. None of them really seem to have taken off.

    The problem, I think, is twofold: first, the process of creating your own keyword list is remarkably complex – as you identify, it would probably be more effective to make it easier for users to create the semantic map for a document and push that into a search engine; and second, Google is 'good enough' for 99% of searches, so the pressure to innovate is low.

    16 Sep 2004, 09:11

  2. Robert O'Toole

    Yes, the web is driven by the fact that it is 'good enough'. I think Tim Berners-Lee said that the turning point with hyperlinking was when he convinced people that it was OK for some of the links to fail, so long as there were plenty of other links to follow. The interesting question is: can we provide tools that help people to write better queries and hence save time?

    Autonomy, as you mention, is a neural-net-based tool. The BBC uses it. Of course, like all neural net systems, it requires lots of training.

    16 Sep 2004, 17:08

