September 14, 2007

Tagsonomy – a taxonomical keyword searching tool

Follow-up to Build your own tagxonomy tool from Transversality - Robert O'Toole

The idea of a taxonomical keyword searching tool has developed further. There is clear demand for it as part of projects in History, English, the Library, the Language Centre and Renaissance Studies. Here is a more formal specification of how it could work.
A taxonomy is a hierarchical structure of categorisations, with a progression to more detailed categories towards the leaves. The taxonomy defines the set of categories that may be applied to resources. Taxonomies are common in academia. Sometimes they are formal and explicit mechanisms. Often they are informal aids to research and academic communications.

Many web publishing systems allow content to be “tagged” with keywords. For example, a Sitebuilder page may be given one or more tags. Similarly a blog entry is classified using tags. Social bookmarking systems like del.icio.us use keyword tagging. Bibliographical databases also commonly use taxonomies of keywords. Most of these systems can be searched using keywords. Many of these search interfaces can be accessed programmatically.

Most keyword tagging and searching systems, such as that used in Sitebuilder, are not taxonomical. There is no taxonomical structure to which an author or researcher can refer when tagging resources or searching for resources.

I have identified a set of real use cases in which a taxonomical approach to keyword searching would be beneficial. These cases also imply that resources are tagged according to the keywords available in the taxonomy.

Here is a description of how the tool could work:

Searching

The search interface contains a tree structure representing the taxonomy of keywords.
The tree structure can be explored, drilling down through its branches.
Individual keywords, or whole branches of keywords, can be selected.
There is also a text box allowing the user to type in keywords (with autosuggest giving options as they type).
As keywords are selected, they are listed in a text area showing all currently selected search terms.
Once all of the required keywords have been chosen, a search is done, returning a list of all matching resources.
The search results could be ordered in several different ways, for example with resources that match more of the selected keywords coming at the top of the list.
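
The branch selection and ranking steps above could be sketched roughly as follows. This is only an illustration in Python, not the tool itself: the nested-dictionary taxonomy, the sample keywords and resources, and the function names `find`, `expand` and `rank` are all assumptions made for the example.

```python
# Hypothetical taxonomy: each keyword maps to a dict of its child keywords.
TAXONOMY = {
    "Food": {
        "Cake": {"Fruit Cake": {}, "Sponge Cake": {}},
        "Beer": {"Lager": {}, "Stout": {}},
    }
}


def find(tree, keyword):
    """Return the subtree below `keyword`, or None if it is not in the tree."""
    for name, children in tree.items():
        if name == keyword:
            return children
        found = find(children, keyword)
        if found is not None:
            return found
    return None


def expand(tree, keyword):
    """Selecting a branch selects the keyword plus all of its descendants."""
    subtree = find(tree, keyword)
    if subtree is None:
        # A typed-in keyword that is not in the taxonomy is kept as-is.
        return {keyword}
    selected = {keyword}
    for child in subtree:
        selected |= expand(subtree, child)
    return selected


def rank(resources, selected):
    """Return (title, score) pairs for matching resources, best match first.

    `resources` maps a title to the set of keywords it is tagged with;
    the score is simply the number of selected keywords it carries.
    """
    scored = [(title, len(tags & selected)) for title, tags in resources.items()]
    matches = [(title, score) for title, score in scored if score > 0]
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))
```

With this sketch, selecting the "Cake" branch yields the keywords "Cake", "Fruit Cake" and "Sponge Cake", and a resource tagged with two of them is listed above a resource tagged with one. Counting matching keywords is just one possible ordering; any other scoring could be swapped into `rank`.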

Here is an illustration of how the interface might work (it doesn’t yet do the search):

Configuring the taxonomy

The taxonomy could be stored as an XML file on the web (for example in the same location as the search tool).
The XML file could be hand coded, or the search tool could provide an interface for creating a taxonomy file and editing its tags.
The taxonomy file to be used could be specified by the author of the web page on which it appears.
The end user could be allowed to choose a taxonomy file; it could also be possible to search for taxonomy files in Sitebuilder.
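
As a rough sketch of what such a taxonomy file might look like and how it could be read, here is a short Python example using the standard library's `xml.etree.ElementTree`. The file format is an assumption: the element names `taxonomy` and `tag`, the `name` attribute, and the sample keywords are invented for illustration and are not a settled schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy file: nested <tag> elements form the hierarchy.
SAMPLE_XML = """
<taxonomy name="Example">
  <tag name="People">
    <tag name="Painters"/>
    <tag name="Writers"/>
  </tag>
  <tag name="Places"/>
</taxonomy>
"""


def load_taxonomy(xml_text):
    """Parse the XML into nested dicts: keyword -> dict of child keywords."""
    root = ET.fromstring(xml_text.strip())

    def build(element):
        # Only direct <tag> children belong to this level of the tree.
        return {child.get("name"): build(child) for child in element.findall("tag")}

    return build(root)
```

Calling `load_taxonomy(SAMPLE_XML)` gives a nested dictionary with "People" and "Places" at the top level and "Painters" and "Writers" beneath "People", which is a convenient shape for drawing the tree in the search interface.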

Configuring the sources

The search tool will need to know about the web application in which resources are stored (for example Sitebuilder).
It will need a method for searching each web application (how to build the search URL).
One or more sources could be specified by the author of the page on which the tool is deployed – it could be set to search Sitebuilder, Blogs, etc.
A version of the tool could be made available allowing the end user to choose sources.
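
One way to picture the source configuration is as a small table that tells the tool how to build a search URL for each web application. The Python sketch below is hedged accordingly: the base URLs (on a placeholder `example.invalid` domain), the query parameter names, and the function name `build_search_urls` are all hypothetical and do not describe the real Sitebuilder or Blogs search interfaces.

```python
from urllib.parse import urlencode

# Hypothetical source definitions; the real search URLs and parameter
# names for each web application would need to be looked up.
SOURCES = {
    "sitebuilder": {"base": "https://sitebuilder.example.invalid/search", "param": "q"},
    "blogs": {"base": "https://blogs.example.invalid/search", "param": "tags"},
}


def build_search_urls(keywords, source_names):
    """Return one search URL per chosen source for the selected keywords."""
    urls = {}
    for name in source_names:
        source = SOURCES[name]
        # urlencode escapes spaces and punctuation in the keywords.
        query = urlencode({source["param"]: " ".join(keywords)})
        urls[name] = f'{source["base"]}?{query}'
    return urls
```

The page author would pass a fixed list of source names, while the end-user version of the tool would let the user pick which entries of `SOURCES` to include.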

Saving and sharing searches

Searches could be saved onto the end user’s local machine (Shared Object in Flash, filesystem in AIR).
Searches could also be shared with other users.

Using the taxonomy to tag resources

The search tool could be used to assist in tagging resources using a selected taxonomy.
For example, a user selects a series of keywords, and the tool displays the text to paste into the keyword tag field of the Sitebuilder page properties.

Tasks

  1. Find out how to save an XML file from Flash into Sitebuilder.

- 6 comments by 4 or more people

  1. Chris May

    Why would you want to use XML to store the hierarchy? Wouldn’t it be easier for end users to just make an indented text file – say, something like

    Food
     Chips
     Cake
      Fruit Cake
      Sponge Cake
     Beer
      Lager
      Stout

    This would be faster to parse, and end-users could edit it without needing any special tools (though you could always make a tool, if you wanted).

    You can save files into Sitebuilder from anything that can talk HTTPS, either by automating the “upload file” page, or (much simpler) by using an Atom media upload:

    curl -k -v -u your_usercode --data-binary @datafile.txt -H 'Slug: Firefox_wallpaper.png' "https://sitebuilder.warwick.ac.uk/sitebuilder2/edit/atom/file.htm?page=/services/its/elab/somewhere&forcebasic=true"

    (That example uses HTTP Basic Auth and requires you to enter your password, but if you’ve got the sign-on cookies from the user’s session you can just as easily use single-sign-on.)

    14 Sep 2007, 17:26

  2. David Davies

    I can see the approach you’re taking but I have a couple of thoughts about this, especially in relation to separating the activities of tagging content from retrieving tagged content:

    1. Why have users taken so apparently well to informal tagging as a way of classifying material vs taxonomic classification? Simplistically maybe it’s because of either a) life’s too short to select keywords from a taxonomy either because it requires too much thought (tagging is quick in comparison) or the interface presented to facilitate using a taxonomy is poor, or the taxonomy doesn’t have the required granularity or b) there is no taxonomy available from which to select keywords.

    One of the key features of informal tagging is, as you’ve described, that there is no explicit taxonomic relation between tags. I say explicit because implicitly there is, at least in the mind of the person tagging – they are using their own internal classification system, something that’s meaningful to them, and possibly others in a community of practice.

    There are very many formal classification systems or taxonomies to select from, so there can be no argument that there aren’t good classification systems to use. Yet generally they are not used. JORUM uses a number of classification systems but I gather from speaking to them they are having trouble getting contributors to use them, and are looking at a tagging approach instead. I can foresee a potential paradox in an approach to formalise an informal tag cloud into a taxonomy. Accepting for a moment that someone actually can make a hierarchical taxonomy from a tag cloud, would the result then simply be another formal classification system with the same inherent disincentives to use it when tagging content? I like the fact that del.icio.us has not attempted to create taxonomies out of tag clouds and I’m sure it’s for a good reason.

    So the problem from a classification point of view is that by taking a tagging approach there will never be a formal taxonomy.

    2. Creating a tool that takes a taxonomic approach to searching for tagged materials requires that there is a taxonomy rather than a flat list of keywords. I’m not sure how you’d overcome the lack of a formal hierarchy in your taxonomic tool Rob when the tag cloud is informal. There’s probably little doubt that from a user’s perspective of retrieving tagged or classified content, a formal taxonomy improves the completeness of results. But there’s the rub, you don’t have a taxonomy. Again, using the del.icio.us example, auto-completion greatly speeds up storage and retrieval and I guess that’s one of the reasons they’ve stuck with that approach. Word stemming is another potentially useful approach but as with a taxonomy someone will have had to create the stemming dictionary. Instead I prefer del.icio.us’ use of tag bundles.

    If I misunderstood your project and the point is that your users will agree a taxonomic classification then there’s no problem, providing people stick to it when classifying materials. Which of course implies that they won’t be using informal tags…

    15 Sep 2007, 09:07

  3. Robert O'Toole

    Thanks for that David. In the cases that I am dealing with, the authors already have their taxonomies. In fact in some of the cases, they have very large numbers of documents tagged with those taxonomies. They just need a better way of presenting the taxonomy within a search tool. I have a non-hierarchical tool already in use, but they really want a hierarchical tool.

    The search tool might prove to be more widely useful, or maybe not. Perhaps these cases will demonstrate the usefulness of taxonomies. But in these cases it fits nicely.

    17 Sep 2007, 08:12

  4. John Dale

    I’m not sure that I’ve understood what kind of use cases would benefit from a hierarchical taxonomy. In the example widget you included the hierarchy is pretty trivial, and adds little value to a simple flat list (which of course is already possible). Can you explain what the actual taxonomies in the cases you’re dealing with would be?

    17 Sep 2007, 15:36

  5. Robert O'Toole

    Hierarchy is important, although not necessarily the most important aspect of this. In each case, it has been identified as being beneficial. Note that in all of these cases the bulk of the taxonomy has been defined up front, and then applied to resources. In the case of History, it was the product of a process led by the Head of Department (Professor Margot Finn).

    Here’s the CAS Images taxonomy.

    And a full Renaissance Studies taxonomy (the one in the example above is just part of this). In this case they would like to be able to make it more hierarchical. The same taxonomy will be used for bibliographies of several different authors.

    And here is the simple taxonomy used for classifying Sitebuilder pages in History.

    There are also other similar cases in English, History (a quite sophisticated bibliographical taxonomy), and the languages, not yet fully developed.

    17 Sep 2007, 17:04

  6. Robert O'Toole

    Furthermore, consider the current CAS Images taxonomy. The People node just contains a long flat list of people. Would it be better in this case to divide them up into further sub-categories? Or perhaps a second top-level category (painters, writers etc.)?

    17 Sep 2007, 17:15

