October 26, 2010
Writing about web page http://www.iskouk.org/events/linked_data_sep2010.htm
**Finally getting round to making this live after having to put off the editing for OAW and the start of term!**
This event, hosted by UCL, was one that I had been looking forward to for some time. Whether or not linked data is the 'next big thing' in web technology, and one that has the potential to solve a number of thorny problems for the administrators and maintainers of web resources in the face of increasingly complex demands, is a question that only time will answer. However, as it stands at present, linked data has enormous potential as a service and as a tool, and I wanted to find out more before I started getting any awkward questions from stakeholders!
The sessions on the day were a nice mix of technical and non-technical, and my biggest fear, of being lost before the end of the keynote, was mercifully misplaced. Very usefully, the presenters not only spoke about the technology and standards underpinning the creation of linked data but also presented us with a number of real-world examples of things that linked data can be used to achieve. These are the kinds of presentations I'm always on the lookout for with any new development, because it's always easier to say to someone "linked data can do all these kinds of things" when you have some way to show the power of linked data directly.
Prof Nigel Shadbolt of the University of Southampton gave the morning keynote, focusing on the current policy of the government and the ways in which this might create a 'tipping point' for the ideas behind the semantic web. Here we saw that not only had the previous government made a commitment to releasing government data through the data.gov.uk portal (now at 4000 datasets and counting!) but that this commitment has survived the change in leadership. This release of public data has allowed the users of public services to hold the providers to account. It has also opened up a number of ideas about streamlining data collection, with the expected issues of trust and privacy raised. A lot of the applications based on this information have place at their heart as the central piece of information, and opening the data up also allows the ‘crowd-sourcing’ out of errors! Prof Shadbolt also introduced the idea of a “star rating” of data publishing, as a measure of quality ranging from ‘1 star’ data (better that it’s on the web than not) to ‘5 star’ data (with full linked data coding). A national infrastructure is being built at the moment that requires 5 star data, as a way for departments to interact and to allow for ‘back linking’ as well as ‘forward linking’, in order to investigate the levels of relationships between pieces of information that might not have been obvious before. And if national linked data is possible, could we extend it to a global network of linked data? And if we can have global linked data, can we meaningfully compare the UK with other countries? Are the ontologies we use compatible?
Antoine Isaac from the Vrije Universiteit Amsterdam spoke next about SKOS (Simple Knowledge Organisation System) and linked data. SKOS has been designed to remove as much ambiguity as possible from the representation of terminology in RDF, but in a way that is easier to use than a formal, rigid ontology like OWL. SKOS has a number of basic features: concepts, lexical properties, semantic relations and documentation. Taking the example presented on the day: ‘Cats’ is a SKOS ‘concept’ around which are built relationships and multilingual labels. For example, ‘Cats’ has an rdf:type of skos:Concept and skos:prefLabel values of ‘cats’, ‘chats’ and ‘коты’ in English, French and Russian respectively. It also has a skos:broader term of ‘mammals’ and a skos:related term of ‘wildcats’. However, any concept can have only one skos:prefLabel in any one language, e.g. you can’t use both ‘animals’ and ‘beasts’. Data coded in SKOS RDF can be used to infer relationships between things that might not be explicitly stated, but this process needs monitoring, e.g. for the broader/narrower relations you only need to code in one direction and the system assumes the reverse is true. Overall an interesting project that is a lot easier to use than other ontologies and can be used to create links between other ontologies, thus making them more accessible.
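To make the ‘Cats’ example a little more concrete, here is a rough sketch in plain Python. It is not real RDF and not anything shown on the day: the `Concept` class, the `infer_narrower` helper and the `ex:` identifiers are all made up for illustration. It only mimics two of the SKOS ideas described above, namely one preferred label per language, and deriving the narrower relation from broader links coded in one direction.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Concept:
    """Hypothetical, minimal stand-in for a skos:Concept (illustration only)."""
    uri: str
    pref_labels: Dict[str, str] = field(default_factory=dict)  # language tag -> label
    broader: Set[str] = field(default_factory=set)   # URIs of broader concepts
    related: Set[str] = field(default_factory=set)   # URIs of related concepts

    def set_pref_label(self, lang: str, label: str) -> None:
        # SKOS allows at most one preferred label per language for a concept.
        if lang in self.pref_labels:
            raise ValueError(f"{self.uri} already has a prefLabel for '{lang}'")
        self.pref_labels[lang] = label


def infer_narrower(concepts: Iterable[Concept]) -> Dict[str, Set[str]]:
    """Derive narrower links as the inverse of the broader links that were coded."""
    concepts = list(concepts)
    narrower: Dict[str, Set[str]] = {c.uri: set() for c in concepts}
    for c in concepts:
        for parent in c.broader:
            narrower.setdefault(parent, set()).add(c.uri)
    return narrower


# The example from the talk, with made-up 'ex:' identifiers.
cats = Concept("ex:cats")
cats.set_pref_label("en", "cats")
cats.set_pref_label("fr", "chats")
cats.set_pref_label("ru", "коты")
cats.broader.add("ex:mammals")
cats.related.add("ex:wildcats")

mammals = Concept("ex:mammals")
mammals.set_pref_label("en", "mammals")

narrower = infer_narrower([cats, mammals])
print(narrower["ex:mammals"])
```

Only the broader link from cats to mammals was coded, yet the narrower link from mammals to cats falls out automatically, which is exactly the kind of inference that needs monitoring. Trying to add a second English preferred label to `cats` raises an error instead of silently overwriting the first.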
Richard Wallis of Talis Information took us on a ‘Linked Data Journey’ next. This was a potted history of the development of the semantic web from the beginnings of the web to the present day. Along the way we stopped briefly at important developments such as the US’s data.gov, the UK’s equivalent data.gov.uk and the forthcoming legislation.gov.uk, as well as the standards used to manage the data: SPARQL, RDF, SKOS and others. Richard mentioned the developments brought about by the data.gov.uk initiative, which has started to involve government departments sharing identifiers to allow the data to be more easily retrieved and used. The overall message to people sitting on the fence about linked data was to get the data on the web as a first priority; the linked data will come afterwards.
Steve Dale spoke briefly on the Knowledge Hub, a social media project to aggregate the ‘communities of practice’ found in local government and to help staff use those communities of practice to improve the working of government departments. He made the point that there is lots of data available but that it’s not always easily accessible, certainly not in a machine-readable way. There is an increasing need to compare your performance with others and to identify the best and most appropriate measures on which to compare yourself. The Knowledge Hub is designed with the idea that communities + data intelligence = improvement and innovation. Behind the scenes the system works on principles similar to those used by Amazon to personalise your recommendations.
The afternoon keynote, given by Prof Martin Hepp on the GoodRelations Ontology, had a very practical perspective. This project has a commercial purpose, which is very important, as it is only going to be with commercial engagement that some of these principles are ever going to take off properly, in a similar way to the way the web itself exploded during the dot-com boom. The GoodRelations Ontology is based on leveraging linked data to start the process of ‘matchmaking’ in market economies. A good business lives and dies on its suppliers. Transaction costs are now estimated to account for at least 50% of the GDP of a modern market economy, so the amount of money to be saved by making it easier and quicker for a company to find the best fit is considerable. The root of the problem is that the WWW is the largest data shredder in history, taking structured data as it is added to the web and then removing the entire context. This now-unstructured data cannot be reassembled into structured data. In this project links are important but not the whole story: we also need to record data semantics, hold data in a structured manner, group links by type, and link to information within documents. The GoodRelations project has spent 8-10 years trying to create a global schema for commerce data on the web and now feels that it is getting close, and with 16% of all current RDF triples having a basis in the GoodRelations project they might be right!
Andy Powell of Eduserv spoke about the work of the DCMI (Dublin Core Metadata Initiative) to align itself with the semantic web. This talk focussed particularly on the challenges of using linked data in a practical manner and the fact that linked data is not the only way for the web to develop. A short history of the Dublin Core metadata schema was given, along with an acceptance of the fact that some of the elements were very “fuzzy buckets” for people to put things into. Dublin Core is seen as having a bit of a ‘flat-world’ modelling approach: it can only deal with one thing at a time, and there has been very little abstraction of the model since it was first proposed; it was just moved from HTML to XML to RDF. If linked data is the future then linked data must be successful on the web, and this means that RDF has to be successful, and it hasn’t been so far. DC can be seen as providing a useful vocabulary of the core ‘classes’ and ‘properties’ that can be used in a linked data environment.
John Goodwin’s demonstrations of the use of linked data within the Ordnance Survey data were fascinating and raised some interesting questions. For example, when you say Essex do you mean the town or the county, and does the ‘machine’ you are using know that? What happens to geographic data when the boundaries of local government change? The temporal aspect of geographic data is a continuing problem. Linked data within the BBC website is allowing news stories to be grouped geographically, and John also touched on the problems of harmonising data across a number of formats. The final problem mentioned was that of vernacular geography: the Ordnance Survey has done the ‘real’ geography but the emergency services are more interested in knowing what people mean when they say…
In the next talk we were introduced to PoolParty, thesaurus management software, by Andreas Blumauer. The idea behind PoolParty was to give people a tool to allow them to publish any knowledge as semantic linked data. The semantic web could be either the high level modelling of OWL or the lower level of the SKOS information. The knowledge people publish can be either open or closed-access enterprise information, and the tool uses SKOS coding as a standard to take advantage of the open data available. Functions available include auto-complete for terms, text analysis, linked data mapping, drag-and-drop term adding and advanced reporting. The most useful feature of the system, I thought, was the ability to create a bespoke thesaurus and map it to existing schemas, something that as a cataloguer I often wished I could do.
The final presentation of the day was from Bernard Vatant from Mondeca discussing the backend products offered by the company to align semantic web technologies with the real world needs of the people using them. He presented an interesting view of the web:
- Internet (1970’s) = network of identified, connected and addressable computers.
- Web 1.0 (ca. 1990) = network of identified, connected and addressable resources.
- Semantic web (ca. 2010) = network of identified, connected and addressable representations.
His view of the semantic web is that we need an extra level beyond what is currently being offered by thesaurus products: that of context. In the current process terms denote concepts and can be represented by things, and it is this coordination of terms, concepts and things that creates the context. Bernard Vatant described this intersection as the ‘semiotic triangle’. This intersection of linguistics, cognition and technology is one of the areas that excite me the most about semantic web technology.
The day was rounded off with a full panel discussion that covered some very big questions, for example ‘can you really define a universal concept of anything?’ and ‘is linked data really the future?’, which the speakers of the day made a valiant attempt at answering. Some comments I particularly liked (paraphrased in most cases): ‘linked data allows you to circumvent many problems by allowing you to link vocabularies to each other’; ‘data is the new raw material’; ‘data is free/open, roll out is free, sell the services built over the system’; ‘the internet is already changing the traditional business models, this just takes it a little further’; ‘take up is still determined by the discipline of the author’. All in all a fascinating day that may (or may not, depending on who you believe) have given a sneak peek of the future.