Highlights of Repository Fringe 2010
Writing about web page http://www.repositoryfringe.org/
I'm just back from a trip to gloriously sunny Scotland (which was obviously breaking out the good weather for the festival) and the 2010 Repository Fringe Event.
Hosted at the National E-Science Centre (NESC) in the heart of Edinburgh, the sessions began with Sheila Cannell (Director of Library Services, University of Edinburgh) asking us to consider fireworks. She invited us to join in with the firework display at the end of the Edinburgh festival, which in her words were 'open fireworks' (paid for by a combination of public money and the 'subscriptions' of a few), and to use thinking that would light up the sky. This nicely set the tone for the next couple of days.
The keynote by Tony Hirst (Open University) followed, in which he presented us with an outsider's view of repositories on the theme of openness. The central idea of the talk was "content as data", and he urged us to consider new ways to store, manipulate and present the information in our repositories for our users. He also pointed to information we might want to start recording but currently don't, such as 'open queries' showing users exactly how the charts in an article were generated from the underlying data. In a nice touch Dr. Hirst finished by revisiting S.R. Ranganathan's 'Five Laws of Library Science', encouraging us to keep our repositories as living organisms rather than as places where research is dumped and forgotten about.
The following session by Herbert Van de Sompel (Los Alamos National Laboratory) introduced us to the Memento Project, a way to provide web users with time travel! It is a clever way to allow your web browser to access the web as it would have been on a certain date, using the same URI that you have for the current version and with as much of the functionality the page originally had as possible. This is one thing I'm looking forward to experimenting with. If you use Firefox, the link above will lead you to the gadget to try it out for yourself!
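For the curious, here is a rough idea of what 'asking for a page as it was on a certain date' can look like at the HTTP level. This is only a minimal sketch, assuming the `Accept-Datetime` request header that the Memento protocol defines (as later standardised in RFC 7089). The helper name `memento_headers` and the example date are my own illustrations and were not part of the talk.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(when):
    """Build request headers asking for a resource as it was at `when`.

    `when` must be a timezone-aware datetime. `memento_headers` is a
    hypothetical helper name used for illustration only.
    """
    # Accept-Datetime carries an HTTP-date, which is always given in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": http_date}


# Example: ask for the web as it stood at the start of 2009.
headers = memento_headers(datetime(2009, 1, 1, tzinfo=timezone.utc))
print(headers)
```

A Memento-aware client would send these headers along with the ordinary URI of the page. What comes back depends entirely on whether an archive or server on the other end understands the header.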
Repo Fringe was my first experience of the Pecha Kucha style of presentation (for those not in the know: 20 slides, 20 seconds a slide, autorun for 6 mins 40 secs per presentation) and they looked just as nerve-wracking as you might expect! On the first day we had an update on the Open Access Repository Junction, beauty and the Jorum repository, Glasgow's Enlighten repository through the metaphor of cake, the problem of dataset identity, research data management and the Incremental project, and finally the Edina Addressing History project. I will admit I was hard pressed to choose my favourite when the time came to vote! I was also impressed at how many ways there are to approach these sessions and how much information you can pack into just under seven minutes.
The EPrints team reinforced their reputation for giving some of the more entertaining presentations that any conference is likely to see with their live demo of EPrints 3.3 and the new Bazaar functionality. It was a very interesting look at what is to come in the software that many of us in the audience are using!
The first round table of the conference for me was on the thorny issue of the relationship between an institution's CRIS (Current Research Information System) and its institutional repository (IR). The talk was sparked by the work done on the CRISPool project, which was aimed at creating a cross-institutional CRIS to cover the Scottish University Physics Alliance (SUPA) research group. The discussion invited us to consider whether the distinction is a false one, or whether the real issue is which functionality best fits where in the system. Is it right that IRs exist when we could all have CRISs? Could we create a centralised, national IR and have all our CRISs harvest from there? Should we be looking to integrate CRIS functionality with IRs? What impact does the REF have on the discussion? In all we didn't come to any definite answers (not that I think that is the purpose of round tables of this sort) but we all took away something to think about.
Day two began with Chris Awre (University of Hull) discussing hangover cures through the ages (the Romans apparently favoured deep-fried canaries) before moving on to the main meat of his presentation on the Hydra Project, a collaboration between the Universities of Hull, Virginia and Stanford, and Fedora Commons. This unfunded project is aimed at providing solutions to identified common problems, on the understanding that no single institution can (or needs to) create a full range of content management solutions on its own. For Hydra, collaboration is the key to success, with each institution providing what it can to the project. The project makes use of Ruby on Rails technology and the work of Project Blacklight, an open source 'next-gen discovery tool', to allow a more sophisticated search function.
The second round table of the event was focused on linking data to research articles. This is an area that we are looking to move into in the future, and so I was fascinated to hear some of the comments and opinions from places that already had systems running. From the responses of the attendees I was not alone in this; many institutions seem to realise that this is an important area and that the implications of a project such as this can be huge. The keyword here was always going to be linking, but linking what to what? What is a dataset? As there is a clear difference between a dataset associated with an article and a working dataset, can we pull out only the data that was used in the article and store it with the article without losing the meaning of the data? The point was made that the cost of storage (while large) pales next to the cost of curating many small things, as with curation there is a cost associated with each item. We discussed the fact that with archives the expectation is that you just put things inside them, whereas with repositories you have the added issue of people trying to reuse the data. In the current age of research funding cuts the reuse of data is going to become critical, as fewer and fewer institutions are going to be able to afford to run the experiment again from scratch! The issue of trust was discussed: can we trust a conversion of a dataset for preservation? Will it have maintained all of the formulae that are inherent in the dataset? The spectre of 'ClimateGate' was raised: will the availability of the data safeguard against this in the future? If we are linking to things inside a dataset, do we have the functionality to 'cite' a small part of the larger whole without making the link meaningless? All this and metadata schemas were touched upon in a stimulating discussion that could have run a lot longer than it did.
Again we came to no conclusions, but everyone I spoke to afterwards had gained at least one new thing to think about that they hadn't considered before!
The second round of Pecha Kucha talks was as interesting as the first and covered: the Ready for REF project, looking at the XML output needed for REF reporting; JISC RePosit, working to simplify the deposit process through use of research information systems like Symplectic; more on research data management, this time from the Edina team and looking particularly at the creation of training tools; the JISC CETIS services and their approaches to open educational resources; ShareGeo and Digimap; and finally the SONEX think tank on the work done by that group.
Possibly the most challenging presentation of the event was from Michael Foreman (University of Edinburgh) introducing the concept of 'Topic Models'. The concept, from a paper by Blei and Lafferty (2009) about their work with articles in JSTOR, allows people to create maps of related documents based on statistical analysis of the frequency of words within the articles. A lot of the meat of the statistics did stretch my understanding to the limit, but anyone (and everyone in the room certainly did) could see the value to be gained from work of this variety as we search for more and more automated ways to define the content of items in our repositories and the way they relate to others.
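To give a flavour of the 'relating documents by their word frequencies' idea, here is a very small sketch. To be clear, this is not a topic model of the kind Blei and Lafferty describe; it is a much simpler stand-in (cosine similarity between raw word counts), and the function names and sample sentences are my own inventions for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count how often each lower-cased word appears in a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Score two word-count tables from 0.0 (no shared words) up to 1.0."""
    dot = sum(a[word] * b[word] for word in set(a) & set(b))
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as unrelated to everything
    return dot / (norm_a * norm_b)


# Made-up snippets standing in for repository items.
doc_a = word_counts("climate data from ships logs")
doc_b = word_counts("ships logs as climate evidence")
doc_c = word_counts("hangover cures through the ages")

print(cosine_similarity(doc_a, doc_b) > cosine_similarity(doc_a, doc_c))
```

Scores like these could in principle be used to draw a map of which items sit near each other. Real topic models go a good deal further by inferring the underlying themes that groups of words belong to.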
The closing presentation from Kevin Ashley (Digital Curation Centre) gave us a round-up of the presentations that had gone before it and of the development of the repository world as a whole and, as a way of looking forward, revisited the idea of citing data. He urged us to be aware that we are "Standing on the shoulders of Giants" and also to remember that sometimes fireworks are a good way to burn a lot of money very quickly! Curation issues were raised: what do we keep? How long do we keep it for? Repositories have not yet had to consider throwing things away, but we may have to at some point! The concept of the value of data being unknowable was also raised, with the example given of data from ships' logs, which have been used three times: first to navigate, secondly to tell historians about economic and trade conditions and, most recently, to discover evidence of climate change. Again we came back to the idea of the 'data behind the graph', the information in the article that we just can't get hold of, as well as the fact that people don't always realise that data can be changing all the time; nothing is truly static.
Overall the two days in Edinburgh were packed with many interesting things but the thing I took away from it most was the fact that there is always a different way of looking at something but that you should never forget your foundations.