All 2 entries tagged Eprints
October 07, 2008
For a long time now, our Data Services manager Stuart Hunt has been saying that E-prints and SWAP don't fit together. Stuart should know: he's a metadata expert and he made all the changes to our E-prints configuration files to try to make it capable of hosting SWAP metadata records for us. I've just been talking to Stuart to get my head around some of the problems we're experiencing, trying to marry SWAP and E-prints.
SWAP expects a hierarchy and E-prints is flat. There's this thing called FRBR (pronounced "ferber" by those in the know!) that SWAP is supposed to follow. That stands for "Functional Requirements for Bibliographic Records", I believe. But the point of it is that SWAP refers to a "work" that is the concept from which all "expressions" (versions) are derived. SWAP metadata describes the relationships between such versions. When trying to do this in E-prints, the conceptual "work" is lost, as far as recording and presenting data is concerned, because there is no way to describe a "work". Each item in WRAP has its own metadata record, and if there are two versions of the same work, e.g. a conference paper and a later journal article of the same title and about the same topic, those items will each have their own metadata record, which would describe the relationship between those items. But there is no "work" actually described in WRAP, because E-prints simply isn't structured to describe it.
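To make the difference concrete, here is a minimal Python sketch of the two shapes. It is not how SWAP or E-prints is actually implemented; the class and field names (`Work`, `Expression`, `FlatRecord`, `related_record_ids`) are made up for illustration, and the paper title is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Expression:
    """One version of a work, e.g. a conference paper or a journal article."""
    title: str
    version: str


@dataclass
class Work:
    """The abstract 'work' from which all expressions are derived."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


@dataclass
class FlatRecord:
    """A flat record: one per deposited item, with no parent 'work'."""
    title: str
    version: str
    related_record_ids: List[int] = field(default_factory=list)


# Hierarchical shape: the work exists in its own right and owns its versions.
work = Work(title="An Example Paper")
work.expressions.append(Expression("An Example Paper", "conference paper"))
work.expressions.append(Expression("An Example Paper", "journal article"))

# Flat shape: two records point at each other, but nothing describes the work.
flat: Dict[int, FlatRecord] = {
    1: FlatRecord("An Example Paper", "conference paper", related_record_ids=[2]),
    2: FlatRecord("An Example Paper", "journal article", related_record_ids=[1]),
}
```

In the flat shape the relationship between the two versions can still be recorded, but there is nowhere to hang anything that is true of the work as a whole.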
This is why we are not providing our academics with lists of works they have published when WRAP is searched by their name (e.g. http://wrap.warwick.ac.uk/cgi/saved_search?savedsearchid=3). Our metadata records describe the items they have sent us and not the published works. They have sent us, or we have harvested, early versions of their works, because of copyright restrictions on the final published journal articles. I had thought that we would describe the published works and link the unpublished items to those descriptions, because that was my original understanding of SWAP, but of course that would be bad cataloguing practice if the version we have is a discussion paper. The metadata record must describe the actual item. The "work" is not described in WRAP because E-prints does not support such a hierarchical structure, but if the work were described, that would meet our academics' needs better than our current setup.
I'm concerned that the latest version of Eprints (we have just upgraded to 3.0.5, whilst 3.1 is now released) diverges further from our SWAP model. Eprints developers talk about the "e-print", which is the metadata record as far as I'm concerned, and the "document". In their model, several "documents" might be attached to one "e-print". Version 3.1 allows more information to be attached to the document itself. So, if we were to start again with a SWAP implementation and Eprints, would we want to edit the "e-print" record to contain minimal metadata, and add lots of lovely rich SWAP elements to the "document" metadata? Would it even be possible to enrich the "document" metadata so much? And Stuart tells me that this still wouldn't provide all the hierarchical description that SWAP actually allows for. I lost him at that point, I confess... time to read up some more about SWAP. But even if it would still not be a complete solution, would it be one that delivers some of SWAP's features, just like our current implementation, whilst also meeting the need of the academic to see all their published work in one list?
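As a rough picture of that "minimal e-print, rich document" idea, here is a short Python sketch. It assumes nothing about the real Eprints data model beyond what is said above (one "e-print" with several "documents"); the class names, file names, and metadata keys are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """A file attached to an e-print, carrying its own (hypothetical) metadata."""
    filename: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class EPrint:
    """The metadata record, to which several documents may be attached."""
    metadata: Dict[str, str]
    documents: List[Document] = field(default_factory=list)


# Minimal metadata on the e-print, richer description on each document.
eprint = EPrint(metadata={"title": "An Example Paper"})
eprint.documents.append(Document("draft.pdf", {"version": "author's draft"}))
eprint.documents.append(Document("accepted.pdf", {"version": "accepted manuscript"}))

# The e-print now behaves a little like a "work" that lists its versions.
versions = [doc.metadata["version"] for doc in eprint.documents]
```

This only nests one level deep, which may be part of why it would still fall short of the full hierarchy that SWAP describes.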
Does it matter whether we input or store SWAP, if what is coming out of the repository is SWAP? That is the approach that the Eprints developers are taking, as far as I can tell, since they wrote a SWAP export plugin. But our problem with that is that if you don't describe all the relationships at the input stage, you can neither store nor export those relationships, so all that you end up with is the same metadata re-packaged to fit a different application profile. The point of using SWAP for WRAP is the completeness and richness with which it lets us describe the items in WRAP, because that demonstrates quality, which is important to the University of Warwick's image.
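The point about export can be shown with a small Python sketch. This is not the Eprints SWAP export plugin, and the field names (`expression_title`, `is_version_of` and so on) are invented labels rather than real SWAP properties; it only illustrates that an export step can re-label what was stored but cannot supply a relationship that was never captured.

```python
from typing import Dict, Optional


def export_swap_like(record: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Re-package a stored record under SWAP-style labels (illustrative only)."""
    return {
        "expression_title": record["title"],
        "expression_version": record.get("version"),
        # A relationship can only be exported if it was recorded at deposit.
        "is_version_of": record.get("is_version_of"),
    }


# Deposited without any relationship information.
stored_without_relations = {"title": "An Example Paper", "version": "author's draft"}

# Deposited with the relationship to the published version described.
stored_with_relations = {
    "title": "An Example Paper",
    "version": "author's draft",
    "is_version_of": "An Example Paper (journal article)",
}

poor = export_swap_like(stored_without_relations)
rich = export_swap_like(stored_with_relations)
```

Both outputs have the same SWAP-like shape, but in `poor` the relationship field is simply empty: the same metadata in a different wrapper.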
SWAP for WRAP is not all about others harvesting our data or what we get out of the repository from a technical point of view. It is about what we can say about the articles that our academics have written. There is also an element of future-proofing in our motivation, in terms of how we might be able to use our metadata to link between citations in the future, for example, and to present WRAP records alongside records from other data sources such as the library catalogue. The catalogue, incidentally, describes monographs, whilst WRAP does not, so a search with results from both sources would provide academics with a more complete record of everything they have published...
It will be interesting to see which SWAP elements we're actually using at the record creation stage, because although we have SWAP metadata elements in our workflow, we often don't have enough information to create records that are any richer than an ordinary E-prints metadata record. Or at least that is my impression so far, and I may be wrong. To describe a relationship between two items, you need to know about both those items. Most authors are only prepared to supply us with one version of their work, usually the most up-to-date one that they can, if not the final version itself. Often they're vague about which version they have supplied to us, as well. That may change in the future, but for now, that's the case.
Also, our workflow in E-prints is pretty long and off-putting for those not used to ignoring the elements that are irrelevant to a given item. With expert metadata librarians (cataloguers) creating our records, that's fine: they get to know the schema well, and which elements they will want to use to describe each item. But if you're following an author self-archiving model, that's not fine. Authors want minimal keystrokes and simple processes for depositing, although of course we will want to prompt them to describe everything they know, because it is only they who know what version they are supplying. Is SWAP preventing us from following an author self-archiving model? Not really: it's all about how we present SWAP to the authors, and E-prints doesn't make it look pretty at the moment.
The question of how long it takes to edit and polish author-created records so that they meet our quality requirements is another matter entirely. The problem with aiming for high quality records is that they do take time to create, and if authors are self-archiving, they will want to see their items appearing live in the repository as soon as possible. So there may well be a tension between SWAP and author self-archiving from that perspective.
The matter of presentation applies to the metadata record view and search results views in E-prints as much as it does to the deposit workflows. It's not about SWAP itself and it's not about E-prints itself. It's about how we get them to work together, and although we have an example of that in WRAP as it is at present, I wouldn't say that we necessarily have the best possible solution for Warwick's needs. Just that we did the best possible at the time with the resources that we had, and E-prints is moving on. So the lessons that our funders JISC might learn from WRAP, as regards SWAP and Eprints, will most likely not apply to all future implementations. And that is what happens when you're pioneering...
June 12, 2008
Writing about web page http://www.eprints.org/software/training/
I've been learning the hard way that the most useful documentation on Eprints is their training material, which I've linked to. We've had various struggles with trying to get our configuration working, and I think that's largely due to a lack of dedicated technical expertise on the project. We have our systems admin support from IT Services, but for all their help, IT Services have made it clear that they are not responsible for the configuration. The library's own Data Services manager has done much to edit the Eprints files to implement our metadata schemas and more, and I'm learning more and more all the time about the gap in between that requires knowledge of the way Eprints works in order to make it work!
We're not making that many changes beyond a bit of branding and the metadata setup, but we've struggled because we had not finished the installation correctly (and were unaware that that was the case), so it did not index our full-text documents, and because we did not understand all the places that our metadata changes would affect.
Tim Miles-Board at Southampton has been extremely helpful in finding out the places we went wrong. You can't beat the help of an expert! He pointed out the training materials that I've linked to, which do seem to be very useful. It's not that I haven't been reading Eprints documentation on the web for ages. It's just that I'm finally beginning to understand it better, and to recognise where the more helpful stuff actually is :-)