All entries for January 2009
January 27, 2009
Repositories are good because they save money.
Writing about web page http://www.jisc.ac.uk/publications/publications/economicpublishingmodelsfinalreport.aspx
A report has just been released, describing research on publishing models for journal articles. It compares the costs of our traditional subscription-based system with open access funded by authors' fees and with open access publishing via online repositories. According to the report, it ought, in theory, to be significantly cheaper for UK higher education institutions if we go down the open access route.
The SherpaROMEO website lists the costs of journals which allow authors to pay a fee to publish under open access, and the costs seem to be very high, in some instances over £2000 per article: http://www.sherpa.ac.uk/romeo/PaidOA.html
I haven't read the report in full, but I'm pretty sure that the costs it predicts and the savings that might be possible would only work if we went for open access in a wholesale way, as a whole sector, rather than in the piecemeal way we have at the moment, which is reflected in Sherpa's list.
So, repositories could be a way to save a huge fortune. But they're not doing it yet, and we've got a long way to go in terms of engaging the authors and showing that it is worth their while to deposit on open access, before we can make that saving a reality.
January 19, 2009
Reports and Statistics I need from WRAP
1) Six-monthly report of data changes on WRAP to show which records have been altered since the date they were added into the live repository. (For sharing data with Warwick’s Research Support Services.) Not currently possible.
2) A graph to show the pattern of new record creation/repository growth over the last x months/year. I can get this from ROAR. (http://www.roar.org)
3) Monthly report of all records added since last month, with data in specific formats to suit RSS’ InfoEd system (and/or other departments at Warwick). Key issues with sharing with RSS: need to store staff number (or key to call up staff number) for each Warwick staff member amongst the authors, and lack of security for such data in WRAP. Also, page range is currently exported as, e.g., 51-72, whereas RSS need it as "start page 51, end page 72". More investigation into the technical possibilities for data sharing needs to be done. It may be significant that InfoEd attaches information relating to publications (& other activities) to a person’s profile, whereas WRAP attaches information about authors to a record describing a publication.
4) Statistics on visitors to WRAP and what they are clicking on, where they come from, etc. Google Analytics does this well enough for me: I can see where they’re clicking, what keywords brought them to WRAP, to where in WRAP, and who their network provider is (which is a clue to some academic interest, and also helps to identify internal interest). I can see what countries visitors are in, and what cities, etc. I can do all this at a per paper level, but I have to know which paper(s) I want to look at.
5) To look at features like those listed above, for a set of data (e.g. all by one author, or all for a particular department). Departments and authors may well want to know who is looking at their work in WRAP. I can look at a particular paper, but not at a set: I would have to collate reports for each paper, in some way. IRStats should be able to do this, if we were to install it successfully on WRAP… although it may require some change in our workflow. At the moment, most papers are added to WRAP by our very own administrator, since authors use a separate (& simple) submission form. Authors do not upload data about their own publications and therefore the papers are not attached to separate accounts in WRAP. I believe that IRStats would need separate accounts to be used for each author’s papers, in order to produce reports on all of an author’s papers. Our administrator could create accounts in authors’ names and then log in as the author before creating the record… but that all presupposes that we can get IRStats to work, and that it does work as I expect.
6) It would also be better for me (and for those interested in the data) if I did not have to look up statistics such as those already provided by GA myself, and those interested could instead look them up on demand. In theory, I can grant access to the GA reports to anyone with a Google account… although this requires some intervention from me. And Google Analytics is great for those who know how to use it, but I can see academics being put off learning how to use it. There are barriers to authors getting data about all the wonderful good WRAP is doing in bringing an audience to their work!
7) GA is great for looking at the site and our HTML files, but tells us nothing about PDF/Word document downloads. The difference between “the most downloaded document” and “the most looked at record” could be very important indeed, if any correlation with citations is to be explored. Also, I can tell from GA if someone has followed the link to the DOI on a particular record. I can’t tell whether anyone has followed the link from within the PDF file to the full text, published version, though.
8) What are people searching for from the repository's own search form: which fields do they search by? GA can only tell me whether people click through from our Advanced form to the Simple search one, and indeed whether people follow the link to search the repository in the first place from our home page. Thus far, there aren’t so many people searching, and we expect that people will not search through our form but on search engines like Google, with keywords which GA does record and tell us, so this isn’t particularly crucial.
I’m also not sure how to make GA discount visits from members of the WRAP team… but I expect that’s something I ought to look into.
I’ve learnt a lot about what GA can tell me about WRAP and its visitors. I find it fascinating to delve in every now and again and see what brings people to us. It can be used as a website management tool, to see how to make important links more visible and hence more clicked upon. It can be used in advocacy to authors, explaining why they might want to put work into WRAP, showing that others do look at it.
What I would like to do is to compare our statistics with those of other repositories, at other institutions. It’s not easy to find other repositories that are comparable with ours in their features (full text, mediated metadata, voluntary deposit), never mind such repositories at comparable institutions. But it is possible to find those that are much further ahead of us, and it would be good to see where we might be heading, in terms of visitor profiles, whether most visitors come from search engines (as now) or direct links, etc. I would like to know whether the most popular content in others’ repositories is journal articles or unpublished content, and whether there is a particular subject that gets heavier attention than others. So, I would like to be sure that, whatever statistics package we use for WRAP, it is one that would enable us to compare our repository with others. There isn’t such a package, or a method of using a combination of statistics packages, yet.
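Going back to item 3 in the list above, the page range mismatch is one of the few problems there that looks straightforward to handle once the data has been exported. The following is only a rough sketch in Python of how an exported value such as 51-72 might be turned into the "start page 51, end page 72" wording. The function names are made up for illustration, and I am assuming that the export gives the range as plain text with a hyphen (or an en dash) between two whole numbers; nothing here reflects how WRAP or InfoEd actually work.

```python
import re

# Hypothetical helper names; the input format is an assumption
# (two whole numbers separated by a hyphen or an en dash).
PAGE_RANGE = re.compile(r"\s*(\d+)\s*[-\u2013]\s*(\d+)\s*")


def split_page_range(page_range):
    """Return (start, end) as integers, or None if the text is not a simple range."""
    match = PAGE_RANGE.fullmatch(page_range)
    if match is None:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        # Probably an abbreviated range such as 1051-72; do not guess.
        return None
    return start, end


def format_page_range(page_range):
    """Reword a simple range; leave anything unrecognised exactly as it was."""
    pages = split_page_range(page_range)
    if pages is None:
        return page_range
    start, end = pages
    return f"start page {start}, end page {end}"


print(format_page_range("51-72"))
```

Anything that does not look like a simple range (an article number, a single page, an abbreviated range) is passed through unchanged, so that someone can check it by hand rather than have the script guess.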
January 08, 2009
Backwards in time
The New Year begins, and I'm pleased to say that we're getting more deposits trickling in. I'm currently struggling with an issue that relates to the identity of WRAP as a collection, as well as our collection development policy.
How do we define “Warwick research”? Many of our researchers have come from or go to other institutions. Their citations and profiles will be raised by work they may have done in the past, so should WRAP also host work written earlier in their careers? Future publications will no doubt include expertise developed whilst the authors were at Warwick.
A policy of including work written even whilst not employed at Warwick would enable us to add prestigious articles to WRAP that might add to the “Google juice” of the entire collection. It would also enable us to go further in meeting the wishes of authors to have all their work presented in one place.
Authors need to be clear about what they can deposit to WRAP. At present, we are accepting whatever they submit to us, and don’t check dates of employment against submitted items. We have checked such dates when writing to authors to invite submission, however. Such personal invitations take time to write, especially in checking the copyright agreements for each of the authors’ articles, and not all of our invitations have led to deposits, so we have needed to limit the checks we make before inviting deposits. For the same reasons, we also only looked at the five most recently published articles when writing our invitations.
If every author were to send us a version of everything they have ever published, we would be inundated with work, but that is pretty unlikely to happen! The value of making early published works available through WRAP is not so easy for WRAP staff to ascertain: some early articles will be valuable to the academic community and therefore to WRAP, others less so. There will be institutional repositories at those other institutions, which might already make the work available on open access, so a WRAP deposit would be of limited value to the academic community. We can therefore only rely on academics’ own decisions about what to deposit to us, and hope to keep up with the total level of deposit.
What I need to know, in essence, is: Is it important for Warwick to be able to identify the outputs of research it has hosted? Should we limit WRAP’s collection according to employment dates?
It may be possible to add a metadata field “Written whilst the author was employed at Warwick? – Yes/No”. (NB The question asked is about the writing rather than the publication - is that the right criterion?) However, adding metadata fields to WRAP’s schema is not easy, and the checks would take extra processing time, so it is crucial to know the importance of such information. If it is not important to separate Warwick research from research by Warwick people, then we might also ask for articles written by high profile authors who have recently arrived at Warwick.
Policy thus far: For the time being, we ask all authors to send us everything they can, dating back as far as they like. It may become necessary to prioritise processing of more recently published articles if we become inundated with deposits.
We ought not to go on building the collection much further without a clear policy about what we intend the collection to represent. It feels to me like it is evolving away from an original intention to present Warwick research, and I wonder whether I should attempt to rein it in, or go with the flow and make the most of the advantages of a wider collection policy. Something for our steering group, I feel...