May 05, 2009

What should we be measuring …and why

More thoughts on repository statistics!

My basic reasons for looking at repository statistics are:

1) Can assess and demonstrate that you are meeting aims/targets (& set such targets).
2) Can gain interest/approval/support on the back of large numbers!
3) Providing authors with information about who is looking at their work could motivate them to deposit.
4) Might generate some competitive spirit!
5) Identifying popular content might help in measuring citation impact of repository deposit.

Looking back at the basic reasons:

1) Aims and targets need to be set for the next 12 months, as we have emerged from our JISC-funded project. I can only aim for things that I can measure, so this becomes a circular argument! Ideally, I would like to be able to ensure that we are getting deposits of all appropriate items, across the whole University - and to know that we can handle such numbers. So what I really need to be able to do is to measure what the University's authors are actually producing.

I need to know about numbers of visitors to WRAP, and whether or not these can be boosted, in order to meet the goal of WRAP being a showcase for Warwick research.

Measuring how people get to WRAP is important, because if they all come via Google and bypass our metadata entirely, then this might cause us to review our metadata creation workflows. The value of metadata goes beyond bringing visitors to the repository, however, and that also needs to be documented.

2) Shouting about large numbers is a fairly crude way of getting attention, so a crude measurement tool such as Google Analytics (GA) is probably appropriate. Having said that, the Apache logs record higher numbers, so I should be reporting those numbers rather than the GA ones!

3) Providing information to authors. Well, GA is entirely inappropriate for that. I can provide some information for some authors, and that has been welcome. But the ideal scenario would be for authors to be able to access such information for themselves, whenever they want. And it really is a huge gap in the knowledge that I can share with authors if I can't tell them about accesses to the actual PDF files. I'm not sure how much interest authors show in statistics at those repositories that do let them check for themselves. Authors here aren't clamouring for figures about who is accessing their work: some are pleased when I write to them with figures, but that is probably because I only write to our top content authors, so I'm always spreading good news!

Generally, authors want to know if visitors are indeed academic, which is often very difficult to tell, but GA does give me some clues. Being able to tell authors a little bit about visitors to WRAP is reassuring for them, and whilst addressing their every concern is more than I can manage, not knowing about visitors to the PDF files is a huge gap.
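As a rough sketch of how that gap might be narrowed from the Apache side, the snippet below counts successful requests for PDF files in an access log. It assumes the log is in Apache's Common or Combined Log Format and that full text files are served from paths ending in ".pdf"; the function name and the example path are made up for illustration and are not taken from WRAP's actual setup.

```python
import re
from collections import Counter

# Pulls the requested path and the status code out of a Common/Combined
# Log Format line, e.g.  ... "GET /123/1/paper.pdf HTTP/1.1" 200 48213 ...
LOG_LINE = re.compile(r'"GET (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_pdf_downloads(lines):
    """Return a Counter mapping each PDF path to its number of 200 responses.

    Assumption: only plain GET requests answered with status 200 are
    treated as downloads; partial (206) and cached (304) responses are
    ignored in this simplified sketch.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a GET request, or not in the assumed format
        # Drop any query string so that "/x.pdf?download=1" counts as "/x.pdf".
        path = match.group("path").split("?", 1)[0]
        if match.group("status") == "200" and path.lower().endswith(".pdf"):
            counts[path] += 1
    return counts
```

Calling `count_pdf_downloads(open("access.log"))` (with a hypothetical log file name) and then `.most_common(10)` on the result would list the ten most requested files. Raw log counts like these include search engine crawlers and other robots, so they would need filtering before being passed on to authors as evidence of readers.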

Authors are also concerned about their publishers, and it would be great to be able to demonstrate that repositories like WRAP don't harm publisher interests. This would not only reassure authors, but perhaps also reassure publishers. If publishers were supportive, the business of populating a repository would be so much simpler.

4) The competitive spirit could be between individuals or departments or even institutions. It could be based upon numbers of items in the repository, or numbers of visits or all sorts of different criteria. The competitive spirit ought to be directed towards appropriate measures. Focussing on numbers of items in the repository is probably enough for now: our main goal is to grow the repository.

Some element of benchmarking against other institutions is also going to be important, when it comes to resourcing decisions. This will mean measuring how many items we have, of what type and whether of full text or metadata only. Measuring how fast the collection is growing will help us to plan our workflows accordingly, and also be useful for benchmarking.

5) Measuring impact on citation: this is something that we claim as a benefit of repository deposit. I am always very careful to claim this only insofar as it is common sense that more readily accessible work will be read more, and that more widely read research will be cited more. Even so, departments are asking me for evidence that repository deposit will boost citations. The repository tends to come up in departmental meetings alongside departments' concerns about raising citations, so it is no surprise that the two are so closely associated. Evidence of this sort of impact would be highly influential in terms of encouraging deposit, if I could find it. I believe that the problem is that, by the time a repository has had its effect, it will be one of a number of factors influencing higher citations.

What I can hope to do is to prove that the most highly visited items in the repository become the most highly cited. I need to know which items are most highly visited, and to look at the reasons why that might be.
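One way of testing that idea, once both sets of figures exist, would be a rank correlation between visits and citations per item. The sketch below is only an illustration: it assumes two equal-length lists of counts for the same items in the same order (where those counts would come from is not settled here), and the function names are my own. It computes Spearman's rank correlation using only the Python standard library.

```python
def ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(visits, citations):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(visits), ranks(citations))
```

A value near 1 would mean that the most visited items also tend to be the most cited, and a value near 0 would mean no such pattern. Even a strong correlation would not show that deposit caused the citations, since popular papers may simply attract both visits and citations, which is the same difficulty as the "one of a number of factors" problem above.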
