All entries for May 2009
May 26, 2009
Writing about web page https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=JISC-REPOSITORIES
Do open access repositories have an impact on the number of readers of a paper? I believe that they do, because they bring more visitors to a paper by making it more accessible and visible on the web. All the activity I have been recording and analysing certainly implies an impact of some kind. But what is that impact? And does it benefit the institution (which pays for the repository!) and the academic community (who add content to it)?
I have tagged this post "ROI", which stands for "return on investment". There are many ways in which an institutional repository (IR) can be valuable to its institution, but one that is particularly important to Warwick, given the nature of our repository and the concerns of our management, is demonstrating that repository deposit will raise citations. This is not an easy thing to prove...
But what made me ask my opening question is the impact on the academic community at large. The concerns of academic authors matter hugely to advocacy work, and among their many concerns is one for their publishers. Academics want to be sure that their existing communication model will continue undamaged, i.e. that their publishers will continue to support academic journals.
The "version of record" (i.e. the published version) is enormously important to academic researchers, and I think that is why there are relatively few visitors to WRAP reading our pdf files. It is also why I think that publishers and authors need not be concerned that open access repository deposit will destroy the existing journal publishing system.
What I need to be able to prove is that WRAP will bring more visitors to authors' work, and that it will not detract from visitors to the version of record.
A recent posting to the jisc-repositories list reports that the open access availability of content in IRs has had no impact on inter-library loan (ILL) requests. ILL is just one of the existing routes to the version of record, but this is potentially significant evidence, because it suggests that repository visits are indeed additional to the existing ways in which people come across authors' work (according to the summary posted to the jisc-repositories list, anyway). I'm not sure how relevant the findings are to us, because the study was carried out at nine or so US universities, and I have not read the actual publication. But it sounds promising. The publication referenced in the posting is:
Primary Research Group has published Profiles of Best Practices in Academic Library Interlibrary Loan, ISBN 1-57440-122-X
May 18, 2009
Paying open access publication fees
Writing about web page http://www.rin.ac.uk/openaccess-payment-fees
I've been reading the RIN publication about paying for open access publication charges. The appendix has some sensible recommendations for authors, institutions, funders and publishers. The appendix also has a very clear and helpful description of the University of Nottingham's central open access fund, which notably mentions that the first part of their institutional policy is to encourage repository deposit.
The weakness of the publication, in my view, is that it deals with only one way to comply with funders' requirements: paying for open access publishing, as opposed to repository deposit. This is stated early on, but I'm not convinced that the message about the two possible routes is reaching authors and researchers very clearly. Or senior HEI staff: repositories are not mentioned in the summary recommendation for HEIs in this booklet.
I don't think it's helpful to give authors separate advice about securing funding for open access publishing without first explaining that they can also meet the requirement by publishing in the traditional way and depositing a version in an open access repository. I suppose I would say that: I'm a repository manager!
I'm not sure how funders are communicating their requirements to researchers, or what researchers understand about those requirements. But I do believe that when authors are aware of open access, they nearly always think of paying to publish or of entirely free journals, rather than of depositing early versions in a repository. I'm not sure why that is. In some disciplines, or with some journals, authors never create their own version after peer review, and they do not want to share their earliest versions, which lack the polished rigour that the peer review process adds to their work. So perhaps they are simply more comfortable with the idea of paying to publish the final version on open access.
Why should an author choose to pay for open access publication? There are a few reasons that occur to me straight away:
1) The journal that the author wishes to publish in is funded entirely by open access author fees, so paying is the only way to get the work published in the most appropriate journal. Alternatively, the publisher may not allow repository deposit of any version of the work unless a fee is paid. Authors should be able to choose the best channel for communicating their research, regardless of the funding model the publisher applies.
2) The publisher will ensure that the funder is informed of the researcher's compliance with the requirement. That is a service worth paying for: open access publishers who keep PubMed updated with Wellcome Trust-funded articles are performing such a service. Publishers can only do this for authors if the authors tell them their funding details, though: do authors know that this is part of what they are paying for?
3) The article should be made available on open access immediately, without any embargo period, because there is so much interest in the work it describes.
Should an author have to pay for open access publishing in order to meet a funder mandate? I don't think so, but are funders saying that authors should choose to publish only in journals which support open access repository deposit or at least offer an author-pays model of open access publishing? This booklet makes me think that funder mandates are creating a world where fees for open access publishing must be met!
How are funders communicating their message to authors? How will funders be measuring compliance? How will funders define the outputs from the work they have funded? I don't think this message is reaching researchers all that clearly, and the suspicion that was apparent in a recent THES article is the result of a lot of confusion, I believe.
Repository managers are trying to get the message out about repository deposit, and to use the funders' mandates as a part of their message. Are funders expecting us to do this? Are we getting the message right?
Each institution will no doubt have a different type of repository with a different way of depositing, so it is appropriate that this part of the message be delivered at institutional level.
I believe that guidance should also be provided for repository managers, to complement this publication, and that future work on this topic should build in the repository deposit part of the open access message at HEI level, just as the University of Nottingham case study describes.
I know what advice and support I believe we (repository staff) should be giving to our researchers and authors. We should explain that authors need to find out the specifics of their funders' requirements (and help them to do that), and that they should consider direct repository deposit as a route to open access publishing, and therefore to meeting a funder requirement. We (the institutional repository) should support them in making that deposit in the appropriate repository or repositories (a weak point, I believe: I don't know of any IR depositing works in PubMed on authors' behalf, for instance...). Authors can then consider paying for open access publishing as a separate matter, and seek funding accordingly.
May 12, 2009
Is Google Analytics accurate?
Writing about web page http://repositoryman.blogspot.com/2009/04/google-and-repositories.html
I think it is, as far as it goes.
I'm not a techie, and I don't have access to the WRAP server logs, which record who has been visiting. But someone from our IT Services department has checked them for me, looking at just last week's visitors. This has given me a better picture of who is looking at the pdf files, which Google Analytics can't tell me, and it has also given me something against which to check GA's figures for accesses to our metadata records.
This is what I was told about last Wednesday's pdf requests (apparently a typical day for that week):
"...there were 7,445 PDF requests. However, when I removed:
* search crawlers
* other robots
* partial downloads (Acrobat will download PDFs a page-at-a-time, resulting in many requests for the same file)
the results for the number of PDFs downloaded by actual real humans was a less-than-whopping 142.
Of these, about 100 came from the metadata pages in WRAP, with the remainder more-or-less equally split between google scholar, google search, warwick search, and MSN search. Google scholar just about has the edge on the other search engines, with 12 referrals."
In a way this is a relief, because it is what I was expecting to find. I had been startled by the high access figures reported last week. It's a shame that those visitors were not real people, but I never expected that level of interest, and I was slightly alarmed at the prospect of investigating such an unexpected visitor pattern. I'm relieved to know that I was right in the first place: there are indeed far more people looking at the metadata records in WRAP than at the full-text files!
It is interesting to see that most people are reaching the pdf files from our metadata records, when they do go there. I expect that at least part of the reason for this pattern of behaviour is the way in which Google presents results from WRAP to its users. It always puts a metadata record first. I've linked to Les Carr's recent blog entry about Google's presentation of repository results because I think it's quite important for us to take Google's practice into account when trying to understand repository visitors' behaviour.
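For anyone wanting to repeat this kind of check, the filtering of pdf requests described above might look something like the following Python sketch. I don't know exactly how our IT Services colleague did it, so this is an illustration under stated assumptions, not their method. It assumes the server writes Apache's "combined" log format, treats any user agent containing one of a few hypothetical markers (such as "bot" or "spider") as a robot, and counts a download only when the whole file was sent (HTTP 200), so that the page-at-a-time requests Acrobat makes (HTTP 206, Partial Content) are dropped. Repeat requests for the same file from the same address are counted once.

```python
import re

# Assumes Apache's "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical robot markers; a real filter would need a much fuller list.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def count_pdf_requests(lines):
    """Return (all PDF requests, apparently human PDF downloads).

    A download is counted once per (client address, file) pair, and only
    when the server returned the whole file (HTTP 200).
    """
    total = 0
    downloads = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or not match.group("path").lower().endswith(".pdf"):
            continue  # unparseable line, or not a PDF request
        total += 1
        agent = match.group("agent").lower()
        if any(marker in agent for marker in ROBOT_MARKERS):
            continue  # search crawlers and other robots
        if match.group("status") != "200":
            continue  # 206 Partial Content (byte-range requests) and errors
        downloads.add((match.group("host"), match.group("path")))
    return total, len(downloads)
```

Passing an open log file, e.g. `count_pdf_requests(open("access.log"))` with a hypothetical file name, gives both the raw and the filtered figure. Since one reader may fetch a file from two addresses, or several readers may share one, the second number is an estimate rather than a head count.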
My friendly techie also looked at the access logs for the metadata records, giving me something to compare with what Google Analytics was telling me:
"Once I'd filtered out the bots as above, I ended up with 479 requests for metadata...
Of these, the referrers were overwhelmingly (388 hits) google. Another 80 or so came from internal referrals within WRAP (people using the browse/search pages), and the remainder were distributed between MSN search, links from www2.warwick.ac.uk. and other links from around the internet."
This picture matches very well with what G Analytics tells me. If anything, GA puts the number slightly lower, so perhaps its filtering is even more accurate.
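A tally of referrers like the one quoted above could be sketched in the same way. The Python below takes the referrer field from log lines that have already been filtered for robots and sorts each one into a coarse source. The category names and hostnames are my assumptions for illustration: wrap.warwick.ac.uk is treated as internal WRAP traffic, other warwick.ac.uk hosts as University links, and search.msn.com or search.live.com as MSN search.

```python
from collections import Counter
from urllib.parse import urlparse


def referrer_source(referer):
    """Map a raw Referer value to a coarse (hypothetical) source label."""
    if not referer or referer == "-":
        return "direct/none"
    host = urlparse(referer).netloc.lower()
    if host.startswith("scholar.google."):
        return "google scholar"
    if host.startswith("www.google.") or host.startswith("google."):
        return "google"
    if host == "wrap.warwick.ac.uk":
        return "internal (WRAP)"  # assumed repository hostname
    if host == "warwick.ac.uk" or host.endswith(".warwick.ac.uk"):
        return "warwick.ac.uk"
    if host in ("search.msn.com", "search.live.com"):
        return "msn search"  # assumed hostnames for MSN search
    return "other"


def tally_referrers(referers):
    """Count how many requests came from each source."""
    return Counter(referrer_source(r) for r in referers)
```

Keeping Google Scholar separate from Google itself matters here, since the two are easy to conflate when reading visitor figures.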
I am presenting to another department tomorrow, so this is just in time to bolster my confidence in speaking about visitors to WRAP. My concern is to tell authors that I believe people will read the final version in preference to the repository pdf anyway: I have no evidence to the contrary, and authors all seem to tell me that they prefer to read the final published version of others' articles, so I expect that their peers do likewise. I would like to be able to tell them that there are large numbers of visitors to their content in WRAP, all with a scholarly interest, some of whom will cite their work. I can't, but I can prove that at least some of the visitors are from a scholarly background, and that will have to do for now. Of course, they could also follow my tips on how to attract more visitors to their papers in WRAP and how to raise citations of their work... but that is a whole different story!
May 11, 2009
Differentiation between G Scholar and Google itself
Follow-up to What Google is doing with WRAP! from WRAP repository blog
I've had confirmation from the Google Help forum, that Google Analytics does differentiate between Google and G Scholar. So none of those visitors who come to us from Google have come via Google Scholar... Google itself is our main source of visitors to the metadata records and other pages on WRAP.
Watch this blog for news of whether I find out how people get to the pdf files!
May 05, 2009
What should we be measuring …and why
More thoughts on repository statistics!
My basic reasons for looking at repository statistics are:
1) We can assess and demonstrate that we are meeting aims and targets (and set such targets).
2) We can gain interest, approval and support on the back of large numbers!
3) Telling authors who is looking at their work could motivate them to deposit.
4) Statistics might generate some competitive spirit!
5) Identifying popular content might help us to measure the citation impact of repository deposit.
Taking each of these reasons in turn:
1) Aims and targets need to be set for the next 12 months, now that our JISC-funded project has ended. I can only aim for things that I can measure, so this becomes a circular argument! Ideally, I would like to be able to ensure that we are getting deposits of all appropriate items across the whole University, and to know that we can handle such numbers. So what I really need to be able to measure is what the University's authors are actually producing.
I need to know about numbers of visitors to WRAP, and whether or not these can be boosted, in order to meet the goal of WRAP being a showcase for Warwick research.
Measuring how people get to WRAP is important, because if they all come via Google and bypass our metadata entirely, then this might cause us to review our metadata creation workflows. The value of metadata goes beyond bringing visitors to the repository, however, and that also needs to be documented.
2) Shouting about large numbers is fairly crude as a way of getting attention, so a crude measurement such as GA is probably appropriate. Having said that, the Apache logs record higher numbers, so I should be reporting those numbers rather than GA ones!
3) Providing information to authors. Well, GA is entirely inappropriate for that. I can provide some information for some authors, and that has been welcomed. But the ideal scenario would be for authors to be able to access such information for themselves, whenever they want. And not being able to tell authors about accesses to the actual pdf files is a huge gap in the knowledge I can share with them. I don't know how interested authors are in statistics at those repositories that do let authors check for themselves. Authors here aren't clamouring for figures about who is accessing their work: some are pleased when I write to them with figures, but that is probably because I only write to our top content authors, so I'm always spreading good news!
Generally, authors want to know if visitors are indeed academic, which is often very difficult to tell but GA does give me some clues. Being able to tell authors a little bit about visitors to WRAP is reassuring for them, and whilst addressing their every concern is more than I can manage, not knowing about pdf file visitors is a huge gap.
Authors are also concerned about their publishers, and it would be great to be able to demonstrate that repositories like WRAP don't harm publishers' interests. This would reassure authors and perhaps publishers too, and the business of populating a repository would be so much simpler if publishers were supportive.
4) The competitive spirit could be between individuals or departments or even institutions. It could be based upon numbers of items in the repository, or numbers of visits or all sorts of different criteria. The competitive spirit ought to be directed towards appropriate measures. Focussing on numbers of items in the repository is probably enough for now: our main goal is to grow the repository.
Some element of benchmarking against other institutions is also going to be important, when it comes to resourcing decisions. This will mean measuring how many items we have, of what type and whether of full text or metadata only. Measuring how fast the collection is growing will help us to plan our workflows accordingly, and also be useful for benchmarking.
5) Measuring impact on citations: this is something that we claim as a benefit of repository deposit. I am always very careful to claim it only in so far as it is common sense that more readily accessible work will be read more, and that more widely read research will be cited more. Even so, departments are asking me for evidence that repository deposit will boost citations. The repository tends to come up in departmental meetings alongside departments' efforts to raise citations, so it is no surprise that the two are so closely associated. Evidence of this sort of impact would be highly influential in encouraging deposit, if I could find it. I believe the problem is that, by the time a repository has had its effect, it will be one of a number of factors contributing to higher citations.
What I can hope to do is to prove that the most highly visited items in the repository become the most highly cited. For that, I need to know which items are most highly visited, and to look at the reasons why that might be.
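One way to test that idea, once visit counts and citation counts are available for the same set of items, would be a rank correlation. The Python sketch below computes Spearman's rank correlation between two lists, for example visits per WRAP item and citations per item from a citation database; tied values are given their average rank. Where those figures would come from, and how items would be matched up, are assumptions here. Even a strong correlation would not show that visits cause citations, since a paper that is already influential may simply attract both.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every item tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when all values are equal")
    return cov / (sx * sy)
```

A value near 1 would mean that the most visited items also tend to be the most cited; a value near 0 would mean no consistent relationship. A rank-based measure seems a sensible choice because a few very popular items could otherwise dominate the result.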