All entries for Thursday 30 April 2009
April 30, 2009
What Google is doing with WRAP!
A (very) quick investigation into what Google knows about WRAP...
An "Advanced Search" through Google, restricted to the WRAP site and to pdf files, tells me that Google is indexing 659 pdf files. This is more than our total number of records (580-odd), but that makes sense because some records have more than one file.
Remove the filetype criterion and you get 2,440 results. The top one is our homepage, but the second one is what seems to be a random record within WRAP. It's not one of our Google Analytics' "Top Content" items. It's not the first one added to the repository (no. 381). Not sure what is happening there.
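For anyone wanting to repeat these two searches, here is a minimal sketch of how the queries could be built using Google's `site:` and `filetype:` operators. The hostname `wrap.example.ac.uk` is a placeholder and not the real WRAP address, and `google_query_url` is just an illustrative helper name.

```python
from urllib.parse import urlencode


def google_query_url(site, filetype=None):
    """Build a Google search URL limited to one site and, optionally, one file type."""
    terms = [f"site:{site}"]
    if filetype:
        terms.append(f"filetype:{filetype}")
    # urlencode escapes the colons and turns the space between terms into '+'
    return "https://www.google.com/search?" + urlencode({"q": " ".join(terms)})


# Placeholder hostname (an assumption): substitute the repository's real address.
print(google_query_url("wrap.example.ac.uk", "pdf"))  # pdf files only
print(google_query_url("wrap.example.ac.uk"))         # everything Google has indexed
```

The result counts Google reports for queries like these are estimates, so they are best treated as a rough guide.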
Doing the same searches on Google Scholar gives me considerably fewer results. I had thought that Google Scholar was indexing our content, and it is: some of it. 210 pages & files in total, to be precise. This is something that was raised on the UKCoRR discussion list recently as a concern amongst repository managers. We just don't know what Google Scholar is doing with our content. Most of the 210 pages are pdf files, though (145 of them), and the top results are again fairly random in their order, as far as I can tell.
Google Analytics is great at telling me lots of interesting stuff about visitors to WRAP. But its biggest weakness is that it can't measure accesses to pdf files. I was reminded today of just how big that weakness is, when our IT Services department looked at the access logs for me and told me that we're averaging about 5000-6000 pdf files accessed per day, whilst the metadata records are only accessed around 700-1000 times a day.
...and I had thought that no-one was looking at the pdf files. Silly me!
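A tally like the one IT Services produced can be roughly sketched from a web server access log. This is only an illustration, not the method they actually used: it assumes the log is in Apache-style Common Log Format, treats any request path ending in ".pdf" as a file access, and lumps everything else together as "other" (which would include metadata pages but also images, stylesheets and so on). The sample lines and paths are invented.

```python
import re
from collections import Counter

# Assumed Common Log Format: pulls out the day and the requested path.
LOG_LINE = re.compile(r'\[(?P<day>[^:]+):[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)')


def count_accesses(lines):
    """Return {day: Counter} with 'pdf' and 'other' request counts per day."""
    counts = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not look like GET/HEAD requests
        # Drop any query string before checking the file extension
        path = match.group("path").split("?", 1)[0].lower()
        kind = "pdf" if path.endswith(".pdf") else "other"
        counts.setdefault(match.group("day"), Counter())[kind] += 1
    return counts


# Invented sample lines, for illustration only
SAMPLE = [
    '1.2.3.4 - - [30/Apr/2009:10:00:00 +0100] "GET /1/1/paper.pdf HTTP/1.1" 200 12345',
    '1.2.3.4 - - [30/Apr/2009:10:01:00 +0100] "GET /1/ HTTP/1.1" 200 2345',
    '5.6.7.8 - - [30/Apr/2009:10:02:00 +0100] "GET /2/1/thesis.PDF?download=1 HTTP/1.1" 200 999',
]

for day, tally in count_accesses(SAMPLE).items():
    print(day, dict(tally))
```

A real count would also want to filter out search engine crawlers and other robots, since those can account for a large share of pdf requests.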
But it is surprising that Google Analytics was trying to tell me anything about pdf file accesses at all. I looked at my "Top Content" today, and did a search for URLs containing "pdf". And I found that the top 10 pdf files for the last month had had 134 accesses between them. I investigated the most popular of these further, and its sources included one access through Google. What is Google/Google Analytics doing?!!
I have not yet had a reply to my query on one of the Google forums, about whether Google Analytics differentiates between Google Scholar and Google referrals to our pages.
Why am I so hung up on Google? I think most of our visitors come from there.
Why am I so keen to find out what Google Analytics can tell me about our visitors? I can't tell academics to put their work in the repository if I can't prove that:
a) more people will read their work as a result.
b) those people are likely to be the kind of people who will cite their work & further academic knowledge.
c) visitors to WRAP are not going to adversely affect journal publishers.
Of all of these, point c is probably the most contentious. Academics are very protective of their journal publishing model and I can see why: WRAP certainly can't do everything that their publishers can. I am keen to hunt out whatever evidence I can find, to reassure our academics and to back up claims that repository deposit leads to more citations. It is citations that universities are seeking, in time for the REF!