Is Google Analytics accurate?
Writing about web page http://repositoryman.blogspot.com/2009/04/google-and-repositories.html
I think it is, as far as it goes.
I'm not a techie, and I don't have access to the WRAP server logs that show who has been visiting. But someone from our IT Services department has looked at them for me, covering just last week's visitors. This has given me a better picture of who is looking at the PDF files, which Google Analytics can't tell me. It has also given me something against which to check GA's figures for accesses to our metadata records.
This is what I was told about last Wednesday's PDF requests, which were apparently typical for the week:
"...there were 7,445 PDF requests. However, when I removed:
* search crawlers
* other robots
* partial downloads (Acrobat will download PDFs a page-at-a-time, resulting in many requests for the same file)
the number of PDFs downloaded by actual real humans was a less-than-whopping 142.
Of these, about 100 came from the metadata pages in WRAP, with the remainder more or less equally split between Google Scholar, Google search, Warwick search and MSN search. Google Scholar just about has the edge on the other search engines, with 12 referrals."
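For anyone curious about what that kind of filtering involves, here is a minimal Python sketch. It is not what our IT Services colleague actually ran. It rests on several assumptions. The server is assumed to write Apache-style "combined" log lines. Any user agent containing words like "bot" or "spider" is treated as a robot, using a hypothetical and far-from-complete list. Only HTTP 200 responses are counted, because the page-at-a-time byte-range requests that Acrobat makes typically come back as 206 Partial Content.

```python
import re

# Apache "combined" log format (an assumption about how the server logs):
# host ident user [time] "METHOD path PROTOCOL" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short list of user-agent fragments that mark robots.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def human_pdf_downloads(lines):
    """Yield parsed log entries for complete PDF downloads by (probable) humans."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        entry = match.groupdict()
        if not entry["path"].lower().endswith(".pdf"):
            continue  # not a PDF request
        agent = entry["agent"].lower()
        if any(marker in agent for marker in ROBOT_MARKERS):
            continue  # search crawlers and other robots
        if entry["status"] != "200":
            continue  # 206 = partial (byte-range) download; also drops errors
        yield entry


# Made-up sample lines, for illustration only.
SAMPLE = [
    '1.2.3.4 - - [15/Apr/2009:10:00:00 +0100] "GET /1234/1/paper.pdf HTTP/1.1" 200 52000 '
    '"http://scholar.google.com/scholar?q=x" "Mozilla/5.0"',
    '1.2.3.4 - - [15/Apr/2009:10:00:01 +0100] "GET /1234/1/paper.pdf HTTP/1.1" 206 4096 '
    '"-" "Mozilla/5.0"',
    '66.249.1.1 - - [15/Apr/2009:10:05:00 +0100] "GET /1234/1/paper.pdf HTTP/1.1" 200 52000 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '5.6.7.8 - - [15/Apr/2009:11:00:00 +0100] "GET /1234/ HTTP/1.1" 200 8000 '
    '"http://www.google.co.uk/search?q=x" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for e in human_pdf_downloads(SAMPLE):
        print(e["host"], e["path"], e["referrer"])
```

Of the four sample lines, only the first survives. The second is a partial download, the third comes from a crawler, and the fourth is a metadata page rather than a PDF. Real robot detection needs a much longer list, which is one reason to leave this job to the experts!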
In a way this is a relief, because it is what I was expecting to find. I had been startled by the high access figures reported last week. It's a shame that those visitors were not real people, but I never expected that level of interest, and I was slightly alarmed at how little I could find out about such an unexpected visitor pattern. I'm relieved to know that I was right in the first place: far more people are looking at the metadata records in WRAP than at the full-text files!
It is interesting that, when people do reach the PDF files, most of them get there from our metadata records. I expect this is at least partly due to the way Google presents WRAP results to its users: it always lists a metadata record first. I've linked to Les Carr's recent blog entry about Google's presentation of repository results, because I think it's important for us to take Google's practice into account when trying to understand how repository visitors behave.
My friendly techie also looked at the access logs for the metadata records, to give me something to compare with what Google Analytics was telling me:
"Once I'd filtered out the bots as above, I ended up with 479 requests for metadata...
Of these, the referrers were overwhelmingly (388 hits) Google. Another 80 or so came from internal referrals within WRAP (people using the browse/search pages), and the remainder were distributed between MSN search, links from www2.warwick.ac.uk and other links from around the internet."
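A breakdown of referrers like the one quoted above comes from grouping each request's Referer value by the site it points to. This minimal Python sketch shows the idea, but the category rules are my own guesses based on the sources mentioned in the quote. It also assumes two things: that `wrap.warwick.ac.uk` is the repository's own hostname, and that some MSN traffic arrived as `search.live.com`.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed hostname of the repository itself; referrals from it count as internal.
REPOSITORY_HOST = "wrap.warwick.ac.uk"


def referrer_source(referrer):
    """Map a raw Referer header value to a rough source category (assumed rules)."""
    if not referrer or referrer == "-":
        return "no referrer"
    host = (urlparse(referrer).hostname or "").lower()
    if host == REPOSITORY_HOST:
        return "internal (WRAP browse/search)"
    if host.startswith("scholar.google."):
        return "Google Scholar"
    if host.startswith(("www.google.", "google.")):
        return "Google search"
    if host.endswith(("search.msn.com", "search.live.com")):  # assumed MSN hosts
        return "MSN search"
    if host == "warwick.ac.uk" or host.endswith(".warwick.ac.uk"):
        return "other Warwick pages"
    return "elsewhere"


def tally_referrers(referrers):
    """Count how many requests arrived from each source category."""
    return Counter(referrer_source(r) for r in referrers)


if __name__ == "__main__":
    # Made-up referrer values, for illustration only.
    counts = tally_referrers([
        "http://www.google.co.uk/search?q=repository",
        "http://www.google.com/search?q=wrap",
        "http://scholar.google.com/scholar?q=x",
        "http://wrap.warwick.ac.uk/view/",
        "http://www2.warwick.ac.uk/services/library/",
        "-",
    ])
    for source, n in counts.most_common():
        print(source, n)
```

Google Scholar is checked before general Google search because both hostnames begin with "google". Otherwise every scholarly referral would be swallowed by the bigger category.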
This picture matches what Google Analytics tells me very well. If anything, GA puts the number slightly lower, so perhaps its filtering is even more thorough.
I am presenting to another department tomorrow, so this has come just in time to bolster my confidence in speaking about visitors to WRAP. I want to reassure authors that I believe people will read the final published version in preference to the repository PDF anyway. I have no evidence to suggest otherwise, and authors all seem to tell me that they prefer to read the final published versions of others' articles, so I expect their peers do likewise. I would like to be able to tell them that there are large numbers of visitors to their content in WRAP, all of them with a scholarly interest and some of whom will cite their work. I can do no such thing, but I can show that at least some of the visitors have a scholarly background, and that will have to do for now. Of course, authors could also follow my tips on how to attract more visitors to their papers in WRAP and how to raise citations of their work... but that is a whole different story!