All 16 entries tagged Google Analytics
February 22, 2010
Referring sites
Today I've been writing up some handover notes on statistics for the next E-Repositories Manager at Warwick.
One thing that has been interesting me for a while is the "Referring sites" information on Google Analytics. Most of our visitors come from Google itself, and the great blue wedge on the pie chart that is search engine referrals resembles a pac-man shape: it has been swallowing up all other sources of visitors, month on month...
Ideally, we'd like people to link to documents in the repository, and others to follow those links: this would increase our "Google juice"... and perhaps such an effect would bring even more visitors from search engines, so my pie chart of visitor sources will always look like a blue pac-man character!
The referring site that brings us most visitors is Warwick's own, and within the Warwick domain, the page we created under the University's "Research" page brings us most visitors. This is good news because it shows the importance of us having this page, and not only linking to the repository within the library's pages.
The next most important pages are the ones from within the library's website, which is fine. Our next most important source of visitors is from the profile page of one particular academic who is very good at linking to his papers in WRAP!
It would probably be a good advocacy tactic to write to authors to say how many visitors have come to WRAP by following links on their pages... if we had the time to go through all these stats! Given that many of the profile pages bringing visitors to WRAP are generated by the University's "MyProfile" system, it would also serve as good advocacy for MyProfile.
(NB for non-Warwick people: MyProfile is what we call the part of InfoEd which documents academics' work and is used by our Research Support Services department. It is used well by some departments and not very well by others, and not all departments choose to have staff profile pages driven by its data. It serves as a kind of publications database for Warwick and is one of the reasons why WRAP remains full text only. We share our data with MyProfile through a report sent every month and Warwick authors can update WRAP by uploading a file through MyProfile.)
February 15, 2010
Ranking repositories
Writing about web page http://repositories.webometrics.info/methodology_rep.html
Webometrics have published their rankings for repositories, and their methodology is described online. This is the first time they've actually listed WRAP and we're at no. 273. They are primarily focussed on repositories like WRAP that are all about research content. Their criteria for measurement are listed as:
"Size (S). Number of pages recovered from the four largest engines: Google, Yahoo, Live Search and Exalead.
Visibility (V). The total number of unique external links received (inlinks) by a site can be only confidently obtained from Yahoo Search and Exalead.
Rich Files (R). Only the number of text files in Acrobat format (.pdf) extracted from Google and Yahoo are considered.
Scholar (Sc). Using Google Scholar database we calculate the mean of the normalised total number of papers and those (recent papers) published between 2001 and 2008."
But if you decided that the Webometrics ranking were an important one (a whole other issue!) then you might want to work on influencing these...
50% of the ranking is given to Visibility, so you'd want to concentrate on getting people to link to your content from other sites. This is not only good for Webometrics, but reputedly also for your "Google juice" (i.e. how high your content appears in Google results lists). I've yet to investigate whether we can find any such stats for ourselves from Yahoo Search or Exalead. However, telling your authors that they should link to your content and encourage others to do so could cloud the main issue, which is getting them to send us content in the first place. I think that kind of message is one for a mature repository to focus on, where there is already a culture of high deposit rates. After all, the main priority for a repository is surely to make lots of content available on OA, not to score well in a repository ranking!
20% is dependent upon size. So getting lots of content and focussing on this message with your authors is important too. It is my highest priority in any case...
15% is dedicated to "Rich files" which seems to be if there are pdf files... this isn't necessarily the best thing for a repository from a preservation angle, nor if you would like to allow data-mining on your content. It might not even be the best display format for all types of content. So it would seem to me to be the least important metric to focus on, if I understand it correctly.
The final 15% is dependent on Google Scholar... Google Scholar does not currently index all of WRAP's content. I have written to them about this, and I know that other repositories have the same issue, but I still haven't got to the bottom of it. My theory is that, if you read their "about" pages, they are indexing our content but not presenting it in their results sets because they de-duplicate articles in favour of final published versions: they present these rather than repository results, so if I look for all content on the WRAP domain through GScholar I won't get as many results as I have articles in the repository. If my theory is right then it could be significant to learn whether Webometrics uses their raw data before any such de-duplication. I might be wrong, though!
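To make the four weights above concrete, here is a back-of-envelope sketch of how a composite score could be built from them. The indicator values and the "normalise to the best performer" step are my own illustrative assumptions, not Webometrics' actual (unpublished in full) normalisation:

```python
# Sketch of combining the four Webometrics indicators with the stated
# weights: Visibility 50%, Size 20%, Rich Files 15%, Scholar 15%.
# The raw numbers below are invented for illustration.

WEIGHTS = {"visibility": 0.50, "size": 0.20, "rich_files": 0.15, "scholar": 0.15}

def composite_scores(repos):
    """repos: {name: {indicator: raw_value}} -> {name: weighted score}.

    Each indicator is normalised against the best performer (score 1.0),
    then the four are combined with the published weights.
    """
    scores = {}
    for indicator, weight in WEIGHTS.items():
        best = max(r[indicator] for r in repos.values())
        for name, r in repos.items():
            scores[name] = scores.get(name, 0.0) + weight * r[indicator] / best
    return scores

repos = {
    "WRAP":  {"visibility": 1200, "size": 4000, "rich_files": 1800, "scholar": 900},
    "Other": {"visibility": 6000, "size": 9000, "rich_files": 2500, "scholar": 2000},
}
print(composite_scores(repos))
```

Even this toy version shows why Visibility dominates: doubling inlinks moves the score far more than doubling the number of pdf files would.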
Also note the dates of publication that are relevant to the GScholar data. We have said to authors that as far back in time as they feel is important/significant is fine with us (helps to win them over, useful for REF data and web-pages driven by RSS feeds from WRAP). But if you wanted to be more strategic in raising your ranking on Webometrics then you'd need to change the policy to focus on content published in the last 10 years...
I don't think we shall be playing any such games! But it is interesting to see what ranking providers consider to be important...
November 09, 2009
Telling a success story
Writing about web page http://go.warwick.ac.uk/repositories
WRAP reached its 2000th live item last week. We made a little news article about it for the library's home page and news feeds, including on Twitter and Facebook. We asked the University's central communications team to publicise it for us, as we did when we reached 1000 items. But whereas that announcement made a big splash at a time with few competing news stories, earning us a front-page link on the University home page, this time we only got a mention on a news and events page buried inside the staff area of the intranet.
We reached 1000 items in mid July when there weren't many staff around to read about it, which we thought was a shame. But now, even though there are plenty of people around, there is so much else competing for their attention that our 2000th item has passed with hardly a mention. Perhaps reaching 1000 over the summer was a blessing after all!
The 2000th item news went out on Thursday last week, and so far as I can tell the impact so far has been...
1) Two deposits from authors we've not previously been in touch with and a very slightly higher number of visitors to our deposit form - 24 in one day when the highest peak in previous weeks was 21. NB we still don't get anywhere near as many deposits as we do visitors to that form!
2) A handful more visitors to our main page about repositories at Warwick (linked). By which I mean a peak of 53 visits on Thursday when previous high peaks were 40/41.
3) Our approach towards 600 visits a day on WRAP itself was maintained last week. On Monday we reached 595 visits which was great. By Wednesday we'd slumped to 567 (still pretty high!) and on Thursday we were back at 588 visits.
4) The 2000th item which was linked to directly from our news stories attracted 15 visits, to date.
I shall continue to monitor visitor numbers to WRAP and to the 2000th item. Because I was busy last week, I didn't have time to write to the authors to explain to them, which I still intend to do. I also didn't find time to blog about our achievement here, which I've now done...
When we reached 1000 items we were also mentioned in a press release from the JISC, so that would no doubt have had more of an impact. Perhaps we will get more publicity from the JISC and from our University communications team when we reach 10,000!
There are many success stories to tell about a repository. The total number of items in it is a simple one, and reaching a first significant milestone is important.
The speed and consistency of growth is another type of success story. We took 11 months to reach our first 1000 items, and it took us just under 4 months to reach 2000. The story that tells is that it took us a relatively long time to sort out all our processes until they were optimised. I believe that it will take us longer than another 4 months to reach 3000 items (for various reasons but partly because of the need to process theses which take longer), but not significantly so. We can keep growing WRAP's collection in much the same way as we have been doing.
The quality of the items we have collected is a more difficult story to tell, but I think that the most interesting number in this blog posting is that we are getting almost 600 visits daily, with barely 2000 items for people to be looking at. My feeling is that this is a significant part of WRAP's success story. People are reading the content, and that is what gives us the motivation to go out there and get more!
October 30, 2009
Reporting on statistics
Writing about web page http://writetoreply.org/actually/2009/10/28/thinking-about-user-tracking-on-writetoreply/
I just asked on the UKCoRR list about Google Analytics, after forwarding a link to Tony Hirst's blog, as recommended by Andy McGregor of the JISC.
The replies got me thinking about how we use the statistics that we get from GA. Some repo managers are writing regular monthly reports for managers, as blogged by the CADAIR team: http://welshrepositorynetwork.blogspot.com/2009/10/statistics.html
I look at the stats at least once a month, in order to write to our "top content" authors. I use that e-mail as a way of promoting WRAP to the authors, especially those who might not be aware of WRAP or that their article is in the repository. (Deposited by co-authors or administrators on their behalf.) It has resulted in raised awareness, some goodwill and conversations about WRAP but has never led directly to further deposits - yet. I have copied the heads of department in to some of these e-mails, when I know the author is already comfortable and happy with WRAP, although I've no idea whether they pay any attention to the e-mails!
What do I say to our top content authors? Here's a template, which I don't often have to vary much....
I'm writing to inform you that your paper in WRAP: (REFERENCE)
is the most popular paper in WRAP in the last month. I'm keeping our most highly read authors informed of what I can find out about the visitors to their content. I should point out that it is actually the record that is being visited/read, rather than the full text itself. There have been NUMBER pageviews of the record describing your paper from DATE to DATE. All visitors came from a search engine, the vast majority from Google. Most looked at the record and went away again, but some explored the subject area in WRAP.
There is a great variety of keywords that have led visitors to your paper, including the following: (LIST KEYWORDS)
Visitors came to your record from NUMBER different networks, so it is not all Warwick people looking at your work. Notable academic/educational networks that your visitors came from include:
The vast majority (NUMBER) of visits were from within the UK, but your paper's record had visits from....PLACES.
There have been no great peaks and troughs of activity: visits come every day and remain at or under NUMBER per day.
I did a quick Google search for PAPER'S TITLE and your paper's WRAP record is Xth in the results list.
Whilst looking at the stats I might spot something interesting, which I would usually blog about here and write to people in the library who I think ought to know: managers and subject librarians, or even our internal e-mail newsletter to all staff.
I know that our library management group are interested in big numbers, like how many pageviews there have been since we went live, from how many hundreds of countries/territories, etc. They want to illustrate the success story that we're gaining in visitors every week as we grow in content ever more rapidly! In compiling such a news piece, I might look at our growth chart on ROAR as well, or at the number of items we hold for a particular department, to provide further background information about the interesting pattern.
I also send out a "newsletter" once a term, by e-mail to people who are interested in hearing more about WRAP. I know that they're interested because I introduced an "I would like to hear more" tick box onto the deposit form and they ticked it!
Otherwise, statistics might make their way into my presentations to departments or articles that I write to raise awareness of WRAP, or onto our web pages about the repository. They are something to say when we talk about WRAP and it's important to be able to give the detail and context that they provide, to keep people interested in our work.
September 29, 2009
Visitor pattern over the summer
I'm just back from a lovely break in Australia, and I'm pleased to see that our visitor numbers are up again, as the Autumn term is about to start here and is no doubt underway elsewhere. Looking at the graph of visits to WRAP over the last year, there is a clear sag over the summer down to about 200 visitors on a week day, but we're back up to around 400 again now and I hope that we continue to grow in visitor numbers as the amount of content also grows.
We're way past the 1000 mark that we reached at the beginning of the summer break: over 1600 items now, and growing fast!
July 09, 2009
All new visitors
I had another little play with Google Analytics this morning, just because I happened to be signed in to Google (looking at the Australian repository managers Google group). Today I explored the "Advanced Segments", available in Beta.
I compared "New visitors" to "All visits" over the last month, and the two follow remarkably similar patterns, with new visitors just slightly lower. Very few people come back to WRAP after having visited once: it is not a site that people wish to visit and explore. They dip in for the content that they want, and they don't necessarily come back.
Our "bounce rate" is just over 71%, which also confirms the pattern of visitors going in and then straight back out again. For most, the landing page is the one they want, and they follow our link to the published item (which is what we want). Or else they were looking for something else and they go back to their search results. They might have read our abstract and learnt enough about the paper to know that they want something else. Or they might not have wanted an academic paper at all. We can't actually tell where visitors go when they leave WRAP.
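For anyone unfamiliar with the metric: a "bounce" is simply a visit that views exactly one page and leaves. A minimal sketch, with invented session data, of how a figure like our ~71% arises:

```python
# A bounce is a single-page visit; the bounce rate is the share of
# all visits that are bounces. The session data here is invented.

def bounce_rate(sessions):
    """sessions: list of page-view counts, one entry per visit."""
    bounces = sum(1 for pages in sessions if pages == 1)
    return 100.0 * bounces / len(sessions)

# 10 visits: 7 look at a single record and leave, 3 explore further.
print(f"{bounce_rate([1, 1, 1, 1, 1, 1, 1, 3, 2, 5]):.0f}%")  # 70%
```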
Looking at which pages are visited most often in WRAP tells me a little bit more. Just under 5% of all visits were to the WRAP home page. The "non-bounce" rate for the home page closely follows the total number of visits to that page, so the home page is not putting visitors off! Our second most popular page is our advanced search form. Which is interesting because our home page links to the simple search form, so I'm not sure how the advanced form became more popular than the simple one: must be the Google effect! The bounce rate for this page is just over 57%, which is quite high. I don't particularly like the advanced search form in terms of user-friendliness, so this doesn't surprise me.
Our highest bounce rate amongst the top content pages is our Information page (80%), which is also to be expected, because the main way we use that page is to link to content on the Library's website, which is far easier for us to edit and keep up to date. The next highest is for one of the papers, at 72%, and so far as I can tell at a glance, this is typical of the bounce rates of our papers. Bounce rate brings an interesting new dynamic to writing to our top content authors... one paper with a high number of visitors who also remain in WRAP (about 54%) is also one for which we do not make the full text available. I looked at the Navigation summary for this page, to try to get a feel for where the visitors to that paper were coming from and going to. 25% had come from a previous page in WRAP, and 29% went on to another page in WRAP. Mostly, people seem to have been clicking to see the pdf. But the locks are all working well, so people must just be testing them. The paper will be released from embargo at the end of this month. Other visitors did go on to view another paper by the same author, and some of them were clicking on our subject heading. I rather suspect that this anomaly is the result of my using this author's RSS feed on my example page, to show people how easy it is to incorporate WRAP results into a Warwick web page! I chose this author's content because it was already popular, but I will keep an eye on top content pages' bounce rates in future.
The general trend of visitors dipping in and out of WRAP rather confirms my impression that most people will find repository content when searching elsewhere, rather than searching in the repository itself. This may change as our body of content grows. But for now, even though we're pleased to be approaching 1000 items, our content is pretty disparate across the subjects and as such, not the most useful of collections to researchers.
Referral traffic on WRAP is slightly higher than direct traffic, and both are bouncing around that zero line but again, it is search traffic that follows the overall pattern of visitors. Perhaps academics will get used to sharing WRAP URLs in the future, and we will get more referrals.
The pattern of visitors to WRAP pretty much confirms our dependency on Google and its ilk to bring us visitors. Which is OK, but a slide that I saw yesterday from Tom Abbott, who co-ordinates all of Warwick's iTunesU output, demonstrated the difference that being a destination site (or being promoted on one) can make. The usage graph he showed, for a file on Warwick's own website that was then published on iTunesU, showed a vast increase. Visitor numbers for the podcast on Shakespeare's portrait seem to have been astronomical as a result of Apple promoting it amongst their collection (millions of visits rather than hundreds, as with WRAP!). iTunesU is a destination site. Repositories like WRAP are not, but perhaps repository cross-searching sites could be. That's part of the reason for making our metadata such high quality and harvestable.
Looking at the highest referring sites to WRAP gives some clues as to who might become more important. The Index to Theses (http://www.theses.com/) is likely to be a destination site for some. They link to us from http://www.theses.com/idx/registered_users/etd/96.asp but as yet don't link to our metadata records from theirs. Fair enough, 'cos we've been a bit slow off the mark with the theses. But we're adding them now, and as we come to add more theses, this might be a source of more visitors: they do have an example of a Cranfield thesis with a link to the full text in the repository from their record. Although I didn't think that the link was all that obvious: it's in the left-hand margin, rather than in the text of the record describing the item. So many academic resources seem so un-user-friendly!
Other referrers to WRAP include bing.com (which has recently become the second highest search engine source of visitors to WRAP), Google itself (in various guises), Warwick's own research information system (whose records link to ours) and the NHS Evidence Health Information Resources website. I'm not sure why Bing and the Google guises reported in this section are not included in the search engine hits... but at any rate, I consider them search engines!
OpenDoar and ROAR brought us three visits each... and I think that at least one of those visits would have been me, checking our record with them! Repositories and repository cross-searching tools are not destination sites in their own right yet (or at least WRAP isn't one: perhaps larger repositories see a different pattern). Repository cross-searching sites are not bringing us visitors. Search engines are...
I don't know of anyone who does research by visiting a repository cross-searching tool. I don't believe that they're even promoted by academic librarians in their liaison with departments. OpenDOAR and ROAR seem to me to be more tools for those in the repository and information management sector. My personal favourite cross-searching tool is OAIster, but I haven't investigated them all thoroughly, and I think it's a side of repositories that deserves further investigation. Now that we've got content in them, with high quality metadata, how do we ensure that people will find it in ways convenient to them? Google is great but it only goes so far...
Once we have tools that we might recommend people to use, to find repository content, that's a whole other advocacy/information literacy journey we would need to go on, persuading researchers of how to find and use repository records. Our Psychology librarian has made a start: her tutorial for undergraduates, which she created in collaboration with tutors in that department, recommends that students might like to look at their tutors' papers in WRAP, so that they can learn about their areas of expertise and interest and get ideas for their projects. But there is a long way to go!
July 06, 2009
Summertime brings fewer visitors
As term drew to an end, we noticed a drop in visitor numbers at WRAP. Towards the end of May, we had around 350 or 400 visitors a day (depending on which source you look at!). G. Analytics was telling me about 350, with the usual weekend dip going below 200. Two weeks ago (last week of summer term here), we had in the region of 280 visitors per day, with a weekend dip no lower than usual. Things seem to have picked up a little last week and we're now getting 300 visits a day on a weekday...
It's hard to know exactly what this means, except that it reflects the academic year cycle. I'm trying to guess whether we've been getting primarily students looking at our content, or whether these are authors and researchers who might cite the work. It could be argued that the peak in the summer term was caused by cramming students, and that the tail-off was caused by students disappearing for the vacation. The pattern certainly matched student behaviour in terms of library visits: by the last week of term, hardly any were to be found in the library, but in the week prior to that we couldn't move for student bodies! But academics' work is also governed by the term time because they have teaching and marking to do and they also like to disappear for conferences, study tours and holidays in the summer, so are not so likely to be at their desks reading online journal articles...
So I can't really conclude anything from our visitor numbers, except to anticipate such seasonal fluctuations in future years.
I do note that it hasn't made a difference that we're growing in content: I would expect that additional material would bring us additional visitors, but even though we've been growing faster than ever lately, as we approach the big 1000, I'm not sure how fast a growth would be needed to counteract the summer down-turn in visitors!
May 12, 2009
Is Google Analytics accurate?
Writing about web page http://repositoryman.blogspot.com/2009/04/google-and-repositories.html
I think yes it is, in as far as it goes.
I'm not a techie and I don't have access to the logs for WRAP, to look at who has been visiting our server. But someone from our IT Services department has done this for me, just looking at last week's visitors. This has helped me to have a better picture of who is looking at the pdf files, which Google Analytics can't tell me, and it has also given me something to check GA's figures for accesses of our metadata records.
This is what I was told about last Wednesday's pdf files, which was apparently typical for the week:
"...there were 7,445 PDF requests. However, when I removed:
* search crawlers
* other robots
* partial downloads (Acrobat will download PDFs a page-at-a-time, resulting in many requests for the same file)
the results for the number of PDFs downloaded by actual real humans was a less-than-whopping 142.
Of these, about 100 came from the metadata pages in WRAP, with the remainder more-or-less equally split between google scholar, google search, warwick search, and MSN search. Google scholar just about has the edge on the other search engines, with 12 referrals."
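The filtering my colleague described can be sketched in a few lines. This is my own rough reconstruction, assuming Apache combined log format; the bot keywords, the regex, and the use of HTTP status 206 to spot Acrobat's page-at-a-time requests are all illustrative assumptions, not the actual script that was run:

```python
# Rough sketch: count pdf downloads by "actual real humans" from an
# Apache combined-format access log, dropping crawlers and partial fetches.
import re

LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
BOT_HINTS = ("googlebot", "slurp", "msnbot", "crawler", "spider")  # illustrative

def human_pdf_downloads(log_lines):
    """Count distinct (ip, pdf) pairs, dropping bots and partial fetches."""
    seen = set()
    for line in log_lines:
        m = LINE.match(line)
        if not m or not m.group("path").lower().endswith(".pdf"):
            continue
        if any(hint in m.group("agent").lower() for hint in BOT_HINTS):
            continue
        if m.group("status") == "206":  # partial content: page-at-a-time fetch
            continue
        # Collapse repeat requests for the same file from the same address.
        seen.add((m.group("ip"), m.group("path")))
    return len(seen)
```

The big drop from 7,445 raw requests to 142 real downloads is exactly what this kind of filtering produces: most raw hits are crawlers, and a single human read can generate dozens of partial requests.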
This is a relief to me in a way, because it is what I was expecting to find, and I had been startled by the high access figures reported last week: it's a shame that these were not real people, but I never expected that level of interest, and I was slightly alarmed about how I could find out anything about such an unexpected visitor pattern. I'm relieved to know that I was right in the first place, and there are indeed far more people looking at the metadata records in WRAP than looking at the full text files!
It is interesting to see that most people are reaching the pdf files from our metadata records, when they do go there. I expect that at least part of the reason for this pattern of behaviour is the way in which Google presents results from WRAP to its users. It always puts a metadata record first. I've linked to Les Carr's recent blog entry about Google's presentation of repository results because I think it's quite important for us to take Google's practice into account when trying to understand repository visitors' behaviour.
My friendly techie also looked at the access logs for the metadata records, to give me something to compare what G Analytics was telling me:
"Once I'd filtered out the bots as above, I ended up with 479 requests for metadata...
Of these, the referrers were overwhelmingly (388 hits) google. Another 80 or so came from internal referrals within WRAP (people using the browse/search pages), and the remainder were distributed between MSN search, links from www2.warwick.ac.uk. and other links from around the internet."
This picture matches very well with what G Analytics tells me. If anything, GA puts the number slightly lower, so perhaps its filtering is even more accurate.
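A referrer breakdown like the one quoted above is easy to reproduce once the bots are filtered out. A minimal sketch, assuming you have already extracted the referrer field from each log entry (the example URLs are invented):

```python
# Tally referrer hosts for metadata-record requests. Apache logs a bare
# "-" when there is no referrer, which we report as a direct visit.
from collections import Counter
from urllib.parse import urlparse

def referrer_breakdown(referrers):
    """Map raw referrer URLs to a count per host ('(direct)' when absent)."""
    counts = Counter()
    for ref in referrers:
        host = urlparse(ref).hostname if ref and ref != "-" else None
        counts[host or "(direct)"] += 1
    return counts

refs = [
    "http://www.google.com/search?q=wrap+paper",
    "http://www.google.co.uk/search?q=thesis",
    "http://wrap.warwick.ac.uk/view/subjects.html",
    "-",
]
print(referrer_breakdown(refs).most_common())
```

Grouping the various google.* hosts together would then give the "overwhelmingly google" figure, with the internal wrap.warwick.ac.uk hits counted separately as browse/search traffic.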
I am presenting to another department tomorrow, so this is just in time to bolster my confidence in speaking about visitors to WRAP. My concern is to tell authors that I believe people will read the final version in preference to the repository pdf anyway - I don't have any evidence to suggest the contrary and they all seem to tell me that they would prefer to read the final published version of others' articles, so I expect that their peers do likewise. I would like to be able to tell them that there are large numbers of visitors to their content in WRAP, all with a scholarly interest, and some of whom will cite their work. I can do no such thing, but I can prove that at least some of the visitors are of scholarly background, and that will have to do for now. Of course, they could also follow my tips for how to attract more visitors to their paper in WRAP and how to raise citations of their work... but that is a whole different story!
May 11, 2009
Differentiation between G Scholar and Google itself
Follow-up to What Google is doing with WRAP! from WRAP repository blog
I've had confirmation from the Google Help forum, that Google Analytics does differentiate between Google and G Scholar. So none of those visitors who come to us from Google have come via Google Scholar... Google itself is our main source of visitors to the metadata records and other pages on WRAP.
Watch this blog for news of whether I find out how people get to the pdf files!
May 05, 2009
What should we be measuring …and why
More thoughts on repository statistics!
My basic reasons for looking at repository statistics are:
1) Can assess and demonstrate that you are meeting aims/targets (& set such targets).
2) Can gain interest/approval/support on the back of large numbers!
3) Providing authors with information about who is looking at their work could motivate them to deposit.
4) Might generate some competitive spirit!
5) Identifying popular content might help in measuring citation impact of repository deposit.
Looking back at the basic reasons:
1) Aims and targets need to be set for the next 12 months, as we have emerged from our JISC funded project. I can only aim for things that I can measure so this becomes a circular argument! Ideally, I would like to be able to ensure that we are getting deposits of all appropriate items, across the whole University - and to know that we can handle such numbers. So what I really need to be able to do is to measure what the University's authors are actually producing.
I need to know about numbers of visitors to WRAP, and whether or not these can be boosted, in order to meet the goal of WRAP being a showcase for Warwick research.
Measuring how people get to WRAP is important, because if they all come via Google and bypass our metadata entirely, then this might cause us to review our metadata creation workflows. The value of metadata goes beyond bringing visitors to the repository, however, and that also needs to be documented.
2) Shouting about large numbers is fairly crude as a way of getting attention, so a crude measurement such as GA is probably appropriate. Having said that, the Apache logs record higher numbers, so I should be reporting those numbers rather than GA ones!
3) Providing information to authors. Well, GA is entirely inappropriate for that. I can provide some information for some authors, and that has been welcome. But the ideal scenario would be for authors to be able to access such information for themselves, whenever they want. And it really is a huge gap in the knowledge that I can share with authors, if I can't tell them about accesses to the actual pdf files. I'm not sure what authors' interest in statistics is in those repositories who do help authors to check for themselves. Authors here aren't clamouring for figures about who is accessing their work: some are pleased when I write to them with figures, but that is probably because I only write to our top content authors so I'm always spreading good news!
Generally, authors want to know if visitors are indeed academic, which is often very difficult to tell but GA does give me some clues. Being able to tell authors a little bit about visitors to WRAP is reassuring for them, and whilst addressing their every concern is more than I can manage, not knowing about pdf file visitors is a huge gap.
Authors are also concerned about their publishers, and it would be great to be able to demonstrate that repositories like WRAP don't harm publisher interests. This would not only reassure authors, but also perhaps reassure publishers and it would make the business of populating a repository so much simpler if publishers were supportive.
4) The competitive spirit could be between individuals or departments or even institutions. It could be based upon numbers of items in the repository, or numbers of visits or all sorts of different criteria. The competitive spirit ought to be directed towards appropriate measures. Focussing on numbers of items in the repository is probably enough for now: our main goal is to grow the repository.
Some element of benchmarking against other institutions is also going to be important, when it comes to resourcing decisions. This will mean measuring how many items we have, of what type and whether of full text or metadata only. Measuring how fast the collection is growing will help us to plan our workflows accordingly, and also be useful for benchmarking.
5) Measuring impact on citation: this is something that we claim as a benefit of repository deposit. I am always very cautious to claim this only in as far as it is common sense that more readily accessible work will be read more, and that more widely read research will be cited more. Even so, departments are asking me for evidence that repository deposit will boost citations. The repository does seem to fit into departmental meetings along with departments' concerns to raise citations so it is no surprise that the two are so closely associated. Evidence of this sort of impact would be highly influential in terms of encouraging deposit, if I could find it. I believe that the problem is that, by the time a repository has had its effect, it will be one of a number of factors influencing higher citations.
What I can hope to do is to show that the most highly visited items in the repository become the most highly cited. I need to know which items are most highly visited, and to look at the reasons why that might be.