April 30, 2009
A (very) quick investigation into what Google knows about WRAP...
An "Advanced Search" through Google, on the WRAP site, for pdf files tells me that Google is indexing 659 pdf files. This is more than our total number of records (580-odd) but it figures, because we have more than one file per record, sometimes.
Remove the filetype criterion and you get 2,440 results. The top one is our homepage, but the second is what seems to be a random record within WRAP. It's not one of our Google Analytics "Top Content" items, and it's not the first one added to the repository (no. 381). Not sure what is happening there.
Doing the same searches on Google Scholar gives considerably fewer results. I had thought that Google Scholar was indexing our content, and it is: some of it. 210 pages and files in total, to be precise. This was raised recently on the UKCoRR discussion list as a concern amongst repository managers: we just don't know what Google Scholar is doing with our content. Most of the 210 pages are pdf files, though (145 of them), and the top results are again fairly random in their order, as far as I can tell.
Google Analytics is great at telling me lots of interesting stuff about visitors to WRAP, but its biggest weakness is that it can't measure accesses to pdf files. I was reminded today of just how big that weakness is, when our IT Services department looked at the access logs for me and told me that we're averaging about 5,000-6,000 pdf files accessed per day, whilst the metadata records are only accessed around 700-1,000 times a day.
...and I had thought that no-one was looking at the pdf files. Silly me!
But it is surprising that Google Analytics tells me anything about pdf file accesses at all. I looked at my "Top Content" today and did a search for URLs containing "pdf", and found that the top 10 pdf files for the last month had had 134 accesses between them. I investigated the most popular of these further, and its recorded sources included one access through Google. What is Google/Google Analytics doing?!
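Since GA can't see these requests, the server's access logs remain the only reliable source for pdf figures. A minimal sketch of counting successful pdf downloads per day from a combined-format access log, in Python (the log format, the paths and the sample lines are illustrative assumptions, not WRAP's actual setup):

```python
import re
from collections import Counter

# Matches the start of a combined-log-format line; an assumption about
# how the web server is configured, not a description of WRAP's logs.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) '
)

def pdf_downloads_per_day(lines):
    """Count successful GET requests for .pdf files, grouped by day."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") == "200" and m.group("path").lower().endswith(".pdf"):
            counts[m.group("day")] += 1
    return counts

# Made-up log lines for illustration only.
sample = [
    '1.2.3.4 - - [30/Apr/2009:10:12:01 +0100] "GET /381/1/WRAP_paper.pdf HTTP/1.1" 200 51234 "-" "Mozilla"',
    '1.2.3.4 - - [30/Apr/2009:10:12:05 +0100] "GET /381/ HTTP/1.1" 200 8123 "-" "Mozilla"',
    '5.6.7.8 - - [30/Apr/2009:11:00:00 +0100] "GET /missing.pdf HTTP/1.1" 404 200 "-" "Mozilla"',
]
print(pdf_downloads_per_day(sample))  # only the first sample line counts
```

Something along these lines is presumably what IT Services ran to produce the 5,000-6,000-a-day figure, though I don't know the details of their setup.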
I have not yet had a reply to my query on one of the Google forums, about whether Google Analytics differentiates between Google Scholar and Google referrals to our pages.
Why am I so hung up on Google? I think most of our visitors come from there.
Why am I so keen to find out what Google Analytics can tell me about our visitors? I can't tell academics to put their work in the repository if I can't prove that:
a) more people will read their work as a result.
b) those people are likely to be the kind of people who will cite their work & further academic knowledge.
c) visitors to WRAP are not going to adversely affect journal publishers.
Of all of these, point c is probably the most contentious. Academics are very protective of their journal publishing model, and I can see why: WRAP certainly can't do everything that their publishers can. I am keen to hunt out whatever evidence I can find, both to reassure our academics and to back up claims that repository deposit leads to more citations. It is citations that universities are seeking, in time for the REF!
April 27, 2009
Some more trawling through the Google Analytics Help section today has turned up some discussion on their forum, which suggests that GA doesn't count clicks on links to external web pages, only clicks on links to elsewhere within the site.
This is something of a relief: I am not surprised that people are not clicking to read the pdf files of articles in WRAP, because I expected them to be interested in reading the "version of record", which is the one on the publisher's website which we always link to. I was very surprised that GA's site overlay reported no clicks from our records to publishers' web pages. Why would so many people be looking at the record in WRAP, while none were looking at the article itself? The only answer I could think of was that they must have gone back to their Google results that brought them to WRAP in the first place, or that they were satisfied by reading the article record in WRAP. Yet some must surely have wanted to read more, especially given that so many visitors seem to have come from academic networks... and now it all makes sense that people could indeed be clicking on our links to publishers' pages, but we simply can't measure those clicks.
So I can reassure authors that no-one will be reading the pdf versions we hold in WRAP(!) unless they have no other option because either they don't have a subscription to the published version or the published version is no longer available. Which is kind of what they want: many authors don't really feel comfortable with making their own early versions available. Now all I need to do is to convince authors of why we want the full text in WRAP at all, given that I know no-one is looking at it! My usual list of reasons is:
1) Google indexes the full text file, bringing visitors to your work in WRAP.
2) There will be those without subscription access who will be glad to read the earlier version. (This will include those in the commercial world but also those in academia in less wealthy countries).
3) This will be a back-up version of the work, for times when the publisher might be unable to make the work available - either temporarily, due to a technical hitch, or for the long term.
4) It is the long term nature of a repository that makes it different to putting your article on your own web page. Putting your article into an institutional repository is like libraries of old having copies of books and journals on shelves for future generations to consult.
As a librarian, my concern is to collect scholarly works and to make them available when they are needed. WRAP may be an electronic collection, but what we (the library) are trying to achieve with WRAP is very much our traditional role: we're just finding different ways to do that, as technology changes the possibilities for us.
April 16, 2009
I've been thinking about all the things that might lead to an author becoming highly cited, or raising citations for a particular paper, in ways other than just WRAP deposit. I often feel pressured to prove that WRAP deposit can raise citations: it is one of the claims that we make, so we should be able to prove it. Yet it seems to me to be impossible to do: all I can do is point to articles which say that open access publishing raises citations, and say that WRAP deposit is a form of open access publishing. It stands to reason that if more people can find and read your article, then more people will cite it, in the long run. But authors seem to want better evidence than that, and they would prefer to have evidence that WRAP itself will raise their citations, not just about repositories generally.
So, I've been looking at what Google Analytics can tell me (again!) and matching that to tactics that are rumoured to raise citations. One such tactic is to use a key phrase repeatedly in article titles, or to publish consistently on or around a particular theme, so that you get known as an expert for something in particular. I'm not sure whether anyone ever does this in such a calculated way; it's probably more likely that a particular phrase becomes associated with an expert because it was his/her work that invented the concept. But anyway, GA can tell me which keywords have led people to WRAP.
This month, the highest keyword search leading to visits to WRAP is "interracial sex", and other keyword phrases that people are searching for when they come to WRAP are: "street slang", "leishmaniasis recombinant vaccines" and "educational leadership theories". Other phrase searches include entire article titles.
What do such phrase searches tell me? Well, in the case of article titles, it is clear that it is the academics' work that is being sought. In the case of keyword phrases, it could be that "social searching" is leading visitors to WRAP as much as academic searching, in some cases. Looking at the papers that these keywords led to, and at the "content overlay" feature of GA (which tells me where people clicked when they visited that page), I can't see that people are clicking on the pdf or the publisher's link. They appear to be looking at the WRAP record and then looking away again: this might mean that they read the abstract and learnt enough, or that they were indeed looking for something else entirely. The most popular papers in WRAP correspond with the keyword phrases that are leading most visitors to WRAP. I've looked in some detail at the profile of visitors to those popular papers, and from the network locations of the visitors, many are indeed on identifiably academic networks. Even those on commercial networks could be academics working from home.
In short, what I can say is that keyword phrases will bring visitors to your paper in WRAP - if your paper is there. At least some of those will be the kind of visitor that you will want to have. It really doesn't take that much effort to deposit: visit a web page, upload a file, tick a couple of boxes (literally 2!) and paste a reference in. Time will tell whether all that effort is worth it, because the business of becoming highly cited takes a very, very long time and a lot more than just repository deposit.
If I can possibly prove that WRAP deposit will raise citations, I will do. But in the meantime, there needs to be work in the repository for me to look at the statistics for... it's early days for WRAP still, and even for repositories.
April 03, 2009
Writing about web page http://www2.warwick.ac.uk/services/library/main/research/instrep/erepositories/
I've just noticed that it's almost a month since I last posted to this blog. It's been incredibly busy as we've just concluded the JISC-funded project phase of WRAP, and now we're entering a new era in which WRAP is the repository but no longer a project.
Our end of project report is now on the web (see link above to project page), and for those with an interest in repository statistics, I recommend the appendix on what we can tell about WRAP through Google Analytics, ROAR and the University's web tool known as Sitebuilder.
The future direction of WRAP is uncertain as the University's steering group are considering what to do with Publications data, but we will certainly be carrying on with WRAP as we have done, increasing our deposits of full text journal articles and handling PhD theses.
We just announced to the University of Warwick staff that we have surpassed 500 items in WRAP (through a University-wide electronic newsletter known as "inbox insite"), and to date we have 536 items, so the collection is most certainly growing, and with it our visitor numbers are increasing.
As we've had more interest in WRAP, we've had more enquiries, and our FAQs have expanded beyond what I think is a reasonable number. So I've just introduced an index, using a feature of Sitebuilder that creates a "table of tags" from the keywords in the metadata for the pages. I just had to revise the keywords for all those pages, to make sure I used consistent terms, applied consistently throughout the collection. I like to call this a "tagsonomy", and if you're at Warwick and would like to read more about these tags, please see these other blog postings: http://search.warwick.ac.uk/blogs?q=tagsonomy
January 19, 2009
1) A six-monthly report of data changes in WRAP, to show which records have been altered since the date they were added to the live repository. (For sharing data with Warwick’s Research Support Services.) Not currently possible.
2) A graph to show how the pattern of new record creation/repository growth has gone, over the last x months/year. I can get this from ROAR. (http://www.roar.org)
3) Monthly report of all records added since last month, with data in specific formats to suit RSS’ InfoEd system (and/or other departments at Warwick). Key issues with sharing with RSS: we would need to store the staff number (or a key to call up the staff number) for each Warwick staff member amongst the authors, and WRAP lacks security for such data. Also, page ranges are currently exported as, eg, 51-72, whereas RSS need them as "start page 51, end page 72". More investigation into the technical possibilities for data sharing needs to be done. It may be significant that InfoEd attaches information about publications (& other activities) to a person’s profile, whereas WRAP attaches information about authors to a record describing a publication.
4) Statistics on visitors to WRAP and what they are clicking on, where they come from, etc. Google Analytics does this well enough for me: I can see where they’re clicking, what keywords brought them to WRAP and to where in WRAP, and who their network provider is (which is a clue to some academic interest, and also helps to identify internal interest). I can see what countries visitors are in, and what cities, etc. I can do all this at a per-paper level, but I have to know which paper(s) I want to look at.
5) To look at features like those listed above, for a set of data (eg all by one author, or all for a particular department). Departments and authors may well want to know who is looking at their work in WRAP. I can look at a particular paper, but not at a set: I would have to collate reports for each paper in some way. IRStats should be able to do this, if we were to install it successfully on WRAP… although it may require some change in our workflow. At the moment, most papers are added to WRAP by our very own administrator, since authors use a separate (& simple) submission form. Authors do not upload data about their own publications, and therefore the papers are not attached to separate accounts in WRAP. I believe that IRStats would need separate accounts to be used for each author’s papers in order to produce reports on all of an author’s papers. Our administrator could create accounts in authors’ names and then log in as the author before creating the record… but that all presupposes that we can get IRStats to work, and that it does work as I expect.
6) It would also be better for me (and for those interested in the data) if I did not have to look up statistics such as those already provided by GA myself, but if those interested could just look them up, on demand. In theory, I can grant access to the GA reports to anyone with a Google account… although this requires some intervention from me. And Google Analytics is great for those who know how to use it, but I can see academics being put off learning how to use it. There are barriers to authors getting data about all the wonderful good WRAP is doing in bringing an audience to their work!
7) GA is great for looking at the site and our html files, but tells us nothing about pdf/word document downloads. The difference between “the most downloaded document” and “the most looked at record” could be very important indeed, if any correlation with citations is to be explored. Also, I can tell from GA if someone has followed the link to the DOI on a particular record. I can’t tell whether anyone has followed the link from within the pdf file to the full text, published version, though.
8) What are people searching for from the repository's own search form: which fields do they search by? GA can only tell me whether people click through from our Advanced form to the Simple search one, and indeed whether people follow the link to search the repository in the first place from our home page. Thus far, there aren’t so many people searching, and we expect that people will not search through our form but on search engines like Google, with keywords which GA does record and tell us, so this isn’t particularly crucial.
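As an aside on point 3 above: the page-range reformatting that RSS need is trivial to script once the data can be exported. A sketch in Python (the output field names here are guesses for illustration, not InfoEd's real schema, and only plain hyphen-separated ranges are assumed):

```python
def split_page_range(page_range):
    """Turn a range like '51-72' into separate start/end page fields.

    The output keys are illustrative; InfoEd's actual field names may differ.
    A single page (eg '103') is treated as both start and end.
    """
    start, _sep, end = page_range.partition("-")
    return {"start_page": start.strip(), "end_page": (end or start).strip()}

print(split_page_range("51-72"))  # {'start_page': '51', 'end_page': '72'}
print(split_page_range("103"))    # {'start_page': '103', 'end_page': '103'}
```

The harder problems with that export, of course, are the staff-number matching and data security, which a one-liner like this does nothing to solve.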
I’m also not sure of how to make GA discount visits from members of the WRAP team… but I expect that’s something I ought to look into.
I’ve learnt a lot about what GA can tell me about WRAP and its visitors. I find it fascinating to delve in every now and again and see what brings people to us. It can be used as a website management tool, to see how to make important links more visible and hence more clicked upon. It can be used in advocacy to authors, explaining why they might want to put work into WRAP, showing that others do look at it.
What I would like to do is to compare our statistics with those of other repositories, at other institutions. It’s not easy to find other repositories that are comparable with ours in their features (full text, mediated metadata, voluntary deposit), never mind such repositories at comparable institutions. But it is possible to find those who are much further ahead of us, and it would be good to see where we might be heading, in terms of visitor profiles, whether most visitors came from search engines (as now) or direct links, etc. I would like to know whether the most popular content in others’ repositories is journal articles or unpublished content, and whether there is a particular subject that gets heavier attention than others. So, I would like to be sure that, whatever statistics package we use for WRAP, it is one that would enable us to compare our repository with others. There isn’t such a package or method of using a combination of statistics packages, yet.
November 10, 2008
Well, my thoughts on the topic so far stretch to:
1) Numbers of visits/visitors, which you can get as a whole since launch and/or as a month-on-month comparison. Ours don't tell us too much, except that people don't visit WRAP much at weekends and that visitor numbers have grown since we launched. As we're also growing content, this just confirms that there's nothing I should be worried about! I'm not altogether sure of the best way to measure these using Google Analytics: should I be looking at page visits or visitors? Should I be looking at Unique Visitors if I'm going to look at Visitors? At the moment we're talking pretty small-scale differences and there is no difference in the pattern, so for my own needs, any of these would be appropriate. But what if I wanted to benchmark against another repository? (GA does have a "benchmark" feature which supposedly benchmarks your website against other sites of the same size. I don't fully understand it, and it makes WRAP look really good, but I don't believe it's all that useful to benchmark WRAP against unknown websites!)
Information about visitors includes looking at which countries and networks they have come from. I can drill down further within the UK visitors to find out which cities they came from. Of course the largest contingent of our visitors was from Coventry, and from within the Warwick network. But there are other academic networks appearing in the list, including Southampton, Durham, Birmingham, Edinburgh and others.
2) Traffic sources. The latest beta "Advanced Segments" option shows me very nicely the whole number of visits as a line graph, with different coloured lines for the traffic sources, be they direct visits (eg bookmarks, or someone typing in the URL), search engine referrals or web page referrals. The pattern seems to be remarkably similar across all three, although the search engines are by far the largest traffic source. Looking further at which web pages link to WRAP is an interesting exercise, as is looking into which keywords were typed into the search engines that led to a visit to WRAP. Mostly the web pages are Warwick Uni ones. At first the keywords were nearly all general enough to suggest that people were looking for WRAP itself, or something like it. But now that we have more content, the keywords are getting much more specific.
3) Content: Pageviews give you a lovely big number, if that's what you need to show! But I hardly think it is more useful than the number of visits or visitors. Top Content tells me that the pages in WRAP that are visited most are the home page, search pages, admin pages, the browse pages, etc. This is the closest I can get to knowing which are the most visited papers in the repository, which is useful for advocacy. Except that I can't possibly know whether the papers themselves were read, only that their records were read... The site overlay feature might show this for a single record, but I can't compare papers on popularity of pdf download. And I cannot tell much about the visitors to an individual paper: I can see which keywords led to that paper and which sources linked to it, but not whether the visitors were on an academic network or not, or from the UK or not.
The Top Landing pages tell me which pages people are reaching WRAP through. Our most important page is our home page, but after that come actual article records. I can use this in advocacy work, to claim that "the paper that has had most direct hits within WRAP is...". But of course that would not necessarily be the most popular paper in WRAP, just the one that most people are following links to from elsewhere. Academics could easily boost this statistic for their paper just by sharing the WRAP URL for their work.
The Top Exit pages provide a nice balance to those, so presumably our visitors are looking at precisely what they wanted to find and not hanging around (also described in the high Bounce Rate). However, people are exiting from our search page, browse by department page and latest additions page as well. I am a little concerned about people who don't make it past the search page: we link directly to the advanced search form, but I might want to change that if there is a real problem with this. But I'm not worried yet, it's just something to watch.
Site Overlay looks like a great feature but I don't understand what on earth all those percentages mean! If I'm right, when I look at the record for an article and I can see the link for the pdf, if it says "0%" then that means that no-one has clicked on it. But I'm not sure I've got that right.
But that's only all about what I can do with Google Analytics. The list of what I would really like to be watching/providing to authors is most likely to be entirely different.