All entries for April 2009

April 30, 2009

What Google is doing with WRAP!

A (very) quick investigation into what Google knows about WRAP...

An "Advanced Search" through Google, on the WRAP site, for pdf files tells me that Google is indexing 659 pdf files. This is more than our total number of records (580-odd) but it figures, because we have more than one file per record, sometimes.

Remove the filetype criterion and you get 2,440 results. The top one is our homepage, but the second one is what seems to be a random record within WRAP. It's not one of our Google Analytics' "Top Content" items. It's not the first one added to the repository (no. 381). Not sure what is happening there.

Doing the same searches on Google Scholar gives me considerably lower results. I had thought that Google Scholar was indexing our content, and it is: some of it. 210 pages & files in total, to be precise. This is something that was raised on the UKCoRR discussion list recently as a concern amongst repository managers. We just don't know what Google Scholar is doing with our content. Most of the 210 pages are pdf files, though - 145 of them, and the top results are again fairly random in their order, as far as I can tell.

Google Analytics is great at telling me lots of interesting stuff about visitors to WRAP. But its biggest weakness is that it can't measure accesses to pdf files. I was reminded today of just how big that weakness is, when our IT Services department looked at the access logs for me and told me that we're averaging about 5000-6000 pdf files accessed per day, whilst the metadata records are only accessed around 700-1000 times a day.

...and I had thought that no-one was looking at the pdf files. Silly me!
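For anyone wanting to repeat that kind of count from raw access logs, here is a minimal sketch. It assumes Apache-style log lines with a quoted request field; the sample lines, IP addresses and paths are invented for illustration and are not taken from WRAP's real logs, and I don't know how IT Services actually produced their figures.

```python
import re
from collections import Counter

# Assumption: each log line has a quoted request field such as
# "GET /path HTTP/1.1" (as in the Apache Common Log Format).
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+"')


def classify(path: str) -> str:
    """Label a requested path as 'pdf' or 'other'."""
    # Drop any query string before checking the file extension.
    bare = path.split("?", 1)[0]
    return "pdf" if bare.lower().endswith(".pdf") else "other"


def count_requests(lines):
    """Count pdf and non-pdf requests in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[classify(match.group("path"))] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [30/Apr/2009:10:00:00 +0100] "GET /381/1/paper.pdf HTTP/1.1" 200 12345',
    '192.0.2.1 - - [30/Apr/2009:10:00:05 +0100] "GET /381/ HTTP/1.1" 200 2345',
    '192.0.2.7 - - [30/Apr/2009:10:01:00 +0100] "GET /400/1/thesis.PDF?download=1 HTTP/1.0" 200 999',
]

counts = count_requests(sample)
print(counts["pdf"], counts["other"])  # prints: 2 1
```

A real count would also need to filter out search engine crawlers and failed requests, which this sketch does not attempt.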

But it is surprising that Google Analytics was trying to tell me anything about pdf file accesses at all. I looked at my "Top Content" today, and did a search for URLs containing "pdf". And I found that the top 10 pdf files for the last month had had 134 accesses between them. The most popular of these I investigated further, and the source included one access through Google. What is Google/Google Analytics doing?!!

I have not yet had a reply to my query on one of the Google forums, about whether Google Analytics differentiates between Google Scholar and Google referrals to our pages.
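While waiting for an answer, one way to make the distinction outside Google Analytics would be to look at the referrer strings recorded in the web server's access logs. The following is only a sketch under that assumption: the function name and labels are made up for illustration, the example URLs are invented, and it says nothing about how Google Analytics itself groups referrals.

```python
from urllib.parse import urlparse


def referral_source(referrer: str) -> str:
    """Return 'google_scholar', 'google' or 'other' for a referrer URL."""
    # hostname is None for empty or malformed referrers, so fall back to "".
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("scholar.google."):
        return "google_scholar"
    if host.startswith("www.google.") or host.startswith("google."):
        return "google"
    return "other"


# Hypothetical referrer strings, for illustration only.
print(referral_source("http://scholar.google.co.uk/scholar?q=leishmaniasis"))
print(referral_source("http://www.google.com/search?q=street+slang"))
print(referral_source("http://example.org/reading-list"))
```

Matching on the hostname prefix means that national Google domains (such as .co.uk) are handled without listing each one.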

Why am I so hung up on Google? I think most of our visitors come from there.

Why am I so keen to find out what Google Analytics can tell me about our visitors? I can't tell academics to put their work in the repository if I can't prove that:

a) more people will read their work as a result.
b) those people are likely to be the kind of people who will cite their work & further academic knowledge.
c) visitors to WRAP are not going to adversely affect journal publishers.

Of all of these, point c is probably the most contentious. Academics are very protective of their journal publishing model and I can see why: WRAP certainly can't do everything that their publishers can. I am keen to hunt out whatever evidence I can find, to reassure our academics and to back up claims that repository deposit leads to more citations. It is citations that Universities are seeking, in time for the REF!

April 27, 2009

Site overlay: clicks on external links not counted

Follow-up to Keyword phrases from WRAP repository blog

Some more trawling through the Google Analytics Help section today has turned up some discussion on their forum, which suggests that GA doesn't count clicks on links to external web pages, only clicks on links to elsewhere within the site.

This is something of a relief: I am not surprised that people are not clicking to read the pdf files of articles in WRAP, because I expected them to be interested in reading the "version of record", which is the one on the publisher's website which we always link to. I was very surprised that GA's site overlay reported no clicks from our records to publishers' web pages. Why would so many people be looking at the record in WRAP, while none were looking at the article itself? The only answer I could think of was that they must have gone back to their Google results that brought them to WRAP in the first place, or that they were satisfied by reading the article record in WRAP. Yet some must surely have wanted to read more, especially given that so many visitors seem to have come from academic networks... and now it all makes sense that people could indeed be clicking on our links to publishers' pages, but we simply can't measure those clicks.

So I can reassure authors that no-one will be reading the pdf versions we hold in WRAP(!) unless they have no other option because either they don't have a subscription to the published version or the published version is no longer available. Which is kind of what they want: many authors don't really feel comfortable with making their own early versions available. Now all I need to do is to convince authors of why we want the full text in WRAP at all, given that I know no-one is looking at it! My usual list of reasons is:

1) Google indexes the full text file, bringing visitors to your work in WRAP.
2) There will be those without subscription access who will be glad to read the earlier version. (This will include those in the commercial world but also those in academia in less wealthy countries).
3) This will be a back-up version of the work, for times when the publisher might be unable to make the work available - either temporarily, due to a technical hitch, or for the long term.
4) It is the long term nature of a repository that makes it different to putting your article on your own web page. Putting your article into an institutional repository is like libraries of old having copies of books and journals on shelves for future generations to consult.

As a librarian, my concern is to collect scholarly works and to make them available when they are needed. WRAP may be an electronic collection, but what we (the library) are trying to achieve with WRAP is very much our traditional role: we're just finding different ways to do that, as technology changes the possibilities for us.

April 16, 2009

Keyword phrases

I've been thinking about all the things that might lead to an author becoming highly cited, or raising citations for a particular paper, in ways other than just WRAP deposit. I often feel pressured to prove that WRAP deposit can raise citations: it is one of the claims that we make, so we should be able to prove it. Yet it seems to me to be impossible to do: all I can do is point to articles which say that open access publishing raises citations, and say that WRAP deposit is a form of open access publishing. It stands to reason that if more people can find and read your article, then more people will cite it, in the long run. But authors seem to want better evidence than that, and they would prefer to have evidence that WRAP itself will raise their citations, not just about repositories generally.

So, I've been looking at what Google Analytics can tell me (again!) and matching that to tactics that are rumoured to raise citations. One such tactic is to use a key phrase repeatedly in article titles, or to publish consistently on/around a particular theme, so that you get known as an expert for something in particular. I'm not sure whether anyone ever does this in such a calculated way, and it's probably more likely that a particular phrase is associated with an expert on account of the fact that it was his/her work which invented the concept. But anyway, GA can tell me which keywords have led people to WRAP.

This month, the highest keyword search leading to visits to WRAP is "interracial sex", and other keyword phrases that people are searching for when they come to WRAP are: "street slang", "leishmaniasis recombinant vaccines" and "educational leadership theories". Other phrase searches include entire article titles.

What do such phrase searches tell me? Well, in the case of article titles, it is clear that it is the academics' work that is being sought. In the case of keyword phrases, it could be that "social searching" is leading visitors to WRAP as much as, or as well as, academic searching, in some cases. Looking at the papers that these keywords led to, and at the "site overlay" feature of GA, which tells me where people clicked when they visited that page, I can't see that people are clicking on the pdf or the publisher's link. They appear to be looking at the WRAP record and then looking away again: this might mean that they read the abstract and learnt enough, or that they were indeed looking for something else entirely. The most popular papers in WRAP correspond with the keyword phrases that are leading most visitors to WRAP. I've looked in some detail at the profile of visitors to those popular papers, and from the network locations of the visitors, many are indeed on identifiably academic networks. Even those on commercial networks could be academics working from home.

In short, what I can say is that keyword phrases will bring visitors to your paper in WRAP - if your paper is there. At least some of those will be the kind of visitor that you will want to have. It really doesn't take that much effort to deposit: visit a web page, upload a file, tick a couple of boxes (literally 2!) and paste a reference in. Time will tell whether all that effort is worth it, because the business of becoming highly cited takes a very, very long time and a lot more than just repository deposit.

If I can possibly prove that WRAP deposit will raise citations, I will do. But in the meantime, there needs to be work in the repository for me to look at the statistics for... it's early days for WRAP still, and even for repositories.

April 08, 2009

Personal e-mail invitations

We've had 329 items through our submission form since Christmas 2008. Of these, our Repositories Assistant, Marie, has submitted 221 records through that form, which means that we've had 108 articles sent through administrators or by authors themselves.

Most of the articles that Marie has submitted have come from e-mails sent to her by authors, and most of those e-mails were in turn prompted by personal invitations that she wrote after scanning the authors' web pages.

Some of those personal invitations will have been based on Zetoc alerts. Out of 124 alert-based e-mails sent to authors since Jan 2008 we have had 29 articles deposited, which makes a 23% success rate for those e-mail invitations. At first, very few alert-based requests were made, but this year we have sent out about ten a month, as a rough average. We are sent alerts by helpful subject librarians who monitor lists of names for a single department, and are able to quickly delete those which don't correspond to their department's subject. For this reason, one person can only monitor alerts for a single department, so only a handful of departments' authors are being monitored, and authors with common names cannot be monitored at all. Subject librarians investigate alerts for items which appear to be journal articles in the correct area of interest for the Warwick author of that name, and forward alerts of Warwick authors' articles to Marie.

Marie's processing of the alert involves checking whether we've already got the article in WRAP (it has happened once or twice!) and then looking at the publisher's policy, to see if she can just put the final version into the repository and let the author know, rather than writing to the author to ask them for a post-print. This check also avoids asking for an article that cannot be put into the repository in full text.

Looking back through our alerts, I can see that we have written to some authors more than once: one author has written five articles, and we have written to him every time. Although our statistics show that that author has not deposited as a result of those alerts, I do know that individual since he followed our first alert invitation up with an enquiry, which resulted in deposits of other articles to WRAP. However, it might be worth our while considering whether to continue monitoring alerts for authors who never respond to our invitations (particularly those with the most common names, who generate a larger number of "non-Warwick" alerts which must be gone through every morning), or at least following up our e-mail invitations with a phone call. We have never chased anyone about an item we have invited deposit of. At least one other author has sent us other articles than the one we requested through an alert, from amongst his back catalogue of published work.

We are also looking at using other alerting services, and are currently investigating PubMed with its author affiliation searching.

Our Zetoc alert-based requests would seem to be the most successful approach to authors to date, although we are now in a position to use other advocacy tactics, since the repository is well established and growing in size. We will, of course, continue to monitor the effectiveness of all our advocacy tactics...
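For anyone who wants to check the arithmetic in this entry, or track the same measures over time, the figures can be worked through as below. This is just a sketch: the function and variable names are my own for illustration, and the numbers are the ones quoted in this post.

```python
def success_rate(deposits: int, invitations: int) -> float:
    """Percentage of invitations that led to a deposit."""
    if invitations == 0:
        raise ValueError("no invitations sent, so no rate can be calculated")
    return 100 * deposits / invitations


# Figures quoted in this post.
form_submissions = 329      # items through the submission form
submitted_by_marie = 221    # records submitted by the Repositories Assistant
alert_emails = 124          # alert-based e-mails sent to authors
alert_deposits = 29         # articles deposited as a result

submitted_by_others = form_submissions - submitted_by_marie
alert_rate = success_rate(alert_deposits, alert_emails)

print(submitted_by_others)   # prints: 108
print(f"{alert_rate:.1f}%")  # prints: 23.4%
```

The unrounded rate is a little over 23%, which matches the rounded figure given above.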

April 03, 2009

A new era

Writing about web page

I've just noticed that it's almost a month since I last posted to this blog. It's been incredibly busy as we've just concluded the JISC-funded WRAP project, and now we're entering a new era in which WRAP is the repository but no longer a project.

Our end of project report is now on the web (see link above to project page), and for those with an interest in repository statistics, I recommend the appendix on what we can tell about WRAP through Google Analytics, ROAR and the University's web tool known as Sitebuilder.

The future direction of WRAP is uncertain as the University's steering group are considering what to do with Publications data, but we will certainly be carrying on with WRAP as we have done, increasing our deposits of full text journal articles and handling PhD theses.

We just announced to the University of Warwick staff that we have surpassed 500 items in WRAP (through a University-wide electronic newsletter known as "inbox insite"), and to date we have 536 items, so the collection is most certainly growing, and with it our visitor numbers are increasing.

As we've had more interest in WRAP we've had more enquiries and our FAQs have been expanded beyond what I think is a reasonable number, so I just introduced an index, using a feature of Sitebuilder that creates a "table of tags" from the keywords in the metadata for the pages... I just had to revise the keywords for all those pages, to make sure I used consistent terms, applied consistently throughout the collection. I like to call this a "tagsonomy", and if you're at Warwick and would like to read more about these tags, please see these other blog postings:
