All entries for February 2010
February 23, 2010
Last Friday I was at the UKCoRR members' meeting. As their Chair, I reported on my activities and announced speakers. As a repository manager, I learnt a lot from the other participants.
Louise Jones introduced the day, as the University of Leicester library were our hosts. They have recently appointed a Bibliometrician, and they're acquiring a CRIS to work alongside their repository. They have a deposit mandate, and Gareth Johnson's presentation later in the day about the Leicester repository mentioned that they have more than enough work coming in without needing advocacy work to drum up deposits. I guess that the CRIS will come in handy for measuring compliance with the mandate!
Gareth's presentation also included some nice pie charts showing what's in their repository by type, and what's most used from the repository, by type and then again by "college" (their college is like a faculty). Apparently he had to hand-count the statistics for the graphs... well done Gareth!
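For anyone facing the same hand-counting, a few lines of scripting can tally an exported metadata file instead. This is only a sketch, assuming a CSV export with a "type" column (the file name and column name here are my assumptions, not any particular repository's real export format):

```python
import csv
from collections import Counter

def count_by_type(csv_path, type_field="type"):
    """Tally repository records by item type from a CSV metadata export."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return Counter(row[type_field] for row in csv.DictReader(f))
```

The resulting Counter can be fed straight into a charting library to draw the pie charts.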
Nicky Cashman spoke about her work at Aberystwyth. I found it interesting that one of their departments' genealogy research projects has hundreds of scanned images of paper family trees that will need a home at the end of the project. They don't need a database built around their data, as they have already built one; they simply want to link from it to the scanned images. This sounds like a great example of the kind of work that the library/repository can do to support researchers with their research data. The problem, though, is that hosting that kind of material in a repository carries substantial costs (cataloguing each item, storing it and preserving it), and these costs perhaps ought to have been included in the original research bid. Researchers ought to be thinking about such homes for their data at the beginning of their projects, rather than at the end.
Nick Sheppard spoke about his work on Bibliosight and using the data provided through Web of Science's Web Services. There was some discussion about the fact that you can't get the abstract out of WoS, because they don't own the copyright in it and so can't grant us permission to use it...
Jane Smith of Sherpa demonstrated some of the newer and more advanced features of RoMEO. I think that the list of publishers who comply with each funder's mandate could be of use to researchers looking to get published. The FAQs might also be useful for new users of RoMEO.
I would like to see the Sherpa list of publishers who allow final version deposit enhanced to include which of them will allow author affiliation searching as well, so that we can find our authors' articles in final versions and put them into the repository. And another column to say whether the final versions are already available on open access or not, because I'd prioritise those not already available on open access.
One development that has been considered for SherpaRoMEO is that it should list the repository deposit policy at journal title level, because publishers often have different terms for different titles. However, in trying to develop such a tool, it has transpired that one journal might appear to have several copyright owners, depending on which source of information about journal publishers you consult. For instance, a society and the publisher who acts on its behalf might each claim the rights and each have different policies. Which rights owner's policy ought SherpaRoMEO to display?
Hannah Payne spoke about the Welsh Repository Network, which has a Google custom search across all the Welsh repositories. I like it, though I would wish for a more powerful cross-searching interface. In the afternoon we did a copyright workshop that had also been run at one of the WRN events.
So there is plenty I can take away from the day.
February 22, 2010
Today I've been writing up some handover notes on statistics for the next E-Repositories Manager at Warwick.
One thing that has interested me for a while is the "Referring sites" information in Google Analytics. Most of our visitors come from Google itself, and the great blue wedge of search engine referrals on the pie chart resembles a Pac-Man shape: it has been swallowing up all other sources of visitors, month on month...
Ideally, we'd like people to link to documents in the repository, and for others to follow those links: this would increase our "Google juice"... and perhaps such an effect would bring even more visitors from search engines, so my pie chart of visitor sources will always look like a blue Pac-Man!
The referring site that brings us most visitors is Warwick's own, and within the Warwick domain, the page we created under the University's "Research" page brings us most visitors. This is good news because it shows the importance of us having this page, and not only linking to the repository within the library's pages.
The next most important pages are the ones from within the library's website, which is fine. Our next most important source of visitors is from the profile page of one particular academic who is very good at linking to his papers in WRAP!
It would probably be a good advocacy tactic to write to authors telling them how many visitors have come to WRAP by following links on their pages... if we had the time to go through all these stats! Given that many of the profile pages bringing visitors to WRAP are generated by the University's "MyProfile" system, it would also serve as good advocacy for MyProfile.
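If the time ever appeared, the counting itself would be easy to automate. A minimal sketch, assuming a CSV export of referral data with one row per referring page (the column names "referral_path" and "visits" are my invention, not the real Google Analytics export headers):

```python
import csv
from collections import defaultdict

def visits_by_referrer(csv_path, page_field="referral_path", visits_field="visits"):
    """Sum visits per referring page and return them, busiest first."""
    totals = defaultdict(int)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            totals[row[page_field]] += int(row[visits_field])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

A mail-merge over the top rows of that list would produce the per-author messages.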
(NB for non-Warwick people: MyProfile is what we call the part of InfoEd which documents academics' work and is used by our Research Support Services department. It is used well by some departments and not very well by others, and not all departments choose to have staff profile pages driven by its data. It serves as a kind of publications database for Warwick and is one of the reasons why WRAP remains full text only. We share our data with MyProfile through a report sent every month and Warwick authors can update WRAP by uploading a file through MyProfile.)
February 15, 2010
Writing about web page http://repositories.webometrics.info/methodology_rep.html
Webometrics have published their rankings for repositories, and their methodology is described online. This is the first time they've actually listed WRAP and we're at no. 273. They are primarily focussed on repositories like WRAP that are all about research content. Their criteria for measurement are listed as:
"Size (S). Number of pages recovered from the four largest engines: Google, Yahoo, Live Search and Exalead.
Visibility (V). The total number of unique external links received (inlinks) by a site can be only confidently obtained from Yahoo Search and Exalead.
Rich Files (R). Only the number of text files in Acrobat format (.pdf) extracted from Google and Yahoo are considered.
Scholar (Sc). Using Google Scholar database we calculate the mean of the normalised total number of papers and those (recent papers) published between 2001 and 2008."
But if you decided that the Webometrics ranking were an important one (a whole other issue!) then you might want to work on influencing these...
50% of the ranking is given to Visibility, so you'd want to concentrate on getting people to link to your content from other sites. This is not only good for Webometrics, but reputedly also for your "Google Juice" (i.e. how high your content appears in Google results lists). I've yet to investigate whether we can find any stats out for ourselves from Yahoo Search or Exalead. However, telling your authors that they should link to your content and encourage others to do so could cloud the main message, which is about getting them to send us content in the first place. I think this kind of message is one for a mature repository, where there is already a culture of high deposits, because surely the main priority for a repository is to make lots of content available on OA, not to score well in a repository ranking!
20% is dependent upon size. So getting lots of content and focussing on this message with your authors is important too. It is my highest priority in any case...
15% is dedicated to "Rich files", which seems to mean whether content is held as PDF files... that isn't necessarily the best thing for a repository from a preservation angle, nor if you would like to allow data-mining on your content. It might not even be the best display format for all types of content. So it would seem to me to be the least important metric to focus on, if I understand it correctly.
The final 15% is dependent on Google Scholar... Google Scholar does not currently index all of WRAP's content. I have written to them about this, and I know that other repositories have the same issue, but I still haven't got to the bottom of it. My theory is that, if you read their "about" pages, they are indexing our content but not presenting it in their results sets, because they de-duplicate articles in favour of final published versions: they present those rather than repository results, so if I look for all content on the WRAP domain through GScholar I won't get as many results as there are articles in the repository. If my theory is right, then it could be significant to learn whether Webometrics uses the raw data before any such de-duplication. I might be wrong, though!
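Putting the four published weights together, the overall ranking reduces to a weighted sum. A toy sketch (the real ranking normalises each indicator across all repositories first; the inputs here are assumed to be already-normalised values between 0 and 1, and the function name is my own):

```python
def webometrics_style_score(size, visibility, rich_files, scholar):
    """Weighted sum using the stated weights: Visibility 50%, Size 20%,
    Rich Files 15%, Scholar 15%."""
    return 0.5 * visibility + 0.2 * size + 0.15 * rich_files + 0.15 * scholar
```

This makes the point above concrete: a unit improvement in Visibility moves the score two and a half times as far as the same improvement in Size.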
Also note the dates of publication that are relevant to the GScholar data. We have said to authors that going as far back in time as they feel is important/significant is fine with us (it helps to win them over, and is useful for REF data and for web pages driven by RSS feeds from WRAP). But if you wanted to be more strategic in raising your Webometrics ranking, you'd need to change that policy to focus on content published in the last 10 years...
I don't think we shall be playing any such games! But it is interesting to see what ranking providers consider to be important...