All 18 entries tagged Bibliometrics

November 17, 2011

Google Scholar Citations metrics tool

Authors can have Google create metrics reports for them. I've not heard of the i10-index before, but there seem to be new measures every five minutes. Read all about it on the GScholar blog!
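For anyone else meeting it for the first time: the i10-index is usually described as the number of an author's papers that have at least ten citations each. Here is a minimal Python sketch of that idea. The function name and the citation counts are made up for illustration and are not taken from Google's own tool.

```python
def i10_index(citations):
    """Count the papers that have at least 10 citations each."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts, one number per paper.
example_counts = [25, 14, 10, 9, 3, 0]
print(i10_index(example_counts))
```

Only the first three papers in the example list reach ten citations, so they are the only ones counted.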

October 17, 2011

Microsoft Academic Search: get another h–index calculation!

Microsoft is creating “Academic Search” as a rival to Google Scholar. I’ve been meaning to investigate it properly for a while!

I see that it has listings of top authors by subject, eg:
… though as with all things related to citation data, researchers need to investigate the validity of the data set on which the rankings are calculated. I am not actually sure what Microsoft means by "top author" on these lists, so this is one for further investigation...

Microsoft's site states that it is in beta. I'm not too sure where the data comes from, except that the site says it uses data from open access repositories, publishers and web crawling.

There are also some great features on there, like the AcademicMap, where you can find many Warwick authors. It appears from the map that you can contribute (via an orange bar at the bottom), so maybe authors can make sure it picks up on their work properly. Authors' profiles appear with h-index and g-index scores, numbers of papers, numbers of citations, etc. Sometimes they even feature a photo!

October 04, 2011

The UK REF 2014 – my notes on what citation measurement we might expect

These are my notes upon reading the REF 2014 document: “Assessment framework and guidance on submissions” - page 22 onwards.

Each member of staff who is eligible for inclusion in the REF submission will have up to four outputs submitted.


In summary, each output must be:

- “The product of research, briefly defined as a process of investigation leading to new insights, effectively shared.” There is a full definition of research in the documentation.

- “First brought into the public domain during the publication period, 1 January 2008 to 31 December 2013 or, if a confidential report, lodged with the body to whom it is confidential during this same period”

NB if you get an online pre-print available within these dates but the official publication date is after 31 December 2013, you can still submit the item so long as you have evidence to prove the public domain availability of the item. One form of evidence relating to web content which is acceptable is: “a date-stamped scanned or physical printout or evidence derived from web-site archiving services.” (See Paragraph 111) Likewise though, if your publication was actually in the public domain prior to 1 January 2008 then it won’t be eligible.

- “Produced or authored solely, or co-produced or co-authored, by the member of staff against whom the output is listed, regardless of where the member of staff was employed at the time they produced that output.”

The official documentation expands on this (paragraph 105 onwards).

Examples of output types given are:

“new materials, devices, images, artefacts, products and buildings; confidential or technical reports; intellectual property, whether in patents or other forms; performances, exhibits or events; work published in non-print media”, and other types can be included. The documentation goes on to say: “Reviews, textbooks or edited works (including editions of texts and translations) may be included if they embody research…”

“A confidential report may only be submitted if the HEI has prior permission from the sponsoring organisation that the output may be made available for assessment.” (para 115)


Editorships or other activities are not outputs, and so should not be included in the submission. Theses and “items submitted for a research degree” won’t count, although it does seem that published items and other eligible outputs based on your research degree can be listed.

Panels might choose to assess two outputs which are based on the same research and so have “significant material in common” as a single output, or to assess just the content which is distinct in each. Panels' own judgements will also be used where a publication is a version of one published prior to 1 January 2008, to decide whether the publication is eligible and how it is to be assessed: “Submissions should explain where necessary how far any work published earlier was revised to incorporate new material” (Paragraph 113)

“HEIs may not list as the output of a staff member any output produced by a research assistant or research student whom they supervised, unless the staff member co-authored or co-produced the output.” (para 110)


Output types are to be categorized in the institution’s submission, into:

i. Books (or parts of books).

ii. Journal articles and conference contributions.

iii. Physical artefacts.

iv. Exhibitions and performances.

v. Other documents.

vi. Digital artefacts (including web content).

vii. Other.

(Para 118) and there will be different data requirements for each of these categories.

Paragraph 119 says:

“Each of the following is required where applicable to the output:

a. Co-authors: the number of additional co-authors.

b. Interdisciplinary research: a flag to indicate to the sub-panel if the output embodies interdisciplinary research.

c. The research group to which the research output is assigned, if applicable. This is not a mandatory field, and neither the presence nor absence of research group is assumed.

d. Request for cross-referral: a request to the sub-panel to consider cross-referring the output to another sub-panel for advice (see paragraph 75d).

e. Request to ’double weight’ the output: for outputs of extended scale and scope, the submitting institution may request that the sub-panel weights the output as two (see paragraphs 123-126).

f. Additional information: Only where required in the relevant panel criteria, a brief statement of additional information to inform the assessment (see paragraph 127).

g. A brief abstract, for outputs in languages other than English (see paragraph 128-130).”

The document gives much more detail on how each of these items can be provided.


“Some sub-panels will consider the number of times that an output has been cited, as additional information about the academic significance of submitted outputs.” (Paragraph 131) They won’t be interested in the impact factors of the journals as such, but the number of citations accrued by the outputs themselves. The citation data is to be provided to the panels by the REF team, and submissions may not include details of citations in additional information for outputs.

“In using such data panels will recognise the limited value of citation data for recently published outputs, the variable citation patterns for different fields of research, the possibility of ‘negative citations’, and the limitations of such data for outputs in languages other than English.” (para 132)

I’ve not read the criteria from each panel, but David Young’s blog entry neatly summarises the expected use of citation data across the different panels (although note that consultation is still underway at present):

* “Different panels and UoAs will use citation data to differing degrees, and they will also differ in what kinds of outputs are acceptable
* All sub-panels of Panel A (life sciences and allied health disciplines) will use citation data “where it is available, as an indicator of the academic impact of the outputs, to inform its assessment of output quality”
* Just under half of the sub-panels in Panel B (physical sciences) will use citation data: Earth systems & Environment, Chemistry, Physics, and Computer Science.
* Two sub-panels in Panel C (social sciences) will make use of citation data: Geography, Environmental Studies and Archaeology (although not all of this UoA will use citations) and Economics and Econometrics.
* None of Panel D (arts and humanities) will use citation data.
* Physical sciences will be able to submit a larger range of outputs, including patents, computer algorithms and software. In contrast life scientists will be more restricted, and can only include some kinds of outputs, such as databases or textbooks “exceptionally”.”

Where sub-panels are using citation data, it is being made available to them, matched to outputs by the REF team, using DOIs and other bibliographic data. Institutions will be able to verify these matches and to view the citation counts provided to the panels. Citations made after the submission deadline will continue to be counted and provided to the panels. (See para 133)


Journal articles and conference papers will be accessed by the REF team via the publishers, so will require DOIs in the submission. Other output types can be provided in an electronic format, or a physical copy, or as appropriate evidence, seemingly in that order of preference. We await the submission system software in autumn 2012.

August 22, 2011

Academic Search Engine Optimisation

I've linked to this article: J. Beel and B. Gipp, 'Academic Search Engine Spam and Google Scholar's Resilience Against it', Journal of Electronic Publishing, 13 (3), 2010

The article discusses possibilities for academic search engine optimisation, and what happens when this becomes spamming activity. It has a very neat description of how people go about spamming search engines, and it considers some of the ways that scholars can manipulate academic search engines.

If (like me) you aren't already aware of all these nefarious techniques, then the article will be an eye-opener! The researchers experimented on Google Scholar, using the following approaches:

  • "When creating an article, an author might place invisible text in it. This way, the article later might appear more relevant for certain keyword searches than it actually is."
  • "A researcher could modify his own or someone else’s article and upload it to the Web. Modifications could include the addition of additional references, keywords, or advertisements."
  • "A manipulating researcher could create complete fake papers that cite his or her own articles, to increase rankings, reputation, and visibility."

I find it interesting that three of the sites they used in their manipulation were Mendeley, and ResearchGate. I've blogged about these sites and their ilk before and I've suggested to researchers that having details about their work on these sites would help them raise their profiles on the Web. The article says that only the papers uploaded to were crawled and indexed by Google Scholar... which is kind of good news for the robustness of Google Scholar. It also suggests that those searching for full text versions of articles on the web should go directly to the kinds of sites which might hold them (I do recommend Mendeley), and not rely only on search engines.

After reading this article, I want to know how to go about modifying a journal article after it has been published (including those not your own!), in order to add references. The authors didn't go into detail about how to do that, but you can imagine the havoc it would play with Google Scholar's citation scores if we were all doing it!

I note that the authors described the journal 'Epidemiology' as "a reputable journal by the publisher JSTOR". JSTOR is not a publisher, it is a content aggregator. 'Epidemiology' is published by Wolters Kluwer. It probably takes a librarian to know this, and I wonder whether it is relevant anyway. It could be a deliberate faux pas on the part of the authors, because it kind of illustrates their point that people don't know where content online is coming from! And the authors are right that a journal available on JSTOR is a reputable academic title.

The discussion section of the paper explains that spamming academic search engines takes a lot of effort, that the benefit is neither immediate nor measurable for academics, and that academics are unlikely to undertake such work because their reputation is so valuable and could be permanently damaged if a search engine were to ban all their articles once spamming activity was discovered. The authors raise the matter of whether a journal or conference might engage in search engine spamming: they don't mention academic institutions, but I believe that universities could also have a motivation.

I do worry about where we draw the line between authors or journals raising their profile in legitimate ways, and where spamming begins. I have long advised authors to include key words in their article titles because of the way journal indexing tools work in ranking results, and this seems to me to make good sense from both a "discovery optimisation" point of view and from an academic accuracy perspective. I also believe that self-citation is a good idea, in that I think false modesty is pointless and potentially damaging. Authors ought to know whether their earlier work is relevant to their latest article, and how well self-citation is accepted in their own field, and so be able to self-cite with caution.

The big question about all such profile raising practices, for me, is how far should we go? This article doesn't give the answer, but it describes an awful lot more about what could be done. The authors conclude by suggesting: "the academic community needs to decide what actions are appropriate and when academic search engine optimization ends and academic search engine spam begins."

July 04, 2011

How close are you to gaining one more point on your h–index score?

Last week I found out about a Mozilla Firefox extension which I've linked to from this post. It looks very useful in that it calculates the h-index and various other index scores for the results of any search you perform on Google Scholar, once you've installed it. If you're an author wanting to know your own h-index then the trick is to get your results set to include all of your own works. The advanced analysis feature of the extension allows you to un-tick certain results from the calculations presented in the panel at the top of your results set.

Only 100 results are processed in the analysis, so it isn't going to be a great tool for those with hundreds of publications to their name.

The tool presents not only the h-index but also the g-index, which gives extra weight to an author's most highly cited papers, and an e-index, which counts "excess citations" beyond those needed for the h-index. You can read more about the e-index in a PLoS ONE article published in 2009.

It also presents “delta-h” and "delta-g" scores, which look really useful for authors who want to know how close they are to raising their index scores.
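To make these measures a little more concrete, here is a rough Python sketch of how the h-index, g-index and e-index are commonly defined, working from a plain list of citation counts. The function names and sample numbers are my own. The last function is only one plausible reading of a "delta-h" style score (the fewest extra citations needed to gain one h-index point), and I don't know that the extension calculates it this way.

```python
import math


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations.

    This version caps g at the number of papers in the list.
    """
    ranked = sorted(citations, reverse=True)
    running_total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank
    return g


def e_index(citations):
    """Square root of the citations in the h-core beyond the h*h that h requires."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    excess = sum(ranked[:h]) - h * h
    return math.sqrt(excess)


def citations_to_next_h(citations):
    """ASSUMED reading of a 'delta-h' score: fewest extra citations to reach h + 1.

    Returns None when there are not enough papers to reach h + 1 at all.
    """
    ranked = sorted(citations, reverse=True)
    target = h_index(ranked) + 1
    if len(ranked) < target:
        return None
    return sum(max(0, target - count) for count in ranked[:target])


# Hypothetical citation counts, one number per paper.
sample = [10, 8, 5, 4, 3]
print(h_index(sample), g_index(sample), e_index(sample), citations_to_next_h(sample))
```

With the sample list, four papers have at least four citations each, so the h-index is 4. Three more citations in the right places (one on the fourth paper and two on the fifth) would lift it to 5.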

June 06, 2011

Alt–metrics – aren't bibliometrics enough? What are you talking about?!

The blog entry I've linked to is a really good summary of reasons we need to move on from bibliometrics alone and the changes that are happening in the scholarly publishing processes. A worthwhile read!

May 16, 2011

RePEc rankings

RePEc is a database for Economics papers and outputs, and citation analysis has been carried out, along with other measures, to provide rankings for items, series, authors, institutions and regions. This page lists these rankings and has links to information about how they were calculated.

It's a really great resource if you're looking to benchmark or compare one entity against another. RePEc's IDEAS offers a selection of different measures and rankings for each type of entity. Item rankings offered include one by simple citation count and others which are citation counts weighted by various factors, eg the impact factor of the citing publication (termed "recursive" by RePEc) or a discount for the age of the citation. A ranking by item downloads within RePEc is also included, and even by abstract viewings.
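To illustrate the difference between these kinds of count, here is a small Python sketch comparing a simple citation count with a source-weighted count and an age-discounted count. The weights, the years and the discount formula (one divided by the citation's age in years plus one) are all assumptions for illustration and are not RePEc's documented method. A truly recursive score would also recalculate the source weights repeatedly, which this single pass does not do.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    year: int             # year in which the citing item appeared
    source_weight: float  # hypothetical weight for the citing publication


def simple_count(citations):
    """Every citation counts as one."""
    return len(citations)


def source_weighted_count(citations):
    """Each citation counts as much as the (assumed) weight of its source."""
    return sum(c.source_weight for c in citations)


def age_discounted_count(citations, current_year):
    """Older citations count for less: an assumed discount of 1 / (age + 1)."""
    return sum(1.0 / (current_year - c.year + 1) for c in citations)


# Three made-up citations to a single item.
example = [Citation(2011, 2.0), Citation(2009, 1.0), Citation(2006, 0.5)]
print(simple_count(example))
print(source_weighted_count(example))
print(age_discounted_count(example, current_year=2011))
```

The same three citations give quite different scores depending on the measure chosen, which is why it is worth checking how a ranking was calculated before comparing entities with it.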

Series rankings include journals by impact factor, which is calculated based on RePEc's own data and is offered in varieties like those for items, including a weighted or recursive score based on the impact measures of the citing sources. I like the simple feature that it includes journals by their full title, which Web of Science's JCR impact factor listings do not: I am not so familiar with journal titles or their abbreviations that I find the JCR list easy to interpret!

Authors are only ranked if they are registered with RePEc and the top 10% appear on their summary table. Even if you're not in the top 10% it seems worthwhile registering because you can access data about your own publications. This must be so much more meaningful than looking up your own h-index on Web of Knowledge, in isolation. Even though you can look at the profiles of highly cited authors on Web of Knowledge, these won't all be people from within your own field and comparing rankings based on citation scores across different disciplines is not all that meaningful because citation practices vary across the disciplines.

RePEc seems to make it easy for authors to use bibliometrics and other measures in an intelligent and balanced manner and in that sense Economists seem to be very well provided for. I'm not a registered author so I can't see what goes into the updates or reports that authors can see, and I know that citations are scraped and parsed by software so I wonder how accurate the data is in the first place, but presumably as a registered author you can correct any errors.

All this is great for Economists and probably those in related disciplines, but I wonder what effect such specialisation of both data source and methodology is going to have on the use and impact of bibliometric measures? I'm sure others will have compared Web of Knowledge rankings with those of RePEc, so I'll have to investigate further!

February 21, 2011

ResearcherID and the next version

The next version of WoS is already available in Beta and I have blogged here in the past about some of its new features. This time, I have linked to a video showing how ResearcherID data will be searchable and presented within WoS, in the next version.

If you're a researcher with publications indexed by WoS, I'd highly recommend creating a ResearcherID profile, to claim all your articles as a set, so that others will be able to find out about your work when the new version of WoS goes live.

It will also be useful for you to be able to create citation reports on your own publications.

Note, however, that anyone with a ResearcherID profile could claim any articles as their own... so you might or might not trust the profiles that others are setting up for themselves. As always, you need to be evaluative about what you find online!

January 26, 2011

Slides from my bibliometrics presentation last year – now on Slideshare

Just been experimenting with Slideshare again!
