No more posts here: This blog has moved!
I have today moved this blog to WordPress: the new URL is http://statgeek.wordpress.com .
There will be no further posts here; they will all appear on the WordPress blog linked above.
We’re hosting the international useR! conference at Warwick this summer, and I thought it might be interesting to try to get some data on how the use of R is growing. I decided to look at scholarly citations to R, mainly because I know where to find the relevant information.
I have access to the ISI Web of Knowledge, as well as to Google Scholar. The data below comes from the ISI Web of Knowledge database, which counts (mainly?) citations found in academic journals.
Background: How R is cited
The “2003” part of the citation advice has changed with each passing year; for example when R 1.9.1 was released (in June 2004) it was updated to “2004”.
ISI Web of Knowledge: Getting the data
So here is what I did: I looked up published papers in the ISI index which I knew would cite R correctly. [This was easy; for example my friend Achim Zeileis has published many papers of this kind, so a lot of the results were delivered through a search for his name as an author.] For each such paper, the citation of interest would appear in its references. I then asked the Web of Knowledge search engine for all other papers which cited the same source, with the resulting counts tabulated by year of publication.
It seems that the ISI database aims to associate a unique identifier with each cited item, including items that are not themselves indexed as journal articles in the database. This is what made the approach described above possible.
There’s a hitch, though! It seems that, for some cited items, more than one identifier gets used. Thus it is hard to be sure that the counts below include all of the citations to R: indeed, as I mention further below, I am pretty sure that my search will have missed some citations to R, where the identifier assigned by ISI was not their “normal” one. (This probably seems a bit cryptic, but should become clearer from the table below.)
Citation counts
As extracted from the ISI Web of Knowledge on 25 June 2011:
ISI identifier ↓ | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | Total |
IHAKA R J COMPUTATIONAL GRAP 5 : 299 1996 | 5 | 15 | 18 | 43 | 131 | 290 | 472 | 528 | 435 | 419 | 449 | 378 | 396 | 3579 |
*R DEV COR TEAM R LANG ENV STAT COMP : 2003 | | | | | | | 39 | 123 | 91 | 57 | 39 | 25 | 14 | 388 |
*R DEV COR TEAM R LANG ENV STAT COMP : 2004 | | | | | | | 16 | 235 | 421 | 327 | 289 | 187 | 126 | 1601 |
*R DEV COR TEAM R LANG ENV STAT COMP : 2005 | | | | | | | | 42 | 397 | 531 | 511 | 445 | 366 | 2292 |
*R DEV COR TEAM LANG ENV STAT COMP : 2005 | | | | | | | | 5 | 39 | 75 | 41 | 25 | 10 | 195 |
*R DEV COR TEAM R LANG ENV STAT COMP : 2006 | | | | | | | | | 55 | 438 | 849 | 656 | 461 | 2459 |
*R DEV COR TEAM R LANG ENV STAT COMP : 2007 | | | | | | | | | | 92 | 714 | 962 | 733 | 2501 |
*R DEV COR TEAM R LANG ENV STAT COMP : 2008 | | | | | | | | | | | 208 | 1402 | 1906 | 3516 |
*R DEV COR TEAM LANG ENV STAT COMP : 2008 | | | | | | | | | | | 7 | 21 | 44 | 72 |
*R DEV COR TEAM R LANG ENV STAT COMP : 2009 | | | | | | | | | | | | 172 | 1363 | 1535 |
*R DEV COR TEAM R LANG ENV STAT COMP : 2010 | | | | | | | | | | | | | 205 | 205 |
*R DEV COR TEAM R LANG ENV STAT COMP : | | | | | | | 1 | 12 | 14 | 25 | 36 | 81 | 93 | 262 |
Total | 5 | 15 | 18 | 43 | 131 | 290 | 528 | 945 | 1452 | 1964 | 3143 | 4354 | 5717 | 18605 |
For the “R Development Core Team (year)” citations, the peak appears about 2 years after the year concerned. This presumably reflects journal review and backlog times.
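The two-year lag can be checked directly from the table. The sketch below is only an illustration in Python: the counts are copied from the main "R LANG ENV STAT COMP" rows above, while the names `counts` and `peak_lags` are my own and not part of any ISI tool. Rows whose largest count falls in 2010 are flagged, since the table stops there and the true peak may come later.

```python
# Counts copied from the table above (main "R LANG ENV STAT COMP" rows only).
# Each entry maps the cited year to (first publication year with a count, counts up to 2010).
LAST_YEAR = 2010
counts = {
    2003: (2004, [39, 123, 91, 57, 39, 25, 14]),
    2004: (2004, [16, 235, 421, 327, 289, 187, 126]),
    2005: (2005, [42, 397, 531, 511, 445, 366]),
    2006: (2006, [55, 438, 849, 656, 461]),
    2007: (2007, [92, 714, 962, 733]),
    2008: (2008, [208, 1402, 1906]),
    2009: (2009, [172, 1363]),
    2010: (2010, [205]),
}


def peak_lags(counts, last_year=LAST_YEAR):
    """Return {cited_year: (peak_year, lag_in_years, censored)}.

    `censored` is True when the peak sits in the last observed year,
    so the real peak may lie beyond the end of the table.
    """
    result = {}
    for cited_year, (first_year, series) in counts.items():
        peak_index = max(range(len(series)), key=series.__getitem__)
        peak_year = first_year + peak_index
        result[cited_year] = (peak_year, peak_year - cited_year, peak_year == last_year)
    return result


for cited_year, (peak_year, lag, censored) in peak_lags(counts).items():
    note = " (table ends here; true peak may be later)" if censored else ""
    print(f"R Development Core Team ({cited_year}): peak in {peak_year}, lag {lag}{note}")
```

For every cited year whose peak falls before 2010 (2003 to 2007), the lag comes out as exactly two years.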
There are almost certainly some ISI identifiers missing from the above table (and, as a result, almost certainly some citations not yet counted by me). For example, the number of citations found above to R Development Core Team (2009) is lower than might be expected given the general rate of growth that is evident in the table: there is probably at least one other identifier by which such citations are labelled in the ISI database (I just haven’t found it/them yet!). If anyone reading this can help with finding the “missing” identifiers and associated citation counts, I would be grateful.
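One rough way to see the 2009 shortfall is to compare like with like: the count in the publication year immediately after each cited year. The Python sketch below uses figures copied from the main "R LANG ENV STAT COMP" rows of the table; the names `lag_one` and `growth_ratios` are just labels I have chosen for illustration.

```python
# Citations counted in the publication year immediately after the cited year,
# copied from the main "R LANG ENV STAT COMP" rows of the table above.
lag_one = {
    2003: 39,
    2004: 235,
    2005: 397,
    2006: 438,
    2007: 714,
    2008: 1402,
    2009: 1363,
}


def growth_ratios(series):
    """Ratio of each cited year's count to the previous cited year's count."""
    years = sorted(series)
    return {cur: series[cur] / series[prev] for prev, cur in zip(years, years[1:])}


ratios = growth_ratios(lag_one)
for cited_year, ratio in ratios.items():
    print(f"{cited_year}: {ratio:.2f} times the previous year's count at the same lag")
```

On these figures, 2009 is the only cited year whose count falls below its predecessor's at the same lag, which is consistent with some 2009 citations sitting under an identifier I have not found. It does not prove it, of course.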
The graph below shows the citations found within each year since 1998.
© David Firth, June 2011
To cite this entry:
Firth, D (2011). R and citations. Weblog entry, University of Warwick, UK; at URL http://blogs.warwick.ac.uk/davidfirth/entry/r_and_citations/.
The graph shows the citations found within each year since 1998.
Citations to Ihaka and Gentleman (1996) and to R Development Core Team (any year) are distinguished in the graph, and the total count of the two kinds of citation is also shown.
I have to go to London tomorrow, so I thought I’d check how much the price of my normal rail ticket has increased in the new year. I didn’t ask for the first class fare, but they told me it anyway. Having picked myself up off the floor, I’m a bit curious about that last digit.
This article has now moved to https://statgeek.wordpress.com/2010/02/07/rae-how .
This article has now moved to https://statgeek.wordpress.com/2009/11/25/rae-2008-assessed .
This blog’s title Let’s Look at the Figures is inspired by the book of the same name written by Bartholomew and Bassett (Penguin, 1971) — highly recommended if you can find a copy.
I had the pleasure of being a member of one of the assessment sub-panels for RAE 2008, the “Research Assessment Exercise” for UK universities. I’m going to start things off here by posting a few bits and pieces of simple data analysis related to that. (It should be stressed that all of the data used here are in the public domain; none of what appears here is privileged information arising from my RAE sub-panel membership.)
After that, who knows what else?