March 29, 2005

Quantities of University web pages

*WARNING - UNSCIENTIFIC TESTING BELOW*

I was just wondering how much content, publicly viewable and searchable by Google, is out there for different Universities. So, taking the top 20 Universities according to the Times, I did the following simple searches.

Times ranking:
Rank. University: search term = number of results

  1. Oxford: site:ox.ac.uk = 1,040,000 results
  2. Cambridge: site:cam.ac.uk = 1,430,000
  3. Imperial College: site:imperial.ac.uk = 124,000
  4. LSE: site:lse.ac.uk = 316,000
  5. Warwick: site:warwick.ac.uk = 990,000
  6. UCL: site:ucl.ac.uk = 611,000
  7. York: site:york.ac.uk = 533,000
  8. Durham: site:dur.ac.uk = 532,000
  9. St Andrews: site:st-andrews.ac.uk = 175,000
  10. Loughborough: site:lboro.ac.uk = 199,000
  11. Bath: site:bath.ac.uk = 296,000
  12. Bristol: site:bris.ac.uk = 457,000
  13. Edinburgh: site:ed.ac.uk = 682,000
  14. Nottingham: site:nottingham.ac.uk = 370,000
  15. Royal Holloway: site:rhul.ac.uk = 91,100
  16. King's College: site:kcl.ac.uk = 279,000
  17. Manchester: site:man.ac.uk = 559,000
  18. Newcastle: site:ncl.ac.uk = 526,000
  19. SOAS: site:soas.ac.uk = 51,000
  20. Birmingham: site:bham.ac.uk = 566,000

Ranking by number of web pages:
Rank. (Times ranking) University: search term = number of results

  1. (2) Cambridge: site:cam.ac.uk = 1,430,000 results
  2. (1) Oxford: site:ox.ac.uk = 1,040,000
  3. (5) Warwick: site:warwick.ac.uk = 990,000
  4. (13) Edinburgh: site:ed.ac.uk = 682,000
  5. (6) UCL: site:ucl.ac.uk = 611,000
  6. (20) Birmingham: site:bham.ac.uk = 566,000
  7. (17) Manchester: site:man.ac.uk = 559,000
  8. (7) York: site:york.ac.uk = 533,000
  9. (8) Durham: site:dur.ac.uk = 532,000
  10. (18) Newcastle: site:ncl.ac.uk = 526,000
  11. (12) Bristol: site:bris.ac.uk = 457,000
  12. (14) Nottingham: site:nottingham.ac.uk = 370,000
  13. (4) LSE: site:lse.ac.uk = 316,000
  14. (11) Bath: site:bath.ac.uk = 296,000
  15. (16) King's College: site:kcl.ac.uk = 279,000
  16. (10) Loughborough: site:lboro.ac.uk = 199,000
  17. (9) St Andrews: site:st-andrews.ac.uk = 175,000
  18. (3) Imperial College: site:imperial.ac.uk = 124,000
  19. (15) Royal Holloway: site:rhul.ac.uk = 91,100
  20. (19) SOAS: site:soas.ac.uk = 51,000
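
The second list is just the first one re-sorted by result count. A minimal sketch of that step in Python, assuming the counts are typed in by hand from the searches above; the names `results` and `rank_by_pages` are made up for illustration and nothing here queries Google:

```python
# Result counts copied by hand from the site: searches above (March 2005).
# Entries are in Times ranking order, so position + 1 is the Times rank.
results = {
    "Oxford": 1_040_000,
    "Cambridge": 1_430_000,
    "Imperial College": 124_000,
    "LSE": 316_000,
    "Warwick": 990_000,
    "UCL": 611_000,
    "York": 533_000,
    "Durham": 532_000,
    "St Andrews": 175_000,
    "Loughborough": 199_000,
    "Bath": 296_000,
    "Bristol": 457_000,
    "Edinburgh": 682_000,
    "Nottingham": 370_000,
    "Royal Holloway": 91_100,
    "King's College": 279_000,
    "Manchester": 559_000,
    "Newcastle": 526_000,
    "SOAS": 51_000,
    "Birmingham": 566_000,
}


def rank_by_pages(counts):
    """Return (page_rank, times_rank, name, count) tuples, largest count first."""
    # dicts keep insertion order, so enumerate() recovers the Times rank
    times_rank = {name: i for i, name in enumerate(counts, start=1)}
    ordered = sorted(counts, key=counts.get, reverse=True)
    return [
        (i, times_rank[name], name, counts[name])
        for i, name in enumerate(ordered, start=1)
    ]


for page_rank, t_rank, name, count in rank_by_pages(results):
    print(f"{page_rank}. ({t_rank}) {name} = {count:,}")
```

Sorting in code rather than by eye makes it harder to misplace close neighbours such as York, Durham and Newcastle, which are within a few thousand results of each other.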

Is there a correlation? Not particularly; it is quite a mix. To make this a more accurate study you'd need to take a lot of other factors into account (which I don't have the time to do), but the numbers are interesting all the same.
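
One rough way to put a number on "not particularly" is Spearman's rank correlation between the Times rank and the rank by result count. The sketch below is still unscientific: it assumes the hand-copied counts from the lists above, uses the simple no-ties formula (all twenty counts are distinct), and the names `data` and `spearman_rho` are made up for illustration.

```python
# (name, Times rank, Google result count), copied by hand from the lists above.
data = [
    ("Oxford", 1, 1_040_000), ("Cambridge", 2, 1_430_000),
    ("Imperial College", 3, 124_000), ("LSE", 4, 316_000),
    ("Warwick", 5, 990_000), ("UCL", 6, 611_000),
    ("York", 7, 533_000), ("Durham", 8, 532_000),
    ("St Andrews", 9, 175_000), ("Loughborough", 10, 199_000),
    ("Bath", 11, 296_000), ("Bristol", 12, 457_000),
    ("Edinburgh", 13, 682_000), ("Nottingham", 14, 370_000),
    ("Royal Holloway", 15, 91_100), ("King's College", 16, 279_000),
    ("Manchester", 17, 559_000), ("Newcastle", 18, 526_000),
    ("SOAS", 19, 51_000), ("Birmingham", 20, 566_000),
]


def spearman_rho(pairs):
    """Spearman's rho for a list of (rank_a, rank_b) pairs with no ties."""
    n = len(pairs)
    d_squared = sum((a - b) ** 2 for a, b in pairs)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Rank by result count, 1 = most pages, then pair it with the Times rank.
by_pages = sorted(data, key=lambda row: row[2], reverse=True)
pairs = [
    (times_rank, page_rank)
    for page_rank, (_name, times_rank, _count) in enumerate(by_pages, start=1)
]

rho = spearman_rho(pairs)
print(f"Spearman rho = {rho:.2f}")
```

With these counts rho comes out at roughly 0.33, where 1 would mean the two rankings agree exactly and 0 would mean no relationship at all. That is a weak positive link at best, and with only 20 universities it should not be read as more than that.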

There seems to be a real gap between the top 3 and everyone else. I'd love to know why these numbers are so huge (personally I can't believe them; I think Google is lying), but if the ratios are correct then they are still useful. I know we have SiteBuilder and Warwick Blogs to thank for a huge proportion of our pages, but I know we don't have a million pages.


- 5 comments by 3 or more people

  1. Mavan

    How do you know that? Presumably your search catches everyone's private pages as well?

    29 Mar 2005, 12:23

  2. Mathew Mannion

    I think we probably do have a million pages with all the different blog pages (and dates etc)

    29 Mar 2005, 12:46

  3. John Dale

    • 49,900 pages on www.warwick.ac.uk
    • 120,000 pages on www2.warwick.ac.uk
    • 280,000 pages on blogs.warwick.ac.uk (though this is indexing the same content many times; we should use the robots metatag to fix this)
    • 18,800 pages on forums.warwick.ac.uk
    • 39,400 pages on dcs.warwick.ac.uk
    • 28,000 pages on maths.warwick.ac.uk
    • 8,800 pages on eng.warwick.ac.uk

    There must be more servers I've forgotten or don't know about. I can easily believe that the total approaches a million pages.

    29 Mar 2005, 13:00

  4. Huh, I'd not bothered to add it up like that, I guess it really is possible. Blimey.

    As for how we know… this is just a Google search and would not include private pages; we could well have tens of thousands more of those.

    29 Mar 2005, 13:38

  5. tadalafil

    A search like that will not give the actual results, as Google doesn't index all the pages all the time. You might try Yahoo as well to do a comparison. Even though Yahoo sucks, you will get an idea.

    25 May 2005, 22:43

