November 29, 2006
A Short Walk in the Blogistan
The paper for discussion in presentation number 3 is quite general in its subject matter. The term ‘Blogistan’ simply refers to the entire collection of blogs in existence. The paper’s abstract states that it will explore three aspects of the Blogistan:
- Its overall scope and size
- Identification of hot topics of discussion and link patterns
- Implications for both blogs and applications such as search

Gathering Data
The authors of the paper used data from websites that rank the popularity of other sites. All URLs containing the word ‘blog’ were taken, and duplicates and obvious non-blog URLs were then removed. Blogs that had not changed within the preceding few weeks were also removed. The remaining blogs were fetched 5 times a day for one month.
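The selection steps above amount to a simple filtering pipeline. The Python sketch below is an illustration under stated assumptions, not the authors’ code: the `Candidate` record, the three-week cut-off standing in for “the preceding few weeks”, and the `NON_BLOG_HINTS` list used to flag obvious non-blog URLs are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Candidate:
    url: str
    last_changed: datetime  # when the page was last observed to change

# Hypothetical markers for "obvious non-blog URLs"; the paper's actual rules are not given here.
NON_BLOG_HINTS = ("blogger-help", "/about-blogs")

def select_blogs(candidates, now, max_age=timedelta(weeks=3)):
    """Return candidate URLs that look like live blogs, in first-seen order."""
    seen = set()
    selected = []
    for c in candidates:
        key = c.url.lower().rstrip("/")  # crude normalisation for duplicate detection
        if "blog" not in key:
            continue  # keep only URLs containing the word 'blog'
        if key in seen:
            continue  # duplicate of a URL already considered
        seen.add(key)
        if any(hint in key for hint in NON_BLOG_HINTS):
            continue  # obvious non-blog URL (hypothetical heuristic)
        if now - c.last_changed > max_age:
            continue  # not changed within the preceding few weeks (assumed: 3)
        selected.append(c.url)
    return selected
```

Normalising URLs by lower-casing and dropping a trailing slash is just one simple choice for the duplicate check; a real crawler would probably need more thorough canonicalisation.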
In addition to the main body of information contained in the blogs, metadata was also gathered. In the second phase of URL gathering, unlike the first, links were also extracted from the blogs to see how blog collections differ from web sites.
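Link extraction from a fetched blog page can be sketched with Python’s standard `html.parser` module. This is only an illustration: the paper does not say how its links were extracted, and `LinkExtractor` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <A HREF=...> is matched too.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```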
The paper uses the terms ‘new URL’ and ‘old URL’. A ‘new URL’ is one that is referenced in any blog under examination at least 24 hours past the start date of data gathering. Other URLs are deemed to be old. Only new URLs were considered for emerging interests. Also, only URLs deemed to be ‘interesting’ were extracted. An ‘interesting URL’ is one that has a relatively large number of references. The number of references to a URL is called multiplicity.
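The definitions of new URLs and multiplicity can be made concrete with a short sketch. It assumes that a URL counts as ‘new’ when its first reference appears at least 24 hours after gathering began, and it uses a hypothetical `min_multiplicity` threshold for ‘interesting’, since no specific threshold is given here. `interesting_new_urls` is an illustrative name.

```python
from collections import Counter
from datetime import datetime, timedelta

def interesting_new_urls(references, start: datetime, min_multiplicity=3):
    """references: iterable of (timestamp, url) pairs, one per reference found in a blog.

    Returns {url: multiplicity} for URLs that are both 'new' (first referenced at
    least 24 hours after start, our reading of the definition) and 'interesting'
    (multiplicity at or above a hypothetical threshold).
    """
    cutoff = start + timedelta(hours=24)
    first_seen = {}
    multiplicity = Counter()  # multiplicity = number of references to a URL
    for ts, url in references:
        multiplicity[url] += 1
        if url not in first_seen or ts < first_seen[url]:
            first_seen[url] = ts
    return {
        url: count
        for url, count in multiplicity.items()
        if first_seen[url] >= cutoff and count >= min_multiplicity
    }
```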
A large part of the paper is devoted to data gathering without drawing any conclusions. The authors seem concerned with which data should be considered relevant within the Blogistan; hence they discuss examining only URLs with high multiplicity, along with other such restrictions.

Inferences
Of the blogs examined in the first phase, 33.5% had not been updated in 2 months, perhaps suggesting that a fairly large fraction of current blogs have actually died.
Unlike web sites, millions of blogs are distributed across relatively few hosting domains. The data gathered showed around 180,000 domains, but, because of aliasing, only 11,870 IP addresses are associated with them. This is a surprisingly small number considering the huge number of active blogs.
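The aliasing effect can be measured by inverting a domain-to-IP mapping. The sketch below assumes each domain resolves to a single address and uses made-up sample data with reserved documentation addresses (192.0.2.x); in practice the mapping would come from DNS lookups. With the figures above, 180,000 domains over 11,870 addresses works out to roughly 15 domains per address on average.

```python
from collections import defaultdict

def domains_per_ip(domain_to_ip):
    """Group blog domains by the IP address they resolve to (one address per domain assumed)."""
    by_ip = defaultdict(set)
    for domain, ip in domain_to_ip.items():
        by_ip[ip].add(domain)
    return dict(by_ip)

def mean_domains_per_ip(domain_to_ip):
    """Average number of domains sharing each distinct IP address."""
    return len(domain_to_ip) / len(set(domain_to_ip.values()))

# Hypothetical sample using reserved documentation addresses (192.0.2.0/24).
SAMPLE = {
    "alice.blogs.example.com": "192.0.2.10",
    "bob.blogs.example.com": "192.0.2.10",
    "carol.example.net": "192.0.2.20",
}
```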
It is noted that popular websites have more references than blogs, blogs have more references than less popular sites, and blogs have more self-references than websites, which is perhaps unsurprising.
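Counting self-references requires deciding what ‘self’ means. A minimal sketch, assuming a link is a self-reference when it points to the same host name as the page it appears on (the paper’s exact rule is not stated here); `self_reference_share` is a hypothetical name.

```python
from urllib.parse import urlsplit

def self_reference_share(page_url, links):
    """Fraction of a page's links whose host matches the page's own host."""
    if not links:
        return 0.0
    host = urlsplit(page_url).hostname  # hostname is lower-cased by urlsplit
    own = sum(1 for link in links if urlsplit(link).hostname == host)
    return own / len(links)
```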
A discussion is given of the server issues that may arise with blogs. The importance of HTTP Range requests is emphasised: the Range header lets a client request just a portion of a resource, so that only new data need be retrieved. Range requests should therefore reduce the traffic associated with popular blogs. However, the data collected showed that only about 40% of blog servers are able to handle range requests.
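A crawler that already holds the first N bytes of a blog page can ask for the rest with a header of the form `Range: bytes=N-`. The sketch below, using Python’s `urllib.request`, is an illustration rather than the paper’s crawler, and the helper names are hypothetical. It also covers servers that ignore the header and return the whole page with status 200 instead of 206 (Partial Content), and it assumes the page only ever grows at the end.

```python
import urllib.request

def build_range_request(url, bytes_already_fetched):
    """Ask only for the bytes after those already held (HTTP/1.1 'bytes=N-' form)."""
    return urllib.request.Request(url, headers={"Range": f"bytes={bytes_already_fetched}-"})

def new_bytes(status, body, bytes_already_fetched):
    """Extract the new portion of a response to a Range request.

    206 Partial Content: the server honoured the range, so the body is just the new data.
    200 OK: the server ignored Range and resent everything, so drop what we already have.
    """
    if status == 206:
        return body
    if status == 200:
        return body[bytes_already_fetched:]
    raise ValueError(f"unexpected status {status}")

# Example use (performs a network request, so not run here):
# with urllib.request.urlopen(build_range_request(url, len(cached))) as resp:
#     cached += new_bytes(resp.status, resp.read(), len(cached))
```

If nothing has been added since the last fetch, a server may answer 416 (Range Not Satisfiable), which `urlopen` raises as an `HTTPError`.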
The presenters then discussed the spam problems affecting blogs. In addition to blogs themselves being victims of spam, a phenomenon known as ‘splogs’ (spam blogs) has also emerged. These are fake blog pages filled with arbitrarily generated content. However, systems are emerging that can detect splogs up to around 90% of the time.
Although the paper itself does not seem to mention this, the presentation introduced the idea of three types of key blogger: summarizers, agitators and topic finders. Summarizers link to lots of other blogs and web sites. Agitators are those who cause drastic changes in the topics within a thread. A definition of topic finders is hard to locate, but they are presumably those who post based entirely on certain topics of interest.
This paper is quite hard to grasp because of its large number of data references. If anyone has anything to add, please let me know; this one is quite hard to summarize.
A Matter of Life and Death – Modelling Blog Mortality
This presentation, which took place on Friday November 24th, concerned the reasons behind the death of blogs. Definitions of “death” in this context seem to vary, but the term generally refers to a blog that has received its last entry and is now dormant, or that has been removed by the provider. First, the reasons for starting a blog were examined:
- Creative Expression – Some people may start a blog to display their poetry, art or other such emotive creations.
- Journal – A personal record of someone’s experiences. This may be entirely private, which is an option given for each blog entry made.
- Communication Between Friends and Family – In this sense, a blog may be used as a private forum where family photos may be shared for example.
- Make Money – A blog may be used to display product information and act as a retail tool.
- Meet New People – Because blogs can be interconnected (via blogrolls, for example) and have commenting systems, it is easy to meet new people and discuss topics with them. On Warwick blogs, communities of people have clearly developed purely around this system.
- Income – Blogs can actually be used to generate income, for example through banner ads.

The presenters mentioned LiveJournal as a source for blog data.
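Within this paper, an expression is given for the number of active blogs, x, on any given day, using the following quantities:
- t is the number of active blogs the day before.
- m is the probability that a blog does not survive the night.
- d = 1 - m is the probability that a blog survives the night.
- n is the number of new blogs created on the given day.
Of the t blogs active the day before, on average dt survive the night, and n new blogs are added. Therefore:
x = dt + n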
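The recurrence can be iterated directly to see how the expected number of active blogs evolves. The sketch below assumes m and n stay constant from day to day, a simplification made here for illustration; `simulate_active_blogs` is a hypothetical name. Under that assumption the count settles where x = t, so x = (1 - m)x + n, giving x = n/m; for example, with m = 0.1 and n = 100 the number of active blogs tends towards 1,000.

```python
def simulate_active_blogs(initial, mortality, new_per_day, days):
    """Iterate x = d*t + n with d = 1 - m, returning the expected count after each day.

    Assumes m (mortality) and n (new_per_day) are constant over time.
    """
    survival = 1.0 - mortality  # d = 1 - m
    active = initial            # t: active blogs on the previous day
    history = []
    for _ in range(days):
        active = survival * active + new_per_day  # x = d*t + n
        history.append(active)
    return history
```

For instance, `simulate_active_blogs(0, 0.5, 10, 3)` gives `[10.0, 15.0, 17.5]`, creeping towards the steady state n/m = 20.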
It was noted by the presenters that the paper is based on a lot of generalisations and that the accuracy of any conclusions is questionable.
The next topic presented was blog deaths and the reasons behind them. The examples given were:
- Lack of time to maintain – The blog owner has too much work to do (perhaps after starting a new job), or family life takes over, for example after the birth of a baby.
- Lack of results – No one reads the blog or comments on entries, so the owner sees no point in carrying on.
- Writer’s block – The blog owner runs out of things to write about and decides to give up.
- Rhythm break – If a writer has a certain rhythm to his posts, perhaps because he always posts at a certain time of day, his readers will be accustomed to checking for new posts after that time. If the posting rhythm is interrupted, the disruption may lead to fewer visitors, and as a result the owner may decide to stop.
- Thrill is gone – Some blogs are started for the novelty value and die once this wears off.
- Change of interest – Perhaps the owner becomes interested in another topic which does not relate to that of his blog. This change of interest may lead to the abandonment of the current blog.
- Unpleasant comments – As illustrated in the presentation, some bloggers get abused and spammed via their blogs and may wish to stop as a result.
- Other reasons – Priorities changing, people moving to other mediums, the loss of a username and password etc. This blog will probably die quite quickly after the presentations are over.