November 30, 2006
Mapping the Blogosphere in America
This presentation was largely about finding geographical information related to blogs and the topics they cover. The paper that was discussed is the beginning of a long-term research plan to investigate localized attitudes, political agendas, urban mentalities and other such social information in American cities. In order to do this it is necessary to extract geographical information about blogs to see where the authors are based. So how can this be done? Some methods were presented to the class:
- The registrant’s address may be listed in the hosting domain’s registry.
- The location of the owner may be given in their profile, which can be created along with a blog.
- The owner may have registered their blog with a local host or website. For example, www.nycblogger.com is a blog host for people in the city of New York.
- The owner of the blog may have published a link to their CV or biography which contains their address.
- There may also be links given to local weather information, schools or other geographically relevant sites.
It was mentioned that the authors of the paper are working on automatic methods for extracting geographical data from blogs. The methods the algorithm uses are as follows:
- Find GeoURL metadata if it exists – This is when data about the location has been embedded into the HTML of a site but is not visible.
- Whois query
- Profile Information – As mentioned above, this involves checking the registrant’s profile for geographical information.
- Blog Chalking – This is a way of categorising blogs based on interest, regional information and other such things. It is also done to register blogs with major search engines. Described as a ‘tattoo for your blog’, it can be used to find the geographical location of a blog owner.
- Text on index page – The text on the index page may contain references to local areas and landmarks.
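
The first method in the list can be sketched in a few lines of Python. This is only an illustration, not the algorithm from the paper: it assumes the location is embedded as an `ICBM` or `geo.position` meta tag (the two forms commonly associated with GeoURL), and the names `GeoMetaParser` and `find_geo_metadata` are made up for this example.

```python
from html.parser import HTMLParser


class GeoMetaParser(HTMLParser):
    """Collect latitude/longitude from ICBM or geo.position meta tags."""

    def __init__(self):
        super().__init__()
        self.position = None  # (lat, lon) once a usable tag is found

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.position is not None:
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content") or ""
        if name not in ("icbm", "geo.position"):
            return
        # ICBM separates the values with a comma, geo.position with a semicolon.
        parts = content.replace(";", ",").split(",")
        if len(parts) != 2:
            return
        try:
            self.position = (float(parts[0]), float(parts[1]))
        except ValueError:
            pass  # malformed coordinates: leave position unset


def find_geo_metadata(html):
    """Return (lat, lon) from the page's hidden geo metadata, or None."""
    parser = GeoMetaParser()
    parser.feed(html)
    return parser.position
```

If this returns None, the blog has no such metadata and one of the other methods (Whois, profile, index page text) would have to be tried instead.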

The next issue raised was that of standardizing geolocation data. The information gathered from blogs can vary drastically in precision, with some locations given down to a 9-digit zip code and others as just a city name. The proposal is to use the first 3 digits of the zip code to identify geolocations. Zip codes work in the following way: the first digit represents a general area of the US, with 0 being the north-east and 9 being the west. Subsequent digits narrow the area down to the nearest local post office. The first three digits identify either a metropolitan area such as Los Angeles or a cluster of small towns and villages.
The results obtained show, as expected, that the number of bloggers rises with population and is higher in areas of high socio-economic status. The paper does not give a relationship between topics and areas, since this particular document is just a starting point for a long-term investigation.
So what are the limitations of the information found in this paper? Some were presented, as listed below.
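
As a rough sketch of what this standardization could look like, the Python below reduces 5-digit and 9-digit zip codes to their 3-digit prefix and counts bloggers per prefix. The function names `zip3` and `count_by_zip3` are invented for this example and the paper's own procedure may differ. A record that only gives a city name returns None here; it would need a separate city-to-zip lookup, which is not shown.

```python
import re
from collections import Counter


def zip3(raw):
    """Return the 3-digit zip prefix, or None if no zip code is recognised."""
    # Accepts "90210", "90210-1234" and "902101234".
    match = re.fullmatch(r"\s*(\d{5})(?:-?\d{4})?\s*", raw)
    if match is None:
        return None
    return match.group(1)[:3]


def count_by_zip3(records):
    """Count how many location records fall into each 3-digit area."""
    counts = Counter()
    for raw in records:
        prefix = zip3(raw)
        if prefix is not None:
            counts[prefix] += 1
    return counts
```

For example, `count_by_zip3(["90210", "90211-0001", "10001", "Boston"])` puts the first two records in area 902, the third in area 100, and skips the last.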
- People give inaccurate or false information in their profiles – This may not be deliberate. For example, I may say that I live in Birmingham when in fact I live about 20 minutes away.
- Using 3-digit codes overstates the number of bloggers in metropolitan areas.
- 3-digit codes group small towns into one unit although these areas may have no social cohesion whatsoever.
The suggestion is therefore made to divide the country into areas based on socio-economic profile. Thus, areas might be separated based on average household income.
The presentation then went on to discuss topics presented in the paper in more detail. This began with the notion of a Geographic Information System (GIS), which is a system for storing and displaying geographical information. The two issues associated with gathering information for a GIS were given as:
- Geoparsing – Identifying geographical information from text.
- Geocoding – Converting this information into geographical coordinates.
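
To make the difference between the two steps concrete, here is a toy Python sketch. It assumes a tiny hand-made gazetteer (`GAZETTEER`, with approximate coordinates for two cities) rather than a real GIS database, and the function names are made up for illustration. Real geoparsing also has to deal with ambiguous place names, which this ignores.

```python
import re

# Toy gazetteer for illustration only: place name -> (latitude, longitude).
GAZETTEER = {
    "new york": (40.7128, -74.0060),
    "los angeles": (34.0522, -118.2437),
}


def geoparse(text):
    """Geoparsing: return the gazetteer place names mentioned in the text."""
    lowered = text.lower()
    return [
        name
        for name in GAZETTEER
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    ]


def geocode(place):
    """Geocoding: turn a place name into coordinates, or None if unknown."""
    return GAZETTEER.get(place.lower())
```

Running `geoparse` on a blog's index page text and then `geocode` on each result gives the coordinates that a GIS could store and display.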
Some uses of finding the geolocations of bloggers were given to be:
- Finding sociological and political trends – Mapping the ‘buzz’ of what is going on in certain areas.
- For advertising – An example was given of a comparison between locations of a specific restaurant chain and locations of bloggers who mention that chain in their blogs. Thus the marketing department may wish to launch more advertising campaigns in those areas that are not discussing their restaurant.
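
The advertising example boils down to a comparison of two sets of areas. A minimal Python sketch, assuming both the restaurant locations and the bloggers who mention the chain have already been reduced to 3-digit zip areas (the function name and the sample values in the usage note are hypothetical):

```python
def quiet_areas(restaurant_areas, mention_areas):
    """Areas that have a restaurant but no blogger mentioning the chain."""
    return sorted(set(restaurant_areas) - set(mention_areas))
```

For instance, `quiet_areas(["100", "606", "902"], ["902", "941"])` returns the areas 100 and 606, which are where the marketing department might want to advertise more.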
Other mapping methods were then discussed, not necessarily in relation to geographical locations. One of these was the mapping of hyperlinks in hyperbolic space. This allows the investigation of outgoing and incoming links to websites and blogs via a visual representation.
Self-organising maps were also discussed. These maps, unlike those in hyperbolic space, are two-dimensional and are built from common links between blogs. This allows communities to be observed by looking at groups of blogs that link together.