# All entries for July 2009

## July 23, 2009

### CBD Rules

And there I was, on the 21st of July, relaxing, enjoying my summer, having absolutely no plans for the next few days... when suddenly it dawned on me. Tomorrow was the 22nd of July, 2009, also written as 22.07.09. My Co-Birthdate.

Now, I only have 16 CBDs left (including this one, and assuming I don't live to a hundred), so the plan was to make something special out of every one of them. Furthermore, the 22nd of July is Pi Approximation Day, because 22/7 is approximately equal to the mathematical constant π. Unfortunately, however, one day was not enough to organise something big, most people I know had gone on holiday, and the nearest supermarket would be closed the following day. Another CBD wasted. I ended up having a rather nice day: I went cycling, I bought a downloadable game called 'World of Goo', and in the evening I went to the cinema and saw 'Harry Potter and the Half-Blood Prince'. But it was a pathetically uneventful day compared to how epic it should have been.

Musings of my 12th CBD:

- World of Goo is a nice game. I have a feeling it's going to take too little time to finish, but very long to reach 100% completion. It's basically a puzzle game in which the goal is to use balls of goo to construct structures in order to reach a certain pipe that sucks in the remaining goo balls. The gameplay is surprisingly varied and creative; the humour is excellent. The game was recommended to me by a friend. Thank you, friend.

- The 6th Harry Potter movie is in my opinion the worst of the Harry Potter movies so far. It's not an unpleasant experience; the others were just better. The special effects are, as always, amazing, but the story is sloppily presented. Then again, the 6th book was not my favourite. And besides, it's almost not a question of the movie being good or not: if you've read the entire series and seen all the movies so far, you don't have much choice but to go and see it.

- I don't know what you do on a CBD. I also don't know what you do on Pi Approximation Day, but given that you eat pies on Pi Day, I suppose you eat cakes (after all, cakes are good approximations to pies, no?). As I am probably the only person in the world celebrating CBDs, it must be up to me to determine how a true Co-Birthdate must be held. So here goes...

*On a CBD, the following rules must be obeyed:*

*#1: A present must be bought, by me, to me.
#2: Friends must be invited.
#3: A cake must be bought or made.
#4: A film must be watched.
#5: Cider must be drunk.
#6: Sleep must not be fallen into before 3am.
#7: The passive tense must be used excessively.*

There we go. Now all I have to do is wait until the 29th July, 2020. I'll make sure to have something prepared for next time.

## July 17, 2009

### Why is no-one watching North Korea?

I'm not usually very fussed about what's happening in the world, and I never thought I'd be writing about global news on my blog. But something has been bothering me.

Why does no-one seem to worry about North Korea?

Let's recapitulate:

- 29th January: North Korea scraps all political and military agreements it has with South Korea

- 4th April: North Korea successfully launches a long range rocket

- 25th May: North Korea conducts its second nuclear test (the first one being in 2006). This is also the first nuclear test to be confirmed by other nations.

- 26th May: North Korea fires two short range missiles.

- 27th May: North Korea withdraws from the armistice that ended the Korean War

- 15th June: North Korea announces that if it is provoked by the US or its allies, it will respond with a "thousand-fold" military retaliation.

I don't think it takes a mathematician to detect a pattern here. The question is: what's next? If I were a journalist, I think I'd be writing tonnes of articles about the possible outcomes of this crisis, as I think it ought to be called. This is seriously frightening stuff. The first thing I check when I open the BBC's news website is the Asia-Pacific section. However, everyone around me seems to be discussing all kinds of other things, such as the Tour de France or Michael Jackson. I fail to understand.

And while we're talking news, let me just insert a few words about the financial crisis. First of all, the worst isn't over yet. I don't know where people get that from, but it's BS. Secondly, and more importantly, stop calling it a "recession". You know what it is.

## July 06, 2009

### The Szekeres Chronicles: The Long Tail

Follow-up to The Szekeres Chronicles: Search Behaviour from The Missing N

So, what were we talking about last time? Oh yes, search behaviour. In the last post, we saw that one- and two-word searches are becoming less frequent, while longer searches are "gaining popularity". This is all very good, but why is this so important to some people, who seem to be collecting this data, plotting it, making tables about it, and performing all kinds of other numerical acrobatics? Well guess what, it turns out there's money involved. Imagine you own a website that somehow tries to make a profit, maybe by selling things. Then you want to attract as many visitors as possible - since more visitors presumably means greater profit - but this is not always easy, given that a lot of people discover your site simply by stumbling upon it after searching for something or other on a search engine. By using programs like Google Analytics, however, you can get an idea of which keywords bring the most traffic to your site, something which can give you crucial insight into how to adapt your site to attract even more potential customers. Plus, there's something with publicity and advertisement involved, if you link your site to certain keywords or something. Go figure.

If we then plot each keyword against the number of "hits" that keyword gets (i.e. the number of visitors who visited your site after searching for that keyword), we might get something like this:

What this means is that there are a few keywords that attract most of the visitors (dark green), while the majority of keywords give very few hits each (light green). This kind of distribution, known as the "Pareto Distribution", pops up in all kinds of places - for example, the money made by an airline company, the wealth of a country's population, or the size of meteorites. Which is why I haven't labeled the axes.

Closely linked to the Pareto distribution is something called the "Pareto Principle" (aka the "80-20 Rule" aka the "law of the vital few"). It states that "roughly 80% of the effects come from 20% of the causes", as a rule of thumb. For instance, in most countries, about 20% of the population owns 80% of the country's wealth, and the same is true for the world population. In most businesses, 20% of the clients account for 80% of the sales. Microsoft is said to have noticed that usually, 20% of all the bugs cause 80% of all the problems. And one could easily imagine other applications: maybe 20% of the videos on YouTube get 80% of all the views, and maybe 20% of your actions cause 80% of your carbon footprint.
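For the curious, here is a minimal Python sketch of how the 80-20 split falls out of a Pareto distribution. It leans on a standard property that is not derived in this post: for a Pareto distribution with shape parameter `alpha` greater than 1, the top fraction `p` of the population holds a share `p ** (1 - 1/alpha)` of the total. The function name `top_share` and the constant `ALPHA_80_20` are my own, chosen purely for illustration.

```python
import math


def top_share(p: float, alpha: float) -> float:
    """Share of the total held by the top fraction p under a Pareto
    distribution with shape alpha > 1 (standard Lorenz-curve result)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for the total to be finite")
    return p ** (1 - 1 / alpha)


# Assumed shape parameter: alpha = log(5) / log(4) is the value for which
# the formula gives the 80-20 rule exactly.
ALPHA_80_20 = math.log(5) / math.log(4)

share = top_share(0.2, ALPHA_80_20)
print(f"Top 20% hold {share:.0%} of the total")
```

A larger `alpha` means a thinner tail and a less extreme split; with `alpha = 2`, for instance, the top 20% would hold only about 45% of the total.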

In our case, we can assume that the 20% most popular keywords generate 80% of the traffic to our fictional site. And this is where our discovery from the previous post gets interesting. See, if more and more people are starting to use longer keywords, then the one- and two-word searches that used to generate the bulk of all incoming traffic are no longer as important. Graphically, what is happening is that the head is getting smaller, while the tail is becoming bigger and longer (see above graph). And your website will probably have to adapt to this change; if it does, it may even benefit from it.

This change is not unique to this particular situation; it is also a general business phenomenon that has appropriately been labeled "The Long Tail". I said earlier that in most businesses, 20% of the clients account for 80% of the sales; likewise, it has traditionally been true that a few popular items have accounted for most of the revenue. It turns out that this is slowly changing, especially in the online world: the focus is slowly shifting from the head to the tail. This is why Amazon, eBay, online bookstores and other such retailers look the way they do. Popular items are still being sold, but the market is now flooded with so-called "non-hit items", items that are not very popular and that you wouldn't find in a traditional you-have-to-leave-your-house-to-get-there kind of shop. And a considerable number of sales are being made from this long tail, so this is a new business model that seems to work.

Chris Anderson, who coined the term "The Long Tail", has written a few books on this topic. They are, naturally, available on Amazon.

Also, if my posts have whetted anyone's appetite for web analytics, a certain Avinash Kaushik maintains a blog named Occam's Razor, in which he writes about this subject in a clear and engaging way.

Finally, all this has reminded me of a short anecdote of mine: During a Quantitative Economics seminar (a dozen students in a small classroom), our teacher was telling us about Complementary Goods. "Perfect complementary goods are goods that have to be consumed together. They aren't worth anything alone; good A is nothing without good B and vice versa. A classic example is left shoe and right shoe - you need both of them together. What do you do if you have a right shoe only?"

The question was of course rhetorical, but I couldn't help myself, and said: "You sell it on eBay..."

## July 02, 2009

### Covered Clocks

The Warwick Arts Centre has these four clocks:

(Picture taken from Dilip Mutum's blog)

My initial reaction upon seeing them for the first time was a feeling of bemusement. The function of a clock is to tell the time, so what is the point in hiding half of it? Also, why are the four clocks clustered in one place, rather than spread out nicely in the entire Arts Centre? I eventually realised: it is a work of art. And a clever one, in my opinion. One never needs the entire disc of the clock to be visible in order to tell the time; it is enough to know the position of the two hands. And by placing the clocks together in this fashion, the "artist" made sure that most of the time, it would be enough to look at one clock.

"Most of the time" is the key phrase here. A little thought reveals that sometimes, no single clock shows both hands. If one is only concerned with the hour and the minutes, it is also obvious that it suffices to look at two clocks to see both hands. This leads to the question: *how often does one have to look at two clocks to know the time?*

As a mathematician, I felt it was my duty to solve this. It turns out that the probability that two clocks are needed to tell the time is 0.25. The following argument should make it clear:

A clock is divided into four quarters, or "zones". On every clock, two of these are visible, and two are hidden. By "adjacent zones" I mean zones that are next to each other; by "opposite zones" I mean zones that are not.

For any given hour, the hour-hand is in one of the four zones. If the minute-hand is in the same zone, or in one of the two adjacent zones, there will be a clock showing both hands. Only if the minute-hand is in the opposite zone will no clock show both hands. So during every hour, there will be a 15-minute period (when the minute-hand is in the "opposite" zone) during which one has to look at two different clocks. It follows that 1/4 of the time, two clocks are needed. Hence the probability of 0.25.
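The argument above can also be checked by brute force. The Python sketch below steps through every whole minute of a 12-hour cycle and counts the minutes in which the two hands sit in opposite zones. It assumes, as the argument does, that the four clocks between them reveal every pair of adjacent zones; the zone numbering (0 to 3, clockwise from 12 o'clock) and the function names are mine, for illustration only.

```python
from fractions import Fraction

MINUTES_PER_CYCLE = 12 * 60  # the hands repeat every 12 hours


def hour_zone(t: int) -> int:
    """Quarter (0-3) containing the hour-hand, t minutes after 12:00."""
    return (t % MINUTES_PER_CYCLE) // 180  # each quarter spans 3 hours


def minute_zone(t: int) -> int:
    """Quarter (0-3) containing the minute-hand, t minutes after 12:00."""
    return (t % 60) // 15  # each quarter spans 15 minutes


def needs_two_clocks(t: int) -> bool:
    """True when the two hands sit in opposite quarters."""
    return (hour_zone(t) - minute_zone(t)) % 4 == 2


# Count the whole minutes in which no single clock shows both hands.
count = sum(needs_two_clocks(t) for t in range(MINUTES_PER_CYCLE))
probability = Fraction(count, MINUTES_PER_CYCLE)
print(probability, float(probability))
```

Sampling at whole minutes sidesteps most of the boundary cases, and the integer division simply assigns a hand sitting exactly on a boundary to the zone it is about to enter.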

We have made a few implicit assumptions along the way:

- One can only tell the time by looking at the long ends of the two hands. Seeing a bit of the short end of the hour-hand on a clock on which the hour-hand is hidden by the black area is cheating.

- A hand is never in two zones simultaneously, e.g. a hand cannot be in zone A and zone B at the same time. It will always tend a little to one side. Strictly speaking, this isn't totally true; when the time is, for example, **exactly** 3 o'clock, the hour-hand will be in zone A and B, while the minute hand will be in zone A and D. However, just one second later, the hour-hand is technically in zone A, and the minute-hand in zone B. My point is, the probabilities of the hands being in two zones are so small (1/3600 for the hour-hand and 1/60 for the minute hand) that we can neglect them. Also, if we view time as something continuous, these probabilities do in fact become zero, but I won't go into that, since it would only serve to confuse everyone.

- The time on the clocks is uniformly random. In other words, we are equally likely to view the clocks at any time of the day. This isn't realistic. The probability that we are looking at these clocks in the Arts Centre at 3:14 in the middle of the night, is close to nil. But heck, we're setting up an abstract model to figure out this probability, and assuming the time is uniformly random is the most sensible thing to do.

- The hand showing the seconds isn't needed to figure out what the time is. This assumption is reasonable, and it makes things easier. But it isn't necessary. If one includes the second-hand, we get a new situation which can also be analysed. In this case, the probability that one has to look at (i) one clock, is 0.4375, (ii) two clocks, is 0.5625. Can anyone show this?
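It is not a proof, but the three-hand figures can at least be checked numerically. The Python sketch below steps through every second of a 12-hour cycle and tests whether all three hands fit inside some pair of adjacent zones. It again assumes that the four clocks between them reveal every pair of adjacent zones; the zone numbering and function names are hypothetical, chosen for illustration.

```python
from fractions import Fraction

SECONDS_PER_CYCLE = 12 * 60 * 60  # 43,200 seconds in 12 hours


def zones(t: int) -> set[int]:
    """Quarters (0-3) occupied by the hour-, minute- and second-hand,
    t seconds after 12:00:00."""
    hour = t // 10_800            # 3 hours per quarter
    minute = (t % 3_600) // 900   # 15 minutes per quarter
    second = (t % 60) // 15       # 15 seconds per quarter
    return {hour, minute, second}


def one_clock_suffices(t: int) -> bool:
    """True if all hands fit inside some pair of adjacent quarters."""
    occupied = zones(t)
    return any(occupied <= {k, (k + 1) % 4} for k in range(4))


one = sum(one_clock_suffices(t) for t in range(SECONDS_PER_CYCLE))
p_one = Fraction(one, SECONDS_PER_CYCLE)
# Two clocks always suffice: the clocks showing quarters {0, 1} and {2, 3}
# together cover the whole dial, so the remaining cases need exactly two.
p_two = 1 - p_one
print(float(p_one), float(p_two))
```

The same count can be done by hand: of the 4 × 4 × 4 = 64 equally likely zone combinations, 4 have all hands in one zone and 4 × 6 = 24 have them spread over exactly two adjacent zones, giving 28/64 = 0.4375.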

But then, this year, something happened. The names of four important cities (New York, Moscow, Beijing and Coventry) were painted underneath the clocks, and the time on each clock was then adjusted accordingly.

I'm sure the individual who made this happen felt that he or she had just shown the signs of a pure genius, and turned 3 redundant clocks into something more practical and business-like. In reality, he or she simply messed up [EDIT: I was wrong about this; see Sarah's comment for more details]. Before the names were painted, telling time was never too problematic, as we've seen. Now, if the hour-hand is not visible in the Coventry clock, one must know the time difference between Coventry and, say, Beijing. Even worse, there are times when the hour-hand is visible only on a single clock. Indeed, the visibility of the hour-hand goes as follows:

12.00-3.00: Visible only on the Beijing clock

3.00-5.00: Beijing, Coventry

5.00-6.00: New York, Coventry

6.00-9.00: Moscow, New York, Coventry

9.00-11.00: Moscow, New York

11.00-12.00: Beijing, Moscow

(This is during summer time, for the record)

This new layout leads to some new interesting questions:

- What is the new probability that one has to look at two clocks in order to see both hour-hand and minute-hand? What if we include the second-hand?
- If time zones are chosen at random, what is the probability that, sometime during the day, no clocks show the hour-hand? What is the probability that one has to look at two clocks now?
- If each clock is set at random to **any** time of the day (so that minutes and seconds don't necessarily match each other on the four clocks), what is the probability that one has to look at two clocks? What if we include the second-hand?
- What if, instead of three hands, we have four? To generalise even more, what happens with N hands?

I leave these as an exercise for the reader.

***

Someone must have realised that people were having trouble deriving the actual time from the four clocks. So, as you can see above, each clock was set back to English time. All the clocks say 15:54 in the picture. The funny part is that they have to leave the names of the cities, unless they want to repaint the entire wall white. As a consequence, most people (or at least the ones I've asked) haven't noticed that the clocks are all showing the same time.

I guess nowadays most people would just take their mobile phones out to figure out what time it is. I'm old-fashioned in the sense that I still wear a wristwatch.