## November 28, 2007

### Missing Data on 25 Million People

Since I first heard about it, I've been wondering just how much data on 25,000,000 people can fit on two "discs". I don't know for certain what type of "disc" they are, but for the purposes of the maths I'm going to assume they were single-layer DVDs.

Taking the capacity of a DVD as 4.7GB (counted here as 4.7 × 1024³ bytes), two discs work out at about 403 bytes per person. This might not sound much, but for reference the following block of text (labels included) is only 224 bytes:

Name: The University of Warwick
Address: University of Warwick, Coventry, CV4 7AL
Tel: +44 (0)24 7652 3523
Fax: +44 (0)24 7646 1606
Established: 1964
Dummy bank details: XXXXXXXX XX-XX-XX
URL: http://www.warwick.ac.uk/

So the amount of data on each person could be a little under double that.

However, if the Government isn't keeping up with the times, then two CDs at, let's say, 700MB each would give only 58 bytes per person, which is about this much:

The University of Warwick, Coventry, CV4 7AL 024 7652 3523

That's assuming the data wasn't compressed, of course. Since it wasn't even encrypted, that doesn't seem an unreasonable assumption.
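As a rough check on the arithmetic, here is a small Python sketch of the bytes-per-person sums. The function and constant names are made up for illustration, and the disc sizes are the assumed ones from this post, not confirmed facts about the missing discs. One thing it makes visible: 403 only comes out if 4.7GB is read as 4.7 × 1024³ bytes, and with the decimal 4.7 × 10⁹ bytes usually quoted for DVDs the figure drops to 376.

```python
# Hypothetical helper: how many whole bytes per person fit on a set of discs?
PEOPLE = 25_000_000
DISCS = 2


def bytes_per_person(disc_bytes, discs=DISCS, people=PEOPLE):
    """Whole bytes available per person, rounding down."""
    return disc_bytes * discs // people


# Assumed capacities (integer arithmetic to avoid float rounding).
DVD_BINARY = 47 * 1024**3 // 10   # 4.7 x 1024^3 bytes
DVD_DECIMAL = 4_700_000_000       # 4.7 x 10^9 bytes
CD_BINARY = 700 * 1024**2         # 700 x 1024^2 bytes

dvd_binary_share = bytes_per_person(DVD_BINARY)
dvd_decimal_share = bytes_per_person(DVD_DECIMAL)
cd_share = bytes_per_person(CD_BINARY)

print("DVD, binary GB: ", dvd_binary_share)   # 403
print("DVD, decimal GB:", dvd_decimal_share)  # 376
print("CD, 700MB:      ", cd_share)           # 58
```

Either way the conclusion doesn't change much: a few hundred bytes each on DVDs, a few dozen on CDs.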

The second block (including spaces) should be exactly 58 bytes. Since I ran out of ideas for real data for the first block, here's a string of Xs and spaces that is exactly 403 bytes:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXX
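For anyone who wants to check the byte counts rather than take them on trust, a minimal sketch follows. It assumes plain ASCII text, and it assumes the line breaks in the first block are counted as two bytes each (CRLF), which is the counting that gives 224; with single-byte line feeds the same block comes to 218. The filler is rebuilt here as eight runs of 49 Xs plus a final three, which is one layout that totals 403.

```python
# The first sample block, one string per line (labels included).
lines = [
    "Name: The University of Warwick",
    "Address: University of Warwick, Coventry, CV4 7AL",
    "Tel: +44 (0)24 7652 3523",
    "Fax: +44 (0)24 7646 1606",
    "Established: 1964",
    "Dummy bank details: XXXXXXXX XX-XX-XX",
    "URL: http://www.warwick.ac.uk/",
]

# Assumption: line breaks are CRLF (two bytes) with no trailing break.
block_crlf = "\r\n".join(lines).encode("ascii")
block_lf = "\n".join(lines).encode("ascii")

# The one-line CD-sized sample.
short = "The University of Warwick, Coventry, CV4 7AL 024 7652 3523".encode("ascii")

# Assumed layout for the filler: 8 runs of 49 Xs, then 3 more, space-separated.
filler = " ".join(["X" * 49] * 8 + ["XXX"]).encode("ascii")

print(len(block_crlf))  # 224
print(len(block_lf))    # 218
print(len(short))       # 58
print(len(filler))      # 403
```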

So whatever details about you you can fit in that number of characters (403 including spaces, unsurprisingly) may have been misplaced by the Government in an easily-readable form. If they had any business rivals I'd consider taking my custom elsewhere...

### 4 comments by 1 or more people

1. #### Chris Doidge

From what I remember, they were saying they were ‘standard CDs’.

28 Nov 2007, 22:57

2. Fair enough. The BBC article I looked at only said “discs”, and I think Watchdog did the same when it mentioned it tonight. If it was CDs then that’s hardly any information at all; only name and address, surely. They could get that much from phone books…

28 Nov 2007, 23:02

3. You could quite easily fit a lot of information about a person into a very small space, if you use it efficiently.

AB123456C12345612345678123AB1C2DE01234567890

This is a (fake) NI number, a bank sort code, a bank account number, a house number with postcode, and a phone number. That's 44 characters (45 bytes with a line break), unencrypted, and when you put it in line with lots of others with similar code it can quickly make a lot of sense to somebody with the wrong intentions. Granted I don’t know this person’s name from this, but a lot of those I could plunder from the phone book. And before you know it, I know way too much about way too many people.

29 Nov 2007, 13:09

4. True. And if it was for/from a database, it would probably be separated by commas, which still leaves a few bytes spare and would be even easier to interpret by just reading.

29 Nov 2007, 17:50
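To illustrate the point made in comments 3 and 4, here is a rough Python sketch of how a fixed-width record like the fake one above could be split back into fields and rewritten with commas. The field names and widths are assumptions read off the description in comment 3 (9 + 6 + 8 + 3 + 7 + 11 characters); nothing here reflects the real layout of the missing data.

```python
# Assumed field layout for the fake record in comment 3: (name, width).
FIELDS = [
    ("ni_number", 9),
    ("sort_code", 6),
    ("account_number", 8),
    ("house_number", 3),
    ("postcode", 7),
    ("phone", 11),
]


def split_record(record):
    """Slice a fixed-width record into a dict of named fields."""
    expected = sum(width for _, width in FIELDS)
    if len(record) != expected:
        raise ValueError(f"expected {expected} characters, got {len(record)}")
    fields = {}
    pos = 0
    for name, width in FIELDS:
        fields[name] = record[pos:pos + width]
        pos += width
    return fields


record = "AB123456C12345612345678123AB1C2DE01234567890"
fields = split_record(record)

# The comma-separated form suggested in comment 4: five commas on top of the data.
csv_line = ",".join(fields.values())

print(fields)
print(csv_line, len(csv_line))
```

With these assumed widths the comma-separated line is 49 characters, so it would still fit inside the 58 bytes per person that two CDs allow.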

