All entries for Friday 14 August 2009
August 14, 2009
Fun with files that are actually zip archives.
Earlier today someone sent me an image by embedding it in a Microsoft Word .docx file. An illogical (to my mind) and entirely superfluous but all too common practice, which often makes the image much more difficult to view because it has had to be resized to fit within the margins. Anyway... I had to send this image on to someone else and so wanted to extract the image to send on its own. After spending 30 seconds failing to see how to do that in OpenOffice I remembered that .docx files are Office Open XML files (I think Office Open XML is a horrible, disingenuous name, but that's another story), which are actually zip archives. So I just unpacked it:
me@mine:/tmp> unzip foo.docx
Archive: foo.docx
inflating: [Content_Types].xml
inflating: _rels/.rels
inflating: word/_rels/document.xml.rels
inflating: word/document.xml
extracting: word/media/image1.png
inflating: word/theme/theme1.xml
inflating: word/settings.xml
inflating: word/webSettings.xml
inflating: docProps/core.xml
inflating: word/styles.xml
inflating: word/fontTable.xml
inflating: docProps/app.xml
And there, at word/media/image1.png, is the image file. Which is nice. A few minutes later I realised I was being very thick and the way you save an image from within OpenOffice is to right-click on it and select 'Save Graphics'. But I like that you can unzip a docx file and get at the innards.
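If you only want the images and not the rest of the innards, the same trick can be scripted. This is a rough sketch using Python's standard zipfile module, not anything Word or OpenOffice provides. The function name extract_media and the output directory are my own inventions for illustration, and the word/media/ prefix is assumed from the listing above.

```python
import zipfile
from pathlib import Path


def extract_media(docx_path, dest_dir, prefix="word/media/"):
    """Extract only the embedded media files from a .docx (a zip archive).

    extract_media is a hypothetical helper; the prefix is an assumption
    based on where image1.png appeared in the unzip listing.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for info in archive.infolist():
            # Skip directories and anything outside the media folder.
            if info.is_dir() or not info.filename.startswith(prefix):
                continue
            # Keep only the base name so nothing is written outside dest.
            target = dest / Path(info.filename).name
            target.write_bytes(archive.read(info))
            saved.append(target)
    return saved
```

Calling extract_media("foo.docx", "extracted") should leave image1.png in a directory called extracted and return the list of paths it wrote.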
It reminds me of another blog post I was going to write but never got around to. A while back I had an OpenDocument Text file containing numerous images (and text, not just images!) all of which I needed to replace. After manually replacing the first few images I decided that changing them all manually was going to be far too irritating. OpenDocument files are also zip archives. So I unpacked the .odt file, copied the new images over the old ones and zipped it all up again. Much faster than repeatedly deleting an image and then inserting a new one.
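The unpack, overwrite and re-zip routine could also be done in one pass. What follows is a minimal sketch with Python's zipfile module, not what I actually did at the time. The name replace_members is made up, and so is any member name such as Pictures/old.png that you pass in. It copies every entry in its original order and with its original compression method, which matters for OpenDocument because, as far as I know, the format expects the mimetype entry to come first and be stored uncompressed.

```python
import zipfile


def replace_members(src_path, dst_path, replacements):
    """Copy a zip archive, swapping in new bytes for selected members.

    replacements maps member names (for example "Pictures/old.png", a
    hypothetical name) to the new file contents as bytes.
    """
    replaced = []
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info)
            if info.filename in replacements:
                data = replacements[info.filename]
                replaced.append(info.filename)
            # Reusing the original ZipInfo keeps the entry's name, position
            # and compression method the same as in the source archive.
            dst.writestr(info, data)
    return replaced
```

The new images have to keep the old member names for the document to find them, which is the same constraint as copying new files over the old ones by hand.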
This sort of thing is also a good demonstration of how those three letters on the end of a filename mean nothing more than that someone has decided to put those three letters on the end of the filename. Filename extensions are a useful convention but they don't necessarily reflect the type of file and they're not even required. Which is why, as I've found myself having to explain more than once, renaming a file to put a different set of letters on the end of its name doesn't make the file a different type of file. Call it whatever you want and it's still the same type of file:
me@mine:/tmp> file trans.png
trans.png: PNG image data, 1 x 1, 8-bit/color RGBA, non-interlaced
me@mine:/tmp> mv trans.png trans.jpg
me@mine:/tmp> file trans.jpg
trans.jpg: PNG image data, 1 x 1, 8-bit/color RGBA, non-interlaced
me@mine:/tmp> mv trans.jpg trans.whatever
me@mine:/tmp> file trans.whatever
trans.whatever: PNG image data, 1 x 1, 8-bit/color RGBA, non-interlaced
me@mine:/tmp> mv trans.whatever trans
me@mine:/tmp> file trans
trans: PNG image data, 1 x 1, 8-bit/color RGBA, non-interlaced
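The file command gets this right because it looks at the contents rather than the name. A very stripped-down sketch of the same idea in Python is below. It only knows three well-known signatures (PNG, JPEG and zip), and both the sniff_type name and the labels are my own, so treat it as an illustration rather than a replacement for file.

```python
# Leading "magic" bytes for a few common formats. The labels are my own.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "zip archive (possibly .docx or .odt)"),
]


def sniff_type(path):
    """Guess a file's type from its first bytes, ignoring its name."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"
```

Run against trans.png, trans.jpg, trans.whatever or plain trans, it gives the same answer each time because the name never enters into it.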