August 14, 2009

Fun with files that are actually zip archives.

Earlier today someone sent me an image by embedding it in a Microsoft Word .docx file. An illogical (to my mind) and entirely superfluous but all too common practice, which often makes the image much more difficult to view because it has had to be resized to fit within the margins. Anyway... I had to send this image on to someone else and so wanted to extract the image to send on its own. After spending 30 seconds failing to see how to do that in OpenOffice I remembered that .docx files are Office Open XML files (I think Office Open XML is a horribly disingenuous name, but that's another story), which are actually zip archives. So I just unpacked it:

me@mine:/tmp> unzip foo.docx
Archive:  foo.docx
  inflating: [Content_Types].xml     
  inflating: _rels/.rels             
  inflating: word/_rels/document.xml.rels  
  inflating: word/document.xml       
 extracting: word/media/image1.png   
  inflating: word/theme/theme1.xml   
  inflating: word/settings.xml       
  inflating: word/webSettings.xml    
  inflating: docProps/core.xml       
  inflating: word/styles.xml         
  inflating: word/fontTable.xml      
  inflating: docProps/app.xml

And there, at word/media/image1.png, is the image file. Which is nice. A few minutes later I realised I was being very thick and that the way you save an image from within OpenOffice is to right click on it and select 'Save Graphics'. But I like that you can unzip a docx file and get at the innards.
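If you wanted to do the same thing from a script rather than by hand, something like the following would probably do. It's only a rough sketch in Python using the standard zipfile module. The function name extract_media is my own invention, and it assumes the images live under word/media/ as they did in the listing above.

```python
import zipfile
from pathlib import Path


def extract_media(docx_path, out_dir):
    """Copy every file under word/media/ in a .docx into out_dir.

    Hypothetical helper: assumes the archive keeps its images under
    word/media/, as in the unzip listing above.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything outside word/media/.
            if not name.startswith("word/media/") or name.endswith("/"):
                continue
            # Keep only the base name so nothing is written outside out_dir.
            target = out_dir / Path(name).name
            target.write_bytes(archive.read(name))
            saved.append(target)
    return saved
```

Calling extract_media("foo.docx", "images") should leave image1.png in an images directory and leave the rest of the archive alone.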

It reminds me of another blog post I was going to write but never got around to. A while back I had an OpenDocument Text file containing numerous images (and text, not just images!), all of which I needed to replace. After manually replacing the first few images I decided that changing them all manually was going to be far too irritating. OpenDocument files are also zip archives. So I unpacked the .odt file, copied the new images over the old ones and zipped it all up again. Much faster than repeatedly deleting an image and then inserting a new one.
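The unpack, overwrite and re-zip routine could be scripted too. Here is a minimal sketch, again with Python's zipfile module. The name replace_members and the idea of passing a dictionary of replacements are mine, not anything OpenOffice provides. It copies the archive entry by entry, keeping the original order and each entry's compression method, because as far as I know an OpenDocument package expects its mimetype entry to come first and be stored uncompressed.

```python
import zipfile
from pathlib import Path


def replace_members(src_path, dst_path, replacements):
    """Copy a zip archive, swapping in new bytes for the named members.

    Hypothetical helper: replacements maps an archive member name
    (for example "Pictures/old.png") to a file on disk holding the new content.
    """
    with zipfile.ZipFile(src_path) as zin, zipfile.ZipFile(dst_path, "w") as zout:
        for info in zin.infolist():
            if info.filename in replacements:
                data = Path(replacements[info.filename]).read_bytes()
            else:
                data = zin.read(info.filename)
            # Reusing the original ZipInfo keeps the entry order and each
            # entry's own compression method (stored or deflated).
            zout.writestr(info, data)
```

Something like replace_members("report.odt", "report-new.odt", {"Pictures/old.png": "new.png"}) would write a new file and leave the original untouched. The file names there are made up, and you would need to look inside your own .odt to find the real member names.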

This sort of thing is also a good demonstration of how those three letters on the end of a filename mean nothing more than that someone has decided to put those three letters on the end of the filename. Filename extensions are a useful convention but they don't necessarily reflect the type of file and they're not even required. Which is why, as I've found myself having to explain more than once, renaming a file to put a different set of letters on the end of its name doesn't make the file a different type of file. Call it whatever you want and it's still the same type of file:

me@mine:/tmp> file trans.png 
trans.png: PNG image data, 1 x 1, 8-bit/color RGBA, non-interlaced
me@mine:/tmp> mv trans.png trans.jpg
me@mine:/tmp> file trans.jpg
trans.jpg: PNG image data, 1 x 1, 8-bit/color RGBA, non-interlaced
me@mine:/tmp> mv trans.jpg trans.whatever
me@mine:/tmp> file trans.whatever
trans.whatever: PNG image data, 1 x 1, 8-bit/color RGBA, non-interlaced
me@mine:/tmp> mv trans.whatever trans
me@mine:/tmp> file trans 
trans: PNG image data, 1 x 1, 8-bit/color RGBA, non-interlaced
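The reason file isn't fooled is that it looks at the contents rather than the name. A very crude imitation of the idea, in Python, is below. This is not how file is actually implemented (it has a far larger database of patterns). The function name sniff_type and the short signature table are my own, and the labels are just strings I picked to resemble its output.

```python
# Leading "magic" bytes for a few well-known formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"\xff\xd8\xff", "JPEG image data"),
    (b"PK\x03\x04", "Zip archive data"),
]


def sniff_type(path):
    """Guess a file's type from its first bytes, ignoring its name."""
    with open(path, "rb") as handle:
        head = handle.read(8)  # the longest signature above is 8 bytes
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"
```

Under this sketch a PNG gives the same answer whatever it is called, and a .docx or .odt file should come back as a zip archive, which is rather the point of this whole post.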

Mike Willis : 14 Aug 2009 13:25