Quarter Life Crisis
The world according to Sven-S. Porst
« 1000km • Main • amazon MP3 »
891 words on Mac OS X
Mike recently ‘twittered’ about visualising the dependences among UTIs and eventually came up with the pretty hand-made graph based on Apple’s documentation presented in this blog post.
I have been amazed by UTIs since first seeing them, as they look like a reasonable way to unify the legacy (Mac file and creator types), internet (MIME type) and stupid (file name extension) ways of identifying file types. UTIs maintain backwards compatibility while opening the way for future development [I totally want to see integration with Unix’ file command. But I fear we’ll have to wait for the day when technology has advanced enough to provide all the power of a PDP-11 in a pocket-size package before we can expect to see that.] and wider use [e.g. non-file data types, like the clipboard]. Thus I consider them and their proliferation across the Mac APIs a good thing.
The basic idea for UTIs is quite simple: software can specify them in ‘reverse domain name notation’ (e.g.: public.text, public.xml, com.apple.quicktime-movie), it can specify which other methods of file type identification a UTI is equivalent to and it can assign other UTIs which the new UTI conforms to. Those conformances give a system of inherited types and you’ll find that public.xml → public.text → public.data. Graphing those relations is the task this text is about.
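To make the inheritance idea concrete, here is a minimal Python sketch. The table only contains the chain mentioned above, and the helper name all_supertypes is made up for illustration; it is not part of any Apple API.

```python
# Toy conformance table: each UTI maps to the UTIs it directly conforms to.
# Only the chain from the text is included; a real system declares many more.
CONFORMS_TO = {
    "public.xml": ["public.text"],
    "public.text": ["public.data"],
    "public.data": [],
}


def all_supertypes(uti, table=CONFORMS_TO):
    """Return every UTI that `uti` conforms to, directly or indirectly."""
    seen = []
    queue = list(table.get(uti, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(table.get(parent, []))
    return seen


print(all_supertypes("public.xml"))  # ['public.text', 'public.data']
```

The walk is breadth-first, so nearer ancestors come first in the result.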
The first step towards achieving that is collecting the data making up that graph. Apple’s documentation only lists a few examples and the biggest bunch of ‘standard’ UTIs is listed in the ‘CoreTypes’ bundle in Mac OS X’s system folder.
In principle Launch Services should know about all UTIs on the machine [considering the output of lsregister -dump] but as we are in hacking mode here and I’m not familiar with Launch Services, I went for the dumb option: collect all the Info.plist files on the system, in which the UTIs are stored. Spotlight fails me for that, locate gives quick but incomplete results (lacking folders that are not publicly readable) and this bit of command line junk runs slowly but yields what seems to be a complete list, which I stored in a file:
sudo find -s / | grep Info.plist | sed 's/\(.*\)/"\1"/g' | xargs -L 5 grep --files-with-matches TypeDeclarations
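For comparison, roughly the same search can be sketched in plain Python. This is an illustration rather than what I actually ran: the function name find_uti_plists is hypothetical, it only matches files named exactly Info.plist (the grep above matches the string anywhere in the path), and it silently skips whatever it is not allowed to read.

```python
import os


def find_uti_plists(root):
    """Yield paths of Info.plist files under `root` that mention TypeDeclarations."""
    # onerror swallows directories we may not list instead of aborting the walk.
    for folder, _dirs, files in os.walk(root, onerror=lambda err: None):
        if "Info.plist" not in files:
            continue
        path = os.path.join(folder, "Info.plist")
        try:
            with open(path, "rb") as handle:
                data = handle.read()
        except OSError:
            continue  # unreadable file: skip it
        # Same crude substring test as `grep TypeDeclarations`.
        if b"TypeDeclarations" in data:
            yield path
```

Writing one path per line from find_uti_plists("/") to a text file gives the same kind of list as the shell pipeline.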
Said file is then used by a PyObjC script which reads the property lists, collects the UTIs and the relations between them and creates a GraphViz file. A colleague recently introduced me to GraphViz which converts simple text input algorithmically into graphs. Those aren’t perfectly pretty and can be a bit hard to control, but with more than a dozen items, it seems to be a reasonable way of avoiding the tedium of manual graph layout.
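The core of that step can be sketched without PyObjC. The following is not my script but a simplified stand-in that uses Python’s plistlib instead; the function names are made up for illustration. It reads the UTExportedTypeDeclarations and UTImportedTypeDeclarations arrays, picks out UTTypeIdentifier and UTTypeConformsTo from each entry, and writes plain GraphViz source.

```python
import plistlib

DECLARATION_KEYS = ("UTExportedTypeDeclarations", "UTImportedTypeDeclarations")


def conformances(info):
    """Map each UTI declared in a parsed Info.plist dict to its parent UTIs."""
    result = {}
    for key in DECLARATION_KEYS:
        for decl in info.get(key, []):
            uti = decl.get("UTTypeIdentifier")
            if not uti:
                continue
            parents = decl.get("UTTypeConformsTo", [])
            if isinstance(parents, str):  # a single parent may be a plain string
                parents = [parents]
            result.setdefault(uti, set()).update(parents)
    return result


def graph_from_files(paths):
    """Merge the conformances of all Info.plist files listed in `paths`."""
    merged = {}
    for path in paths:
        with open(path, "rb") as handle:
            info = plistlib.load(handle)
        for uti, parents in conformances(info).items():
            merged.setdefault(uti, set()).update(parents)
    return merged


def to_dot(graph):
    """Render a {uti: parents} mapping as GraphViz source text."""
    lines = ["digraph UTIs {"]
    for uti in sorted(graph):
        lines.append('  "%s";' % uti)
        for parent in sorted(graph[uti]):
            lines.append('  "%s" -> "%s";' % (uti, parent))
    lines.append("}")
    return "\n".join(lines)
```

Each edge points from a UTI to the type it conforms to, matching the arrow direction used in the text.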
I then run the generated GraphViz file through the twopi tool using
twopi -Nfontname=Helvetica -Tpdf /tmp/UTIGraph.gv > /tmp/UTIs.pdf
which creates a reasonable layout.
Of course I had to start tweaking things a little soon after seeing the graph for the first time. Loads of UTIs conform to public.data or public.bundle or some sort of zip archive, which made the graph quite messy. Hence I decided to clean things up a bit by suppressing those conformances and expressing them through the node’s shape or colour instead. I also removed all the single nodes and placed them separately. (An attempt to group things better seems to fail with the twopi tool.)
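The clean-up can be sketched like this. It is a simplified illustration, not the code from my script: declutter is a hypothetical name, only two of the hub types from the text are listed, and the shapes chosen for them are arbitrary.

```python
# Hubs whose incoming edges are replaced by a node attribute. The UTIs are the
# ones named in the text; the GraphViz shapes are arbitrary choices.
HUB_STYLE = {
    "public.data": "shape=box",
    "public.bundle": "shape=folder",
}


def declutter(graph, hub_style=HUB_STYLE):
    """Split {uti: parents} into styled nodes, remaining edges and lone nodes."""
    styles, edges = {}, []
    for uti, parents in graph.items():
        for parent in sorted(parents):
            if parent in hub_style:
                # Drop the edge and remember a style for the child instead.
                styles[uti] = hub_style[parent]
            else:
                edges.append((uti, parent))
    connected = {name for edge in edges for name in edge}
    lone = sorted(set(graph) - connected)
    return styles, edges, lone
```

A UTI that conforms to several hubs simply ends up with the style of the last one in sorted order. The lone list holds the nodes left without any edge, which can then be laid out separately.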
As I was using PyObjC anyway, I figured I could just use NSWorkspace to grab the relevant icons. This resulted in a very heavy PDF file which causes all sorts of scrolling FAIL in X.5’s Preview along with less than impressive speed. (In fact, one version of the file even stumped Illustrator when I opened it there.) So slightly more control was needed, to scale all icons down to 128×128 and to avoid using the generic document icon for all files without custom icons.
While I’m at this: can anybody tell me a good way to compare NSImages? The only way I could find was using isEqualToData: on a TIFFRepresentation. But I couldn’t find a way to create a working hash of the image data. [In addition to that, the image nerds among you may enjoy comparing the PNG compression ratios achieved by Cocoa in different versions of OS X. Personally, I can only recommend using GraphicConverter for the task: with its best and slowest setting it routinely saves ¼ to ⅓ for the images I save in PNG format.]
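One possible workaround, sketched under the assumption that the TIFF data has already been copied into ordinary Python bytes: hash those bytes with hashlib and compare the digests. The function names here are hypothetical, and this only catches byte-identical data; I cannot promise that two icons which look the same always produce the same TIFF bytes.

```python
import hashlib


def image_digest(tiff_bytes):
    """Return a hex digest usable as a dictionary key for raw image data."""
    return hashlib.sha1(tiff_bytes).hexdigest()


def group_duplicates(images):
    """Group names from a {name: bytes} mapping by identical byte content."""
    groups = {}
    for name, data in images.items():
        groups.setdefault(image_digest(data), []).append(name)
    # Only groups with more than one member are actual duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

With such digests, spotting every UTI that merely got the generic document icon becomes a dictionary lookup instead of a pairwise comparison.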
A few more steps of manual intervention to single out all UTIs which do not conform to any ‘interesting’ UTIs - thus uncluttering the graph - gave this:
The PyObjC script isn’t exactly pretty and to work it requires (a) the folder /tmp/UTI-Icons, (b) the installed GraphViz software and (c) a file named Info-Plists.text in the same folder as the script, each line of which is a path to an Info.plist file. As a consequence I wouldn’t recommend that anybody even look at it. But if you must, you’ll find it here. Furthermore, the full size PDF is available from my public DropBox folder and you can see a large, roughly annotated bitmap at Flickr. Unfortunately I couldn’t find an easy way to turn the large image into a Google map. [And I’m somewhat annoyed that software and computers aren’t better at dealing with such files.]
I think we have the UTI diagram to end all UTI diagrams.
This should be a bit faster:
find -s / -type f -name "Info.plist" -exec grep --files-with-matches TypeDeclarations '{}' +
Cool, thanks.