Quarter Life Crisis

The world according to Sven-S. Porst

UTIs

891 words on

Mike recently ‘twittered’ about visualising the dependences among UTIs and eventually came up with the pretty hand-made graph based on Apple’s documentation presented in this blog post.

I have been amazed by UTIs since first seeing them, as they look like a reasonable way to unify the legacy (Mac File and Creator type), internet (MIME-Type) and stupid (file name extension) ways of identifying file types. UTIs maintain backwards compatibility while opening the way for future development [I totally want to see integration with Unix’s file command. But I fear we’ll have to wait for the day when technology has advanced enough to provide all the power of a PDP-11 in a pocket-sized package before we can expect to see that.] and wider use [e.g. non-file data types, like the clipboard]. Thus I consider them and their proliferation across the Mac APIs a good thing.

The basic idea for UTIs is quite simple: software can specify them in ‘reverse domain name notation’ (e.g.: public.text, public.xml, com.apple.quicktime-movie), it can specify which other methods of file type identification a UTI is equivalent to and it can assign other UTIs which the new UTI conforms to. Those conformances give a system of inherited types and you’ll find that public.xml → public.text → public.data. Graphing those relations is the task this text is about.
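To make the inheritance idea concrete, here is a minimal Python sketch of how conformance can be resolved transitively. It is not how the system does it; the table only contains the public.xml → public.text → public.data chain from above plus a made-up com.example.notes type, and the names CONFORMS_TO, ancestors and conforms are hypothetical.

```python
# Hypothetical conformance table: each UTI maps to the UTIs it directly conforms to.
# Only the public.* chain comes from the text; com.example.notes is invented.
CONFORMS_TO = {
    "com.example.notes": ["public.xml"],
    "public.xml": ["public.text"],
    "public.text": ["public.data"],
}


def ancestors(uti, conforms_to):
    """Return every UTI that `uti` conforms to, directly or indirectly."""
    seen = []
    queue = list(conforms_to.get(uti, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:  # guards against cycles and duplicates
            seen.append(parent)
            queue.extend(conforms_to.get(parent, []))
    return seen


def conforms(uti, other, conforms_to):
    """True if `uti` is `other` or inherits from it."""
    return uti == other or other in ancestors(uti, conforms_to)
```

With this table, conforms("com.example.notes", "public.data", CONFORMS_TO) is true even though the declaration of com.example.notes never mentions public.data.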

The first step towards achieving that is collecting the data making up that graph. Apple’s documentation only lists a few examples and the biggest bunch of ‘standard’ UTIs is listed in the ‘CoreTypes’ bundle in Mac OS X’s system folder.

In principle Launch Services should know about all UTIs on the machine [considering the output of lsregister -dump], but as we are in hacking mode here and I’m not familiar with Launch Services, I went for the dumb option: collect all Info.plist files on the system in which the UTIs are stored. Spotlight fails me for that, locate gives quick but incomplete results (it misses folders that aren’t publicly readable), and this bit of command line junk runs slowly but yields what seems to be a complete list, which I stored in a file:

sudo find -s / | grep Info.plist | sed 's/\(.*\)/"\1"/g' | xargs -L 5 grep --files-with-matches TypeDeclarations
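The same search can be sketched in Python with nothing but the standard library. This is an illustration, not what was used for the post: the function name find_declaring_plists is made up, and like the grep above it just looks for the byte string TypeDeclarations in each Info.plist without parsing the file. Folders it isn't allowed to read are silently skipped.

```python
import os


def find_declaring_plists(root, needle=b"TypeDeclarations"):
    """Yield paths of Info.plist files under `root` whose raw bytes contain `needle`."""
    # onerror swallows directories we may not list instead of aborting the walk.
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        if "Info.plist" not in filenames:
            continue
        path = os.path.join(dirpath, "Info.plist")
        try:
            with open(path, "rb") as handle:
                if needle in handle.read():
                    yield path
        except OSError:
            # Unreadable file: skip it, as a non-root grep would.
            continue
```

Writing one path per line, e.g. with "\n".join(find_declaring_plists("/")), gives a list in the same shape as the one produced by the shell pipeline.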

Said file is then used by a PyObjC script which reads the property lists, collects the UTIs and the relations between them and creates a GraphViz file. A colleague recently introduced me to GraphViz which converts simple text input algorithmically into graphs. Those aren’t perfectly pretty and can be a bit hard to control, but with more than a dozen items, it seems to be a reasonable way of avoiding the tedium of manual graph layout.
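For the curious, the core of such a script can be sketched as follows. This is not the PyObjC script from this post: it swaps in Python's plistlib for the Cocoa property list classes, and the function names are made up. It assumes the declarations live under the UTExportedTypeDeclarations and UTImportedTypeDeclarations keys, with UTTypeIdentifier and UTTypeConformsTo inside each entry, and that UTTypeConformsTo may be either a single string or a list.

```python
import plistlib

DECLARATION_KEYS = ("UTExportedTypeDeclarations", "UTImportedTypeDeclarations")


def conformances(info):
    """Map each UTI declared in a parsed Info.plist dict to the set of UTIs it conforms to."""
    result = {}
    for key in DECLARATION_KEYS:
        for declaration in info.get(key, []):
            uti = declaration.get("UTTypeIdentifier")
            if not uti:
                continue
            parents = declaration.get("UTTypeConformsTo", [])
            if isinstance(parents, str):  # a single parent given as a plain string
                parents = [parents]
            result.setdefault(uti, set()).update(parents)
    return result


def load_graph(paths):
    """Merge the conformances of all Info.plist files listed in `paths`."""
    graph = {}
    for path in paths:
        with open(path, "rb") as handle:
            info = plistlib.load(handle)  # reads XML and binary plists
        for uti, parents in conformances(info).items():
            graph.setdefault(uti, set()).update(parents)
    return graph


def to_dot(graph):
    """Render the graph as GraphViz text, one edge per conformance (child -> parent)."""
    lines = ["digraph UTIs {"]
    for uti in sorted(graph):
        lines.append('    "%s";' % uti)
        for parent in sorted(graph[uti]):
            lines.append('    "%s" -> "%s";' % (uti, parent))
    lines.append("}")
    return "\n".join(lines)
```

The string returned by to_dot can be written to a .gv file and handed to any of the GraphViz layout tools.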

I am then running the created GraphViz file through the twopi tool using

twopi -Nfontname=Helvetica -Tpdf /tmp/UTIGraph.gv > /tmp/UTIs.pdf

which creates a reasonable layout.

Of course I had to start tweaking things a little, soon after seeing the graph for the first time. As loads of UTIs conform to public.data or public.bundle or some sort of zip archive, that was making the graph quite messy. Hence I decided to clean things up a bit by suppressing those conformances and expressing them by the node’s shape or colour. I also removed all the single nodes and placed them separately. (An attempt to group things better seems to fail with the twopi tool.)
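That clean-up step can be sketched roughly like this, again as an illustration rather than the actual script. The SUPPRESSED table, the choice of shapes and the function name simplify are assumptions; the input is a dict mapping each UTI to the set of UTIs it conforms to.

```python
# Assumed mapping from 'boring' parent UTIs to a GraphViz node attribute.
SUPPRESSED = {
    "public.data": "shape=box",
    "public.bundle": "shape=hexagon",
}


def simplify(graph, suppressed=SUPPRESSED):
    """Split a conformance graph into drawable edges, node attributes and isolated nodes.

    Conformances to a suppressed UTI are not drawn as edges; the child gets the
    matching node attribute instead (the first match in sorted order wins).
    """
    edges = []
    attributes = {}
    for uti, parents in graph.items():
        for parent in sorted(parents):
            if parent in suppressed:
                attributes.setdefault(uti, suppressed[parent])
            else:
                edges.append((uti, parent))
    connected = {node for edge in edges for node in edge}
    everything = set(graph) | {p for parents in graph.values() for p in parents}
    # Nodes without any remaining edge are returned separately for manual placement.
    isolated = sorted(everything - connected - set(suppressed))
    return sorted(edges), attributes, isolated
```

The isolated list is what ends up being placed by hand, while edges and attributes go into the GraphViz file.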

As I was using PyObjC anyway, I figured I could just use NSWorkspace to grab the relevant icons. This resulted in a very heavy PDF file which creates all sorts of scrolling FAIL in X.5’s Preview along with a less than impressive speed. (In fact, one version of the file even stumped Illustrator when opening it in there.) So slightly more control was needed, to scale down all icons to a 128×128 size and to avoid using the generic document icon for all files without custom icons.

While I’m at it: can anybody tell me a good way to compare NSImages? The only way I could find was using isEqualToData: on a TIFFRepresentation. But I couldn’t find a way to create a working hash of the image data. [In addition to that, the image nerds among you may enjoy comparing the PNG compression ratios achieved by Cocoa in different versions of OS X. Personally, I can only recommend using GraphicConverter for the task: with its best and slowest setting it routinely saves ¼ to ⅓ for the images I save in PNG format.]
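One possible approach, sketched under assumptions: if the TIFF data of each icon can be obtained as plain Python bytes, hashing those bytes with hashlib gives a key that groups byte-identical images. The function names are hypothetical, getting the bytes out of PyObjC is left out, and two icons that look the same may still produce different TIFF bytes, so this only catches exact duplicates.

```python
import hashlib


def image_key(tiff_bytes):
    """Return a hex digest identifying one exact sequence of image bytes."""
    return hashlib.sha1(tiff_bytes).hexdigest()


def group_duplicates(images):
    """Group names by identical image data.

    `images` maps a name (e.g. a UTI) to the raw bytes of its icon;
    the result maps each digest to the sorted list of names sharing it.
    """
    groups = {}
    for name, data in images.items():
        groups.setdefault(image_key(data), []).append(name)
    return {digest: sorted(names) for digest, names in groups.items()}
```

Any group that also contains the generic document icon could then be treated as ‘no custom icon’.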

A few more steps of manual intervention to single out all UTIs which do not conform to any ‘interesting’ UTIs - thus uncluttering the graph - gave this:

UTIGraph

The PyObjC script isn’t exactly pretty, and to work it requires (a) the folder /tmp/UTI-Icons, (b) the installed GraphViz software and (c) a file named Info-Plists.text in the same folder as the script, each line of which is a path to an Info.plist file. As a consequence I wouldn’t recommend that anybody even look at it. But if you must, you’ll find it here. Furthermore, the full size PDF is available from my public Dropbox folder and you can see a large, roughly annotated bitmap at Flickr. Unfortunately I couldn’t find an easy way to turn the large image into a Google map. [And I’m somewhat annoyed that software and computers aren’t better at dealing with such files.]

July 23, 2009, 1:19

Tagged as graph, mac, pyobjc, uti.

Comments

Comment by Jonathan Wight:

I think we have the UTI diagram to end all UTI diagrams.

July 24, 2009, 12:12

Comment by carlo:

This should be a bit faster:

find -s / -type f -name "Info.plist" -exec grep --files-with-matches TypeDeclarations '{}' +

July 30, 2009, 10:54

Comment by ssp:

Cool, thanks.

July 30, 2009, 11:04
