Quarter Life Crisis
The world according to Sven-S. Porst
960 words on earthlingsoft
An updated version of UnicodeChecker is coming your way. Although the version number only goes up by a tenth, this update has quite a few new and improved features in it, ranging all the way from significant speed improvements to the ability to abuse Spotlight on X.4 systems. I am very happy with what Steffen did in this version… partly because quite a few parts were positive responses to feature requests and nagging I did.
The biggest universal change to UnicodeChecker in this revision is, once more, its faster startup. We already had significant speed improvements in the update to version 1.6, some of which vanished again in version 1.7 for people who installed the huge Unihan data file. The new version reads the Unicode data even more quickly: it is reasonably fast when starting up with the huge Unihan file, and speedy when starting up in its default setup.
One big new feature is Spotlight abuse. I’ve been nagging Steffen about this for… well, almost ever since Spotlight was announced. The evolution of this is as follows: First there was the find sheet in UnicodeChecker, which lets you find the Unicode character you’re interested in by its name. Good for finding the checkmark character or ☃. By now, a similar feature is available in Apple’s character palette as well. The next step was rough AppleScript support for that feature, so we could integrate with applications like LaunchBar, which gave us system-wide access to Unicode characters by their name.
The latest step along this path is to provide a similar solution for Spotlight, which more people should be able to use. Unfortunately, Spotlight isn’t able to just index a whole database you throw at it. Getting meaningful information from it (in a documented way) requires you to use a single file for every character you want indexed. This means that for quite a small amount of data (about 4MB) you’ll put a lot of junk on your hard drive (over 35000 files using about 150MB of space), which in my opinion is neither elegant nor good style… but if Apple can do it, so can we. As not everybody will want to do that, generation of the index files is optional. You’ll also have to make sure that you regenerate UnicodeChecker’s Spotlight files whenever a new Unicode version comes out, as the information in the files may grow.
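To make the ‘one file per character’ idea a bit more concrete, here is a minimal Python sketch. It is not how UnicodeChecker actually writes its Spotlight files: the function name `write_index_files`, the `U+XXXX.txt` file names and the two-line file content are all made up for illustration, and the character names come from Python’s own `unicodedata` module rather than from UnicodeChecker’s data files.

```python
import unicodedata
from pathlib import Path


def write_index_files(directory, code_points):
    """Write one small text file per named character; return how many were written.

    Hypothetical layout: a file called U+XXXX.txt that holds the character
    itself on the first line and its Unicode name on the second.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    written = 0
    for cp in code_points:
        # Unassigned code points, controls and surrogates have no name: skip them.
        name = unicodedata.name(chr(cp), "")
        if not name:
            continue
        path = directory / f"U+{cp:04X}.txt"
        path.write_text(f"{chr(cp)}\n{name}\n", encoding="utf-8")
        written += 1
    return written
```

Calling something like `write_index_files("index", range(0x0B80, 0x0C00))` would cover the Tamil block only. Doing the same for every character is what produces the tens of thousands of tiny files mentioned above.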
But improvements have been made to the find sheet as well. Not only will it now display the currently selected character of the find results right there in the sheet, but its find capabilities have been extended to also include the Unihan data if it is available. Unfortunately that causes a slight slowdown when opening the sheet for the first time, as we’re handling quite a large set of data here and some Chinese fonts may have to be loaded so everything can be displayed. Searches are slower as well. One particular consequence of this, thanks to the rich Unihan database, is that you can abuse UnicodeChecker to look up Asian characters for English words.
Finally, the filter field in the sheet will now work ‘Google style’: it will not match the complete string you enter but split it up into words and match all characters whose descriptions or Unihan definitions contain all of those words. That’s quite useful, say, when you’re looking for the character for the number one in a language: you search for ‘Tamil one’, which in the previous version wouldn’t have found anything because the relevant character is called ‘Tamil digit one’. But it will find the relevant character ௧ in the current version (along with two others, which will be left as an exercise for the reader to find out…).
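The ‘Google style’ matching can be sketched in a few lines of Python. This is only an illustration of the idea, not UnicodeChecker’s code: the function names `matches` and `find_characters` are my own, only character names (no Unihan definitions) are searched, and the default range is limited to the Tamil block to keep things short. The names come from Python’s `unicodedata` module, so results depend on the Unicode version your Python ships with.

```python
import unicodedata


def matches(query, description):
    """True if every whitespace-separated word of the query occurs in the description."""
    haystack = description.casefold()
    return all(word in haystack for word in query.casefold().split())


def find_characters(query, code_points=range(0x0B80, 0x0C00)):
    """Return (character, name) pairs whose Unicode name contains all query words.

    The default range is the Tamil block (U+0B80..U+0BFF), chosen for this example.
    """
    results = []
    for cp in code_points:
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if name and matches(query, name):
            results.append((chr(cp), name))
    return results
```

With this, `matches("Tamil one", "TAMIL DIGIT ONE")` is true although the query as a whole string does not occur in the name, which is exactly the difference to the old behaviour. Running `find_characters("Tamil one")` is one way to do the exercise.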
Another improvement is that you can now adjust the font size in UnicodeChecker’s main list. Particularly when exploring unknown scripts, UnicodeChecker’s default character size can be too small to properly distinguish the characters. UnicodeChecker will also try to figure out the version number of the Unicode data files it uses and display it at the bottom of the window. Not many people may be aware of this, but you’ve been able to drop other versions of the Unicode data files into UnicodeChecker’s bundle to make it use those files.
That way you could upgrade to the newest Unicode data set if one is published between releases of UnicodeChecker (or you can downgrade to previous versions if you need that). But replacing files in an application’s bundle isn’t good style – and can be downright impossible if you’re not an administrator user. So now, you can have a ‘UnicodeChecker’ folder in your Application Support folder which contains a ‘Unicode Data’ folder in which you can drop the replacement files to override the built-in ones. As this can theoretically become quite complex with the many files involved and the various Library folders, clicking the Unicode version number at the bottom of the window will open a list with all the files UnicodeChecker uses, their version numbers and their location.
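The override mechanism boils down to ‘look in a list of folders and take the first copy you find’. Here is a rough Python sketch of that idea. The search order (user Library, then system Library, then the application bundle) is my assumption and not taken from UnicodeChecker itself, and the function names and the `bundle_resources` parameter are made up for the example.

```python
from pathlib import Path


def default_search_dirs(bundle_resources):
    """Folders to look in, most specific first (the order is an assumption)."""
    sub = Path("Application Support") / "UnicodeChecker" / "Unicode Data"
    return [
        Path.home() / "Library" / sub,  # per-user override
        Path("/Library") / sub,         # machine-wide override
        Path(bundle_resources),         # built-in files inside the application
    ]


def resolve_data_file(filename, search_dirs):
    """Return the first existing copy of filename, or None if there is none."""
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None
```

Listing the result of `resolve_data_file` for every data file is, roughly, what the file list behind the version number gives you: which copy of each file wins and where it lives.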
Finally, there’s an improvement to the ‘Split Up’ utility. UnicodeChecker’s utility window may be one of its most underappreciated features, but a quick glance at the File menu will make it easy to find. It offers a number of conversion options (and you can write your own plugins if you wish), one of which is ‘Split Up’. All it does is display a list of Unicode characters for whatever text you enter. That can be quite useful when looking at things like accented letters. While it worked pretty well before, it has been improved to let you edit the string that was entered by manipulating the table.
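For those without UnicodeChecker at hand, the basic idea of ‘Split Up’ can be imitated in Python. This is just a sketch of the concept using the standard `unicodedata` module. The function name `split_up` is borrowed from the utility’s name, and it does none of the table editing described above.

```python
import unicodedata


def split_up(text):
    """Return a (code point, name) pair for every character in the text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# An 'é' typed as a single precomposed character...
print(split_up("\u00e9"))
# ...and one made of a plain 'e' followed by a combining accent.
print(split_up("e\u0301"))
```

The two strings look the same on screen, but the first one splits into a single LATIN SMALL LETTER E WITH ACUTE while the second gives LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT, which is the kind of thing that makes this view handy for accented letters.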