Quarter Life Crisis

The world according to Sven-S. Porst


UnicodeChecker 1.6.1


There has been another update to our UnicodeChecker tool. The improvements Steffen made are few but significant. First, a bug was removed that killed spaces in your text when using the HTML to Unicode service. Thanks to the people who reported this. Second, the Unicode files were upgraded to version 4.0.1.

And finally, the behaviour of UnicodeChecker’s font menu was improved. The font menu was only added in version 1.6 and had been a long-standing request by then. Basically, for any character, the menu displays all fonts that have a glyph for that character. This worked rather well from the beginning but was very slow, presumably because the system needs a while to check all fonts for the availability of a given glyph, particularly on systems with a large number of fonts. So, to improve the speed, UnicodeChecker simply cached all the relevant information just after starting, giving a nice and fast menu. However, this is a lot of information, and gathering it seems to use a lot of memory as well. And while many people claim the VSIZE number in top isn’t relevant, having almost 300MB there for UnicodeChecker wasn’t good. Just by launching it, I could often generate new swap files on my computer. So I wasn’t too keen on that menu, in particular as I only rarely use it. Hence there’s now a new preference that lets you decide whether the font menu is cached at application launch time or not. Depending on what you use UnicodeChecker for, you can now choose.
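To make the trade-off concrete, here is a minimal sketch of caching at launch versus caching on demand. It is not UnicodeChecker’s actual code: the class FontMenuCache, its preload flag and the font coverage data are all made up for illustration.

```python
class FontMenuCache:
    """Hypothetical sketch: maps a character to the fonts that have a glyph for it."""

    def __init__(self, font_coverage, preload=False):
        # font_coverage: dict of font name -> set of covered characters (made-up data).
        self._font_coverage = font_coverage
        self._cache = {}
        if preload:
            # Eager mode: pay the time and memory cost once, at launch.
            for chars in font_coverage.values():
                for ch in chars:
                    self.fonts_for(ch)

    def fonts_for(self, ch):
        # Lazy mode: compute the list on first request, then reuse it.
        if ch not in self._cache:
            self._cache[ch] = sorted(
                name for name, chars in self._font_coverage.items() if ch in chars
            )
        return self._cache[ch]

    def cached_entries(self):
        return len(self._cache)


# Invented coverage data: two fonts, two characters.
coverage = {"FontA": {"a", "\u00e4"}, "FontB": {"a"}}

lazy = FontMenuCache(coverage)                 # nothing cached yet
eager = FontMenuCache(coverage, preload=True)  # everything cached up front

print(lazy.cached_entries(), eager.cached_entries())
print(lazy.fonts_for("a"))
```

The eager variant answers every menu request immediately but holds the full table in memory from the start. The lazy variant only grows as characters are actually looked up.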

And that’s it. -1 bug, +1 upgrade and a new option that makes UnicodeChecker much more sociable on the system – at least in the way I use it.

And while I’m rambling about Unicode and characters, let me just add a few observations I made recently on the difference between Unicode and (old-school, at least) TeX, as far as accents are concerned. Traditionally, in TeX, accents are generated by commands like \"a for ä, \^o for ô and so on. Starting at U+0300, Unicode has similar features to compose arbitrary accented characters. Many accents are provided, but unlike in TeX, the accents are composed by first giving the character and then the accent, i.e. in the reverse of TeX’s order. One thing I found funny, though, was the following: If you want an accented i or j in TeX, you can’t simply type \'i to get í. That command will give you ı̇́. So you have to use ı, the undotted i, given by the TeX command \i, to achieve that. While this isn’t too pretty or practical, it’s at least very logical.
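The base-then-accent order is easy to try out. The following is a small sketch using Python’s standard unicodedata module; the helper name compose is my own choice for illustration.

```python
import unicodedata

# In Unicode the base letter comes first and the combining accent second,
# the reverse of TeX's \"a or \^o.
a_diaeresis = "a" + "\u0308"   # U+0061 followed by U+0308 COMBINING DIAERESIS
o_circumflex = "o" + "\u0302"  # U+006F followed by U+0302 COMBINING CIRCUMFLEX ACCENT


def compose(text):
    """Return the canonically composed (NFC) form of text."""
    return unicodedata.normalize("NFC", text)


# Each sequence is two code points long and composes to one precomposed letter.
print(len(a_diaeresis), compose(a_diaeresis) == "\u00e4")
print(len(o_circumflex), compose(o_circumflex) == "\u00f4")
print(unicodedata.name("\u0308"))
```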

Now compare this to Unicode. There, the ‘accents’ seem to be quite clever. For example, í will be decomposed as U+0069 U+0301, that is, as a dotted i with an acute accent. Convenient for typing, perhaps, but not exactly logical. But things start being really strange once you become interested in doubly accented letters such as the ı̇́ we’ve seen above. How do you get this in Unicode by composing things? We already know it can’t be an acute accent on the i, as that is already í. So for this we have to use an acute accent on a dot accent on an undotted i (U+0131 U+0307 U+0301), which is a bit strange.
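Both sequences can be inspected with Python’s standard unicodedata module. This is only a sketch of the observation above, and the helper code_points is a name I made up.

```python
import unicodedata


def code_points(text):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The precomposed i with acute (U+00ED) decomposes to a plain, dotted i plus the accent.
i_acute = "\u00ed"
decomposed = unicodedata.normalize("NFD", i_acute)
print(code_points(decomposed))  # U+0069 U+0301

# The sequence described above: undotted i, dot above, acute.
dotted_acute = "\u0131\u0307\u0301"
for ch in dotted_acute:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# There is no single precomposed character for this sequence,
# so composing it (NFC) leaves all three code points in place.
print(unicodedata.normalize("NFC", dotted_acute) == dotted_acute)
```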

Oh – and some doubly accented letters are even included in Unicode as single characters: ǚ (U+01DA) or ị (U+1ECB). Fun.
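These precomposed letters can be taken apart again. A short sketch with Python’s standard unicodedata module (the helper name code_points is my own):

```python
import unicodedata


def code_points(text):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Canonical decomposition (NFD) splits each letter into its base and combining marks.
u_diaeresis_caron = unicodedata.normalize("NFD", "\u01da")
i_dot_below = unicodedata.normalize("NFD", "\u1ecb")

print(unicodedata.name("\u01da"), "->", code_points(u_diaeresis_caron))
print(unicodedata.name("\u1ecb"), "->", code_points(i_dot_below))
```

Note that ị decomposes to the ordinary dotted i plus a dot below, so the dot on top again belongs to the base letter rather than being an accent of its own.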

November 2, 2004, 2:14

Tagged as earthlingsoft, UnicodeChecker.
