Quarter Life Crisis

The world according to Sven-S. Porst

UnicodeChecker 1.7

Another update to UnicodeChecker, this one polishing a few existing features and introducing a major new one, particularly for those who deal with Asian characters.

The big new feature is that UnicodeChecker now supports the ‘Unihan.txt’ data file from the Unicode Character Database. This is a huge (27MB) file containing all sorts of information on the Han characters used in various Asian languages. Because the file is so large, and loading it increases both the application's memory usage and its startup time, it remains an optional install that has to be downloaded separately.

The information in the file covers many different aspects: technical details such as the character's encoding in various Asian character sets, practical information such as how to enter the character with various input methods, and the number of strokes in the character, which some dictionaries use to order their entries. Finally, many characters have a ‘kDefinition’ entry that gives English definitions of the character's meanings.
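As a rough illustration of how such data can be read, here is a minimal Python sketch. It assumes the tab-separated layout of Unihan.txt (one ‘U+XXXX, field, value’ triple per line, with ‘#’ starting comment lines) and groups all fields by code point. The function name `parse_unihan` and the sample lines are made up for illustration; this is not UnicodeChecker's actual code.

```python
from collections import defaultdict


def parse_unihan(lines):
    """Group Unihan.txt-style entries by code point.

    Assumes each data line looks like 'U+XXXX<TAB>kField<TAB>value'.
    Returns {code_point: {field: value, ...}, ...}.
    """
    table = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split into at most three parts; a malformed line raises ValueError.
        code, field, value = line.split("\t", 2)
        table[int(code[2:], 16)][field] = value  # 'U+4E00' -> 0x4E00
    return dict(table)


# Illustrative sample lines, not a verbatim excerpt of the real file.
sample = [
    "# Unihan.txt excerpt",
    "U+4E00\tkDefinition\tone; a, an; alone",
    "U+4E00\tkTotalStrokes\t1",
    "U+4E00\tkMandarin\tyī",
]
entries = parse_unihan(sample)
print(entries[0x4E00]["kDefinition"])  # prints: one; a, an; alone
```

For the real file, one would pass an open file object instead, e.g. `parse_unihan(open("Unihan.txt", encoding="utf-8"))`; holding the whole table in memory like this is exactly why the data file costs memory and startup time.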

While some of this information is already present on the system in some form, for the Asian input methods and the Character Palette, I haven't seen all of it made available before, particularly not the word definitions. As you're most likely not interested in all of the information but only need a subset for your current task, Steffen has come up with a clever way of filtering it: the filter field below the list accepts several words, highlights every entry in the table that matches any of them and moves those entries to the top. Joining the words with OR rather than AND may be a bit unexpected in the days of Google dominance, but it's very useful in this case.
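The OR-style filter described above can be sketched roughly as follows. This is a hypothetical Python approximation, not UnicodeChecker's implementation: it assumes a case-insensitive substring match against both the field name and the value, and it keeps the original order within the matching and non-matching groups.

```python
def filter_entries(entries, query):
    """Return (field, value, highlighted) rows, matching rows first.

    A row matches if ANY word of the query occurs in its field name or
    value (case-insensitive substring match, an assumption). Within each
    group the original order is kept. An empty query highlights nothing.
    """
    words = [w.lower() for w in query.split()]
    matched, rest = [], []
    for field, value in entries:
        text = f"{field} {value}".lower()
        if any(word in text for word in words):
            matched.append((field, value, True))
        else:
            rest.append((field, value, False))
    return matched + rest


rows = [
    ("kMandarin", "yī"),
    ("kDefinition", "one; a, an; alone"),
    ("kTotalStrokes", "1"),
]
for field, value, highlighted in filter_entries(rows, "definition strokes"):
    print("*" if highlighted else " ", field, value)
```

With AND semantics, the query ‘definition strokes’ would match no single row here; with OR, both the definition and the stroke count are pulled to the top, which is what makes OR handy for picking a few fields out of a long list.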

Another new feature concerns input: where old versions of the application only accepted a character's Unicode number in decimal or hexadecimal notation, you can now also type the character's hexadecimal UTF-8 encoding, complete with an explanation whenever what you entered isn't correct UTF-8. This was added in response to user requests, and Steffen went a bit further and made it part of UnicodeChecker's plugin system: input methods can now be supplied by plugins, so if you think you really need an octal or binary input method, you can go ahead and write one yourself. The necessary header files will be supplied.
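To show what checking hexadecimal UTF-8 input involves, here is a minimal Python sketch. It decodes the bytes of a single character and returns an explanation when they break the UTF-8 rules: stray continuation bytes, a wrong sequence length, overlong forms, surrogates, or values above U+10FFFF. The `INPUT_METHODS` dictionary is a hypothetical stand-in for plugin-supplied input methods; UnicodeChecker's real plugins are built against the header files mentioned in the post, whose interface is not reproduced here.

```python
def decode_utf8_hex(text):
    """Decode hex UTF-8 bytes such as 'E2 82 AC' into one code point.

    Returns (code_point, None) on success or (None, explanation).
    """
    try:
        data = bytes.fromhex(text)
    except ValueError:
        return None, "input is not a sequence of hexadecimal byte values"
    if not data:
        return None, "no bytes entered"
    lead = data[0]
    # The lead byte determines the sequence length and its payload bits.
    if lead < 0x80:
        length, cp, minimum = 1, lead, 0x0
    elif lead < 0xC0:
        return None, f"{lead:02X} is a continuation byte and cannot start a character"
    elif lead < 0xE0:
        length, cp, minimum = 2, lead & 0x1F, 0x80
    elif lead < 0xF0:
        length, cp, minimum = 3, lead & 0x0F, 0x800
    elif lead < 0xF8:
        length, cp, minimum = 4, lead & 0x07, 0x10000
    else:
        return None, f"{lead:02X} can never appear in UTF-8"
    if len(data) != length:
        return None, (f"lead byte {lead:02X} announces {length} byte(s), "
                      f"but {len(data)} were entered")
    for b in data[1:]:
        if b & 0xC0 != 0x80:
            return None, f"{b:02X} is not a continuation byte (10xxxxxx)"
        cp = (cp << 6) | (b & 0x3F)  # append six payload bits
    if cp < minimum:
        return None, f"overlong encoding: U+{cp:04X} needs fewer than {length} bytes"
    if 0xD800 <= cp <= 0xDFFF:
        return None, f"U+{cp:04X} is a surrogate and not allowed in UTF-8"
    if cp > 0x10FFFF:
        return None, f"U+{cp:X} lies beyond U+10FFFF"
    return cp, None


def parse_with_base(base, name):
    """Build an input method that reads a code point as a number in `base`."""
    def parse(text):
        try:
            cp = int(text, base)
        except ValueError:
            return None, f"not a valid {name} number"
        if not 0 <= cp <= 0x10FFFF:
            return None, "outside the Unicode range"
        return cp, None
    return parse


# Hypothetical registry standing in for plugin-supplied input methods.
INPUT_METHODS = {
    "decimal": parse_with_base(10, "decimal"),
    "hexadecimal": parse_with_base(16, "hexadecimal"),
    "UTF-8 (hex)": decode_utf8_hex,
    "octal": parse_with_base(8, "octal"),  # the kind of method a plugin could add
}
```

Here `decode_utf8_hex("E2 82 AC")` yields U+20AC (€), while `decode_utf8_hex("C0 AF")` is rejected as an overlong encoding. Giving every input method the same (value, explanation) shape is what lets new ones be dropped in without touching the rest of the code.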

Perhaps people aren’t quite aware that the tools in UnicodeChecker’s ‘Utilities’ window have been provided by plugins for quite some time now, too. It’s quite a plugin-rich application at this stage…

The other improvements fix minor glitches in conversions, display and AppleScript support. Refer to the included readme file or the version history for more details.

May 10, 2005, 1:05

Tagged as earthlingsoft, UnicodeChecker.