
AppleScript
A number of UnicodeChecker's features are available for use in your AppleScripts:
- You can get references for individual code points and query several Unicode properties including Unihan tag/value pairs.
- You can ask UnicodeChecker to convert non-ASCII code points in strings to their HTML entity representations and back again. This may be useful if you want to generate HTML pages via AppleScript using data from sources like the Mac OS X Address Book, or if you want to extract data from HTML pages.
- You can have UnicodeChecker escape or unescape string sequences (see the section on Escaping for more information).
- You can convert strings to or from IDNA using the 2003 or the 2008 IDNA protocol. If an IDNA conversion fails, the AppleScript constant missing value is returned.
- You can have UnicodeChecker display code point information in its main window.
The documentation for scripting support can be viewed from Script Editor’s “Open Dictionary…” menu command.
To get an idea of what you can do, have a look at the AppleScript sample.
Additional Notes
This section contains additional information that cannot be found in UnicodeChecker’s AppleScript dictionary.
Performance and Memory Usage
Unicode currently contains 1,114,112 code points. Querying the complete set of code points for specific properties (e.g. code points where unicode name contains "latin") may therefore consume considerable time and memory.
While waiting for a command to return, AppleScript times out after a set interval (two minutes by default). If your script times out, you can use AppleScript’s with timeout statement to change how long AppleScript waits. More information on the with timeout statement can be found in the AppleScript documentation.
Lengthy commands also make UnicodeChecker use a lot of memory; the requirement may sometimes reach 1 GB or more. To reduce the time and memory requirements of your script, try restricting queries to smaller code point sets (e.g. individual planes or blocks). If you need to query the whole codespace, it may be better to use a repeat statement to iterate over all planes and query each plane individually inside the loop.
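The advice above can be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes that each plane exposes its code points as elements, that the 17 planes can be addressed as plane id 0 through plane id 16, and that a 600-second timeout per command is long enough on your machine. All three are assumptions, so check them against UnicodeChecker’s dictionary before relying on them.

```applescript
-- Sketch: query each plane separately instead of the whole codespace.
-- Assumptions: planes contain code point elements; plane IDs run from 0 to 16.
set allMatches to {}

-- The timeout applies to each command sent inside the block.
-- 600 seconds is an illustrative value, not a recommendation.
with timeout of 600 seconds
	tell application "UnicodeChecker"
		repeat with planeID from 0 to 16
			-- One smaller query per plane keeps each command shorter.
			set planeMatches to (every code point of plane id planeID whose unicode name contains "latin")
			set allMatches to allMatches & planeMatches
		end repeat
	end tell
end timeout

count of allMatches
```

Because the timeout is applied per command, splitting the query by plane means that each of the 17 smaller queries gets its own time allowance, rather than one long query having to finish within a single interval.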
Class code point
- Keep in mind that ID and index are different things for code points. While the ID of a code point is identical to the actual code point value, the index is the code point value plus 1. For example, the first code point (index 1) is code point id 0.
- Be careful when using 'whose' clauses to filter code points in very large sets (such as the application’s set). The script will probably time out while UnicodeChecker evaluates the clause, and UnicodeChecker will use a great deal of RAM (possibly more than 1 GB).
Class plane
As with code points, be careful not to confuse ID and index when referring to individual planes: plane id 0 is identical to plane 1, and so on.
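The difference between ID and index for both classes can be illustrated with a short script. This is a minimal sketch that uses only terms mentioned in these notes (code point, plane, id and unicode name). The value 65 (U+0041) is an arbitrary illustrative choice, and the script makes no assumption about what a plane reference returns.

```applescript
-- Sketch: the ID is the code point (or plane) value, the index is that value plus 1.
tell application "UnicodeChecker"
	-- U+0041 has the code point value 65.
	set nameByID to unicode name of code point id 65
	set nameByIndex to unicode name of code point 66 -- index = value + 1

	-- The first plane, addressed by ID and by index.
	set planeByID to plane id 0
	set planeByIndex to plane 1
end tell

-- According to the notes above, both names should belong to the same code point.
nameByID is equal to nameByIndex
```

Addressing objects by ID is usually the less error-prone choice, since the ID matches the value you would look up in a Unicode chart.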
Unicode Strings in AppleScript on Mac OS X 10.6 and Later
As of Mac OS X 10.6 (Snow Leopard), AppleScript Editor handles Unicode strings correctly.
Unicode Strings in AppleScript on Mac OS X 10.5 and Earlier
On Mac OS X 10.5 (Leopard) and earlier, using Unicode strings in AppleScript is not trivial. For example, when you enter set theText to "⅋" in Script Editor, it is replaced by set theText to "?". As a workaround, you can enter the hexadecimal UTF-16 or UTF-8 codes to produce Unicode strings:
- set theText to («data utxt214B» as Unicode text)
- set theText to («data utf8E2858B» as Unicode text)
There is a caveat: although AppleScript also accepts set theText to «data utxt214B» without the coercion as Unicode text, the resulting string differs between Intel and PowerPC Macs. This is due to the different endianness of the two processor families (Intel is little-endian, PowerPC is big-endian). It is therefore a good idea to always use («data utf8…» as Unicode text), which produces identical strings on both architectures.
Are you using AppleScript?
We are interested in how you use UnicodeChecker's AppleScript support. So drop us a line and let us know which AppleScript features you find particularly useful and which you'd like to see in future versions.