Utilities

UnicodeChecker contains a number of Utilities for displaying strings and converting them to and from various formats. To open them, choose the “New Utilities Window” command from the File menu, then use the pop-up menu in the window to select the Utility you want. Several Utility windows can be open at the same time if you need different conversions at once. UnicodeChecker comes with the following Utilities pre-installed:

Diff
Shows the differences between two strings. For display purposes, newline and some other whitespace characters are replaced by a space (U+0020). The hexadecimal values shown still reflect the original input strings.
Escape
Escapes strings to, or unescapes them from, CSS 1, CSS 2, CSS 2.1, C99, Java, or URL (UTF-8) percent escape codes. More information on Escaping.
HTML
Converts Unicode code points into the corresponding HTML entity representation, or the other way round. When converting to HTML entities, you can choose whether code points from the ASCII range (below U+0080) are converted as well; this only affects the ampersand ‘&’, less-than ‘<’, greater-than ‘>’, and double quote ‘"’ characters.
IDNA
This is an implementation of “Internationalized Domain Names In Applications (IDNA)”. See IDNA in UnicodeChecker Help for more information.
Length
Displays the number of code points in the input string, along with the number of code units and bytes required to represent the input in the UTF-8, UTF-16, and UTF-32 encodings. A code unit is 1 byte in UTF-8, 2 bytes in UTF-16, and 4 bytes in UTF-32.
Normalization
Normalizes any given string using the four Unicode Normalization Forms specified in Unicode Standard Annex #15. A disabled button labeled ‘Equal’ is displayed next to each normalized form that is identical to the input string. If a normalized form differs from the input string, a button labeled ‘Diff’ is displayed instead; it takes you to the ‘Diff’ utility so you can compare the two.
Split up
Any text you enter is split up into its Unicode code points. You can edit the contents (insert or remove characters) by double-clicking the table cells or by using drag and drop.
Unshredder
Checks whether the entered string may have been saved as UTF-8 and then erroneously read in another encoding. For example, “ç” is encoded in UTF-8 as the bytes C3 A7; read as Latin-1, these bytes decode as “Ã§”. The tool helps you find out which encodings could have been involved in such a mix-up and recover the original string.
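
The kind of mix-up the Unshredder looks for can be reproduced outside the application. The following Python sketch is only an illustration of the idea, not UnicodeChecker's actual implementation; the function names and the short list of candidate encodings are assumptions made for this example.

```python
# Illustrative only: the names and the candidate list below are assumptions
# for this sketch, not part of UnicodeChecker.
CANDIDATES = ("latin-1", "cp1252", "mac_roman")

def unshred(garbled: str, wrong_encoding: str = "latin-1") -> str:
    """Undo one round of 'UTF-8 bytes read as wrong_encoding'."""
    return garbled.encode(wrong_encoding).decode("utf-8")

def candidate_repairs(garbled: str) -> dict:
    """Try each candidate encoding and keep those that yield valid UTF-8."""
    results = {}
    for enc in CANDIDATES:
        try:
            results[enc] = garbled.encode(enc).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this encoding cannot explain the garbled text
    return results

original = "ç"
utf8_bytes = original.encode("utf-8")   # the two bytes C3 A7
garbled = utf8_bytes.decode("latin-1")  # misread as Latin-1: "Ã§"

print(utf8_bytes.hex().upper())
print(garbled)
print(unshred(garbled))
```

More than one candidate encoding may produce a valid result, so a sketch like this can only narrow down the possibilities; choosing the intended original string still requires judgment.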

In case these Utilities do not meet your needs, note that additional Utilities can be added to UnicodeChecker as plugins. More information for developers.
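
As a rough illustration of the figures the Length utility reports, the Python sketch below counts code points, code units, and bytes for a string. It is an assumption-laden example, not UnicodeChecker code; the function name `lengths` and the layout of its result are made up for this illustration.

```python
def lengths(text: str) -> dict:
    """Return code point count plus (code units, bytes) per encoding.

    Hypothetical helper for illustration. The little-endian codec variants
    are used so that no byte order mark is added to the output.
    """
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")
    utf32 = text.encode("utf-32-le")
    return {
        "code_points": len(text),               # Python 3 strings count code points
        "utf8": (len(utf8), len(utf8)),         # 1 byte per code unit
        "utf16": (len(utf16) // 2, len(utf16)), # 2 bytes per code unit
        "utf32": (len(utf32) // 4, len(utf32)), # 4 bytes per code unit
    }

# "a" (U+0061), "ç" (U+00E7), and an emoji outside the Basic Multilingual Plane
sample = "a\u00e7\U0001F600"
for name, value in lengths(sample).items():
    print(name, value)
```

The emoji in the sample lies outside the Basic Multilingual Plane, so it takes two UTF-16 code units (a surrogate pair) and four UTF-8 bytes, which is why the three encodings give different counts for the same three code points.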