Saturday, 15 July 2006
Unicode 5.0 was finally released yesterday (although it won't be published in book form until later this year), several months after its originally anticipated release date (see What's New in Unicode 5.0 for a summary of what's new). This is a small triumph for me, as I am responsible for the introduction of one of the new scripts now covered by Unicode: the historic 'Phags-pa script that was used for writing Chinese, Mongolian and other languages during the 13th and 14th centuries (there is a worthwhile story here about the long and sometimes fraught passage from initial proposal to final encoding of the script, but it will have to wait for another day).
A new version of Unicode inevitably means the release of new versions of my flagship software products, BabelMap and BabelPad, and so I am pleased to announce that BabelMap version 5.0.0.1 is now available for download. Up until a few days ago a new Unicode 5.0 enabled version of BabelPad was also ready for release, but as usual I couldn't leave things well alone, and decided to add in just one more feature; and of course this feature required me to entirely disembowel the code, so that it is now in a wretched and lifeless state (as my friends in the programming fraternity know, I am a keen exponent of the art of eXtreme reFactoring) ... but hopefully BabelPad (with many great new features) will be released before the end of the month.
My number one question from new BabelMap users is "Why is such-and-such a character displayed as a little square box?" or "Why doesn't BabelMap support such-and-such a script?" The reason for such questions is almost invariably that the characters they want to see are not available in the default font that BabelMap uses when it is first started (Tahoma), and they do not realise that they have to select an appropriate font to see a particular character. To me it is obvious that any given font only supports a particular subset of the Unicode repertoire (due to the 64K glyph limit for TrueType fonts, it is physically impossible for any font to cover the entire Unicode repertoire of 99,098 characters), and so you may need to select different fonts to display different characters; but for many people this is not at all evident. I have therefore changed BabelMap so that you can either select a single font to display all characters (good for seeing what a particular font covers) or use a user-defined virtual composite font, in which each Unicode block is mapped to a particular font on your system, so that different Unicode blocks are rendered using different fonts (good if you are more interested in characters than fonts). By default BabelMap will use a composite font when run for the first time, so that most characters in the BMP should display correctly if you are running Vista, and hopefully I should get fewer questions from new users about little square boxes.
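To make the composite-font idea a little more concrete, here is a minimal Python sketch of a per-block lookup table: each entry maps the code point range of a Unicode block to a font name, and a binary search finds the block containing a given character. The block ranges are real Unicode block boundaries, but the font assignments and the `font_for` function are hypothetical illustrations, not BabelMap's actual implementation or default mapping. Characters outside any listed block simply fall back to a single default font.

```python
import bisect

# Hypothetical composite font: (first code point, last code point, block name, font).
# Block ranges are genuine Unicode block boundaries; the font choices are illustrative only.
# Entries must be sorted by their first code point for the binary search to work.
COMPOSITE_FONT = [
    (0x0000, 0x007F, "Basic Latin", "Tahoma"),
    (0x0370, 0x03FF, "Greek and Coptic", "Tahoma"),
    (0x0900, 0x097F, "Devanagari", "Mangal"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs", "SimSun"),
    (0xA840, 0xA87F, "Phags-pa", "BabelStone Phags-pa Book"),
]

# Start of each block, used as the search key.
_STARTS = [start for start, _end, _block, _font in COMPOSITE_FONT]


def font_for(ch: str, fallback: str = "Tahoma") -> str:
    """Return the font assigned to the block containing ch, or fallback if none."""
    cp = ord(ch)
    # Index of the last block whose start is <= cp.
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0:
        start, end, _block, font = COMPOSITE_FONT[i]
        if start <= cp <= end:
            return font
    # cp lies in a gap between listed blocks (e.g. Arabic in this toy table).
    return fallback


if __name__ == "__main__":
    for ch in ("A", "\u0915", "\u4e2d", "\ua840", "\u0628"):
        print(f"U+{ord(ch):04X} -> {font_for(ch)}")
```

A real composite font would list every block (or at least every block the user cares about), but the lookup stays the same: one sorted table and one binary search per character.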
My OpenType Analysis Tool is still half-finished, and with no time to work on it, it won't be available for some time yet.
I am also planning to add in the ability to take a picture of a character as rendered using the selected font, which will be made available to the clipboard as a bitmap image ... useful if you want to display a character on a web page in situations where you doubt that the end user will have an appropriate font.
P.S. You may notice that I have done away with the arbitrary version numbering system that I previously employed (which never got beyond version 1 and never would have), and replaced it with a four-digit version number that is linked to the version of Unicode that the particular release of BabelMap/BabelPad supports. The first three digits of the version number now correspond to the Unicode version supported, and the last digit is the release number of BabelMap/BabelPad for that version of Unicode. Thus, the new release of BabelMap is version 5.0.0.1, as it is the first release supporting Unicode 5.0.0.
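The numbering scheme can be illustrated with a small Python sketch that splits a version string into the Unicode version it targets and the release number for that Unicode version. `BabelVersion` and `parse_version` are hypothetical names invented for this illustration; BabelMap and BabelPad are not known to expose anything like them.

```python
from typing import NamedTuple


class BabelVersion(NamedTuple):
    unicode_version: str  # first three components, e.g. "5.0.0"
    release: int          # last component: nth release for that Unicode version


def parse_version(version: str) -> BabelVersion:
    """Split a four-part version such as "5.0.0.1" into (Unicode version, release)."""
    parts = version.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {version!r}")
    return BabelVersion(".".join(parts[:3]), int(parts[3]))


if __name__ == "__main__":
    v = parse_version("5.0.0.1")
    print(f"Unicode {v.unicode_version}, release {v.release}")
```

Under this scheme a second BabelMap release for Unicode 5.0 would be 5.0.0.2, and the first release for a later Unicode version would reset the last component to 1.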
Following hot on the heels of the announcement of the release of the Unicode 5.0 character database (but not the publication of the actual Unicode 5.0 standard) on 2006-07-14 comes the notice of publication of the corresponding ISO/IEC 10646:2003 Amendment 2, two weeks earlier (on 2006-07-01).
It's a bit of a chicken-and-egg relationship between Unicode and ISO/IEC 10646, further confused by the fact that although ISO/IEC 10646 Amd.2 was published before Unicode 5.0, Unicode 5.0 includes four characters (U+097B, U+097C, U+097E and U+097F) from ISO/IEC 10646 Amd.3, which won't be published until next year ... along with Unicode 5.1. And by that time we'll be well into the work on Amd.4 (corresponding to Unicode 5.2 or 6.0), which should finally include Egyptian Hieroglyphs (or at least the Gardiner subset).