Sunday, 27 April 2008
What's new in Unicode 5.2 ?
[2009-10-01 : Unicode 5.2 has now been released (Unicode Code Charts, BabelMap)]
As most of us are still trying to get to grips with Unicode 5.1, which was only released three weeks ago, it may seem a little premature to start talking about Unicode 5.2, but I'm blogging about it early this time because 5.2 promises to be a very important release of Unicode, with 6,648 new characters and a record 15 new scripts, including the long-awaited CJK Extension-C (4,149 characters) and major historical scripts such as Egyptian Hieroglyphs (1,071 characters). Tangut (5,910 characters) and Nüshu, the famous women's writing of southern China, were originally in Amd.6, but have since been removed for further study, and will not now be encoded until Unicode 6.0 at the earliest.
[This blog post has been updated several times since it was first published on 2008-04-27. The most recent update, on 2009-08-10, reflects the final repertoires of ISO/IEC 10646:2003 Amendments 5 and 6, which will be identical to the contents of Unicode 5.2 (Unicode 5.2 Code Charts).]
Unicode 5.2 will correspond to Amendments 5 and 6 of ISO/IEC 10646:2003 (see Unicode Liaison Report for WG 2 meeting 52). Both of these amendments have now completed their two rounds of technical balloting, so no more changes will be made to their character repertoires. Unicode 5.2 is expected to be released at the end of September 2009 (which, incidentally, will make it the first autumn release of a new Unicode version since 3.0 in September 1999).
Amendment 5 (5,611 characters)
Amendment 5 has now been published (December 2008), and can be downloaded for free from the ISO Publicly Available Standards site.
New Scripts
Other New Blocks
Additions to Existing Blocks
Glyph Changes
Amendment 5 will also introduce changes to the representative glyph shape used in the code charts for the following characters (the new glyphs are given in N3465) :
- 04A8 CYRILLIC CAPITAL LETTER ABKHASIAN HA
- 04A9 CYRILLIC SMALL LETTER ABKHASIAN HA
- 04BE CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER
- 04BF CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER
- 11EC HANGUL JONGSEONG IEUNG-KIYEOK
- 11ED HANGUL JONGSEONG IEUNG-SSANGKIYEOK
- 11EE HANGUL JONGSEONG SSANGIEUNG
- 11EF HANGUL JONGSEONG IEUNG-KHIEUKH
- 1680 OGHAM SPACE MARK
- 19D1 NEW TAI LUE DIGIT ONE
Amendment 6 (1,037 characters)
Amendment 6 has now completed its two rounds of technical balloting (PDAM and FPDAM ballots), and it will be published once it has passed its final FDAM ballot. No more technical changes can be made to the character repertoire, so the character names and code points in the Amd.6 Code Charts can be relied on.
New Scripts
- Bamum @ A6A0..A6FF (88 characters) [originally in Amd.5, but removed for further study, and now added back to Amd.6]
- Imperial Aramaic @ 10840..1085F (31 characters)
- Inscriptional Pahlavi @ 10B60..10B7F (27 characters)
- Inscriptional Parthian @ 10B40..10B5F (30 characters)
- Javanese @ A980..A9DF (91 characters)
- Kaithi @ 11080..110CF (66 characters, including two section marks)
- Lisu [aka Fraser alphabet] @ A4D0..A4FF (48 characters)
- Meetei Mayek @ ABC0..ABFF (56 characters) [23 historical characters have been removed for further study following objections from India to the encoding of historical characters for this script]
- Nushu [nüshu 女書 "women's script"] (389 characters) [removed for further study in light of concerns expressed by the UK]
- Old South Arabian [aka Sabaean] @ 10A60..10A7F (32 characters)
- Old Turkic [aka Orkhon-Yenisey] @ 10C00..10C4F (73 characters)
- Samaritan @ 0800..083F (61 characters)
- Tangut (5,910 characters) [moved to Amd.7 in light of concerns expressed by the UK, Ireland and Germany, as well as by various Tangut experts, and now removed from Amd.7 for further study]
Other New Blocks
Additions to Existing Blocks
- Bengali : Ganda mark (1 character : U+09FB)
- CJK Unified Ideographs : additions for HKSCS (5 characters : U+9FC7..U+9FCB)
- Combining Diacritical Marks Supplement : Phonetic diacritic for describing Khoisan languages (1 character : U+1DFD)
- Currency Symbols : Livre Tournois sign, Spesmilo sign and Tenge sign (3 characters : U+20B6..U+20B8)
- Cyrillic Supplement : Cyrillic capital and small letter Pe with Descender for Abkhazian (2 characters : U+0524..U+0525)
- Devanagari : Vedic Extensions (5 characters : U+0900, U+094E, U+0955, U+0979..U+097A)
- Dingbats : Heavy exclamation mark (1 character : U+2757)
- Enclosed CJK Letters and Months : Japanese TV Symbols (12 characters : U+3244..U+324F)
- Latin Extended-C : phonetic symbols (3 characters : U+2C70, U+2C7E..U+2C7F)
- Miscellaneous Symbols : Japanese TV Symbols and a Soccer Ball symbol (59 characters : U+269E..U+269F, U+26BD..U+26BF, U+26C4..U+26CD, U+26CF..U+26E1, U+26E3, U+26E8..U+26FF)
- Miscellaneous Symbols and Arrows : heavy symbols (5 characters : U+2B55..U+2B59)
- Miscellaneous Technical : decimal exponent symbol (1 character : U+23E8)
- Myanmar : combining marks for Khamti Shan (4 characters : U+109A..U+109D)
- New Tai Lue : High Sua, Low Sua and number sign (3 characters : U+19AA..U+19AB, U+19DA)
- Number Forms : Japanese TV Symbols (4 characters : U+2150..U+2152, U+2189)
- Phoenician : number signs (2 characters : U+1091A..U+1091B)
- Tibetan : religious symbols (4 characters : U+0FD5..U+0FD8)
- Unified Canadian Aboriginal Syllabics : Additions for Woods Cree and Blackfoot (10 characters : U+1400, U+1677..U+167F)
Unicode 5.2 Fonts
The following are some free or shareware fonts that include some of the characters added in Unicode 5.2:
- Aboriginal Serif / Aboriginal Sans Serif (covers all the new Unified Canadian Aboriginal Syllabics characters)
- Aegyptus (includes the 1,071 characters in the new Egyptian Hieroglyphs block [13000..1342F], as well as many as yet unencoded hieroglyphs and other characters in the Supplementary Private Use Area-A) [NB Under Windows 7, Egyptian hieroglyphs and all the other Unicode 5.2 characters in the Supplementary Multilingual Plane render as two .notdef glyphs in Notepad and most other Windows applications. This is due to a problem with the version of Uniscribe that ships with Windows 7, which supports Unicode 5.1 but is not forward compatible with Unicode 5.2]
- HanaMin (includes the eight new characters in the main CJK Unified Ideographs block [9FC4..9FCB], all 4,149 characters in the CJK-C block, the three new characters in the CJK Compatibility Ideographs block [FA6B..FA6D], most of the characters in the Enclosed Ideographic Supplement block, and the four new characters in the Enclosed CJK Letters and Months block)
- New Athena Unicode (includes the seven new Coptic characters in the range 2CEB..2CF1)
- LisuTzimu (covers the new Lisu block)
- Padauk (covers Myanmar Extended-A)
- Quivira (includes various new Latin, Cyrillic and Coptic characters, as well as some of the new currency signs, fraction signs and symbols)
- Tai Heritage Pro (covers Tai Viet)
- Tibetan Machine Uni (includes the four svasti signs at 0FD5..0FD8)
- UnBatang (includes the new characters in the Jamo block, and all the characters in the new Hangul Jamo Extended-A and Hangul Jamo Extended-B blocks)
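The Windows 7 note in the font list describes each supplementary-plane character appearing as two .notdef glyphs. One plausible way to see why it is *two* (an assumption on my part, not something the note itself states): Windows handles text as UTF-16, and any character beyond the Basic Multilingual Plane, such as U+13000 EGYPTIAN HIEROGLYPH A001, is stored as a surrogate pair of two 16-bit code units, so software that does not recognise the character may end up drawing one fallback glyph per code unit. The minimal Python sketch below just shows those code units; the helper names `utf16_code_units` and `surrogate_pair` are hypothetical, for illustration only.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text (big-endian, no BOM) as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Compute the high and low surrogates for a supplementary code point by hand."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000  # 20-bit offset into the supplementary planes
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# U+A6A0 BAMUM LETTER A is in the BMP: a single UTF-16 code unit.
print([f"{u:04X}" for u in utf16_code_units("\uA6A0")])      # ['A6A0']
# U+13000 EGYPTIAN HIEROGLYPH A001 is in the SMP: two code units.
print([f"{u:04X}" for u in utf16_code_units("\U00013000")])  # ['D80C', 'DC00']
print([f"{u:04X}" for u in surrogate_pair(0x13000)])          # ['D80C', 'DC00']
```

The hand-computed pair matches what Python's own UTF-16 encoder produces, which is a quick sanity check on the surrogate arithmetic.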
On Beyond Unicode 5.2