BabelStone Blog

Monday, 13 June 2011

What's new in Unicode 6.1 ?

[2012-02-01 Update: Unicode 6.1.0 was released on 31 January 2012.]

Unicode 6.1 is scheduled for release in Spring 2012, and will be synchronized to the 3rd edition of ISO/IEC 10646 (see Unicode Liaison Report to WG2). Confusingly, the 3rd edition is actually the 5th iteration of the ISO/IEC 10646 standard, but it is the 3rd edition of the combined one-part standard first published in 2003 that superseded the original two-part standard (Part 1: Architecture and Basic Multilingual Plane; Part 2: Supplementary Planes) first published in 1993 (see Unicode and ISO/IEC 10646 for more details on the relationship between the Unicode and ISO/IEC 10646 standards). The first combined edition published in 2003 (corresponding to Unicode 4.0) underwent eight amendments in as many years, adding 41 new scripts, 84 new blocks, and 13,002 new characters (see How many Unicode characters are there ?), before a second edition (corresponding to Unicode 6.0) was published earlier this year. Due to technical issues with the CJK-B fonts, the CJK-B code chart in the second edition was printed in single-column format rather than the multi-column format used for the other CJK blocks. In order to rectify this deficiency, a third edition will be published straight away (instead of first publishing a series of amendments to the second edition).

The 3rd edition of ISO/IEC 10646 has already completed two rounds of balloting, and will undergo one final (FDIS) ballot later this year, before being published sometime next year. The character repertoire, code points and character names are now stable, and highly unlikely to change before publication. Unicode 6.1 will correspond to the repertoire of this 3rd edition of ISO/IEC 10646.

The 3rd edition of ISO/IEC 10646 has 733 new characters compared with the 2nd edition, but as one of these characters was fast-tracked into Unicode 6.0 (U+20B9 ₹ Indian Rupee Sign), Unicode 6.1 will include a total of 732 new characters, including seven new scripts, as detailed below. This will mean that Unicode 6.1 comprises a total of 110,116 graphic and format characters.

The final 3rd edition code charts are not yet ready, but an earlier version of the code charts showing the new additions (with some characters that have since been removed) is available.

New Scripts

Unicode 6.1 includes the following seven scripts, which are all encoded in the Supplementary Multilingual Plane (SMP). The Basic Multilingual Plane (BMP) is now almost full, and it is unlikely that any new scripts will be encoded in the BMP.

Meroitic Hieroglyphs {10980..1099F} : 32 characters for the 'monumental' form of the Meroitic script that was derived from Egyptian hieroglyphs [N3665]
Meroitic Cursive {109A0..109FF} : 26 characters for the 'cursive' form of the Meroitic script that was derived from Egyptian Demotic (40 fraction and number characters have been removed from the proposed repertoire pending further study) [N3665]
Sora Sompeng {110D0..110FF} : 35 characters for the Sora Sompeng script used in India [N3647]
Chakma {11100..1114F} : 67 characters for the Chakma script used in Bangladesh and India [N3645]
Sharada {11180..111DF} : 83 characters for the Śāradā script which was the principal inscriptional and literary script of Kashmir from the 8th through 20th centuries, but which is now virtually obsolete [N3595]
Takri {11680..116CF} : 66 characters for the Takri script that was used for writing the Dogri language of Kashmir until the 1940s [N3758]
Miao {16F00..16F9F} : 133 characters for the Old Miao script that was devised by Samuel Pollard during the early 20th century [N3761, N3789, N3877]
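
The {start..end} notation used in this post gives each block's first and last code point in hexadecimal. As a rough illustration only, the following Python sketch parses those ranges for the seven new scripts, derives the plane number from the start of each range, and compares each block's capacity with the number of characters listed above. The names NEW_SCRIPTS and describe are made up for this example and are not part of any Unicode data file or library.

```python
# Block ranges and character counts copied from the list above.
NEW_SCRIPTS = {
    "Meroitic Hieroglyphs": ("10980..1099F", 32),
    "Meroitic Cursive": ("109A0..109FF", 26),
    "Sora Sompeng": ("110D0..110FF", 35),
    "Chakma": ("11100..1114F", 67),
    "Sharada": ("11180..111DF", 83),
    "Takri": ("11680..116CF", 66),
    "Miao": ("16F00..16F9F", 133),
}


def describe(block_range, assigned):
    """Return (plane, capacity, unassigned) for a 'START..END' hex range."""
    start_hex, end_hex = block_range.split("..")
    start, end = int(start_hex, 16), int(end_hex, 16)
    plane = start >> 16          # each plane holds 0x10000 code points
    capacity = end - start + 1   # number of code points in the block
    return plane, capacity, capacity - assigned


for name, (block_range, assigned) in NEW_SCRIPTS.items():
    plane, capacity, free = describe(block_range, assigned)
    print(f"{name}: plane {plane}, {assigned}/{capacity} code points used, {free} free")
```

Plane 1 is the SMP, and every entry reports plane 1. The Meroitic Hieroglyphs block is completely filled by its 32 characters, whereas the other blocks leave some code points free.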

Funerary stele with Meroitic Hieroglyphic inscription [CC-BY-SA-3.0 by Piero d'Houin dit Triboulet]

New Blocks

Unicode 6.1 also includes four new blocks for extensions to existing scripts and for symbols:

Arabic Extended-A {08A0..08FF} : 39 characters (9 letters for African languages, 15 characters for Rohingya, 4 Koranic annotation signs, 11 vowel signs for African and Philippine languages) [N3791, N3816, N3882]
Sundanese Supplement {1CC0..1CCF} : 8 punctuation marks used in old Sundanese manuscripts [N3666]
Meetei Mayek Extensions {AAE0..AAFF} : 23 characters used in historical orthographies of Meetei Mayek, and which are not defined for modern use by the Manipuri Government [N3206, N3470, N3478]
Arabic Mathematical Alphabetic Symbols {1EE00..1EEFF} : 143 characters used in Arabic mathematical expressions [N3799]

Additions to Existing Blocks

Armenian {0530..058F} : 1 character (U+058F Armenian Dram Sign) [N3771]
Arabic {0600..06FF} : 1 character (U+0604 Arabic Sign Samvat) [N3734]
Gujarati {0A80..0AFF} : 1 character (U+0AF0 Gujarati Abbreviation Sign) [N3764]
Lao {0E80..0EFF} : 2 letters for Khmu [N3893]
Georgian {10A0..10FF} : 5 letters for Ossetian and Abkhaz [N3775]
Sundanese {1B80..1BBF} : 9 characters for historic usage [N3666]
Vedic Extensions {1CD0..1CFF} : 4 characters [N3844, N3861, N3881]
Miscellaneous Mathematical Symbols-A {27C0..27EF} : 2 diagonal bar symbols [N3763]
Coptic {2C80..2CFF} : 2 letters for the Bohairic dialect [N3873]
Georgian Supplement {2D00..2D2F} : 2 letters for Ossetian [N3775]
Tifinagh {2D30..2D7F} : 2 letters for Tuareg [N3870]
Supplemental Punctuation {2E00..2E7F} : 10 characters (8 historic punctuation marks, and 2 em dashes) [N3664, N3740, N3770]
CJK Unified Ideographs {4E00..9FFF} : 1 character (U+9FCC = Adobe-Japan1-6 CID+20156, a variant of U+6DBC 涼) [N3885]
Cyrillic Extended-B {A640..A69F} : 9 characters for medieval Church Slavonic manuscripts [N3748]
Latin Extended-D {A720..A7FF} : 5 letters (including the Cambrian symbol (U+A792), but excluding middle dot letter, which was again removed at the request of the US) [N3840, N3846]
CJK Compatibility Ideographs {F900..FAFF} : 2 characters (U+FA2E and U+FA2F) [N3747]
Enclosed Alphanumeric Supplement {1F100..1F1FF} : 2 characters (marque de commerce and marque déposée signs used in Canada) [N3860]
Miscellaneous Symbols and Pictographs {1F300..1F5FF} : 4 Orthodox typikon symbols [N3772]
Emoticons {1F600..1F64F} : 13 more emoticons (Grinning Face, Expressionless Face, Confused Face, Kissing Face, Kissing Face with Smiling Eyes, Face with Stuck-Out Tongue, Worried Face, Frowning Face with Open Mouth, Anguished Face, Grimacing Face, Face with Open Mouth, Hushed Face, Sleeping Face) [N3790]
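
As a cross-check on the figure of 732 new characters quoted earlier, the per-block counts in the three lists above can simply be added up. The snippet below is only a tally of the numbers given in this post (the variable names are arbitrary), not an analysis of the Unicode Character Database.

```python
# Character counts copied from the lists in this post.
new_scripts = [32, 26, 35, 67, 83, 66, 133]   # seven new scripts
new_blocks = [39, 8, 23, 143]                 # four new blocks
additions = [1, 1, 1, 2, 5, 9, 4, 2, 2, 2, 2, 10, 1, 9, 5, 2, 2, 4, 13]

total = sum(new_scripts) + sum(new_blocks) + sum(additions)
print(sum(new_scripts), sum(new_blocks), sum(additions), total)
# prints: 442 213 77 732
```

The total agrees with the 733 new characters of the 3rd edition of ISO/IEC 10646 less the one character (U+20B9) that was already included in Unicode 6.0.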

Other Changes

Formal aliases will be defined for the following three characters: the misnamed U+2118, and two Magnetic Ink Character Recognition (MICR) symbols used on cheques by banks that were inadvertently given each other's names when encoded twenty years ago:

U+2118 ℘ SCRIPT CAPITAL P will be given the formal alias WEIERSTRASS ELLIPTIC FUNCTION
U+2448 ⑈ OCR DASH will be given the formal alias MICR ON US SYMBOL
U+2449 ⑉ OCR CUSTOMER ACCOUNT NUMBER will be given the formal alias MICR DASH SYMBOL

Once assigned, character names may not be changed, so formal aliases are a mechanism for ameliorating problems caused by woefully misnamed characters, and processes are encouraged to use formal aliases in place of the official character names in user interfaces. Only a handful of characters have been assigned formal aliases, and the above are the first new formal aliases to be defined since formal aliases were introduced in Unicode 5.0 (July 2006). Formal aliases are only assigned in rare cases where there is a typographical error in the name (e.g. "bracket" misspelled as "brakcet") or where the name is confusingly wrong ("Yi Syllable Wu" is a syllable iteration mark, not the syllable wu), and are not assigned in cases where a character name is merely suboptimal or where there is academic dispute about the transliteration or naming conventions used. See Unicode Character Names Part 3 for more details about formal aliases.
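
For readers who want to see how a formal alias behaves in practice, here is a small Python sketch. It assumes Python 3.3 or later, with a bundled unicodedata module based on Unicode 6.1 or later; in such versions unicodedata.lookup accepts formal aliases, while unicodedata.name still returns the original, unchangeable character name. The list name ALIASES is just a label chosen for this example.

```python
import unicodedata

# The three formal aliases listed above.
ALIASES = [
    "WEIERSTRASS ELLIPTIC FUNCTION",
    "MICR ON US SYMBOL",
    "MICR DASH SYMBOL",
]

for alias in ALIASES:
    char = unicodedata.lookup(alias)  # an alias resolves to the same character
    # name() reports the official name, not the alias
    print(f"U+{ord(char):04X} {unicodedata.name(char)} <- {alias}")
```

A user interface that wants to show the corrected names therefore has to consult the alias data itself, since the official name is what the standard lookup in the other direction returns.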

Unicode 6.1 Fonts

The following are some free or shareware fonts that already (prematurely) include some of the characters that will be added in Unicode 6.1:

BabelStone Han (covers the one new CJK unified ideograph and the two new CJK compatibility ideographs)
Everson Mono (covers various 6.1 additions for Armenian, Georgian, Georgian Supplement, Tifinagh, Supplemental Punctuation, Cyrillic Extended-B, and Latin Extended-D)
Symbola (covers the additions for Miscellaneous Mathematical Symbols-A, Supplemental Punctuation, Miscellaneous Symbols and Pictographs, and Emoticons)

BabelMap for Unicode 6.1

A test version of BabelMap Online supporting Unicode 6.1 is now available:

BabelMap Online for Unicode 6.1 Beta
