BabelStone Blog

Friday, 1 January 2016

What's new in Unicode 9.0 ?

Unicode version 9.0 is scheduled for release in June 2016. The final repertoire is now fixed, and 7,500 characters (including 72 emoji) will be added to Unicode 9.0. This will bring the total number of graphic and format characters in the Unicode Standard to 128,172 characters (in case you are concerned that Unicode is running out of space, that still leaves room for another 846,293 characters to be encoded). In summary, Unicode 9.0 wil include 11 new blocks (named ranges of characters) and cover 6 new scripts (Osage, Newa, Bhaiksuki, Marchen, Tangut, and Adlam), making a total of 270 blocks and 135 scripts.

Emoji

74 Emoji characters have been accepted for encoding in Unicode 9.0. However, two of these characters have been de-emojified at the request of Apple: U+1F946 RIFLE (representing Shooting or Hunting) and U+1F93B MODERN PENTATHLON (which includes Pistol Shooting as one of its disciplines) will have no Unicode properties to suggest that they are emoji. So the two characters will still be encoded in Unicode 9.0, but as plain symbols not as emoji characters; and it is unlikely that any major vendors will implement them as emoji.

Provisional Code Point	Provisional Character Name	Source
U+1F57A	MAN DANCING Encoded to match U+1F483 💃 DANCER (typically implemented as a female dancer)	L2/15-054
U+1F5A4	BLACK HEART unequivocally represented as black in all variants Encoded because there is a need for a black-coloured heart emoji, and U+2764 ❤ HEAVY BLACK HEART is typically implemented as a red heart	L2/15-054
U+1F6D1	OCTAGONAL SIGN stop sign	L2/15-054
U+1F6D2	SHOPPING TROLLEY shopping cart	L2/15-195
U+1F6F4	SCOOTER	L2/15-054
U+1F6F5	MOTOR SCOOTER	L2/15-054
U+1F6F6	CANOE	L2/15-196
U+1F919	CALL ME HAND	L2/15-054
U+1F91A	RAISED BACK OF HAND	L2/15-054
U+1F91B	LEFT-FACING FIST	L2/15-054
U+1F91C	RIGHT-FACING FIST	L2/15-054
U+1F91D	HANDSHAKE	L2/15-054
U+1F91E	HAND WITH INDEX AND MIDDLE FINGERS CROSSED	L2/15-054
U+1F920	FACE WITH COWBOY HAT	L2/15-054
U+1F921	CLOWN FACE	L2/15-054
U+1F922	NAUSEATED FACE	L2/15-054
U+1F923	ROLLING ON THE FLOOR LAUGHING	L2/15-054
U+1F924	DROOLING FACE	L2/15-054
U+1F925	LYING FACE	L2/15-054
U+1F926	FACE PALM	L2/15-054
U+1F927	SNEEZING FACE gesundheit	L2/15-195
U+1F930	PREGNANT WOMAN	L2/15-054
U+1F933	SELFIE typically used with face or human figure	L2/15-054
U+1F934	PRINCE Encoded to match U+1F478 👸 PRINCESS	L2/15-054
U+1F935	MAN IN TUXEDO groom Encoded to match U+1F470 👰 BRIDE WITH VEIL	L2/15-054
U+1F936	MOTHER CHRISTMAS Mrs. Claus Encoded to match U+1F385 🎅 FATHER CHRISTMAS	L2/15-054
U+1F937	SHRUG	L2/15-054
U+1F938	PERSON DOING CARTWHEEL gymnastics	L2/15-196
U+1F939	JUGGLING	L2/15-195
U+1F93A	FENCER fencing	L2/15-196
U+1F93B	MODERN PENTATHLON NOT AN EMOJI (see above)	L2/15-196
U+1F93C	WRESTLERS wrestling	L2/15-196
U+1F93D	WATER POLO	L2/15-196
U+1F93E	HANDBALL	L2/15-196
U+1F940	WILTED FLOWER wiltered rose	L2/15-054
U+1F941	DRUM WITH DRUMSTICKS	L2/15-195
U+1F942	CLINKING GLASSES	L2/15-054
U+1F943	TUMBLER GLASS typically shown with iced drink whisky	L2/15-195
U+1F944	SPOON	L2/15-195
U+1F945	GOAL NET	L2/15-196
U+1F946	RIFLE marksmanship, shooting (Olympic sport) hunting NOT AN EMOJI (see above)	L2/15-196
U+1F947	FIRST PLACE MEDAL gold medal	L2/15-196
U+1F948	SECOND PLACE MEDAL silver medal	L2/15-196
U+1F949	THIRD PLACE MEDAL bronze medal	L2/15-196
U+1F94A	BOXING GLOVE boxing	L2/15-196
U+1F94B	MARTIAL ARTS UNIFORM judo and other martial arts	L2/15-196
U+1F950	CROISSANT	L2/15-054
U+1F951	AVOCADO	L2/15-054
U+1F952	CUCUMBER	L2/15-054
U+1F953	BACON	L2/15-054
U+1F954	POTATO	L2/15-054
U+1F955	CARROT	L2/15-054
U+1F956	BAGUETTE BREAD french bread	L2/15-195
U+1F957	GREEN SALAD	L2/15-195
U+1F958	SHALLOW PAN OF FOOD paella, casserole	L2/15-195
U+1F959	STUFFED FLATBREAD döner kebab, falafel, gyro, shawarma	L2/15-195
U+1F95A	EGG	L2/15-267
U+1F95B	GLASS OF MILK	L2/15-267
U+1F95C	PEANUTS	L2/15-267
U+1F95D	KIWIFRUIT	L2/15-267
U+1F95E	PANCAKES	L2/15-267
U+1F985	EAGLE	L2/15-054
U+1F986	DUCK	L2/15-054
U+1F987	BAT	L2/15-054
U+1F988	SHARK	L2/15-054
U+1F989	OWL	L2/15-054
U+1F98A	FOX FACE	L2/15-054
U+1F98B	BUTTERFLY	L2/15-195
U+1F98C	DEER	L2/15-195
U+1F98D	GORILLA	L2/14-092 L2/15-195
U+1F98E	LIZARD	L2/15-195
U+1F98F	RHINOCEROS	L2/15-195
U+1F990	SHRIMP	L2/15-267
U+1F991	SQUID	L2/15-267

NB The above code points and character names are subject to change, and should not be relied on at this point in time.

Sources

L2/15-054 Emoji Subcommittee, Emoji Additions: Animals, Compatibility, and More Popular Requests (2015-05-21)
L2/15-195 Emoji Subcommittee, Emoji Additions Tranche 6: More Popular Requests and Gap Filling (2015-07-28)
L2/15-196 Hiroyuki Komatsu, Proposal to add more sports-related emoji characters (2015-07-31)
L2/15-267 Hiroyuki Komatsu, Proposal to add more food emoji characters (2015-11-05)

These characters are currently under ISO ballot for inclusion in ISO/IEC 10646:2016 (5th ed.) (see WG2 N4705 pages 130, 131, 135, and 137–138). Most of the 8,514 characters in this document will feed into Unicode version 10.0 in June 2017, but due to the urgent need of netizens to be able to use new emoji at the earliest possible date, the Unicode Technical Committee (UTC) has a habit (policy?) of fast-tracking emoji characters into the Unicode standard out of synchronization with the corresponding ISO standard (ISO/IEC 10646). On January 26 these 74 emoji characters were authorized for inclusion in the Unicode 9.0 beta, and unless any national bodies have strong and compelling objections to any of these emoji characters in the current CD ballot (which closes 29 February 2016), then these 74 emoji characters will definitely be in Unicode 9.0. A final decision will be made when the UTC meets in early May 2016.

In the end, at the UTC meeting in May 2016, the UTC decided to only accept 72 emoji characters. At the request of Apple (in response to several well-publicized emoji gun incidents, and a campaign against adding more violent emoji to Unicode), U+1F946 RIFLE and U+1F93B MODERN PENTATHLON (which includes shooting as one of its disciplines) were de-emojified, and will be encoded in Unicode 9.0 as plain non-emoji symbols. Of course, people can still use U+1F946 🥆 RIFLE (or various combinations of the letters A-Z, and many other Unicode characters) to threaten other people in text messages, but the threats will not need to be taken seriously because the rifle character will not be displayed in colour (and it is quite likely that major vendors will not support this character at all in their fonts).

More Emoji to Look Forward to ...

Proposals by Jennifer 8. Lee and friends to encode emoji characters representing chopsticks, dumplings, fortune cookies, and Chinese takeout boxes were joyfully accepted by the shadowy Emoji subcommittee at the January UTC meeting, but they were submitted too late for inclusion in Unicode 9.0 — we can look forwrd to welcoming them into Unicode 10.0 in June 2017.

Alolita Sharma (@alolita) : #UTC146 Peter Edberg accepts #dumpling #chopsticks #fortunecookie #takeoutbox originals from emoji designer YiyingLu (25 January 2016)

It's Not All About Emoji !

Emoji make up 99% of the noise and hype surrounding Unicode 9.0, but they account for only 1% of the new characters.

7,227 of the 7,426 non-emoji characters to be added to Unicode 9.0 are included in ISO/IEC 10646:2014 (4th ed.) Amendment 2, and are highlighted in this document (along with one currency sign, nine CJK unified ideographs, 36 emoji characters, and 5 emoji modifier characters which were fast-tracked into Unicode 8.0). These characters have all been through at least two rounds of ISO technical ballots, and they are now stable (they cannot be moved, removed, or renamed). The remaining 199 characters are included in the Committe Draft for ISO/IEC 10646:2016 (5th ed.) (full draft is downloadable as N4446). This edition has not yet completed its two rounds of technical ballots by ISO national bodies, but the UTC has decided to fast-track the Adlam script, the Newa script, and Japanese TV symbols (in addition to the 74 emoji discussed above) into Unicode 9.0. It is not unusual for the UTC to fast-track urgently-required characters (such as currency symbols and emoji) into a version of Unicode before they have completed their final technical ballot, but it is unprecedented to fast-track complete scripts, especially when the first technical ballot has not yet completed.

Newa in particular has been a very difficult script to get encoded because of technical and political differences of opinion about what characters to include and the encoding model to use (see the long list of documents relating to Newa in the table below). As recently as the first ballot on the Committee Draft for ISO/IEC 10646 in August 2015 the UK national body expressed concerns over the encoding of murmured resonants as atomic characters (L2/15-262 p. 16), so the encoding of Newa cannot be considered to be uncontroversial. By fast-tracking Adlam and Newa into Unicode 9.0, the UTC has effectively stiffled any ISO national body opposition to the Newa repertoire that the UTC has agreed upon. The CD ballot for ISO/IEC 10646 closes 29 February 2016, which theoretically allows the UTC time to tweak (or even withdraw) any of the fast-tracked characters in response to ballot comments by ISO national bodies, but any requests to change the character repertoire, character positions or character names for Newa or Adlam in the final ISO technical ballot (DIS ballot) later this year will have to be rejected as the encoding of Newa and Adlam is already a fait accompli.

Fast-tracked characters from the ISO/IEC 10646 CD are marked ** in the tables below.

7,297 of the 7,500 new characters in Unicode 9.0 belong to six new scripts :

Osage [Osge] : An alphabet for the Osage language which was devised between 2004 and 2006 for use by the Osage Nation in the USA.
Newa [Newa] : A Brahmic script used in Nepal to write Newar (Nepal Bhasa).
Bhaiksuki [Bhks] : A Brahmic script used for writing Buddhist manuscripts and inscriptions in the region of northern India and Tibet during the 11th and 12th centuries.
Marchen [Marc] : A Brahmic script used for writing the extinct Zhang Zhung language in the Bon religious tradition in Tibet.
Tangut [Tang] : An ideographic script used by the Tangut people to write the extinct Tangut language in the Western Xia and in China (Yuan and Ming dynasties) during the 11th through 16th centuries.
Adlam [Adlm] : An alphabetic script devised by the brothers Ibrahima Barry and Abdoulaye Barry during the late 1980s, in order to represent the Fulani language.

Inscription in the Marchen script on the library of the Yungdrung Bon Monastery in Dolanji (Himachal Pradesh)

Photograph © Chris Hatchell

Of the 7,500 characters added to Unicode 9.0 (including the 74 emoji), 7,357 characters are included in 11 new blocks, and 143 characters are added to existing blocks, as detailed in the two tables below. The code points and character names for all these characters are now fixed, and will not be changed. Draft official Unicode data files are available here, and I have made a plain text list of all the new characters to be added to Unicode 9.0 available here.

Characters Added to New Blocks
Block Name	Range	Characters / Source Documents
Cyrillic Extended-C	1C80..1C8F	9 letters used in early Church Slavonic (1C80..1C88). Aleksandr Andreev, Yuri Shardt, and Nikita Simmons, "Proposal to Use Standardized Variation Sequences to Encode Church Slavonic Glyph Variants in Unicode" (2014-07-20) [L2/13-153] Aleksandr Andreev, Yuri Shardt, and Nikita Simmons, "Proposal to Encode Additional Cyrillic Characters used in Early Church Slavonic Printed Books" (2014-08-20) [WG2 N4607 \|\| L2/14-196]
Osage	104B0..104FF	72 letters for Osage: 36 uppercase letters (104B0..104D3) and 36 lowercase letters (104D8..104FB). Michael Everson, Herman Mongrain Lookout, and Cameron Pratt, "Preliminary proposal to encode the Osage script in the UCS" (2014-02-20) [WG2 N4548 \|\| L2/14-068] Michael Everson, Herman Mongrain Lookout, and Cameron Pratt, "Proposal to encode Latin characters for Osage in the UCS" (2014-07-30) [WG2 N4587 \|\| L2/14-175] Michael Everson, Herman Mongrain Lookout, and Cameron Pratt, "Final proposal to encode the Osage script in the UCS" (2014-09-21) [WG2 N4619 \|\| L2/14-214]
Newa **	11400..1147F	92 characters for Newa: 53 letters (11400..11434); 13 vowel signs (11435..11441); 7 other signs (11442..11448); an Om character (11449); a Siddhi character (1144A); 5 punctuation marks (1144B..1144F); 10 digits (11450..11459); a placeholder mark (1145B); and an insertion sign (1145D). Anshuman Pandey, "Preliminary Proposal to Encode the Prachalit Nepal Script in ISO/IEC 10646" (2011-05-03) [WG2 N4038 \|\| L2/11-152] Anshuman Pandey, "Preliminary Proposal to Encode the Newar Script in ISO/IEC 10646" (2012-02-29) [WG2 N4184 \|\| L2/12-003] Dev Dass Manandhar, Samir Karmacharya and Bishnu Chitrakar, "Proposal for the Nepālalipi script in the UCS" (2012-02-05) [WG2 N4322 \|\| L2/12-120] Ken Whistler, "On the encoding of the “Nepaalalipi” / “Newar” script" (2012-05-11) [L2/12-200] Dev Dass Manandhar, "Response to L2/12-200 “On the encoding of ‘Nepaalalipi’/‘Newar’ script”" (2012-07-21) [L2/12-244] Dev Dass Manandhar, "Ancillary materials on “breathy consonants” in “Nepaalalipi”" (2012-07-21) [L2/12-245] Iain Sinclair, "Letter in support of N4184 and encoding the Newar script in ISO/IEC 10646" (2012-10-22) [WG2 N4372 \|\| L2/12-336] Dev Dass Manandhar, Samir Karmacharya and Bishnu Chitrakar, "Proposal for the Nepaalalipi script in the UCS" (2012-10-29) [L2/12-349] Pat Hall, "Proposal to Encode Nepal Himalayish Scripts in ISO/IEC 10646" (2012-10-08) [WG2 N4347 \|\| L2/12-365] Deborah Anderson, "Comparison between Newar and Nepaalalipi proposals (L2/12‐003 and L2/12‐349)" (2012-11-08) [L2/12-390] Dev Dass Manandhar, Bishnu Chitrakar and Samir Karmacharya, "To Unicode Technical Committee (UTC)" (2013-01-28) [L2/13-029] Dev Dass Manandhar, Samir Karmacharya and Bishnu Chitrakar, "Proposal to Encode Nepaalalipi Script in ISO/IEC 10646" (2014-04-10) [L2/14-086] Deborah Anderson, "Comparison between Newar and Nepaalalipi proposals (L2/12‐003 and L2/14‐086)" (2014-09-23) [L2/14-220] Deborah Anderson, "Recommendations to UTC from Script Meeting in Nepal" (2014-10-06) [L2/14-253] Anshuman Pandey, "Response to the Recommendation for Nepalese Scripts in L2/14-253" (2014-10-21) [WG2 N4602 \|\| L2/14-258] Ken Whistler, "Rationale for Atomic Encoding of Murmured Resonants in Newa" (2014-10-27) [L2/14-281] Anshuman Pandey, "Specimen Showing Representation of Murmured Consonants in the Newar Script" (2014-10-28) [WG2 N4552 \|\| L2/14-290] Ken Whistler, "Towards a Consensus Encoding of Newa" (2014-12-04) [WG2 N4660 \|\| L2/14-285]
Mongolian Supplement	11660..1167F	13 head marks for Mongolian (11660..1166C). Aaron Bell, Greg Eck, Andrew Glass, and Andrew West, "Encoding Mongolian head letters" (2014-01-17) [L2/14-030] Aaron Bell, Greg Eck, Andrew Glass, and Andrew West, "Proposal to encode five Mongolian head marks" (2014-02-06) [WG2 N4542 \|\| L2/14-067] China, "Comments on N4542 Five Mongolian Head Marks" (2014-02-19) [WG2 N4547 \|\| L2/14-081] China, "A Letter to the Authors of N4542 (5 Birgas in Mongolian Block" (2014-09-23) [WG2 N4632 \|\| L2/14-240]
Bhaiksuki	11C00..11C6F	97 characters for Bhaiksuki: 46 letters (11C00..11C08, 11C0A..11C2E); 12 vowel signs (11C2F..11C36, 11C38..11C3B); 5 other signs (11C3C..11C40); 2 dandas (11C41..11C42); a word separator (11C43); 2 gap fillers (11C44..11C45); 10 decimal digits (11C50..11C59); 18 numbers (11C5A..11C6B); and a hundreds unit mark (11C6C). Anshuman Pandey and Dragomir Dimitrov, "Proposal to Encode the Bhaiksuki Script in ISO/IEC 10646" (2013-07-22) [WG2 N4469 \|\| L2/13-167] Anshuman Pandey and Dragomir Dimitrov, "Revised Proposal to Encode the Bhaiksuki Script in ISO/IEC 10646" (2013-10-27) [WG2 N4489 \|\| L2/13-194] Anshuman Pandey and Dragomir Dimitrov, "Revised Proposal to Encode the Bhaiksuki Script in ISO/IEC 10646" (2014-01-27) [L2/14-036] Anshuman Pandey and Dragomir Dimitrov, "Final Proposal to Encode the Bhaiksuki Script in ISO/IEC 10646" (2014-04-23) [WG2 N4573 \|\| L2/14-091]
Marchen	11C70..11CBF	68 characters for Marchen: 2 marks (11C70..11C71); 30 letters (11C72..11C8F); 29 subjoined letters (11C92..11CA7, 11CA9..11CAF); 5 vowel signs (11CB0..11CB4); and 2 other signs (11CB5..11CB6). Andrew West, "Proposal to encode the Marchen script in the SMP of the UCS" (2011-04-30) [WG2 N4032 \|\| L2/11-140] Andrew West, "Final proposal to encode the Marchen script in the SMP of the UCS" (2013-10-22) [WG2 N4491 \|\| L2/13-197]
Ideographic Symbols and Punctuation	16FE0..16FFF	1 iteration mark for Tangut (16FE0). See under Tangut.
Tangut	17000..187FF	6,125 Tangut ideographs (17000..187EC) [characters are named algorithmically based on their code point, as TANGUT IDEOGRAPH-hhhhh]. Richard Cook (UC Berkeley Script Encoding Initiative), "Proposal to encode Tangut characters in UCS Plane 1" (2007-05-09) [WG2 N3297 \|\| L2/07-143] [Multi Column Chart : WG2 N3297A \|\| L2/07-144] [Single Column Chart: WG2 N3297B \|\| L2/07-145] Richard Cook (UC Berkeley Script Encoding Initiative), "Tangut Proposal Code Chart Update" (2007-07-24) [L2/07-229] Richard Cook (UC Berkeley Script Encoding Initiative), "Tangut Background" (2007-09-01) [WG2 N3307 \|\| L2/07-289] China, "Response to UC Berkeley’s proposals on Tangut" (2007-09-16) [WG2 N3338 \|\| L2/07-301] Richard Cook, "Expert feedback on Chinese NB input on WG2/N3297 Tangut Encoding Proposal" (2007-09-17) [WG2 N3343 \|\| L2/07-302] UK, "Comments on N3297: Proposal to encode Tangut characters in UCS Plane 1 and Charts" (2008-04-19) [WG2 N3448 \|\| L2/08-175] China and US, "Comments on N3297: Proposal to encode Tangut characters in UCS Plane 1 and charts" (2008-04-22) [WG2 N3467 \|\| L2/08-187] Richard Cook, [Five column chart] (2008-07-15 / 2010-04-16) [WG2 N3822 \|\| L2/08-259] Richard Cook, "Single-Column Tangut Code Chart (using Column G font)" (2008-09-03) [L2/08-336] UK, "Review of Proposed Tangut Repertoire" (2008-09-01 (rev. 2008-09-07)) [WG2 N3496 \|\| L2/08-337] Michael Everson and Andrew West, "Expert Feedback on the proposed Tangut character set in PDAM 6.2" (2008-09-24) [WG2 N3498 \|\| L2/08-341] Richard Cook and Ken Lunde, "The UCS Tangut Repertory" (2008-10-10) [WG2 N3521 \|\| L2/08-349] China, "Response from Tangut scholars of China on the Tangut Unicode proposal" (2008-10-13) [N3539 \|\| L2/08-376] Erkki I. Kolehmainen, "Report from the Ad Hoc on Tangut" (2008-10-13) [N3541 \|\| L2/08-377] Michael Everson, Nathan Hill, Guillaume Jacques, Andrew West, Viacheslav Zaytsev, "Proposal for a revised Tangut character set for encoding in the SMP of the UCS" (2009-03-01) [WG2 N3577 \|\| L2/09-095] Michael Everson, Nathan Hill, Guillaume Jacques, Andrew West, Viacheslav Zaytsev, "Proposal for a revised Tangut character set for encoding in the SMP of the UCS" (2009-04-08) [WG2 N3577R \|\| L2/09-115] [Appendix A: WG2 N3577R-A \|\| L2/09-116] [Appendix B: WG2 N3577R-B \|\| L2/09-117] Deborah Anderson and Richard Cook, "Request for Tangut font and mappings from N3577 to Amendment 7 repertoire" (2009-03-04) [WG2 N3586] Peter Constable, "Tangut Ad-Hoc Meeting Report" (2009-04-20) [WG2 N3629 \|\| L2/09-169] China, Ireland, UK, "Final proposal for encoding the Tangut script in the SMP of the UCS" (2010-04-05) [WG2 N3797 \|\| L2/10-095] [Appendix A: WG2 N3797-A] [Appendix B: WG2 N3797-B] Deborah Anderson and Richard Cook, "Comments on Tangut proposal N3797" (2010-04-16) [WG2 N3821 \|\| L2/10-131] Deborah Anderson, "Tangut Ad hoc report" (2010-04-21) [WG2 N3833 \|\| L2/10-141] UK, "Report on Tangut Encoding" (2011-05-22) [WG2 N4033 \|\| L2/11-214] [Appendix A: WG2 N4033A \|\| L2/11-214] [Appendix B: WG2 N4033B \|\| L2/11-214] Michael Everson and Andrew West, "Tangut chart to supplement N4033 'Report on Tangut Encoding'" (2011-05-26) [WG2 N4083 \|\| L2/11-204] Richard Cook and Deborah Anderson, Script Encoding Initiative, UC Berkeley, "Comments on Tangut report N4033" (2011‐06‐01) [WG2 N4094] Andrew West, Viacheslav Zaytsev, Michael Everson, "Proposal to encode the Tangut script in the UCS" (2012-10-02) [WG2 N4325 \|\| L2/12-313] Michael Everson and Andrew West, "Code chart for Tangut ideographs and Tangut radicals" (2012-10-02) [WG2 N4327 \|\| L2/12-315] China, "Comments on N4325, 4326 and N4327 (Tangut)" (2012-10-20) [WG2 N4370] China, "Explanation on the Re-facture of Tangut Fonts" (2013-06-10) [WG2 N4455] Deborah Anderson, SEI, UC Berkeley, "Summary of Tangut meeting (Beijing, China)" (2013-12-10) [WG2 N4516 \|\| L2/13-241] Andrew West, Michael Everson, Han Xiaomang, Jia Changye, Jing Yongshi, Viacheslav Zaytsev, "Proposal to encode the Tangut script in the UCS" (2014-01-21) [WG2 N4522 \|\| L2/14-023] Andrew West, Michael Everson, Han Xiaomang, Jia Changye, Jing Yongshi, Viacheslav Zaytsev, "Code chart for the Tangut script" (2014-01-21) [WG2 N4525 \|\| L2/14-021] Andrew West, Viacheslav Zaytsev, Sun Bojun, Michael Everson, "Tangut glyph corrections" (2014-10-01) [WG2 N4588R2 \|\| L2/14-209] China, "Review of N4558R Tangut glyph corrections" (2014-09-29) [WG2 N4640] Deborah Anderson, "Ad Hoc Reports for Tangut and Khitan Large Script" (2014-09-29) [WG2 N4642 \|\| L2/14-246] Andrew West, Viacheslav Zaytsev, Michael Everson, "Discussion of Tangut character L2008-4148" (2014-12-01) [WG2 N4650 \|\| L2/14-301] Andrew West, Michael Everson, Viacheslav Zaytsev, "Review of Tangut repertoire in DAM ballot" (2015-07-16) [WG2 N4667 \|\| L2/15-175] China, "Reply to WG2N4650 and WG2N4667 on Tangut" (2015-10-13) [WG2 N4684 \|\| L2/15-279]
Tangut Components	18800..18AFF	755 Tangut radicals and character components (18800..18AF2). Michael Everson and Andrew West, "Proposal to encode Tangut Radicals and CJK Strokes in the UCS" (2008-09-01) [WG2 N3495 \|\| L2/08-335] Richard Cook and Deborah Anderson, "Comments on the Tangut radicals and strokes proposal (N3495 = L2/08‐335)" (2008-10-29) [L2/08-399] Andrew West, Viacheslav Zaytsev, Michael Everson, "Proposal to encode Tangut radicals in the UCS" (2012-10-02) [WG2 N4326 \|\| L2/12-314] Michael Everson and Andrew West, "Code chart for Tangut ideographs and Tangut radicals" (2012-10-02) [WG2 N4327 \|\| L2/12-315] Andrew West, Viacheslav Zaytsev, Sun Bojun, Michael Everson, "Proposal to encode Tangut radicals in the UCS" (2014-09-30) [WG2 N4636 \|\| L2/14-228]
Glagolitic Supplement	1E000..1E02F	38 combining letters for Glagolitic (1E000..1E006, 1E008..1E018, 1E01B..1E021, 1E023..1E024, 1E026..1E02A). Aleksandr Andreev, Heinz Miklas, and Yuri Shardt, "Proposal to Encode Combining Glagolitic Letters in Unicode" (2014-08-20) [WG2 N4608 \|\| L2/14-087] Ralph Cleminson and David Birnbaum, "Expert Feedback on L2/14-087 Proposal to Encode Additional Glagolitic Characters" (2014-04-27) [WG2 N4608 \|\| L2/14-103] Ralph Cleminson and David Birnbaum, "Additional Expert Feedback on L2/14‐087 Proposal to Encode Additional Glagolitic Characters" (2014-07-21) [WG2 N4608 \|\| L2/14-165]
Adlam **	1E900..1E95F	87 characters for Adlam: 34 uppercase letters (1E900..1E921); 34 lowercase letters (1E922..1E943); 7 marks (1E944..1E94A); 10 digits (1E950..1E959); and 2 punctuation marks (1E95E..1E95F). Michael Everson, "Preliminary proposal for encoding the Adlam script in the SMP of the UCS" (2012-10-28) [WG2 N4488 \|\| L2/13-191] Michael Everson, "Proposal for encoding the Adlam script in the SMP of the UCS" (2014-09-23) [WG2 N4628 \|\| L2/14-219]

Leaf from a Tangut Buddhist manuscript (Great Perfection of Wisdom Sutra)

Tang. 334/249

Characters Added to Existing Blocks
Block Name	Range	Characters / Source Documents
Arabic Extended-A	08A0..08FF	5 Arabic letters for Bravanese (08B6..08BA). Hamid Banafunzi, Marghani Banafunzi, and Maxamed Nuur, "Proposal to encode five Arabic script characters for the Bravanese (Chimiini)" (2014-08-31) [L2/13-178] Roozbeh Pournader, "Proposal to encode four Arabic characters for Bravanese" (2014-11-06) [WG2 N4498 \|\| L2/13-223] Roozbeh Pournader and Shervin Afshar, "Proposal to Encode Arabic Letter Teh with Small Teh Above for Bravanese" (2014-11-01) [L2/13-293] 3 Arabic letters for Warsh-based orthographies (08BB..08BD). Lorna Evans (SIL International), "Supporting the Warsh orthography for Arabic script" (2014-04-29) [L2/14-104] Lorna Evans (SIL International), "Proposal to encode Warsh‐based Arabic script characters" (2014-08-15) [WG2 N4597 \|\| L2/14-211] 15 Quranic marks used in Pakistani printing (08D4..08E2). Lateef Sagar Shaikh, "Proposal to encode Quranic marks used in Quran published in Pakistan" (2014-04-24) [L2/14-095] Lateef Sagar Shaikh, "Proposal to encode Quranic Alternate Dammatan used in Quran published in Pakistan" (2014-04-25) [L2/14-096] Roozbeh Pournader, "Proposal to encode fourteen Pakistani Quranic marks" (2014-07-27) [WG2 N4589 \|\| L2/14-105] Lateef Sagar Shaikh, "Proposal to encode Quranic mark Ar-Rub used in Quran published in Pakistan" (2014-08-11) [WG2 N4592 \|\| L2/14-148]
Kannada	0C80..0CFF	1 spacing candrabindu sign (0C80). Vinodh Rajan, "Proposal to encode Kannada Sign Spacing Candrabindu" (2014-07-18) [WG2 N4591 \|\| L2/14-153]
Malayalam	0D00..0D7F	3 chillu letters (0D54..0D56). Cibu Johny, "Proposal to encode MALAYALAM LETTER CHILLU LLL" (2013-05-15) [WG2 N4428 \|\| L2/13-063] Cibu Johny, "Proposal to encode MALAYALAM LETTER CHILLU M" (2014-01-08) [WG2 N4539 \|\| L2/14-013] Cibu Johny, "Proposal to encode MALAYALAM LETTER CHILLU Y" (2013-12-26) [WG2 N4539 \|\| L2/14-017] 1 para sign (0D4F). Cibu Johny, "Proposal to encode MALAYALAM SIGN PARA" (2014-01-16) [WG2 N4538 \|\| L2/14-016] 10 characters for fractions (0D58..0D5E, 0D76..0D78). Shriramana Sharma, "Proposal to encode Malayalam minor fractions" (2013-04-25) [WG2 N4429 \|\| L2/13-051]
Combining Diacritical Marks Supplement **	1DC0..1DFF	1 combining deletion mark for Newa (1DFB). Ken Whistler, "Towards a Consensus Encoding of Newa" (2014-11-07) [WG2 N4660 \|\| L2/14-285]
Miscellaneous Technical	2300..23FF	4 power button symbols (23FB..23FE). Terence Eden, Joe Loughry, and Bruce Nordman, "Proposal to Include IEC Power Symbols" (2014-02-14) [WG2 N4567 \|\| L2/14-009] Michael Everson, "Towards a proposal to encode power symbols in the UCS" (2014-02-04) [WG2 N4535 \|\| L2/14-059]
Supplemental Punctuation	2E00..2E7F	1 punctuation mark for Slavonic (2E43: DASH WITH LEFT UPTURN). Aleksandr Andreev, Yuri Shardt, and Nikita Simmons, "Proposal to Encode a Slavonic Punctuation Mark in Unicode" (2014-02-04) [WG2 N4534 \|\| L2/13-238] 1 suspension mark for Byzantine Greek (2E44: DOUBLE SUSPENSION MARK). Dumbarton Oaks (Joel Kalvesmaki), "Proposal to encode GREEK BYZANTINE DOUBLE SUSPENSION MARK" (2014-07-18) [WG2 N4595 \|\| L2/14-157]
Latin Extended-D	A720..A7FF	1 letter for Unifon (A7AE: LATIN CAPITAL LETTER SMALL CAPITAL I). Michael Everson, "Proposal to encode “Unifon” and other characters in the UCS" (2012-04-29) [WG2 N4262 \|\| L2/12-138] Michael Everson, "Revised proposal to encode Unifon characters in the UCS" (2014-02-24) [WG2 N4549 \|\| L2/14-070]
Saurashtra	A880..A8DF	1 candrabindu sign (A8C5). Vinodh Rajan, "Proposal to encode Saurashtra Sign Candrabindu" (2014-08-07) [WG2 N4590 \|\| L2/14-163]
Ancient Greek Numbers	10140..1018F	2 signs for ancient Greek (1018D..1018E). Dumbarton Oaks (Joel Kalvesmaki), "Proposal to encode GREEK BYZANTINE INDICTION SIGN" (2014-07-18) [WG2 N4596 \|\| L2/14-156] Dumbarton Oaks (Joel Kalvesmaki), "Proposal to encode GREEK BYZANTINE NOMISMA SIGN" (2014-07-18) [WG2 N4594 \|\| L2/14-158]
Khojki	11200..1124F	1 sukun sign for Arabic transliteration in the Khojki script (1123E). Anshuman Pandey, "Proposal to Encode the Khojki Sign SUKUN in ISO/IEC 10646" (2014-05-05) [WG2 N4575 \|\| L2/14-133]
Enclosed Alphanumeric Supplement **	1F100..1F1FF	18 Japanese TV symbols required for ARIB STD-B62 (1F19B..1F1AC). Japan National Body, "Proposal to include additional Japanese TV symbols to ISO/IEC 10646" (2015-07-23) [WG2 N4671 \|\| L2/15-238]
Enclosed Ideographic Supplement **	1F200..1F2FF	1 Japanese TV symbol required for ARIB STD-B62 (1F23B). Japan National Body, "Proposal to include additional Japanese TV symbols to ISO/IEC 10646" (2015-07-23) [WG2 N4671 \|\| L2/15-238]
Miscellaneous Symbols and Pictographs **	1F300..1F5FF	2 emoji (see top of post): 1F57A : MAN DANCING 1F5A4 : BLACK HEART
Transport and Map Symbols **	1F680..1F6FF	5 emoji (see top of post): 1F6D1 : OCTAGONAL SIGN 1F6D2 : SHOPPING TROLLEY 1F6F4 : SCOOTER 1F6F5 : MOTOR SCOOTER 1F6F6 : CANOE
Supplemental Symbols and Pictographs **	1F900..1F9FF	67 emoji and emoticons (see top of post): 1F919 : CALL ME HAND 1F91A : RAISED BACK OF HAND 1F91B : LEFT-FACING FIST 1F91C : RIGHT-FACING FIST 1F91D : HANDSHAKE 1F91E : HAND WITH INDEX AND MIDDLE FINGERS CROSSED 1F920 : FACE WITH COWBOY HAT 1F921 : CLOWN FACE 1F922 : NAUSEATED FACE 1F923 : ROLLING ON THE FLOOR LAUGHING 1F924 : DROOLING FACE 1F925 : LYING FACE 1F926 : FACE PALM 1F927 : SNEEZING FACE 1F930 : PREGNANT WOMAN 1F933 : SELFIE 1F934 : PRINCE 1F935 : MAN IN TUXEDO 1F936 : MOTHER CHRISTMAS 1F937 : SHRUG 1F938 : PERSON DOING CARTWHEEL 1F939 : JUGGLING 1F93A : FENCER 1F93B : MODERN PENTATHLON 1F93C : WRESTLERS 1F93D : WATER POLO 1F93E : HANDBALL 1F940 : WILTED FLOWER 1F941 : DRUM WITH DRUMSTICKS 1F942 : CLINKING GLASSES 1F943 : TUMBLER GLASS 1F944 : SPOON 1F945 : GOAL NET 1F946 : RIFLE 1F947 : FIRST PLACE MEDAL 1F948 : SECOND PLACE MEDAL 1F949 : THIRD PLACE MEDAL 1F94A : BOXING GLOVE 1F94B : MARTIAL ARTS UNIFORM 1F950 : CROISSANT 1F951 : AVOCADO 1F952 : CUCUMBER 1F953 : BACON 1F954 : POTATO 1F955 : CARROT 1F956 : BAGUETTE BREAD 1F957 : GREEN SALAD 1F958 : SHALLOW PAN OF FOOD 1F959 : STUFFED FLATBREAD 1F95A : EGG 1F95B : GLASS OF MILK 1F95C : PEANUTS 1F95D : KIWIFRUIT 1F95E : PANCAKES 1F985 : EAGLE 1F986 : DUCK 1F987 : BAT 1F988 : SHARK 1F989 : OWL 1F98A : FOX FACE 1F98B : BUTTERFLY 1F98C : DEER 1F98D : GORILLA 1F98E : LIZARD 1F98F : RHINOCEROS 1F990 : SHRIMP 1F991 : SQUID

Previous Posts on Unicode Versions

What's new in Unicode 5.0 ? [November 2005]
What's new in Unicode 5.1 ? [June 2007]
What's new in Unicode 5.2 ? [April 2008]
What's new in Unicode 6.0 ? [November 2009]
What's new in Unicode 6.1 ? [June 2011]
What's new in Unicode 6.2 ? [May 2012]
What's new in Unicode 6.3 ? [January 2013]
What's new in Unicode 7.0 ? [October 2013]
What's new in Unicode 8.0 ? [April 2015]

Last modified: 2016-05-24

Tags:

Unicode

Index of BabelStone Blog Posts