Unicode version 13.0, released on 10 March 2020, introduces disunifications for nine Tangut characters ("ideographs" in Unicode terminology) which have subtle but semantically significant glyph differences. Seven of these disunifications relate to two Tangut components which Profs. Jiǎ Chángyè 賈常業 (Ningxia Academy of Social Sciences) and Jǐng Yǒngshí 景永時 (Beifang University of Nationalities) have identified as each actually being two distinct components. Tables 1 and 2 show example characters for the two distinct forms of Tangut Component 267 (U+1890A ~ U+18AFD) and Tangut Component 316 (U+1893B ~ U+18AFF): for each form, one example with the component on the left side and one example with the component on the right side.
Sources | (5 strokes) | | (3 strokes) | |
---|---|---|---|---|
Code point | U+17F8B | U+18740 | U+17FC5 | U+17E50 |
Sea of Writing 𘝞𗗚 (文海) | | | | |
Homophones 𗙏𘙰 (同音) | | | | |
Sources | (5 strokes) | | (4 strokes) | |
---|---|---|---|---|
Code point | U+18131 | U+18500 | U+18134 | U+18098 |
Sea of Writing 𘝞𗗚 (文海) | | | | |
Homophones 𗙏𘙰 (同音) | | | | |
When the Tangut script was first included in Unicode version 9.0 in June 2016, five pairs of Tangut characters with Component 267 and two pairs of Tangut characters with Component 316 were unified because their glyphs were identical in E. I. Kychanov's Tangut-Russian-English-Chinese dictionary (2006), Lǐ Fànwén's Tangut-Chinese Dictionary 夏漢字典 (2008), and other modern works of reference for the Tangut script. However, the differences between the two glyph forms of Components 267 and 316 are not just cosmetic, but are semantically significant, and are used to distinguish the seven unified pairs of characters in original printed texts and manuscripts from the Western Xia period. Table 3 shows these two components, and the now-disunified ideographs that include each component (see WG2 N4736 Tables 8 and 10 for complete lists of all Tangut ideographs with these components).
Existing Component | | | New Component | | |
---|---|---|---|---|---|
Code Point | Component | Characters | Code Point | Component | Characters |
1890A | | | 18AFD | | |
1893B | | | 18AFF | | |
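
The existing code points in Table 3 are consistent with a simple numbering rule: Component 267 sits at U+1890A and Component 316 at U+1893B, which is what one gets by counting from U+18800. The following minimal sketch illustrates that arithmetic. It assumes that Tangut components are encoded contiguously in numerical order starting at U+18800, and that 768 is the highest component number as of Unicode 13.0; the function names are hypothetical and chosen only for this example.

```python
TANGUT_COMPONENTS_START = 0x18800  # assumed code point of Tangut Component 1
LAST_COMPONENT = 768               # assumed highest component number in Unicode 13.0


def component_code_point(number: int) -> int:
    """Return the code point for a 1-based Tangut component number."""
    if not 1 <= number <= LAST_COMPONENT:
        raise ValueError(f"component number out of range: {number}")
    return TANGUT_COMPONENTS_START + number - 1


def component_number(code_point: int) -> int:
    """Inverse of component_code_point: code point to component number."""
    number = code_point - TANGUT_COMPONENTS_START + 1
    if not 1 <= number <= LAST_COMPONENT:
        raise ValueError(f"not a Tangut component: U+{code_point:04X}")
    return number


# The two existing components from Table 3.
for n in (267, 316):
    print(f"Component {n}: U+{component_code_point(n):04X}")

# The two new components from Table 3, numbered under the same assumption.
for cp in (0x18AFD, 0x18AFF):
    print(f"U+{cp:04X}: Component {component_number(cp)}")
```

Under this assumption the two new components fall at the very end of the range, so they are numbered independently of the components from which they were split.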
In addition to the seven pairs of misunified Tangut ideographs with Components 267 and 316, there are two other pairs of Tangut ideographs with identical glyphs in modern Tangut works of reference which were misunified in Unicode version 9.0. All nine pairs of Tangut ideographs that have been disunified in Unicode version 13.0 are listed in Table 4, each with an example of the character in Homophones A or B edition. In order to minimise disruption to existing Tangut data, the existing code point for each pair of disunified characters is assigned to the more common character, and the new code point added in Unicode 13.0 is assigned to the character that occurs less frequently. The result of this encoding decision is that the glyphs corresponding to four of the existing code points (17134, 175F6, 18139, 18147) remain unchanged, whereas the glyphs corresponding to five of the existing code points (17F0D, 17F8A, 17FA5, 184F1, 18736) are modified, and the existing glyphs are assigned to the new code points. Thus, characters which occur with high frequency in Tangut texts, such as U+18147 (negative prefix), U+18736 (big), and U+184F1 (heaven), are not affected by the disunification, and do not need to be remapped to new code points. However, any of the characters listed under "New Code Point" which occur in existing Unicode data do need to be remapped, although these characters mostly occur only in lexicographic or phonetic works (Sea of Writing, Homophones, Mixed Characters etc.).
Existing Code Point | | | | New Code Point | | | |
---|---|---|---|---|---|---|---|
Code Point | Glyph | Reference / Reading / Meaning | Example | Code Point | Glyph | Reference / Reading / Meaning | Example |
17134 | | L2008-3488 twe̱ pair, couple 對、雙 | | 18D00 | | L2008-3489 gja̱ foolish, stupid, clumsy 愚笨 | |
175F6 | | L2008-1666 nə fox 狐 | | 18D01 | | L2008-1667 ta tail, east 尾、東 | |
17F0D | | L2008-3436 sa̱ very close relative 至親 | | 18D02 | | L2008-3435 ɣu god, deity, divinity, supernatural being 神、神仙 | |
17F8A | | L2008-2253 wjịj warehouse 倉庫 | | 18D03 | | L2008-2252 bju a kind of bird 鵑 | |
17FA5 | | L2008-3683 sja the day after tomorrow 後日 | | 18D04 | | L2008-3684 śie a kind of bird 鳥名 | |
18139 | | L2008-1317 twe to brush, to whisk 撣、搔、拂 | | 18D05 | | L2008-1318 ljij to jump, to leap 跳躍 | |
18147 | | L2008-1734 tji a prefix representing no 不、莫、休、無; 否定前綴 | | 18D06 | | L2008-1735 kwej respectful 恭敬 | |
184F1 | | L2008-1107 ŋwə heaven, emperor 天、皇 | | 18D07 | | L2008-1106 me̱ swallow 燕子 | |
18736 | | L2008-4457 ljịj big, great, large 大、太、弘、巨、宏、奘、簡 | | 18D08 | | L2008-4456 tha wild goose 大雁 | |
* All readings and meanings of the Tangut characters are taken from Lǐ Fànwén's 2008 Tangut-Chinese Dictionary.
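
For anyone handling these characters programmatically, the pairings in Table 4 can be captured as a small lookup table. The sketch below records the nine existing and new code points, together with the set of existing code points whose glyphs were modified, exactly as listed above. The names `DISUNIFIED_PAIRS`, `GLYPH_MODIFIED` and `glyph_status` are hypothetical and chosen only for this illustration.

```python
# Existing code point -> new code point added in Unicode 13.0 (from Table 4).
DISUNIFIED_PAIRS = {
    0x17134: 0x18D00,
    0x175F6: 0x18D01,
    0x17F0D: 0x18D02,
    0x17F8A: 0x18D03,
    0x17FA5: 0x18D04,
    0x18139: 0x18D05,
    0x18147: 0x18D06,
    0x184F1: 0x18D07,
    0x18736: 0x18D08,
}

# Existing code points whose glyph was modified in Unicode 13.0;
# their former glyphs now belong to the paired new code points.
GLYPH_MODIFIED = frozenset({0x17F0D, 0x17F8A, 0x17FA5, 0x184F1, 0x18736})


def glyph_status(code_point: int) -> str:
    """Describe how the disunification affected an existing code point."""
    if code_point not in DISUNIFIED_PAIRS:
        return "not disunified"
    if code_point in GLYPH_MODIFIED:
        return "glyph modified"
    return "glyph unchanged"


for existing, new in DISUNIFIED_PAIRS.items():
    print(f"U+{existing:04X} -> U+{new:04X}: {glyph_status(existing)}")
```

Note that a modified glyph does not by itself mean that data must change: in every pair the existing code point stays with the more common character, and only occurrences of the less common character need the new code point.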
In order to help migrate existing Tangut data to Unicode version 13.0, and correctly remap code points for disunified characters where necessary, the contexts in which each disunified Tangut character commonly occurs are given in Table 5. In most cases the context is simply a list of words that the character may occur in.
Existing Code Point | | | New Code Point | | |
---|---|---|---|---|---|
Code Point | Glyph | Context | Code Point | Glyph | Context |
17134 | | | 18D00 | | |
175F6 | | | 18D01 | | Phonetic transcription |
17F0D | | | 18D02 | | |
17F8A | | | 18D03 | | |
17FA5 | | Phonetic transcription (Chinese xiè 泄謝, xuē 薛, xiàn 線, xiān 仙先, etc.) | 18D04 | | |
18139 | | | 18D05 | | |
18147 | | Negative prefix (e.g. , , , , etc.) | 18D06 | | |
184F1 | | | 18D07 | | |
18736 | | Everything except for 'wild goose' ( ), including , , , , , , etc. | 18D08 | | |
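
Because the correct code point depends on context, migration cannot be a blind search and replace. One possible approach, sketched below, is to locate every occurrence of an affected existing code point, show it with its neighbouring characters so that it can be compared against the contexts in Table 5, and then remap only the positions a reviewer has confirmed. This is an illustrative sketch rather than an established tool: `Candidate`, `find_candidates` and `remap` are hypothetical names, the mapping is passed in as a parameter (in practice it would hold all nine pairs from Table 4), and the sample string is an arbitrary sequence of code points, not a real Tangut word.

```python
from typing import Dict, Iterator, NamedTuple


class Candidate(NamedTuple):
    index: int     # position of the character in the text
    existing: int  # affected code point found at that position
    new: int       # code point to use if the less common character is meant
    context: str   # neighbouring characters, for comparison with Table 5


def find_candidates(text: str, pairs: Dict[int, int], window: int = 2) -> Iterator[Candidate]:
    """Yield every occurrence of an affected code point with its context."""
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp in pairs:
            start = max(0, i - window)
            yield Candidate(i, cp, pairs[cp], text[start:i + window + 1])


def remap(text: str, decisions: Dict[int, int]) -> str:
    """Replace the characters at reviewed positions (index -> new code point)."""
    chars = list(text)
    for index, new_cp in decisions.items():
        chars[index] = chr(new_cp)
    return "".join(chars)


# One pair from Table 4, used here only to keep the demonstration short.
pairs = {0x18147: 0x18D06}

# Arbitrary placeholder text (not a real Tangut word).
sample = "\U00018147\U00017000\U00018147"

hits = list(find_candidates(sample, pairs))
for hit in hits:
    print(f"index {hit.index}: U+{hit.existing:04X}, alternative U+{hit.new:04X}")

# Suppose a reviewer decides that only the second occurrence is the less common character.
fixed = remap(sample, {hits[1].index: hits[1].new})
```

Keeping the detection step separate from the replacement step means that high-frequency characters such as the negative prefix are left untouched unless someone explicitly decides otherwise.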
The proposal to disunify the Tangut characters listed in Table 4 was a collaborative effort between scholars from China, Russia and the UK, and involved several years of research and analysis. The issue of misunified Tangut characters was initially reported by Profs. Jiǎ and Jǐng at an international conference on the encoding of Khitan scripts held in Yinchuan, Ningxia, China in August 2016 under the auspices of the Script Encoding Initiative, and Andrew West was tasked with further investigating the issue and possible solutions. The detailed background investigation by West and Zaytsev was eventually submitted in February 2019, and the multinational proposal to disunify the nine characters was then submitted in May 2019. This proposal was considered and accepted at the June 2019 meeting of SC2/WG2 held in Redmond, Washington, USA, which was attended by Sūn Bójūn 孫伯君 and Andrew West on behalf of the proposal authors. The proposal was also accepted at the meeting of the Unicode Technical Committee (UTC) held in July 2019, and the new characters and glyph changes were subsequently incorporated into both the Unicode Standard version 13.0 and the corresponding international standard ISO/IEC 10646:2020 (6th edition).
NB The Tangut Yinchuan font supports the new characters and glyph changes introduced in Unicode version 13.0.
Last modified: 2020-11-29