Saturday, 25 March 2006
As discussed in my post on Good, Bad and Ugly Character Names, there are some Unicode characters with wrong or misleading names. Some people get very worked up about bad character names (or names that they perceive to be bad), and insist that Unicode must change the name. However, for reasons of stability with other standards, which may refer to Unicode characters by name rather than code point, character names once assigned cannot under any circumstances be changed.
Nevertheless, the names of 1,944 characters introduced in Unicode 1.0 are different from their current names, although in the vast majority of cases the changes are very minor. These name changes were required by the merger between the developing Unicode and ISO/IEC 10646 standards in 1993. One of the most noticeable differences between the 1.0 names (pre-merger) and the 1.1 names (post-merger) is that the 1.0 names reflect American English (because Unicode is in origin a consortium of American companies), whereas the 1.1 names have a more British English flavour (because, or so I am told, this was insisted upon by Bruce Paterson, who is British and was the editor of ISO/IEC 10646 until 2000).
Code Point | Unicode 1.0 Name | Unicode 1.1 Name |
---|---|---|
002E | PERIOD | FULL STOP |
002F | SLASH | SOLIDUS |
005C | BACKSLASH | REVERSE SOLIDUS |
00B6 | PARAGRAPH SIGN | PILCROW SIGN |
02D2 | MODIFIER LETTER CENTERED RIGHT HALF RING | MODIFIER LETTER CENTRED RIGHT HALF RING |
02D3 | MODIFIER LETTER CENTERED LEFT HALF RING | MODIFIER LETTER CENTRED LEFT HALF RING |
271B | OPEN CENTER CROSS | OPEN CENTRE CROSS |
271C | HEAVY OPEN CENTER CROSS | HEAVY OPEN CENTRE CROSS |
272B | OPEN CENTER BLACK STAR | OPEN CENTRE BLACK STAR |
272C | BLACK CENTER WHITE STAR | BLACK CENTRE WHITE STAR |
2732 | OPEN CENTER ASTERISK | OPEN CENTRE ASTERISK |
273C | OPEN CENTER TEARDROP-SPOKED ASTERISK | OPEN CENTRE TEARDROP-SPOKED ASTERISK |
2742 | CIRCLED OPEN CENTER EIGHT POINTED STAR | CIRCLED OPEN CENTRE EIGHT POINTED STAR |
32A5 | CIRCLED IDEOGRAPH CENTER | CIRCLED IDEOGRAPH CENTRE |
FE4A | SPACING CENTERLINE OVERSCORE | CENTRELINE OVERLINE |
FE4E | SPACING CENTERLINE UNDERSCORE | CENTRELINE LOW LINE |
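The right-hand column of the table can be checked against the character database bundled with a programming language. What follows is only a minimal sketch, assuming Python's standard `unicodedata` module; the `TABLE_ROWS` dictionary and the helper names `current_name` and `mismatches` are made up for this illustration, and only a handful of rows from the table are included.

```python
import unicodedata

# A few rows copied from the table above: code point -> Unicode 1.1 name.
# (Hypothetical structure for this sketch; not part of any standard API.)
TABLE_ROWS = {
    0x002E: "FULL STOP",
    0x002F: "SOLIDUS",
    0x005C: "REVERSE SOLIDUS",
    0x00B6: "PILCROW SIGN",
    0x271B: "OPEN CENTRE CROSS",
    0x32A5: "CIRCLED IDEOGRAPH CENTRE",
    0xFE4A: "CENTRELINE OVERLINE",
}


def current_name(code_point: int) -> str:
    """Return the name that the bundled Unicode database gives a code point."""
    return unicodedata.name(chr(code_point))


def mismatches(rows: dict) -> dict:
    """Return the rows whose expected name differs from the database name."""
    return {
        cp: current_name(cp)
        for cp, expected in rows.items()
        if current_name(cp) != expected
    }


if __name__ == "__main__":
    for cp in TABLE_ROWS:
        print(f"U+{cp:04X}  {current_name(cp)}")
    print("mismatches:", mismatches(TABLE_ROWS))
```

Because the 1.1 names are the ones still in force, no mismatches should be reported. The Unicode 1.0 names in the middle column are not exposed by `unicodedata`, so they are not checked here.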
Nevertheless, Unicode 1.1 did preserve a couple of American English spellings from Unicode 1.0:
Since Unicode 1.1 the character names have remained predominantly British English, with later additions including U+1D355 TETRAGRAM FOR LABOURING and a further seven characters with CENTRE in their names. However, two American spellings did slip in with Unicode 3.0:
Since the merger between Unicode and ISO/IEC 10646 only two characters have ever changed their names, namely U+00C6 and U+00E6. They were originally called LATIN CAPITAL LETTER A E and LATIN SMALL LETTER A E in Unicode 1.0, were changed to LATIN CAPITAL LIGATURE AE and LATIN SMALL LIGATURE AE in Unicode 1.1 after the merger with ISO/IEC 10646, and were finally changed to their current names LATIN CAPITAL LETTER AE and LATIN SMALL LETTER AE in Unicode 2.0. The latter change was due to representations by the Danish Standards Association, who considered these two characters to be letters rather than ligatures. It caused so much trouble and acrimony that the respective committees of Unicode and ISO/IEC 10646 resolved never again to make any name changes, regardless of the severity of the mistake or the triviality of the change required (see the Unicode Standard Stability Policy).
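The practical reason for freezing names is that software and other standards may resolve a character from its name. As a rough illustration, again assuming Python's `unicodedata` module, the sketch below looks up the two AE characters by their current names. The `resolve` helper is hypothetical and simply turns an unknown name into `None`. Whether either of the older names also resolves depends on the Unicode data shipped with a given Python build, so the sketch makes no claim about them.

```python
import unicodedata
from typing import Optional


def resolve(name: str) -> Optional[str]:
    """Return the character with this name, or None if the name is unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # unicodedata.lookup raises KeyError for names it does not know.
        return None


# Current (Unicode 2.0 and later) names of U+00C6 and U+00E6.
ae_upper = resolve("LATIN CAPITAL LETTER AE")
ae_lower = resolve("LATIN SMALL LETTER AE")

if __name__ == "__main__":
    for ch in (ae_upper, ae_lower):
        if ch is not None:
            print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Any document or program that had stored the Unicode 1.1 LIGATURE names would have needed updating after Unicode 2.0, which is the kind of breakage the stability policy is meant to prevent.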