Sunday, 16 August 2009
The Holy Grail for historians of character encoding is to discover why the háček (ˇ), a diacritical mark which looks like an inverted circumflex, is called a caron in character encoding standards such as Unicode. Until the recent popularisation of Unicode, the term caron seems to have been virtually unknown outside of character encoding standards, and it has no obvious etymology or source, whereas the Czech name háček (diminutive form of hák "hook") was and still is a widely used name for this mark (it is also less commonly referred to as a wedge, inverted circumflex or inverted caret).
The question of the caron has vexed the Unicode and Unicore mailing lists time and time again, but thus far no one has been able to pin down when the term was coined or what its etymology is. In 2001 Unicode Vice President Rick McGowan wrote a short tongue-in-cheek screenplay explaining how the term might have come into being:
How These Things Get Started, Chapter Two
Scene: The tiny cramped office of General Frump, commander of an obscure military base somewhere north of the 89th parallel... Everyone and their baby fur seal is suffering from walking pneumonia and dysentary...
General Frump: Thank you for your report on the enemy code, Corporal, is there anything else? I'm quite busy...
Corporal Dolt: Well, sir, there's the matter of the "hacek", which some thought might be confused with "hatchet", sir. But as you know, sir, there simply is no other term, and I thought, perhaps...
General Frump (eyes glazed): Carry -- ahem -- on, Corporal. Ahem. Carry -- ahem -- on.
Corporal Dolt (saluting): Very well, sir. "Caron" it is.
General Frump: Ahem... Dismissed, Corporal...
And seven years later Unicode guru Ken Whistler suggested that the following little dialogue at an ISO committee meeting may be closer to the truth:
Delegate from Slovakia: You can't spell "WITH HACEK" that way — it has a hacek on the C.
Convenor from Switzerland: Well, we can only use ASCII A-Z in character names.
Delegate from Slovakia: Well, it's spelled wrong, and that isn't acceptable to us.
Convenor from Switzerland (with a straight face): In Swiss French we call it a "caron", and there wouldn't be any trouble spelling that.
Delegate from Slovakia: Really?
Convenor from Switzerland: Yes, so let's just use that term instead. (aside to editor) Just change them all to "WITH CARON" and let's move on.
A couple of the more serious etymologies that have been proposed for the word "caron" are :
The history of the term "caron" is equally obscure, and so far the furthest back that this character name can be taken is to the mid 1980s, when, according to a Unicode FAQ, the "caron" is referred to in publications such as Frank Romano's The TypEncyclopedia (1984) and IBM's "Green Book" (National Language Support Reference Manual, 1986).
I thought I had once read that someone had offered a $100 reward to anyone who could establish the origins of the term, but I can no longer find any evidence for this possible delusion of mine. Anyway, although I'm far from being in a position to claim this hypothetical reward, with a little digging it is still possible to take the history of the caron back quite a bit earlier than the mid 1980s.
But before looking in detail at the history of the name, let's take a look at the 45 characters in Unicode 5.1 that have the term CARON in their name:
Code Point | Character | Character Name | Unicode 1.0 Name |
---|---|---|---|
02C7 | ˇ | CARON | MODIFIER LETTER HACEK |
030C | ̌ | COMBINING CARON | NON-SPACING HACEK |
032C | ̬ | COMBINING CARON BELOW | NON-SPACING HACEK BELOW |
010C | Č | LATIN CAPITAL LETTER C WITH CARON | LATIN CAPITAL LETTER C HACEK |
010D | č | LATIN SMALL LETTER C WITH CARON | LATIN SMALL LETTER C HACEK |
010E | Ď | LATIN CAPITAL LETTER D WITH CARON | LATIN CAPITAL LETTER D HACEK |
010F | ď | LATIN SMALL LETTER D WITH CARON | LATIN SMALL LETTER D HACEK |
011A | Ě | LATIN CAPITAL LETTER E WITH CARON | LATIN CAPITAL LETTER E HACEK |
011B | ě | LATIN SMALL LETTER E WITH CARON | LATIN SMALL LETTER E HACEK |
013D | Ľ | LATIN CAPITAL LETTER L WITH CARON | LATIN CAPITAL LETTER L HACEK |
013E | ľ | LATIN SMALL LETTER L WITH CARON | LATIN SMALL LETTER L HACEK |
0147 | Ň | LATIN CAPITAL LETTER N WITH CARON | LATIN CAPITAL LETTER N HACEK |
0148 | ň | LATIN SMALL LETTER N WITH CARON | LATIN SMALL LETTER N HACEK |
0158 | Ř | LATIN CAPITAL LETTER R WITH CARON | LATIN CAPITAL LETTER R HACEK |
0159 | ř | LATIN SMALL LETTER R WITH CARON | LATIN SMALL LETTER R HACEK |
0160 | Š | LATIN CAPITAL LETTER S WITH CARON | LATIN CAPITAL LETTER S HACEK |
0161 | š | LATIN SMALL LETTER S WITH CARON | LATIN SMALL LETTER S HACEK |
0164 | Ť | LATIN CAPITAL LETTER T WITH CARON | LATIN CAPITAL LETTER T HACEK |
0165 | ť | LATIN SMALL LETTER T WITH CARON | LATIN SMALL LETTER T HACEK |
017D | Ž | LATIN CAPITAL LETTER Z WITH CARON | LATIN CAPITAL LETTER Z HACEK |
017E | ž | LATIN SMALL LETTER Z WITH CARON | LATIN SMALL LETTER Z HACEK |
01C4 | Ǆ | LATIN CAPITAL LETTER DZ WITH CARON | LATIN CAPITAL LETTER D Z HACEK |
01C5 | ǅ | LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON | LATIN LETTER CAPITAL D SMALL Z HACEK |
01C6 | ǆ | LATIN SMALL LETTER DZ WITH CARON | LATIN SMALL LETTER D Z HACEK |
01CD | Ǎ | LATIN CAPITAL LETTER A WITH CARON | LATIN CAPITAL LETTER A HACEK |
01CE | ǎ | LATIN SMALL LETTER A WITH CARON | LATIN SMALL LETTER A HACEK |
01CF | Ǐ | LATIN CAPITAL LETTER I WITH CARON | LATIN CAPITAL LETTER I HACEK |
01D0 | ǐ | LATIN SMALL LETTER I WITH CARON | LATIN SMALL LETTER I HACEK |
01D1 | Ǒ | LATIN CAPITAL LETTER O WITH CARON | LATIN CAPITAL LETTER O HACEK |
01D2 | ǒ | LATIN SMALL LETTER O WITH CARON | LATIN SMALL LETTER O HACEK |
01D3 | Ǔ | LATIN CAPITAL LETTER U WITH CARON | LATIN CAPITAL LETTER U HACEK |
01D4 | ǔ | LATIN SMALL LETTER U WITH CARON | LATIN SMALL LETTER U HACEK |
01D9 | Ǚ | LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON | LATIN CAPITAL LETTER U DIAERESIS HACEK |
01DA | ǚ | LATIN SMALL LETTER U WITH DIAERESIS AND CARON | LATIN SMALL LETTER U DIAERESIS HACEK |
01E6 | Ǧ | LATIN CAPITAL LETTER G WITH CARON | LATIN CAPITAL LETTER G HACEK |
01E7 | ǧ | LATIN SMALL LETTER G WITH CARON | LATIN SMALL LETTER G HACEK |
01E8 | Ǩ | LATIN CAPITAL LETTER K WITH CARON | LATIN CAPITAL LETTER K HACEK |
01E9 | ǩ | LATIN SMALL LETTER K WITH CARON | LATIN SMALL LETTER K HACEK |
01EE | Ǯ | LATIN CAPITAL LETTER EZH WITH CARON | LATIN CAPITAL LETTER YOGH HACEK |
01EF | ǯ | LATIN SMALL LETTER EZH WITH CARON | LATIN SMALL LETTER YOGH HACEK |
01F0 | ǰ | LATIN SMALL LETTER J WITH CARON | LATIN SMALL LETTER J HACEK |
021E | Ȟ | LATIN CAPITAL LETTER H WITH CARON | |
021F | ȟ | LATIN SMALL LETTER H WITH CARON | |
1E66 | Ṧ | LATIN CAPITAL LETTER S WITH CARON AND DOT ABOVE | |
1E67 | ṧ | LATIN SMALL LETTER S WITH CARON AND DOT ABOVE | |
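A list like this can be regenerated from the character names that ship with a programming language's Unicode database. The following is a minimal sketch in Python, assuming the standard `unicodedata` module; the function name `characters_named_caron` is purely illustrative. Note that the module exposes only the current character names, not the Unicode 1.0 names, and that the number of matches depends on the Unicode version bundled with the interpreter, so a recent Python may report more than the 45 characters of Unicode 5.1.

```python
import sys
import unicodedata


def characters_named_caron():
    """Return (code point, character, name) for each character whose
    current Unicode name contains the word CARON."""
    rows = []
    for code_point in range(sys.maxunicode + 1):
        # Unnamed and unassigned code points yield the empty default.
        name = unicodedata.name(chr(code_point), "")
        # Match CARON as a whole word rather than as a substring.
        if "CARON" in name.split():
            rows.append((code_point, chr(code_point), name))
    return rows


if __name__ == "__main__":
    # The bundled Unicode version determines how many rows are found.
    print(f"Unicode data version: {unicodedata.unidata_version}")
    for code_point, character, name in characters_named_caron():
        print(f"{code_point:04X} | {character} | {name}")
```

Because Unicode character names are never changed once published, every character in the table above should still appear in the output of a later Unicode version, alongside any characters with CARON in their name that were added after 5.1.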
As can be seen from the above table, in the first version of the Unicode Standard, published in October 1991, these characters were actually named using HACEK. When Unicode and ISO/IEC 10646 merged, the character names were changed to use CARON, in line with the character names given in the draft version of ISO/IEC 10646, and so from Unicode version 1.1 (published June 1993) onwards the characters have been named using CARON. So where did the term CARON come from?
Well, the immediate inspiration for using the name CARON in ISO/IEC 10646 must have been earlier ISO character encoding standards such as ISO/IEC 6937 (first published in 1983) and ISO/IEC 8859-2 (first published in 1985 as ECMA-94 "Latin Alphabet No.2"), which both use the term CARON. It was quite natural for ISO/IEC 10646 to borrow the term CARON from these earlier standards as the first editor of 10646, Hugh McGregor Ross, was also involved in the development of character encoding standards such as ISO/IEC 6937. It was equally natural for the first edition of Unicode to use HACEK, as the Xerox Character Code Standard (May 1986) which was developed by Unicode's founding father, Joe Becker, has "317 Hachek accent = caron" (see the comment by MMcM to this Language Hat post).
The earliest coded character sets to use the term CARON that I have managed to find are the bibliographic-usage character sets DIN 31624 (July 1979) and ISO 5426 (1980), which both have "CARON (HÁČEK)" at position 4/15, which is the same position as the plainly named "CARON" in ISO 6937. It would thus seem that the original sources for the ISO use of the name CARON were character sets intended for bibliographic use. Such character sets needed the háček mark so that, for example, the Library of Congress could correctly catalogue books such as Jaroslav Hašek's The Good Soldier Švejk, but why did they call it a "caron", and only give "háček" as a parenthetical comment? Looking back to earlier bibliographic standards turns out to be disappointing. During the late 1960s and early 1970s computer systems designed specifically for libraries began to support the háček diacritic, but it seems to have been almost universally referred to as either hacek or háček in these systems, with no mention of a "caron":
It seems that this line of enquiry is a dead end. The authors of DIN 31624 must have got the name from some other non-bibliographic source, but where? A year earlier than the publication of the German bibliographic character standard in 1979, Stanley Rice's Book design—systematic aspects has a "caret" symbol and a "caron" symbol next to each other, and separate from a "circumflex" accent :
Stanley Rice, Book design—systematic aspects (1978) page 103
This suggests that Rice thought of the caron not as an inverted circumflex used as a diacritic mark, but as an inverted caret used as an editorial mark, which seems to be taking us down a different path. But unfortunately, at this point the trail seems to go cold, with no other sightings of the "caron" in the earlier years of the 1970s. Then suddenly in 1967 the caron turns up in a most authoritative source, the 1967 edition of the United States Government Printing Office Style Manual :
United States Government Printing Office Style Manual (1967 edition) page 180
[Your starter for 10: what character or characters on this page and the next are not yet encoded in Unicode?]
The US GPO Style Manual has been revised and reprinted many times since its first edition in 1894 (linking to the Internet Archive copy as it seems to me that 90% of pre-20th century books that used to be downloadable in the UK from Google Book Search are no longer downloadable outside of the US ... grrr), but none of the editions prior to 1967 include a character named "caron". For example, this is the same table of characters in the previous edition to the 1967 edition, published in 1959 :
United States Government Printing Office Style Manual (1959 edition) page 178
The layout of characters in the 1967 edition of the US Government Printing Office Style Manual matches that in Stanley Rice's 1978 book on book design, and must surely have been the immediate source for it. It seems highly probable that the GPO Style Manual was also the source for the name "caron" in the 1979 German bibliographic character standard (DIN 31624). I imagine that someone on the committee that developed DIN 31624 had a copy of the 1967 edition of the GPO Style Manual, and that when it came to naming the háček character the owner of this prestigious volume objected to the native name on the grounds that it evidently conflicted with international usage. So, to ensure compatibility with the ISO version of the standard (ISO 5426) that the DIN committee must have realised it would evolve into, the accent was named with its apparent English name, CARON, and its real name, HÁČEK, was merely given as an annotation. The rest is history.
But where did the authors of the 1967 edition of the GPO Style Manual get the idea of including an inverted-caret-shaped mark called a "caron"? That remains a mystery, as I have been unable to find any suggestion of the use of the name "caron" for such a mark in any earlier book on typography. Frank Romano has stated that "caron" was the name for the Slovak diacritic given in the "giant books in the center of the order department" of the Mergenthaler Linotype Company in Brooklyn, but none of the Linotype specimen books and manuals that I have seen use the term caron (see below). Nevertheless, I think it is probable that the earliest use of the name "caron" predates and is unrelated to the development of multilingual or bibliographic-usage character encoding standards (in 1967, ASCII had only just been revised to include lower case characters, and character encoding standards with "exotic" letters, symbols and accents were still a decade away).
Not only do we not know where the authors of the 1967 edition of the GPO Style Manual got the name "caron" from, we also cannot be sure what exactly the mark labelled "caron" was intended to represent. In the manual the "caron" is paired with the caret mark, which is distinct from the similar-shaped circumflex accent, so it may well be that the "caron" was not intended to represent the háček diacritical mark at all, but was rather intended to represent an inverted caret used for proofreading purposes. If this was the case, as I believe, then the introduction of the name "caron" into DIN 31624, and thence, spreading like a virus, to sundry other ISO character encoding standards, and ultimately infiltrating its way into the Unicode standard, was a mistake.
The inverted caret mark used in proofreading is shown in the table of proofreading marks in the 1967 edition of the GPO Style Manual :
United States Government Printing Office Style Manual (1967 edition) page 4
And books on typography dating back into the 19th century explicitly name this mark an "inverted caret":
Theodore Low de Vinne, Correct Composition: A Treatise of Spelling, Abbreviations, the Compounding and Division of Words, the Proper Use of Figures and Numerals, Italic and Capital Letters, Notes, etc., with Observations on Punctuation and Proof-Reading (New York, 1901) page 322
Even some Linotype specimen books have an "inverted caret", which is paired with the "caret", and does not appear to be the same thing as the háček diacritical mark :
Specimen Book Linotype Faces (Mergenthaler Linotype Company, 1939) page 892
But the question remains: when did the "inverted caret" become the "caron"? Did the editors of the 1967 GPO Style Manual devise the name "caron" themselves on a whim, or did they borrow it from some other source? (Answers on a postcard, please.)