BabelStone Blog


Sunday, 16 August 2009

Antedating the Caron

The Holy Grail for historians of character encoding is to discover why the háček (ˇ), a diacritical mark which looks like an inverted circumflex, is called a caron in character encoding standards such as Unicode. Until the recent popularisation of Unicode, the term caron seems to have been virtually unknown outside of character encoding standards, and has no obvious etymology or source, whereas the Czech name háček (diminutive form of hák "hook") was and still is a widely used name for this mark (it is also less commonly referred to as a wedge, inverted circumflex or inverted caret).

The question of the caron has time and time again vexed the Unicode and Unicore mailing lists, but thus far no one has been able to pin down when the term was coined or what its etymology is. In 2001 Unicode Vice President Rick McGowan wrote a short tongue-in-cheek screenplay explaining how the term might have come into being:


How These Things Get Started, Chapter Two

Scene: The tiny cramped office of General Frump, commander of an obscure military base somewhere north of the 89th parallel... Everyone and their baby fur seal is suffering from walking pneumonia and dysentery...

General Frump: Thank you for your report on the enemy code, Corporal, is there anything else? I'm quite busy...

Corporal Dolt: Well, sir, there's the matter of the "hacek", which some thought might be confused with "hatchet", sir. But as you know, sir, there simply is no other term, and I thought, perhaps...

General Frump (eyes glazed): Carry -- ahem -- on, Corporal. Ahem. Carry -- ahem -- on.

Corporal Dolt (saluting): Very well, sir. "Caron" it is.

General Frump: Ahem... Dismissed, Corporal...


And seven years later Unicode guru Ken Whistler suggested that the following little dialogue at an ISO committee meeting may be closer to the truth:


Delegate from Slovakia: You can't spell "WITH HACEK" that way — it has a hacek on the C.

Convenor from Switzerland: Well, we can only use ASCII A-Z in character names.

Delegate from Slovakia: Well, it's spelled wrong, and that isn't acceptable to us.

Convenor from Switzerland (with a straight face): In Swiss French we call it a "caron", and there wouldn't be any trouble spelling that.

Delegate from Slovakia: Really?

Convenor from Switzerland: Yes, so let's just use that term instead. (aside to editor) Just change them all to "WITH CARON" and let's move on.


A couple of the more serious etymologies that have been proposed for the word "caron" are :

  1. that the word "caron" is a portmanteau of "caret" (an inverted-v-shaped editorial symbol used to indicate an omission) and "macron" (a diacritic mark shaped like a horizontal line that is placed over a letter), the name suggesting the (inverted) shape of the caret and the above-letter positioning of the macron (suggested by James Naughton in October 2001);
  2. that the word "caron" derives from the Russian word короной (or the related word in some other Slavic language) meaning "crown" or "corona", as the accent sits on top of a letter like a little crown (suggested by Alexander Savenkov in December 2003).

The history of the term "caron" is equally obscure. So far the furthest back that this character name has been traced is the mid-1980s, when, according to a Unicode FAQ, the "caron" is referred to in publications such as Frank Romano's The TypEncyclopedia (1984) and IBM's "Green Book" (National Language Support Reference Manual, 1986).

I thought I had once read that someone had offered a $100 reward to anyone who could establish the origins of the term, but I can no longer find any evidence for this possible delusion of mine. Anyway, although I'm far from being in a position to claim this hypothetical reward, with a little digging it is still possible to take the history of the caron back quite a bit earlier than the mid-1980s.

But before looking in detail at the history of the name, let's take a look at the 45 characters in Unicode 5.1 that have the term CARON in their name :


| Code Point | Character | Character Name | Unicode 1.0 Name |
| --- | --- | --- | --- |
| 02C7 | ˇ | CARON | MODIFIER LETTER HACEK |
| 030C |  ̌ | COMBINING CARON | NON-SPACING HACEK |
| 032C |  ̬ | COMBINING CARON BELOW | NON-SPACING HACEK BELOW |
| 010C | Č | LATIN CAPITAL LETTER C WITH CARON | LATIN CAPITAL LETTER C HACEK |
| 010D | č | LATIN SMALL LETTER C WITH CARON | LATIN SMALL LETTER C HACEK |
| 010E | Ď | LATIN CAPITAL LETTER D WITH CARON | LATIN CAPITAL LETTER D HACEK |
| 010F | ď | LATIN SMALL LETTER D WITH CARON | LATIN SMALL LETTER D HACEK |
| 011A | Ě | LATIN CAPITAL LETTER E WITH CARON | LATIN CAPITAL LETTER E HACEK |
| 011B | ě | LATIN SMALL LETTER E WITH CARON | LATIN SMALL LETTER E HACEK |
| 013D | Ľ | LATIN CAPITAL LETTER L WITH CARON | LATIN CAPITAL LETTER L HACEK |
| 013E | ľ | LATIN SMALL LETTER L WITH CARON | LATIN SMALL LETTER L HACEK |
| 0147 | Ň | LATIN CAPITAL LETTER N WITH CARON | LATIN CAPITAL LETTER N HACEK |
| 0148 | ň | LATIN SMALL LETTER N WITH CARON | LATIN SMALL LETTER N HACEK |
| 0158 | Ř | LATIN CAPITAL LETTER R WITH CARON | LATIN CAPITAL LETTER R HACEK |
| 0159 | ř | LATIN SMALL LETTER R WITH CARON | LATIN SMALL LETTER R HACEK |
| 0160 | Š | LATIN CAPITAL LETTER S WITH CARON | LATIN CAPITAL LETTER S HACEK |
| 0161 | š | LATIN SMALL LETTER S WITH CARON | LATIN SMALL LETTER S HACEK |
| 0164 | Ť | LATIN CAPITAL LETTER T WITH CARON | LATIN CAPITAL LETTER T HACEK |
| 0165 | ť | LATIN SMALL LETTER T WITH CARON | LATIN SMALL LETTER T HACEK |
| 017D | Ž | LATIN CAPITAL LETTER Z WITH CARON | LATIN CAPITAL LETTER Z HACEK |
| 017E | ž | LATIN SMALL LETTER Z WITH CARON | LATIN SMALL LETTER Z HACEK |
| 01C4 | Ǆ | LATIN CAPITAL LETTER DZ WITH CARON | LATIN CAPITAL LETTER D Z HACEK |
| 01C5 | ǅ | LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON | LATIN LETTER CAPITAL D SMALL Z HACEK |
| 01C6 | ǆ | LATIN SMALL LETTER DZ WITH CARON | LATIN SMALL LETTER D Z HACEK |
| 01CD | Ǎ | LATIN CAPITAL LETTER A WITH CARON | LATIN CAPITAL LETTER A HACEK |
| 01CE | ǎ | LATIN SMALL LETTER A WITH CARON | LATIN SMALL LETTER A HACEK |
| 01CF | Ǐ | LATIN CAPITAL LETTER I WITH CARON | LATIN CAPITAL LETTER I HACEK |
| 01D0 | ǐ | LATIN SMALL LETTER I WITH CARON | LATIN SMALL LETTER I HACEK |
| 01D1 | Ǒ | LATIN CAPITAL LETTER O WITH CARON | LATIN CAPITAL LETTER O HACEK |
| 01D2 | ǒ | LATIN SMALL LETTER O WITH CARON | LATIN SMALL LETTER O HACEK |
| 01D3 | Ǔ | LATIN CAPITAL LETTER U WITH CARON | LATIN CAPITAL LETTER U HACEK |
| 01D4 | ǔ | LATIN SMALL LETTER U WITH CARON | LATIN SMALL LETTER U HACEK |
| 01D9 | Ǚ | LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON | LATIN CAPITAL LETTER U DIAERESIS HACEK |
| 01DA | ǚ | LATIN SMALL LETTER U WITH DIAERESIS AND CARON | LATIN SMALL LETTER U DIAERESIS HACEK |
| 01E6 | Ǧ | LATIN CAPITAL LETTER G WITH CARON | LATIN CAPITAL LETTER G HACEK |
| 01E7 | ǧ | LATIN SMALL LETTER G WITH CARON | LATIN SMALL LETTER G HACEK |
| 01E8 | Ǩ | LATIN CAPITAL LETTER K WITH CARON | LATIN CAPITAL LETTER K HACEK |
| 01E9 | ǩ | LATIN SMALL LETTER K WITH CARON | LATIN SMALL LETTER K HACEK |
| 01EE | Ǯ | LATIN CAPITAL LETTER EZH WITH CARON | LATIN CAPITAL LETTER YOGH HACEK |
| 01EF | ǯ | LATIN SMALL LETTER EZH WITH CARON | LATIN SMALL LETTER YOGH HACEK |
| 01F0 | ǰ | LATIN SMALL LETTER J WITH CARON | LATIN SMALL LETTER J HACEK |
| 021E | Ȟ | LATIN CAPITAL LETTER H WITH CARON | |
| 021F | ȟ | LATIN SMALL LETTER H WITH CARON | |
| 1E66 | Ṧ | LATIN CAPITAL LETTER S WITH CARON AND DOT ABOVE | |
| 1E67 | ṧ | LATIN SMALL LETTER S WITH CARON AND DOT ABOVE | |
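
A list like this can be regenerated mechanically. The following is a minimal Python sketch, not the method used to compile the table: it scans every code point with the standard library's unicodedata module and keeps those whose character name contains the word CARON. The helper name chars_named_caron is hypothetical, chosen only for illustration. Because unicodedata reflects whichever Unicode version the Python build ships with, the count is likely to be higher than the 45 of Unicode 5.1, and the module does not expose the Unicode 1.0 names, so that column is not reproduced.

```python
import sys
import unicodedata


def chars_named_caron():
    """Return (code point, name) pairs whose character name contains the word CARON.

    Hypothetical helper for illustration only.
    """
    found = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed or unassigned code points yield the default "" instead of raising.
        name = unicodedata.name(chr(cp), "")
        if "CARON" in name.split():
            found.append((cp, name))
    return found


caron_chars = chars_named_caron()
for cp, name in caron_chars:
    print(f"{cp:04X}  {name}")
print(len(caron_chars), "characters found")
```

Matching on whole words rather than on a substring is a cautious choice: it keeps the filter from picking up any name that merely happens to contain the letters C-A-R-O-N inside a longer word.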

As can be seen from the above table, in the first version of the Unicode Standard, published in October 1991, these characters were actually named using HACEK. When Unicode and ISO/IEC 10646 merged, the character names were changed to use CARON, in line with the character names given in the draft version of ISO/IEC 10646, and so from Unicode version 1.1 (published June 1993) onwards the characters have been named using CARON. So where did the term CARON come from?

Well, the immediate inspiration for using the name CARON in ISO/IEC 10646 must have been earlier ISO character encoding standards such as ISO/IEC 6937 (first published in 1983) and ISO/IEC 8859-2 (first published in 1985 as ECMA-94 "Latin Alphabet No.2"), which both use the term CARON. It was quite natural for ISO/IEC 10646 to borrow the term CARON from these earlier standards, as the first editor of 10646, Hugh McGregor Ross, was also involved in the development of character encoding standards such as ISO/IEC 6937. It was equally natural for the first edition of Unicode to use HACEK, as the Xerox Character Code Standard (May 1986), which was developed by Unicode's founding father, Joe Becker, has "317 Hachek accent = caron" (see the comment by MMcM to this Language Hat post).

The earliest coded character sets to use the term CARON that I have managed to find are the bibliographic-usage character sets DIN 31624 (July 1979) and ISO 5426 (1980), which both have "CARON (HÁČEK)" at position 4/15, which is the same position as the plainly named "CARON" in ISO 6937. It would thus seem that the original sources for the ISO use of the name CARON were character sets intended for bibliographic use. Such character sets needed the "háček" mark so that, for example, the Library of Congress could correctly catalogue books such as Jaroslav Hašek's The Good Soldier Švejk, but why did they call it a "caron", and only give "háček" as a parenthetical comment? Looking back to earlier bibliographic standards turns out to be disappointing. During the late 1960s and early 1970s computer systems designed specifically for libraries began to support the háček diacritic, but it seems to have been almost universally referred to as either hacek or háček in these systems, with no mention of a "caron".


It seems that this line of enquiry is a dead end. The authors of DIN 31624 must have got the name from some other non-bibliographic source, but where? A year earlier than the publication of the German bibliographic character standard in 1979, Stanley Rice's Book design—systematic aspects has a "caret" symbol and a "caron" symbol next to each other, and separate from a "circumflex" accent :


Stanley Rice, Book design—systematic aspects (1978) page 103


This suggests that Rice thought of the caron not as an inverted circumflex used as a diacritic mark, but as an inverted caret used as an editorial mark, which seems to be taking us down a different path. But unfortunately, at this point the trail seems to go cold, with no other sightings of the "caron" in the earlier years of the 1970s. Then suddenly in 1967 the caron turns up in a most authoritative source, the 1967 edition of the United States Government Printing Office Style Manual :


United States Government Printing Office Style Manual (1967 edition) page 180

[And your starter for 10: what character or characters on this page and the next are not yet encoded in Unicode?]


The US GPO Style Manual has been revised and reprinted many times since its first edition in 1894 (linking to the Internet Archive copy as it seems to me that 90% of pre-20th century books that used to be downloadable in the UK from Google Book Search are no longer downloadable outside of the US ... grrr), but none of the editions prior to 1967 include a character named "caron". For example, this is the same table of characters in the previous edition to the 1967 edition, published in 1959 :


United States Government Printing Office Style Manual (1959 edition) page 178


The layout of characters in the 1967 edition of the US Government Printing Office Style Manual matches that in Stanley Rice's 1978 book on book design, and must surely have been the immediate source for it. It seems highly probable that the GPO Style Manual was also the source for the name "caron" in the 1979 German bibliographic character standard (DIN 31624). I imagine that someone on the committee that developed DIN 31624 must have had a copy of the 1967 edition of the GPO Style Manual, and that when it came to naming the háček character the owner of this prestigious volume objected to the native name of the character as evidently conflicting with international usage. So, to ensure compatibility with the ISO version of the standard (ISO 5426) that the DIN committee must have realised it would evolve into, the accent was named with its apparent English name, CARON, and its real name, HÁČEK, was merely given as an annotation. The rest is history.

But where did the authors of the 1967 edition of the GPO Style Manual get the idea of including an inverted-caret-shaped mark called a "caron" from? That remains a mystery, as I have been unable to find any suggestion of the use of the name "caron" for such a mark in any earlier book on typography. Frank Romano has stated that "caron" was the name for the Slovak diacritic given in the "giant books in the center of the order department" of the Mergenthaler Linotype Company in Brooklyn, but none of the Linotype specimen books and manuals that I have seen use the term caron (see below). Nevertheless, I think it is probable that the earliest use of the name "caron" predates and is unrelated to the development of multilingual or bibliographic-usage character encoding standards (in 1967, ASCII had only just been revised to include lower case characters, and character encoding standards with "exotic" letters, symbols and accents were still a decade away).

Not only do we not know where the authors of the 1967 edition of the GPO Style Manual got the name "caron" from, we also cannot be sure what exactly the mark labelled "caron" was intended to represent. In the manual the "caron" is paired with the caret mark, which is distinct from the similarly shaped circumflex accent, so it may well be that the "caron" was not intended to represent the háček diacritical mark at all, but was rather intended to represent an inverted caret used for proofreading purposes. If this was the case, as I believe, then the introduction of the name "caron" into DIN 31624 was a mistake, as was its subsequent spread, like a virus, to sundry other ISO character encoding standards and ultimately into the Unicode Standard.

The inverted caret mark used in proofreading is shown in the table of proofreading marks in the 1967 edition of the GPO Style Manual :


United States Government Printing Office Style Manual (1967 edition) page 4



And books on typography dating back into the 19th century explicitly name this mark an "inverted caret":


Theodore Low de Vinne, Correct Composition: A Treatise of Spelling, Abbreviations, the Compounding and Division of Words, the Proper Use of Figures and Numerals, Italic and Capital Letters, Notes, etc., with Observations on Punctuation and Proof-Reading (New York, 1901) page 322


Even some Linotype specimen books have an "inverted caret", which is paired with the "caret", and does not appear to be the same thing as the háček diacritical mark :


Specimen Book Linotype Faces (Mergenthaler Linotype Company, 1939) page 892


But the question remains, when did the "inverted caret" become the "caron" ? Did the editors of the 1967 GPO Style Manual devise the name "caron" themselves on a whim, or did they borrow it from some other source ? (Answers on a postcard, please.)


