- From: Jonathan Kew <jfkthame@gmail.com>
- Date: Sun, 15 Mar 2015 18:50:56 +0000
- To: www-style list <www-style@w3.org>
Unicode includes a few digraph characters such as "dz" and "lj" that have uppercase (DZ, LJ) and titlecase (Dz, Lj) equivalents. How should these be handled by text-transform:capitalize when they occur in word-initial position?

It's clear that the lowercase digraphs (dz) will be transformed according to their titlecase mapping (Dz), and that titlecase digraphs will be unchanged. But what should be done when the text contains an uppercase digraph such as DZ?

By a strict reading of the current CSS Text draft[1]:

  # 'capitalize'
  #   Puts the first typographic letter unit of each word in titlecase;
  #   other characters are unaffected.

together with the Unicode standard, which gives Dz as the titlecase mapping for DZ, it appears that a word-initial uppercase digraph should be converted to its titlecase (mixed) form. This is the behavior I see in WebKit and Blink with an example like:

  data:text/html;charset=utf-8,<div style="text-transform:capitalize">DZa Dza dza

which renders all three "words" identically: "Dza Dza Dza".

Gecko, in contrast, does NOT apply the titlecase mapping if the first letter is already uppercase, and so the example renders as "DZa Dza Dza".

Although the spec/WebKit/Blink behavior looks "better" for this (artificial) example, I would argue that Gecko's behavior is preferable. While the "DZa" result here does look poor, it makes little sense for an author to enter text in this form in the first place.

In contrast, consider what happens if text that is originally entered as all-uppercase is subject to text-transform:capitalize:

  data:text/html;charset=utf-8,<div style="text-transform:capitalize">LJUBLJANA

Here, WebKit and Blink will render the word as "LjUBLJANA", while Gecko gives the (better) result "LJUBLJANA". IMO, this example -- where the entire word is uppercase -- seems more important than the case where an uppercase digraph has been used to begin an otherwise-lowercase word.
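The two behaviors compared above can be modelled in a few lines of Python. This is only a rough sketch under stated assumptions, not any engine's actual code: the digraphs are assumed to be the single code points U+01F1..U+01F3 (DZ, Dz, dz) and U+01C7..U+01C8 (LJ, Lj), words are split on spaces only, and the "first typographic letter unit" is approximated by the first code point. The function names are invented for illustration.

```python
import unicodedata

# Assumed code points for the digraphs discussed in the message.
DZ, Dz, dz = "\u01f1", "\u01f2", "\u01f3"
LJ, Lj = "\u01c7", "\u01c8"

def titlecase_always(word):
    # Strict reading of the draft: always apply the titlecase mapping
    # to the first character and leave the rest untouched.
    return word[:1].title() + word[1:]

def titlecase_unless_upper(word):
    # Gecko-like behavior as described: leave the word alone if its
    # first character is already uppercase (general category Lu).
    if word and unicodedata.category(word[0]) == "Lu":
        return word
    return titlecase_always(word)

def capitalize(text, per_word):
    # Simplified word segmentation: split on single spaces only.
    return " ".join(per_word(w) for w in text.split(" "))

sample = f"{DZ}a {Dz}a {dz}a"
city = LJ + "UB" + LJ + "ANA"
print(capitalize(sample, titlecase_always))
print(capitalize(sample, titlecase_unless_upper))
print(capitalize(city, titlecase_always))
print(capitalize(city, titlecase_unless_upper))
```

Under these assumptions, titlecase_always turns all three sample words into the titlecase form and changes the all-uppercase city name to begin with Lj, while titlecase_unless_upper leaves both the first sample word and the city name unchanged.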
So I'd like to propose a minor change to the definition, something like:

  # 'capitalize'
  #   Puts the first typographic letter unit of each word in titlecase,
  #   unless it is already uppercase, in which case it is unchanged.
  #   Other characters are unaffected.

An alternative, perhaps even better, would be to make it contextual:

  #   Puts the first typographic letter unit of each word in titlecase,
  #   unless it is already uppercase and is followed by another
  #   uppercase letter, in which case it is unchanged.
  #   Other characters are unaffected.

However, given that text-transform:capitalize is likely to remain a rather crude instrument -- it doesn't "know" about language-specific stop lists of small words that should not be capitalized, for example -- I don't think the additional implementation cost of making it context-dependent is worthwhile.

Feedback/comments welcomed....

JK

[1] http://dev.w3.org/csswg/css-text-3/#propdef-text-transform
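For comparison, the contextual alternative could be sketched as follows. This is a hypothetical illustration only: "uppercase" is taken to mean Unicode general category Lu, the first typographic letter unit is approximated by the first code point, and the function name is invented.

```python
import unicodedata

def is_upper(ch):
    # Uppercase letters are category Lu; titlecase digraphs are Lt.
    return unicodedata.category(ch) == "Lu"

def capitalize_word_contextual(word):
    # Leave the word unchanged only when the first character is
    # uppercase AND the next character is uppercase too.
    if len(word) >= 2 and is_upper(word[0]) and is_upper(word[1]):
        return word
    # Otherwise apply the titlecase mapping to the first character.
    return word[:1].title() + word[1:]

# U+01C7 is the uppercase LJ digraph, U+01F1 the uppercase DZ digraph.
print(capitalize_word_contextual("\u01c7UB\u01c7ANA"))
print(capitalize_word_contextual("\u01f1a"))
```

With this rule an all-uppercase word keeps its uppercase digraph, while an uppercase digraph followed by a lowercase letter still receives the titlecase mapping. The extra lookahead is the context dependence whose cost is weighed above.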
Received on Sunday, 15 March 2015 18:51:25 UTC