[css-text-3] Assorted editorial clarifications/requests #3461

litherum · 2018-12-19T18:53:12Z

Form feeds (U+000C) (that are not segment breaks) are rendered as a zero-width space (U+200B).
...
If the character ... is the zero-width space character (U+200B), then the break is removed

Unclear whether or not the form feed causes breaks to be removed.

Also, "rendered as a zero-width space" is probably not what you want. Zero-width spaces can participate in text shaping inside the font, and some fonts even have visible glyphs for zero-width-space that get removed by shaping. Why doesn't the spec say they're removed instead of rendered as U+200B?

If the character immediately before or immediately after the segment break is the zero-width space character (U+200B), then the break is removed, leaving behind the zero-width space

Similarly to above, leaving behind the ZWS is probably not what you want, as it can participate in shaping. It seems CSS is trying to treat the ZWS as a flag to signal behavior for the following break; leaving the flag behind seems like pollution.

Vaguely related: One of the things I'd like to do in WebKit is to forcibly not render ZWS under all circumstances using all fonts. After shaping, we would trace back each glyph to determine which character it came from, and if it came from a ZWS, literally delete the glyph from the sequence of glyphs.
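A minimal sketch of the post-shaping filter described above, assuming a shaper that reports which character each glyph came from. The `Glyph` type and its `source_index` field are hypothetical names for illustration, not WebKit's actual data structures.

```python
from dataclasses import dataclass

ZWSP = "\u200b"  # ZERO WIDTH SPACE


@dataclass(frozen=True)
class Glyph:
    glyph_id: int       # font-specific glyph identifier (hypothetical)
    source_index: int   # index into the text of the character that produced this glyph


def drop_zwsp_glyphs(text, glyphs):
    """Return the glyph sequence without any glyph that traces back to a ZWSP."""
    return [g for g in glyphs if text[g.source_index] != ZWSP]


text = "a\u200bb"
shaped = [Glyph(10, 0), Glyph(99, 1), Glyph(11, 2)]
visible = drop_zwsp_glyphs(text, shaped)
```

Because the filter runs after shaping, the ZWSP still takes part in shaping decisions; only its own glyph is discarded.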

As required by [UNICODE], unsupported Default_ignorable characters must be ignored for rendering.

A more useful link would be appreciated, instead of linking to the Unicode homepage.

As with spaces, any collapsible segment break immediately following another collapsible segment break is removed.
...
Any space immediately following another collapsible space ... is collapsed to have zero advance width.

These sentences seem contradictory. "Removed" is not "collapsed."
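To make the distinction concrete, here is a rough sketch using a hypothetical model in which text is a list of `(character, advance width)` pairs. The function names and the model are assumptions for illustration, not anything defined by the spec.

```python
def remove_repeats(items, target):
    """Drop every `target` that immediately follows another `target`."""
    out = []
    for ch, advance in items:
        if ch == target and out and out[-1][0] == target:
            continue  # the character is gone from the text entirely
        out.append((ch, advance))
    return out


def collapse_repeats(items, target):
    """Keep every character, but zero the advance of a `target` that follows another."""
    out = []
    prev = None
    for ch, advance in items:
        if ch == target and prev == target:
            advance = 0.0  # still present in the text, but takes up no width
        out.append((ch, advance))
        prev = ch
    return out


items = [("a", 5.0), (" ", 3.0), (" ", 3.0), ("b", 5.0)]
removed = remove_repeats(items, " ")
collapsed = collapse_repeats(items, " ")
```

Both results render at the same total width, but only the collapsed version still contains the second space.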

break is F, W, or H (not A)

Please spell these terms out. I had to go digging through specs to figure out which classes these letters correspond to.

It took me a while to figure out what the Segment Break Transformation Rules were trying to accomplish. An example showing uninterrupted Chinese text would be appreciated.
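For readers with the same question, a rough sketch of the East Asian Width check follows. It assumes F, W, H, and A are the East Asian Width values Fullwidth, Wide, Halfwidth, and Ambiguous, as reported by Python's `unicodedata.east_asian_width`. The helper names are hypothetical, and any further conditions the spec attaches to this rule are omitted.

```python
import unicodedata

# Spelled-out names for the East Asian Width values.
EAW_NAMES = {
    "F": "Fullwidth", "W": "Wide", "H": "Halfwidth",
    "A": "Ambiguous", "Na": "Narrow", "N": "Neutral",
}
JOINING = {"F", "W", "H"}  # Ambiguous (A) is deliberately excluded


def transform_break(before, after):
    """Return what a single segment break between `before` and `after` becomes."""
    if (unicodedata.east_asian_width(before) in JOINING
            and unicodedata.east_asian_width(after) in JOINING):
        return ""   # break removed: the text continues uninterrupted
    return " "      # otherwise the break is converted to a space


def join_lines(text):
    """Apply transform_break to each newline; empty lines contribute no extra break."""
    lines = text.split("\n")
    out = lines[0]
    for line in lines[1:]:
        if out and line:
            out += transform_break(out[-1], line[0])
        out += line
    return out
```

With this sketch, Chinese text wrapped in the source joins without a space, while English text gets one: `join_lines("中文\n文本")` gives `"中文文本"` and `join_lines("hello\nworld")` gives `"hello world"`.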

In some other writing systems, soft wrap opportunities...

This paragraph seems non-normative. Perhaps move it to a note?

litherum · 2018-12-19T21:43:26Z

Oh, one more:

We will refer to the typographic character unit as the basic unit of text. ... The typographic character represents ... a Latin alphabetic letter (including its diacritics), Hangul syllable, ...

This seems to say that a typographic character unit often corresponds to more than one code point.

A typographic letter unit or letter for the purpose of this specification is a typographic character unit belonging to one of the Letter or Number general categories in Unicode

The general categories are categories of a single code point. How can we determine the general category for an entire multi-code-point-sequence?

fantasai · 2018-12-19T23:52:42Z

@litherum Wrt #3461 (comment) see https://www.w3.org/TR/css-text-3/#character-properties , cross referenced from the sentence right after the one you're quoting.
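The question can be illustrated with Python's `unicodedata` module. The sketch below shows that a two-code-point unit has two general categories, and then applies one simple policy: use the first code point. That policy is an assumption for illustration only. The linked section defines the actual rule, and this does not reproduce it.

```python
import unicodedata


def categories(unit):
    """General category of every code point in a typographic character unit."""
    return [unicodedata.category(cp) for cp in unit]


def first_code_point_category(unit):
    # ASSUMPTION: take the category of the first code point only.
    # The spec's character-properties section is the authority here.
    return unicodedata.category(unit[0])


e_acute = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
per_code_point = categories(e_acute)
unit_category = first_code_point_category(e_acute)
```

Here `per_code_point` holds a Letter category for the base and a Mark category for the accent, so some rule is needed to pick one for the whole unit.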

fantasai · 2019-01-03T17:08:50Z

Unclear whether or not the form feed causes breaks to be removed.

Shouldn't be, rendering as a different character doesn't change its identity....

Also, "rendered as a zero-width space" is probably not what you want. Zero-width spaces can participate in text shaping inside the font, and some fonts even have visible glyphs for zero-width-space that get removed by shaping. Why doesn't the spec say they're removed instead of rendered as U+200B?

Because you didn't want to in #855 (comment) ? :)

[Re-opened that issue because we later resolved on different handling for CR, but didn't re-discuss FF.]

Similarly to above, leaving behind the ZWS is probably not what you want, as it can participate in shaping. It seems CSS is trying to treat the ZWS as a flag to signal behavior for the following break; leaving the flag behind seems like pollution.

ZWSP isn't pollution, it's signal. It's the white space surrounding it that's noise we're trying to clear out. If we removed the ZWSP along with the white space, we'd lose important behavior that it represents, like a line break opportunity / break in joining.
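A minimal sketch of the quoted rule, assuming a segment break is a single `"\n"` and that the white space around it has already been cleared. The function name is hypothetical. The check compares the actual character, so a form feed next to a break does not trigger removal, consistent with the reply above that rendering as a different character doesn't change its identity.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def remove_breaks_next_to_zwsp(text):
    """Remove each segment break adjacent to a ZWSP, leaving the ZWSP in place."""
    out = []
    for i, ch in enumerate(text):
        if ch == "\n":
            before = text[i - 1] if i > 0 else ""
            after = text[i + 1] if i + 1 < len(text) else ""
            if before == ZWSP or after == ZWSP:
                continue  # drop the break; the ZWSP itself is kept
        out.append(ch)
    return "".join(out)


joined = remove_breaks_next_to_zwsp("ab\u200b\ncd")
```

The ZWSP survives in `joined`, so the line break opportunity it represents is still available to later layout steps.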

A more useful link would be appreciated, instead of linking to the Unicode homepage.

That's not the Unicode home page (which is at http://www.unicode.org/), it's the table of contents for the core standard. I can't take responsibility for how the main content is in a sidebar. :/ Default_ignorable is defined, in this edition, in section 5.21. (Many of our references to Unicode are to its annexes, which each have their own HTML document at their own URL, but this is a reference into the core standard, which is a PDF linked from this version-agnostic landing page.)

These sentences seem contradictory. "Removed" is not "collapsed."

Adjusted the wording. Fwiw, the latter is trying to handle the requirements brought up in http://lists.w3.org/Archives/Public/www-style/2012Jan/0936.html (I'm not sure how well it's succeeding).

It took me a while to figure out what the Segment Break Transformation Rules were trying to accomplish. An example showing uninterrupted Chinese text would be appreciated.

OK, I'll add an example. :)

In some other writing systems, soft wrap opportunities... This paragraph seems non-normative. Perhaps move it to a note?

That paragraph as well as the previous one are similar in that they're laying out the expectations. They're not strictly normative nor really solely informative... This is somewhat of an introductory section anyway, so I think it's better to leave them outside of a note. Notes are a type of aside, and this information is somewhat critical to making sense of the spec.

fantasai · 2019-01-03T21:15:44Z

@litherum OK, I think I've addressed all your comments. Let me know if something still seems unresolved?

litherum · 2019-02-05T22:58:54Z

Unclear whether or not the form feed causes breaks to be removed.

Shouldn't be, rendering as a different character doesn't change its identity....

Okay.

Why doesn't the spec say they're removed instead of rendered as U+200B?

Because you didn't want to in #855 (comment) ? :)

heehee. Yes, I see now. I'll see if I can come up with an example where it matters.

If we removed the ZWSP along with the white space, we'd lose important behavior that it represents, like a line break opportunity / break in joining.

I'm not sure that preserving it is doing authors any favors. Is this implemented anywhere, and do we know of authors who have used this mechanism to un-break lines? I'd like to hear what their expectations / desires are.

it's the table of contents for the core standard.

Can't we do better somehow? 😕

Adjusted the wording.

👍

OK, I'll add an example. :)

👍

so I think it's better to leave them outside of a note.

🤷🏻‍♂️ Okay.

https://www.w3.org/TR/css-text-3/#character-properties

Ah, I see. It's pretty unfortunate we have to have special cases for Unicode. Presumably other text engines don't have these special cases, which means other editors should have the special cases, or CSS should adopt whatever behavior the other editors have.

Therefore, either:

These special cases should be moved into Unicode, allowing the behavior to be shared among all text editors, or
We should find a new way of solving this problem

fantasai · 2019-02-05T23:40:25Z

I'm not sure that preserving it is doing authors any favors.

If the author didn't want it there, he shouldn't have put it there.

Can't we do better somehow?

Open to suggestions...

Ah, I see. It's pretty unfortunate we have to have special cases for Unicode. Presumably other text engines don't have these special cases, which means other editors should have the special cases, or CSS should adopt whatever behavior the other editors have.

Text engines generally don't do white space collapsing, which is the only place we're using EAW, so that's not something they need to think about, only we do. Wrt text orientation, we did end up with UAX50. Then there's justification which references things in https://www.w3.org/TR/css-text-3/#justify-symbols, which is outside the scope of Unicode and somewhat handwavy anyway... I don't see there's much benefit in getting Unicode involved here.

frivoal · 2019-07-09T15:24:54Z

@litherum this issue contained a bunch of subtopics. As far as I can tell, they've all been addressed to satisfaction (fixed or explained away), so I'm closing this. If you think one or more of your concerns still need more work, please open individual issues to track them separately.

frivoal added the css-text-3 Current Work label Dec 20, 2018

fantasai added a commit that referenced this issue Jan 3, 2019

[css-text-3] Clean up some wording. #3461

670ed13

fantasai added a commit that referenced this issue Jan 3, 2019

[css-text-3] Expand out EAW abbreviations. #3461

3160736

fantasai added a commit that referenced this issue Jan 3, 2019

[css-text-3] Add example for segment break transformation. #3461

791b4fb

fantasai added the Closed Accepted as Editorial and Commenter Response Pending labels Jan 3, 2019

fantasai added the Tracked in DoC label Jan 15, 2019

frivoal mentioned this issue Apr 23, 2019

[css-text] Should zero width space break Arabic shaping? #3861

Open

frivoal closed this as completed Jul 9, 2019

frivoal added the Testing Unnecessary Memory aid - issue doesn't require tests label Dec 3, 2020
