[css-text-3] Assorted editorial clarifications/requests · Issue #3461 · w3c/csswg-drafts · GitHub

[css-text-3] Assorted editorial clarifications/requests #3461

Closed
litherum opened this issue Dec 19, 2018 · 7 comments

Comments

@litherum
Contributor

Form feeds (U+000C) (that are not segment breaks) are rendered as a zero-width space (U+200B).
...
If the character ... is the zero-width space character (U+200B), then the break is removed

Unclear whether or not the form feed causes breaks to be removed.

Also, "rendered as a zero-width space" is probably not what you want. Zero-width spaces can participate in text shaping inside the font, and some fonts even have visible glyphs for zero-width-space that get removed by shaping. Why doesn't the spec say they're removed instead of rendered as U+200B?

If the character immediately before or immediately after the segment break is the zero-width space character (U+200B), then the break is removed, leaving behind the zero-width space

Similarly to above, leaving behind the ZWS is probably not what you want, as it can participate in shaping. It seems CSS is trying to treat the ZWS as a flag to signal behavior for the following break; leaving the flag behind seems like pollution.

Vaguely related: One of the things I'd like to do in WebKit is to forcibly not render ZWS under all circumstances using all fonts. After shaping, we would trace back each glyph to determine which character it came from, and if it came from a ZWS, literally delete the glyph from the sequence of glyphs.
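
The glyph-deletion idea above can be sketched roughly as follows. This is only an illustration, not WebKit code: the `Glyph` record, its `cluster` field (the index of the source character a glyph came from), and the sample glyph ids are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

ZWSP = "\u200b"  # zero-width space


@dataclass
class Glyph:
    glyph_id: int  # font-specific glyph index (hypothetical values below)
    cluster: int   # index into the source text of the character this glyph came from


def drop_zwsp_glyphs(text: str, glyphs: List[Glyph]) -> List[Glyph]:
    """Return the shaped glyphs minus any glyph that traces back to U+200B."""
    return [g for g in glyphs if text[g.cluster] != ZWSP]


# Pretend shaping output for "ab<ZWSP>cd": one glyph per character.
text = "ab\u200bcd"
shaped = [Glyph(10, 0), Glyph(11, 1), Glyph(99, 2), Glyph(12, 3), Glyph(13, 4)]
visible = drop_zwsp_glyphs(text, shaped)
```

Because the filter runs after shaping, the ZWS can still influence shaping decisions inside the font; only its own glyph is dropped from the final sequence.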

As required by [UNICODE], unsupported Default_ignorable characters must be ignored for rendering.

A more useful link would be appreciated, instead of linking to the Unicode homepage.

As with spaces, any collapsible segment break immediately following another collapsible segment break is removed.
...
Any space immediately following another collapsible space ... is collapsed to have zero advance width.

These sentences seem contradictory. "Removed" is not "collapsed."

break is F, W, or H (not A)

Please spell these terms out. I had to go digging through specs to figure out which classes these letters correspond to.
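
For reference, the letters are Unicode East Asian Width classes. A small sketch using Python's standard `unicodedata` module shows the spelled-out names; the `EAW_NAMES` table and the `eaw_name` helper are illustrative names, not anything from the spec.

```python
import unicodedata

# Spelled-out names for the East Asian Width codes returned by unicodedata.
EAW_NAMES = {
    "F": "Fullwidth",
    "W": "Wide",
    "H": "Halfwidth",
    "A": "Ambiguous",
    "Na": "Narrow",
    "N": "Neutral",
}


def eaw_name(ch: str) -> str:
    """Return the spelled-out East Asian Width class of a single code point."""
    return EAW_NAMES[unicodedata.east_asian_width(ch)]


samples = {
    "\u4e2d": eaw_name("\u4e2d"),  # CJK ideograph
    "\uff21": eaw_name("\uff21"),  # FULLWIDTH LATIN CAPITAL LETTER A
    "\uff71": eaw_name("\uff71"),  # HALFWIDTH KATAKANA LETTER A
    "\u03b1": eaw_name("\u03b1"),  # GREEK SMALL LETTER ALPHA
    "a": eaw_name("a"),
}
```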

  • It took me a while to figure out what the Segment Break Transformation Rules were trying to accomplish. An example showing uninterrupted Chinese text would be appreciated.
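
A simplified sketch of what the rules seem to be aiming for, based only on the two conditions quoted above (a neighbouring U+200B, or F/W/H width on both sides). Any further conditions in the spec are not modelled, and the function names are made up for illustration.

```python
import unicodedata

ZWSP = "\u200b"
WIDE_CLASSES = {"F", "W", "H"}  # Fullwidth, Wide, Halfwidth; Ambiguous is excluded


def transform_segment_break(before: str, after: str) -> str:
    """Return what replaces a collapsible segment break between two characters."""
    # A zero-width space next to the break: the break is removed, the ZWSP stays.
    if before == ZWSP or after == ZWSP:
        return ""
    # East Asian text on both sides: the break is removed so no space appears.
    if (unicodedata.east_asian_width(before) in WIDE_CLASSES
            and unicodedata.east_asian_width(after) in WIDE_CLASSES):
        return ""
    # Otherwise the break becomes an ordinary space.
    return " "


def join_lines(lines):
    """Join non-empty source lines, applying the rule at each line boundary."""
    out = lines[0]
    for line in lines[1:]:
        out += transform_segment_break(out[-1], line[0]) + line
    return out


chinese = join_lines(["\u4f60\u597d", "\u4e16\u754c"])  # two lines of Chinese text
english = join_lines(["hello", "world"])
```

Under these assumptions the Chinese source lines join with nothing between them, while the English lines are joined by a single space.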

In some other writing systems, soft wrap opportunities...

This paragraph seems non-normative. Perhaps move it to a note?

@litherum
Copy link

Contributor Author

Oh, one more:

We will refer to the typographic character unit as the basic unit of text. ... The typographic character represents ... a Latin alphabetic letter (including its diacritics), Hangul syllable, ...

This seems to say that a typographic character unit often corresponds to more than one code point.

A typographic letter unit or letter for the purpose of this specification is a typographic character unit belonging to one of the Letter or Number general categories in Unicode

The general categories are categories of a single code point. How can we determine the general category for an entire multi-code-point-sequence?

@fantasai
Collaborator

@litherum Wrt #3461 (comment), see https://www.w3.org/TR/css-text-3/#character-properties , which is cross-referenced from the sentence right after the one you're quoting.
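
As a rough sketch of how a multi-code-point unit can be given a general category: the code below assumes the unit's first (base) code point supplies its properties, which is my reading of the general rule in the linked section. Any exceptions listed there are not modelled, and both function names are hypothetical.

```python
import unicodedata


def unit_general_category(unit: str) -> str:
    """General category of a typographic character unit.

    Assumption: the first (base) code point determines the unit's properties.
    """
    return unicodedata.category(unit[0])


def is_typographic_letter_unit(unit: str) -> bool:
    """True if the unit falls in a Letter (L*) or Number (N*) general category."""
    return unit_general_category(unit)[0] in ("L", "N")


e_acute = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT: two code points, one unit
category = unit_general_category(e_acute)
```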

@frivoal added the css-text-3 (Current Work) label on Dec 20, 2018
@fantasai
Collaborator

fantasai commented Jan 3, 2019

Unclear whether or not the form feed causes breaks to be removed.

Shouldn't be, rendering as a different character doesn't change its identity....

Also, "rendered as a zero-width space" is probably not what you want. Zero-width spaces can participate in text shaping inside the font, and some fonts even have visible glyphs for zero-width-space that get removed by shaping. Why doesn't the spec say they're removed instead of rendered as U+200B?

Because you didn't want to in #855 (comment) ? :)

[Re-opened that issue because we later resolved on different handling for CR, but didn't re-discuss FF.]

Similarly to above, leaving behind the ZWS is probably not what you want, as it can participate in shaping. It seems CSS is trying to treat the ZWS as a flag to signal behavior for the following break; leaving the flag behind seems like pollution.

ZWSP isn't pollution, it's signal. It's the white space surrounding it that's noise we're trying to clear out. If we removed the ZWSP along with the white space, we'd lose important behavior that it represents, like a line break opportunity / break in joining.

A more useful link would be appreciated, instead of linking to the Unicode homepage.

That's not the Unicode home page (which is at http://www.unicode.org/), it's the table of contents for the core standard. I can't take responsibility for how the main content is in a sidebar. :/ Default_ignorable is defined, in this edition, in section 5.21. (Many of our references to Unicode are to its annexes, which each have their own HTML document at their own URL, but this is a reference into the core standard, which is a PDF linked from this version-agnostic landing page.)

These sentences seem contradictory. "Removed" is not "collapsed."

Adjusted the wording. Fwiw, the latter is trying to handle the requirements brought up in http://lists.w3.org/Archives/Public/www-style/2012Jan/0936.html (I'm not sure how well it's succeeding).

It took me a while to figure out what the Segment Break Transformation Rules were trying to accomplish. An example showing uninterrupted Chinese text would be appreciated.

OK, I'll add an example. :)

In some other writing systems, soft wrap opportunities... This paragraph seems non-normative. Perhaps move it to a note?

That paragraph as well as the previous one are similar in that they're laying out the expectations. They're not strictly normative nor really solely informative... This is somewhat of an introductory section anyway, so I think it's better to leave them outside of a note. Notes are a type of aside, and this information is somewhat critical to making sense of the spec.

@fantasai
Collaborator

fantasai commented Jan 3, 2019

@litherum OK, I think I've addressed all your comments. Let me know if something still seems unresolved?

@litherum
Contributor Author

litherum commented Feb 5, 2019

Unclear whether or not the form feed causes breaks to be removed.

Shouldn't be, rendering as a different character doesn't change its identity....

Okay.

Why doesn't the spec say they're removed instead of rendered as U+200B?

Because you didn't want to in #855 (comment) ? :)

heehee. Yes, I see now. I'll see if I can come up with an example where it matters.

If we removed the ZWSP along with the white space, we'd lose important behavior that it represents, like a line break opportunity / break in joining.

I'm not sure that preserving it is doing authors any favors. Is this implemented anywhere, and do we know of authors who have used this mechanism to un-break lines? I'd like to hear what their expectations / desires are.

it's the table of contents for the core standard.

Can't we do better somehow? 😕

Adjusted the wording.

👍

OK, I'll add an example. :)

👍

so I think it's better to leave them outside of a note.

🤷🏻‍♂️ Okay.

https://www.w3.org/TR/css-text-3/#character-properties

Ah, I see. It's pretty unfortunate we have to have special cases for Unicode. Presumably other text engines don't have these special cases, which means other editors should have the special cases, or CSS should adopt whatever behavior the other editors have.

Therefore, either:

  1. These special cases should be moved into Unicode, allowing the behavior to be shared among all text editors, or
  2. We should find a new way of solving this problem

@fantasai
Collaborator

fantasai commented Feb 5, 2019

I'm not sure that preserving it is doing authors any favors.

If the author didn't want it there, he shouldn't have put it there.

Can't we do better somehow?

Open to suggestions...

Ah, I see. It's pretty unfortunate we have to have special cases for Unicode. Presumably other text engines don't have these special cases, which means other editors should have the special cases, or CSS should adopt whatever behavior the other editors have.

Text engines generally don't do white space collapsing, which is the only place we're using EAW, so that's not something they need to think about, only we do. Wrt text orientation, we did end up with UAX50. Then there's justification which references things in https://www.w3.org/TR/css-text-3/#justify-symbols, which is outside the scope of Unicode and somewhat handwavy anyway... I don't see there's much benefit in getting Unicode involved here.

@frivoal
Collaborator

frivoal commented Jul 9, 2019

@litherum this issue contained a bunch of subtopics. As far as I can tell, they've all been addressed to satisfaction (fixed or explained away), so I'm closing this. If you think one or more of your concerns still need more work, please open individual issues to track them separately.

@frivoal closed this as completed on Jul 9, 2019
@frivoal added the Testing Unnecessary (Memory aid - issue doesn't require tests) label on Dec 3, 2020

Fetched URL: https://github.com/w3c/csswg-drafts/issues/3461
