
Grapheme clusters fail to represent syllabic conjuncts in north Indian scripts · Issue #87 · w3c/iip · GitHub


Open
r12a opened this issue Feb 5, 2020 · 1 comment
Labels
doc:beng, doc:deva, doc:gujr, gap, i:segmentation (Grapheme/word segmentation & selection), l:bn (Bengali language & script), l:gu (Gujarati language & script), l:hi (Hindi, Devanagari script), p:basic, s:beng (Bengali script), s:deva (Devanagari script), s:gujr (Gujarati script), x:beng, x:deva, x:gujr


r12a commented Feb 5, 2020

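The Unicode concept of 'grapheme cluster' currently fails to represent syllabic conjuncts (together with any attached vowel signs and other marks) in scripts such as Devanagari. For example, the conjunct क्ष (KA + VIRAMA + SSA) is read as a single unit, but grapheme cluster segmentation splits it into क् and ष. As a result, editing operations, line-breaking algorithms, vertical text layout, and similar processes are likely to break text at the wrong point.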
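The split can be illustrated with a rough approximation of the grapheme cluster rules as they stood when this issue was filed: a base character followed by any combining marks (general categories Mn, Mc, Me). This is a simplification for Devanagari only, not an implementation of UAX #29 (it ignores Prepend, ZWJ, Hangul, emoji and other rules), and the function name `legacy_clusters` is hypothetical. Unicode 15.1 later added rule GB9c, based on the Indic_Conjunct_Break property, to UAX #29, so up-to-date segmenters may keep such conjuncts together.

```python
import unicodedata


# Hypothetical helper: approximates pre-Unicode-15.1 extended grapheme clusters
# for Devanagari only. A cluster is one non-mark character followed by any
# combining marks (general category Mn, Mc or Me). Real UAX #29 segmentation
# has many more rules (Prepend, ZWJ, Hangul, emoji, ...).
def legacy_clusters(text: str) -> list[str]:
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            # The virama (Mn) and vowel signs (Mc) attach to the previous cluster.
            clusters[-1] += ch
        else:
            # Consonants and all other characters start a new cluster.
            clusters.append(ch)
    return clusters


# क्षत्रिय: KA, VIRAMA, SSA, TA, VIRAMA, RA, VOWEL SIGN I, YA
word = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F"
print(legacy_clusters(word))
# Five units: क् | ष | त् | रि | य  (the conjuncts क्ष and त्र are split)
```

Because the virama is a combining mark, it stays with the preceding consonant, and the following consonant starts a new unit. This is exactly where a cursor, a selection, or a line break can land in the middle of a conjunct.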

Indic Layout Requirements provides a grammar for Indian orthographic syllable boundaries that works for Devanagari. CSS uses the concept of 'typographic character unit' rather than grapheme cluster in its specifications, explaining that these cases are beyond the scope of the grapheme cluster concept and that implementations should provide appropriate support. In addition, the Unicode Consortium is developing a modification to the grapheme cluster concept that is likely to resolve the problem for scripts such as Devanagari.
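For comparison, orthographic syllables can be approximated with a regular expression. The following is a heavily simplified grammar in the spirit of the one in Indic Layout Requirements, not a transcription of it. The character classes and the `syllables` function are assumptions made for illustration, and the sketch ignores ZWJ/ZWNJ, eyelash ra, and other special cases.

```python
import re

# Simplified, hypothetical Devanagari character classes (not exhaustive).
C = "[\u0915-\u0939\u0958-\u095F\u0978-\u097F]"  # consonants (incl. precomposed nukta forms)
N = "\u093C"                                       # nukta
H = "\u094D"                                       # virama (halant)
M = "[\u093A\u093B\u093E-\u094C\u094E\u094F\u0955-\u0957\u0962\u0963]"  # dependent vowel signs
B = "[\u0900-\u0903]"                              # candrabindu, anusvara, visarga
V = "[\u0904-\u0914\u0960\u0961\u0972-\u0977]"     # independent vowels

SYLLABLE = re.compile(
    f"(?:{C}{N}?{H})*{C}{N}?(?:{H}|{M}*){B}*"  # conjunct: (C virama)* C, then virama or vowel signs
    f"|{V}{N}?{M}*{B}*"                         # independent vowel with following marks
    r"|[\s\S]"                                   # fallback: any other single code point
)


def syllables(text: str) -> list[str]:
    """Split text into simplified orthographic syllables (sketch only)."""
    return SYLLABLE.findall(text)


word = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F"  # क्षत्रिय
print(syllables(word))
# Three syllables: क्ष | त्रि | य
```

Unlike the mark-attaching approach, the repeated `(C virama)` group lets a consonant that follows a virama join the same unit. This keeps क्ष and त्रि whole, which is what editing and line-breaking operations need for these scripts. A regular expression is used here only for brevity; a production implementation would follow the full grammar and the script's own orthographic rules.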

See requirements at: Indic Layout Requirements

Specs
CSS uses the concept of 'typographic character unit', rather than grapheme cluster, in its specs with the explanation that these cases are beyond the scope of the grapheme cluster concept and that implementations should provide appropriate support.

Tests

r12a added the i:segmentation (Grapheme/word segmentation & selection), gap, p:basic, and doc:deva labels Feb 5, 2020

r12a commented Feb 5, 2020

The first comment in this issue contains text that will automatically appear in the Devanagari gap-analysis document as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

r12a changed the title from "Grapheme clusters fail to represent syllabic conjuncts" to "Grapheme clusters fail to represent syllabic conjuncts in north Indian scripts" May 18, 2021
r12a added the l:hi (Hindi, Devanagari script), l:bn (Bengali language & script), and l:gu (Gujarati language & script) labels May 1, 2024
r12a moved this to "Issue identified, needing investigation" in Gap-analysis pipeline Jun 20, 2024
r12a added the s:gujr (Gujarati script), s:beng (Bengali script), and s:deva (Devanagari script) labels Jul 2, 2024
Projects
Status: Issue identified, needing investigation