Chinese character components

In Written Chinese, components (Chinese: 部件; pinyin: bùjiàn) are building blocks of characters, composed of strokes. ^[1] In most cases, a component consists of more than one stroke, and is smaller than the whole of the character. For example, the character 件 consists of two components: 亻 and 牛. These can be further decomposed: 亻 can be analyzed as the sequence of strokes ㇓㇑, and 牛 as the sequence ㇓㇐㇐㇑. ^[2]

There are two methods of Chinese character component analysis: hierarchical dividing and plane dividing. Hierarchical dividing separates a character layer by layer, from larger to smaller components, until the primitive components are reached. Plane dividing separates out the primitive components in a single step. ^[3]

The structure of a Chinese character is the pattern or rule in which the character is formed by its (first level) components. ^[4] Chinese character structures include single-component structure, left-right structure, up-down structure and surrounding structure. ^[5]

Analysis

Chinese characters may be analyzed in terms of smaller components. This analysis is generally based on graphical forms, without considering aspects like pronunciation and meaning.^[3]

Component analysis is very helpful for learning Chinese characters. For example:

想→相+心
湘→氵+相
相→木+目

Through component analysis, characters can be learned more easily. If a student learns 相 first, that knowledge helps with learning or reviewing 想, 湘, 木 and 目. Learning by component analysis is generally more efficient than breaking each character down into individual strokes. Component analysis is also used in Chinese character encoding for computer input. ^[6]
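As a rough illustration of how the decompositions above might be stored and queried, the following Python sketch keeps them in a small dictionary. The names `DECOMPOSITIONS` and `characters_containing` are hypothetical and introduced only for this example, and the data covers just the three characters listed above.

```python
# Hypothetical lookup table: character -> its first-level components.
# Only the three decompositions given in the text are included.
DECOMPOSITIONS = {
    "想": ["相", "心"],
    "湘": ["氵", "相"],
    "相": ["木", "目"],
}


def characters_containing(component):
    """Return the characters whose first-level components include `component`."""
    return [char for char, parts in DECOMPOSITIONS.items() if component in parts]


# Knowing 相 gives a head start on every character that contains it.
print(characters_containing("相"))  # ['想', '湘']
```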

There are two methods for dividing Chinese characters: hierarchical dividing and plane dividing. Hierarchical dividing separates a character layer by layer, from larger to smaller components, until the primitive components are reached. Plane dividing separates out the primitive components in a single step. Hierarchical dividing displays the external structure of a character, while plane dividing can be regarded as omitting the higher levels and writing out the final primitive components directly. ^[3]

Rules for division

The rules for hierarchical dividing include: ^[7]

The separation ditch, or gap (分割沟), is an obvious boundary along which a character (or larger component) is split into smaller components.
If there is only one separation ditch, split into two components along the separation ditch. For example: 私→禾+厶, 兵→丘+八.
When there is more than one separation ditch, divide along the longer one first. For example: 想→相+心, 相→木+目, giving the hierarchical structure 想(相(木+目)+心), with two layers of components.
When several separation ditches are parallel and equal in length, divide along all of them. For example: 鴻→氵+工+鳥.
Intersecting stroke groups are not divided, for example, 車 and 车 are primitive components.
Dividing generally stops above the level of single strokes, and components with only two strokes, such as 二, 八 and 刁, are not separated further.
Hierarchical analysis should conform to the basic structure of Chinese characters. For example: the outermost layer of "謎" is in a left-right structure, so the left and right separation is employed first: 謎→訁+迷, followed by the inside-outside division (迷→辶+米), although the latter's L-shaped separation ditch may be longer.
A character containing multi-level components is divided from larger to smaller components, yielding first-level components, second-level components, third-level components, and so on.

An example

The hierarchical analysis of the character 戇 in bracketed representation:

戇(贛(章(立+早(日+十))+(⿱夂貢)(夂+貢(工+貝(目+八))))+心),^[a] 5 layers of components.

or in tree structure:

                    戇
                  /    \
                贛       心
              /    \
             章    (⿱夂貢)
            /  \    /  \
           立   早  夂   貢
              /  \     /  \
             日   十   工  貝
                         /  \
                        目   八

The level to which a Chinese character is to be analyzed or divided depends on actual applications.

In plane analysis, only components on the tree-leaves are presented, i.e.,

戇: 立,日,十,夂,工,目,八,心.
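The relationship between the two analyses can be sketched in Python. This is a minimal illustration that assumes the tree for 戇 shown above is entered by hand. The helper names `node`, `bracketed`, `leaves` and `depth` are hypothetical, and the label "(⿱夂貢)" stands in for the right-hand component of 贛, as in the note.

```python
def node(name, *children):
    """A component is a (name, children) pair; a primitive one has no children."""
    return (name, list(children))


# Hand-entered tree for 戇, copied from the diagram above.
TREE = node(
    "戇",
    node(
        "贛",
        node("章", node("立"), node("早", node("日"), node("十"))),
        node(
            "(⿱夂貢)",
            node("夂"),
            node("貢", node("工"), node("貝", node("目"), node("八"))),
        ),
    ),
    node("心"),
)


def bracketed(component):
    """Hierarchical analysis written in bracketed form."""
    name, children = component
    if not children:
        return name
    return name + "(" + "+".join(bracketed(child) for child in children) + ")"


def leaves(component):
    """Plane analysis: only the components on the tree leaves, left to right."""
    name, children = component
    if not children:
        return [name]
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


def depth(component):
    """Number of component layers below this node."""
    _, children = component
    return 0 if not children else 1 + max(depth(child) for child in children)


print(bracketed(TREE))
print(",".join(leaves(TREE)))  # 立,日,十,夂,工,目,八,心
print(depth(TREE))             # 5
```

Plane analysis here is simply the hierarchical tree with every intermediate layer dropped, which is why `leaves` needs nothing beyond the tree itself.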

Analysis data of the Cihai

The following is the analysis data of Cihai (辭海), with a character set of 16,339 traditional and simplified Chinese characters.^[8]

*Cihai* Traditional and Simplified Chinese Character component analysis
component level	different components	total components
1	3061	32065
2	1302	34296
3	539	16777
4	195	3872
5	48	396
6	12	184
7	3	6

In most cases, a component is larger than a stroke and smaller than the whole character (it combines with other components to form the character). A single stroke counts as a component when it occupies a relatively independent position that is usually occupied by a multi-stroke component. For example: the top stroke 一 in the character 元, the bottom 一 in 旦, the left 丨 in 旧, the right ㇟ in 礼, the central ㇔ in 勺, and the outer ㇆ in 司. In the special case of one-stroke characters, such as 一 and 乙, a stroke is at once a component and a character. ^[9]

Classification of components

Character components and non-character components

A component that can independently form a character is a character component, or a component of independent character formation (成字部件). For example, the component 口 forms the character 口 on its own, and is a component in the characters 另, 洁 and 唱; the component 相 is also a character by itself, and a component in 湘, 箱 and 想. ^[1]

A component that cannot independently form a character is a non-character component, or a component of dependent character formation (非成字部件). For example, the component 冂 in the characters 同, 钢 and 岗, and the component 疒 in 疾, 病 and 痛. Neither 冂 nor 疒 is a character in modern Chinese. ^[1]

Primitive components and compound components

A component that cannot be (further) divided into smaller components by the rules is a primitive component, or basic component (基礎部件, 基础部件). Primitive components are the final-level components of hierarchical dividing. For example, components 田 and 力 in character 男, and 氵 in character 河. ^[4]

A component composed of two or more primitive components is a compound component (合成部件). For example, component 咅 (立+口) in character 陪, 部 and 菩, and component 相 (木+目) in 厢, 霜 and 孀. ^[4]

Hierarchy of components

A component divided out at the first level is called a level-one component, a component divided out at the second level is called a level-two component, and so on. A component divided out at the final level is called a final-level component, i.e., primitive component. For example, in the example of character 戇,

                   戇
                 /    \
               贛       心 (level-one components)
             /    \
            章    (⿱夂貢) (level-two components)
           /  \    /  \
          立   早  夂   貢 (level-three components)
             /  \     /  \
            日   十   工  貝 (level-four components)
                        /  \
                       目   八 (level-five components)

where the leaf components 立, 日, 十, 夂, 工, 目, 八 and 心 are final-level components or primitive components.
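The level numbering can be computed mechanically once the divisions are known. The sketch below assumes a hand-entered table of divisions for 戇, copied from the tree above. `CHILDREN`, `components_by_level` and `primitive_components` are hypothetical names used only for illustration.

```python
# Hypothetical table: component -> the components it is divided into.
# Components that do not appear as keys are treated as primitive.
CHILDREN = {
    "戇": ["贛", "心"],
    "贛": ["章", "(⿱夂貢)"],
    "章": ["立", "早"],
    "早": ["日", "十"],
    "(⿱夂貢)": ["夂", "貢"],
    "貢": ["工", "貝"],
    "貝": ["目", "八"],
}


def components_by_level(char):
    """Map each level number (1, 2, ...) to the components divided out at it."""
    levels = {}
    current = CHILDREN.get(char, [])
    level = 1
    while current:
        levels[level] = current
        following = []
        for component in current:
            following.extend(CHILDREN.get(component, []))
        current = following
        level += 1
    return levels


def primitive_components(char):
    """Final-level components: those that are not divided any further."""
    return [
        component
        for components in components_by_level(char).values()
        for component in components
        if component not in CHILDREN
    ]


for level, components in components_by_level("戇").items():
    print(level, components)
```

Note that a primitive component such as 心 can appear at level one, so "final-level" refers to the end of its own branch rather than to the deepest level of the tree.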

Single-stroke components and multi-stroke components

A component formed by one stroke is called a single-stroke component. For example, the stroke 一 in the character 丛, the stroke ㇑ in 引, the stroke ㇓ in 系, the stroke ㇔ in 良, and the stroke ㇆ in 司. ^[4]

A component formed by more than one stroke is called a multi-stroke component. For example, component 从 in character 丛, 弓 in character 引, and 艮 of 良.

Primitive components

Among the 16,339 traditional, simplified and unsimplified characters in Cihai, there are 675 primitive components; among the 11,834 characters that remain when traditional characters that have been simplified are excluded, there are 648 primitive components.^[10] In the Chinese Character Information Dictionary,^[11] a total of 623 primitive components have been divided out of the 7,785 mainland China standard characters.

Primitive components with the most character combinations in *Cihai*
serial number	components	characters composed	frequency
1	口	2409	20.3579%
2	一	1279	10.8089%
3	艹	812	6.8622%
4	木	791	6.6841%
5	人	774	6.5404%
6	日	766	6.4736%
7	氵	691	5.8391%
8	亻	679	5.7383%
9	八	642	5.4252%
10	土	597	5.0457%

(Divided from 11,834 simplified and unsimplified characters from Cihai).^[10]

Component standards

Chinese character components are widely used in Chinese character keyboard encoding input methods. Different encoding input methods have different ways for component separation. Therefore, it is necessary to formulate norms or standards related to Chinese character components.

"Chinese Character Component Standard of GB13000.1 Character Set for Information Processing" (信息处理用GB13000.1字符集汉字部件规范) is a standard released on February 1, 1997, by the National Language Commission of China. It includes a "List of Chinese Character Primitive Components". The list contains 560 primitive components. All the 20,902 CJK Chinese characters in the GB13000.1 character set can be formed with these components. This standard is mainly for Chinese information processing. ^[12]

Another important standard is the "Specification of Common Modern Chinese Character Components and Component Names" (現代常用字部件及部件名稱規範) formulated by the National Language Commission in 2009. ^[13] ^[14] It includes a list of 514 primitive components of commonly used characters, together with component names. This standard is mainly for Chinese character education and dictionary collation.

Component naming

The rules for component naming include the following: ^[15]

If the component is a character, it is called by that character's pronunciation, for example 口 (kǒu) and 土 (tǔ). If the character has more than one pronunciation, the more common one is used: the component 中 is called zhōng, not zhòng.

If the component is not a character but already has a name, the existing name is used, for example 扌 (tí shǒu, 提手) and 宀 (bǎo gài, 宝盖). If the component has more than one name, the most commonly used one is chosen: 彳 is called shuāng lì rén (双立人) rather than shuāngrén páng (双人旁).

For a component without a name, a colloquial and reasonable name should be determined. One way is to refer to the component by its position in common characters. For example: "the head of character 青" (龶, 青字头), "the frame of character 国" (囗, 国字框).
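The three rules form a simple fallback chain, which the following Python sketch makes explicit. It is illustrative only. The tables hold nothing beyond the examples given above, each list is assumed to be ordered with the most common entry first, and all names (`CHARACTER_READINGS`, `EXISTING_NAMES`, `POSITIONAL_NAMES`, `component_name`) are hypothetical.

```python
# Rule 1: components that are characters, with readings (most common first).
CHARACTER_READINGS = {
    "口": ["kǒu"],
    "土": ["tǔ"],
    "中": ["zhōng", "zhòng"],
}

# Rule 2: non-character components with existing names (most common first).
EXISTING_NAMES = {
    "扌": ["tí shǒu"],
    "宀": ["bǎo gài"],
    "彳": ["shuāng lì rén", "shuāngrén páng"],
}

# Rule 3: components named after their position in a common character.
POSITIONAL_NAMES = {
    "龶": "青字头",
}


def component_name(component):
    """Apply the three naming rules in order; return None if none applies."""
    if component in CHARACTER_READINGS:
        return CHARACTER_READINGS[component][0]
    if component in EXISTING_NAMES:
        return EXISTING_NAMES[component][0]
    return POSITIONAL_NAMES.get(component)


print(component_name("中"))  # zhōng
print(component_name("彳"))  # shuāng lì rén
print(component_name("龶"))  # 青字头
```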

Chinese character structures

The structure of a Chinese character is the pattern or rule in which the character is formed by its (first level) components. ^[4] Chinese character structures include the following: ^[5]

Single-component structure: The character is formed by a single primitive component, such as 口, 日 and 月.
Left-right structure (⿰): The character is formed by a component on the left and another one on the right, such as 好, 部 and 件.
Left-middle-right structure (⿲): The character is formed by a component on the left, a component on the right and a component in the middle, such as 街, 班 and 辯.
Up-down structure (⿱): The character is formed by a component above another component, such as 昌, 号 and 召.
Up-middle-down structure (⿳): The character is formed by a component at the top, a component at the bottom and a component in the middle, such as 鼻, 曼 and 率.
Complete-surrounding (⿴): such as 國, 围 and 回.
Left-top-right-surrounding (⿵): such as 同, 问 and 向.
Top-left-bottom-surrounding (⿷): such as 區, 匠 and 匣.
Left-bottom-right surrounding (⿶): such as 函, 凶 and 凼.
Top-left surrounding (⿸): such as 历, 廣 and 居.
Top-right surrounding (⿹): such as 司, 可 and 氧.
Left-bottom surrounding (⿺): such as 進, 建 and 題.
Overlapping (⿻), or multi-frame surrounding: such as 坐, 乘, 幽 and 噩.

The principles of Chinese character first-level structure analysis can be extended to other levels. For example, character 部 is in left-right structure, where the left component is in up-down structure.
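The structure symbols listed above are Unicode Ideographic Description Characters, which are written in prefix notation: the symbol comes first, followed by its two or three operands. The Python sketch below parses such a sequence into a nested tuple. `IDC_ARITY` and `parse_ids` are hypothetical names, and the sequence used for 部 (with 阝 as its right-hand component) is supplied here as an assumption for illustration rather than taken from a cited source.

```python
# Number of operands taken by each structure symbol listed above.
IDC_ARITY = {
    "⿰": 2, "⿱": 2, "⿲": 3, "⿳": 3,
    "⿴": 2, "⿵": 2, "⿶": 2, "⿷": 2,
    "⿸": 2, "⿹": 2, "⿺": 2, "⿻": 2,
}


def parse_ids(sequence):
    """Parse a prefix-notation description sequence into a nested tuple."""

    def parse(pos):
        if pos >= len(sequence):
            raise ValueError("description sequence ended early")
        symbol = sequence[pos]
        pos += 1
        arity = IDC_ARITY.get(symbol)
        if arity is None:
            # An ordinary component: it stands for itself.
            return symbol, pos
        parts = []
        for _ in range(arity):
            part, pos = parse(pos)
            parts.append(part)
        return (symbol, *parts), pos

    tree, end = parse(0)
    if end != len(sequence):
        raise ValueError("unexpected trailing characters")
    return tree


# 部: left-right structure whose left component is itself up-down (立 over 口).
print(parse_ids("⿰⿱立口阝"))  # ('⿰', ('⿱', '立', '口'), '阝')
```

The nesting of the tuples mirrors the way first-level structure analysis extends to lower levels.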

Deformation of components

Sometimes in order to make the glyph more beautiful and reasonable in structure, a component may need to be changed in form according to the character environment. The deformation of the components can be made in two ways:

Change the shape of individual strokes.
The entire component is flattened or narrowed. ^[16]

Stroke deformation within a component

Stroke deformation includes the following situations: ^[16]

When the bottom stroke of a left component is ㇐ (heng, horizontal), or ㇐ intersected by ㇑ (shu, vertical), the ㇐ is usually changed to ㇀ (ti). For example: 攻、城、骄、班、敛、孫、特、轴. Exceptions: 軸、鞍.
When "半、羊、辛" is used as the left component, the last stroke ㇑(shu) should be changed to ㇓(pie). For example: "判、翔、辣".
If the last stroke of a component is ㇏ (na), and the component is on the left side or in a surrounding structure, then ㇏ often needs to be changed to ㇔ (dian, dot). Such as: "林、劝、因".
When adjacent strokes have two or more (parallel) ㇏ (na), generally only keep one ㇏, and change the rest to ㇔ (dots). Such as: "秦、这、炎、餐".
When the component "几" sits on top of another component, the hook should be removed. For example: "朵、般、鉛".
When "九、儿、几" is on the left side of other components, the horizontal bending hook is often changed to a lift (ti). For example: "鳩、頹、微".
When the last stroke of the left radical is a ㇟(vertical bend hook), it is often changed to a ㇙ (vertical lift). For example: "切、顾、改、耀".
When "手" (hand) is used on the left side, the vertical hook may be changed to ㇓ (pie). For example: "拜、掰".

Narrowing or flattening of components

The narrowing or flattening of components is to make the structure of the whole character harmonious and well-proportioned. Take "犬" (dog) as an example:

In an up-down structure, it is flattened. For example: "哭、器".
In a left-right structure, it is narrowed. For example: "狀、獄".

Pianpang and radicals

Pianpangs (偏旁; piānpáng) and radicals (部首; bùshǒu) are components.

Originally, the left side of a combined Chinese character was called pian, and the right side was called pang. Nowadays, it is customary to refer to the left and right, upper and lower, and outer and inner parts of combined characters as pianpangs. The pianpang analysis of combined characters is therefore similar to first-level component analysis. Pianpangs generally carry sound or meaning information, and are called the "sound side" (also "sound symbol") and the "meaning side" (also "meaning symbol") respectively. ^[17]

Radicals are components used for sorting and retrieving Chinese characters. Based on the glyph structure of Chinese characters, the components shared by a group of characters are taken as the basis for sorting and searching, and these components are called radicals. In pictophonetic characters, the radicals are mostly pianpangs representing the meaning. ^[6]

Component optimization

Hu Qiaomu said: ^[18] "The (primitive) components of Chinese characters should be reduced, and the components of Chinese characters should be made independent characters as many as possible; those that cannot be characters should be universal and easy to say. This may be more important than reducing the number of strokes and characters. Some simplified characters have added new components of Chinese characters. For example, '书农长' and so on. Although the traditional character 農 has more strokes, it is very clear to say: '曲+辰農'. When we simplify Chinese characters, we should avoid new unspeakable and uncommon components. "

Components are important structural units of Chinese characters. Optimizing the components of Chinese characters to make them more concise, standardized, and easy to learn and use is an important task for Chinese character optimization, and there is a long way to go. ^[18] ^[19]

Notes

^ "(⿱夂貢)" represents the right component of 贛, which is not displayable in Unicode

References

Citations

^ ^a ^b ^c National Language Commission 2009, p. 1.
^ Su 2014, pp. 84–85.
^ ^a ^b ^c Su 2014, p. 86.
^ ^a ^b ^c ^d ^e National Language Commission 2009, p. 2.
^ ^a ^b Fu 1999, p. 2.
^ ^a ^b Fu 1999, p. 18.
^ Su 2014, pp. 86–88.
^ Fu 1999, p. 25.
^ Su 2014, p. 85.
^ ^a ^b Fu 1999, p. 26.
^ Li 1988, p. 1027.
^ National Language Commission 1998.
^ National Language Commission 2009.
^ CP 2021, pp. 149–224.
^ CP 2021, pp. 153–156.
^ ^a ^b Su 2014, pp. 91–93.
^ Fu 1999, p. 17.
^ ^a ^b Su 2014, p. 93.
^ Zhang 2004.

Works cited

CP, Commercial Press (collected) (2021). 语言文字规范手册 (Handbook of Language Standards) (in Chinese). Beijing: 商务印书馆 (Commercial Press). ISBN 978-7-5176-0774-8.
Fu, Yonghe (傅永和) (1999). 中文信息处理 (Chinese Information Processing) (in Chinese) (3rd ed.). Guangzhou: 广东教育出版社 (Guangdong Education Press). p. 84. ISBN 9-787540-640804.
Li, Gongyi (李公宜，劉如水（主編）) (1988). 漢字信息字典 (Chinese Character Information Dictionary) (in Chinese). Beijing: 科学出版社 (Science Press). ISBN 7-03-000862-6.
National Language Commission, Ministry of Education, China (1998). Chinese Character Component Standard of GB 13000.1 Character Set for Information Processing (信息处理用GB 13000.1 字符集汉字部件规范) (PDF). Beijing: National Language Commission. ISBN 7-80126-314-6. Retrieved 3 September 2023.
National Language Commission, Ministry of Education, China (2009). Specification of Common Modern Chinese Character Components and Component Names (现代常用字部件及部件名称规范) (PDF). Beijing: National Language Commission. Retrieved 3 September 2023.
Su, Peicheng (苏培成) (2014). 现代汉字学纲要 (Essentials of Modern Chinese Characters) (in Chinese) (3rd ed.). Beijing: 商务印书馆 (Commercial Press). p. 84. ISBN 978-7-100-10440-1.
Zhang, Xiaoheng (2004). "Difficulties in the Application of "Chinese Character Component Standard of GB 13000.1 Character Set for Information Processing" for Chinese Character Input (《信息处理用GB13000.1字符集汉字部件规范》在输入法应用中的难点讨论)". Journal of Chinese Information Processing (中文信息学报). 18 (2004) (4): 60–65.
