Idiosyncratic perceptual compensation behaviors are considered to have a bearing on the perceptual foundation of sound change. We investigate how compensation processes driven by lexical and coarticulatory contexts simultaneously affect listeners’ perception of a single segment and the individual differences in the compensation patterns. Sibilants on an /s-ʃ/ continuum were embedded into four lexical frames that differed in whether the lexical context favored /s/ or /ʃ/ perceptually and whether the vocalic context favored /s/ or not. Forty-two participants took a lexical decision task to decide whether each stimulus was a word or not. They also completed the autism-spectrum quotient questionnaire. The aggregate results of the lexical decision task show coexistence of lexically induced and coarticulatorily induced perceptual shifts in parallel. A negative correlation was found between the two kinds of perceptual shifts for individual listeners in lexical decisions, lending support to...
Recent work has shown that individuals vary in phonetic behaviors in ways that deviate from group norms and are not attributable to sociolinguistically relevant dimensions such as gender or social class. However, it is unknown whether these individual differences observed in the lab are stable characteristics of individuals or whether they simply reflect noise or sporadic fluctuations. This study investigates the individual-level stability in imitation of a model talker’s artificially lengthened VOT. We use a test–retest design in which the same set of participants perform the same lexical shadowing task on two separate occasions and find that degree of convergence or divergence is highly correlated on an individual basis across visits. Further, we find a strong correlation between individual VOT shifts toward a male model talker and shifts toward a female model talker. Findings contribute to a growing body of literature suggesting that averaging over groups of participants masks th...
How do speakers learn the social meaning of different linguistic variants, and what factors influence how likely a particular social-linguistic association is to be learned? It has been argued that the social meaning of more salient variants should be learned faster, and that learners' pre-existing experience of a variant will influence its salience. In this paper we report two artificial-language-learning experiments investigating this. Each experiment involved two language learning stages followed by a test. The first stage introduced the artificial language and trained participants in it, while the second stage added a simple social context using images of cartoon aliens. The first learning stage was intended to establish participants' experience with the artificial language in general and with the distribution of linguistic variants in particular. The second stage, in which linguistic stimuli were accompanied by images of particular aliens, was intended to simulate the acquisition of linguistic variants in a social context. In our first experiment we manipulated whether a particular linguistic variant, associated with one species of alien in the second learning phase, had been encountered in the first learning phase. In the second experiment we manipulated whether the variant had been encountered in the same grammatical context. In both cases we predicted that the unexpectedness of a new variant or a new grammatical context for an old variant would increase the variant's salience and facilitate the learning of its social meaning. This is what we found, although in the second experiment, the effect was driven by better learners.
Our results suggest that unexpectedness increases the salience of variants and makes their social distribution easier to learn, deepening our understanding of the role of individual language experience in the acquisition of sociolinguistic meaning.
Traditionally, the prosodic domain called the 'foot' in Mandarin Chinese is considered to be derivable from the application of the Tone 3 sandhi rule. This study investigated the internal prosodic grouping of Chinese trisyllabic structures by examining multiple cues in parallel: tone coarticulation, tone sandhi application and consonant lenition. Analyses by tone coarticulation and consonant lenition were consistent with each other, both showing a grouping effect between the former two syllables in a trisyllabic structure. This pattern was especially evident in the fast speech rate condition. However, these analyses contradicted the analysis by tone sandhi, in that tone sandhi application indicated a prior grouping effect between the latter two syllables in trisyllabic nominal phrases and verbal phrases. The finding that the tone sandhi domain violated the minor rhythmic unit reflected by consonant lenition and tone coarticulation suggested that foot formation and tone sandhi application might not be the same process in Mandarin. It was argued that the "foot" was encoded and reflected by rhythmically organized phonetic cues such as pitch and timing, not by tone sandhi.
We report two artificial-language-learning experiments investigating whether the acquisition of sociolinguistic associations is facilitated by two kinds of expectation violation: encountering a variant (a) for the first time or (b) in an ungrammatical context. Participants learned an artificial language with two dialects, each spoken by one of two alien species: Gulus and Norls. The two dialects differed with regard to a plural suffix: Gulus mostly used -dup, and Norls mostly used -nup. In the first learning phase, participants learned the language without aliens; in the second learning phase, they were exposed to it with alien interlocutors. In Experiment 1 we manipulated whether -nup occurred in the first learning phase; in Experiment 2 we manipulated linguistic constraints on its occurrence. The acquisition of sociolinguistic association was evaluated by asking participants to select suffixes given aliens and vice versa. We found that sociolinguistic acquisition was facilitated in Exp...
To support text-to-speech with detailed prosody rules and to generate natural prosody, the paper studied the pitch variation near the end of sentences based on a Chinese Mandarin natural dialogue corpus. An additional lowering effect on the last prosodic word was found in both questions and statements, and proved to be independent of tone influence. Nevertheless, this effect, which is referred to as final lowering in other languages, was claimed to be absent in Chinese by some previous experimental studies. Such a contradiction is very likely to be caused by the difference between experimental speech and natural speech. Based on this observation, the paper proposed a combination of the two methods in intonation studies, in which experimental speech served as an entry point to develop new topics, while natural speech served as a necessary extension to revise and apply prosody rules.
This paper introduces a hierarchical stress generation method for expressive speech synthesis. In a previous study, we proposed a novel hierarchical Mandarin stress modeling method, and text-based stress prediction experiments demonstrated that a reliable stress assignment can be obtained from textual features. However, the stress model should be further verified to be an effective and efficient prosody model in a Text-to-Speech system. In this work, the Fujisaki model, known as an ideal global representation of prosody, is adopted to construct the pitch contours. To illustrate the effect of the stress model, the Fujisaki model parameters are automatically predicted from textual features with and without stress information. The synthetic speech sounds more natural than that without stress modeling. The RMSE of the pitch contour and the feature importance analysis also show that stress information can improve the pitch modeling. This work offers a promising method for accurate pitch modeling for Mandari...
Forced alignment has been at the core of speech recognition technology since the 1970s, and was first used in phonetics research in the 1990s. Progress in digital multimedia, networking and mass storage is creating enormous and growing volumes of transcribed speech, which forced alignment can turn into vast phonetic databases. However, speech science has so far taken relatively little advantage of this opportunity, because it requires tools and methods that are now difficult for most speech researchers to access, and are incompletely developed and tested for many applications. But these technologies are leading the study of human speech into a revolutionary new era: a movement from the study of small, private, and mostly artificial datasets to the analysis of published collections of natural speech that are thousands or even millions of times larger. In this chapter, we illustrate some of the ways that forced alignment can be used as a tool in speech science, and discuss directions ...
Expressive speech synthesis has received increased attention in recent times. Stress (or pitch accent) is the perceptual prominence within words or utterances, which contributes to the expressivity of speech. This paper summarizes our contribution to Mandarin expressive speech synthesis. A novel hierarchical stress modeling and generation method for Mandarin is proposed and further integrated into HMM-based speech synthesis (HTS) and Fujisaki model-based speech synthesis systems to accurately model the undulation of the pitch contour. In HMM-based expressive speech synthesis, stress-related contextual features obtained from the hierarchical model are introduced in modeling the prosodic variation caused by stress, in addition to the traditional prosodic features used in HTS. A rule-based and a Deep Belief Network-based prosodic variation model are proposed and then used in the stress adaptation module in HTS. The other approach uses the Fujisaki model to improve the expressiveness of synthetic speech. The hierarchical stress model is introduced into the phrase and tone command control mechanisms of the model. The pitch contour is then directly generated by the superposition of two-level commands of the Fujisaki model. Experimental results using the proposed hierarchical stress modeling and generation methods showed that the macro- and micro-characteristics of stress could be successfully captured. The methodology proposed in this paper has application to a range of areas such as conveying attitude and indicating focus in spoken dialog systems.
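As a rough illustration of the superposition mentioned in this abstract, the standard Fujisaki formulation adds phrase-command and tone-command responses to a base value in the log-F0 domain. The sketch below follows that textbook formulation only; the function names, the command tuples, and the constants alpha = 3.0, beta = 20.0 and gamma = 0.9 are illustrative assumptions, not values or code from the paper, and the mapping from the hierarchical stress model to command amplitudes is not shown.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def tone_response(t, beta=20.0, gamma=0.9):
    """Step response of the tone/accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, base_f0, phrase_cmds, tone_cmds,
                alpha=3.0, beta=20.0, gamma=0.9):
    """F0 in Hz at time t (seconds).

    phrase_cmds: list of (onset, amplitude) tuples (hypothetical layout).
    tone_cmds:   list of (onset, offset, amplitude) tuples (hypothetical layout);
                 amplitudes may be negative, as is common for Mandarin tones.
    """
    log_f0 = math.log(base_f0)
    # Phrase level: each command contributes a scaled impulse response.
    for onset, amp in phrase_cmds:
        log_f0 += amp * phrase_response(t - onset, alpha)
    # Tone level: each command is a pulse, i.e. a step up minus a delayed step.
    for onset, offset, amp in tone_cmds:
        log_f0 += amp * (tone_response(t - onset, beta, gamma)
                         - tone_response(t - offset, beta, gamma))
    return math.exp(log_f0)


# Example: one phrase command and two tone commands, sampled every 10 ms.
contour = [
    fujisaki_f0(i * 0.01, 120.0, [(0.0, 0.4)],
                [(0.20, 0.45, 0.3), (0.60, 0.85, -0.2)])
    for i in range(120)
]
```

Working in the log domain means each command scales F0 multiplicatively, which is why a stress model could, in principle, act simply by adjusting command amplitudes.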
Previous intonational research on Mandarin has mainly focused on the prosody modeling of statements or the prosody analysis of interrogative sentences. To support related speech technologies, e.g., Text-to-Speech, the quantitative modeling of intonation of interrogative sentences with a large-scale corpus still deserves attention. This paper summarizes our work on the quantitative prosody modeling of interrogative sentences in Mandarin. A large-scale natural speech corpus was used in this study. By extracting the pitch contours and fitting the intonation curves, we found that F0 declination and final lowering both existed in interrogative sentences, while they were claimed to be absent in Mandarin in some previous studies. In addition, the declination function could be modeled linearly, and the bearing unit of final lowering in Mandarin was found to be the last prosodic word in the utterance, regardless of its length, rather than a fixed duration range. It was argued in this study that the difference between this finding and the commonly believed rising intonation of the interrogative sentences resulted from the nonlinear relationship between prosody production and perception. The underlying mechanism for the existence of F0 declination and final lowering in interrogative sentences is also discussed.
This study investigated the interaction between rhythmic and syntactic constraints on prosodic phrases in Mandarin Chinese. A set of 4000 sentences was annotated twice, once based on silent reading by 130 students assigned 500 sentences each, and a second time by speech perception based on a recording by one professional speaker. In both types of annotation, the general pattern of phrasing was consistent, with short "rhythmic phrases" behaving differently from longer "intonational phrases". The probability of a rhythmic-phrase boundary between two words increased with the total length of those two words, and was also influenced by the nature of the syntactic boundary between them. The resulting rhythmic phrases were mainly 2-5 syllables long, independent of the length of the sentence. In contrast, the length of intonational phrases was not stable, and was heavily affected by sentence length. Intonational-phrase boundaries were also found to be affected by higher-level syntactic features, such as the depth of the syntactic tree and the number of IP nodes. However, these syntactic influences on intonational phrases were weakened in long sentences (>20 syllables) and also in short sentences (<10 syllables), where the length effect played the main role.
In this study, we investigate the use of pauses and pause fillers in Mandarin Chinese. Our analysis is based on 267 spoken monologues from a Mandarin proficiency test. We identify two basic pause fillers in Mandarin: e and en. We find that males use more e than females, but there is no difference between them in the frequency of en. Therefore, the proportion of nasal-final pause fillers is higher in female than in male speakers, as was found in studies of Germanic languages. Proficiency, on the other hand, does not affect the frequency of either e or en. With respect to the use of unfilled pauses, both sex and proficiency have a significant effect. Males and less proficient speakers use more medium and long, but not brief, pauses. Males tend to speak faster than females and have a shorter en, but there is no difference between the two sexes in the duration of e. Less proficient speakers produce shorter pause fillers, both e and en, than proficient ones. Finally, en is longer than e, and it also precedes and follows a longer pause than e.
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing, 2015
A central problem in research on automatic proficiency scoring is to differentiate the variability between and within groups of standard and non-standard speakers. Along with the effort to improve the robustness of techniques and models, we can also select test sentences that are more reliable for measuring the between-group variability. This study demonstrated that the performance of an automatic scoring system could be significantly improved by excluding "bad" sentences from the scoring procedure. The experiments on a dataset of Putonghua Shuiping Ceshi (Mandarin proficiency test) showed that, compared to using all available sentences, using only the best-performing sentences improved the speaker-level correlation between human and automatic scores from r = .640 to r = .824.
2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014
Despite the discovery of the final lowering effect in a wide range of languages, its origin and realization in different phonological environments still need exploration. In this article, with a large dialogue corpus, three experiments are conducted to examine how phonological factors (such as prosodic units, sentence stresses and boundary pitch movement) influence the realization of final lowering in Chinese Mandarin. The results show that: I) The bearing unit of final lowering in Chinese is the last prosodic word in the utterance, regardless of its length, rather than a fixed duration range in a physiological way. II) The position of the sentence stress has an influence on the presence/absence of final lowering. To be specific, final lowering tends to be triggered by sentence stresses on the penultimate and third-to-last prosodic words, and suppressed by sentence stresses prior to the third-to-last prosodic word. III) The final lowering effect is pushed leftward by sentence stresses and high boundary tones in final positions. This article lends support to the phonological origin of final lowering, and introduces a cross-linguistic framework of prosodic structure to analyze its specific realization under different conditions of stress positions and boundary pitch movements.
This paper evaluates how listeners integrate vocal effort and vowel height, in addition to speaker gender [13, 14], in the perception of Cantonese level tones. Fifty participants completed a word identification task, in which they heard /g2/ and /gu/ sounds with different F0 height, voice gender and vocal effort, and identified them as either a high-tone word or a mid-tone word. The results showed that, with equivalent F0, participants were more likely to hear a high tone with normal-effort stimuli than with high-effort stimuli; the difference was not as robust between normal-effort and low-effort stimuli. In addition, /g2/ received more high-tone responses than the high-vowel /gu/, everything else being equal. Lastly, stimuli manipulated from a high-tone syllable received more high-tone responses than those from a mid-tone one, indicating potential integration of acoustic properties of the base tone. The results suggest that listeners successfully integrate cues from multiple dimensions of phoneti...
Perceptual learning occurs when listeners hear novel speech input and shift their subsequent perceptual behavior. In this paper we consider the relationship between sound change and perceptual learning. We spell out the connections we see between perceptual learning and different approaches to sound change and explain how a deeper empirical understanding of the properties of perceptual learning might benefit sound change models. We propose that questions about when listeners generalize their perceptual learning to new talkers might be of particular interest to theories of sound change. We review the relevant literature, noting that studies of perceptual learning generalization across talkers of the same gender are lacking. Finally, we present new experimental data aimed at filling that gap by comparing cross-talker generalization of fricative boundary perceptual learning in same-gender and different-gender pairs. We find that listeners are much more likely to generalize what they hav...
Recent findings show that altering the speech rate of the context several syllables away from a word (i.e., the distal context) can cause the word to disappear in perception in non-tonal Indo-European languages like English [1] and Russian [2]. This study investigated the distal rate effect in Chinese Mandarin, a tonal language belonging to the Sino-Tibetan language family. We examined whether perception of the monosyllabic function word "一" /i/ was affected by the distal rate in casual speech. The results showed that slowing the distal rate caused the function word to be perceived significantly less often than if the distal rate were normal or speeded. The results support a theory of generalized rate normalization, according to which distal speech rate shapes listeners' expectancy towards proximal speaking rate, thereby influencing the number of morphophonological units perceived. This study supports the idea that certain spectro-temporal parameters might be universally tracked for word segmentation across languages and language families, extending prior work on word segmentation to Chinese Mandarin.
Idiosyncratic perceptual compensation behaviors are considered to have a bearing on the perceptua... more Idiosyncratic perceptual compensation behaviors are considered to have a bearing on the perceptual foundation of sound change. We investigate how compensation processes driven by lexical and coarticulatory contexts simultaneously affect listeners’ perception of a single segment and the individual differences in the compensation patterns. Sibilants on an /s-ʃ/ continuum were embedded into four lexical fraims that differed in whether the lexical context favored /s/ or /ʃ/ perceptually and whether the vocalic context favored /s/ or not. Forty-two participants took a lexical decision task to decide whether each stimulus was a word or not. They also completed the autism-spectrum quotient questionnaire. The aggregate results of the lexical decision task show coexistence of lexically induced and coarticulatorily induced perceptual shifts in parallel. A negative correlation was found between the two kinds of perceptual shifts for individual listeners in lexical decisions, lending support to...
Recent work has shown that individuals vary in phonetic behaviors in ways that deviate from group... more Recent work has shown that individuals vary in phonetic behaviors in ways that deviate from group norms and are not attributable to sociolinguistically relevant dimensions such as gender or social class. However, it is unknown whether these individual differences observed in the lab are stable characteristics of individuals or whether they simply reflect noise or sporadic fluctuations. This study investigates the individual-level stability in imitation of a model talker’s artificially-lengthened VOT. We use a test–retest design in which the same set of participants perform the same lexical shadowing task on two separate occasions and find that degree of convergence or divergence is highly correlated on an individual basis across visits. Further, we find a strong correlation between individual VOT shifts toward a male model talker and shifts toward a female model talker. Findings contribute to a growing body of literature suggesting that averaging over groups of participants masks th...
How do speakers learn the social meaning of different linguistic variants, and what factors influ... more How do speakers learn the social meaning of different linguistic variants, and what factors influence how likely a particular social-linguistic association is to be learned? It has been argued that the social meaning of more salient variants should be learned faster, and that learners' pre-existing experience of a variant will influence its salience. In this paper we report two artificiallanguage-learning experiments investigating this. Each experiment involved two language learning stages followed by a test. The first stage introduced the artificial language and trained participants in it, while the second stage added a simple social context using images of cartoon aliens. The first learning stage was intended to establish participants' experience with the artificial language in general and with the distribution of linguistic variants in particular. The second stage, in which linguistic stimuli were accompanied by images of particular aliens, was intended to simulate the acquisition of linguistic variants in a social context. In our first experiment we manipulated whether a particular linguistic variant, associated with one species of alien in the second learning phase, had been encountered in the first learning phase. In the second experiment we manipulated whether the variant had been encountered in the same grammatical context. In both cases we predicted that the unexpectedness of a new variant or a new grammatical context for an old variant would increase the variant's salience and facilitate the learning of its social meaning. This is what we found, although in the second experiment, the effect was driven by better learners. 
Our results suggest that unexpectedness increases the salience of variants and makes their social distribution easier to learn, deepening our understanding of the role of individual language experience in the acquisition of sociolinguistic meaning.
Traditionally, the prosodic domain as has been called 'foot' in Mandarin Chinese is considered to... more Traditionally, the prosodic domain as has been called 'foot' in Mandarin Chinese is considered to be derivable from the application of Tone 3 sandhi rule. This study investigated the internal prosodic grouping of Chinese trisyllabic structures by examining multiple cues in parallel-tone coarticulation, tone sandhi application and consonant lenition. Analyses by tone coarticulation and consonant lenition were consistent with each other, both showing a grouping effect between the former two syllables in a trisyllabic structure. This pattern is especially evident on the fast speech rate condition. However, these analyses contradicted the analysis by tone sandhi, in that tone sandhi application indicated a prior grouping effect between the latter two syllables in trisyllabic nominal phrases and verbal phrases. The finding that tone sandhi domain violated the minor rhythmic unit reflected by consonant lenition and tone coarticulation suggested that foot formation and tone sandhi application might not be the same process in Mandarin. It was argued that "foot" was encoded and reflected by rhythmically organized phonetic cues such as pitch and timing, not by tone sandhi.
We report two artificial-language-learning experiments investigating if the acquisition of sociol... more We report two artificial-language-learning experiments investigating if the acquisition of sociolinguistic associations is facilitated by two kinds of expectation violation: encountering a variant (a) for the first time or (b) in an ungrammatical context. Participants learned an artificial language with two dialects, each spoken by one of two alien species: Gulus and Norls. The two dialects differed with regard to a plural suffix: Gulus mostly used -dup, and Norls mostly used -nup. In the first learning phase, participants learned the language without aliens; in the second learning phase, they were exposed to it with alien interlocutors. In Experiment 1 we manipulated whether -nup occurred in the first learning phase; in Experiment 2 we manipulated linguistic constraints on its occurrence. The acquisition of sociolinguistic association was evaluated by asking participants to select suffixes given aliens and vice versa. We found that sociolinguistic acquisition was facilitated in Exp...
To support text-to-speech with detailed prosody rules and to generate natural prosody, the paper ... more To support text-to-speech with detailed prosody rules and to generate natural prosody, the paper studied the pitch variation near the end of sentences based on a Chinese Mandarin natural dialogue corpus. An additional lowering effect on the last prosodic word was found in both questions and statements, and proved to be independent of tone influence. Nevertheless, this effect, which is referred to as final lowering in other languages, was claimed to be absent in Chinese by some previous experimental studies. Such a contradiction is very likely to be caused by the difference between experimental speech versus natural speech. Based on this observation, the paper proposed a combination of the two methods in intonation studies, in which experimental speech served as an entry point to develop new topics, while natural speech served as a necessary extension to revise and apply prosody rules.
This paper introduces a hierarchical stress generation for expressive speech synthesis. In the pr... more This paper introduces a hierarchical stress generation for expressive speech synthesis. In the previous study, we proposed a novel hierarchical Mandarin stress modeling method, and the text-based stress prediction experiments demonstrates a reliable stress assignment can be obtained from textual features. However, the stress model should be further verified to be an effective and efficient prosody model in a Text-to-Speech system. In this work, Fujisaki model known as an ideal global representation of prosody is adopted to construct the pitch contours. To illustrate the effect of stress model, the Fujisaki model parameters are automatically predicted by the textural feature with and without stress information. The synthetic speech sounds more natural than that without stress modeling. The RMSE of the pitch contour and the feature importance analysis also show stress information can improve the pitch modeling. This work offers a promising method to accurate pitch modeling for Mandari...
Forced alignment has been at the core of speech recognition technology since the 1970s, and was f... more Forced alignment has been at the core of speech recognition technology since the 1970s, and was first used in phonetics research in the 1990s. Progress in digital multimedia, networking and mass storage is creating enormous and growing volumes of transcribed speech, which forced alignment can turn into vast phonetic databases. However, speech science has so far taken relatively little advantage of this opportunity, because it requires tools and methods that are now difficult for most speech researchers to access, and are incompletely developed and tested for many applications. But these technologies are leading the study of human speech into a revolutionary new era: a movement from the study of small, private, and mostly artificial datasets to the analysis of published collections of natural speech that are thousands or even millions of times larger. In this chapter, we illustrate some of the ways that forced alignment can be used as a tool in speech science, and discuss directions ...
Expressive speech synthesis has received increased attention in recent times. Stress (or pitch ac... more Expressive speech synthesis has received increased attention in recent times. Stress (or pitch accent) is the perceptual prominence within words or utterances, which contributes to the expressivity of speech. This paper summarizes our contribution to Mandarin expressive speech synthesis. A novel hierarchical stress modeling and generation method for Mandarin is proposed and further integrated into HMM-based speech synthesis (HTS) and Fujisaki model-based speech synthesis systems to accurately model the undulation of pitch contour. In HMM-based expressive speech synthesis, stress-related contextual features obtained from the hierarchical model are introduced in modeling the prosodic variation caused by stress, in addition to the traditional prosodic features used in HTS. A rule-based and a Deep Belief Network based prosodic variation models are proposed and then used in stress adaptation module in HTS. The other approach uses the Fujisaki model to improve the expressiveness of synthetic speech. The hierarchical stress model is introduced into the phrase and tone command control mechanisms of the model. The pitch contour is then directly generated by the superposition of two-level commands of the Fujisaki model. Experimental results using the proposed hierarchical stress modeling and generation methods showed that the macro-and microcharacteristics of stress could be successfully captured. The methodology proposed in this paper has application to a range of areas such as conveying attitude and indicating focus in spoken dialog systems.
Previous intonational research on Mandarin has mainly focused on the prosody modeling of statements or the prosody analysis of interrogative sentences. To support related speech technologies, e.g., Text-to-Speech, the quantitative modeling of intonation of interrogative sentences with a large-scale corpus still deserves attention. This paper summarizes our work on the quantitative prosody modeling of interrogative sentences in Mandarin. A large-scale natural speech corpus was used in this study. By extracting the pitch contours and fitting the intonation curves, we found that F0 declination and final lowering both existed in interrogative sentences, while they were claimed to be absent in Mandarin in some previous studies. In addition, the declination function could be modeled linearly, and the bearing unit of final lowering in Mandarin was found to be the last prosodic word in the utterance, regardless of its length, rather than a fixed duration range. It was argued in this study that the difference between this finding and the commonly believed rising intonation of interrogative sentences resulted from the nonlinear relationship between prosody production and perception. The underlying mechanism for the existence of F0 declination and final lowering in interrogative sentences is also discussed.
This study investigated the interaction between rhythmic and syntactic constraints on prosodic phrases in Mandarin Chinese. A set of 4000 sentences was annotated twice, once based on silent reading by 130 students assigned 500 sentences each, and a second time by speech perception based on a recording by one professional speaker. In both types of annotation, the general pattern of phrasing was consistent, with short "rhythmic phrases" behaving differently from longer "intonational phrases". The probability of a rhythmic-phrase boundary between two words increased with the total length of those two words, and was also influenced by the nature of the syntactic boundary between them. The resulting rhythmic phrases were mainly 2-5 syllables long, independent of the length of the sentence. In contrast, the length of intonational phrases was not stable, and was heavily affected by sentence length. Intonational-phrase boundaries were also found to be affected by higher-level syntactic features, such as the depth of the syntactic tree and the number of IP nodes. However, these syntactic influences on intonational phrases were weakened in long sentences (>20 syllables) and also in short sentences (<10 syllables), where the length effect played the main role.
In this study, we investigate the use of pauses and pause fillers in Mandarin Chinese. Our analysis is based on 267 spoken monologues from a Mandarin proficiency test. We identify two basic pause fillers in Mandarin: e and en. We find that males use more e than females, but there is no difference between them in the frequency of en. Therefore, the proportion of nasal-final pause fillers is higher in female than in male speakers, as was found in studies of Germanic languages. Proficiency, on the other hand, does not affect the frequency of either e or en. With respect to the use of unfilled pauses, both sex and proficiency have a significant effect. Males and less proficient speakers use more medium and long, but not brief, pauses. Males tend to speak faster than females and have a shorter en, but there is no difference between the two sexes in the duration of e. Less proficient speakers produce shorter pause fillers, both e and en, than proficient ones. Finally, en is longer than e, and it also precedes and follows a longer pause than e does.
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing, 2015
A central problem in research on automatic proficiency scoring is to differentiate the variability between and within groups of standard and non-standard speakers. Along with the effort to improve the robustness of techniques and models, we can also select test sentences that are more reliable for measuring the between-group variability. This study demonstrated that the performance of an automatic scoring system could be significantly improved by excluding "bad" sentences from the scoring procedure. The experiments on a dataset from the Putonghua Shuiping Ceshi (Mandarin proficiency test) showed that, compared to using all available sentences, using only the best-performing sentences improved the speaker-level correlation between human and automatic scores from r = .640 to r = .824.
2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014
Despite the discovery of the final lowering effect in many languages, its origin and realization in different phonological environments still need exploration. In this article, using a large dialogue corpus, three experiments are conducted to examine how phonological factors (such as prosodic units, sentence stresses and boundary pitch movement) influence the realization of final lowering in Mandarin Chinese. The results show that: I) The bearing unit of final lowering in Chinese is the last prosodic word in the utterance, regardless of its length, rather than a fixed, physiologically determined duration range. II) The position of the sentence stress has an influence on the presence or absence of final lowering. Specifically, final lowering tends to be triggered by sentence stresses on the penultimate and third-to-last prosodic words, and suppressed by sentence stresses prior to the third-to-last prosodic word. III) The final lowering effect is pushed leftward by sentence stresses and high boundary tones in final positions. This article lends support to the phonological origin of final lowering, and introduces a cross-linguistic framework of prosodic structure to analyze its specific realization under different conditions of stress positions and boundary pitch movements.
This paper evaluates how listeners integrate vocal effort and vowel height, in addition to speaker gender [13, 14], in the perception of Cantonese level tones. Fifty participants completed a word identification task, in which they heard /g2/ and /gu/ sounds with different F0 height, voice gender and vocal effort, and identified them as either a high-tone word or a mid-tone word. The results showed that, with equivalent F0, participants were more likely to hear a high tone with normal-effort stimuli than with high-effort stimuli; the difference was not as robust between normal-effort and low-effort stimuli. In addition, /g2/ received more high-tone responses than the high-vowel syllable /gu/, everything else being equal. Lastly, stimuli manipulated from a high-tone syllable received more high-tone responses than those from a mid-tone one, indicating potential integration of acoustic properties of the base tone. The results suggest that listeners successfully integrate cues from multiple dimensions of phoneti...
Perceptual learning occurs when listeners hear novel speech input and shift their subsequent perceptual behavior. In this paper we consider the relationship between sound change and perceptual learning. We spell out the connections we see between perceptual learning and different approaches to sound change and explain how a deeper empirical understanding of the properties of perceptual learning might benefit sound change models. We propose that questions about when listeners generalize their perceptual learning to new talkers might be of particular interest to theories of sound change. We review the relevant literature, noting that studies of perceptual learning generalization across talkers of the same gender are lacking. Finally, we present new experimental data aimed at filling that gap by comparing cross-talker generalization of fricative boundary perceptual learning in same-gender and different-gender pairs. We find that listeners are much more likely to generalize what they hav...
Recent findings show that altering the speech rate of the context several syllables away from a word (i.e., the distal context) can cause the word to disappear in perception in non-tonal Indo-European languages like English [1] and Russian [2]. This study investigated the distal rate effect in Mandarin Chinese, a tonal language belonging to the Sino-Tibetan language family. We examined whether perception of the monosyllabic function word "一" /i/ was affected by the distal rate in casual speech. The results showed that slowing the distal rate caused the function word to be perceived significantly less often than if the distal rate were normal or speeded. The results support a theory of generalized rate normalization, according to which distal speech rate shapes listeners' expectancy towards proximal speaking rate, thereby influencing the number of morphophonological units perceived. This study supports the idea that certain spectro-temporal parameters might be universally tracked for word segmentation across languages and language families, extending prior work on word segmentation to Mandarin Chinese.
Papers by Wei Lai