The word before a backquote does not change tone. Instead, the syllables after the backquote are pronounced with shorter, softer tones.
For TTS, this appears to be the biggest challenge. Part-of-speech tagging (POST, i.e. assigning a lexical category to each word) would help, but it is a huge task.
What are the existing databases out there? What models have been developed for tagging Taiwanese? Can we revise the existing POST for a jump start?
Further, a word could be a noun or a verb. How could TTS tell the difference when a word has multiple tags?
Meta's M2M uses a different approach to solve complex tone sandhi problems. Meta's new AI-powered speech translation system presents a promising new approach for an unwritten language.
Note: MTL words, like English words, may contain multiple syllables. The majority of Taiwanese words have two syllables. The front syllable may be derived from a syllable whose tone has changed according to the tone sandhi rules. But the word as a whole, when used as a noun, undergoes no further tone change. Many words end in 'ar'. When a word is formed with 'ar', the front syllable changes tone, e.g. "niaw" -> "niau'ar" (kitty cat). But if the front syllable has a flat tone, it does not change tone when followed by 'ar', e.g. "te" -> "te'ar" (bag).
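The 'ar' rule above can be sketched as a table lookup. This is a minimal illustration, not a general implementation: the names `SANDHI_FORM`, `FLAT_TONE` and `add_ar` are hypothetical, and the tables contain only the two examples given in these notes.

```python
# Hypothetical lookup tables, filled only with the examples from these notes.
# A real system would need a full lexicon or a rule engine for MTL spelling.
SANDHI_FORM = {"niaw": "niau"}  # citation form -> form used before 'ar'
FLAT_TONE = {"te"}              # syllables treated as flat tone (no change)


def add_ar(syllable: str) -> str:
    """Attach the 'ar' suffix, changing the front syllable's tone if needed."""
    if syllable in FLAT_TONE:
        # Flat-tone syllables keep their spelling before 'ar'.
        return syllable + "'ar"
    if syllable in SANDHI_FORM:
        # Other syllables take their changed-tone spelling before 'ar'.
        return SANDHI_FORM[syllable] + "'ar"
    # Unknown syllables are rejected rather than guessed.
    raise KeyError(f"no sandhi entry for {syllable!r}")


print(add_ar("niaw"))  # niau'ar
print(add_ar("te"))    # te'ar
```

Raising an error for unknown syllables is a deliberate choice in this sketch: guessing a tone change would silently produce wrong speech output.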
N = Noun, C = Conjunction, r = Pronoun
Goar ka korng.
Goar ka y korng. - but this one does change tone.
e.g. phaotee khaikarng kofng'oe sngroe iusafn oansuie kekef hoxho hongthor binzeeng pexlau kviafiux lily laklag
The following do change tones, since the nouns are used as adjectives, e.g. tikhaf-mixsvoax zekpeq-hviati
Note: MTL is based on the southern Taiwanese accent. Hence, some tone changes, such as curving tone to low falling tone or short tone to long tone, are avoided (e.g. mii -> miphoe instead of mixphoe; ciah -> ciaqpng instead of ciaxpng). Of course, people speak with their preferred tones.
In addition, there are colloquial and literary tones in Taiwanese. While the pronunciation might be quite different (e.g. zap vs. sip), the tone change rules are the same.
For NLP, MTL has a big advantage due to its adoption of the concept of "vocabulary". Unlike 'harnji', the majority of the words in MTL have two syllables, e.g. 'sikoef', which means watermelon. And because it is a noun, there is no tone change when speaking. Using multiple syllables per word also overcomes the ambiguity (homonym) issues that other systems such as POJ encounter.
hofng teq zhoef hongzhoef
hongzhoef ho hofng zhoef`khix
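As a rough sketch of how a TTS front end might use the backquote convention described in these notes, the snippet below splits each token at the backquote and labels the parts. The function name `mark_tokens` and the labels "citation", "neutral" and "undecided" are assumptions made for illustration. Tokens without a backquote are left "undecided" because their tone change depends on context that this sketch does not model.

```python
def mark_tokens(sentence: str) -> list[tuple[str, str]]:
    """Label each part of a sentence by how the backquote affects its tone."""
    marked = []
    for token in sentence.split():
        stem, sep, tail = token.partition("`")
        if not sep:
            # No backquote: tone change depends on context (not modelled here).
            marked.append((token, "undecided"))
            continue
        if stem:
            # The word before the backquote keeps its tone.
            marked.append((stem, "citation"))
        if tail:
            # Syllables after the backquote get a shorter, softer tone.
            marked.append((tail, "neutral"))
    return marked


for part, label in mark_tokens("hongzhoef ho hofng zhoef`khix"):
    print(part, label)
```

The text after the backquote is kept as one unit here. Splitting it into individual syllables would need MTL syllable segmentation, which is outside the scope of this sketch.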