Giving Our Mother Tongue a Digital Future: The MTLTag & VITS Pipeline

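Taiwanese is defined by the Tone Sandhi Circle. In non-tonal languages such as English and Japanese, intonation is the primary carrier of affect. In Taiwanese, pitch also distinguishes one word from another, so emotional expression requires “Tonal-Safe Emotional Mapping”: affect has to be conveyed without disturbing the lexical tones.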

MTL is a phonetically consistent, multi-syllable orthography based on POJ (Pe̍h-ōe-jī). It is sandhi-aware: the written form of a multi-syllable word reflects the tone of the front syllable after Tone Sandhi has applied. This overcomes the homonym ambiguity of character-based writing systems.
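To make the idea of a sandhi-aware spelling concrete, here is a minimal sketch of the commonly described tone sandhi rules applied to every syllable except the last one in a word. It is an illustration only: it uses numeric-tone romanization (such as "tai5-uan5") as a stand-in rather than real MTL spelling, it assumes the southern-type rule in which tone 5 becomes tone 7, and the function names are hypothetical.

```python
# Simplified Taiwanese tone sandhi on numeric-tone romanization, e.g. "tai5-uan5".
# Assumption: southern-type rule set, where tone 5 changes to tone 7.
OPEN_SANDHI = {"1": "7", "7": "3", "3": "2", "2": "1", "5": "7"}
# Checked tones (4 and 8) depend on the final consonant of the syllable.
STOP_SANDHI = {"4": "8", "8": "4"}      # syllables ending in -p, -t, -k
GLOTTAL_SANDHI = {"4": "2", "8": "3"}   # syllables ending in -h


def sandhi_syllable(syllable: str) -> str:
    """Return the sandhi form of one syllable written as letters + tone digit."""
    if not syllable or not syllable[-1].isdigit():
        return syllable  # no tone digit: leave the syllable untouched
    base, tone = syllable[:-1], syllable[-1]
    if tone in ("4", "8"):
        table = GLOTTAL_SANDHI if base.endswith("h") else STOP_SANDHI
    else:
        table = OPEN_SANDHI
    # Simplification: the final -h is kept in the spelling even after sandhi.
    return base + table.get(tone, tone)


def sandhi_word(word: str) -> str:
    """Apply sandhi to every syllable except the last one of a hyphenated word."""
    syllables = word.split("-")
    changed = [sandhi_syllable(s) for s in syllables[:-1]]
    return "-".join(changed + syllables[-1:])


print(sandhi_word("tai5-uan5"))  # tai7-uan5
```

A sandhi-aware orthography stores the changed form directly in the spelling, so a reader or a TTS front end would not need to run a rule table like this at synthesis time.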

Historically, this system functioned as a structural checksum; our research suggests that native speakers utilize a form of “Perfect Pitch” to decode these melodic headers. When universal AI models fail to respect this melodic continuity, intelligibility drops significantly.

By treating Taiwanese as a formal signal-processing system rather than a collection of statistical probabilities, we enable a “Sovereign AI” that outperforms universal models in both accuracy and cultural resonance.

The MTL-TTS model utilized new modeling technologies:

Synthesized Audio Samples

Note: This model was trained on a pool of three speakers (two female, one male) with 9,000+ sentences, for 36 hours and 160,000 iteration steps (reading and listening to 4 sentences at a time). For VITS, about 350,000 steps are reportedly needed to reach the best model. We are now sorting and balancing the next training dataset by changing the speed and pitch of recordings from one female and one male speaker, who are underrepresented in this dataset. Balancing the speakers in multi-speaker training gives us better control, so we can choose whose voice is used for audio synthesis.
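The step counts in the note can be put in perspective by converting them into approximate passes over the dataset. The sketch below is only back-of-the-envelope arithmetic: it assumes exactly 9,000 sentences (the note says "9000+", so the real epoch counts are somewhat lower) and treats "4 sentences at a time" as the batch size. The function name is hypothetical.

```python
def epochs_for_steps(steps: int, batch_size: int, num_sentences: int) -> float:
    """Approximate number of full passes over the dataset for a given step count."""
    return steps * batch_size / num_sentences


NUM_SENTENCES = 9_000  # assumption: "9000+" in the note, so this is a lower bound
BATCH_SIZE = 4         # "4 sentences at a time"
CURRENT_STEPS = 160_000
TARGET_STEPS = 350_000

current = epochs_for_steps(CURRENT_STEPS, BATCH_SIZE, NUM_SENTENCES)
target = epochs_for_steps(TARGET_STEPS, BATCH_SIZE, NUM_SENTENCES)

print(f"current: ~{current:.0f} epochs, target: ~{target:.0f} epochs")
print(f"progress toward target steps: {CURRENT_STEPS / TARGET_STEPS:.0%}")
```

Under these assumptions the current model has seen each sentence at most about 71 times, a little under half of the step count cited for VITS.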


Modern Taiwanese Language (MTL) | Washington DC Taiwan School