When building a lexicon for speech recognition in Bahasa Indonesia, we need to define the phoneme set ourselves, since there is no widely used standard for the language. Instead of creating a new definition from scratch, we can adapt an existing phoneme set. Following the Kaldi resources, we can adapt the English phoneme set from the CMU Dictionary, issued by Carnegie Mellon University, which contains 134,000 words.
Bahasa Indonesia is quite simple in that, in most cases, words are pronounced the way they are written, unlike English. Building a lexicon on top of the CMU Dictionary is therefore not tedious work, although we need to add some new phonemes and leave out others.
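As a starting point, CMU Dictionary entries can be read into a word-to-phoneme mapping. The sketch below assumes the plain-text dictionary format, where each line holds a word followed by its phoneme symbols, vowels carry a stress digit (0, 1, or 2), and comment lines start with `;;;`. The function name `parse_cmudict_line` and the sample lines are illustrative only and are not part of any library.

```python
import re

def parse_cmudict_line(line):
    """Return (word, phonemes) for one dictionary line, or None for comments/blank lines."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    word, *phones = line.split()
    # Drop the alternate-pronunciation suffix, e.g. "READ(1)" -> "READ".
    word = re.sub(r"\(\d+\)$", "", word).lower()
    # Drop the stress digit on vowels, e.g. "OW1" -> "OW".
    phones = [re.sub(r"\d$", "", p) for p in phones]
    return word, phones

# Illustrative sample lines in the assumed format.
sample = [
    ";;; comment line",
    "BOAT  B OW1 T",
    "CHEESE  CH IY1 Z",
]
entries = [e for e in map(parse_cmudict_line, sample) if e is not None]
for word, phones in entries:
    print(word, " ".join(phones))
```

Stress digits are stripped here because the phoneme table that follows lists the symbols without stress; whether to keep them is a design choice for the acoustic model.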
Table 1 CMU Dictionary Phoneme Set
Phoneme | IPA Symbol | Indonesian Example Words | English Example Words |
---|---|---|---|
AA | ɑ | ternak, gembala | odd, balm |
AE | æ | - | at, bat |
AH | ʌ | ambil | hut, butt |
AO | ɔ | bakpao | ought, story |
AW | aʊ | saung | cow, bout |
AY | aɪ | kait, senarai | hide, bite |
B | b | lebah, beli | bee, buy |
CH | tʃ | cuka, ceri | cheese, china |
D | d | diam, duduk | dump, did |
DH | ð | ridho | the, thy |
EH | ɛ | enak, sepak | education, bet |
ER | ɝ | bageur*, reueus* | hurt |
EY | eɪ | - | ate, bait |
F | f | faedah, fana | fee, forest |
G | g | gerbang, guna | green, gate |
HH | h | hampar, unggah | he, hair |
IH | ɪ | singgah, ikatan | it, implication |
IY | i | - | eat, sheep |
JH | dʒ | adiraja, keganjilan | genuine, jimmy |
K | k | kenangan, batuk | key, camp |
L | l | lingkaran, betul | luck, love |
M | m | minum, temaram | mama, mine |
N | n | naik, menikah | knee, nice |
NG | ŋ | yang, ngengat | bank, sink |
OW | oʊ | bongkar, bogor | oat, boat |
OY | ɔɪ | - | toy, boy |
P | p | pulsa, peluh | pulp, pen |
R | ɹ | ranjau, rintangan | right, row |
S | s | sakit, sayang | sea, sun |
SH | ʃ | masyarakat, syaikh* | shine, she |
T | t | tikung, timpa | tea, tone |
TH | θ | rabiul tsani* | thug, theta |
UH | ʊ | - | hood, book |
UW | u | kuku, suku | two, coup |
V | v | viral, vas | vee, vocal |
W | w | wejangan, wayang | we, wide |
Y | j | yakin, yoga | yam, yield |
Z | z | zaman, zamrud | zoo, zee |
ZH | ʒ | jangkrik, jerapah | seizure, pleasure |
From the table above, it is clear that some phonemes are not (commonly) used in Bahasa Indonesia. At the same time, this phoneme set does not cover every Bahasa Indonesia lemma, given that the vocabulary derives mainly from Malay, Dutch, Arabic, Chinese, Javanese, and Sundanese. To keep things short, here is the list of phonemes that need to be added to the CMU Dictionary set for Bahasa Indonesia:
Table 2 Additional Phoneme Set
Phoneme | IPA Symbol | Indonesian Example Words | Notes |
---|---|---|---|
NY | ɲ | kenyang, nyamuk | alveolo-palatal nasal |
KH | x | kholifah | voiceless velar fricative |
Q | ʔ | qurban | glottal stop |
KX | ʕ | sa’at | voiced pharyngeal fricative |
DL | zˁ | dhuhur | pharyngealized voiced alveolar sibilant |
GH | ɣ | ghaib | voiced velar fricative |
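The two tables can be combined into a working phoneme inventory, and because spelling and pronunciation are close, a first-pass lexicon can be generated by rule. The following is a minimal sketch under stated assumptions: the "unused" set is taken to be the Table 1 rows with no Indonesian example, and the letter-to-phoneme rules in `G2P_RULES` are a hypothetical simplification (no diphthongs, no schwa, only a few digraphs), not a complete description of Indonesian pronunciation. The output lines follow the "word followed by space-separated phonemes" layout used for Kaldi lexicon files.

```python
# Table 1 symbols (39 phonemes).
CMU_PHONEMES = """AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG
OW OY P R S SH T TH UH UW V W Y Z ZH""".split()

# Assumption: phonemes with no Indonesian example in Table 1 are left out.
UNUSED = {"AE", "EY", "IY", "OY", "UH"}
# Additions from Table 2.
ADDED = ["NY", "KH", "Q", "KX", "DL", "GH"]

PHONEMES = [p for p in CMU_PHONEMES if p not in UNUSED] + ADDED

# Hypothetical, simplified spelling rules; digraphs are matched before single letters.
G2P_RULES = {
    "ng": "NG", "ny": "NY", "sy": "SH", "kh": "KH", "gh": "GH",
    "a": "AA", "b": "B", "c": "CH", "d": "D", "e": "EH", "f": "F",
    "g": "G", "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L",
    "m": "M", "n": "N", "o": "OW", "p": "P", "q": "Q", "r": "R",
    "s": "S", "t": "T", "u": "UW", "v": "V", "w": "W", "y": "Y", "z": "Z",
}
# Every rule must map into the inventory defined above.
assert set(G2P_RULES.values()) <= set(PHONEMES)

def to_phonemes(word):
    """Greedy longest-match conversion of a spelled word to phoneme symbols."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in G2P_RULES:
                phones.append(G2P_RULES[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones

def lexicon_line(word):
    """Format one lexicon entry: the word, then its phonemes."""
    return f"{word} {' '.join(to_phonemes(word))}"

for w in ["kenyang", "nyamuk", "singgah"]:
    print(lexicon_line(w))
# kenyang K EH NY AA NG
# nyamuk NY AA M UW K
# singgah S IH NG G AA HH
```

Entries produced this way still need manual review, especially loanwords and words where the letter "e" is not pronounced as EH.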
Sources:
[1] http://kaldi-asr.org/doc/examples.html
[2] http://www.speech.cs.cmu.edu/cgi-bin/cmudict
[3] https://en.wikipedia.org/wiki/ARPABET
[4] https://open-dict-data.github.io/ipa-lookup/ma/