Automatic pronunciation

Posted: Tue Aug 15, 2017 2:49 pm
by amhlaobh
A chàirdean,

I created an online app that gives the pronunciation of Scottish Gaelic words, which I think is the first one of its kind.

The link is https://gaelphon.gaelictools.com/.

It can give IPA-encoded pronunciation hints for any word that looks like Scottish Gaelic, whether it is lenited, plural, genitive, or even made up. This is possible because the underlying artificial neural network has learned the rules by itself and can generalise.

I believe it is correct for more than 80% of words, and where it isn't, it's at least quite close.

I'd love to hear your feedback about this. Whether you find it useful, find errors, or think of anything that would improve the app, don't hesitate to write something here or send me a PM.

Posted: Tue Aug 15, 2017 4:34 pm
by akerbeltz
It's no worse than my rule-based system ;)

It does weird things to leacagach (misses the ag bit) and cuidhteas (misses the t) but if you're doing some fancy neural network stuff, you may not be able to "fix" stuff?

Also it gets its knickers in a twist over the length of i in sibhsigeach vs sibhsigeachadh which I find puzzling.

Posted: Tue Aug 15, 2017 9:26 pm
by amhlaobh
The key to improvement with neural networks is to have lots and lots of training data, covering all possible letter combinations and their pronunciations many times over. So things could be fixed that way if a suitable corpus were available. Otherwise I would have to resort to a list of hard-wired exceptions, which I am reluctant to do.
There are other funny things like bruidhinn [briː.ɪɲ] vs bhruidhinn [vrɯjɪɲ] where you could argue that [iː.ɪɲ] for -uidhinn is the exception if you consider bhuidhinn [vujɪɲ].
It was also trained on abhainn as [a.ɪɲ], but it outputs [avɪɲ], which is reasonable considering the rules for -bha- and that both pronunciations are possible (if I'm not mistaken).
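For what it's worth, the hard-wired exception list mentioned above could look roughly like the sketch below. This is only an illustration in Python, not how the app is actually built: `EXCEPTIONS`, `pronounce` and `fake_model` are made-up names, and the only entry is the abhainn transcription quoted in this post.

```python
# Hypothetical exception table: spelling -> IPA.
# The single entry is the transcription quoted in this thread.
EXCEPTIONS = {"abhainn": "a.ɪɲ"}


def pronounce(word, model_predict, exceptions=EXCEPTIONS):
    """Return the hard-wired IPA if the word is listed, otherwise ask the model."""
    key = word.strip().lower()
    if key in exceptions:
        return exceptions[key]
    return model_predict(key)


def fake_model(word):
    # Stand-in for the neural network (an assumption, NOT the real app).
    # It mimics the one output reported in this thread and gives "?" otherwise.
    return "avɪɲ" if word == "abhainn" else "?"


if __name__ == "__main__":
    print(pronounce("abhainn", fake_model))                 # exception wins
    print(pronounce("abhainn", fake_model, exceptions={}))  # model output
```

The lookup runs before the model, so a listed word never reaches the network. That is also why such a list does not generalise: it fixes abhainn but does nothing for a made-up word with the same letter combination.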

So, if nothing else, it makes you think about how certain letter combinations correspond to certain pronunciation rules, and since it's so easy to change, add or remove letters, it's quite easy to try out lots of variants.
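If you want to try variants systematically, a small script can generate them for pasting into the app. The sketch below only builds lenited spellings (insert h after the initial consonant), so that pairs like bruidhinn / bhruidhinn can be compared. It is a simplified, assumed version of the spelling rule, and `lenite` and `probe_pairs` are hypothetical helper names, not part of the app.

```python
# Initial consonants whose lenition is written with a following "h".
LENITABLE = set("bcdfgmpst")
# Initial s is not lenited before these letters (sg, sm, sp, st).
BLOCKED_AFTER_S = set("gmpt")


def lenite(word):
    """Return the lenited spelling, or the word unchanged if it cannot lenite."""
    w = word.lower()
    if len(w) < 2 or w[0] not in LENITABLE:
        return w
    if w[1] == "h":
        return w  # already lenited
    if w[0] == "s" and w[1] in BLOCKED_AFTER_S:
        return w
    return w[0] + "h" + w[1:]


def probe_pairs(words):
    """Pair each word with its lenited spelling for side-by-side checking."""
    return [(w, lenite(w)) for w in words]


if __name__ == "__main__":
    for plain, lenited in probe_pairs(["bruidhinn", "buidhinn", "sgoil"]):
        print(plain, lenited)
```

Feeding both members of each pair to the app makes inconsistencies like the -uidhinn one above easier to spot.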