DR. MUTHONI MASINDE
PhD Computer Science, MSc Computer Science (Distinction), BSc Computer Science (Upper Second Division)
Chiromo Campus, School of Computing and Informatics Building, muthoni@uonbi.ac.ke
The need to conserve under-resourced languages is becoming more urgent as some of them face extinction, and natural language processing can help redress this. Most current language technology initiatives focus on Western languages such as English and French, for which resources are already available. Sesotho is one of the under-resourced Bantu languages; it is spoken mainly in the Free State province of South Africa and in Lesotho. Like other parts of South Africa, the Free State has received a large number of non-Sesotho-speaking migrants from neighbouring provinces and countries. These people face serious language barriers, especially in informal settlements where residents tend to speak only Sesotho. To address this, we developed a parallel corpus with English as the source language and Sesotho as the target language, and packaged it in UmobiTalk (Ubiquitous Mobile Speech-Based Learning Translator), a mobile tool that helps English speakers learn Sesotho. UmobiTalk combines automatic speech recognition, machine translation and speech synthesis. The application will also serve as an analysis tool for testing the accuracy and speed of the corpus. This paper presents the development, testing and evaluation of UmobiTalk.
Keywords: UmobiTalk, Automatic speech recognition (ASR), Machine translation (MT), Text to speech (TTS), Parallel corpora
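The abstract describes UmobiTalk as a chain of speech recognition, corpus-based translation and speech synthesis, used to measure the corpus's accuracy and speed. The sketch below illustrates that chain under loud assumptions: the ASR and TTS stages are hypothetical stubs that work on strings rather than audio, translation is a simple exact-match lookup (not necessarily the MT approach UmobiTalk uses), and the two sentence pairs are illustrative greetings, not entries from the authors' corpus. The names `run_pipeline`, `exact_match_accuracy` and the other helpers are hypothetical.

```python
import time
from typing import Dict, Tuple

# Illustrative English -> Sesotho pairs (hypothetical, NOT from the UmobiTalk corpus).
PARALLEL_CORPUS: Dict[str, str] = {
    "hello": "Dumela",
    "thank you": "Ke a leboha",
}

NO_MATCH = "<no match in corpus>"


def recognize_speech(audio: str) -> str:
    """Hypothetical ASR stub: treats the input as an already-decoded transcript
    and normalizes it (trimmed, lower-case) for matching against the corpus."""
    return audio.strip().lower()


def translate(text: str, corpus: Dict[str, str]) -> str:
    """Hypothetical MT stage: exact-match lookup in the English -> Sesotho corpus."""
    return corpus.get(text, NO_MATCH)


def synthesize(text: str) -> str:
    """Hypothetical TTS stub: returns a tagged string instead of audio samples."""
    return f"[speak] {text}"


def run_pipeline(audio: str, corpus: Dict[str, str]) -> Tuple[str, float]:
    """Chains ASR -> MT -> TTS and returns the output with its wall-clock
    time in seconds (a simple speed measurement)."""
    start = time.perf_counter()
    transcript = recognize_speech(audio)
    sesotho = translate(transcript, corpus)
    spoken = synthesize(sesotho)
    return spoken, time.perf_counter() - start


def exact_match_accuracy(test_pairs: Dict[str, str], corpus: Dict[str, str]) -> float:
    """Fraction of test utterances whose translation equals the reference
    (a simple accuracy measure; 0.0 for an empty test set)."""
    if not test_pairs:
        return 0.0
    hits = sum(
        translate(recognize_speech(src), corpus) == ref
        for src, ref in test_pairs.items()
    )
    return hits / len(test_pairs)


if __name__ == "__main__":
    output, seconds = run_pipeline("Hello ", PARALLEL_CORPUS)
    print(output)  # [speak] Dumela
    print(f"latency: {seconds:.6f} s")
```

Keeping each stage behind its own function means a real recognizer, translator or synthesizer could replace a stub without changing how speed and accuracy are measured.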