Short Bio

Dr. Wanjiku Ng’ang’a is a senior faculty member at the School of Computing and Informatics, University of Nairobi, where she is actively involved in research, teaching, and the supervision of undergraduate and postgraduate research projects. She holds a doctorate from the University of Helsinki, Finland, an M.Phil. degree from Cambridge University, UK, and a Bachelor’s degree in Computer Science from the University of Nairobi.



Ogot, M, Nganga W.  2012.  Anchoring and Weighting Knowledge Economy and Knowledge Indices as improved measures of a country’s readiness for the Knowledge Economy: A case study of Kenya, 2012. 4(10):25-40. Abstract

This study sought to develop a set of indices better able to track a country's readiness for the knowledge economy. The new indices, the Anchored Knowledge Index and the Anchored Knowledge Economy Index, are based on the World Bank (WB) knowledge economy framework. The rationale for their introduction and the procedures to calculate them are presented. The WB indices provide for rank-ordered normalization based on the latest data available for a benchmarking group of countries. The proposed anchored set of indices, however, provides for a relative ordering of the data. Relative order (weighting) determines by how much each country, along a particular indicator, is better (or worse) than the others. The new indices address the shortcoming of rank order, where, as long as the relative positions of the benchmarking countries remain the same, the indices do not change even though the gaps between countries could be decreasing (desired) or increasing (cause for alarm). Further, the subject country now appears twice, based on both the latest data available and a baseline (anchor) from the World Bank Knowledge Assessment Methodology 2009 data. Using Kenya as a case study, a basic scorecard for Kenya is proposed and used for the calculation of the indices for Kenya and five benchmark countries, Singapore, South Africa, Japan, South Korea. The results clearly illustrate the efficacy of the proposed approach in tracking a country's readiness for the knowledge economy.

Nganga, W.  2012.  Building Swahili Resource Grammars for the Grammatical Framework, 2012. Shall We Play the Festschrift Game? Essays on the Occasion of Lauri Carlson’s 60th Birthday.. (Diana Santos, Krister Linden, Wanjiku Nganga, Ed.).:215-226.: Springer Abstract

Grammatical Framework (GF) is a multilingual parsing and generation framework. In this paper, we describe the development of the Swahili Resource Grammar, a first in extending GF’s coverage with a Bantu language. The paper details the linguistic considerations that have to be addressed whilst defining the grammars. The paper also describes an end-user application that uses the developed grammars to achieve multilinguality.


de Visendi P, Ng'ang'a W, BBOVEPWRJ.  2011.  TparvaDB: a database to support Theileria parva vaccine development, 2011. Abstract


Nganga, W.  2010.  Towards a Comprehensive, Machine-readable Dialectal Dictionary of Igbo, 2010. Proceedings of the Second Workshop on African Language Technology (AfLaT 2010), European Language Resources Association (ELRA). (Guy De Pauw, H.J. Groenewald, De Schryver, Gilles-Maurice, Eds.).:63-67. Abstract

Availability of electronic resources, textual or otherwise, is a first step towards language technology research and development. This paper describes the acquisition and processing of multi-dialectal speech and text corpora for the Igbo language. The compiled corpus provides the key resource for the definition of a machine-readable dialectal dictionary for Igbo. The work centres around an online portal that facilitates collaborative acquisition, definition and editing of the dialectal dictionary. The complete dictionary, which includes features such as phonetic pronunciation, syllabication, synthesized pronunciation as well as GIS locations, is then made available via a website.

Mutahi, J, Nganga W.  2010.  Extending the Grammatical Framework with Swahili Translation Capability, 2010. Tenth CHAKITA Conference. Abstract
Kamau, C., WDPMN'ang'a NMPWGL.  2010.  Developing an Open source spell checker for Gikuyu, 2010. Proceedings of the Second Workshop on African Language Technology (AfLaT 2010), European Language Resources Association (ELRA). (Guy De Pauw, H.J. Groenewald, De Schryver, Gilles-Maurice, Eds.).:31-35. Abstract

In this paper, we describe the development of an open source spell checker for the Gikuyu language using the Hunspell language tools. We explore the morphology of Gikuyu, highlighting the inflection of various parts of speech in Gikuyu, including verbs, nouns, and adjectives, among others. In Hunspell, surface words are realized as a set of continuation classes, with each class providing a morpheme with a specific function. In addition, circumfixation, which is prevalent in Gikuyu derived nouns, is implemented. Hunspell also provides for word suggestion using character prevalence and replacement rules. Given that the developed Gikuyu spell checker and the Hunspell tools are open source, the spell checking function developed in this work can be adopted in major open-source products such as Mozilla and OpenOffice products. The spell checker has a fairly representative Gikuyu lexicon and achieves an acceptable realization of a Gikuyu spell checker. When tested on a test corpus, the spell checker attains a precision of 82%, recall of 84% and an accuracy of 75%.


Nganga, W.  2008.  African Language Technology for Multilingual Local Content Development, 2008. Proceedings of the Third International IST-Africa Conference. Abstract

Information and Communication Technologies have been identified as major catalysts to rapid and sustainable development, especially in today's information-driven economies. The provision of timely, accurate and relevant information to the masses should therefore be a prime consideration in any development agenda. For this information to have an effective developmental impact, it must be presented in the language that the populace is most proficient in, usually the language(s) commonly spoken in day-to-day life. The language factor should therefore be recognized as a critical success factor in the deployment of ICTs for development, especially in the areas of education, health and governance. While Africa is home to a third of all the world's languages, the “information languages” on the continent are, more often than not, European languages, namely English, French and Portuguese, which are by and large languages of the educated elite. This has had the adverse effect of locking out a huge percentage of Africa's populace from effectively participating in the increasingly information-based economies. This paper discusses how language technology can be exploited to support the development of relevant local content that addresses and satisfies the multilingual requirements deriving from the continent's linguistic and cultural diversity, effectively reducing the language barrier to technology that exists especially among Africa's rural populations.


Wanjiku Ng’ang’a, AL, Carlson L.  2006.  Multilingual Generation of Live Math Problems in WebALT, 2006. Proceedings of the First WebALT Conference and Exhibition. (Mika Seppälä, Sebastian Xambo, Olga Caprotti, Ed.).:155-159. Abstract
Wanjiku Ng'ang'a, Anni Laine, LC.  2006.  Natural Language Generation from OpenMath, 2006. Abstract
Olga Caprotti, Mika Seppala, WN.  2006.  Multilingual Technology for teaching Mathematics, 2006. Advances in Computer, Information, and Systems Sciences, and Engineering. (, Elleithy, et al, Eds.).:380-386.: Springer Abstract

This paper describes the experiences acquired and the goals of the European project Web Advanced Learning Technology, WebALT, in developing a multilingual showcase of exercise problems in mathematics to be used by university students.

Nganga, W.  2006.  Multilingual Content Development for e-Learning in Africa, 2006. Proceedings of eLearning Africa: 1st Pan-African Conference on ICT for Development, Education and Training. Abstract
Daniel Marques, Wanjiku Ng'ang'a, GC.  2006.  Web service for the multilingual delivery of mathematical content, 2006. Abstract


Strotmann, A., N'ang'a CW & O.  2005.  Multilingual Access to Mathematical Exercise Problems, 2005. Proceedings of the Internet-accessible Mathematical Computation Workshop (ISSAC 05). Abstract

The Web Advanced Learning Technologies (WebALT) project, financed through the European Union’s eContent programme, is working to provide pan-European (and eventually world-wide) multilingual and multicultural internet access to a repository of algorithmically generated exercises for students and teachers of mathematics at the secondary and tertiary education levels, building as much as possible on existing frameworks, standards, and software. The two-year WebALT project has reached its quarter mark, and we can now report early results. We are working on a framework in which a large percentage of undergraduate and high school mathematics exercises can be created in the language-independent form of content markup in a way that captures both the meaning of the simple sentences and the formulas embedded in them that together make up a math problem. Such content, expressed in OpenMath because it is more easily extended with the extra concepts required here, is then localized using language-specific content-to-presentation markup stylesheets for the embedded formulae, and natural language generation techniques for rendering the embedding sentences that tell the students what to do with those formulae in the language of their choice.

Nganga, W.  2005.  Word Sense Disambiguation of Swahili: Extending Swahili Language Technology with Machine Learning, 2005. : Helsinki University Press Abstract

This thesis addresses the problem of word sense disambiguation within the context of Swahili-English machine translation. In this setup, the goal of disambiguation is to choose the correct translation of an ambiguous Swahili noun in context. A corpus-based approach to disambiguation is taken, where machine learning techniques are applied to a corpus of Swahili to acquire disambiguation information automatically. In particular, the Self-Organizing Map algorithm is used to obtain a semantic categorization of Swahili nouns from data. The resulting classes form the basis of a class-based solution, where disambiguation is recast as a classification problem. The thesis exploits these semantic classes to automatically obtain annotated training data, addressing a key problem facing supervised word sense disambiguation. The semantic and linguistic characteristics of these classes are modelled as Bayesian belief networks, using the Bayesian Modelling Toolbox. Disambiguation is achieved via probabilistic inferencing. The thesis develops a disambiguation solution which does not make extensive resource requirements, but rather capitalizes on freely-available lexical and computational resources for English as a source of additional disambiguation information. A semantic tagger for Swahili is created by altering the configuration of the Bayesian classifiers. The disambiguation solution is tested on a subset of unambiguous nouns and a manually created gold standard of sixteen ambiguous nouns, using standard performance evaluation metrics.

Caprotti, O., N'ang'a SW & M.  2005.  Multilingual technology for teaching mathematics, 2005. Proceedings of the International Conference on Engineering Education, Instructional Technology, Assessment, and E-Learning (EIAE 05). Abstract


Nganga, W.  2004.  Towards Machine Translation of African Languages: Requirements, Challenges and Achievements, 2004. Proceedings of the 24th West African Linguistics Conference. Abstract


Nganga, W.  2003.  Automatic Word Sense Disambiguation of Kiswahili Nouns, 2003. Proceedings of the Fourth World Congress of African Linguistics.. :381-390.: Köln: Rüdiger Köppe Verlag Abstract
Nganga, W.  2003.  Semantic analysis of Kiswahili words using the Self-Organizing Map, 2003. :405-423. Abstract

Acquisition of semantic knowledge to support natural language processing tasks is nontrivial, and more so if manually undertaken. This paper presents an automatic lexical acquisition method that learns semantic properties of Kiswahili words directly from data. The method exploits Kiswahili’s system of nominal and concordial agreement, which is inherently rich with semantic information, to capture the morphological and syntactic contexts of words. Classification of nouns and verbs into clusters of semantically-similar words is done based on this contextual encoding. The method uses training data from the Helsinki corpus of Kiswahili while the machine-learning component is implemented using the Self-Organizing Map algorithm. The proposed method offers an efficient and consistent way of augmenting lexicons with semantic information, where electronic corpora of the language in question are available. It also provides researchers with an investigative tool that can be used to identify dependencies within linguistic data and represent them in an understandable form, for further analysis.
