The SAWA corpus: a parallel corpus English-Swahili

De Pauw G, Wagacha PW, De Schryver G-M. "The SAWA corpus: a parallel corpus English-Swahili." Association for Computational Linguistics. 2009:9-16.


Abstract: Research in data-driven methods for Machine Translation has greatly benefited
from the increasing availability of parallel corpora. Processing the same text in two different
languages yields useful information on how words and phrases are translated from a source
language into a target language. To investigate this, a parallel corpus is typically aligned by
linking linguistic tokens in the source language to the corresponding units in the target
language. An aligned parallel corpus therefore facilitates the automatic development of a

