Lexikos - A balanced and representative corpus: the effects of strict corpus-based dictionary compilation in Sesotho sa Leboa

Volume 23, Issue 1
  • ISSN : 1684-4904
  • E-ISSN: 2224-0039



In theory, the Northern Sotho language comprises almost 30 dialects. In practice this is not reflected, because the standard language was formed from only a few of them. As a result, even today the language has no balanced or representative corpus, since almost all of the available corpora are compiled from the written standard language and the written dialects. Most Northern Sotho dialects have no written orthography, and the few dialects that had written orthographies before standardization came to monopolize both the standard language and the Northern Sotho corpora. Consequently, compiling a corpus-based dictionary in Northern Sotho amounts to continuing the production of unbalanced and unrepresentative dictionaries. Such dictionaries further sideline and marginalize the majority of communities and linguistic varieties, which could potentially enrich both the Northern Sotho standard language and the Northern Sotho corpora. The main objective of this research is to analyze and expose these irregularities and to suggest ways of correcting them, so that the marginalized Northern Sotho dialects can be accommodated in the standard language. This will increase the size of the Northern Sotho standard language and corpus by more than 50%.

