Lexikos - Semi-automatic term extraction for the African languages, with special reference to Northern Sotho: research article

Volume 12, Issue 1
  • ISSN : 1684-4904
  • E-ISSN: 2224-0039



Go ntšhwa ga mareo ka tirišo ya seripa sa semotšhene go tšwa ka gare ga dikhophase go thomile go ba setlwaedi go hlangweng ga mananeo a mareo, dipanka tša mareo goba dipukuntšu mererong yeo e itšego lefaseng ka bophara. Ge e le gore boramareo ba maleme a Afrika ba ikemišeditše go tšea madulo a bona mo mileneamong wo mofsa, ga ba swanela go hlokomela fela tsela ye, eupša ba swanetše gape ke go ikemišetša go diriša theknolotši ye mphsa. Mo taodišwaneng ye go hlalošwa gore mo nakong ye, tsela ye kaone ya go dira dilo tše pedi tše go boletšwego ka tšona ke go kgetha ditlhamolo tša thwii tšeo di dirišago khomphutha (se se ra gore tšhomišo ya khophase) le go šomiša ditlabakelo tša (bj.k. WordSmith Tools) tšeo di lego gona gohle. Ka fao maikemišetšo a magolo ke go humana ge e ka ba go ntšhwa ga mareo ka seripa sa semotšhene go tšwa ka gare ga khophase yeo e se nago ditlaleletšo tšeo di tseneletšego ka mašakaneng, tša go hlahla, go ka dirišwa malemeng a Afrika goba aowa. Gore re kgone go araba potšišo ye, go hlalošitšwe ka tsinkelo mohlala wa taba ya go nyakišišwa yeo e amanego le diteng tša thutapolelo tša Sesotho sa Leboa. Dipoelo tšeo di humanwego ka go diriša khomphutha di bapetšwa ka gohle le dipoelo tšeo di humanwego ge go dirišwa kgetho ya mantšu ka matsogo. Šedi e fiwa dikgopolo tša kgakologelo ('recall') le nepagalo ('precision'); mekgwa yeo e fapafapanego e a akanywa gore e kgone go hlatholla mareo a lentšu le tee ge a bapetšwa le mareo a mantšu a mantši; gomme dikhumano tšeo di fapanego di akaretšwa ka gare ga pukuntšu ya Mareo a Thutapolelo yeo e tšweletšwago bjalo ka Mamatletšo.

Worldwide, the semi-automatic extraction of terms from corpora is becoming the norm in compiling terminology lists, term banks and special-purpose dictionaries. If African-language terminologists are willing to take their rightful place in the new millennium, they must not only take cognisance of this trend but also be ready to implement the new technology. This article argues that, at this stage, the best way to do both is to opt for computationally straightforward alternatives (i.e. to use 'raw corpora') and to make use of widely available software tools (e.g. WordSmith Tools). The main aim is therefore to establish whether the semi-automatic extraction of terminology from untagged and unmarked running text by means of basic corpus query software is feasible for the African languages. To answer this question, a full-blown case study of Northern Sotho linguistic texts is discussed in detail. The computational results are compared throughout with the outcome of a manual excerption, and vice versa. Attention is given to the concepts 'recall' and 'precision'; different approaches are suggested for treating single-word terms versus multi-word terms; and the findings are summarised in a Linguistics Terminology lexicon presented as an Appendix.
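To make the idea of extraction from a raw, untagged corpus concrete, the following is a minimal sketch in Python, not the article's procedure (which relies on WordSmith Tools and is not spelled out in this abstract). It assumes a simple frequency threshold for single-word candidates and recurring 2- and 3-word sequences for multi-word candidates. The stoplist `STOPWORDS`, the thresholds, the toy text and all function names are hypothetical. Function words are allowed inside a multi-word candidate, but not at its edges, because Northern Sotho's disjunctive orthography writes many grammatical elements as separate words.

```python
import re
from collections import Counter

# HYPOTHETICAL stoplist of high-frequency Northern Sotho function words, for
# illustration only; a real list would be derived from a corpus frequency list.
STOPWORDS = {"go", "ga", "ka", "ya", "sa", "le", "ke", "tša", "gare", "tšwa"}

# Toy input assembled from phrases in the Northern Sotho abstract (not article data).
TOY_TEXT = (
    "Go ntšhwa ga mareo go tšwa ka gare ga khophase. "
    "Go ntšhwa ga mareo ka seripa sa semotšhene. "
    "Tšhomišo ya khophase. Tšhomišo ya khophase."
)


def sentences(text):
    """Split raw, untagged text into lowercase token lists, one per sentence."""
    for chunk in re.split(r"[.!?;:]+", text.lower()):
        tokens = re.findall(r"\w+", chunk)  # \w matches š in Python 3 str patterns
        if tokens:
            yield tokens


def single_word_candidates(text, min_freq=2):
    """Frequency list of non-stopword tokens occurring at least min_freq times."""
    counts = Counter(
        tok for sent in sentences(text) for tok in sent if tok not in STOPWORDS
    )
    return {word: c for word, c in counts.items() if c >= min_freq}


def multi_word_candidates(text, sizes=(2, 3), min_freq=2):
    """Recurring n-grams that neither start nor end with a stopword.

    Stopwords may occur inside an n-gram, since multi-word terms written in a
    disjunctive orthography often contain separately written function words.
    N-grams never cross sentence boundaries.
    """
    counts = Counter()
    for sent in sentences(text):
        for n in sizes:
            for i in range(len(sent) - n + 1):
                gram = sent[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return {gram: c for gram, c in counts.items() if c >= min_freq}


if __name__ == "__main__":
    print(single_word_candidates(TOY_TEXT))
    print(multi_word_candidates(TOY_TEXT))
```

On the toy input, the single-word list contains khophase (3) and ntšhwa, mareo and tšhomišo (2 each), while the multi-word list contains ntšhwa ga mareo and tšhomišo ya khophase (2 each). Such lists are only candidates: a terminologist still has to vet every item, which is where comparison with a manual excerption comes in.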
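The 'recall' and 'precision' measures used to compare computational and manual results follow standard definitions: precision is the share of extracted candidates that are genuine terms, and recall is the share of genuine terms that the extraction found. The Python sketch below assumes that the manual excerption serves as the reference list and that terms match as exact strings. The `missed` and `noise` sets loosely mirror the two-way ("vice versa") comparison. The `evaluate` function and `Evaluation` type are hypothetical names for illustration.

```python
from typing import Iterable, NamedTuple


class Evaluation(NamedTuple):
    precision: float
    recall: float
    missed: frozenset  # terms in the manual list that extraction did not find
    noise: frozenset   # extracted candidates absent from the manual list


def evaluate(extracted: Iterable[str], manual: Iterable[str]) -> Evaluation:
    """Score an extracted term list against a manual excerption.

    precision = |extracted & manual| / |extracted|
    recall    = |extracted & manual| / |manual|
    Terms are compared as exact strings, so both lists should be normalised
    (case, spelling variants) beforehand.
    """
    ext, man = frozenset(extracted), frozenset(manual)
    hits = ext & man
    precision = len(hits) / len(ext) if ext else 0.0
    recall = len(hits) / len(man) if man else 0.0
    return Evaluation(precision, recall, man - ext, ext - man)


if __name__ == "__main__":
    # Placeholder term labels, not results from the article.
    result = evaluate(["term1", "term2", "term5"],
                      ["term1", "term2", "term3", "term4"])
    print(result.precision, result.recall, sorted(result.missed), sorted(result.noise))
```

In the placeholder example, two of three extracted items are correct (precision 2/3) and two of four manual terms are found (recall 1/2). Raising frequency thresholds typically trades recall for precision, which is why both figures are reported together.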
