
Lexikos - Semi-automatic term extraction for an isiZulu linguistic terms dictionary


Abstract

The University of KwaZulu-Natal (UKZN) is compiling a series of Language for Special Purposes (LSP) dictionaries for various specialized subject domains in line with its language policy and plan. This paper focuses on term extraction in the linguistics subject domain. It advances frequency analysis and keyword analysis as strategies for extracting terms for the compilation of the dictionary of isiZulu linguistic terms. The study uses the isiZulu National Corpus (INC) of about 1.2 million tokens as a reference corpus and an LSP corpus of about 100,000 tokens as a study corpus. The corpora are analysed with a software tool called WordSmith Tools (version 6). WordSmith Tools (henceforth WS Tools) is an integrated suite of three main programs, WordList, Concord and Keywords, used to analyse words and word patterns in any given text. WS Tools supports both qualitative and quantitative research on the language. Central to this study is a computational determination of which words are typical of the linguistic domain in isiZulu and therefore stand out as preferred candidates for headword selection. The study thus uses the corpus linguistics method as a basis for theoretical analysis. The advantage of this approach is that a corpus is stored and queried by computer, which makes it easy to find, sort and count items, either as a basis for linguistic description or for addressing language-related issues and problems. Using WS Tools, the study shows that term extraction for the isiZulu dictionary of linguistic terms follows reliable computational techniques in corpus lexicography.


Die Universiteit van KwaZulu-Natal (UKZN) is besig met die samestelling van 'n reeks Taal vir Spesiale Doeleindes (TSD)-woordeboeke vir verskeie gespesialiseerde vakgebiede wat strook met hul taalbeleid en -plan. Die fokus van hierdie artikel is die termontrekking vir woorde in die vakgebied taalkunde. Die gebruik van frekwensieanalise en sleutelwoordanalise as strategieë in die samestelling van die isiZulu taalkundige termwoordeboek word bevorder. Die studie gebruik die isiZulu National Corpus (INC) van ongeveer 1,2 miljoen items as 'n verwysingskorpus asook 'n TSD-korpus van ongeveer 100,000 items as 'n studiekorpus. Die studie is ontleed met behulp van 'n sagteware nutsprogram, WordSmith Tools (weergawe 6). WordSmith Tools (voortaan WS Tools) is 'n geïntegreerde programsuite bestaande uit drie hoofprogramme, wat WordList, Concord en Keywords insluit, en wat gebruik word in die analise van woorde en woordpatrone in enige gegewe teks. Met behulp van die WS Tools-sagteware kan baie kwalitatiewe en kwantitatiewe navorsing in die taal gedoen word. Sentraal in hierdie studie is 'n rekenaarmatige bepaling van watter woorde verteenwoordigend is van die isiZulu-taalkundige domein en daarom voorkeur geniet by trefwoordseleksie. Sodoende word die korpuslinguistiekmetode as basis vir teoretiese analise gebruik. Die voordeel verbonde aan so 'n teoretiese benadering is dat 'n korpus gestoor en geraadpleeg word deur middel van 'n rekenaar en rekenaarsagteware, wat dit maklik maak om items te vind, te sorteer en te tel, óf as basis vir taalkundige beskrywing óf om taalkundig verwante kwessies en probleme aan te spreek. Deur gebruik te maak van WS Tools-sagteware, toon die studie dat termontrekking vir die isiZulu taalkundige termwoordeboek gedoen word deur betroubare rekenaarmatige tegnieke in korpusleksikografie te volg.
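
The keyword analysis described in the abstract compares word frequencies in a small study corpus with those in a larger reference corpus and ranks the words that are unusually frequent in the study corpus. The Python sketch below illustrates that general idea; it is not WS Tools itself. It assumes Dunning's log-likelihood as the keyness statistic, which the abstract does not specify, and the function names (`word_list`, `log_likelihood`, `keywords`), the `min_freq` threshold and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def word_list(tokens):
    """Build a frequency list (word -> count) from a list of tokens."""
    return Counter(token.lower() for token in tokens)


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score (assumed statistic, not confirmed by the source).

    a: frequency of the word in the study corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the study corpus
    d: total tokens in the reference corpus
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_study)
    if b > 0:
        score += b * math.log(b / expected_ref)
    return 2 * score


def keywords(study_tokens, reference_tokens, min_freq=2):
    """Rank words that are relatively more frequent in the study corpus."""
    study = word_list(study_tokens)
    reference = word_list(reference_tokens)
    c = sum(study.values())
    d = sum(reference.values())
    scored = []
    for word, a in study.items():
        if a < min_freq:
            continue  # skip rare words (hypothetical threshold)
        b = reference.get(word, 0)
        if a / c <= b / d:
            continue  # keep only positive keywords
        scored.append((word, a, b, log_likelihood(a, b, c, d)))
    # Highest keyness first: strongest candidates for headword selection
    scored.sort(key=lambda item: item[3], reverse=True)
    return scored


# Toy token lists standing in for the LSP study corpus and the reference corpus
study_tokens = ["ibizo", "isenzo", "ibizo", "umuntu", "ibizo", "isenzo"]
reference_tokens = ["umuntu", "umuntu", "indlu", "umuntu",
                    "ibizo", "indlu", "umuntu", "indlu"]

for word, a, b, score in keywords(study_tokens, reference_tokens):
    print(f"{word}\tstudy={a}\treference={b}\tkeyness={score:.2f}")
```

In a real workflow the token lists would come from tokenized corpus files, and the ranked output would still need manual review before any word is accepted as a headword.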


/content/lexikos/25/1/EJC180573
2015-01-01
2016-12-06