South African Computer Journal - Volume 2006, Issue 36, 2006
Editorial introduction to special ARIMA / SACJ joint issue on advances in end-user data mining techniques
Source: South African Computer Journal 2006, pp 1–3 (2006)
Although modern hardware and database technology have made it possible to store gigabytes of information in databases, our human capacity for analyzing and exploiting these large amounts of data is limited. The rapid advancement towards increasingly large-scale digitalization has highlighted the importance of tools and techniques for efficiently delving into such databases and discovering valuable, non-obvious information in them. In recent times, the field of knowledge discovery in databases (KDD) has emerged as a new research discipline, lying at the crossroads of statistics, machine learning, data management, and other areas.
Author: M. Hart
Source: South African Computer Journal 2006, pp 4–15 (2006)
This paper describes three largely qualitative studies, spread over a five-year period, into the current practice of data mining in several large South African organisations. The objective was to gain an understanding, through in-depth interviews, of the major issues faced by participants in the data mining process. The focus is more on the organisational, resource and business issues than on technological or algorithmic aspects. The studies reveal that strong progress was made over this period, and a model for the data mining organisation is proposed.
Source: South African Computer Journal 2006, pp 16–28 (2006)
A new methodology for automated extraction of repeated patterns in time-series data is presented, aimed in particular at the analysis of musical sequences. The basic principle consists of a search for closed patterns in a multi-dimensional parametric space. It is shown that this basic mechanism needs to be combined with a periodic pattern discovery system, which implies a strict chronological scanning of the time-series data. Thanks to this modelling, global pattern filtering may be avoided and rich, highly pertinent results can be obtained. The modelling has been integrated in a collaborative project between ethnomusicology, cognitive sciences and computer science, aimed at the study of Tunisian Modal Music.
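The closed-pattern idea can be sketched in a few lines: a repeated pattern is kept only if no longer pattern occurs exactly as often, which prunes redundant sub-patterns. This is a minimal illustration over symbolic sequences, not the authors' system (which adds the periodic pattern discovery and chronological scanning described above).

```python
from collections import Counter

def closed_patterns(seq, min_support=2):
    """Repeated patterns in `seq` that are 'closed': no longer pattern
    occurs exactly as often (closure prunes redundant sub-patterns)."""
    counts = Counter(tuple(seq[i:i + n])
                     for n in range(1, len(seq) + 1)
                     for i in range(len(seq) - n + 1))
    frequent = {p: c for p, c in counts.items() if c >= min_support}

    def contains(q, p):
        return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))

    return {p: c for p, c in frequent.items()
            if not any(len(q) > len(p) and cq == c and contains(q, p)
                       for q, cq in frequent.items())}
```

For example, in the sequence "abcabc" only the pattern (a, b, c) is closed; its sub-patterns such as (a, b) occur exactly as often and are pruned.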
One-class classifiers: a review and analysis of suitability in the context of mobile-masquerader detection
Author: O. Mazhelis
Source: South African Computer Journal 2006, pp 29–48 (2006)
One-class classifiers, which use only data from a single class for training, are justified when data from other classes is difficult to obtain. In particular, their use is justified in mobile-masquerader detection, where user characteristics are classified as belonging either to the legitimate user or to an impostor, and where collecting data originating from impostors is problematic. This paper systematically reviews various one-class classification methods and analyses their suitability in the context of mobile-masquerader detection. For each classification method, its sensitivity to errors in the training set, its computational requirements, and other characteristics are considered. Suitable classifiers are then identified for each category of features used in masquerader detection.
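As a toy illustration of the one-class setting (a simple Gaussian-threshold sketch, not any particular classifier reviewed in the paper), a model can be fitted to legitimate-user measurements alone and used to flag observations that deviate too far from it:

```python
import math

def train_one_class(samples):
    """Fit per-dimension mean and standard deviation to one-class data
    (measurements of the legitimate user only)."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    stds = [max(1e-9, math.sqrt(sum((s[j] - means[j]) ** 2 for s in samples) / n))
            for j in range(d)]
    return means, stds

def is_legitimate(model, x, threshold=3.0):
    """Accept x only if every feature lies within `threshold` standard
    deviations of the training mean; otherwise flag a possible masquerader."""
    means, stds = model
    return all(abs(x[j] - means[j]) / stds[j] < threshold for j in range(len(x)))
```

No impostor data is needed for training; only the rejection threshold must be chosen, which is exactly the kind of trade-off a suitability analysis has to weigh.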
Source: South African Computer Journal 2006, pp 49–56 (2006)
In this paper we present a hybrid approach to segmenting and classifying the contents of document images. A document image is segmented into three types of region: graphics, text and space. The image is subdivided into blocks, and for each block five GLCM (Grey Level Co-occurrence Matrix) features are extracted. Based on these features, the blocks are clustered into three groups using the K-Means algorithm; connected blocks that belong to the same group are merged. The groups are then classified using pre-learned heuristic rules. Experiments were conducted on scanned newspapers and on images from the Media Team Document Database.
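The GLCM at the heart of the method counts how often pairs of grey levels co-occur at a fixed pixel offset; a feature such as contrast is then a weighted sum over that matrix. A minimal sketch with quantised grey levels and a single feature (the paper extracts five features per block):

```python
def glcm(block, dx=1, dy=0, levels=4):
    """Grey Level Co-occurrence Matrix: counts of grey-level pairs (i, j)
    at offset (dx, dy) in the block, normalised to probabilities."""
    h, w = len(block), len(block[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                counts[block[y][x]][block[y2][x2]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]

def contrast(p):
    """A classic GLCM feature: sum of p(i, j) * (i - j)^2."""
    return sum(p[i][j] * (i - j) ** 2
               for i in range(len(p)) for j in range(len(p)))
```

A uniform block yields zero contrast, while a block of alternating extreme levels yields a high value; differences of this kind are what the K-Means step can exploit to separate text, graphics and space regions.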
New evolutionary classifier based on Genetic Algorithms and neural networks: application to the bankruptcy forecasting problem
Author: M.A. Esseghir
Source: South African Computer Journal 2006, pp 57–68 (2006)
Artificial neural networks (ANNs) have been widely applied in data mining as a supervised classification technique. The accuracy of this model stems mainly from its high tolerance of noisy data and its ability to classify patterns on which it has not been trained. Moreover, the performance of ANN-based models depends both on the ANN parameters and on the quality of the input variables, and an exhaustive search for either appropriate parameters or predictive inputs is computationally very expensive. In this paper we propose a new hybrid model based on genetic algorithms and artificial neural networks. Our evolutionary classifier is capable of selecting the best set of predictive variables and then searching for the best neural network classifier, thereby improving classification and generalisation accuracy. The model was applied to the bankruptcy forecasting problem; experiments have shown very promising results in terms of predictive accuracy and adaptability.
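A genetic algorithm for input selection can be sketched as evolving bit-masks over the candidate variables. In a model of this kind the fitness of a mask would come from training and validating an ANN on the selected inputs; here any scoring function can stand in for it (this is an illustrative sketch under that assumption, not the authors' classifier):

```python
import random

def evolve_feature_mask(n_features, fitness, generations=30, pop_size=20, seed=1):
    """Toy genetic algorithm: evolve a bit-mask selecting input variables.
    `fitness(mask)` is a user-supplied score; in a GA/ANN hybrid it would
    be the validation accuracy of a network trained on the masked inputs."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)    # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_features)] ^= 1  # point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```

Because the fittest masks survive unchanged, the best score never decreases across generations, at the cost of re-evaluating (in the real setting, re-training) a network for every candidate mask.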
Source: South African Computer Journal 2006, pp 69–85 (2006)
Sampling of large datasets for data mining is important for at least two reasons. The processing of large amounts of data results in increased computational complexity. The cost of this additional complexity may not be justifiable. On the other hand, the use of small samples results in fast and efficient computation for data mining algorithms. Statistical methods for obtaining sufficient samples from datasets for classification problems are discussed in this paper. Results are presented for an empirical study based on the use of sequential random sampling and sample evaluation using univariate hypothesis testing and an information theoretic measure. Comparisons are made between theoretical and empirical estimates.
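The sequential approach can be sketched as follows: draw a small random sample, test whether a univariate statistic (here the mean, via a z-test) is indistinguishable from that of the full dataset, and enlarge the sample only while the test rejects. This is a simplified, with-replacement illustration of the idea, not the paper's experimental procedure:

```python
import math
import random

def sufficient_sample(data, z_crit=1.96, step=50, seed=0):
    """Grow a random sample in increments of `step` until a z-test cannot
    distinguish the sample mean from the full-dataset mean (5% level)."""
    rng = random.Random(seed)
    mu = sum(data) / len(data)
    sd = max(1e-12, math.sqrt(sum((x - mu) ** 2 for x in data) / len(data)))
    sample = []
    while len(sample) < len(data):
        sample.extend(rng.choice(data) for _ in range(step))
        z = abs(sum(sample) / len(sample) - mu) / (sd / math.sqrt(len(sample)))
        if z < z_crit:
            return sample
    return list(data)
```

Here the full-dataset mean serves as the reference statistic; the paper evaluates samples both with univariate hypothesis testing and with an information-theoretic measure.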
Source: South African Computer Journal 2006, pp 86–94 (2006)
This article reports on the approach taken, experience gathered, and results found in building a tool to support the derivation of solutions to a particular kind of word game. This required techniques for simple yet acceptably fast access to a dictionary of natural language words (in the present case, Afrikaans). The main challenge was to access a large corpus of natural language words via a partial match retrieval technique. Other challenges included discovering how to represent such a dictionary in a "semi-compressed" format, striking a balance that favours search speed but nevertheless achieves savings in storage requirements. In addition, a query language had to be developed that would effectively exploit this access method. The system is designed to support a more intelligent query capability in the future. Acceptable response times were achieved even though an interpretive scripting language, ObjectREXX, was used.
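Partial match retrieval over a word list can be sketched with crossword-style patterns, where '?' stands for a single unknown letter and '*' for any run of letters. This plain linear scan is for illustration only; the article's contribution lies in doing such retrieval quickly over a semi-compressed dictionary, and in ObjectREXX rather than Python:

```python
import fnmatch

def partial_match(words, pattern):
    """Return the dictionary words matching a crossword-style pattern:
    '?' matches one letter, '*' matches any (possibly empty) run."""
    return [w for w in words if fnmatch.fnmatchcase(w, pattern)]
```

For example, the query "ka?" retrieves every three-letter word beginning with "ka", the kind of request a word-game solver issues constantly.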
Source: South African Computer Journal 2006, pp 95–98 (2006)
Context-free languages are well studied, but of limited practical use due to their simplicity. Context-sensitive languages on the other hand provide more than enough power for compiler construction, but are difficult to use precisely because of this. Random context languages fall within the gap between these two types and make a trade-off between expressivity and ease-of-use.
To select the most appropriate grammar class for a task it helps to know which languages it can and cannot generate. Examples of languages beyond the generative power of a grammar class give one a sense of what is being traded away when selecting a simpler type. These examples are also of theoretical interest for probing the limits of formal languages.
Although it has been shown that one must exist, to date there is no known example of a language that may be generated by a context-sensitive grammar but not by a random context grammar. This paper considers one language conjectured in the literature to fall within this gap and shows that it does not in fact do so, by giving a random context grammar capable of generating it.
Translating mutually recursive function systems into generalized random context picture grammars: reviewed article
Source: South African Computer Journal 2006, pp 99–109 (2006)
Previous research showed that any Iterated Function System (IFS) can be translated into an equivalent Generalised Random Context Picture Grammar (GRCPG). It has also been shown that GRCPGs can be constructed that generate sets of pictures that cannot be generated by any IFS. Mutually Recursive Function Systems (MRFSs) are a generalisation of Iterated Function Systems. In this paper we show that for any MRFS, an equivalent GRCPG can be constructed and that GRCPGs can be constructed that generate sequences of pictures that cannot be generated by an MRFS.
Source: South African Computer Journal 2006, pp 110–114 (2006)
The column subtraction method (CSM) is a branch-and-bound method for solving set partitioning, covering and packing problems. We present an improved CSM for the set partitioning problem, together with computational experience for an implementation of it. Our computational experience shows that the improved CSM is at least an order of magnitude faster than the original version on larger, more complex problems, and that it compares favourably with the branch-and-cut algorithm in solving real-life airline crew scheduling problems.
Source: South African Computer Journal 2006, pp 115–123 (2006)
Usability evaluation techniques have evolved over several years to assess the user interface of systems with regard to efficiency, interaction flexibility, interaction robustness and quality of use. The user's thought process, however, is difficult to assess with traditional usability techniques. Eye movement data and eye fixations can supplement the data obtained through usability testing by providing more specific information on the user's visual attention. Network Management (NM) tools have been developed to analyse the large amounts of data generated by network applications and to display the data using various information visualisation (IV) techniques. The general increase in the use of IV techniques has highlighted the need for methodologies to evaluate the user interfaces of software, including NM tools. This article investigates how eye tracking data can supplement usability evaluation data for the IV techniques used in an NM tool, and discusses the results of a usability evaluation that combined traditional usability methods with eye tracking methods. The results show that eye tracking does provide additional value to the usability evaluation of IV techniques.