Annual Proceedings of the South African Statistical Association Conference, Congress 1, January 2014

Comparing logistic regression methods for a sparse data set when complete separation is present
Authors: Michelle Botes and Lizelle Fletcher
Source: Annual Proceedings of the South African Statistical Association Conference 2014, pp 1–8 (2014)
Separation in the data is sometimes observed in models based on a dichotomous dependent variable. It occurs when one or more of the independent variables can perfectly predict the binary outcome, and it arises primarily in small samples. The data from a logistic regression fall into one of three mutually exclusive and exhaustive classes: complete separation, quasi-complete separation and overlap. Separation (either complete or quasi-complete) gives rise to a number of problems, since it implies infinite or zero maximum likelihood estimates, which are unrealistic and do not occur in practice. This paper investigates different methods for dealing with complete separation when only continuous independent variables are considered.
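
As an illustration of the three classes in the simplest case, a single continuous predictor, the following minimal Python sketch classifies a data set by comparing the predictor ranges of the two outcome groups. The function name separation_status and the example data are hypothetical and not taken from the paper; with several predictors, separation concerns a linear combination of them and needs a different check.

```python
def separation_status(x, y):
    """Classify one continuous predictor x against a binary outcome y (0/1).

    Returns "complete", "quasicomplete" or "overlap". This handles a single
    predictor only (an assumption made for illustration).
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome classes must be present")

    # Complete separation: a threshold splits the two groups with a gap.
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete"
    # Quasi-complete separation: the groups touch only at the boundary value.
    if max(x0) == min(x1) or max(x1) == min(x0):
        return "quasicomplete"
    # Otherwise the predictor ranges of the two groups overlap.
    return "overlap"


if __name__ == "__main__":
    print(separation_status([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]))
```

In the complete and quasi-complete cases the slope estimate from ordinary maximum likelihood does not converge to a finite value, which is the problem the abstract describes.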

Modelling extreme maximum annual rainfall for Zimbabwe
Authors: Retius Chifurira and Delson Chikobvu
Source: Annual Proceedings of the South African Statistical Association Conference 2014, pp 9–16 (2014)
In this paper the generalized extreme value distribution is fitted to the mean annual rainfall to describe the extremes of rainfall. Extreme value theory (EVT) is used to estimate the probabilities of meteorological floods. The maxima distribution is used to fit the generalized extreme value distribution to the data and to find the probabilities of extremely high levels of mean annual rainfall. The Anderson-Darling goodness-of-fit test shows that the Gumbel distribution, the simpler special case of the generalized extreme value model, provides a good fit. We explore the possibility of trends in the data; the results indicate that there are no such trends. The yearly mean return level estimates are derived, and a return level of 1193 mm (the recorded high mean annual rainfall amount) is associated with a mean return period of approximately 300 years. This paper provides the first application of extreme value distributions to mean annual rainfall from a drought-prone country such as Zimbabwe.
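
The link between a return level and a return period under a Gumbel model can be sketched in a few lines of Python. The function names and the location and scale values used below are illustrative assumptions, not the estimates reported in the paper.

```python
import math


def gumbel_return_level(period, mu, sigma):
    """Level exceeded on average once every `period` years (period > 1)."""
    p = 1.0 - 1.0 / period  # non-exceedance probability
    return mu - sigma * math.log(-math.log(p))


def gumbel_return_period(level, mu, sigma):
    """Mean number of years between exceedances of `level`."""
    cdf = math.exp(-math.exp(-(level - mu) / sigma))
    return 1.0 / (1.0 - cdf)


if __name__ == "__main__":
    # Hypothetical location and scale parameters, in mm.
    mu, sigma = 600.0, 100.0
    level = gumbel_return_level(300.0, mu, sigma)
    print(round(level, 1), round(gumbel_return_period(level, mu, sigma), 1))
```

The two functions are inverses of each other, so converting a period to a level and back recovers the original period.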

Optimal factorial designs for two-colour microarray experiments: properties of admissible designs, A-, D- and E-optimality criteria
Authors: Legesse Kassa Debusho, Dibaba Bayisa Gemechu and Linda M. Haines
Source: Annual Proceedings of the South African Statistical Association Conference 2014, pp 17–24 (2014)
In this paper the properties of A-, D- and E-optimal designs and of admissible designs for factorially structured two-colour microarray experiments are considered. Two parameterizations are used in the investigation: that of Glonek and Solomon (2004) and the standard (orthogonal) factorial parameterization. The numerical results show that the A- and E-optimal designs and the admissible designs depend on the parameterization used in the model. The allocations of the treatment combinations in a 2 x 2 factorial experiment to the available number of arrays therefore do not necessarily coincide for the two parameterizations.
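
For readers unfamiliar with the criteria, the following Python sketch computes A-, D- and E-criterion values from a 2 x 2 symmetric information matrix. It assumes a nonsingular information matrix for the parameters of interest; the function name and the example matrix are hypothetical and do not come from the paper.

```python
import math


def optimality_criteria(a, b, c):
    """Criterion values for the information matrix [[a, b], [b, c]].

    Returns (A, D, E): trace of the inverse (smaller is better), the
    determinant (larger is better) and the smallest eigenvalue (larger
    is better).
    """
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("information matrix must be positive definite")
    a_value = (a + c) / det  # trace of the inverse of a 2 x 2 matrix
    d_value = det
    e_value = (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return a_value, d_value, e_value


if __name__ == "__main__":
    print(optimality_criteria(2.0, 0.0, 4.0))
```

Because the information matrix changes with the parameterization, two designs can be ranked differently under each criterion, which is consistent with the dependence on parameterization that the abstract reports.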

Analysis of extreme rainfall at East London, South Africa
Authors: Tadele A. Diriba, Legesse Kassa Debusho, Joel Botai and Abubeker Hassen
Source: Annual Proceedings of the South African Statistical Association Conference 2014, pp 25–32 (2014)
The aim of extreme value analysis is to quantify and analyze the stochastic behaviour of extreme values. The most appropriate distribution for extreme rainfall is estimated using extreme value theory, by applying the generalized extreme value (GEV) distribution to block maxima. The GEV distribution was also modified to take into account the temporal non-stationary trend in the annual maxima. Since extreme rainfall observations are naturally scarce, it is expected that Bayesian inference may improve the efficiency of the parameter estimates of the distribution compared to the maximum likelihood method. The Bayesian approach was therefore also applied to the GEV distribution using Markov chain Monte Carlo. However, the expected improvement in efficiency is not fully achieved in this study using the non-informative and informative priors. The block maxima method for extreme value analysis is often wasteful of data, especially when more data on the extremes are available, leading to large uncertainties in return level estimates. Therefore, rather than using annual maxima, it may be better to consider the daily rainfall data, because these data lead to less wastage of information.
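
The block maxima step mentioned above can be sketched in Python as follows, taking one year as the block. The function name annual_maxima and the rainfall records are made up for illustration and are not data from the paper.

```python
def annual_maxima(records):
    """Reduce (year, rainfall) records to the largest value in each year."""
    maxima = {}
    for year, value in records:
        if year not in maxima or value > maxima[year]:
            maxima[year] = value
    # Return the blocks in chronological order.
    return dict(sorted(maxima.items()))


if __name__ == "__main__":
    # Hypothetical daily rainfall records in mm.
    daily = [(2000, 12.0), (2000, 30.0), (2000, 28.5), (2001, 4.0), (2001, 12.5)]
    print(annual_maxima(daily))
```

Only one observation per year survives the reduction, even when a year contains several large values, which is the wastage of data the abstract refers to.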

A-optimal designs for two-colour cDNA microarray experiments using the linear mixed effects model
Authors: Dibaba Bayisa Gemechu, Legesse Kassa Debusho and Linda M. Haines
Source: Annual Proceedings of the South African Statistical Association Conference 2014, pp 33–40 (2014)
Two-colour cDNA microarray experiments help scientists to study the expression levels of thousands of genes simultaneously under different conditions. Microarray experiments pose several design challenges, such as which mRNA samples should be co-hybridized together and which treatment should be labelled with which fluorescent dye. A carefully designed microarray experiment is therefore needed to obtain efficient and reliable data and so to ensure precise estimates of the parameters of interest. The present paper is concerned with A-optimal block designs for two-colour microarray experiments, where the array is treated as the experimental block. Linear mixed effects models, with the arrays taken as random effects, were used to describe the experiments when comparisons of all possible pairs of treatments are of particular interest. The numerical results show that the optimal block designs under the linear fixed effects model are not necessarily optimal under the linear mixed effects model.

A study on the apparent randomness of an animal sample
Authors: C. Kraamwinkel, I.N. Fabris-Rotelli, G. Fosgate, D. Knobel and K. Hampson
Source: Annual Proceedings of the South African Statistical Association Conference 2014, pp 41–48 (2014)
It is mostly impossible to design sampling schemes for wildlife which accurately represent the whole population. Samples are almost always convenience samples, as a researcher will collect data from every animal they can obtain. The representativeness of the samples thus needs to be measured in some way and reported together with the results obtained for a complete setting. This paper provides an introduction to this issue in wildlife research and provides some examples of such cases.

Hurst exponent for linear regression processes
Author: Igor Litvine
Source: Annual Proceedings of the South African Statistical Association Conference 2014, pp 49–56 (2014)
The Hurst exponent is widely used in time-series analysis as a measure of predictability. In this paper we evaluate the Hurst exponent for linear processes described by linear regression models, assuming different distributions of the errors. The method uses a combination of an analytical approach and computer simulations. We also found a link between the exponent and the kurtosis of the distributions.

Jump-Diffusion based-simulated Expected Shortfall (SES) method as an alternative measure to Value-at-Risk model in predicting financial risk optimally in a stressed economic climate
Authors: Sibusiso V. Magagula and John O. Olaomi
Source: Annual Proceedings of the South African Statistical Association Conference 2014, pp 57–64 (2014)
The Value-at-Risk (VaR) model is widely used to predict market risk capital under the Basel Capital Adequacy (BCA) framework. However, during a financial crisis this measure fails to predict market risk accurately because of its mathematical limitations. The Expected Shortfall (ES) measure was developed as an alternative to VaR, but ES modelling results are hard to backtest. Although the ES measure has fewer limitations than VaR, the backtesting issue meant that it could not be used in the BCA framework. Furthermore, estimates of VaR and ES are affected by estimation error, because these measures rely on limited sample sizes, which results in sampling fluctuation. The proposed model, a combination of the jump-diffusion model and the ES measure, is therefore intended to minimise the estimation error. This model is called the Simulated-Expected Shortfall (SES) model. Although the SES measure's estimates are more conservative than those of the other models studied in the paper when predicting market risk during a crisis, the measure has a large Type I error when backtested.
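
To make the two risk measures concrete, the Python sketch below computes plain empirical (historical) VaR and ES from a sample of losses. This is not the SES method of the paper, which simulates from a jump-diffusion model; the function name, the quantile convention and the loss data are assumptions made for illustration.

```python
import math


def empirical_var_es(losses, alpha):
    """Empirical VaR and ES at confidence level alpha (0 < alpha < 1).

    Losses are positive numbers. VaR is taken as the ceil(alpha * n)-th
    smallest loss, and ES as the mean of the losses at or beyond that
    order statistic; other quantile conventions exist.
    """
    if not losses or not 0.0 < alpha < 1.0:
        raise ValueError("need a non-empty sample and 0 < alpha < 1")
    ordered = sorted(losses)
    k = max(math.ceil(alpha * len(ordered)) - 1, 0)
    tail = ordered[k:]
    return ordered[k], sum(tail) / len(tail)


if __name__ == "__main__":
    sample = [float(v) for v in range(1, 101)]  # hypothetical losses
    print(empirical_var_es(sample, 0.95))
```

ES is never smaller than VaR at the same level because it averages the tail beyond the quantile, and with few tail observations both estimates fluctuate from sample to sample, which is the estimation error the abstract mentions.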

Population mean estimation from ranked set sampling with unknown auxiliary variable mean
Authors: J.O. Olaomi and Raghunath Arnab
Source: Annual Proceedings of the South African Statistical Association Conference 2014, pp 65–72 (2014)
Ranked set sampling is used when the measurement or quantification of units of the variable under study is difficult, but the ranking of units within sets of small size can be done easily by an inexpensive method. Stokes (1977), Prasad (1989), Kadillar, Unyazici and Cingi (2009) and Singh, Tailor and Singh (2014) considered the estimation of the population mean of the study variable Y (μy) assuming that the population mean of the auxiliary variable X (μx) is known, having treated the ranking variable as an auxiliary variable. In this paper, we propose improved methods of estimating the population mean μy using the ranking variable x as an auxiliary variable when the population mean μx is unknown. An empirical investigation based on life data shows that all proposed estimators are approximately unbiased and bring a gain in efficiency of up to 50 percent over the conventional RSS estimator (the sample mean).
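
The basic ranked set sampling scheme and the conventional RSS estimator can be sketched in Python as below. The sketch ranks each set by the auxiliary variable x and measures only the selected unit's y; the function names and the small population are hypothetical, and the improved estimators proposed in the paper are not reproduced here.

```python
import random


def ranked_set_sample(population, set_size, cycles, rng):
    """Draw a ranked set sample of y values from (x, y) pairs.

    In each cycle, set_size sets of set_size units are drawn; the i-th set
    is ranked by x and only its i-th ranked unit has y measured.
    """
    sample = []
    for _ in range(cycles):
        for rank in range(set_size):
            units = rng.sample(population, set_size)
            units.sort(key=lambda unit: unit[0])  # cheap ranking on x
            sample.append(units[rank][1])  # costly measurement of y
    return sample


def rss_mean(sample):
    """Conventional RSS estimator of the population mean: the sample mean."""
    return sum(sample) / len(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    population = [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in range(1, 51)]
    print(rss_mean(ranked_set_sample(population, 3, 4, rng)))
```

A sample of set_size times cycles measured units requires ranking set_size squared times cycles units, which is why the method suits cases where ranking is cheap and measurement is expensive.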