South African Statistical Journal: Most Recent Articles
https://journals.co.za/content/journal/sasj?TRACK=RSS
Please follow the links to view the content.
Optimality in weighted L2-Wasserstein goodness-of-fit statistics
https://journals.co.za/content/journal/10520/EJC-1c22980770?TRACK=RSS
<div>In Del Barrio, Cuesta-Albertos, Matran and Rodriguez-Rodriguez (1999) and Del Barrio, Cuesta-Albertos and Matran (2000), the authors introduced a new class of goodness-of-fit statistics based on the <em>L<sub>2</sub></em>-Wasserstein distance. It was shown that the desirable property of loss of degrees-of-freedom holds only under normality. Furthermore, these statistics have some limitations in their applicability to heavier-tailed distributions. To overcome these problems, the use of weight functions in the statistics was proposed and investigated by De Wet (2000), De Wet (2002) and Csörgő (2002). The former two considered the issue of loss of degrees-of-freedom and the latter the application to heavier-tailed distributions. In De Wet (2000) and De Wet (2002) it was shown how the weight functions could be chosen in order to retain the loss of degrees-of-freedom property separately for location and scale families. The weight functions that give this property are the ones that give asymptotically optimal estimators for the location and scale parameters, respectively – thus estimation optimality. In this paper we show that in the location case, this choice of “estimation optimal” weight function also gives “testing optimality”, where the latter is measured in terms of approximate Bahadur efficiencies.</div>
Tertius de Wet and Veronica Humble
Sun, 01 Mar 2020 00:00:00 GMT
Efficiency behaviour of kernel-smoothed kernel distribution function estimators
https://journals.co.za/content/journal/10520/EJC-1c22a4395a?TRACK=RSS
<div>The asymptotic mean integrated squared error (AMISE) and the kernel efficiency (KE) of kernel distribution function estimators are well studied. In this note we define new nonparametric distribution function estimators by kernel-smoothing an initial kernel distribution function estimator. We show that, under certain conditions, the AMISE and the KE can be improved. A concrete example and a Monte Carlo simulation are worked out for illustration.</div>
Paul Janssen, Jan W. H. Swanepoel and Noël Veraverbeke
Sun, 01 Mar 2020 00:00:00 GMT
Linear inference under matrix-stable errors
https://journals.co.za/content/journal/10520/EJC-1c22bb7343?TRACK=RSS
<div>Linear inference is the foundation stone for much of theoretical and applied statistics. In practice errors often have excessive tails and lack the moments required in conventional usage. For random vector responses such errors often are modeled via spherical α-stable distributions with stability index α ∈ (0, 2], arising in turn through central limit theory but converging to non-Gaussian limits. Earlier work [Jensen, D.R. (2018). Biom. Biostat. Int. J. 7: 205–210] reexamined conventional linear models under n-dimensional α-stable responses, to the effect that Ordinary Least Squares (OLS) solutions and residual vectors under α-stable errors also have α-stable distributions, whereas F ratios remain exact in level and power as for Gaussian errors. The present study generalizes those findings to include multivariate linear models having matrix responses of order (n × k). Topics in inference focus on both location and scale matrices, the latter in connection with analogs of simple, multiple, and canonical correlations without benefit of second moments, seen nonetheless to gauge degrees of association under α-stable symmetry.</div>
D.R. Jensen
Sun, 01 Mar 2020 00:00:00 GMT
An improved unrelated question randomized response model
https://journals.co.za/content/journal/10520/EJC-1c22d79d25?TRACK=RSS
<div>In this paper we restrict the design probabilities of the Mahmood, Singh and Horn (1998) unrelated question randomized response model. Besides its simplicity, the resulting restricted model has two advantages over the Mahmood et al. (1998) model with other design probabilities. First, the restricted model requires selecting only one simple random sample rather than two, which reduces the cost of the survey. Second, the efficiency of the estimator of the proportion π<sub>s</sub> of the population bearing a sensitive characteristic is increased. In addition, efficiency comparisons showed that this estimator can be easily adjusted to be more efficient than other competitors that were developed after 1998. A simulation study is performed to determine the minimum sample size required for the estimator to lie inside the unit interval. Moreover, the restricted model is extended to stratified random sampling and the resulting estimator is shown to be more efficient than the Kim and Elam (2007) and Singh and Tarray (2016) estimators.</div>
Khadiga Sayed and Reda Mazloum
Sun, 01 Mar 2020 00:00:00 GMT
The recovery theorem with application to risk management
https://journals.co.za/content/journal/10520/EJC-1c22e11baa?TRACK=RSS
<div>The forward-looking nature of option prices provides an appealing way to extract risk measures. In this paper, we extract forecast densities from option prices that can be used in forecasting risk measures. More specifically, we extract a real-world return density forecast, implied from option prices, using the recovery theorem. In addition, we backtest and compare the predictive power of this real-world return density forecast with a risk-neutral return density forecast, implied from option prices, and a simple historical simulation approach. In an empirical study, using the South African FTSE/JSE Top 40 index, we found that the extracted real-world density forecasts, using the recovery theorem, yield satisfactory forecasts of risk measures.</div>
Vaughan van Appel and Eben Maré
Sun, 01 Mar 2020 00:00:00 GMT
Variable selection in logistic regression models through the application of exact mathematical programming
https://journals.co.za/content/journal/10520/EJC-1c22eea916?TRACK=RSS
<div>A linearised approximation of the log-likelihood objective function is presented as a potential alternative to iterative fitting methods employed by logistic regression. The log-likelihood objective function is solved using linear programming and a modified version of the linearised logistic regression model is presented, which facilitates best subset variable selection. The resulting model is a mixed integer linear programming problem which incorporates a cardinality constraint on the number of variables. The suggested approach maintains many attractive properties, such as its ability to quantify the quality of the resulting variable selection solution, its independence of the subjective choice of p-values inherent to typical stepwise variable selection approaches, and its capability to edge closer to optimality within increasingly reduced computing times when the correct settings are applied, even for large input datasets.<br/><br/>Computational results are presented to demonstrate the advantages of employing an exact mathematical programming approach towards variable selection in logistic regression applications.</div>
J.V. Venter and S.E. Terblanche
Sun, 01 Mar 2020 00:00:00 GMT
Estimating survival distributions for time-varying smart designs
https://journals.co.za/content/journal/10520/EJC-18401757f7?TRACK=RSS
<div>Treatment of complex diseases such as cancer, HIV, leukemia and depression usually follows complex treatment sequences. In two-stage randomization designs, patients are randomized to first-stage treatments, and upon response, a second randomization to the second-stage treatments is done. The clinical goal in such trials is to achieve a response such as complete remission of leukemia, 50% shrinkage of a solid tumor or an increase in CD4 count in HIV patients. These responses are presumed to predict longer survival. The focus in two-stage randomization designs with survival endpoints is on estimating survival distributions and comparing different treatment policies. In this article, we propose a parametric approach for estimating survival distributions in time-varying SMART designs. To evaluate the performance of our approach, a simulation study is conducted. The results of the simulation study reveal that the new approach gives survival probabilities that are less biased and more precise than those from the nonparametric methods. The new method is applied to a data set from a leukemia clinical trial.</div>
S. Vilakati and G. Cortese
Sun, 01 Sep 2019 00:00:00 GMT
Kernel estimation of residual extropy function under α-mixing dependence condition
https://journals.co.za/content/journal/10520/EJC-18400ce8ee?TRACK=RSS
<div>Analogous to the concept of residual entropy in the literature, Qiu and Jia (2018b) introduced the concept of residual extropy to measure the residual uncertainty of a random variable. In this work, we propose a nonparametric estimator for the residual extropy, where the observations under consideration exhibit an <em>α</em>-mixing (strong mixing) dependence condition. Asymptotic properties of the estimator are derived under suitable regularity conditions. A Monte Carlo simulation study is carried out to evaluate the performance of the estimator using the mean squared errors.</div>
R. Maya and M.R. Irshad
Sun, 01 Sep 2019 00:00:00 GMT
A mixture model with application to discrete competing risks data
https://journals.co.za/content/journal/10520/EJC-184010529a?TRACK=RSS
<div>In this paper, we modify the continuous time mixture competing risks model (Larson and Dinse, 1985) to handle discrete competing risks data. The main result of the model is an alternate regression expression for the cumulative incidence function. The structure of the regression expression for the cumulative incidence function under this model, and the proportional hazards assumption for the conditional hazard rates with piece-wise constant baseline conditional hazards, combine to allow for another means to assess the covariate effects on the cumulative incidence function. This benefit comes at some computational cost because the parameters are estimated via an EM algorithm. The proposed model is applied to real data and it is found that it improves the evaluation of the covariate effects on the cumulative incidence function compared to other discrete competing risks models.</div>
Bonginkosi D. Ndlovu, Sileshi F. Melesse and Temesgen Zewotir
Sun, 01 Sep 2019 00:00:00 GMT
Bayesian testing for process capability indices
https://journals.co.za/content/journal/10520/EJC-184013efaf?TRACK=RSS
<div>Process capability indices have been widely used in the manufacturing industry. They measure the ability of a manufacturing process to produce items that meet certain specifications. A capability index relates the voice of the customer (specification limits) to the voice of the process. There is a need to understand and interpret process capability indices. Most of the existing work in this area has been devoted to classical frequentist large sample theory. An alternative approach to the problem of making inference about capability indices is the Bayesian approach. In this paper a Bayesian version of Tukey’s method is used for constructing simultaneous credibility intervals for all pairwise differences. A Bayesian procedure for testing all possible contrasts is also given. The problem of selecting the best supplier(s) has received considerable attention in the literature, but mainly from a classical frequentist point of view. A Bayesian simulation procedure is also illustrated to find the best supplier or group of suppliers. This method seems much easier to perform than the Monte Carlo integration method given in Wu, Shiau, Pearn and Hung (2016). In Section 10, a sensitivity analysis regarding the prior choice is considered and in the last section, t-distributed data are analysed.</div>
A.J. van der Merwe, M.R. Sjölander and R. van Zyl
Sun, 01 Sep 2019 00:00:00 GMT
On the conditional distribution of the mean of the two closest among a set of three observations
https://journals.co.za/content/journal/10520/EJC-18403636c5?TRACK=RSS
<div>Chemical analyses of raw materials are often repeated in duplicate or triplicate. The assay values obtained are then combined using a predetermined formula to obtain an estimate of the true value of the material of interest. When duplicate observations are obtained, their average typically serves as an estimate of the true value. On the other hand, the “best of three” method involves taking three measurements and using the average of the two closest ones as an estimate of the true value.<br/><br/>In this paper, we consider another method which potentially involves three measurements. Initially two measurements are obtained and if their difference is sufficiently small, their average is taken as an estimate of the true value. However, if the difference is too large then a third independent measurement is obtained. The estimator is then defined as the average of the third observation and whichever of the first two is closest to it.<br/><br/>Our focus in the paper is the conditional distribution of the estimate in cases where the initial difference is too large. We find that the conditional distributions are markedly different under the assumption of a normal distribution and a Laplace distribution.</div>
I.J.H. Visagie and F. Lombard
Sun, 01 Sep 2019 00:00:00 GMT
Interpretable multi-label classification by means of multivariate linear regression
https://journals.co.za/content/journal/10520/EJC-14af67f7cb?TRACK=RSS
<div>In this paper, the potential of using a multivariate regression approach in order to obtain interpretable output in a multi-label classification problem is investigated. We focus in our analysis on extensions of ordinary multivariate regression which take into account informative dependencies amongst labels. It is found that the regression approaches make a valuable contribution insofar as the importance of input variables for given labels can be evaluated. An empirical study facilitates comparison of the performance of the regression approaches in multi-label classification and, in terms of several evaluation measures, shows that they are also largely competitive with state-of-the-art multi-label classification procedures.</div>
Surette Bierman
Tue, 26 Mar 2019 00:00:00 GMT
The absence of diffusion in the South African short rate
https://journals.co.za/content/journal/10520/EJC-14af6a8487?TRACK=RSS
<div>In the field of Financial Mathematics, stochastic differential equations are used to describe the dynamics of interest rates. An example is a model for the short rate, which is a mathematically defined rate not directly observable in any market. However, observable rates such as short-dated Treasury rates or the Johannesburg Interbank Agreement Rate (JIBAR) can be used as proxies for the short rate. The short rate dynamics are traditionally modelled by one-factor diffusion processes. These types of models remain popular due to the analytical tractability of the pricing formulae of interest rate derivatives under these models. To capture the leptokurtic nature of interest rate returns in the South African market, two types of models can be used: a pure jump model or a jump diffusion model. In this paper we investigate whether jumps are present and whether a diffusion component is evident. Our initial investigation showed that jumps were present in the South African market, and that no diffusion component was evident at low interest rate levels. This result was found using a Monte Carlo method to test for jumps. We therefore conclude that a pure jump process is an appropriate model for the South African short rate.</div>
G.L. Grobler
Tue, 26 Mar 2019 00:00:00 GMT
On a characteristic property of distributions related to the Laplace
https://journals.co.za/content/journal/10520/EJC-14af6ce6e5?TRACK=RSS
<div>It is shown that the asymmetric Laplace distribution uniquely arises as the distribution both of a difference between independent positive random variables and of a random choice between those random variables, one of them having been given a negative sign. Related results on the positive half-line are also given.</div>
M.C. Jones
Tue, 26 Mar 2019 00:00:00 GMT
Modelling environmental monitoring data coming from different surveys
https://journals.co.za/content/journal/10520/EJC-14af6fba87?TRACK=RSS
<div>In this work we propose a spatio-temporal model for Gaussian data collected in a small number of surveys. We assume the spatial correlation structure to be the same in all surveys. In the application concerning heavy metal concentrations in mosses, the data set is dense in the spatial dimension but sparse in the temporal one; thus our model-based approach corresponds to a correlation model depending on survey orders. One advantage of this approach is its computational simplicity. An interpretation for the space-time covariance function, decomposing the overall variance of the process as the product of the spatial component variance by the temporal component variance, is introduced. A simulation study, aiming to validate the model, provided better results in terms of accuracy with the novel covariance function. Maps of predicted heavy metal concentrations and of interpolation error, for the most recent survey, are presented. Data of this kind are common in the environmental sciences, which is why we argue that this will be a practical and frequently used tool.</div>
Luís Margalho, Raquel Menezes and Inês Sousa
Tue, 26 Mar 2019 00:00:00 GMT
A method for Bayesian regression modelling of composition data
https://journals.co.za/content/journal/10520/EJC-14af74f1cc?TRACK=RSS
<div>Many scientific and industrial processes produce data that is best analysed as vectors of relative values, often called compositions or proportions. The Dirichlet distribution is a natural distribution to use for composition or proportion data. It has the advantage of a low number of parameters, making it the parsimonious choice in many cases. This paper considers the case where the outcome of a process is Dirichlet, dependent on one or more explanatory variables in a regression setting. The paper explores some existing approaches to this problem, and then introduces a new simulation approach to fitting such models, based on the Bayesian framework. The paper illustrates the advantages of the new approach through simulated examples and an application in sport science. These advantages include: increased accuracy of fit, increased power for inference, and the ability to introduce random effects without additional complexity in the analysis.</div>
Sean van der Merwe
Tue, 26 Mar 2019 00:00:00 GMT