South African Statistical Journal - Latest Issue
Volume 50, Issue 2, 2016
Author: Marta Ferreira
Source: South African Statistical Journal 50, pp 173–193 (2016)
Clustering of high values occurs in many real situations and affects inference on extremal events. For stationary dependent sequences, under general local and asymptotic dependence conditions, the degree of clustering is measured by a parameter called the extremal index. The estimation of extreme events or parameters is usually based on a number k of top order statistics or on the exceedances of a high threshold u, and is very sensitive to either of these choices. In particular, the bias increases with a growing k and a decreasing u. The use of the jackknife methodology may help to reduce this bias. We analyse the method through a simulation study applied to several estimators of the extremal index. An application to real data sets illustrates the results.
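The idea of threshold-based extremal index estimation with a jackknife bias correction can be sketched as follows. This is a minimal illustration using the classical runs estimator and a delete-one-block jackknife; the paper's specific estimators and jackknife variant may differ.

```python
import numpy as np

def runs_estimator(x, u, r=1):
    """Runs estimator of the extremal index: a cluster of exceedances
    of threshold u ends once r consecutive observations fall below u."""
    exceed = x > u
    n_exc = exceed.sum()
    if n_exc == 0:
        return np.nan
    clusters = 0
    gap = r  # treat the start of the series as a long gap
    for e in exceed:
        if e:
            if gap >= r:       # exceedance preceded by >= r non-exceedances
                clusters += 1  # starts a new cluster
            gap = 0
        else:
            gap += 1
    return clusters / n_exc

def jackknife_runs(x, u, r=1, n_blocks=10):
    """Delete-one-block jackknife bias correction of the runs estimator."""
    theta = runs_estimator(x, u, r)
    blocks = np.array_split(np.arange(len(x)), n_blocks)
    leave_out = [runs_estimator(np.delete(x, b), u, r) for b in blocks]
    return n_blocks * theta - (n_blocks - 1) * np.mean(leave_out)
```

For an i.i.d. series the extremal index is 1, so both estimates should sit close to 1 at a high threshold; dependent series with clustering pull the estimate below 1.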
Source: South African Statistical Journal 50, pp 195–219 (2016)
Based on a type II censored sample, Bayesian estimation of the scale parameter of the Rayleigh model is carried out under the squared error loss function. A generalised hypergeometric distribution, with its versatile tail shapes, is introduced as a prior, and beta special cases are examined. A simulation study investigates the sensitivity of four special cases of this beta prior family in terms of bias, frequentist coverage and mean square error, and determines their effect on robustness. Prediction bounds are derived for the lifetime of unused components using this beta prior family. A data set is used to illustrate and support some of the findings.
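The mechanics of squared-error-loss Bayes estimation under type II censoring can be sketched with a simpler conjugate prior. This is an illustration only: it uses an inverse-gamma prior on the squared scale as a stand-in for the paper's generalised hypergeometric / beta prior family.

```python
import numpy as np

def rayleigh_bayes_sigma2(sample, n, a=2.0, b=1.0):
    """Posterior mean of sigma^2 for a Rayleigh(sigma) model under
    type II censoring: only the r smallest of n lifetimes are observed.
    Prior: inverse-gamma(a, b) on sigma^2 (an illustrative stand-in for
    the beta/hypergeometric priors considered in the paper)."""
    x = np.sort(np.asarray(sample, float))
    r = len(x)
    # sufficient statistic: observed squared lifetimes plus the censored
    # contribution of the n - r unobserved units, all at the r-th failure
    T = np.sum(x**2) + (n - r) * x[-1]**2
    # conjugate update: posterior is inverse-gamma(a + r, b + T/2);
    # under squared error loss the Bayes estimate is the posterior mean
    return (b + T / 2) / (a + r - 1)
```

With a simulated Rayleigh sample of scale 2 (so sigma^2 = 4) and the smallest 150 of 200 lifetimes observed, the estimate lands near 4.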
Source: South African Statistical Journal 50, pp 221–236 (2016)
Kernel multivariate density estimation is an important technique for estimating a multivariate density function. In this investigation we use the Hellinger distance as a measure of error to evaluate the estimator: we derive the mean weighted Hellinger distance for the estimator and obtain the optimal bandwidth based on the Hellinger distance. We also propose and study a new technique for selecting the matrix of bandwidths based on the Hellinger distance, and compare it with the plug-in and least squares techniques.
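The Hellinger-distance criterion for bandwidth choice can be sketched in one dimension. This is an oracle illustration: the true density is assumed known (as in a simulation study), whereas the paper's selection technique works from the data alone.

```python
import numpy as np

def gauss_kde(x, data, h):
    """Gaussian kernel density estimate at points x with bandwidth h."""
    z = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def hellinger2(f, g, dx):
    """Squared Hellinger distance between two densities on a common grid."""
    return 0.5 * np.sum((np.sqrt(f) - np.sqrt(g))**2) * dx

def best_bandwidth(data, true_pdf, grid, hs):
    """Oracle bandwidth choice: pick h minimising the Hellinger distance
    between the KDE and the (known) true density -- the error criterion
    above, usable directly only in simulation."""
    dx = grid[1] - grid[0]
    d = [hellinger2(gauss_kde(grid, data, h), true_pdf(grid), dx) for h in hs]
    return hs[int(np.argmin(d))]
```

The multivariate case replaces the scalar h by a bandwidth matrix, which is what the paper's selection technique targets.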
Source: South African Statistical Journal 50, pp 237–259 (2016)
In this paper we are interested in the estimation of locally stationary processes by the minimum Hellinger distance estimator (Beran, 1977) in a spectral framework. This distance was originally applied to probability distributions; here we apply it to spectral density functions belonging to a specified parametric spectral family. We generalise the minimum Hellinger distance estimation method to processes that exhibit only locally stationary behaviour. Asymptotic properties of the estimator are established, the robustness of the estimator is investigated through a simulation study, and an application to real data is carried out.
Source: South African Statistical Journal 50, pp 261–271 (2016)
This paper explores two avenues for the modification of tactics in Twenty20 cricket. The first idea is based on the realisation that wickets are of less importance in Twenty20 cricket than in other formats (e.g. one-day cricket and Test cricket). A consequence is that batting sides in Twenty20 cricket should place more emphasis on scoring runs and less on avoiding the loss of wickets. Conversely, fielding sides should place more emphasis on preventing runs and less on taking wickets. Practical implementations of this general idea are obtained by simple modifications to batting orders and bowling overs. The second idea may be applicable when there is a sizeable mismatch between two competing teams. In this case, the weaker team may be able to improve its win probability by increasing the variance of the run differential. A specific variance inflation technique which we consider is increased aggressiveness in batting.
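The variance-inflation argument for the weaker team can be demonstrated with a toy Monte Carlo. The normal model for the run differential is an assumption for illustration, not the paper's simulator: when the mean differential is negative, a larger spread raises the chance of a positive outcome.

```python
import numpy as np

def win_prob(mu, sd, n_sims=100_000, seed=0):
    """Monte Carlo win probability for a team whose run differential is
    modelled (illustratively) as Normal(mu, sd); the team wins when the
    differential is positive."""
    rng = np.random.default_rng(seed)
    return np.mean(rng.normal(mu, sd, n_sims) > 0)

# A weaker team expects to lose by 20 runs on average. Batting more
# aggressively (here: doubling the spread) raises its win probability.
p_cautious = win_prob(-20, 15)
p_aggressive = win_prob(-20, 30)
```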
Source: South African Statistical Journal 50, pp 273–283 (2016)
Parametric compositional data analysis in a high-dimensional simplex can be performed by employing the Dirichlet distribution or, if the Dirichlet is not appropriate, the logistic normal distribution. In this paper a multivariate gamma (MGAM) distribution is proposed as an alternative distribution for compositional data. In addition, the MGAM distribution is extended to a multivariate extreme value (MEV) distribution, and goodness-of-fit statistics are calculated for comparison against the logistic normal distribution. An application is considered in which the amount of gas produced by a coal gasification facility depends crucially on the size distribution of the coal, which is measured as compositional data and characterised by six variables. The observed sample space is divided into three regions of high (H), standard (S) and low (L) gas production by choosing appropriate thresholds, and new observations are classified among the regions.
Author: Isabelle Garisch
Source: South African Statistical Journal 50, pp 285–301 (2016)
In this paper a group of l decision makers confronted by the problem of choosing a mutually acceptable solution to a statistical decision problem is considered. If consensus is not reached, a solution through compromise is called for. A measure of similarity, or agreement, between each pair of decision makers is defined, and this measure can be used to assign weights to the decision makers. These weights indicate the importance of the decision makers in terms of their relative agreement with others in the group. A solution through compromise can then be found by using these weights in the calculation of a randomised decision.
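One way the pairwise-agreement weighting could look is sketched below. The similarity measure (inverse Euclidean distance between the decision makers' rating vectors) is an illustrative assumption, not the measure defined in the paper.

```python
import numpy as np

def agreement_weights(ratings):
    """ratings: (l, k) array, one row of action ratings per decision maker.
    Illustrative similarity between a pair = 1 / (1 + Euclidean distance);
    a decision maker's weight is their total similarity to the others,
    normalised to sum to one."""
    R = np.asarray(ratings, float)
    l = len(R)
    sim = np.zeros((l, l))
    for i in range(l):
        for j in range(l):
            if i != j:
                sim[i, j] = 1.0 / (1.0 + np.linalg.norm(R[i] - R[j]))
    w = sim.sum(axis=1)
    return w / w.sum()
```

Decision makers who agree closely with the rest of the group receive large weights; an outlier is down-weighted, which is the intended effect when forming a compromise randomised decision.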
Source: South African Statistical Journal 50, pp 303–312 (2016)
In this paper, we present two unfamiliar novel estimation techniques (UNET) for the constrained regression coefficients in the framework of a standard multiple linear regression model. Estimation of a linear regression problem with constraints on the regression coefficients is first derived by minimising a formulated goal function: the total sum of squared errors plus the sum of the linear constraints multiplied by a Lagrangian. We also show that the solution to the system of equations can be obtained without differentiating the goal function, being expressed instead in terms of the known matrices. This is achieved by employing properties of a blocked linear system. The UNET is justified by a numerically simulated system of linear equations in 3 dimensions. The UNET yields estimates comparable to those generated by the Schur complement principle.
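A minimal sketch of equality-constrained least squares solved directly from the blocked system of known matrices (the standard KKT form; the paper's UNET constructions may differ in detail):

```python
import numpy as np

def constrained_ls(X, y, A, c):
    """Minimise ||y - X b||^2 subject to A b = c by solving the blocked
    (KKT) linear system
        [[2 X'X, A'], [A, 0]] [b; lam] = [2 X'y; c]
    in one shot -- once the blocks are written down, no differentiation
    of the goal function is needed."""
    p, m = X.shape[1], A.shape[0]
    K = np.block([[2 * X.T @ X, A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([2 * X.T @ y, c])
    sol = np.linalg.solve(K, rhs)
    return sol[:p]  # constrained coefficient estimates (lam discarded)
```

In a 3-dimensional example with the constraint that the coefficients sum to one, the solution satisfies the constraint exactly, and the least-squares gradient at the solution lies in the row space of A, confirming stationarity.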
Statistical model for overdispersed count outcome with many zeros: an approach for marginal inference
Source: South African Statistical Journal 50, pp 313–337 (2016)
Marginalised models are in great demand by researchers in the life sciences, particularly in clinical trials, epidemiology, health economics and surveys, since they allow generalisation of inference to the entire population under study. For count data, standard procedures such as Poisson regression and the negative binomial model provide population-average inference for model parameters. However, the occurrence of excess zero counts and lack of independence in empirical data have necessitated extensions to accommodate these phenomena. These extensions, though useful, complicate the interpretation of effects. For example, the zero-inflated Poisson model accounts for the presence of excess zeros, but its parameter estimates do not have the direct marginal inferential ability of the base Poisson model. Marginalised models for data with excess zeros are underdeveloped even though demand for them is high. The aim of this paper, therefore, is to develop a marginalised model for a zero-inflated univariate count outcome in the presence of overdispersion. Emphasis is placed on methodological development, efficient estimation of model parameters, implementation, and application to two empirical studies. A simulation study is performed to assess the performance of the model. Results from the analysis of two case studies indicate that the refined procedure performs significantly better, in terms of likelihood comparisons and AIC values, than models which do not simultaneously correct for overdispersion and the presence of excess zero counts. The simulation studies support these findings. In addition, the proposed technique yields small biases and mean square errors for model parameters. To ensure that the proposed method enjoys widespread use, it is implemented using the SAS NLMIXED procedure with minimal coding effort.
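The core marginalisation idea for zero-inflated counts can be shown in a few lines. This sketch covers only the reparameterisation (without covariates or the overdispersion component): the Poisson rate is defined through the marginal mean nu so that E[Y] = nu directly, rather than (1 - pi) * lambda.

```python
import numpy as np

def simulate_zip_marginal(nu, pi, size, rng):
    """Draw from a zero-inflated Poisson parameterised by its *marginal*
    mean nu: the Poisson component uses lambda = nu / (1 - pi), so that
    E[Y] = (1 - pi) * lambda = nu and nu is directly interpretable."""
    lam = nu / (1 - pi)
    structural_zero = rng.random(size) < pi  # excess-zero indicator
    y = rng.poisson(lam, size)
    y[structural_zero] = 0
    return y
```

The sample mean of the simulated counts recovers nu, while the zero fraction exceeds pi because Poisson zeros add to the structural zeros; in the full model, regression effects placed on nu then carry the population-average interpretation the paper is after.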