Annual Proceedings of the South African Statistical Association Conference - Latest Issue
Volume 2015, Issue 1, 2015
Author: Franck Adekambi
Source: Annual Proceedings of the South African Statistical Association Conference 2015, pp 1–8 (2015)
We illustrate how alternating renewal processes can be used for the actuarial modelling of health insurance policies. No previous research has applied the cumulative distribution function and the moment generating function of the discounted value of the aggregate amount of benefit paid out up to the end of the nth sickness period, n = 1, 2, 3, .... From a practical point of view, however, these two expressions are difficult to evaluate. This research therefore used an approximation of the discounted value of the aggregate amount of benefit paid out up to the end of the nth sickness period, for the case of a constant force of interest. The approximation will, for example, be useful for calculating the insurer's probability of ruin, that is, the probability that the discounted value of the aggregate amount of benefit paid out exceeds the premium received plus the insurer's initial capital. Erlang distributions with different parameters are used for both the healthy and the sick periods, and illustrations are presented in Tables 1, 2 and 3 for a constant force of interest.
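The simulation idea behind such a model can be sketched in a few lines. The following is a minimal Monte Carlo illustration, not the paper's method: the Erlang parameters, the benefit rate and the force of interest are all hypothetical choices, and benefit is assumed to accrue continuously during sickness.

```python
import numpy as np

rng = np.random.default_rng(0)

def discounted_benefit(n_sickness=3, delta=0.05, rate=1.0,
                       healthy_shape=2, sick_shape=3, n_sim=10_000):
    """Monte Carlo estimate of the expected discounted aggregate benefit
    paid up to the end of the n-th sickness period.

    Healthy and sick sojourns are Erlang (gamma with integer shape,
    unit scale); benefit accrues continuously at `rate` per unit time
    while sick and is discounted at constant force of interest `delta`.
    """
    totals = np.zeros(n_sim)
    for i in range(n_sim):
        t, total = 0.0, 0.0
        for _ in range(n_sickness):
            t += rng.gamma(healthy_shape)      # healthy sojourn
            s = rng.gamma(sick_shape)          # sickness sojourn
            # closed form of  integral_t^{t+s} rate * exp(-delta*u) du
            total += rate / delta * (np.exp(-delta * t)
                                     - np.exp(-delta * (t + s)))
            t += s
        totals[i] = total
    return totals.mean()
```

An exact evaluation of the distribution of this quantity is what the abstract describes as difficult; the Monte Carlo mean above is the crude benchmark an approximation would be compared against.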
Source: Annual Proceedings of the South African Statistical Association Conference 2015, pp 9–16 (2015)
The exponential distribution is a popular model in both practice and theoretical work. As a result, a multitude of tests have been developed for testing the hypothesis that observed data are realisations from this distribution. Many of the recently developed tests contain a tuning parameter, usually appearing in a weight function, and are often evaluated over a grid of values for this parameter. However, this approach does not lend itself to objective comparisons because the power of a test is highly dependent on the value of the tuning parameter. In this paper we compare the performance of tests with a data-dependent choice of the tuning parameter to that of classical tests (which contain no tuning parameter). It is found that the tests based on a data-dependent choice of the tuning parameter compare favourably to the classical tests.
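One family of tests with a tuning parameter in the weight function can be sketched as follows. This is a simplified numerical version in the spirit of the Baringhaus-Henze test; the grid-based integration and all defaults are illustrative assumptions, not the specific tests studied in the paper.

```python
import numpy as np

def bh_statistic(x, a=1.0, n_grid=2001, t_max=20.0):
    """Weighted L2 statistic for testing exponentiality, evaluated by
    trapezoidal integration.  The tuning parameter `a` appears in the
    weight function exp(-a*t).  Under exponentiality the scaled
    empirical Laplace transform psi_n approximately satisfies
    (1 + t) * psi_n'(t) + psi_n(t) = 0."""
    y = np.asarray(x, float) / np.mean(x)    # makes the test scale-free
    n = len(y)
    t = np.linspace(0.0, t_max, n_grid)
    E = np.exp(-t[:, None] * y)              # shape (n_grid, n)
    psi = E.mean(axis=1)                     # empirical Laplace transform
    dpsi = -(y * E).mean(axis=1)             # its derivative
    integrand = ((1 + t) * dpsi + psi) ** 2 * np.exp(-a * t)
    return n * np.sum((integrand[1:] + integrand[:-1]) * np.diff(t)) / 2
```

Evaluating `bh_statistic` over a grid of `a` values reproduces the comparison problem the abstract describes: the statistic, and hence the power, changes with the tuning parameter.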
Source: Annual Proceedings of the South African Statistical Association Conference 2015, pp 17–24 (2015)
Risk management tools such as value-at-risk (VaR) are highly dependent on the underlying distributional assumption, and identifying a distribution that best captures all aspects of given financial data may benefit both investors and risk managers. In this paper we investigate this possibility by establishing which generalized hyperbolic distributions best fit gold price returns, while also drawing comparisons with stable distributions. The adequacy of these distributions is assessed through the Anderson-Darling test, the Akaike information criterion, the Bayesian information criterion and backtesting of their respective VaR estimates.
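Backtesting of VaR estimates can be illustrated with the Kupiec proportion-of-failures test, sketched below. This is a generic backtest; the abstract does not specify the paper's exact backtesting procedure, so the test choice and parameters here are assumptions.

```python
import numpy as np
from scipy.stats import chi2

def kupiec_pof(returns, var_level, alpha=0.05):
    """Kupiec proportion-of-failures backtest: do VaR violations occur
    at the nominal rate `alpha`?  Returns (LR statistic, p-value)."""
    hits = np.asarray(returns) < -var_level    # losses exceeding VaR
    n, x = len(hits), int(hits.sum())

    def loglik(p):                             # binomial log-likelihood
        return x * np.log(p) + (n - x) * np.log(1 - p)

    pi = x / n                                 # observed violation rate
    # at pi in {0, 1} the maximised log-likelihood is 0
    lr = -2 * (loglik(alpha) - (loglik(pi) if 0 < pi < 1 else 0.0))
    return lr, chi2.sf(lr, df=1)
```

A small p-value indicates that the fitted distribution's VaR forecasts produce violations at a rate inconsistent with the nominal level.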
Source: Annual Proceedings of the South African Statistical Association Conference 2015, pp 25–32 (2015)
The effect of right-censoring on non-parametric estimation of the hazard rate is investigated in this paper. Smoothed hazard rate estimates were obtained by applying three well-known and widely used kernel functions to the Nelson-Aalen estimator of the cumulative hazard rate. A simulation study was performed to assess the performance of the resulting estimators: smoothed hazard rates were obtained while recording the frequency of optimal global bandwidths and estimating the variance, bias and coverage over event times. An example illustrates some of the simulation results.
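The estimator can be sketched in a few lines: Nelson-Aalen increments smoothed with a kernel. The paper compares three kernels and data-driven bandwidths; the Epanechnikov kernel and fixed bandwidth below are illustrative choices only.

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen increments dH(u) = d(u) / n(u) at distinct event times."""
    t = np.asarray(times, float)
    e = np.asarray(events, int)
    uniq = np.unique(t[e == 1])
    at_risk = np.array([(t >= u).sum() for u in uniq])
    d = np.array([((t == u) & (e == 1)).sum() for u in uniq])
    return uniq, d / at_risk

def smoothed_hazard(times, events, grid, b):
    """Kernel-smoothed hazard estimate: Nelson-Aalen increments
    convolved with an Epanechnikov kernel of bandwidth b."""
    u, dH = nelson_aalen(times, events)
    z = (np.asarray(grid)[:, None] - u) / b
    K = np.where(np.abs(z) <= 1, 0.75 * (1 - z ** 2), 0.0)
    return (K * dH).sum(axis=1) / b
```

Varying the censoring distribution in simulated data and re-estimating on the same grid is the kind of experiment the abstract's simulation study describes.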
Properties of A- and D-optimal row-column designs for two-colour cDNA microarray experiments: robustness against missing arrays and efficiency
Source: Annual Proceedings of the South African Statistical Association Conference 2015, pp 33–39 (2015)
Two-colour complementary deoxyribonucleic acid (cDNA) microarray experiments allow scientists to study the expression levels of thousands of genes simultaneously. If a gene-specific dye effect is assumed in a microarray experiment, there are two blocking factors, array and dye, and the experiment can be regarded as a row-column design with dyes as rows and arrays as columns. Furthermore, when comparisons of all possible pairs of treatments are of particular interest, the experiment can be described by a linear mixed effects model with the arrays as random effects. One important criterion for a good design is robustness against a missing observation, which may occur due to insufficient resolution, image corruption, or scratches on the slide. A missing observation may render a design disconnected, leading to a loss of precision in estimation and/or of possible comparisons between treatments. The main objective of this paper is to investigate the robustness of A- and D-optimal row-column designs against one or two missing arrays. The numerical results show that the robustness of optimal designs against missing arrays depends on an unknown parameter, a function of the random array variance and the error variance.
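The robustness calculation can be illustrated on a toy example. The sketch below uses a simplified fixed-effects block-design approximation (arrays as blocks of size two, dye effects ignored), so it does not reproduce the paper's mixed-model criterion; it only shows how an A-value and a worst-case efficiency under a missing array can be computed.

```python
import numpy as np

def a_value(arrays, v):
    """A-criterion: sum of reciprocals of the nonzero eigenvalues of
    the treatment information matrix C = R - N K^{-1} N', for arrays
    given as treatment pairs (block size k = 2, dye effects ignored)."""
    N = np.zeros((v, len(arrays)))           # treatment-array incidence
    for j, (s, t) in enumerate(arrays):
        N[s, j] += 1
        N[t, j] += 1
    C = np.diag(N.sum(axis=1)) - N @ N.T / 2.0
    eig = np.linalg.eigvalsh(C)
    return float(np.sum(1.0 / eig[eig > 1e-8]))

# all 6 treatment pairs of 4 treatments, one array per pair
full = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
a_full = a_value(full, 4)
# worst case over a single missing array
a_worst = max(a_value(full[:j] + full[j + 1:], 4) for j in range(len(full)))
efficiency = a_full / a_worst    # < 1; closer to 1 means more robust
```

A design whose efficiency stays close to 1 after array loss is robust in the sense the paper studies; disconnection would show up as a drop in the rank of C.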
Modeling extreme daily temperature using generalized Pareto distribution at Port Elizabeth, South Africa
Source: Annual Proceedings of the South African Statistical Association Conference 2015, pp 41–48 (2015)
The extremes of daily maximum temperature in summer and daily minimum temperature in winter were analyzed by fitting the generalized Pareto distribution (GPD) to data from the Port Elizabeth weather station, South Africa. Since the extremes of the minimum and maximum temperature series do not follow a normal distribution, the non-parametric Kendall's tau test and Sen's slope estimator were used for the trend analysis. A significant positive trend was observed in the extreme annual minimum temperature. However, including a linear trend in the log-scale parameter of the GPD model for the minimum daily winter temperature did not improve the precision of the parameter estimates. The return level analysis shows that by the end of the twenty-first century the extreme summer maximum temperature in Port Elizabeth could be about 5 °C higher than at present, whereas the change in the winter minimum temperature will be less severe, with the return levels suggesting an increase of about 2 °C.
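A peaks-over-threshold analysis of this kind can be sketched with scipy. The series below is synthetic stand-in data, not the Port Elizabeth record, and the threshold choice and return-period arithmetic are illustrative assumptions.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
# synthetic stand-in for daily summer maxima (degrees C)
temps = 25 + rng.gumbel(0.0, 2.0, size=3000)

u = np.quantile(temps, 0.95)                 # threshold choice
exc = temps[temps > u] - u
xi, _, sigma = genpareto.fit(exc, floc=0)    # GPD fit to the exceedances
zeta = (temps > u).mean()                    # exceedance probability

def return_level(m):
    """Level exceeded on average once every m observations."""
    if abs(xi) < 1e-6:                       # exponential (xi -> 0) limit
        return u + sigma * np.log(m * zeta)
    return u + sigma / xi * ((m * zeta) ** xi - 1.0)

rl100 = return_level(100 * 365)              # ~100-year level, daily data
```

Comparing such a return level with the current threshold is how statements like "about 5 °C higher than at present" are obtained from a fitted GPD.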
Source: Annual Proceedings of the South African Statistical Association Conference 2015, pp 49–56 (2015)
The LASSO is a penalized regression method which simultaneously performs shrinkage and variable selection. The output produced by the LASSO consists of a piecewise linear solution path, starting with the null model and ending with the full least squares fit, as the value of a tuning parameter is decreased. The performance of the selected model therefore depends greatly on the choice of this parameter. This paper attempts to provide an overview of methods which are available to select the value of the tuning parameter for either prediction or variable selection purposes. A simulation study provides a comparison of these methods and assesses their performance.
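The role of the tuning parameter is easiest to see in the orthonormal-design case, where the LASSO solution is explicit soft-thresholding of the least-squares estimate and the parameter can be chosen by hold-out validation. This is a simplified sketch, not one of the selection methods compared in the paper; the data and grid are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def soft_threshold(z, lam):
    """Componentwise soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# with orthonormal columns the LASSO path is explicit soft-thresholding
n, p = 200, 10
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal design
beta = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ beta + 0.5 * rng.standard_normal(n)
z = X.T @ y                                        # least-squares estimate

# choose the tuning parameter by hold-out prediction error
X_val = rng.standard_normal((100, p))
y_val = X_val @ beta + 0.5 * rng.standard_normal(100)
lams = np.linspace(0.0, np.abs(z).max(), 50)       # path: full fit -> null
errs = [np.mean((y_val - X_val @ soft_threshold(z, lam)) ** 2)
        for lam in lams]
lam_best = lams[int(np.argmin(errs))]
selected = np.flatnonzero(soft_threshold(z, lam_best))
```

A prediction-oriented criterion like this one tends to pick a smaller tuning parameter (and hence more variables) than a criterion aimed at variable selection, which is the tension the paper's comparison addresses.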
Modelling average minimum daily temperature using extreme value theory with a time varying threshold
Source: Annual Proceedings of the South African Statistical Association Conference 2015, pp 57–64 (2015)
In this paper we present an application of the generalized Pareto distribution (GPD) to the modelling of average minimum daily temperature in South Africa for the period January 2000 to August 2010. A penalized cubic smoothing spline is used as a time-varying threshold and to cater for seasonality. We then extract the excesses (residuals) above the cubic spline and fit a non-parametric mixture model to obtain a sufficiently high threshold. Because the data exhibit short-range dependence and strong seasonality, the excesses above this threshold are declustered and the GPD is fitted to the cluster maxima. The parameters are estimated by maximum likelihood. The estimate of the shape parameter shows that the Weibull family of distributions is appropriate for modelling the upper tail of the distribution of average minimum daily temperature in South Africa. Bootstrap resampling is used to assess uncertainty in the parameter estimates. This study shows that using a penalized cubic smoothing spline as a time-varying threshold for time series data with strong seasonality provides a good fit of the GPD to the cluster maxima, resulting in accurate estimates of return levels.
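The threshold-and-decluster steps can be sketched as follows. Here scipy's UnivariateSpline stands in for the penalized cubic smoothing spline, the series is synthetic, a simple quantile replaces the mixture-model threshold, and the run length is an illustrative choice.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(3)
days = np.arange(3650.0)
# synthetic daily minima with an annual cycle (stand-in for the real series)
series = (10 + 5 * np.sin(2 * np.pi * days / 365.25)
          + rng.standard_normal(3650))

spline = UnivariateSpline(days, series, s=2.0 * len(days))  # seasonal fit
resid = series - spline(days)               # excesses around the spline
u = np.quantile(resid, 0.95)                # "sufficiently high" threshold

def decluster(idx, run=3):
    """Runs declustering: exceedances separated by fewer than `run`
    observations are grouped into one cluster."""
    clusters, current = [], [idx[0]]
    for i in idx[1:]:
        if i - current[-1] < run:
            current.append(i)
        else:
            clusters.append(current)
            current = [i]
    clusters.append(current)
    return clusters

idx = np.flatnonzero(resid > u)
cluster_maxima = [resid[c].max() for c in decluster(idx)]
# cluster_maxima would then be the input to a GPD maximum likelihood fit
```

Declustering leaves approximately independent cluster maxima, which is what justifies fitting the GPD by maximum likelihood in the presence of short-range dependence.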
The risk performance of a heteroscedastic preliminary test estimator under the reflected normal loss function
Source: Annual Proceedings of the South African Statistical Association Conference 2015, pp 65–72 (2015)
The problem of heteroscedasticity is commonly encountered in regression models, and it is known that, under heteroscedasticity, the Ordinary Least Squares estimator is relatively inefficient. This paper focuses on the risk performance of a preliminary test estimator for the regression coefficients after a preliminary test for heteroscedasticity has been performed. The risk under the symmetric and bounded Reflected Normal loss function is derived and evaluated numerically by means of Monte Carlo simulation. From these results it is clear that the relative risk gains of the Two-stage Aitken estimator and the preliminary test estimator over the Ordinary Least Squares estimator generally increase with higher levels of heteroscedasticity.
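The two-stage logic of a preliminary test estimator can be sketched as below, with a Breusch-Pagan pretest and a feasible GLS (two-stage Aitken) second stage. The abstract does not specify the paper's pretest or variance model, so the test, the variance specification and the data are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)

# synthetic regression with error variance increasing in x
n = 300
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + 0.5 * x * rng.standard_normal(n)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# preliminary test: Breusch-Pagan LM statistic, n * R^2 from
# regressing the squared residuals on the design
e2 = resid ** 2
gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)
r2 = 1 - np.sum((e2 - X @ gamma) ** 2) / np.sum((e2 - e2.mean()) ** 2)
p_value = chi2.sf(n * r2, df=1)

if p_value < 0.05:
    # two-stage Aitken (feasible GLS): reweight rows by 1/sd, with
    # the variance modelled here as proportional to x^2
    w = 1.0 / x
    beta_hat, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
else:
    beta_hat = beta_ols       # pretest not rejected: keep OLS
```

The estimator actually reported is thus data-dependent, which is exactly why its risk differs from that of either OLS or the Aitken estimator used unconditionally.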
Source: Annual Proceedings of the South African Statistical Association Conference 2015, pp 73–80 (2015)
In this article we introduce and discuss Random Survival Forests, a modern ensemble method for predicting right-censored survival data, and present an original application of the model to the prediction of surrenders of investment policies. The model's performance is benchmarked against the Cox model, a semi-parametric model that has been the mainstay of survival analysis since its introduction in the early 1970s. Predictive performance is measured via an adaptation of the Brier score for right-censored data using inverse probability of censoring weights. In this application the Random Survival Forest is shown to have superior predictive performance to the Cox model.
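The weighted Brier score used for this comparison can be sketched as follows, in the style of the Graf et al. estimator; the handling of ties and of the left limit G(t-) is simplified here, and the interface is an illustrative assumption.

```python
import numpy as np

def km_censoring(times, events, t):
    """Kaplan-Meier estimate G(t) of the censoring survival function
    (roles reversed: censorings are treated as the 'events')."""
    ts = np.asarray(times, float)
    es = np.asarray(events, bool)
    G = 1.0
    for u in np.unique(ts[~es]):
        if u > t:
            break
        G *= 1.0 - ((ts == u) & ~es).sum() / (ts >= u).sum()
    return G

def ipcw_brier(times, events, surv_pred, t):
    """Brier score at time t for predicted survival probabilities
    S_hat(t | x_i), with inverse-probability-of-censoring weights."""
    times = np.asarray(times, float)
    events = np.asarray(events, bool)
    score = 0.0
    for Ti, di, Si in zip(times, events, surv_pred):
        if Ti <= t and di:        # event observed by t: true status is 0
            score += (0.0 - Si) ** 2 / km_censoring(times, events, Ti)
        elif Ti > t:              # known to survive past t: status is 1
            score += (1.0 - Si) ** 2 / km_censoring(times, events, t)
        # individuals censored before t contribute nothing (weight zero)
    return score / len(times)
```

Evaluating this score on held-out policies for both the forest's and the Cox model's predicted survival curves gives the kind of benchmark the abstract reports.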