Identification of Multivariate Outliers: A Performance Study
DOI:
https://doi.org/10.17713/ajs.v34i2.406
Abstract
Three methods for the identification of multivariate outliers (Rousseeuw and Van Zomeren, 1990; Becker and Gather, 1999; Filzmoser et al., 2005) are compared. They are based on the Mahalanobis distance, which is made resistant to outliers and model deviations by robust estimation of location and covariance. The comparison is made by means of a simulation study. Not only multivariate normally distributed data but also heavy-tailed and asymmetric distributions are considered. The simulations focus on low-dimensional (p = 5) and high-dimensional (p = 30) data.
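As a brief illustration of the quantity the abstract refers to, the robust Mahalanobis distance can be written as below. The symbols T, C, and RD are notation assumed here, not taken from the paper, and the chi-square threshold is one commonly used cutoff shown only as an example; the three compared methods may choose the estimators and the threshold differently.

```latex
% Assumed notation: x_1, ..., x_n are observations in R^p,
% T is a robust location estimate and C a robust covariance estimate.
\[
  \mathrm{RD}(x_i) = \sqrt{(x_i - T)^{\top} C^{-1} (x_i - T)},
  \qquad i = 1, \dots, n.
\]
% Illustrative rule (an assumption, not necessarily the rule of any compared method):
% flag x_i as a potential outlier when
\[
  \mathrm{RD}^2(x_i) > \chi^2_{p;\,0.975},
\]
% where \chi^2_{p;0.975} is the 0.975 quantile of the chi-square
% distribution with p degrees of freedom.
```

With the classical sample mean and covariance in place of T and C, the same formula gives the ordinary Mahalanobis distance, which outliers themselves can distort; substituting robust estimates is what makes the distance resistant.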
References
V. Barnett and T. Lewis. Outliers in Statistical Data. Wiley & Sons, New York, 3rd edition, 1994.
C. Becker and U. Gather. The masking breakdown point of multivariate outlier identification rules. J. Am. Statist. Assoc., 94(447):947–955, 1999.
C. Becker and U. Gather. The largest nonidentifiable outlier: A comparison of multivariate simultaneous outlier identification rules. Computational Statistics & Data Analysis, 36:119–127, 2001.
P.L. Davies. Asymptotic behavior of S-estimators of multivariate location and dispersion matrices. The Annals of Statistics, 15:1269–1292, 1987.
H. Doleisch, M. Gasser, and H. Hauser. Interactive feature specification for focus+context visualization of complex simulation data. In Proc. of the Joint IEEE TCVG – EG Symp. on Vis., pages 239–248, 2003.
P. Filzmoser, R.G. Garrett, and C. Reimann. Multivariate outlier detection in exploration geochemistry. Computers and Geosciences, 2005. In press.
R.G. Garrett. The chi-square plot: A tool for multivariate outlier recognition. Journal of Geochemical Exploration, 32:319–341, 1989.
A. Genz and F. Bretz. Numerical computation of multivariate t-probabilities with application to power calculation of multiple contrasts. Journal of Statistical Computation and Simulation, 63:361–378, 1999.
D. Gervini. A robust and efficient adaptive reweighted estimator of multivariate location and scatter. Journal of Multivariate Analysis, 84:116–144, 2003.
R. Gnanadesikan and J.R. Kettenring. Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics, 28:81–124, 1972.
J.T. Kent and D.E. Tyler. Constrained M-estimation for multivariate location and scatter. The Annals of Statistics, 24(3):1346–1370, 1996.
R.A. Maronna. Robust M-estimators of multivariate location and scatter. The Annals of Statistics, 4(1):51–67, 1976.
R.A. Maronna and V.J. Yohai. The behavior of the Stahel-Donoho robust multivariate estimator. J. Am. Statist. Assoc., 90:330–341, 1995.
D. Peña and F.J. Prieto. Multivariate outlier detection and robust covariance matrix estimation (with discussion). Technometrics, 43(3):286–310, 2001.
D.M. Rocke and D.L. Woodruff. Identification of outliers in multivariate data. J. Am. Statist. Assoc., 91:1047–1061, 1996.
D.M. Rocke and D.L. Woodruff. A synthesis of outlier detection and cluster identification. Technical report, University of California, Davis, Davis CA 95616, 1999. http://handel.cipic.ucdavis.edu/ dmrocke/Synth5.pdf.
P.J. Rousseeuw and B.C. Van Zomeren. Unmasking multivariate outliers and leverage points. J. Am. Statist. Assoc., 85(411):633–651, 1990.
P.J. Rousseeuw. Multivariate estimation with high breakdown point. In W. Grossmann, G. Pflug, I. Vincze, and W. Wertz, editors, Mathematical Statistics and Applications, volume B, pages 283–297, Budapest, 1985. Akadémiai Kiadó.
P.J. Rousseeuw and K. Van Driessen. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41:212–223, 1999.
D. Swayne, D. Cook, and A. Buja. XGobi: Interactive dynamic data visualization in the X Windows system. Journal of Computational and Graphical Statistics, 7(1):113–130, 1998.
D.E. Tyler. Some issues in the robust estimation of multivariate location and scatter. In W. Stahel and S. Weisberg, editors, Directions in Robust Statistics and Diagnostics 2, pages 327–336. Springer, New York, 1991.
License
The Austrian Journal of Statistics publishes open access articles under the terms of the Creative Commons Attribution (CC BY) License.
The Creative Commons Attribution License (CC BY) allows users to copy, distribute, and transmit an article, adapt the article, and make commercial use of the article. The CC BY license permits commercial and non-commercial re-use of an open access article, as long as the author is properly attributed.
Copyright on any research article published by the Austrian Journal of Statistics is retained by the author(s). Authors grant the Austrian Journal of Statistics a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its original authors, citation details and publisher are identified.
Manuscripts should be unpublished and not under consideration for publication elsewhere. By submitting an article, the author(s) certify that the article is their original work, that they have the right to submit it for publication, and that they can grant the above license.