Identification of Multivariate Outliers: A Performance Study

Authors

  • Peter Filzmoser Vienna University of Technology, Austria

DOI:

https://doi.org/10.17713/ajs.v34i2.406

Abstract

Three methods for the identification of multivariate outliers (Rousseeuw and Van Zomeren, 1990; Becker and Gather, 1999; Filzmoser et al., 2005) are compared. All three are based on the Mahalanobis distance, which is made resistant to outliers and model deviations by robust estimation of location and covariance. The comparison is carried out by means of a simulation study. Besides multivariate normally distributed data, heavy-tailed and asymmetric distributions are also considered. The simulations focus on low-dimensional (p = 5) and high-dimensional (p = 30) data.
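To make the common idea behind the compared methods concrete, the following is a minimal Python sketch of outlier flagging with a robust Mahalanobis distance. It is an illustration under stated assumptions, not the estimator or cutoff of any of the three methods: the robust location and covariance come from a simple concentration scheme started at the coordinatewise median, the covariance is rescaled so that the median squared distance matches the chi-square median, and the chi-square quantile is computed with the Wilson-Hilferty approximation. The function names (`chi2_quantile`, `concentrated_estimate`, `flag_outliers`) and the default 0.975 quantile are choices made for this sketch.

```python
import numpy as np
from statistics import NormalDist


def chi2_quantile(q, p):
    """Wilson-Hilferty approximation to the q-quantile of chi-square(p)."""
    z = NormalDist().inv_cdf(q)
    a = 2.0 / (9.0 * p)
    return p * (1.0 - a + z * np.sqrt(a)) ** 3


def squared_distances(X, center, cov):
    """Squared Mahalanobis distance of every row of X from `center`."""
    diff = X - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)


def concentrated_estimate(X, n_steps=20):
    """Crude robust location/covariance (an assumption of this sketch).

    Start from the coordinatewise median and MAD, then repeatedly
    re-estimate mean and covariance from the h points with the
    smallest current distances. Runs a fixed number of steps.
    """
    n, p = X.shape
    h = (n + p + 1) // 2  # roughly half of the sample
    center = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - center), axis=0)
    cov = np.diag(mad ** 2)
    for _ in range(n_steps):
        keep = np.argsort(squared_distances(X, center, cov))[:h]
        center = X[keep].mean(axis=0)
        cov = np.cov(X[keep], rowvar=False)
    # Rescale so the median squared distance matches the chi-square median.
    d2 = squared_distances(X, center, cov)
    cov = cov * (np.median(d2) / chi2_quantile(0.5, p))
    return center, cov


def flag_outliers(X, q=0.975):
    """Flag rows whose robust squared distance exceeds the chi-square(p) q-quantile."""
    center, cov = concentrated_estimate(X)
    d2 = squared_distances(X, center, cov)
    return d2 > chi2_quantile(q, X.shape[1]), d2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal((90, 5))           # p = 5, as in the low-dimensional case
    shifted = rng.standard_normal((10, 5)) + 10.0  # hypothetical shift outliers
    flags, _ = flag_outliers(np.vstack([clean, shifted]))
    print("flagged clean points:  ", int(flags[:90].sum()))
    print("flagged shifted points:", int(flags[90:].sum()))
```

Because location and covariance are estimated from the central half of the sample, a cluster of shifted points cannot pull the estimates toward itself and mask its own distances, which is the failure mode of the classical mean and covariance that robust estimation is meant to avoid.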

References

V. Barnett and T. Lewis. Outliers in Statistical Data. Wiley & Sons, New York, 3rd edition, 1994.

C. Becker and U. Gather. The masking breakdown point of multivariate outlier identification rules. J. Am. Statist. Assoc., 94(447):947–955, 1999.

C. Becker and U. Gather. The largest nonidentifiable outlier: A comparison of multivariate simultaneous outlier identification rules. Computational Statistics & Data Analysis, 36:119–127, 2001.

P.L. Davies. Asymptotic behavior of S-estimators of multivariate location and dispersion matrices. The Annals of Statistics, 15:1269–1292, 1987.

H. Doleisch, M. Gasser, and H. Hauser. Interactive feature specification for focus+context visualization of complex simulation data. In Proc. of the Joint IEEE TCVG – EG Symp. on Vis., pages 239–248, 2003.

P. Filzmoser, R.G. Garrett, and C. Reimann. Multivariate outlier detection in exploration geochemistry. Computers and Geosciences, 2005. In press.

R.G. Garrett. The chi-square plot: A tool for multivariate outlier recognition. Journal of Geochemical Exploration, 32:319–341, 1989.

A. Genz and F. Bretz. Numerical computation of multivariate t-probabilities with application to power calculation of multiple contrasts. Journal of Statistical Computation and Simulation, 63:361–378, 1999.

D. Gervini. A robust and efficient adaptive reweighted estimator of multivariate location and scatter. Journal of Multivariate Analysis, 84:116–144, 2003.

R. Gnanadesikan and J.R. Kettenring. Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics, 28:81–124, 1972.

J.T. Kent and D.E. Tyler. Constrained M-estimation for multivariate location and scatter. The Annals of Statistics, 24(3):1346–1370, 1996.

R.A. Maronna. Robust M-estimators of multivariate location and scatter. The Annals of Statistics, 4(1):51–67, 1976.

R.A. Maronna and V.J. Yohai. The behavior of the Stahel-Donoho robust multivariate estimator. J. Am. Statist. Assoc., 90:330–341, 1995.

D. Peña and F.J. Prieto. Multivariate outlier detection and robust covariance matrix estimation (with discussion). Technometrics, 43(3):286–310, 2001.

D.M. Rocke and D.L. Woodruff. Identification of outliers in multivariate data. J. Am. Statist. Assoc., 91:1047–1061, 1996.

D.M. Rocke and D.L. Woodruff. A synthesis of outlier detection and cluster identification. Technical report, University of California, Davis, Davis CA 95616, 1999. http://handel.cipic.ucdavis.edu/ dmrocke/Synth5.pdf.

P.J. Rousseeuw and B.C. Van Zomeren. Unmasking multivariate outliers and leverage points. J. Am. Statist. Assoc., 85(411):633–651, 1990.

P.J. Rousseeuw. Multivariate estimation with high breakdown point. In W. Grossmann, G. Pflug, I. Vincze, and W. Wertz, editors, Mathematical Statistics and Applications, volume B, pages 283–297, Budapest, 1985. Akadémiai Kiadó.

P.J. Rousseeuw and K. Van Driessen. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41:212–223, 1999.

D. Swayne, D. Cook, and A. Buja. XGobi: Interactive dynamic data visualization in the X Windows system. Journal of Computational and Graphical Statistics, 7(1):113–130, 1998.

D.E. Tyler. Some issues in the robust estimation of multivariate location and scatter. In W. Stahel and S. Weisberg, editors, Directions in Robust Statistics and Diagnostics 2, pages 327–336. Springer, New York, 1991.

Published

2016-04-03

How to Cite

Filzmoser, P. (2016). Identification of Multivariate Outliers: A Performance Study. Austrian Journal of Statistics, 34(2), 127–138. https://doi.org/10.17713/ajs.v34i2.406

Section

Articles