Treatment of Multivariate Outliers in Incomplete Business Survey Data
DOI:
https://doi.org/10.17713/ajs.v45i1.86
Abstract
Multivariate quantitative survey data are usually not normally distributed. Skewed and semi-continuous distributions occur often. In addition, missing values and non-response are common. Together, these problems make multivariate outlier detection difficult. They occur in most business surveys and in some household surveys, such as the European Union's Statistics on Income and Living Conditions (SILC) survey. Several methods for multivariate outlier detection are collected in the R package modi. This paper gives an overview of modi and its functions for outlier detection and the corresponding imputation. The use of the methods is illustrated with a business survey dataset. The discussion covers pre- and post-processing to deal with skewness and zero-inflation, the advantages and disadvantages of the methods, and the choice of parameters.
License
The Austrian Journal of Statistics publishes open access articles under the terms of the Creative Commons Attribution (CC BY) License.
The CC BY License allows users to copy, distribute, transmit, and adapt an article, including for commercial use, as long as the author is properly attributed.
Copyright on any research article published by the Austrian Journal of Statistics is retained by the author(s). Authors grant the Austrian Journal of Statistics a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its original authors, citation details and publisher are identified.