On some Variants of the EM Algorithm for the Fitting of Finite Mixture Models
DOI: https://doi.org/10.17713/ajs.v32i1&2.454

Abstract
Finite mixture models are being increasingly used in statistical inference and to provide a model-based approach to cluster analysis. Mixture models can be fitted to independent data in a straightforward manner via the expectation-maximization (EM) algorithm. In this paper, we look at ways of speeding up the fitting of normal mixture models by using variants of the EM, including the so-called sparse and incremental versions. We also consider an incremental version based on imposing a multiresolution kd-tree structure on the data. Examples are given in which the EM algorithm can be speeded up by a factor of more than fifty for large data sets of small dimension.
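The standard EM iteration that these variants accelerate alternates an E-step (computing each point's posterior component probabilities, or responsibilities) with an M-step (weighted maximum-likelihood updates of the mixture parameters). The following is a minimal sketch for a univariate k-component normal mixture; the function name, quantile-based initialisation, and fixed iteration count are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def em_normal_mixture(x, k=2, n_iter=100):
    """Plain EM for a k-component univariate normal mixture.

    Illustrative sketch only: the quantile initialisation and fixed
    iteration count are assumptions, not the paper's method.
    """
    n = x.size
    pi = np.full(k, 1.0 / k)                       # mixing proportions
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread out initial means
    var = np.full(k, x.var())                      # common initial variance
    for _ in range(n_iter):
        # E-step: responsibilities tau[i, j] = P(component j | x_i).
        dens = (np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                / np.sqrt(2 * np.pi * var))
        tau = pi * dens
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates.
        nk = tau.sum(axis=0)
        pi = nk / n
        mu = (tau * x[:, None]).sum(axis=0) / nk
        var = (tau * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# Two well-separated components; the fitted means should land near 0 and 5.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
pi, mu, var = em_normal_mixture(x)
```

The cost of each pass is dominated by the E-step, which touches every data point. The sparse and incremental variants discussed in the paper reduce this cost by freezing responsibilities that are effectively zero, or by updating responsibilities for only a block of the data on each pass before refreshing the parameter estimates.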
License
The Austrian Journal of Statistics publishes open access articles under the terms of the Creative Commons Attribution (CC BY) License.
The Creative Commons Attribution (CC BY) License allows users to copy, distribute, and transmit an article, to adapt it, and to make commercial use of it. The CC BY license permits commercial and non-commercial re-use of an open access article, as long as the author is properly attributed.
Copyright on any research article published by the Austrian Journal of Statistics is retained by the author(s). Authors grant the Austrian Journal of Statistics a license to publish the article and to identify itself as the original publisher. Authors also grant any third party the right to use the article freely, as long as its original authors, citation details, and publisher are identified.
Manuscripts must be unpublished and not under consideration for publication elsewhere. By submitting an article, the author(s) certify that the article is their original work, that they have the right to submit it for publication, and that they can grant the above license.