On Speed of Stochastic CART Model Search

Márton Ispány; Ilona Krasznahorkay

doi:10.17713/ajs.v36i1.318

On Speed of Stochastic CART Model Search

Authors

Márton Ispány University of Debrecen, Hungary
Ilona Krasznahorkay University of Debrecen, Hungary

DOI:

https://doi.org/10.17713/ajs.v36i1.318

Abstract

Decision trees have proved to be commonly used nonlinear tools for supervised learning. This technique is a way to divide the space of the predictor variables into bricks in order to achieve as homogeneous partitions as possible. We improved the CART method proposed by Breiman et al. (1984) using a stochastic search, first suggested by Chipman et al. (1998) in the Bayesian framework. In this paper estimates are given for the rate of
convergence and the mixing time of the MCMC method defined on decision trees.

References

Breiman, L., Friedman, J. H., Olsen, A. O., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth International Group.

Chipman, H. A., George, E. I., and McCulloch, R. E. (1998). Bayesian CART model search. Journal of the American Statistical Association, 93, 935-960.

Cormen, T. H., Leierson, C. E., and Rives, L. R. (1990). Introduction to Algorithms. The Massachusetts Institute of Technology.

Hastie, T., Tibshirani, R., and Friedman, J. (2001). The Elements of Statistical Learning. Data Mining, Inference and Prediction. New York: Springer.

Hoeffgen, K. U., Simon, H. U., and Van Horn, K. S. (1995). Robust trainability of single neurons. Journal of Computer System Sciences, 50, 114-125.

Hyafil, L., and Rivest, R. L. (1976). Constructing optimal binary trees is NP-complete. Information Processing Letters, 5, 15-17.

Jerrum, M. (1998). Mathematical foundations of the markov chain monte carlo method. In Probabilistic methods for algorithmic discrete mathematics.

Jerrum, M. (2003). Counting, Sampling and Integrating: Algorithms and Complexity. Basel: Birkhäuser.

Jerrum, M., and Sinclair, A. (1996). The Markov chain Monte Carlo method: An approach to approximate counting and integration. In D. S. Hochbaum (Ed.), Approximation Algorithm for NP-hard Problems (p. 482-520). Boston.

Kurzynski, M. W. (1983). The optimal strategy of a tree classifier. Pattern Recognition, 16, 81-87.

Loh, W. Y., and Vanichsetakal, N. (1988). Tree-structured classification via generalized discriminant analysis. Journal of the American Statistical Association, 83, 715-725.

Murphy, P. M., and McCraw, R. L. (1991). Designing storage efficient decision trees. IEEE Transactions on Computers, 40, 315-320.

Murthy, S. K. (1998). Automatic construction of decision trees from data: A multidisciplinary survey. In Data Mining and Knowledge Discovery, 2. Boston: Kluwer Academic Publishers.

Roberts, G. O. (1996). Markov chain concepts related to sampling algorithms. In W. R. Gilks, S. Richardson, and D. J. Spiegelhalter (Eds.), Markov Chain Monte Carlo in Practice (p. 45-57). London: Chapman & Hall/CRC.

Safavian, S. R., and Landgrebe, D. (1991). A survey of decision tree classifier methodology. IEEE Transaction on Systems, Man and Cybernetics, 21, 660-674.

Saloff-Coste, L. (1997). Lectures on finite Markov chains. In P. Bernard (Ed.), Lectures on Probability Theory and Statistics (p. 301-413). Berlin: Springer.

Sinclair, A. (1992). Improved bounds for mixing rates of Markov chains and multicommodity flow. Combinatorics, Probability and Computing, 1, 351-370.

Downloads

Published

2016-04-03

Issue

Vol. 36 No. 1 (2007)

Section

Articles

License

The Austrian Journal of Statistics publish open access articles under the terms of the Creative Commons Attribution (CC BY) License.

The Creative Commons Attribution License (CC-BY) allows users to copy, distribute and transmit an article, adapt the article and make commercial use of the article. The CC BY license permits commercial and non-commercial re-use of an open access article, as long as the author is properly attributed.

Copyright on any research article published by the Austrian Journal of Statistics is retained by the author(s). Authors grant the Austrian Journal of Statistics a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its original authors, citation details and publisher are identified.

Manuscripts should be unpublished and not be under consideration for publication elsewhere. By submitting an article, the author(s) certify that the article is their original work, that they have the right to submit the article for publication, and that they can grant the above license.