Classification of Publications Based on Statistical Analysis of Scientific Terms Distributions

Authors

  • Vaidas Balys Institute of Mathematics and Informatics, Vilnius
  • Rimantas Rudzkis Institute of Mathematics and Informatics, Vilnius

DOI:

https://doi.org/10.17713/ajs.v37i1.292

Abstract

The problem of classifying scientific texts is considered. Models and methods based on probabilistic distributions of scientific terms in text are discussed. A comparative study of the proposed algorithms and a few popular alternatives was performed. The results of an experimental study on real-world data are reported.

References

Chang, C., and Lin, C. (2001). LIBSVM: a library for support vector machines. (Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm)

Hazewinkel, M. (2004). Dynamic stochastic models for indexes and thesauri, identification clouds, and information retrieval and storage. In R. Baeza-Yates (Ed.), Recent Advances in Applied Probability (pp. 181–204).

Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In ECML ’98: Proceedings of the 10th European Conference on Machine Learning (pp. 137–142). New York: Springer-Verlag.

Mitchell, T. M. (1996). Machine Learning. McGraw-Hill.

Rudzkis, R., Balys, V., and Hazewinkel, M. (2006). Stochastic modelling of scientific terms distribution in publications. In J. M. Borwein and W. M. Farmer (Eds.), Mathematical Knowledge Management (Vol. 4108, pp. 152–164). Springer Berlin/Heidelberg.

Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Comput. Surv., 34, 1-47.

Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. New York: Springer-Verlag.

Yang, Y. (1994). Expert network: effective and efficient learning from human decisions in text categorization and retrieval. In SIGIR ’94: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 13–22). New York: Springer-Verlag.

Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval, 1, 69-90.

Yang, Y., and Pedersen, J. O. (1997). A comparative study on feature selection in text categorization. In ICML ’97: Proceedings of the 14th International Conference on Machine Learning (pp. 412–420). Morgan Kaufmann Publishers Inc.

Published

2016-04-03

How to Cite

Balys, V., & Rudzkis, R. (2016). Classification of Publications Based on Statistical Analysis of Scientific Terms Distributions. Austrian Journal of Statistics, 37(1), 109–118. https://doi.org/10.17713/ajs.v37i1.292

Section

Articles