Classification of Publications Based on Statistical Analysis of Scientific Terms Distributions

Vaidas Balys; Rimantas Rudzkis

doi:10.17713/ajs.v37i1.292

Authors

Vaidas Balys, Institute of Mathematics and Informatics, Vilnius
Rimantas Rudzkis, Institute of Mathematics and Informatics, Vilnius

DOI:

https://doi.org/10.17713/ajs.v37i1.292

Abstract

The problem of classifying scientific texts is considered. Models and methods based on the probability distributions of scientific terms in text are discussed. A comparative study of the proposed algorithms and a few popular alternatives was performed. The results of an experimental study on real-world data are reported.

Published

2016-04-03

Issue

Vol. 37 No. 1 (2008)

Section

Articles

License

The Austrian Journal of Statistics publishes open access articles under the terms of the Creative Commons Attribution (CC BY) License.

The Creative Commons Attribution (CC BY) License allows users to copy, distribute, and transmit an article, adapt the article, and make commercial use of the article. The CC BY license permits commercial and non-commercial re-use of an open access article, as long as the author is properly attributed.

Copyright on any research article published by the Austrian Journal of Statistics is retained by the author(s). Authors grant the Austrian Journal of Statistics a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its original authors, citation details and publisher are identified.

Manuscripts should be unpublished and not be under consideration for publication elsewhere. By submitting an article, the author(s) certify that the article is their original work, that they have the right to submit the article for publication, and that they can grant the above license.

How to Cite

Balys, V., & Rudzkis, R. (2016). Classification of Publications Based on Statistical Analysis of Scientific Terms Distributions. Austrian Journal of Statistics, 37(1), 109–118. https://doi.org/10.17713/ajs.v37i1.292