Extracting Information from Interval Data Using Symbolic Principal Component Analysis
DOI:
https://doi.org/10.17713/ajs.v46i3-4.673
Abstract
We introduce generic definitions of symbolic variance and covariance for random interval-valued variables that lead to a unified and insightful interpretation of four known symbolic principal component estimation methods: CPCA, VPCA, CIPCA, and SymCovPCA. Moreover, we propose the use of truncated versions of symbolic principal components, which use a strict subset of the original symbolic variables, as a way to improve the interpretation of symbolic principal components. Furthermore, the analysis of a real dataset leads to a meaningful characterization of Internet traffic applications, while highlighting similarities between the symbolic principal component estimation methods considered in the paper.
References
Bertrand P, Goupil F (2000). Descriptive Statistics for Symbolic Data. In HH Bock, E Diday (eds.), Analysis of Symbolic Data, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 106-124. Springer Berlin Heidelberg.
Billard L (2008). Sample Covariance Functions for Complex Quantitative Data. In Proceedings of World IASC Conference, Yokohama, Japan, pp. 157-163.
Billard L, Diday E (2003). From the Statistics of Data to the Statistics of Knowledge: Symbolic Data Analysis. Journal of the American Statistical Association, 98, 470-487.
Billard L, Diday E (2006). Symbolic Data Analysis: Conceptual Statistics and Data Mining. John Wiley & Sons.
Cadima JFCL, Jolliffe IT (2001). Variable Selection and the Interpretation of Principal Subspaces. Journal of Agricultural, Biological, and Environmental Statistics, 6(1), 62-79.
Cazes P, Chouakria A, Diday E, Schektman Y (1997). Extension de l'Analyse en Composantes Principales à des Données de Type Intervalle. Revue de Statistique Appliquée, 45(3), 5-24.
Chouakria A (1998). Extension des Méthodes d'Analyse Factorielle à des Données de Type Intervalle. Ph.D. thesis, Université Paris-Dauphine.
De Carvalho FdA, Brito P, Bock HH (2006). Dynamic Clustering for Interval Data Based on L2 Distance. Computational Statistics, 21(2), 231-250.
Diday E (1987). The Symbolic Approach in Clustering and Related Methods of Data Analysis. In HH Bock (ed.), Proceedings of First Conference IFCS, Aachen, Germany. North-Holland.
Le-Rademacher J, Billard L (2012). Symbolic Covariance Principal Component Analysis and Visualization for Interval-Valued Data. Journal of Computational and Graphical Statistics, 21(2), 413-432.
Pascoal C (2014). Contributions to Variable Selection and Robust Anomaly Detection in Telecommunications. Ph.D. thesis, Instituto Superior Técnico, Universidade de Lisboa, Portugal.
Pascoal C, Oliveira M, Valadas R, Filzmoser P, Salvador P, Pacheco A (2012). Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection. In INFOCOM, 2012 Proceedings IEEE, pp. 1755-1763. ISSN 0743-166X.
Vilela M (2015). Classical and Robust Symbolic Principal Component Analysis for Interval Data. Master's thesis, Instituto Superior Técnico, Universidade de Lisboa, Portugal.
Wang H, Guan R, Wu J (2012). CIPCA: Complete-Information-based Principal Component Analysis for Interval-valued Data. Neurocomputing, 86, 158-169.
License
The Austrian Journal of Statistics publishes open access articles under the terms of the Creative Commons Attribution (CC BY) License.
The Creative Commons Attribution License (CC BY) allows users to copy, distribute, and transmit an article, adapt the article, and make commercial use of the article. The CC BY license permits commercial and non-commercial re-use of an open access article, as long as the author is properly attributed.
Copyright on any research article published by the Austrian Journal of Statistics is retained by the author(s). Authors grant the Austrian Journal of Statistics a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its original authors, citation details and publisher are identified.
Manuscripts should be unpublished and not be under consideration for publication elsewhere. By submitting an article, the author(s) certify that the article is their original work, that they have the right to submit the article for publication, and that they can grant the above license.