Robust Maximum Association Between Data Sets: The R Package ccaPP

  • Andreas Alfons Erasmus Universiteit Rotterdam
  • Christophe Croux KU Leuven
  • Peter Filzmoser Vienna University of Technology

Abstract

An intuitive measure of association between two multivariate data sets can be defined as the maximal value that a bivariate association measure can attain between any one-dimensional projections of the two data sets. Using a rank correlation as the bivariate measure has the advantage of combining good robustness properties with good efficiency. The software package ccaPP provides fast implementations of such maximum association measures for the statistical computing environment R. We demonstrate how to use the package ccaPP to compute the maximum association measures, as well as how to assess their significance via permutation tests.
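
The definition given in words above can be written as a single optimization problem. The following is only a sketch: the symbols X, Y, alpha, beta, R and rho_R are notation assumed here for illustration and are not taken from the abstract.

```latex
% Assumed notation (hypothetical, for illustration only):
%   X \in \mathbb{R}^p, Y \in \mathbb{R}^q   the two multivariate data sets (random vectors)
%   R(\cdot,\cdot)                           a bivariate association measure,
%                                            e.g. a rank correlation
%   \alpha, \beta                            projection directions of unit length
\rho_R(X, Y) \;=\; \max_{\|\alpha\| = 1,\ \|\beta\| = 1}
  R\!\left(\alpha^{\top} X,\ \beta^{\top} Y\right),
\qquad \alpha \in \mathbb{R}^p,\ \beta \in \mathbb{R}^q .
```

Under this reading, choosing the Pearson correlation for R gives the first canonical correlation, while a rank correlation gives a robust counterpart. The unit-length constraint only fixes the scale of the directions, since correlation measures are unaffected by positive rescaling of the projections.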

Published
2016-02-29
How to Cite
Alfons, A., Croux, C., & Filzmoser, P. (2016). Robust Maximum Association Between Data Sets: The R Package ccaPP. Austrian Journal of Statistics, 45(1), 71-79. https://doi.org/10.17713/ajs.v45i1.90
Section
Special Issue on R