A Modification of Linfoot’s Informational Correlation Coefficient

The performance of Linfoot’s informational correlation coefficient is studied experimentally at the bivariate normal distribution. It is satisfactory in the case of strong correlation and on large samples. To reduce the estimation bias, a symmetric version of this correlation measure is proposed. On both small and large samples, this modified informational correlation coefficient outperforms Linfoot’s correlation measure at the bivariate normal distribution over a wide range of the correlation coefficient.


Introduction
Pearson's correlation coefficient is a well-defined measure of the linear dependence between continuous random variables $X$ and $Y$, as are the closely related rank measures, namely the quadrant, Spearman and Kendall correlation coefficients. However, if one is interested either in processing discrete data or in revealing a possible nonlinear relationship between random variables, difficulties may arise both in the implementation of these classical measures and in their interpretation.
In what follows, we focus on informational measures of association between random variables (Shannon 1948). The dependence measure of Joe (1989) exploits the concept of relative entropy, which measures how much two distributions $p(x)$ and $q(x)$ differ; in the discrete case,
$$D_{KL}(p\|q) = \sum_x p(x)\log\frac{p(x)}{q(x)}.$$
Silvey (1964) uses a measure of dependence between two random variables defined by the ratio of their joint density to the product of their marginal densities,
$$\phi(x,y) = \frac{p(x,y)}{p(x)\,p(y)}.$$
The introduced measure is defined as $\Delta = E[d(X)]$, where
$$d(x) = \int_{\{y:\ \phi(x,y) > 1\}} [p(y\mid x) - p(y)]\,dy.$$
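For a discrete joint distribution, the integral in $d(x)$ becomes a sum over $y$, and since $p(x)[p(y\mid x) - p(y)] = p(x,y) - p(x)p(y)$, averaging over $X$ gives $\Delta = \sum_{\phi(x,y)>1} [p(x,y) - p(x)p(y)]$. The sketch below is only an illustration under the assumption that the integration region in $d(x)$ is $\{y: \phi(x,y) > 1\}$; `silvey_delta` is a hypothetical helper name, and the joint distribution is passed as a table of probabilities.

```python
def silvey_delta(joint):
    """Silvey's Delta for a discrete joint pmf given as a list of rows.

    Sketch assumption: Delta = sum over cells with phi(x, y) > 1 of
    [p(x, y) - p(x) p(y)], where phi(x, y) = p(x, y) / (p(x) p(y)).
    """
    px = [sum(row) for row in joint]        # marginal distribution of X (rows)
    py = [sum(col) for col in zip(*joint)]  # marginal distribution of Y (columns)
    delta = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            prod = px[i] * py[j]
            # For prod > 0, phi(x, y) > 1 is the same as p(x, y) > p(x) p(y).
            if prod > 0 and pxy > prod:
                delta += pxy - prod
    return delta


independent = [[0.06, 0.14], [0.24, 0.56]]  # p(x, y) = p(x) p(y) in every cell
dependent = [[0.5, 0.0], [0.0, 0.5]]        # Y is a copy of X
print(silvey_delta(independent))  # zero up to rounding
print(silvey_delta(dependent))    # 0.5
```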
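Mutual information $I(X,Y)$ for a pair of discrete or continuous random variables $X$ and $Y$ is defined, respectively, as
$$I(X,Y) = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)} \quad\text{or}\quad I(X,Y) = \iint p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy.$$
The informational correlation coefficient (ICC), first introduced by Linfoot (1957), is defined as
$$\rho_{ICC}(X,Y) = \sqrt{1 - e^{-2I(X,Y)}}. \quad (1)$$
At the bivariate normal distribution, $I(X,Y) = -\tfrac{1}{2}\log(1-\rho^2)$, so ICC coincides with the classical Pearson correlation coefficient up to sign: $\rho_{ICC}(X,Y) = |\rho|$, which equals $\rho$ for the nonnegative values of $\rho$ considered below.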
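A minimal Python sketch of Equation (1), for illustration only: the plug-in estimate below simply uses the probabilities of a given discrete table (the experiments described later use a shrinkage estimator instead), and `mutual_information` and `icc` are hypothetical helper names. It also checks numerically that, with the bivariate normal value $I(X,Y) = -\tfrac{1}{2}\log(1-\rho^2)$, the coefficient recovers $|\rho|$.

```python
import math


def mutual_information(joint):
    """Plug-in mutual information (in nats) of a discrete joint pmf given as rows."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # cells with p(x, y) = 0 contribute nothing
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi


def icc(mi):
    """Linfoot's informational correlation coefficient, Equation (1)."""
    # max() guards against a tiny negative value caused by rounding
    return math.sqrt(max(0.0, 1.0 - math.exp(-2.0 * mi)))


# Bivariate normal: I(X, Y) = -0.5 * log(1 - rho^2), so ICC recovers |rho|.
for rho in (0.0, 0.3, 0.9):
    gaussian_mi = -0.5 * math.log(1.0 - rho ** 2)
    print(rho, icc(gaussian_mi))

# Discrete example: Y = X with P(X = 0) = P(X = 1) = 1/2 gives I = log 2.
print(icc(mutual_information([[0.5, 0.0], [0.0, 0.5]])))  # about sqrt(3)/2 = 0.866
```

The discrete example shows that, outside the normal family, ICC need not reach 1 even for a deterministic relationship: here $I = \log 2$ gives $\sqrt{3}/2$.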

Problem setting
Although Linfoot's correlation coefficient ICC was introduced more than 60 years ago, its properties as a statistical measure of correlation have not yet been studied; in particular, it has not been checked how well this measure estimates the correlation coefficient from a sample of a given size.
Below, using the Monte Carlo method on small and large samples, we experimentally examine the bias of ICC at the bivariate normal distribution over a wide range of its correlation coefficient.
Moreover, to improve the performance of ICC, namely to reduce its bias, we propose and study its modified symmetric version, denoted MICC.

Description of the computational algorithm
All numerical experiments are performed in the R language, mainly with its "entropy" package.
The first problem is how to compute the mutual information used in (1). It is solved by applying the shrinkage algorithm of Hausser and Strimmer (2010).
There exist several algorithms for computing $I(X,Y)$; in our work, we choose the most accurate one rather than the fastest (for a comparative analysis, see Hausser and Strimmer 2010).
The general algorithm can be described as follows:
1. Generate a sample of a fixed size $N = 20, 60, 100, 400, 1000, 10000$ from the bivariate normal distribution.
2. Extract the $x$- and $y$-components from the sample; these are dependent random variables with correlation coefficient $\rho$.
3. Construct the table of frequencies, the discrete analog of the joint distribution: take the rectangle $[x_{min}, x_{max}] \times [y_{min}, y_{max}]$ on the plane and divide it into $n_x \times n_y$ bins of equal size. This yields an $n_x \times n_y$ table, each element of which equals the number of sample points in the corresponding bin.
4. Compute the mutual information $I(X,Y)$ and ICC from this table of frequencies.
This sequence is repeated 1000 times, which yields Monte Carlo estimates of the mean and variance of ICC. Computations are performed for $\rho = 0, 0.1, 0.2, \ldots, 0.9, 1$, and the number of bins is taken equal to 400. Typical results are exhibited in Figure 1.
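The four steps can be sketched in Python as follows. This is not the authors' R code: we assume square $K \times K$ binning, re-implement the James-Stein shrinkage of cell frequencies toward a uniform target following Hausser and Strimmer (2010) instead of calling the R entropy package, and use only the standard library. All function names are hypothetical, and the demo value `k=20` (400 cells) is our reading of "the number of bins is 400".

```python
import math
import random
import statistics


def sample_bivariate_normal(n, rho, rng):
    """Steps 1-2: n pairs from the standard bivariate normal with correlation rho."""
    s = math.sqrt(1.0 - rho ** 2)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + s * z2))
    return pairs


def frequency_table(pairs, k):
    """Step 3: k x k table of counts over [x_min, x_max] x [y_min, y_max]."""
    xs, ys = zip(*pairs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    table = [[0] * k for _ in range(k)]
    for x, y in pairs:
        # Equal-width bins; the maximum value is put into the last bin.
        i = min(int((x - x_lo) / (x_hi - x_lo) * k), k - 1)
        j = min(int((y - y_lo) / (y_hi - y_lo) * k), k - 1)
        table[i][j] += 1
    return table


def mi_shrink(table):
    """Step 4: mutual information (nats) from James-Stein shrunk cell frequencies.

    Assumption: shrinkage toward the uniform target 1/m over all m cells with
    intensity (1 - sum theta^2) / ((n - 1) sum (1/m - theta)^2), clipped to [0, 1].
    """
    counts = [c for row in table for c in row]
    n, m = sum(counts), len(counts)
    theta_ml = [c / n for c in counts]
    target = 1.0 / m
    denom = (n - 1) * sum((target - t) ** 2 for t in theta_ml)
    lam = 1.0 if denom == 0 else (1.0 - sum(t * t for t in theta_ml)) / denom
    lam = min(1.0, max(0.0, lam))
    k = len(table[0])
    p = [lam * target + (1.0 - lam) * t for t in theta_ml]
    joint = [p[r * k:(r + 1) * k] for r in range(len(table))]
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(pxy * math.log(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)


def icc_monte_carlo(rho, n, k, reps=1000, seed=1):
    """Repeat steps 1-4 reps times; return the Monte Carlo mean and variance of ICC."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        table = frequency_table(sample_bivariate_normal(n, rho, rng), k)
        mi = max(0.0, mi_shrink(table))
        values.append(math.sqrt(1.0 - math.exp(-2.0 * mi)))
    return statistics.mean(values), statistics.variance(values)


mean, var = icc_monte_carlo(rho=0.5, n=1000, k=20, reps=100)
print(f"mean ICC = {mean:.3f}, variance = {var:.5f}, bias = {mean - 0.5:+.3f}")
```

With `reps=1000` this mirrors one $(\rho, N)$ cell of the experiment described above; the bias is the Monte Carlo mean minus $\rho$.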

Monte Carlo results for ICC
From Figure 1, it follows that the estimation biases are considerably large: on small samples, they can even exceed 0.5. Relatively small biases are observed only on the large samples $N = 1000$ and $N = 10\,000$. Satisfactory performance is observed in the case of strong correlation, and the ICC biases decrease as the sample size grows.
We may also add that the coefficient of variation is less than 0.2 for all examined combinations of $(\rho, N)$.
A remark on the choice of the number of bins. The shrinkage algorithm takes the table of frequencies as input. It turned out that its performance depends on the ratio $N/K^2$, where $K$ is the linear dimension of the table, i.e., the number of bins along each axis. We observed that the results are almost insensitive to changes of $K$ as such, since they depend only on the coefficient $B = N/K^2$. For $\rho = 0.5$, the value $B = 7$ is optimal. Thus, given a data sample of size $N$, one can choose an appropriate value $K \approx \sqrt{N/B}$; $K$ is a tuning parameter of our algorithm.
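The rule $K \approx \sqrt{N/B}$ can be sketched as follows. We assume that $B = 7$, reported as optimal for $\rho = 0.5$, is used as the default for other $\rho$ as well, and we round $K$ to the nearest integer with a floor of 2 bins per axis; both are our own choices, and `choose_bins` is a hypothetical name.

```python
import math


def choose_bins(n, b=7.0):
    """Bins per axis K such that B = N / K^2 is close to b (at least 2 bins)."""
    return max(2, round(math.sqrt(n / b)))


for n in (20, 60, 100, 400, 1000, 10000):
    k = choose_bins(n)
    print(n, k, round(n / k ** 2, 2))  # sample size, K, achieved B
```

For the sample sizes of the experiments, this gives $K = 2, 3, 4, 8, 12, 38$; on the smallest samples the achieved $B$ can only roughly match the target because $K$ must be an integer.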

Main result
Relative entropy, also known as the Kullback-Leibler distance, has a serious disadvantage: it is not symmetric, i.e., in general $D_{KL}(p\|q) \neq D_{KL}(q\|p)$. Therefore, the symmetrized Kullback-Leibler divergence (Kullback 1959) is used:
$$\mathrm{Div}(p\|q) = D_{KL}(p\|q) + D_{KL}(q\|p). \quad (2)$$
Analogously, a symmetric version of mutual information can be introduced as
$$J(X,Y) = D_{KL}\big(p(x,y)\,\|\,p(x)p(y)\big) + D_{KL}\big(p(x)p(y)\,\|\,p(x,y)\big) = I(X,Y) + I^*(X,Y).$$
It is natural to use a symmetric measure of correlation to estimate the Pearson correlation coefficient, which is itself a symmetric measure of interdependence between random variables; in this case, we expect smaller estimation biases.
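To make the asymmetry and its remedy concrete, the sketch below computes $D_{KL}$, the symmetric divergence (2) and, assuming $J(X,Y)$ is the symmetrized divergence between the joint distribution and the product of its marginals, the symmetric mutual information of a discrete table. The helper names are hypothetical, and all cells must be positive, since otherwise the reverse divergence is infinite.

```python
import math


def kl(p, q):
    """Relative entropy D_KL(p || q) in nats; requires q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def div(p, q):
    """Symmetric divergence (2): D_KL(p || q) + D_KL(q || p)."""
    return kl(p, q) + kl(q, p)


def symmetric_mi(joint):
    """J(X, Y) = Div(joint || product of marginals) for a table with positive cells."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    p = [pxy for row in joint for pxy in row]  # flattened row by row
    q = [px[i] * py[j] for i in range(len(joint)) for j in range(len(joint[0]))]
    return div(p, q)


p, q = [0.9, 0.1], [0.5, 0.5]
print(kl(p, q), kl(q, p))    # the two directions differ
print(div(p, q), div(q, p))  # the symmetrized version does not
print(symmetric_mi([[0.4, 0.1], [0.1, 0.4]]))
```

For the last table, every product cell equals 0.25, and since $\mathrm{Div}(P\|Q) = \sum (P - Q)\log(P/Q)$, one gets $J = 0.3\log 4 \approx 0.416$ nats.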
Our idea is to repeat Linfoot's derivation of Equation (1), replacing the mutual information $I(X,Y)$ with its symmetric version $J(X,Y)$. In this case, the following result holds.
Theorem. A symmetric analog of Linfoot's informational correlation coefficient (1), called the modified informational correlation coefficient (MICC), is given by Equation (3), with the particular case $\rho_{MICC} = \rho$ at the bivariate normal distribution.
Here $W$ is the Lambert W function, the inverse of $x e^x$; it cannot be expressed in terms of elementary functions. Its properties are well studied, and special methods exist to compute it (see Corless, Gonnet, Hare, Jeffrey, and Knuth 1996).
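As an illustration of how such a function can be evaluated numerically, the sketch below computes the principal real branch $W_0$ (defined for $x \ge -1/e$) with Halley's iteration, one of the approaches discussed by Corless et al. (1996). The starting guesses are ad hoc choices for this sketch, the other real branch $W_{-1}$ is not covered, and `lambert_w0` is a hypothetical name.

```python
import math


def lambert_w0(x, tol=1e-14, max_iter=100):
    """Principal real branch W_0 of the Lambert W function: w * exp(w) = x, w >= -1.

    Uses Halley's iteration on f(w) = w * exp(w) - x; the starting guesses
    are ad hoc choices for this sketch.
    """
    branch_point = -1.0 / math.e
    if x < branch_point:
        raise ValueError("W_0 is real only for x >= -1/e")
    if x < 0.0:
        # Series near the branch point: W(x) ~ -1 + sqrt(2 (1 + e x)).
        p2 = 2.0 * (1.0 + math.e * x)
        if p2 <= 0.0:  # x equals -1/e up to rounding
            return -1.0
        w = -1.0 + math.sqrt(p2)
    elif x <= math.e:
        w = math.log1p(x)
    else:
        # Asymptotic guess for large x: W(x) ~ log x - log log x.
        lx = math.log(x)
        w = lx - math.log(lx)
    for _ in range(max_iter):
        ew = math.exp(w)
        f = w * ew - x
        wp1 = w + 1.0
        # Halley step f / (f' - f f'' / (2 f')) with f' = e^w (w + 1), f'' = e^w (w + 2).
        step = f / (ew * wp1 - (w + 2.0) * f / (2.0 * wp1))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


for x in (-0.3, 0.0, 1.0, math.e, 1e6):
    w = lambert_w0(x)
    print(x, w, w * math.exp(w))  # the last column reproduces x up to rounding
```

Library implementations such as `scipy.special.lambertw` cover all branches and complex arguments; the sketch only shows the principle.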

Proof
The first step is to express the mutual information via the correlation coefficient, as was done by Linfoot (1957): at the bivariate normal distribution, $I(X,Y) = -\tfrac{1}{2}\log(1-\rho^2)$. Next, compute $I^*(X,Y)$ at the bivariate normal distribution with density $f(x,y) = N(x, y; 0, 0, \sigma_x^2, \sigma_y^2, \rho)$. Consider the following three integrals:
The remaining integrals are computed analogously, and combining them expresses $J(X,Y)$ through $\rho$. Now we express the correlation coefficient $\rho$ via the symmetrized mutual information $J(X,Y)$ by inverting this equation and, setting $2/t = p$, we complete the derivation, where $W$ is the Lambert function, namely the inverse of $p e^p$. Finally, we arrive at Equation (3).

Concluding remarks
• The statistical performance of Linfoot's informational correlation coefficient is studied: considerable estimation biases are observed, especially on small samples.
• To reduce the biases of ICC, its modified symmetric version, MICC, is proposed; it provides much smaller estimation biases than its prototype.
• The proposed modified informational correlation coefficient MICC is recommended for processing Big Data, as the obtained results show that it performs best on large samples.
• Work in progress: since the estimation of nonlinear dependencies between random variables remains an important problem, it seems advantageous to use informational measures of correlation in this case; a comparative study of these and other measures of nonlinear correlation, such as the coefficient of determination, Sarmanov's correlation coefficient (Sarmanov 1958) and the distance correlation coefficient (Székely et al. 2007), is in progress.