Interesting Properties of Variation Characteristics

Abstract

At some university faculties, including economics, descriptive statistics appears as a separate course treated as an introduction to statistical inference. It covers, among others, measures of location, variation and asymmetry for collected data. However, as yet, academic textbooks on the subject do not provide sufficient information on the upper and lower bounds of these characteristics, or the information provided is imprecise (phrased in terms such as "usually" or "most often"). The aim of this paper is to fill this gap and give precise answers to these questions. The second problem concerns statistical teaching and training. Our experience shows that some elegant formulae, simple to interpret, are not always convenient for computation, and vice versa. We therefore suggest working with two equivalent forms: one for theoretical considerations and the other for computational purposes. In this way the reasoning remains clear, while the computation becomes less error-prone and time-consuming. In fact, the whole work can be done with a simple calculator. Special attention is given to the mean absolute deviation and to the Gini index.


Introduction
At some university faculties, including economics, descriptive statistics appears as a separate course treated as an introduction to statistical inference. It covers, among others, measures of location, variation and asymmetry for collected data. However, as yet, academic textbooks on the subject do not provide sufficient information on the upper and lower bounds of these characteristics, or the information provided is imprecise (phrased in terms such as "usually" or "most often"). The aim of this paper is to fill this gap and give precise answers to these questions.
The second problem concerns statistical teaching and training. Our experience shows that some elegant formulae, simple to interpret, are not always convenient for computation, and vice versa. We therefore suggest working with two equivalent forms: one for theoretical considerations and the other for computational purposes. In this way the reasoning remains clear, while the computation becomes less error-prone and time-consuming. In fact, the whole work can be done with a simple calculator. Special attention is given to the mean absolute deviation and to the Gini index.

Data Characteristics and their Attributes
Descriptive statistics refers to a finite sequence of observations, say $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$, called data. The aim of descriptive statistics is a visual (i.e. graphical) and a quantitative presentation of the data. In this paper we focus on the latter.
Austrian Journal of Statistics, Vol. 39 (2010), No. 4, 341-348

A real number $x_0 = f(x_1, \ldots, x_n)$ expressing a desired property of the data is said to be a data characteristic. In consequence, any $f$ may be a potential characteristic of a respective kind (for instance of location, variation or asymmetry). The members of each class may be chosen in two ways:

1. by defining some desirable attributes and then considering all functions for which the attributes are met,
2. by choosing from the characteristics in common use, for instance from the sample moments, order statistics, or their possible combinations.
With regard to the first way, it seems that any reasonable data characteristic should be invariant with respect to a permutation of the observations, i.e. it should satisfy the condition

(I) $f(x_{i_1}, \ldots, x_{i_n}) = f(x_1, \ldots, x_n)$ for any permutation $x_{i_1}, \ldots, x_{i_n}$ of the numbers $x_1, \ldots, x_n$.

We shall assume that the condition (I) is met for all data characteristics considered in this paper.
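The invariance condition (I) can be illustrated numerically; a minimal sketch in Python, with made-up integer data, using the sample mean and median as stand-ins for a generic characteristic $f$:

```python
import random
import statistics

# Condition (I): a data characteristic f should return the same value for
# every permutation of the observations.  Integer data keep the check exact.
x = [3, 1, 4, 1, 5, 9, 2, 6]
y = x[:]
random.shuffle(y)  # an arbitrary permutation of the observations

assert statistics.mean(y) == statistics.mean(x)
assert statistics.median(y) == statistics.median(x)
```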

Location Characteristics
The characteristics of this type refer to a central point of the data. Attributes of such characteristics may be introduced either directly or indirectly. Let us start with the first approach. For a location characteristic $f$ the following attributes are desirable:

(L.1) $\min_i x_i \le f(x_1, \ldots, x_n) \le \max_i x_i$ for all $x$ (internality),
(L.2) $f(x_1 + a, \ldots, x_n + a) = f(x_1, \ldots, x_n) + a$ for all $x$ and for any $a$ (transition),
(L.3) $f(cx_1, \ldots, cx_n) = c\,f(x_1, \ldots, x_n)$ for all $x$ and for any positive $c$ (positive homogeneity).
Of course the properties (I) and (L.1)-(L.3) do not determine the function $f$ uniquely. Not only the usual mean and median, but also some other functions of order statistics possess these attributes. The set of potential location characteristics may be narrowed by some additional attributes, for instance by

(L.4) $f(x_1 + y_1, \ldots, x_n + y_n) = f(x_1, \ldots, x_n) + f(y_1, \ldots, y_n)$ for arbitrary data $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ (additivity).
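The location attributes are easy to check numerically. A small Python sketch with made-up data, using the sample mean and median as the candidate characteristics (the tolerance guards against floating-point rounding only):

```python
import statistics

# Numerical check (illustrative) that the mean and the median both satisfy
# internality, transition and positive homogeneity for one data set.
x = [2.0, 7.0, 1.0, 4.0, 6.0]
a, c = 3.0, 2.5

for f in (statistics.mean, statistics.median):
    # internality: min <= f(x) <= max
    assert min(x) <= f(x) <= max(x)
    # transition: f(x + a) = f(x) + a
    assert abs(f([xi + a for xi in x]) - (f(x) + a)) < 1e-12
    # positive homogeneity: f(c x) = c f(x) for c > 0
    assert abs(f([c * xi for xi in x]) - c * f(x)) < 1e-12
```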

Characteristics Minimizing the Common Distance From Observations
For given data $x = (x_1, \ldots, x_n)$ and a given number $x_0$ the common distance from $x_0$ to $x$ may be defined as the Euclidean distance between the points $P = (x_1, \ldots, x_n)$ and $Q = (x_0, \ldots, x_0)$ in $\mathbb{R}^n$. In order to determine the point $x_0$ realizing the minimum of this distance we shall consider the function
$f(\alpha) = \sum_{i=1}^{n} (x_i - \alpha)^2.$

Theorem 2. The function $f$ attains its unique minimum for $\alpha = \bar{x}$.

Proof. We get the chain of equations
$\sum_{i=1}^{n} (x_i - \alpha)^2 = \sum_{i=1}^{n} \left[ (x_i - \bar{x}) + (\bar{x} - \alpha) \right]^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \alpha)^2 \ge \sum_{i=1}^{n} (x_i - \bar{x})^2,$
with equality if and only if $\alpha = \bar{x}$. This implies the desired result.

It seems that the common distance defined above, although easy to work with, is not natural. Instead, one can consider the usual (Euclidean) total distance from $x_0$ to all points $x_1, \ldots, x_n$ on the x-axis, i.e.
$\sum_{i=1}^{n} |x_i - x_0|$. We shall show that the minimum of this distance is attained if $x_0$ is a median of the data $x = (x_1, \ldots, x_n)$. This property of the median is known in the literature as minimizing the mean absolute deviation (MAD). It was derived in many papers and books, among others in Bickel and Doksum (1977) and Norton (1984), but these proofs were rather complicated. A first simple and elegant proof of this fact is due to Joag-Dev (1989). To present it let us consider the function
$h(\beta) = \sum_{i=1}^{n} |x_i - \beta|. \quad (1)$
Denote by $x_{(i)}$, $i = 1, \ldots, n$, the $i$-th order statistic, i.e. the $i$-th element in the sequence $(x_{(1)}, \ldots, x_{(n)})$ formed from the data $(x_1, \ldots, x_n)$ by sorting its elements in nondecreasing order, i.e. with $x_{(1)} \le \cdots \le x_{(n)}$.

Theorem 3. The function $h$ attains its minimum for any $\beta$ satisfying the condition $\beta = x_{((n+1)/2)}$ if $n$ is odd, while $\beta$ belongs to the closed interval $[x_{(n/2)}, x_{(n/2+1)}]$ if $n$ is even.

Proof. Let us consider the intervals $I_i = [x_{(i)}, x_{(n-i+1)}]$, $i = 1, \ldots, k$, where $k$ is the integer part of $n/2$. Then $h(\beta)$ may be presented as
$h(\beta) = \sum_{i=1}^{k} \left( |x_{(i)} - \beta| + |x_{(n-i+1)} - \beta| \right)$ when $n$ is even, with the additional summand $|x_{((n+1)/2)} - \beta|$ when $n$ is odd.
By the triangle inequality, $|x_{(i)} - \beta| + |x_{(n-i+1)} - \beta| \ge x_{(n-i+1)} - x_{(i)}$, with equality if and only if $\beta \in I_i$. Since $I_1 \supseteq I_2 \supseteq \cdots \supseteq I_k$, we get
$h(\beta) \ge \sum_{i=1}^{k} \left( x_{(n-i+1)} - x_{(i)} \right),$
with equality if and only if $\beta \in I_k$ when $n$ is even, while $\beta = x_{((n+1)/2)}$ when $n$ is odd. This completes the proof of the theorem.
We note that in the even case the point $\beta$ minimizing the function $h$ is not uniquely determined. However, the values of the function (1) at all these points are the same. One of these points is the usual median $\mathrm{med}(x)$ defined by
$\mathrm{med}(x) = x_{((n+1)/2)}$ if $n$ is odd, and $\mathrm{med}(x) = \frac{1}{2}\left( x_{(n/2)} + x_{(n/2+1)} \right)$ if $n$ is even.
As a direct consequence of Theorem 3 we get the following:

Corollary 1. The minimal value of the function $h$ is given by the formula
$\min_{\beta} h(\beta) = \sum_{i=1}^{k} \left( x_{(n-i+1)} - x_{(i)} \right),$
where $x_{(i)}$ is the $i$-th order statistic of the data $x$ and $k$ is the integer part of $n/2$.
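Both minimization results can be checked numerically. The Python sketch below, with made-up data, confirms that the mean minimizes the sum of squared deviations (Theorem 2), while for even $n$ every point of the middle interval minimizes $h$ from (1) and attains the value given in Corollary 1:

```python
import statistics

x = [7.0, 1.0, 4.0, 10.0]            # illustrative data, n = 4 (even)
mean = sum(x) / len(x)               # 5.5

def sq(alpha):                       # common (squared) distance of Theorem 2
    return sum((xi - alpha) ** 2 for xi in x)

def h(beta):                         # total absolute distance, formula (1)
    return sum(abs(xi - beta) for xi in x)

# Theorem 2: the mean minimizes sq over a coarse grid of candidate centres
grid = [i / 10 for i in range(0, 121)]          # alpha in [0, 12]
assert all(sq(mean) <= sq(alpha) for alpha in grid)

# Theorem 3 / Corollary 1: the minimum of h is attained on the whole middle
# interval [x_(n/2), x_(n/2+1)] and equals sum_{i=1}^{k} (x_(n-i+1) - x_(i))
xs, n = sorted(x), len(x)
k = n // 2
minimal = sum(xs[n - i - 1] - xs[i] for i in range(k))   # (10-1) + (7-4) = 12
for beta in (xs[k - 1], statistics.median(x), xs[k]):    # 4, 5.5, 7
    assert h(beta) == minimal
assert h(xs[k - 1] - 0.5) > minimal and h(xs[k] + 0.5) > minimal
```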

Variation Characteristics
Any nonnegative invariant function $f = f(x_1, \ldots, x_n)$ may be considered as a variation characteristic for the data $x = (x_1, \ldots, x_n)$ if it possesses the following attributes:

(V.1) $f(x_1 + a, \ldots, x_n + a) = f(x_1, \ldots, x_n)$ for all $x$ and for any $a$ (translation invariance),
(V.2) $f(cx_1, \ldots, cx_n) = c\,f(x_1, \ldots, x_n)$ for all $x$ and for any positive $c$ (positive homogeneity).

From (V.1) and (V.2) one can derive the following property:

(V.3) $f(a, \ldots, a) = 0$ for any scalar $a$.

Indeed, by (V.1) we have $f(a, \ldots, a) = f(0, \ldots, 0)$, and by (V.2) $f(0, \ldots, 0) = c\,f(0, \ldots, 0)$ for every positive $c$, which forces $f(0, \ldots, 0) = 0$.
One can easily verify that, among others, the following characteristics satisfy the above requirements: the standard deviation
$s(x) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2},$
the mean absolute deviation
$d(x) = \frac{1}{n} \sum_{i=1}^{n} |x_i - \mathrm{med}(x)|,$
and the half average deviation between observations
$D(x) = \frac{1}{2n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} |x_i - x_j|.$

The property (V.2) means that the characteristic $f$ depends strongly on the scale, i.e. on the measurement unit of the observations. Very often we are interested in variation characteristics which are independent of the scale, i.e., instead of (V.2), they satisfy the condition $f(cx_1, \ldots, cx_n) = f(x_1, \ldots, x_n)$ for all $x$ and for any positive $c$. To meet this condition we introduce the following notion.

Definition 1. Data $x = (x_1, \ldots, x_n)$ is said to be nonnegative if all observations $x_i$, $i = 1, \ldots, n$, are nonnegative and at least one of them is positive.
In this section we shall assume that any data under consideration is nonnegative. For such data a characteristic independent of the scale may be obtained from an arbitrary variation characteristic by dividing it by the data mean $\bar{x} > 0$. Applying this rule to the above variation characteristics we get the following scale-independent characteristics: the coefficient of variation
$v(x) = \frac{s(x)}{\bar{x}},$
the relative mean absolute deviation
$r(x) = \frac{d(x)}{\bar{x}},$
and the relative half average deviation between observations
$G(x) = \frac{D(x)}{\bar{x}} = \frac{1}{2n^2\bar{x}} \sum_{i=1}^{n} \sum_{j=1}^{n} |x_i - x_j|. \quad (2)$

Now we are going to present an interesting relation between $G$ and the well-known Gini index. The Gini index is usually introduced in the following way.
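The scale independence of these three coefficients is easy to verify numerically. A Python sketch with made-up nonnegative data (the mean absolute deviation is taken about the median here, one common convention):

```python
import statistics

def v(x):                               # coefficient of variation s / mean
    m = sum(x) / len(x)
    return (sum((xi - m) ** 2 for xi in x) / len(x)) ** 0.5 / m

def r(x):                               # relative mean absolute deviation
    m, med = sum(x) / len(x), statistics.median(x)
    return sum(abs(xi - med) for xi in x) / (len(x) * m)

def G(x):                               # relative half average deviation, (2)
    n, m = len(x), sum(x) / len(x)
    return sum(abs(a - b) for a in x for b in x) / (2 * n * n * m)

x = [1.0, 2.0, 3.0, 10.0]
for f in (v, r, G):
    # multiplying the data by a positive constant changes nothing
    assert abs(f([5.0 * xi for xi in x]) - f(x)) < 1e-12
```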
For nonnegative data $x = (x_1, \ldots, x_n)$ consider its cumulative sums $s_1, \ldots, s_n$, i.e. the sequence of numbers defined by the formula
$s_i = \sum_{j=1}^{i} x_{(j)}, \quad i = 1, \ldots, n,$
where $x_{(i)}$ is the $i$-th order statistic.

Definition 2. Denote by $P_1$ the area of the convex polygon with vertices $(0, 0), (1, s_1), \ldots, (n, s_n)$ and by $P$ the area of the triangle with vertices $(0, 0)$, $(n, 0)$ and $(n, s_n)$. The ratio $P_1/P$ is called the Gini index of the data $x$.
We observe that if all observations $x_i$, $i = 1, \ldots, n$, are equal then $P_1 = 0$ and, in consequence, the Gini index is 0. On the other hand, if $x_1 = \cdots = x_{n-1} = 0$ and $x_n > 0$ then the Gini index is equal to $(n-1)/n$. Thus we get the following conclusion: the Gini index attains the values 0 and $(n-1)/n$, the former for constant data and the latter for maximally concentrated data.
However, the above definition of the Gini index is not convenient for computation. We shall derive a more useful formula and show that it coincides with $G$.
Theorem 4. The Gini index coincides with the sample characteristic $G$ defined by (2), and it can be expressed in the form
$G(x) = \frac{2\sum_{i=1}^{n} i\,x_{(i)} - (n+1)\sum_{i=1}^{n} x_{(i)}}{n^2\bar{x}}. \quad (3)$

Proof. We observe that
$P = \frac{n s_n}{2} = \frac{n^2\bar{x}}{2} \quad \text{and} \quad P_1 = \frac{n s_n}{2} - \left( \sum_{i=1}^{n} s_i - \frac{s_n}{2} \right) = \frac{(n+1)s_n}{2} - \sum_{i=1}^{n} s_i.$
Now by the identity
$\sum_{i=1}^{n} s_i = \sum_{i=1}^{n} (n-i+1)\,x_{(i)}$
we get
$P_1 = \frac{n+1}{2}\sum_{i=1}^{n} x_{(i)} - \sum_{i=1}^{n} (n-i+1)\,x_{(i)} = \frac{1}{2}\left( 2\sum_{i=1}^{n} i\,x_{(i)} - (n+1)\sum_{i=1}^{n} x_{(i)} \right).$
Therefore the Gini index $P_1/P$ may be presented in the form (3), and it remains to show that it coincides with $G$ defined by (2). In fact we only need to verify that
$\sum_{i=1}^{n}\sum_{j=1}^{n} |x_i - x_j| = 2\left( 2\sum_{i=1}^{n} i\,x_{(i)} - (n+1)\sum_{i=1}^{n} x_{(i)} \right). \quad (4)$
We shall prove it by induction with respect to $n$.

For $n = 2$ the formula (4) may be verified directly. Now suppose it holds for $n = k$. We shall show that it also holds for $n = k + 1$. Without loss of generality one can assume that $x_{(k+1)} = x_{k+1}$. Then, by the inductive assumption,
$\sum_{i=1}^{k+1}\sum_{j=1}^{k+1} |x_i - x_j| = \sum_{i=1}^{k}\sum_{j=1}^{k} |x_i - x_j| + 2\sum_{i=1}^{k} \left( x_{(k+1)} - x_{(i)} \right) = 2\left( 2\sum_{i=1}^{k} i\,x_{(i)} - (k+1)\sum_{i=1}^{k} x_{(i)} \right) + 2\left( k\,x_{(k+1)} - \sum_{i=1}^{k} x_{(i)} \right) = 2\left( 2\sum_{i=1}^{k+1} i\,x_{(i)} - (k+2)\sum_{i=1}^{k+1} x_{(i)} \right).$
This implies the identity (4) and, in consequence, the statement of Theorem 4. By simple operations on the formula (3) we get two additional expressions for the Gini index $G$. We shall collect all these results in the form of the following theorem.
Theorem 5. For any nonnegative data $x = (x_1, \ldots, x_n)$ the following identities hold:
$G(x) = \frac{1}{2n^2\bar{x}} \sum_{i=1}^{n}\sum_{j=1}^{n} |x_i - x_j| = \frac{2\sum_{i=1}^{n} i\,x_{(i)}}{n^2\bar{x}} - \frac{n+1}{n} = \frac{2}{n^2\bar{x}} \sum_{i=1}^{n} i\left( x_{(i)} - \bar{x} \right).$

Proof. The first identity was just proved in Theorem 4 and the second one is evident. For the third one we only need to observe that $2\bar{x}\sum_{i=1}^{n} i = (n+1)\sum_{i=1}^{n} x_i$.

Bounds of some Variation Characteristics

In this section we shall restrict our attention to nonnegative data, i.e. to the case when all observations are nonnegative and at least one of them is positive. A natural question is whether the coefficients $v(x)$, $r(x)$ and $G(x)$ presented in the previous section are normalized, i.e. whether they belong to the interval $[0, 1]$ and whether the values 0 and 1 are attained.
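The equivalence of the area definition of the Gini index with the formulas (2) and (3) can be confirmed numerically; a Python sketch with made-up nonnegative data:

```python
x = [3.0, 1.0, 4.0, 2.0]                # illustrative nonnegative data
n, mean, xs = len(x), sum(x) / len(x), sorted(x)

# area definition (Definition 2): cumulative sums of the order statistics;
# P1 is the area between the chord from (0,0) to (n, s_n) and the polygon,
# P the area of the triangle with vertices (0,0), (n,0), (n, s_n)
s = [sum(xs[: i + 1]) for i in range(n)]
area_under_polygon = sum((a + b) / 2 for a, b in zip([0.0] + s[:-1], s))
P = n * s[-1] / 2
gini_area = (P - area_under_polygon) / P

# formula (2): half average deviation between observations over the mean
gini_pairs = sum(abs(a - b) for a in x for b in x) / (2 * n * n * mean)

# formula (3): order-statistic form from Theorem 4
gini_order = (2 * sum((i + 1) * xi for i, xi in enumerate(xs))
              - (n + 1) * sum(xs)) / (n * n * mean)

assert abs(gini_area - gini_pairs) < 1e-12
assert abs(gini_pairs - gini_order) < 1e-12
```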
The question about the first coefficient was negatively answered by Stepniak (2007). This result may be stated in the following form:

Theorem 6. The attainable upper bound for the coefficient of variation $v(x)$ for nonnegative data $x = (x_1, \ldots, x_n)$ is $\sqrt{n-1}$.

Here we shall present an alternative proof of this theorem. For this aim we need an auxiliary result.
For arbitrary integers $k$ and $l$, such that $1 \le k < l \le n$, and a positive $\varepsilon$ less than or equal to $x_{(l)} - x_{(k)}$ (if such exists), define the operation $y(x) = y(x; k, l, \varepsilon) = (y_1, \ldots, y_n)$, where
$y_i = x_{(i)}$ for $i \ne k, l$, while $y_k = x_{(k)} + \varepsilon$ and $y_l = x_{(l)} - \varepsilon$.
It is evident that $\bar{y} = \bar{x}$. We shall prove:

Lemma 1. Under the above assumptions, $\sum_{i=1}^{n} (y_i - \bar{y})^2 \le \sum_{i=1}^{n} (x_i - \bar{x})^2$.

Proof. Since $\bar{y} = \bar{x}$ and $\sum_i y_i = \sum_i x_i$, while $y$ differs from the ordered data only in the positions $k$ and $l$, we get
$\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} (x_i - \bar{x})^2 = \left( x_{(k)} + \varepsilon \right)^2 + \left( x_{(l)} - \varepsilon \right)^2 - x_{(k)}^2 - x_{(l)}^2 = 2\varepsilon \left( \varepsilon - (x_{(l)} - x_{(k)}) \right).$
Now the desired result follows by the assumption about $\varepsilon$.
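The effect of the operation $y(x; k, l, \varepsilon)$ can be illustrated numerically; a Python sketch with made-up sorted data (indices in the code are 0-based, unlike the 1-based order statistics in the text):

```python
def sum_sq(z):
    # sum of squared deviations from the mean
    m = sum(z) / len(z)
    return sum((zi - m) ** 2 for zi in z)

xs = [1.0, 3.0, 6.0, 10.0]              # already sorted, made-up data
k, l, eps = 0, 3, 2.0                   # 0 < eps <= x_(l) - x_(k) = 9
ys = xs[:]
ys[k], ys[l] = xs[k] + eps, xs[l] - eps # the operation y(x; k, l, eps)

assert sum(ys) == sum(xs)               # the mean is preserved
assert sum_sq(ys) <= sum_sq(xs)         # Lemma 1: variability does not grow
```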
Proof of Theorem 6. For nonnegative data $x = (x_1, \ldots, x_n)$ with mean $\bar{x}$ consider the auxiliary data $t = (t_1, \ldots, t_n) = (0, \ldots, 0, n\bar{x})$. It is clear that $\bar{t} = \bar{x}$. First we shall show that $\sum_{i=1}^{n} (t_i - \bar{t})^2 = n(n-1)\bar{x}^2$. Notice that
$\sum_{i=1}^{n} (t_i - \bar{t})^2 = (n-1)\bar{x}^2 + (n\bar{x} - \bar{x})^2 = (n-1)\bar{x}^2 + (n-1)^2\bar{x}^2 = n(n-1)\bar{x}^2,$
and hence $v(t) = \sqrt{n-1}$. Now let us observe that $x$ may be obtained from $t$ by a finite number of operations of type $y = y(x; k, l, \varepsilon)$. Thus, by Lemma 1, the upper bound follows.
From Theorem 6 we get the following corollary.
Corollary 2. The ratio $v(x)/\sqrt{n-1}$ is a normalized coefficient of variation.
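Theorem 6 and Corollary 2 can be illustrated numerically; a Python sketch with randomly generated nonnegative data (an illustrative check, not a proof):

```python
import random

def v(x):                               # coefficient of variation
    m = sum(x) / len(x)
    return (sum((xi - m) ** 2 for xi in x) / len(x)) ** 0.5 / m

n = 5
bound = (n - 1) ** 0.5

# the extreme configuration t = (0, ..., 0, n * mean) attains the bound
t = [0.0] * (n - 1) + [float(n)]        # data with mean 1
assert abs(v(t) - bound) < 1e-12

# random nonnegative data never exceed it, so v(x) / sqrt(n - 1) lies in [0, 1]
random.seed(0)
for _ in range(1000):
    x = [random.random() for _ in range(n)]
    assert 0.0 <= v(x) / bound <= 1.0
```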