Fuzziness and Roughness of Non-precise Quantities

Connections between fuzzy sets and rough sets are studied in the description of non-precise quantities. A characterization of the empirical distribution of non-precise observations in non-precise categories is proposed.


Introduction
Many problems arising in scientific investigation generate non-precise data incorporating non-statistical uncertainty. A non-precise observation of a quantitative variable can be described by a special type of membership function defined on the set of all real numbers, called a fuzzy number or a fuzzy interval. This membership function is a fuzzy set. Fuzzy set theory, originated by Zadeh (1965), relies on ordering relations that express the intensity (degree) of membership of an object in a set. Several researchers have studied applications of statistical methods to fuzzy data, among them Kruse and Meyer (1987), Kacprzyk and Fedrizzi (1988), Frühwirth-Schnatter (1992) and Römer and Kandel (1995). Viertl (1996) provided comprehensive guidelines for exploratory analysis and statistical inference of non-precise data. Pawlak (1982) proposed rough set methodology as a new approach to the analysis of non-precise concepts. In this methodology any non-precise concept is characterized by a pair of precise concepts called the lower and the upper approximations. Rough set theory is based on equivalence relations describing partitions made of classes of indiscernible objects. This approach has proved useful in many applications (see Slowinski, 1992; Ziarko, 1994; Polkowski and Skowron, 1998).
The purpose of data analysis is to gain information from data. The concept of information is connected with the concept of uncertainty. Information may be incomplete, not fully reliable, vague or deficient in other ways. These various information deficiencies result in different types of uncertainty (see Klir and Yuan, 1995). Rough sets and fuzzy sets are two distinct models of imperfect knowledge. Indiscernibility, studied by the theory of rough sets, refers to the granularity (coarseness) of knowledge. Vagueness, studied by the theory of fuzzy sets, is due to the fact that categories of natural language are usually sets with smooth (not crisp) boundaries. Many measures of uncertainty have been developed in order to quantify the amount of information in non-precise data. It has been proven in numerous ways that the sensible measure of uncertainty in probability theory is the Shannon entropy (see Shannon, 1948). Nonprobabilistic entropies and indetermination measures in the setting of fuzzy set theory were proposed by Empotz (1981). A review of measures of fuzziness can be found in Klir (1995). Pawlak (1991) introduced a measure of roughness of a non-precise concept in an approximation space. Recently, research articles combining fuzziness and roughness in decision making applications have appeared in Pal and Skowron (1999).
This paper shows applications of techniques from the theory of fuzzy sets and the theory of rough sets to basic analysis of non-precise data. Each non-precise observation described by a fuzzy set can be viewed as a non-precise quantity in the approximation space $(\mathbb{R}, I)$, where $\mathbb{R}$ is the set of all real numbers and $I$ is the identity relation on $\mathbb{R} \times \mathbb{R}$.
Then the following questions arise: How fuzzy and how rough is the non-precise quantity? Is there a meaningful way of reducing its fuzziness and roughness? Suppose that a finite number $n$ of non-precise observations of a quantitative random variable $X$ is given. These observations create a non-precise sample $S$. One of the basic tasks in the analysis of empirical data is to group observations from $S$ into a few conveniently designated categories (intervals) defined on the domain of $X$, for example $C_1$ = small values, $C_2$ = medium values and $C_3$ = large values. Each category represents a vague concept and therefore should be characterized by a fuzzy interval. This leads to the following questions: How can the empirical distribution of non-precise observations in non-precise categories be described? How can a non-precise frequency function be derived?
A notion of fuzzy frequency function in the case of precise categories and non-precise observations was introduced by Viertl (1996). His work can be generalized to non-precise categories. A non-precise frequency function is defined as a fuzzy set on the set $N_n = \{1, \dots, n\}$. From the point of view of rough sets, this function is a non-precise concept in the approximation space $(N_n, I)$, where $I$ is the identity relation on $N_n \times N_n$. One can ask: How fuzzy and how rough is the non-precise frequency function?
This paper answers the questions posed above using some basic techniques from the theory of rough sets and the theory of fuzzy sets. The necessary background for both theories is given in Section 2. Section 3 deals with roughness and fuzziness of non-precise quantities. For $\alpha \in (0, 1]$, a measure of $\alpha$-fuzziness and a measure of $\alpha$-roughness are introduced. It is shown that the imprecision of a fuzzy quantity can be reduced by approximating it by an $\alpha$-sharper version called a generalized $\alpha$-cut. Section 4 presents a method for calculating non-precise counts of non-precise observations from a sample $S$ in non-precise categories from a family $C$. The conditions under which $C$ becomes a fuzzy cover, a fuzzy partition, or a weak fuzzy partition of the range of $S$ are explained. Then a quantitative evaluation of fuzziness and roughness of the empirical distribution of non-precise data is given. The proposed techniques are illustrated on a small sample of non-precise observations in Section 5.
The goal of this study is twofold. Firstly, it aims to show connections between fuzzy sets and rough sets in the description of non-precise quantities. Secondly, it provides some basic tools for exploratory analysis of non-precise data.

Fuzzy Sets
Let $X$ be a set of objects. A fuzzy set $A$ on $X$ is defined by its membership function $A : X \to [0, 1]$. (1) A crisp set is a special case of a fuzzy set in which the membership function is restricted to $\{0, 1\}$. The largest membership coefficient in a given fuzzy set $A$ is called its height and is denoted by $h_A$. When $h_A = 1$, $A$ is called a normal fuzzy set; otherwise, it is called subnormal. Given a number $\alpha \in (0, 1]$, an $\alpha$-cut of a fuzzy set $A$ is defined for all $x \in X$ by $A_\alpha(x) = 1$ if $A(x) \ge \alpha$ and $A_\alpha(x) = 0$ otherwise. A fuzzy set $A$ can be reconstructed from its $\alpha$-cuts as follows: for all $x \in X$, $A(x) = \sup_{\alpha \in (0, 1]} \alpha \cdot A_\alpha(x)$, where sup denotes supremum.
The support and the core of a fuzzy set $A$, denoted by $S_A$ and $C_A$ respectively, are crisp subsets of $X$ such that for all $x \in X$: $x \in S_A$ if and only if $A(x) > 0$, and $x \in C_A$ if and only if $A(x) = 1$. The standard complement, $\bar{A}$, of a fuzzy set $A$ with respect to the set $X$ is defined for all $x \in X$ by the equation $\bar{A}(x) = 1 - A(x)$. Given two fuzzy sets, $A$ and $B$, their standard intersection $A \cap B$ and standard union $A \cup B$ are defined for all $x \in X$ by $(A \cap B)(x) = \min\{A(x), B(x)\}$ (7) and $(A \cup B)(x) = \max\{A(x), B(x)\}$. We say that a fuzzy set $A$ is a subset of a fuzzy set $B$, denoted $A \subseteq B$, if $A(x) \le B(x)$ for all $x \in X$.
For any fuzzy set $A$ defined on a finite universal set $X$, its scalar cardinality $|A|$ is given by the formula $|A| = \sum_{x \in X} A(x)$. When $X$ is an infinite set and $P$ is a measure on $X$, the cardinality of $A$ is defined by $|A| = \int_X A(x)\, dP(x)$. Fuzzy sets that are defined on the set of real numbers $\mathbb{R}$ are called fuzzy quantities.
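The definitions above can be tried out on a small finite universe. The following is a minimal sketch, assuming a fuzzy set is stored as a dictionary from elements to membership grades and that both operands share the same keys; the function names and the example sets `A` and `B` are illustrative and do not come from the paper.

```python
from typing import Dict, Hashable, Set

FuzzySet = Dict[Hashable, float]  # element -> membership grade in [0, 1]


def alpha_cut(A: FuzzySet, alpha: float) -> Set[Hashable]:
    """Crisp set of elements whose grade is at least alpha."""
    return {x for x, m in A.items() if m >= alpha}


def support(A: FuzzySet) -> Set[Hashable]:
    return {x for x, m in A.items() if m > 0.0}


def core(A: FuzzySet) -> Set[Hashable]:
    return {x for x, m in A.items() if m == 1.0}


def complement(A: FuzzySet) -> FuzzySet:
    return {x: 1.0 - m for x, m in A.items()}


def intersection(A: FuzzySet, B: FuzzySet) -> FuzzySet:
    return {x: min(A[x], B[x]) for x in A}


def union(A: FuzzySet, B: FuzzySet) -> FuzzySet:
    return {x: max(A[x], B[x]) for x in A}


def is_subset(A: FuzzySet, B: FuzzySet) -> bool:
    return all(A[x] <= B[x] for x in A)


def cardinality(A: FuzzySet) -> float:
    """Scalar cardinality: the sum of all membership grades."""
    return sum(A.values())


def reconstruct(A: FuzzySet) -> FuzzySet:
    """Rebuild A from its alpha-cuts: A(x) = sup of alpha over cuts containing x."""
    levels = {m for m in A.values() if m > 0.0}
    return {
        x: max((a for a in levels if x in alpha_cut(A, a)), default=0.0)
        for x in A
    }


# Hypothetical example sets on the universe {1, 2, 3, 4}.
A = {1: 0.0, 2: 0.5, 3: 1.0, 4: 0.25}
B = {1: 0.5, 2: 0.5, 3: 1.0, 4: 0.75}
```

On a finite universe the supremum in the reconstruction formula only needs to range over the grades that actually occur, which is why `reconstruct` iterates over `levels` instead of all of $(0, 1]$.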
A fuzzy interval is any normal fuzzy quantity with bounded support whose $\alpha$-cuts for all $\alpha \in (0, 1]$ are closed crisp intervals of real numbers. A fuzzy number $A$ is a special fuzzy interval for which $A(x) = 1$ for exactly one $x \in \mathbb{R}$.
The $\alpha$-cut of a trapezoidal fuzzy interval $[a, b, c, d]$ is the interval $[a + (b - a)\alpha,\ d - (d - c)\alpha]$ for $\alpha \in (0, 1)$, and the interval $[b, c]$ for $\alpha = 1$. Figure 1 depicts the trapezoidal fuzzy interval $[151, 161, 169, 179]$ (solid line) together with its $\alpha$-cut at $\alpha = 0.6$ (dashed line). For quantitative evaluation of the amount of uncertainty in a fuzzy set, a measure of fuzziness of a fuzzy set is needed. In general, a measure of fuzziness is a nonnegative real function $\varphi$ defined on the set of all fuzzy subsets of $X$ which satisfies the following properties:
1. $\varphi(A) = 0$ if and only if $A$ is a crisp set,
2. $\varphi(A)$ attains its maximum if and only if $A(x) = 0.5$ for all $x \in X$,
3. $\varphi(A) \le \varphi(B)$ when set $A$ is sharper than set $B$, which means that $A(x) \le B(x)$ when $B(x) \le 0.5$ and $A(x) \ge B(x)$ when $B(x) \ge 0.5$ for all $x \in X$.
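The trapezoidal interval of Figure 1 and its $\alpha$-cut can be reproduced numerically. The following is a minimal sketch, assuming the usual piecewise-linear membership function of a trapezoidal fuzzy interval; the helper names `trapezoid` and `alpha_cut_interval` are illustrative.

```python
from typing import Callable, Tuple


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Membership function of the trapezoidal fuzzy interval [a, b, c, d]."""
    def mu(x: float) -> float:
        if x < a or x > d:
            return 0.0
        if x < b:                      # rising edge on [a, b)
            return (x - a) / (b - a)
        if x <= c:                     # plateau on [b, c]
            return 1.0
        return (d - x) / (d - c)       # falling edge on (c, d]
    return mu


def alpha_cut_interval(a: float, b: float, c: float, d: float,
                       alpha: float) -> Tuple[float, float]:
    """End points of the alpha-cut of [a, b, c, d] for alpha in (0, 1]."""
    return a + (b - a) * alpha, d - (d - c) * alpha


A = trapezoid(151, 161, 169, 179)
lo, hi = alpha_cut_interval(151, 161, 169, 179, 0.6)  # approximately (157, 173)
```

The end points of the cut are exactly the points where the membership function crosses the level $\alpha$, so `A(lo)` and `A(hi)` both evaluate to about 0.6.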
More information about fuzzy sets can be found in Dubois and Prade (1980) or Klir and Yuan (1995).

Rough Sets
Let $U$ denote a nonempty universal set, and let $R \subseteq U \times U$ be an equivalence relation. The pair $(U, R)$ is called a knowledge base or an approximation space. Given an arbitrary set $A \subseteq U$, it may not be possible to describe $A$ precisely in the approximation space $(U, R)$. Instead, one may characterize $A$ by a pair of lower and upper approximations defined as follows: $\underline{R}(A) = \{x \in U : [x]_R \subseteq A\}$ and $\overline{R}(A) = \{x \in U : [x]_R \cap A \neq \emptyset\}$, where $[x]_R$ is the equivalence class containing $x$. The pair $(\underline{R}(A), \overline{R}(A))$ is called the rough set with a reference set $A$.
The greater the borderline region $BN_R(A) = \overline{R}(A) \setminus \underline{R}(A)$ of a set, the lower the accuracy of approximation of the set in the approximation space $(U, R)$. Pawlak (1991) introduced an accuracy measure as follows: $\gamma_R(A) = |\underline{R}(A)| / |\overline{R}(A)|$. Obviously, $0 \le \gamma_R(A) \le 1$. If $\gamma_R(A) = 1$, the borderline region of $A$ is empty and the set $A$ is $R$-definable. In order to express the degree of inexactness (roughness) of a set, Pawlak (1991) suggested the measure $\rho_R(A) = 1 - \gamma_R(A)$ (18) and referred to it as the roughness of $A$.
A rough set $A$ can be characterized by a single membership function $\mu_A(x) = |[x]_R \cap A| / |[x]_R|$ for all $x \in U$. It is obvious that $\mu_A(x) = 1$ if and only if $x \in \underline{R}(A)$ and $\mu_A(x) = 0$ if and only if $x \notin \overline{R}(A)$. An element $x \in U$ belongs to the borderline region $BN_R(A)$ if and only if $0 < \mu_A(x) < 1$.
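The approximations, the accuracy and roughness measures, and the rough membership function can be computed directly on a finite universe. The following is a minimal sketch, assuming the equivalence relation is given by a key function (two elements are indiscernible when their keys agree) and that the set being approximated is nonempty; the universe, the key function and the set `A` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, List, Set


def equivalence_classes(universe: Iterable[Hashable],
                        key: Callable[[Hashable], Hashable]) -> List[Set[Hashable]]:
    """Partition the universe into classes of elements sharing the same key."""
    groups = defaultdict(set)
    for x in universe:
        groups[key(x)].add(x)
    return list(groups.values())


def lower_approximation(classes: List[Set], A: Set) -> Set:
    """Union of the classes that lie entirely inside A."""
    return set().union(*(c for c in classes if c <= A))


def upper_approximation(classes: List[Set], A: Set) -> Set:
    """Union of the classes that meet A."""
    return set().union(*(c for c in classes if c & A))


def accuracy(classes: List[Set], A: Set) -> float:
    """Pawlak accuracy; A must be nonempty so the upper approximation is too."""
    return len(lower_approximation(classes, A)) / len(upper_approximation(classes, A))


def roughness(classes: List[Set], A: Set) -> float:
    return 1.0 - accuracy(classes, A)


def rough_membership(classes: List[Set], A: Set, x: Hashable) -> float:
    """Share of the class of x that falls inside A."""
    cls = next(c for c in classes if x in c)
    return len(cls & A) / len(cls)


# Hypothetical example: classes {1,2}, {3,4}, {5,6}, {7,8}.
classes = equivalence_classes(range(1, 9), key=lambda x: (x - 1) // 2)
A = {2, 3, 4, 5, 6}
```

Here the class $\{1, 2\}$ straddles the boundary of `A`, so it contributes to the upper approximation only and its elements have rough membership 0.5.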
Research on the theory of rough sets has grown steadily, and several extensions of Pawlak's classical rough set theory have been developed (see Polkowski and Skowron, 1998).

Fuzziness and Roughness
Klir and Yuan (1995) suggested that the fuzziness of a fuzzy set can be measured by the distance between its membership function and the membership function (characteristic function) of the nearest crisp set. This idea can be generalized as follows: Definition 1. Let $A$ be a fuzzy set and let $\alpha \in (0, 1]$. Then the $\alpha$-fuzziness of $A$ is given by $\varphi^\alpha(A) = d(A, A_\alpha)$, where $A_\alpha$ is the $\alpha$-cut of $A$ and $d : [0, 1] \times [0, 1] \to [0, \infty)$ is a metric distance.
In this paper the Hamming distance, defined as the sum of the absolute values of the differences between membership grades, will be used.
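The distance between a fuzzy quantity and its $\alpha$-cut can be approximated numerically. The following is a minimal sketch, assuming the Hamming distance on the real line is read as the plain integral $\int |A(x) - A_\alpha(x)|\,dx$; under that reading a trapezoidal interval $[a, b, c, d]$ gives $[(b - a) + (d - c)]\,[\alpha^2 + (1 - \alpha)^2]/2$, and the paper's own measure may carry a different constant factor. All function names are illustrative.

```python
from typing import Callable


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Piecewise-linear membership function of [a, b, c, d] (assumed form)."""
    def mu(x: float) -> float:
        if x < a or x > d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return (d - x) / (d - c)
    return mu


def alpha_fuzziness(A: Callable[[float], float], alpha: float,
                    lo: float, hi: float, steps: int = 20000) -> float:
    """Midpoint-rule estimate of the integral of |A(x) - A_alpha(x)| on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        m = A(lo + (k + 0.5) * h)
        cut = 1.0 if m >= alpha else 0.0   # characteristic function of the alpha-cut
        total += abs(m - cut)
    return total * h


def closed_form(a: float, b: float, c: float, d: float, alpha: float) -> float:
    """Exact value of the same integral for a trapezoidal interval."""
    return ((b - a) + (d - c)) * (alpha ** 2 + (1 - alpha) ** 2) / 2


A = trapezoid(151, 161, 169, 179)
estimate = alpha_fuzziness(A, 0.5, 151, 179)
exact = closed_form(151, 161, 169, 179, 0.5)
```

The expression $\alpha^2 + (1 - \alpha)^2$ is smallest at $\alpha = 0.5$, which agrees with the $0.5$-cut being the nearest crisp set.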
It is easy to verify the following properties: 1.
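Because in this case the nearest crisp set to a trapezoidal fuzzy interval $A$ is its $\alpha$-cut $A_{0.5}$, the fuzziness of $A$ is evaluated by $\varphi(A) = \varphi^{0.5}(A) = (b - a) + (d - c)$. Assume the approximation space $(\mathbb{R}, I)$, where $\mathbb{R}$ is the set of all real numbers and $I$ is the identity relation on $\mathbb{R}$. Then a fuzzy set $A$ defined on $\mathbb{R}$ can be characterized by a pair of lower and upper approximations $\underline{I}(A) = \{x \in \mathbb{R} : [x]_I \subseteq A\}$ (22) and $\overline{I}(A) = \{x \in \mathbb{R} : [x]_I \cap A \neq \emptyset\}$ (23). Because $I$ is the identity relation, the membership functions (characteristic functions) of the sets in (22) and (23) are $\underline{I}(A)(x) = 1$ if $A(x) = 1$ and $0$ otherwise, and $\overline{I}(A)(x) = 1$ if $A(x) > 0$ and $0$ otherwise, respectively. Then $\underline{I}(A)$ is the core of $A$ and $\overline{I}(A)$ is the support of $A$. According to (18), the roughness of $A$ in the approximation space $(\mathbb{R}, I)$ is given by $\rho_I(A) = 1 - |C_A| / |S_A|$. The measure of roughness of a fuzzy quantity can be generalized as follows.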
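Definition 2. Let $A$ be a fuzzy quantity and let $\alpha \in (0, 1]$. Then the $\alpha$-roughness of $A$ in the approximation space $(\mathbb{R}, I)$ is given by $\rho_I^\alpha(A) = 1 - |A_\alpha| / |S_A|$, where $A_\alpha$ is the $\alpha$-cut of $A$.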
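For a trapezoidal fuzzy interval both lengths in Definition 2 are available in closed form, so the $\alpha$-roughness reduces to a one-line computation. The following is a minimal sketch, assuming the $\alpha$-roughness is one minus the ratio of the length of the $\alpha$-cut to the length of the support, with $a < d$; the function name is illustrative.

```python
def alpha_roughness(a: float, b: float, c: float, d: float, alpha: float) -> float:
    """alpha-roughness of the trapezoidal fuzzy interval [a, b, c, d], a < d."""
    cut_length = (d - (d - c) * alpha) - (a + (b - a) * alpha)  # length of the alpha-cut
    support_length = d - a                                       # length of the support (a, d)
    return 1.0 - cut_length / support_length


# Interval of Figure 1: at alpha = 1 the cut is the core [161, 169].
rho_core = alpha_roughness(151, 161, 169, 179, 1.0)   # 1 - 8/28
rho_half = alpha_roughness(151, 161, 169, 179, 0.5)   # 1 - 18/28

# A triangular fuzzy number has b = c, so its core is a single point.
rho_triangle = alpha_roughness(150, 160, 160, 170, 1.0)
```

A triangular fuzzy number has a core of length zero, so its roughness at $\alpha = 1$ equals 1, and lowering $\alpha$ widens the cut and lowers the roughness.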
Because this paper considers only the approximation space $(\mathbb{R}, I)$, the notation $\rho(A)$ instead of $\rho_I(A)$ and the notation $\rho^\alpha(A)$ instead of $\rho_I^\alpha(A)$ will be used. It is easy to verify the following properties: 1. … for $\alpha \in [0.5, 1)$, … Proof. From Definition 3 it follows that $(G_\alpha(A))_\alpha = \{x : G_\alpha(A)(x) \ge \alpha\} = \{x : A(x) \ge \alpha\} = A_\alpha$. Therefore $\varphi^\alpha(G_\alpha(A)) = d(G_\alpha(A), A_\alpha)$.
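For $A(x) \ge \alpha$ we have $A(x) \le G_\alpha(A)(x) = A_\alpha(x) = 1$. For $A(x) < \alpha$ we have: … From Definition 3 it follows that if $G_\alpha(A)(x) > 0$ then $A(x) > 0$. Therefore the support of $G_\alpha(A)$ is a subset of the support of $A$. Then $\rho^\alpha(G_\alpha(A)) = 1 - |(G_\alpha(A))_\alpha| / |S_{G_\alpha(A)}| \le 1 - |A_\alpha| / |S_A| = \rho^\alpha(A)$. Because the $\alpha$-cut of $G_\alpha(A)$ is the same as the $\alpha$-cut of $A$, we have that … After some algebraic manipulations we get (40).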
For each $i \in N_n = \{1, \dots, n\}$ the support of $X_i$ is an interval of real numbers $(a_i, b_i)$. Let $a_S = \min_i \{a_i\}$ and $b_S = \max_i \{b_i\}$. Then the interval $R_S = [a_S, b_S]$ will be called the range of sample $S$. The range $R_S$ can be divided into $k$ non-precise categories ($2 \le k < n$) described by vague linguistic expressions and characterized by fuzzy intervals, for example $C_1$ = very small values, $C_2$ = small values, $C_3$ = medium values, $C_4$ = large values and $C_5$ = very large values.
Let $C = \{C_j\}_{j=1}^{k}$ be a family of non-precise categories defined on $R_S$. Then
1. If for each $x \in R_S$ there is a $C_j \in C$ such that $C_j(x) > 0$, family $C$ is called a fuzzy cover of $R_S$.
2. If for each $x \in R_S$: $0 < \sum_{j=1}^{k} C_j(x) \le 1$, family $C$ is called a weak fuzzy partition of $R_S$.
3. If for each $x \in R_S$: $\sum_{j=1}^{k} C_j(x) = 1$, family $C$ is called a fuzzy partition of $R_S$.
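The three conditions can be checked approximately by evaluating the categories on a grid over the range. The following is a minimal sketch, assuming trapezoidal categories with the usual piecewise-linear membership function; a grid check can miss violations between grid points, and the function `classify_family` and the example families are hypothetical.

```python
from typing import Callable, Sequence


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Piecewise-linear membership function of [a, b, c, d] (assumed form)."""
    def mu(x: float) -> float:
        if x < a or x > d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return (d - x) / (d - c)
    return mu


def classify_family(categories: Sequence[Callable[[float], float]],
                    lo: float, hi: float,
                    points: int = 1001, tol: float = 1e-9) -> str:
    """Strongest property that holds at every grid point of [lo, hi]."""
    xs = [lo + (hi - lo) * k / (points - 1) for k in range(points)]
    sums = [sum(C(x) for C in categories) for x in xs]
    if any(s <= 0.0 for s in sums):          # some point is covered by no category
        return "not a cover"
    if all(abs(s - 1.0) <= tol for s in sums):
        return "fuzzy partition"
    if all(s <= 1.0 + tol for s in sums):
        return "weak fuzzy partition"
    return "fuzzy cover"


# Hypothetical families on the range [0, 10].
partition = [trapezoid(0, 0, 2, 4), trapezoid(2, 4, 6, 8), trapezoid(6, 8, 10, 10)]
weak = [trapezoid(0, 0, 2, 6), trapezoid(4, 8, 10, 10)]
cover = [trapezoid(0, 0, 4, 6), trapezoid(2, 4, 10, 10)]
```

Every fuzzy partition is also a weak fuzzy partition, and every weak fuzzy partition is a fuzzy cover, so the function reports only the strongest label that applies.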
Assignment of non-precise observations (fuzzy numbers) to non-precise categories (fuzzy intervals) is based on the degree of inclusion of fuzzy sets. There are numerous inclusion grades in the literature (see Dubois and Prade, 1980). This paper uses the degree of

Application
The construction of a membership function of a non-precise observation depends on the field of application (see Viertl, 1996). It is assumed in this section that each non-precise observation $X_i$ is a triangular fuzzy number $[a_i, b_i, d_i]$. The constants $a_i$, $b_i$ and $d_i$ can be estimated by a human observer or calculated from a set of repeated measurements (crisp real numbers) associated with the $i$-th object. Then, for example, $a_i$ = first quartile, $b_i$ = median, and $d_i$ = third quartile of the measurements.
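The quartile-based construction can be sketched with the standard library. The following is a minimal illustration, assuming Python 3.8 or later and the default quartile convention of `statistics.quantiles`; other quartile conventions give slightly different end points, and the measurement values are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def triangular_from_measurements(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (a, b, d) = (first quartile, median, third quartile) of the values."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return q1, median, q3


# Hypothetical repeated water-level readings (cm) for one object.
readings = [150, 152, 153, 155, 158, 160, 161]
a_i, b_i, d_i = triangular_from_measurements(readings)
```

With this choice the support of the triangular number is the interquartile range of the readings, so extreme readings do not widen the fuzzy number.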
Let the quantitative variable $X$ of interest be the water level of a river measured (observed) in centimeters. A random sample $S$ of 12 non-precise measurements described by triangular fuzzy numbers $X_i$ is given in Table 1. A graphical representation of this sample is in Figure 3. The fuzziness of $S$ calculated according to (32) is $\varphi(S) = 14.17$. Because all non-precise observations are characterized by triangular fuzzy numbers, the roughness of $S$ is $\rho(S) = 1$ (see formula (32)). This amount of fuzziness and roughness of $S$ might be too high for further analysis. In order to reduce them, one may approximate each fuzzy observation $X_i$ by its generalized $\alpha$-cut $G_\alpha(X_i)$. Let us assume, for example, $\alpha = 0.7$.
Generalized 0.7-cuts with their fuzziness and roughness are in Table 2. Then $\varphi(G_{0.7}(X_i)) = 8.14$ (64) and $\rho(G_{0.7}(X_i)) = 0.823$ (65). Fuzziness of $S$ has been reduced by 44.6% and its roughness by 17.7%. In further exploratory analysis of $S$ one may use, instead of the original fuzzy numbers $X_i$, their approximations by trapezoidal fuzzy intervals $G_{0.7}(X_i)$.
1. There is only one real number $x_1$ such that $X_i(x_1) = C_j(x_1) = h_1 \in (0, 1)$ and $a_i < d_j$. For example, $X_{11}$ and $C_2$ in our sample. Then …
2. There is only one real number $x_2$ such that $X_i(x_2) = C_j(x_2) = h_2 \in (0, 1)$ and $a_j < d_i$. For example, $X_9$ and $C_3$ in our sample. Then …
3. There are two real numbers $x_1, x_2$ such that $x_1 < x_2$, $X_i(x_1) = C_j(x_1) = h_1$, $X_i(x_2) = C_j(x_2) = h_2$ and $h_1, h_2 \in (0, 1)$. For example, $X_4$ and $C_2$ in our sample.
Then $|X_i \cap C_j| = \ldots$ (68) Values of $x_1, x_2$ and values of $h_1, h_2$ for each combination of $X_i$ and $C_j$ are in Table 3 and Table 4, respectively.
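The cardinality $|X_i \cap C_j|$ can also be approximated numerically, which gives an independent check on case-by-case closed forms. The following is a minimal sketch, assuming the standard (minimum) intersection and the integral form of cardinality with respect to length on the real line; the triangular observation and the trapezoidal category in the example are hypothetical and are not taken from the paper's tables.

```python
from typing import Callable


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Piecewise-linear membership function of [a, b, c, d]; b == c gives a triangle."""
    def mu(x: float) -> float:
        if x < a or x > d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return (d - x) / (d - c)
    return mu


def intersection_cardinality(X: Callable[[float], float],
                             C: Callable[[float], float],
                             lo: float, hi: float, steps: int = 20000) -> float:
    """Midpoint-rule estimate of the integral of min(X(x), C(x)) on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += min(X(x), C(x))
    return total * h


# Hypothetical observation [0, 2, 4] and category [1, 3, 5, 7].
X = trapezoid(0, 2, 2, 4)
C = trapezoid(1, 3, 5, 7)
card = intersection_cardinality(X, C, 0, 7)
```

In this example the two membership functions cross once, at $x = 2.5$ with height $0.75$, so the intersection is a triangle with base 3 and area $1.125$.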
From a theoretical point of view, the notions of $\alpha$-sharpness, $\alpha$-fuzziness and generalized $\alpha$-cut have been introduced. Each of these notions deserves careful study and will be the object of further investigation.
From a practical point of view, this paper presented some simple techniques for evaluation of uncertainty in a sample of non-precise data. Computational steps illustrated on a small example provide guidelines for application of the suggested methods to the analysis of real data.
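Let $a, b, c, d$ be real numbers such that $a \le b \le c \le d$. Then a trapezoidal fuzzy interval $A$ denoted by the quadruple $[a, b, c, d]$ has the membership function $A(x) = (x - a)/(b - a)$ for $a \le x < b$, $A(x) = 1$ for $b \le x \le c$, $A(x) = (d - x)/(d - c)$ for $c < x \le d$, and $A(x) = 0$ otherwise.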

Theorem 1.
Let $A$ be a trapezoidal fuzzy interval characterized by the quadruple $[a, b, c, d]$. … $|A(x) - A_\alpha(x)|\, dx = [(b - a) + (d - c)]\,[\alpha^2 + (1 - \alpha)^2]$. (20) Proof. According to (13), $A_\alpha$ is the interval of real numbers $[f^{-1}(\alpha), g^{-1}(\alpha)] = [a + (b - a)\alpha,\ d - (d - c)\alpha] = [\alpha_1, \alpha_2]$. Then … Dubois and Prade (1980) called the quantities $(b - a)$ and $(d - c)$ the left and the right spread of $A$, respectively. Then the fuzziness of $A$ calculated by (22) is actually the total spread of $A$. The fuzziness of a triangular fuzzy number $A = [a, b, d]$ is $\varphi(A) = d - a = |S_A|$.

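Theorem 2.
Let $A$ be a trapezoidal fuzzy interval characterized by the quadruple $[a, b, c, d]$. … Proof. The $\alpha$-cut $A_\alpha$ is a crisp set and therefore $C_{A_\alpha} = A_\alpha$. Because $A$ is a trapezoidal fuzzy interval, its $\alpha$-cut is the crisp interval of real numbers $A_\alpha = [a + (b - a)\alpha,\ d - (d - c)\alpha]$. Therefore $|C_{A_\alpha}| = d - (d - c)\alpha - (a + (b - a)\alpha)$. The support of $A$ is the crisp interval of real numbers $S_A = (a, d)$ and $|S_A| = d - a$. Then $\rho^\alpha(A) = 1 - |A_\alpha| / |S_A| = \alpha\,[(b - a) + (d - c)] / (d - a)$.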

Figure 4: Non-precise categories on the range of S

Table 1: Non-precise observations of water level