Trends in Fuzzy Statistics

Abstract: Since the introduction and development of fuzzy set theory, many studies have combined statistical methods with fuzzy set theory. These works, called fuzzy statistics, have developed in several branches. In this article we review essential works on fuzzy estimation, fuzzy hypothesis testing, fuzzy regression, fuzzy Bayesian statistics, and some related fields.


Introduction
Statistical analysis, in its traditional form, is based on the crispness of data, random variables, hypotheses, decision rules, parameters, and so on. Since there are many situations in which these assumptions are rather unrealistic, there have been attempts to analyze such situations with fuzzy set theory.
In the present paper, we try to give an overview of the combination of statistical methods and fuzzy set theory.
It should be mentioned that we focus only on fuzzy statistics. We therefore do not consider fuzzy probability, fuzzy random variables, fuzzy stochastic processes, the probabilistic interpretation of fuzzy sets, the random sets approach to fuzzy sets, and so on.
With this in mind, we start in Section 2 by noting some works on statistical point estimation theory in the fuzzy environment. In Section 3, a few works on interval estimation are reviewed. Research on fuzzy hypothesis testing is discussed in Section 4. In Section 5, we consider works on fuzzy regression. Section 6 contains a review of some research on fuzzy Bayesian statistics. Finally, in Section 7, we mention some special works on pure and applied fuzzy statistics.
Let us make two comments: 1. There are two valuable monographs, by Kruse and Meyer (1987) and by Viertl (1996). In these two books, readers find attempts to unify many works in the field of descriptive statistics with fuzzy data, fuzzy random variables, statistical inference for fuzzy data, and some related fields. 2. There are a lot of discussions on the similarities and differences between probability [...] outcomes cannot be exactly perceived, but they may be assimilated with fuzzy information. Gil (1992) also studies the connection between fuzzy numbers and random intervals. Watanabe (1996) derives some results on fuzzy random variables from the statistical point of view, and in particular studies fuzzy interval estimation. Viertl (1996) discusses the problem of confidence regions based on fuzzy data.
It should be mentioned that McCain (1983) uses the phrase "fuzzy confidence interval" for an alternative interpretation of fuzzy numbers, without a statistical interpretation.

Hypothesis Testing
Testing of statistical hypotheses using fuzzy sets has been developed along different approaches. Casals et al. (1986a,b) and Casals and Gil (1989) discuss statistical hypothesis testing based on a model represented by fuzzy events. They extend the Neyman-Pearson lemma and the Bayes method to this case. Gil et al. (1985b) also study a goodness-of-fit test with fuzzy observations. Saade and Schwarzlander (1990) develop fuzzy hypothesis testing for hybrid data under which one hypothesis is a mixture of a random and a fuzzy component. In their formulation the likelihood ratio, which is normally used in such decision-making problems, is fuzzified and compared to a threshold. Son et al. (1992), using a generalized Neyman-Pearson lemma, present a locally most powerful fuzzy test and study its application in signal detection. Watanabe and Imaizumi (1993) introduce a method for testing a fuzzy hypothesis with random data. They do not precisely define the probabilities of type I and type II errors. They define the generalized power function as the expected value of a fuzzy critical function. In their method, the conclusion of the test is also fuzzy. Takayanagi and Cliff (1994) explain the human imprecision of the choice boundary in hypothesis testing. In particular, they attempt to determine how membership functions and the fuzziness index show the human decision ambiguities that are reflected in formal statistical investigations. Romer and Kandel (1995) investigate the impact of vague data on the statistical task of hypothesis testing. Arnold (1995, 1996, 1998) presents, for the first time, an approach to testing fuzzily formulated hypotheses with crisp data in which the probabilities of type I and type II errors are defined. His approach focuses on the one-parameter exponential family. He studies the application of his method to some one-sided and two-sided tests. The same problem was considered by Delgado et al. (1985) with another approach. Taheri and Behboodian (1999) formulate the problem of fuzzy hypothesis testing when the hypotheses are fuzzy and the observations are crisp. In order to establish optimality criteria, they give new definitions for the probabilities of type I and type II errors. Then, on the basis of these new errors, they state and prove a Neyman-Pearson lemma for fuzzy hypothesis testing. Paris (2001) applies their method to analyze the effect of squeezing the channel in binary communication.
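To make the idea of error probabilities for fuzzy hypotheses concrete, the following is a sketch of one natural formalization, in which the membership function of each hypothesis is normalized and used as a weight over the parameter space. The notation (the membership functions H_0 and H_1, the test function φ, the parameter space Θ) is illustrative and assumes the membership functions are integrable; it is not guaranteed to match the exact definitions in the cited papers.

```latex
H_j^{*}(\theta) = \frac{H_j(\theta)}{\int_{\Theta} H_j(t)\,dt}, \quad j = 0, 1,
\qquad
\alpha_\varphi = \int_{\Theta} H_0^{*}(\theta)\, E_\theta[\varphi(X)]\, d\theta,
\qquad
\beta_\varphi = \int_{\Theta} H_1^{*}(\theta)\, \bigl(1 - E_\theta[\varphi(X)]\bigr)\, d\theta .
```

Under this reading, a test is judged by its rejection probability averaged over the parameter values that belong to the null hypothesis to a high degree, and by its acceptance probability averaged over those that belong to the alternative.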
With the help of a central limit theorem, Korner (2000) obtains an asymptotic test of hypotheses about the fuzzy expectation with respect to fuzzy data.
Based on a generalized metric for fuzzy numbers, Montenegro et al. (2001) study some two-sample hypothesis tests for the means of a fuzzy random variable in two populations. Yao and Wu (2001) consider hybrid data which contain randomness and fuzziness, and give a fuzzy sequential test. They use the signed distance ranking to fuzzify the usual sequential test and obtain the fuzzy sequential test, and then use defuzzification to obtain the sequential test in the fuzzy sense. Last et al. (1999), using a data mining approach, study fuzzy hypothesis testing and its application in medical diagnoses. See also Schenker et al. (2000). Grzegorzewski (2000) proposes a method for testing hypotheses with fuzzy data, which leads not to a binary decision but to a fuzzy decision in which he defines a grade of acceptability for the null and the alternative hypotheses. See also Grzegorzewski (2002). Filzmoser and Viertl (2003) present an approach for testing hypotheses on the basis of fuzzy data, by introducing the fuzzy p-value. Taheri (2003) investigates the problem of testing fuzzy hypotheses, and extends the sequential probability ratio test to such hypotheses.
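The notion of a fuzzy p-value can be illustrated with a small sketch. It assumes triangular fuzzy observations, a one-sided z-test of the mean with known standard deviation, and a p-value built level by level from the alpha-cuts of the fuzzy sample mean. The function names and the choice of test are assumptions made for illustration; this is not a reproduction of the procedure in the cited paper.

```python
from math import sqrt
from statistics import NormalDist


def alpha_cut(tri, alpha):
    """Alpha-cut [lo, hi] of a triangular fuzzy number (left, mode, right)."""
    left, mode, right = tri
    return (left + alpha * (mode - left), right - alpha * (right - mode))


def fuzzy_p_value_cut(data, mu0, sigma, alpha):
    """Alpha-cut of an upper-tail z-test p-value for fuzzy data (illustrative).

    data  : list of triangular fuzzy observations (left, mode, right)
    mu0   : hypothesized mean under H0 (test is H0: mu = mu0 vs H1: mu > mu0)
    sigma : known standard deviation (an assumption of this sketch)
    """
    n = len(data)
    cuts = [alpha_cut(t, alpha) for t in data]
    # Interval arithmetic: the alpha-cut of the mean is the mean of the alpha-cuts.
    mean_lo = sum(c[0] for c in cuts) / n
    mean_hi = sum(c[1] for c in cuts) / n
    scale = sigma / sqrt(n)
    z_lo = (mean_lo - mu0) / scale
    z_hi = (mean_hi - mu0) / scale
    phi = NormalDist().cdf
    # The upper-tail p-value decreases in z, so the interval endpoints swap.
    return (1.0 - phi(z_hi), 1.0 - phi(z_lo))


if __name__ == "__main__":
    sample = [(9.5, 10.0, 10.5), (10.2, 10.8, 11.1), (9.9, 10.4, 11.0)]
    for a in (0.0, 0.5, 1.0):
        print(a, fuzzy_p_value_cut(sample, mu0=10.0, sigma=1.0, alpha=a))
```

At alpha = 1 the interval collapses to the ordinary p-value computed from the modes, and lower alpha levels give wider intervals that contain it. If the whole interval lies on one side of the significance level the decision is clear-cut; otherwise the data are too imprecise to decide at that level.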
For a review of works on fuzzy hypothesis testing with a Bayesian approach, see Section 6.
Finally, it should be mentioned that some statisticians argue that precise hypotheses are not realistic hypotheses. They propose to consider imprecise, but realistic, hypotheses. See, for example, Berger and Delampady (1987), and Berger et al. (1997).

Regression
In traditional statistical inference, regression models are frequently used to study the relations among several variables in a system. Observing some of the variables, we can make estimates and predictions for the others. If a system under consideration is governed not by random variables and/or crisp observations but by possibility variables and/or imprecise observations, it is more natural to seek a fuzzy regression analysis for such a system. Fuzzy regression can be broadly classified into two categories: I) fuzzy regression when the relations among the variables are subject to fuzziness; II) fuzzy regression when the variables themselves are fuzzy.
From another point of view, the works on fuzzy regression can be broadly classified into two categories: I) possibilistic regression analysis, based on possibility concepts (so called because membership functions of fuzzy sets are often described as possibility distributions; see Dubois and Prade, 1988); II) the least squares method, which minimizes errors in the estimated outputs.
The characterization of the above cases is developed from different perspectives, and thus there exist several conceptual and methodological approaches to fuzzy regression. It should be mentioned that in some approaches more than one case is considered in order to provide a fuzzy regression model. In the following, we review some topics in this field.
Fuzzy regression analysis was first proposed by Tanaka et al. (1980, 1982), where a fuzzy linear system is used as a regression model (see also Tanaka, 1987; Tanaka et al., 1987). They consider a regression model in which the relations among the variables are subject to fuzziness, i.e., a model with crisp input and fuzzy parameters. Their approaches have been developed in several ways; see, for example, Peters (1994), Luczynski and Matloka (1995), Tanaka et al. (1995), Tanaka and Lee (1999), and Yen et al. (1999).
An application of this model to sales forecasting is discussed in Heshmaty and Kandel (1985). A modified form of possibilistic regression is proposed by Sakawa and Yano (1991).
It should be mentioned that possibilistic regression usually leads to a mathematical programming problem. Bardossy (1990) introduces the general form of regression equations, and shows how the problem of fuzzy regression can be formulated as a mathematical programming problem.
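As an illustration of how possibilistic regression turns into a mathematical program, the following sketch states a commonly used linear programming form. It assumes crisp inputs x_{ij} (with x_{i0} = 1 for the intercept), crisp outputs y_i, symmetric triangular fuzzy coefficients with centers a_j and spreads c_j, and a chosen level h in [0, 1). The symbols are introduced here for illustration and do not come from the cited papers.

```latex
\begin{aligned}
\min_{a,\,c}\quad & \sum_{i=1}^{n}\sum_{j=0}^{p} c_j\,|x_{ij}| \\
\text{s.t.}\quad
& \sum_{j=0}^{p} a_j x_{ij} + (1-h)\sum_{j=0}^{p} c_j\,|x_{ij}| \;\ge\; y_i, \qquad i = 1,\dots,n, \\
& \sum_{j=0}^{p} a_j x_{ij} - (1-h)\sum_{j=0}^{p} c_j\,|x_{ij}| \;\le\; y_i, \qquad i = 1,\dots,n, \\
& c_j \ge 0, \qquad j = 0,\dots,p .
\end{aligned}
```

The objective minimizes the total spread of the fuzzy predictions, while the constraints require every observed output to lie inside the h-level interval of its predicted fuzzy output. Both the objective and the constraints are linear in (a, c), so a standard LP solver applies.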
Interval regression analysis is considered by Tanaka and Lee (1998).In addition, Wang and Tsaur (2000b) provide an insight into regression intervals so that regression interval analysis, data type analysis and variable selections can be analytically performed.Also, Entani and Tanaka (2000) extend the exponential possibility regression to the interval outputs.
Another direction of fuzzy regression is the fuzzy least squares approach. This method is based on the notion of distance between the predicted fuzzy outputs and the observed fuzzy outputs, and on goodness of fit. Diamond (1987) proposes models for least squares fitting with crisp input and fuzzy output and with fuzzy input and output, where a distance between fuzzy numbers is defined to measure the best fit of the models. Diamond's method has been revisited in Diamond and Korner (1997). Bardossy et al. (1992) define a new class of distances on fuzzy numbers, and consider the model involving fuzzy input and fuzzy parameters. Xu (1997) discusses the problem of least squares fitting of fuzzy-valued data by developing a special curve regression model for fitting this type of data. Hong et al. (2001) consider a least squares approach to the regression model with fuzzy input and fuzzy parameters, using shape-preserving operations. Chang and Ayyub (1997) develop a method for hybrid least squares regression, based on weighted fuzzy arithmetic and the least squares fitting criterion. See also Chang (2001). Wang and Tsaur (2000a) propose a modified fuzzy least squares method for a crisp-input, fuzzy-output model.
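The distance-based idea can be sketched for the simplest case. The example assumes crisp, nonnegative inputs, triangular fuzzy outputs written as (left, mode, right), and a squared distance between triangular numbers equal to the sum of squared differences of these three points. Under those assumptions, and ignoring the ordering constraint left <= mode <= right on the fitted coefficients, the fuzzy least squares problem decouples into three ordinary least squares fits. All function names are hypothetical.

```python
from statistics import mean


def ols(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def distance_sq(u, v):
    """Squared distance between triangular fuzzy numbers (left, mode, right)."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def fit_fuzzy_least_squares(xs, ys):
    """Fit Y = A + B*x with triangular fuzzy A, B and crisp x >= 0.

    Returns (A, B) as triples (left, mode, right). The ordering constraint
    on the coefficients is NOT enforced in this sketch.
    """
    fits = [ols(xs, [y[k] for y in ys]) for k in range(3)]
    intercept = tuple(a for a, _ in fits)
    slope = tuple(b for _, b in fits)
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted triangular fuzzy output at a crisp input x >= 0."""
    return tuple(a + b * x for a, b in zip(intercept, slope))


def total_squared_distance(intercept, slope, xs, ys):
    """Sum of squared distances between predicted and observed fuzzy outputs."""
    return sum(distance_sq(predict(intercept, slope, x), y) for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [(1.1, 2.0, 3.2), (1.9, 3.6, 5.0), (3.0, 5.1, 6.9), (4.1, 6.4, 9.1)]
    A, B = fit_fuzzy_least_squares(xs, ys)
    print("A =", A, "B =", B)
    print("fit criterion =", total_squared_distance(A, B, xs, ys))
```

If some inputs are negative, or the fitted triples come out in the wrong order, the decoupling no longer holds and a constrained formulation is needed.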
Fuzzy prediction based on regression models is studied by Yager (1982). In addition, Arnold and Stahlecker (1998) present an approach showing how empirical data and fuzzy prior information may be combined for prediction purposes. Jajuga (1986) presents another approach, which is useful in the case of heterogeneous observations. Celmins (1987) deals with quadratic membership functions based on least squares fitting with indicators of discord, data spread dilator, etc. Nather and Korner (1998) extend classical estimates in linear regression to the cases of crisp and fuzzy input with fuzzy output, using a least squares approach. Now, let us review some special works in fuzzy regression. Wang and Li (1990), based on the concept of a possibility variable, its distribution, and the independence among possibility variables, study the regression analysis of fuzzy-valued variables. Cheng and Lee (1999) propose a fuzzy adaptive network approach to fuzzy regression. They also use radial basis function networks in fuzzy regression analysis without a predefined functional relationship between the input and output (Cheng and Lee, 2001).
Sometimes it is hard to evaluate the goodness of fit of a fuzzy regression model. Toyoura and Watada (2000) propose two indices to evaluate a fuzzy regression model. In addition, Sadeghpour Gildeh and Gien (2002a) provide an index to evaluate the goodness of fit between the observed values and the estimated values in a fuzzy regression model. D'Urso and Gastaldi (2000) propose a doubly linear adaptive fuzzy regression model, based on two linear models, a core regression model and a spread regression model, to explain the centers of the fuzzy observations and their spreads. Guo and Tanaka (2001) propose a fuzzy DEA (data envelopment analysis) model to deal with the efficiency evaluation problem with given fuzzy input and output data. They extend their model by considering the relationship between DEA and regression analysis.
Peters (2001) presents a forecasting model based on fuzzy pattern recognition and weighted linear regression. In this model fuzzy pattern recognition is used to find homogeneous fuzzy classes in a heterogeneous data set, and then for each class a weighted regression analysis is conducted. In addition, the forecasting results obtained by the class regression analysis are aggregated to obtain the overall estimate of the regression model. Roychowdhury and Pedrycz (2002) attempt to combine regression and fuzzy rules to model temporal systems. Arnold and Gerke (2003) provide a method of testing fuzzy linear hypotheses in linear regression models.
Finally, it should be mentioned that Reden and Woodall (1994) review and examine some approaches to fuzzy regression and discuss their strengths and weaknesses relative to each other. Kim et al. (1996) discuss and contrast the characteristics of statistical linear regression and fuzzy linear regression, in terms of basic assumptions, parameter estimation, and applications.

Bayes Methods
There is a large body of research on Bayesian methods combined with fuzzy set theory. Some of these works are in the context of statistics. We review some of them in this section.
First, it should be mentioned that Viertl (1987) explains the necessity of developing a fuzzy Bayesian inference. See also Walley (1991) for some brief discussions on ordinary and fuzzy decision analysis. Tanaka et al. (1979) and Uemura (1991, 1993a,b) formulate the fuzzy-Bayes decision rule to facilitate determination of the loss function of a Bayes decision rule in a fuzzy environment. Buckley (1983a,b) investigates some problems of statistical inference with fuzzy data, in a fuzzy decision-making problem. Gil et al. (1985a) extend the Bayesian method for point estimation when the available information is fuzzy.
The Bayes formula for fuzzy probability measures is studied by Piasecki (1986, 1987), and, in a more general manner, by Mesiar and Piasecki (1990). Viertl and Hule (1991) generalize Bayes' theorem to the case of fuzzy data. They extend the concept of HPD-regions to this case. In addition, based on a generalized integration concept for functions with fuzzy values, Viertl (1999) studies Bayes' theorem, confidence regions, and predictions for fuzzy prior distributions and fuzzy data. See also Viertl and Hareter (2002). Schnatter (1993) combines the Bayes procedure and fuzzy set theory to provide some methods both for samples of fuzzy data and for prior distributions with non-precise parameters. Dubois and Prade (1997) introduce a Bayesian conditioning operation in possibility theory, adapted to the idea of focusing on a body of knowledge for a reference class described by some evidence.
A Bayes and minimax procedure for testing simple hypotheses is given by Casals et al. (1986a,b). They consider the problem of testing ordinary (crisp) statistical hypotheses when the observations do not provide exact but rather fuzzy information. Delgado et al. (1985) consider the problem of fuzzy hypothesis testing with ordinary data. Casals (1993) works on the same problem but with fuzzy observations, in the context of the fuzzy decision problem (Tanaka et al., 1979). Taheri and Behboodian (2001) study the problem of hypothesis testing, from a Bayesian point of view, when the observations are ordinary (crisp) and the hypotheses are fuzzy. They extend their approach to the case of fuzzy observations and fuzzy hypotheses (Taheri and Behboodian, 2002). Lapiga and Polyakov (1992) use statistical modeling of membership functions to diminish subjectivity and to determine the analytical forms of membership functions in fuzzy decision-making. Gertner and Zhu (1997), based on two extensions of likelihood functions, generalize Bayesian estimates for use when the sample information and the prior distribution are fuzzy. They apply their method to forest surveys.
Bayesian analysis of a decision problem with fuzzy-valued utilities or losses is studied by Lopez-Diaz and Gil (1998).They consider and illustrate the Bayesian analysis of estimation and hypotheses testing problems, as some special cases.
Bayesian fuzzy kriging is studied by Bandemer and Gebhardt (2000).They combine two approaches generalizing the usual kriging technique for prediction in fields: the Bayesian approach incorporating prior knowledge and the fuzzy set approach reflecting imprecise observations.Lapointe and Bobee (2000) develop the counterpart of the Bayes' rule in the possibilistic framework with the use of conditional possibility distributions.

Some Other Fields
There are some studies on fuzzy statistical information theory. For example, Menendez et al. (1992) extend the Jensen difference divergence measure and define two information measures to compare statistical experiments when the available information is not exact. They also work on the sufficiency of a fuzzy information system (Menendez et al., 1989). Yen and Wang (1998) propose several information-theoretic constructions of optimal fuzzy models by extending some statistical information criteria. See also Menendez (1998) for another approach to this subject. Lubiano et al. (1999), and Sadeghpour Gildeh and Gien (2002b) study the Rao-Blackwell type theorem for fuzzy random variables.
The fuzzy expected value (FEV) is introduced by Kandel and Byatt (1978). Properties of the FEV are studied by Schnider and Kandel (1988a,b). They also show how to approximate the FEV in a fuzzy environment by introducing the fuzzy expected interval, and study some applications to the management of uncertainty in fuzzy expert systems (Schnider and Kandel, 1988b, 1993). Using the FEV, Schnider and Craig (1992) introduce a method for histogram equalization. Viertl (1989) considers the estimation of the reliability function when the observed data are fuzzy. Niculescu and Viertl (1992) study reliability models with fuzzy data and compare two fuzzy sample mean estimators in this field. Wu (1997) studies system reliability by considering the failure or functioning probability of each component in the system as a nonnegative fuzzy number, under the definition of a fuzzy-valued probability measure. Dunyak et al. (1999) also study system reliability based on fuzzy probability. It should be mentioned that some works have been done on possibilistic reliability analysis; see, for example, Cappelle and Kerre (1993, 1997).
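For readers unfamiliar with the fuzzy expected value, the following sketch computes it in the simplest setting. It assumes a finite population, membership grades in [0, 1], and the relative frequency as the underlying fuzzy measure, so that FEV = sup over T in [0, 1] of min(T, proportion of grades >= T). The function name is hypothetical and the setting is an assumption of this sketch.

```python
def fuzzy_expected_value(grades):
    """FEV of membership grades under the relative-frequency measure.

    Computes sup_T min(T, |{x : grade(x) >= T}| / n). Because the proportion
    is a decreasing step function of T, the supremum is attained at one of
    the observed grades, so a single pass over the sorted grades suffices.
    """
    if not grades:
        raise ValueError("grades must be non-empty")
    if any(g < 0.0 or g > 1.0 for g in grades):
        raise ValueError("grades must lie in [0, 1]")
    n = len(grades)
    ordered = sorted(grades, reverse=True)
    # After sorting in descending order, at least i items have grade >= ordered[i-1].
    return max(min(g, i / n) for i, g in enumerate(ordered, start=1))


if __name__ == "__main__":
    print(fuzzy_expected_value([0.9, 0.8, 0.5, 0.3, 0.1]))
```

In contrast to the arithmetic mean, this quantity depends only on the ordering of the grades and on how many items exceed each level, which makes it behave more like a median-type summary.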
Extended methods for the construction of control charts for fuzzy observations are proposed by Wang and Raz (1990), and Kanagava et al. (1993). In this field, see also Woodall et al. (1997). Peizhuang and Xihui (1984) establish the concept of set-valued statistics on the basis of the theory of random sets, and Sizhong (1987) extends this concept to fuzzy set-valued statistics. On the basis of random intervals, Xihe (1989) demonstrates the stability of membership frequency in a proper mathematical model. Bergh and Berg (2000) introduce the competitive fuzzy exception learning algorithm based on fuzzy frequency distributions. See also Berg et al. (2001). Bodjanova (2000) suggests a method for constructing a generalized histogram displaying the distribution of a sample of fuzzy numbers into some fuzzy intervals. Some discussions on descriptive statistics with fuzzy data are presented by Kruse and Meyer (1987), and by Viertl (1996).
A couple of works have been done on sampling techniques with non-precise data. Lopez-Diaz and Gil (1998) define a fuzzy unbiased estimator of the sample mean in random sampling with replacement from a finite population. See also Lubiano and Gil (1999). Garcia et al. (2001) illustrate the estimation of the expected value of fuzzy random variables in stratified random sampling. Schnatter (1991) discusses how fuzziness of data is propagated when statistical inference for samples of fuzzy data is carried out. She illustrates how methods from descriptive statistics can be generalized to fuzzy data (Schnatter, 1992). Manton et al. (1991, 1994) describe and illustrate the use of fuzzy partition statistical methods as a means of analyzing high-dimensional data with categorical responses. Some methods of statistical inference with fuzzy data are reviewed by Viertl (2002a). For a more detailed review, see Viertl (2002b). Behboodian and Mohammadpour (2002) consider a distribution that depends on some unknown crisp parameters and a fuzzy parameter with a known membership function. They present a method for estimating the fuzzy parameter, and a procedure for testing hypotheses about one of the unknown crisp parameters.
Fuzzy order statistics and their applications, especially in fuzzy clustering, are studied by Kersten (1999). Chen (1995, 2000) introduces a general framework of fuzzy analysis of statistical evidence methodologies for pattern classification and knowledge discovery. His method is based on the possibility measure, which does not require a precise belief model and, in a sense, includes Bayesian classifiers as a special case. Chen et al. (2000) combine fuzzy methods and the statistical methods of ANOVA and factor analysis to propose a method for the objective evaluation of fabric softness. Rojas et al. (1999) study the relevance and relative importance of the operators, especially defuzzifiers and T-norms, involved in the fuzzy inference process.
Fuzzy spatial statistics, which is usually referred to as geostatistics, is proposed by Lee (1995).He suggests neural learning combined with fuzzy representation for handling the variogram, which is essentially a covariance correlation, and the kriging, which is an unbiased method for estimating the missing data (Lee, 2000).Burrough (2001) shows that fuzzy set theory is a useful tool for spatial analysis when probabilistic approaches are inappropriate or impossible.