LIP6 1998/014: THÈSE de DOCTORAT de l'UNIVERSITÉ PARIS 6
LIP6 /
LIP6 research
reports
232 pages - Mars/March 1998 -
French document.
PostScript : 493 Ko /Kb
Contact : par mail / e-mail
Thème/Team: Apprentissage et Acquisition de Connaissances
Titre français : Apprentissage inductif en présence de données imprécises :
Construction et utilisation d'arbres de décision flous
Titre anglais : Inductive learning in presence of imprecise data:
Methods to build and to use fuzzy decision trees
Abstract : The automatic extraction of knowledge from a set of data enables us to discover relationships between these data. Furthermore, these relationships can be generalized to new data. Learning by decision trees is such an inductive method of generalization. Given a set of data described by means of values for attributes and associated with a value for a class, learning by decision trees enables us to construct a set of rules that express the relationships between the values of attributes and the values of the class. In this thesis, we propose to extend the classical ID3 method of decision tree learning by means of fuzzy subset theory. This theory enables us to take into account imprecision in the description of data and, thus, to construct fuzzy decision trees.
A new formalization of the algorithms of construction of decision trees is proposed. It highlights the various parameters that differentiate an algorithm from another: the measure of discrimination is used to minimize the size of the trees and to select coherent nodes for the tree to obtain good results during the generalization step; the splitting strategy defines how to split the training set during the construction of the tree; and the stopping criterion defines when the development of a path of the tree has to be stopped. By means of this formalization, introduction of fuzzy subset theory can be done with respect to the specificity of each kind of parameters.
The method to use fuzzy decision trees when classifying previously unknown cases is given. It is based on the use of a measure of satisfiability to evaluate the adequation between description values of the case and the corresponding values that appear in nodes of the tree. A study of the stability of this method of inference is presented. It highlights the robustness of the decision given by such a fuzzy system.
Afterwards, we propose a method to study the measures of discrimination used during the construction of decision trees. This method is given as a hierarchical model of functions.
It enables us to construct new measures and to validate their pertinence for the evaluation of the discriminating power of an attribute. Moreover, it makes easier the introduction of the fuzzy subset theory in such measures.
Fuzzy decision tree construction requires a fuzzy partition on the universe of values of numerical or numerical-symbolic attribute. However, such a partition is often unknown or it is obtained with difficulties from an expert of the corresponding domain. This is why we introduce a new automatic method of construction of such fuzzy partitions on a set of values. This method is based on the use of operators from mathematical morphology theory. In our context, these operators are formalized by means of the formal language theory. This method allows filtering a training set so as to highlight cores of membership functions related to values of the class.
These works have been implemented as a computer system named Salammbô. This system enables us to construct fuzzy decision trees and to make a choice between several measures of discrimination to build a tree. Classification of new cases is done with the obtained fuzzy tree by means of various kinds of inference rule operators.
To mitigate the consequence of the difficulties lying in the simultaneously measure of the discriminating power of an attribute related to more than two values of the class, we implemented a computer system named Tanit. This system enables us to construct a forest of fuzzy decision trees by means of several Salammbô systems. A fuzzy tree is constructed by each Salammbô to recognize only two values of the class: one value versus all the others joined together.
During the classification of new cases, the membership degrees of values of the class given by each Salammbô have been aggregated by the Tanit system. Several methods of aggregation have been implemented and a study of the independence of such a multi-classifier system is done. This study is based on the previous works of Kampé de Fériet related to the independence of a set of observers.
Both Salammbô and Tanit systems have been tested on various kinds of training sets. These tests enable us to highlight the greater understandability and explainability of decisions given by a fuzzy decision tree. Moreover, it can be shown that the size of fuzzy trees is lower than the size of classical trees, and their classification rate is also improved.
Key-words : Fuzzy subset theory, Inductive learning, Fuzzy decision tree, Measures of discrimination, Fuzzy partitioning
Publications internes LIP6 1998 / LIP6 research reports 1998