VU Viet Vu
Supervision: Bernadette BOUCHON-MEUNIER
Co-supervision: Nicolas LABROCHE
Semi-supervised clustering and active learning
Clustering is a central task of the data mining and knowledge discovery process. Nowadays, the abundance of data and the increasing size of datasets require clustering algorithms to improve in the following aspects: quality, speed, and the ability to handle large datasets. For these reasons, clustering remains an extremely active research domain.
Semi-supervised clustering has become a particularly active area of research over the last ten years. Its goal is to develop clustering algorithms that incorporate domain knowledge from a human expert to improve clustering performance. This knowledge can be expressed as a set of labeled data (the seeds) or as a set of constraints. In the latter case, there are two main types of constraints: must-link (ML) constraints, which indicate that two points of the dataset must be in the same cluster, and cannot-link (CL) constraints, which conversely require that two points belong to two different clusters. Current work in the field focuses mainly on adapting existing clustering methods so that they can handle constraints or labeled data. However, these adapted methods generally inherit the limitations of the methods they are derived from. Furthermore, these approaches often rely on a random selection of expert knowledge (constraints, seeds), which may lead to poor performance.
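The two forms of expert knowledge described above are closely related: a set of seeds implies a set of pairwise constraints, since two seeds with the same label form a must-link pair and two seeds with different labels form a cannot-link pair. The minimal Python sketch below illustrates these definitions; the function names `constraints_from_seeds` and `count_violations` are hypothetical and the code is not taken from the thesis.

```python
from itertools import combinations


def constraints_from_seeds(seeds):
    """Derive pairwise constraints from labeled points (seeds).

    seeds: dict mapping a point index to its class label (hypothetical format).
    Returns (must_link, cannot_link) as lists of (i, j) index pairs with i < j.
    """
    must_link, cannot_link = [], []
    # Sorting by index makes the output order deterministic.
    for (i, label_i), (j, label_j) in combinations(sorted(seeds.items()), 2):
        if label_i == label_j:
            must_link.append((i, j))    # same label: must be in the same cluster
        else:
            cannot_link.append((i, j))  # different labels: must be separated
    return must_link, cannot_link


def count_violations(assignment, must_link, cannot_link):
    """Count the ML and CL constraints broken by a partition.

    assignment: sequence where assignment[i] is the cluster of point i.
    """
    ml_broken = sum(assignment[i] != assignment[j] for i, j in must_link)
    cl_broken = sum(assignment[i] == assignment[j] for i, j in cannot_link)
    return ml_broken, cl_broken


if __name__ == "__main__":
    ml, cl = constraints_from_seeds({0: "a", 1: "a", 4: "b"})
    print(ml, cl)
    print(count_violations([0, 1, 0, 0, 1], ml, cl))
```

With three seeds, points 0 and 1 labeled "a" and point 4 labeled "b", the sketch yields one ML pair (0, 1) and two CL pairs (0, 4) and (1, 4); the partition `[0, 1, 0, 0, 1]` breaks one constraint of each type.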
To address these problems, this PhD thesis makes two main contributions: (1) new intelligent methods for selecting constraints or labeled data (seeds), integrated into active learning algorithms; and (2) new semi-supervised clustering algorithms that improve on the methods described in the literature.
In the context of intelligent constraint collection, we propose a new utility measure for constraints, based on a k-nearest neighbors graph, that identifies the transition zones between clusters where clustering algorithms traditionally make mistakes. This measure forms the basis of our active constraint selection algorithm, which has been validated on datasets from the UCI Machine Learning Repository and in a prototype application for image databases. Similarly, we propose three new methods for selecting the data to be labeled, which were also evaluated on real data and through a prototype application for image databases.
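The general idea of using a k-nearest neighbors graph to locate candidate constraints can be illustrated with a small Python sketch. It assumes a mutual kNN graph built with Euclidean distance and scores each edge by the number of neighbors its two endpoints share, on the reasoning that edges with few shared neighbors tend to lie in sparser regions between dense groups. This scoring rule and the names `knn_sets` and `rank_candidate_pairs` are assumptions made for illustration; they are not the utility measure defined in the thesis. The pairs ranked first would be submitted to the expert as ML/CL queries.

```python
import numpy as np


def knn_sets(X, k):
    """Return, for each point, the set of indices of its k nearest neighbors (Euclidean)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as n points in one dimension
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    return [set(row.tolist()) for row in nearest]


def rank_candidate_pairs(X, k, n_queries):
    """Rank mutual kNN edges so that weakly connected ones come first.

    Illustrative heuristic (assumption, not the thesis's measure): an edge (i, j)
    exists when i and j are in each other's kNN lists; its weight is the number of
    shared neighbors, and lower weights are returned (queried) first.
    """
    nn = knn_sets(X, k)
    edges = []
    for i in range(len(nn)):
        for j in nn[i]:
            if i < j and i in nn[j]:  # keep each mutual edge once
                shared = len(nn[i] & nn[j])
                edges.append((shared, i, j))
    edges.sort()  # ascending weight, ties broken by index
    return [(i, j) for _, i, j in edges[:n_queries]]


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
    print(rank_candidate_pairs(X, k=2, n_queries=4))
```

The full pairwise distance matrix makes this sketch quadratic in memory, which is acceptable for illustration; a spatial index would be needed for large datasets.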
Finally, this thesis describes two new semi-supervised clustering algorithms: SSGC, which is based on seeds, and MCLA, which is based on constraints. Both have lower complexity, easier parameter setting, and performance comparable to or better than that of the reference algorithms in semi-supervised clustering.
Keywords: clustering algorithm, semi-supervised clustering, active learning, seeds, constraints, k-nearest neighbors graph.
Defence: 07/05/2011
Jury members:
Pascale Kuntz, Professor, Polytech'Nantes, Reviewer
Eric Gaussier, Professor, Univ. Grenoble I, Reviewer
Céline Robardet, Associate Professor (MCF), INSA Lyon, Examiner
Matthieu Cord, Professor, UPMC, Examiner
Bernadette Bouchon-Meunier, Research Director, CNRS, Examiner
Nicolas Labroche, Associate Professor (MCF), UPMC, Examiner
2009-2014 Publications
2014
- V. Antoine, N. Labroche, V.‑V. Vu : “Evidential seed-based semi-supervised clustering”, 7th International Conference on Soft Computing and Intelligent Systems (SCIS), Kitakyushu, Japan, pp. 706-711, (IEEE), (ISBN: 978-1-4799-5955-6) (2014)
2013
- V. Vu, N. Labroche, B. Bouchon‑Meunier, V.‑Th. Vu, N. Hien : “Graph-based Semi-supervised Clustering”, Journal of Science and Technology, Ha Noi University of Education, Viet Nam, Special Number (2013)
2012
- V. Vu, N. Labroche, B. Bouchon‑Meunier : “Improving Constrained Clustering with Active Query”, Pattern Recognition, vol. 45 (4), pp. 1749-1758, (Elsevier) (2012)
2011
- V. Vu : “Clustering semi-supervisé et apprentissage actif” [Semi-supervised clustering and active learning], PhD thesis, defended 07/05/2011, supervised by Bernadette Bouchon-Meunier, co-supervised by Nicolas Labroche (2011)
2010
- V.‑V. Vu, N. Labroche, B. Bouchon‑Meunier : “Active Learning for Semi-Supervised K-Means Clustering”, 2010 22nd International Conference on Tools with Artificial Intelligence (ICTAI), Arras, France, pp. 12-15, (IEEE) (2010)
- V. Vu, N. Labroche, B. Bouchon‑Meunier : “An Efficient Active Constraint Selection Algorithm for Clustering”, Proceedings of the 20th International Conference on Pattern Recognition (ICPR-2010), Istanbul, Turkey, pp. 2969-2972, (IEEE) (2010)
- V. Vu, N. Labroche, B. Bouchon‑Meunier : “Boosting Clustering by Active Constraint Selection”, 19th European Conference on Artificial Intelligence (ECAI 2010), Lisbon, Portugal, pp. 297-302, (IOS Press) (2010)
2009
- V. Vu, N. Labroche, B. Bouchon‑Meunier : “Leader Ant Clustering with Constraints”, IEEE RIVF International Conference on Computing and Telecommunication Technologies, Da Nang, Viet Nam, pp. 79-86, (IEEE) (2009)