TRINH Anh Phuc
Supervision : Patrick GALLINARI
Classifieur probabiliste et Séparateur à Vaste Marge. Application à la classification de texte et à l'étiquetage d'image
This thesis proposes estimators of posterior probabilities for large-margin classifiers (Support Vector Machines, SVMs). It includes a theoretical part and an experimental part.
The first contribution is a probabilistic classifier based on SVMs for multi-class classification. We use the one-against-one approach, in which k(k - 1)/2 binary classifiers are trained for a problem with k classes. The binary outputs of these classifiers form voting features from which a class decision is computed. We introduce a new voting space that gives a richer representation of the classifier decisions and takes the relations between classes into account. We propose a method to learn, from this binary space, an estimate of the posterior probabilities of the classes.
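As an illustration of the one-against-one voting features described above, here is a minimal Python sketch. It is not the method of the thesis: the pairwise SVMs are replaced by a hypothetical nearest-centroid rule so that the example stays self-contained, all function names are invented for illustration, and the final decision is a plain majority vote rather than a learned posterior estimate.

```python
from itertools import combinations


def class_pairs(k):
    """All unordered class pairs (i, j) with i < j; there are k*(k-1)/2 of them."""
    return list(combinations(range(k), 2))


def train_centroids(X, y, k):
    """Hypothetical stand-in for the pairwise SVMs: one centroid per class."""
    centroids = {}
    for c in range(k):
        pts = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(pts) for col in zip(*pts)]
    return centroids


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def voting_features(x, centroids, k):
    """One binary feature per pair: 1 if the (i, j) classifier prefers i, else 0."""
    return [
        1 if sq_dist(x, centroids[i]) <= sq_dist(x, centroids[j]) else 0
        for i, j in class_pairs(k)
    ]


def majority_vote(features, k):
    """Baseline decision: the class that wins the most pairwise duels."""
    votes = [0] * k
    for (i, j), f in zip(class_pairs(k), features):
        votes[i if f == 1 else j] += 1
    return votes.index(max(votes))


# Toy data (assumed for illustration): three well-separated classes in 2D.
X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
y = [0, 0, 1, 1, 2, 2]
k = 3

centroids = train_centroids(X, y, k)
features = voting_features((4.5, 5), centroids, k)
predicted = majority_vote(features, k)
```

In this sketch the vector `features` plays the role of a point in the binary voting space. A probabilistic classifier in the spirit of the abstract would be trained on such vectors instead of applying `majority_vote` directly.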
The second contribution concerns multi-label classification and the dependencies between labels. The prediction of structured outputs has been an extremely active area in recent years, and many models based on extensions of SVMs or on graphical models have been proposed. Many of these models have a complexity that prevents their application to real data. We introduce a multi-label classifier based on an undirected graphical model formalism, and we propose approximate learning and inference methods of limited complexity. These methods make use of the probabilistic binary classifiers developed in the first contribution.
The third contribution is the experimental validation of these ideas and algorithms. A first application allows us to test our multi-class probabilistic classifiers on a challenge from the DEFT competition on the classification of French texts. The data we worked on concern the classification of journalistic corpora by genre and theme. The second application concerns the labeling of images using information about the dependencies between labels. It corresponds to a task proposed in the international competition ImageCLEF08. We propose a graphical model suited to this task, which allows us to validate the model on a multi-label problem.
Defence : 02/17/2012
Jury members :
Thierry Paquet, Professor, Université de Rouen [Reviewer]
Sylvie Thiria, Professor, Université Versailles Saint Quentin en Yvelines [Reviewer]
Patrick Gallinari, Professor, Université Pierre et Marie Curie
Thierry Artières, Professor, Université Pierre et Marie Curie
2008-2012 Publications
2012
- A. Trinh : “Classifieur probabiliste et Séparateur à Vaste Marge. Application à la classification de texte et à l’étiquetage d’image”, thesis, phd defence 02/17/2012, supervision Gallinari, Patrick (2012)
2009
- A. Trinh, D. Buffoni, P. Gallinari : “Probabilistic Multi-classifier by SVM from voting rule to voting features”, Extraction et gestion des connaissances (EGC'2009), vol. RNTI-E-15, Revue des Nouvelles Technologies de l'Information, Strasbourg, France, pp. 433-434 (2009)
2008
- A. Trinh, D. Buffoni, P. Gallinari : “Classifieur probabiliste avec Support Vector Machine (SVMs) et Okapi”, Actes de conference TALN08, Avignon, France, pp. 75-84 (2008)
- A. Trinh : “La classification de texte d’opinion par les Séparateurs à Vaste Marge”, Actes d'atelier FODOP08, INFORSID08, Fontainebleau, France, pp. 10-10 (2008)