DO Trinh Minh Tri

PhD student at Sorbonne University
Team: MALIRE
https://lip6.fr/Trinh-Minh-Tri.Do

Supervision: Thierry ARTIÈRES

Regularized bundle methods for large-scale learning problems with an application to large margin training of hidden Markov models

Machine learning is most often cast as an optimization problem, e.g., the minimization of a function of a set of model parameters for which one seeks optimal values. A major challenge today is to make such a framework practical for large datasets. This requires optimization methods that scale well with the problem size: the number of training samples, the dimensionality of the data, etc. This thesis addresses the scalability and efficiency of machine learning approaches, with a particular focus on sequence labeling tasks for signal processing. It describes our work on, first, designing scalable optimization tools and, second, designing new discriminative learning frameworks for signal labeling tasks.

In the optimization part we aimed at developing efficient algorithms for minimizing a regularized objective. We chose to focus on unconstrained learning formulations, since they appeared to us a promising approach for dealing with structured data in a large-scale setting. Our work was inspired by the cutting-plane technique, which underlies recent advances in optimization methods for machine learning such as the convex regularized bundle method (CRBM). In a preliminary step we focus on the simpler setting of a convex objective function that is not smooth everywhere. We detail an improved variant of CRBM which limits the computation and memory costs while retaining a good, and proven, convergence rate.

We then go further by considering the non-convex, non-smooth optimization problem. It is a much harder problem, but it also has much broader applications, especially in the context of signal processing. We describe a new optimization method for this setting, called the Non-convex Regularized Bundle Method (NRBM). It is based on ideas similar to those of the convex case, with a few specificities for handling non-convexity. We provide partial but interesting theoretical results on the convergence of the method in this complex setting. We also discuss variants of the method and show that adding a line search procedure increases the convergence rate in practice.

Next we describe applied work on discriminative models for sequence and signal labeling tasks such as handwriting and speech recognition. Part of this work relies on the optimization tools we developed for convex and non-convex optimization. Our work extends seminal work on structured prediction, such as conditional random fields and max-margin Markov networks, to signal processing. We first explore the use of conditional random fields for handling complex (non-linearly separable) inputs, multiple modalities, partially observed training data, and segmental features. Next we consider the large margin training of continuous density Hidden Markov Models (CDHMMs), a state-of-the-art model for signal processing tasks. Although large margin training of CDHMMs is promising, previous works usually relied on severe approximations, so that it remained an open problem. By formalizing the learning problem as the minimization of a non-convex regularized objective function, we can apply our optimization tool to solve this difficult problem efficiently. Our method was validated on two standard datasets in speech and handwriting recognition, demonstrating its efficiency and scalability, with up to 6 million frames and up to 2 million parameters in the experiments.
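
To make the cutting-plane idea behind CRBM-style methods concrete, below is a minimal Python sketch of a regularized bundle method for a convex, non-smooth risk R(w): at each iteration it adds a cutting plane built from a subgradient of R and re-minimizes the regularized piecewise-linear model. This is an illustrative simplification under stated assumptions, not the exact algorithm of the thesis; the callback name risk_and_subgradient and the use of SciPy's SLSQP to solve the master problem's dual QP are choices made for the sketch (a dedicated QP solver would be used in practice).

```python
# Minimal regularized bundle method sketch (BMRM/CRBM spirit), assuming
# J(w) = (lam/2)||w||^2 + R(w) with R convex and possibly non-smooth.
import numpy as np
from scipy.optimize import minimize

def bundle_minimize(risk_and_subgradient, dim, lam=1.0, tol=1e-4, max_iter=100):
    """risk_and_subgradient(w) must return (R(w), a subgradient of R at w)."""
    w = np.zeros(dim)
    best_w, best_J = w.copy(), np.inf
    offsets, grads = [], []  # cutting planes: R(v) >= a_i + b_i . v
    for _ in range(max_iter):
        r, g = risk_and_subgradient(w)
        J = 0.5 * lam * (w @ w) + r
        if J < best_J:
            best_J, best_w = J, w.copy()
        offsets.append(r - g @ w)   # intercept a_t of the new plane
        grads.append(g.copy())      # slope b_t of the new plane
        A, B = np.array(offsets), np.array(grads)

        # Master problem: minimize (lam/2)||w||^2 + max_i (a_i + b_i . w),
        # solved here through its dual, a QP over the simplex:
        #   max_alpha  alpha.A - (1/(2 lam)) ||B^T alpha||^2,
        # with the primal recovered as w = -(1/lam) B^T alpha.
        k = len(offsets)
        def neg_dual(alpha):
            v = B.T @ alpha
            return (v @ v) / (2.0 * lam) - alpha @ A
        res = minimize(neg_dual, np.full(k, 1.0 / k), method='SLSQP',
                       bounds=[(0.0, 1.0)] * k,
                       constraints=({'type': 'eq',
                                     'fun': lambda a: a.sum() - 1.0},))
        w = -(B.T @ res.x) / lam

        # Stop when the gap between the best objective value seen and the
        # minimum of the cutting-plane model (a lower bound) is small.
        model = 0.5 * lam * (w @ w) + np.max(A + B @ w)
        if best_J - model < tol:
            break
    return best_w
```

With a hinge-loss risk, for instance, risk_and_subgradient would return the average hinge loss and its subgradient over the training set. Solving the master problem in the dual keeps the QP dimension equal to the number of accumulated planes rather than the number of model parameters, which is what makes this family of methods attractive at large scale.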


PhD defence: 06/17/2010

Jury members:

Prof. Thierry ARTIÈRES
Prof. Matthieu CORD
Prof. Patrick GALLINARI
Dr. Gunnar RÄTSCH (Reviewer)
Prof. Gerhard RIGOLL
Dr. Jean-Philippe VERT (Reviewer)

Departure date: 09/30/2010
