BORDES Antoine
Supervision : Patrick GALLINARI
New Algorithms for Large-Scale Support Vector Machines
There is a pressing need for machine learning methods that can learn from millions of training instances and thus exploit the very large data sources now available. In this thesis, we propose solutions that reduce the training time and memory requirements of learning algorithms while maintaining high accuracy. Among machine learning models, we focus on Support Vector Machines (SVMs), a standard family of methods used mostly for automatic classification. Throughout this dissertation, we propose several original algorithms for learning SVMs, each suited to the task it targets. First, we study the learning process of Stochastic Gradient Descent in the particular case of linear SVMs. This leads us to define and validate the new SGD-QN algorithm. We then introduce a new learning principle: the Process/Reprocess strategy. We present three algorithms implementing it. The Huller and LaSVM are designed for training SVMs for binary classification. For the more complex task of structured output prediction, we substantially refine LaSVM, which results in the LaRank algorithm. Finally, we introduce the original framework of learning under ambiguous supervision, which we apply to the semantic parsing of natural language. Each algorithm introduced in this thesis achieves state-of-the-art performance, especially in terms of training speed.
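As a point of reference for the stochastic gradient setting mentioned in the abstract, here is a minimal sketch of plain first-order SGD on the L2-regularized hinge loss of a linear SVM. It is not the SGD-QN algorithm of the thesis, which the abstract does not detail. The function names, the 1/(lambda * t) step-size schedule, the absence of a bias term, and the toy data are all assumptions made for illustration.

```python
import random


def sgd_linear_svm(examples, lam=0.01, epochs=20, seed=0):
    """Plain SGD on (lam/2)*||w||^2 + hinge loss. Illustrative only, not SGD-QN."""
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)  # visit the examples in random order each epoch
        for i in order:
            x, y = examples[i]
            t += 1
            eta = 1.0 / (lam * t)  # assumed decreasing step size
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient step on the regularizer: shrink the weights.
            w = [wj * (1.0 - eta * lam) for wj in w]
            # Gradient step on the hinge loss, active only when margin < 1.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return the predicted label (+1 or -1) of a linear classifier without bias."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical toy data: linearly separable through the origin.
toy = [([2.0, 1.0], 1), ([1.0, 2.0], 1), ([-2.0, -1.0], -1), ([-1.0, -2.0], -1)]
w = sgd_linear_svm(toy)
```

Each update touches a single example, so the cost per step does not depend on the size of the training set, which is what makes this family of methods attractive at large scale.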
Defence : 02/09/2010
Jury members :
Stéphane Canu, Professor and director of LITIS at INSA de Rouen. [Reviewer]
John Shawe-Taylor, Professor and director of the CSML at University College London, United Kingdom. [Reviewer]
Jacques Blanc-Talon, Scientific manager at DGA/MRIS.
Léon Bottou, Distinguished senior researcher at NEC Labs of America, United States.
Matthieu Cord, Professor at LIP6.
Patrick Gallinari, Professor and director of LIP6.
Bernhard Schölkopf, Professor and director of the MPI for Biological Cybernetics, Germany.
2007-2017 Publications
2017
- G. Lample, N. Zeghidour, N. Usunier, A. Bordes, L. Denoyer, M. Ranzato : “Fader Networks: Generating Image Variations by Sliding Attribute Values”, 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, United States, pp. 5969-5978 (2017)
2010
- A. Bordes : “Nouveaux Algorithmes pour l’Apprentissage de Machines à Vecteurs Supports sur de Grandes Masses de Données”, thesis, phd defence 02/09/2010, supervision Gallinari, Patrick (2010)
- A. Bordes, N. Usunier, J. Weston : “Label Ranking under Ambiguous Supervision for Learning Semantic Correspondences”, Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, pp. 103-110, (Omnipress) (2010)
- N. Usunier, A. Bordes, L. Bottou : “Guarantees for Approximate Incremental SVMs”, 13th International Conference on Artificial Intelligence and Statistics, vol. 9, JMLR: Workshop and Conference Proceedings, Chia Laguna Resort, Sardinia, Italy, pp. 884-891 (2010)
- A. Bordes, N. Usunier, R. Collobert, J. Weston : “Towards Understanding Situated Natural Language”, 13th International Conference on Artificial Intelligence and Statistics, vol. 9, JMLR: Workshop and Conference Proceedings, Chia Laguna Resort, Sardinia, Italy, pp. 65-72 (2010)
2009
- A. Bordes, L. Bottou, P. Gallinari : “SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent”, Journal of Machine Learning Research, vol. 10, pp. 1737-1754, (Microtome Publishing) (2009)
2008
- A. Bordes, N. Usunier, L. Bottou : “Sequence Labelling SVMs Trained in One Pass”, ECML PKDD 2008 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, vol. 5211, Lecture Notes in Computer Science, Antwerp, Belgium, pp. 146-161, (Springer) (2008)
2007
- A. Bordes, L. Bottou, P. Gallinari, J. Weston : “Solving multiclass support vector machines with LaRank”, ICML 2007 - 24th International Conference on Machine Learning, Corvallis, United States, pp. 89-96 (2007)