GÉRALD Thomas
Supervision : Patrick GALLINARI
Co-supervision : Nicolas BASKIOTIS
Representation Learning for large scale classification
The past decades have seen the rise of new technologies that simplify information sharing. Today, a large share of data is accessible to a large number of users. In this thesis, we study the problem of document annotation, which underlies many applications such as information retrieval. We focus on extreme classification, the task of automatic classification when the number of labels is very large. Many difficulties arise from the size and complexity of the data considered; prediction time, storage, and the relevance of predictions are the most representative. Recent research on this problem relies on three types of approaches: ensemble approaches, which learn a large set of simple classifiers; “hierarchical” methods, which organize simple classifiers into a structure; and representation-based approaches, which embed documents into low-dimensional spaces. In this thesis, we study representation-based classification. Through our contributions, we propose different approaches to address the problems of prediction time and of the structure of the representation space. First, we study discrete representations, with the objective of finding the best possible representations while ensuring a low inference time. Second, we consider hyperbolic representations in order to take advantage of the properties of this space for structured data.
Defence : 11/17/2020
Jury members :
Massih Reza Amini (Professor at Université Grenoble Alpes, AMA) [Reviewer]
Pascale Kuntz-Cosperec (Professor at Polytech Nantes, Laboratoire des Sciences du Numérique de Nantes) [Reviewer]
Patrick Gallinari (LIP6, MLIA)
Nicolas Baskiotis (LIP6, MLIA)
Julien Tierny (Research Scientist at Sorbonne Université, LIP6, APR team)
Xiangliang Zhang (Associate Professor at King Abdullah University of Science and Technology, CEMSE)
2017-2020 Publications
2020
- Th. Gérald : “Apprentissage de Représentation pour la classification large échelle”, PhD thesis, defended 11/17/2020, supervision : Patrick Gallinari, co-supervision : Nicolas Baskiotis (2020)
- N. Miolane, N. Guigui, A. Le Brigant, J. Mathe, B. Hou, Y. Thanwerdas, S. Heyder, O. Peltre, N. Koep, H. Zaatiti, H. Hajri, Y. Cabanes, Th. Gerald, P. Chauchat, Ch. Shewmake, D. Brooks, B. Kainz, C. Donnat, S. Holmes, X. Pennec : “Geomstats: A Python Package for Riemannian Geometry in Machine Learning”, Journal of Machine Learning Research, vol. 21 (223), pp. 1-9, (Microtome Publishing) (2020)
- N. Miolane, N. Guigui, H. Zaatiti, Ch. Shewmake, H. Hajri, D. Brooks, A. Le Brigant, J. Mathe, B. Hou, Y. Thanwerdas, S. Heyder, O. Peltre, N. Koep, Y. Cabanes, Th. Gerald, P. Chauchat, B. Kainz, C. Donnat, S. Holmes, X. Pennec : “Introduction to Geometric Learning in Python with Geomstats”, SciPy 2020 - 19th Python in Science Conference, Austin, Texas, United States, pp. 48-57 (2020)
2019
- Th. Gerald, N. Baskiotis : “Joint Label/Example Hyperbolic Representation for Extreme Classification”, Conférence sur l’Apprentissage automatique 2019, Toulouse, France (2019)
2018
- Th. Gerald, N. Baskiotis, L. Denoyer : “Apprentissage stochastique de représentation binaire pour la classification multi-classe dans un grand nombre de catégories”, Conférence sur l’Apprentissage automatique 2018, Rouen, France (2018)
2017
- Th. Gerald, N. Baskiotis, L. Denoyer : “Binary Stochastic Representations for Large Multi-class Classification”, Neural Information Processing, vol. 10634, Lecture Notes in Computer Science, Guangzhou, China, pp. 155-165, (Springer International Publishing), (ISBN: 978-3-319-70086-1) (2017)