Etienne SIMON
Supervision: Vincent GUIGUE
Co-supervision: Benjamin PIWOWARSKI
Deep Learning for Natural Language Understanding
Capturing the interrelations between concepts is fundamental to natural language understanding. It constitutes a bridge between two historically separate approaches to artificial intelligence: the use of symbolic representations and the use of distributed representations. However, tackling this problem without human supervision poses several issues, and unsupervised models have difficulty echoing the expressive breakthroughs of supervised ones. This thesis addresses two supervision gaps we identified: the problem of regularizing sentence-level discriminative models and the problem of leveraging relational information from dataset-level structures. The first gap arises from the increased use of discriminative approaches, such as deep neural network classifiers, in the supervised setting. These models tend to collapse without supervision. To overcome this limitation, we introduce two relation distribution losses that constrain the relation classifier into a trainable state. The second gap arises from the development of dataset-level (aggregate) approaches. We show that unsupervised models can leverage a large amount of additional information from the structure of the dataset, even more so than supervised models. We close this gap by adapting existing unsupervised methods to capture topological information using graph convolutional networks. Furthermore, we show that the mutual information between topological (dataset-level) and linguistic (sentence-level) information can be exploited to design a new training paradigm for unsupervised relation extraction.
Defence: 07/05/2022
Jury members:
Alexandre Allauzen, Full Professor, Université Paris-Dauphine PSL, ESPCI [reviewer]
Benoît Favre, Associate Professor, Aix-Marseille Université [reviewer]
Pascale Sébillot, Full Professor, IRISA, INSA Rennes
Xavier Tannier, Full Professor, Sorbonne Université
Benjamin Piwowarski, Research Scientist, CNRS, Sorbonne Université
Vincent Guigue, Associate Professor, Sorbonne Université
2019-2022 Publications
2022
- É. Simon: “Apprentissage de réseaux profonds pour l’indexation conceptuelle de texte”, PhD thesis, defended 07/05/2022, supervision: Vincent Guigue, co-supervision: Benjamin Piwowarski (2022)
2019
- É. Simon, V. Guigue, B. Piwowarski: “Unsupervised Information Extraction: Regularizing Discriminative Approaches with Relation Distribution Losses”, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 1378-1387 (Association for Computational Linguistics) (2019)
- É. Simon, V. Guigue, B. Piwowarski: “Extraction d’information non supervisée avec des modèles discriminants”, CAp 2019 - 21e Conférence sur l'Apprentissage automatique, Toulouse, France (2019)