BORDES Patrick

PhD student at Sorbonne University
Team : MLIA
https://lip6.fr/Patrick.Bordes

Supervision : Patrick GALLINARI

Co-supervision : PIWOWARSKI Benjamin

Deep Multimodal Learning for Joint Textual and Visual Reasoning

In the last decade, the evolution of Deep Learning techniques to learn meaningful data representations for text and images, combined with an important increase of multimodal data, mainly from social network and e-commerce websites, has triggered a growing interest in the research community about the joint understanding of language and vision. The challenge at the heart of Multimodal Machine Learning is the intrinsic difference in semantics between language and vision: while vision faithfully represents reality and conveys low-level semantics, language is a human construction carrying high-level reasoning.
One the one hand, language can enhance the performance of vision models. The underlying hypothesis is that textual representations contain visual information. We apply this principle to two Zero-Shot Learning tasks. In the first contribution on ZSL, we extend a common assumption in ZSL, which states that textual representations encode information about the visual appearance of objects, by showing that they also encode information about their visual surroundings and their real-world frequence. In a second contribution, we consider the transductive setting in ZSL. We propose a solution to the limitations of current transductive approaches, that assume that the visual space is well-clustered, which does not hold true when the number of unknown classes is high.
On the other hand, vision can expand the capacities of language models. We demonstrate it by tackling Visual Question Generation (VQG), which extends the standard Question Generation task by using an image as complementary input, by using visual representations derived from Computer Vision.

Defence : 11/26/2020

Jury members :

Mr Yannis Avrithis (INRIA Rennes-Bretagne Atlantique) [Rapporteur]
Mr Loic Barrault (University of Sheffield) [Rapporteur]
Mr Patrick Gallinari (LIP6, MLIA)
Mr Benjamin Piwowarski (LIP6, MLIA, CNRS)
Mrs Diane Bouchacourt (FAIR)
Mme Catherine Pelachaud (ISIR)

Departure date : 11/26/2020

2017-2020 Publications