GISSELBRECHT Thibault

PhD student at Sorbonne University
Team : MLIA
https://lip6.fr/Thibault.Gisselbrecht

Supervision : Patrick GALLINARI

Co-supervision : Sylvain LAMPRIER

Bandit algorithms for real time data capture on social media

In this thesis, we study the problem of real-time data capture on social media. Because of the limitations imposed by these media, and because of the very large amount of information involved, it is not possible to collect all the data produced by social networks such as Twitter. To gather enough relevant information related to a predefined need, it is therefore necessary to focus on a subset of the information sources. In this work, we focus on user-centered data capture and consider each account of a social network as a source that can be listened to at each iteration of a data capture process, in order to collect the content it produces. This process, whose aim is to maximize the quality of the information gathered, is constrained at each time step by the number of users that can be monitored simultaneously. The problem of selecting a subset of accounts to listen to over time is a sequential decision problem under constraints, which we formalize as a bandit problem with multiple selections. We propose several bandit models to identify the most relevant users in real time. First, we study the case of the so-called stochastic bandit, in which each user corresponds to a stationary distribution. Then, we introduce two contextual bandit models, one stationary and the other non-stationary, in which the utility of each user can be estimated more efficiently by assuming some underlying structure in the reward space. In particular, the first approach introduces the notion of profile, which corresponds to the average behavior of each user. The second approach, on the other hand, takes into account the activity of a user at a given instant in order to predict their future behavior. Finally, we consider models that can take into account complex temporal dependencies between users, using a latent space within which the information transits from one iteration to the next.
Moreover, each of the proposed approaches is validated on both artificial and real datasets.
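
As an illustration of the stochastic bandit with multiple selections described above, here is a minimal Python sketch. It is not the thesis's algorithm: it assumes a standard UCB1-style index, Bernoulli rewards standing in for the "quality" of a user's content, and a hypothetical function name and parameters (`ucb_multiple_plays`, `true_means`, `k`, `horizon`) chosen only for this example. At each step it listens to the `k` users with the highest optimistic scores.

```python
import math
import random


def ucb_multiple_plays(true_means, k, horizon, seed=0):
    """Simulate a UCB-style bandit that selects k arms (users) per step.

    true_means: assumed Bernoulli reward probability of each user
                (unknown to the learner, used only to simulate feedback).
    k:          number of users that can be listened to simultaneously.
    horizon:    number of time steps to simulate.
    Returns (counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # how many times each user was listened to
    sums = [0.0] * n_arms    # cumulative reward observed for each user
    total_reward = 0.0

    for t in range(1, horizon + 1):
        scores = []
        for i in range(n_arms):
            if counts[i] == 0:
                # Never observed: force exploration with an infinite score.
                scores.append(float("inf"))
            else:
                mean = sums[i] / counts[i]
                bonus = math.sqrt(2.0 * math.log(t) / counts[i])
                scores.append(mean + bonus)

        # Listen to the k users with the highest optimistic scores.
        selected = sorted(range(n_arms), key=lambda i: scores[i], reverse=True)[:k]

        for i in selected:
            # Simulated feedback: 1 if the user produced relevant content.
            reward = 1.0 if rng.random() < true_means[i] else 0.0
            counts[i] += 1
            sums[i] += reward
            total_reward += reward

    return counts, total_reward


if __name__ == "__main__":
    counts, total = ucb_multiple_plays([0.9, 0.8, 0.1, 0.1, 0.05], k=2, horizon=2000)
    print("listens per user:", counts)
    print("total reward:", total)
```

In this sketch the listening budget `k` plays the role of the constraint on simultaneously monitored users; the contextual and latent-space models mentioned in the abstract would replace the simple empirical mean with a structured estimate.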

Defence : 03/24/2017

Jury members :

Mr. Philippe Preux - Université de Lille 3 [Reviewer]
Mr. Liva Ralaivola - Laboratoire d'Informatique de Marseille [Reviewer]
Ms. Michèle Sebag - CNRS
Mr. Olivier Sigaud - Université Pierre et Marie Curie
Mr. Sylvain Lamprier - Université Pierre et Marie Curie
Mr. Patrick Gallinari - Université Pierre et Marie Curie

Departure date : 03/30/2017
