PANTIN Jérémie

PhD student at Sorbonne University
Team: LFI
https://webia.lip6.fr/~pantin/

Supervision: Christophe MARSALA

Detection and semantic characterisation of textual outliers

Outlier detection is a recurring problem in machine learning: it consists of identifying data points that differ significantly from the rest of the dataset. In this context, we focus on identifying such outliers in textual data, a task that raises several challenges, including how to formalise and define a textual outlier. In particular, syntactic and semantic outliers are clearly distinct. To remove the resulting ambiguity about what counts as a textual outlier, we propose a new taxonomy for identifying these outliers.
Within this framework, we identify various types of outliers and their associated levels of difficulty, and we introduce a novel method to study them. This method makes it possible to leverage a wide range of datasets and to highlight the strengths and weaknesses of anomaly detection and outlier detection approaches. Outlier detection can also be performed with ensemble methods, which combine multiple text representations with various detection techniques to improve efficiency and robustness against challenging outliers.
We introduce a novel approach that combines robust learning and ensemble learning, and we connect this work with explainable AI (XAI) and data representation studies. Lastly, we present an application of our work to unsupervised abstractive summarisation, where outlier analysis helps filter out irrelevant sentences and thereby improves the quality of the summary.

Defence: 09/11/2023

Jury members:

LAURENT Anne (Université de Montpellier) [Reviewer]
SMITS Gregory (IMT Atlantique) [Reviewer]
AMANN Bernd (Sorbonne Université)
MARSALA Christophe (Sorbonne Université)

Departure date: 09/30/2024

2022-2024 Publications