POUPART Yoann
Team : SMA
Arrival date : 09/01/2024
Sorbonne Université - LIP6
Boîte courrier 169
Couloir 25-26, Étage 4, Bureau 416
4 place Jussieu
75252 PARIS CEDEX 05
FRANCE
Tel: +33 1 44 27 36 67, Yoann.Poupart (at) lip6.fr
https://perso.lip6.fr/Yoann.Poupart
Supervision : Nicolas MAUDET
Interpretability for Deep Multi-Agent Systems
Multi-agent systems (MAS) have become widely accessible in recent years thanks to the natural-language interfaces made possible by large language models (LLMs). While their ability to solve complex tasks is undeniable, the dynamics emerging from these systems can be hard to predict, and guarantees are needed. Jailbreaking, adversarial behaviour, and power-seeking are concerning failure modes of MAS, and evaluating these capabilities remains a difficult problem. In this respect, interpretability could be one of the best tools to monitor and control several agents simultaneously and automatically. Indeed, a model's internals convey the information used for its predictions and can be used symbolically to gain understanding or control.