VERGER Mélina
Team : MOCAH
Arrival date : 11/01/2021
- Sorbonne Université - LIP6
Boîte courrier 169
Couloir 26-00, Étage 5, Bureau 525
4 place Jussieu
75252 PARIS CEDEX 05
FRANCE
Tel: +33 1 44 27 84 23, Melina.Verger (at) lip6.fr
https://melinaverger.github.io/
Supervision : Vanda LUENGO
Co-supervision : François BOUCHET, Sébastien LALLÉ
Algorithmic fairness analyses of supervised machine learning in education
This thesis aims to evaluate and reduce the algorithmic unfairness of machine learning models widely used in education. Such predictive models, built on ever-growing amounts of educational data and learning traces, are intended to improve the human learning experience. They can be used, for example, to predict dropout or to personalize learning by adapting educational content to the needs of each learner. However, it has been shown repeatedly that these models can produce biased and discriminatory predictions, most often consistently worse predictions for Black people than for White people, and for women than for men. It has therefore become crucial to evaluate the fairness of predictive models' results with respect to the different groups present in the data.

State-of-the-art research has focused on comparing the predictive performance of models between groups. For example, for a binary classifier and male/female groups, the rate of correct predictions is computed for each group, and the difference between these rates indicates unfairness. Although this approach is predominant in the literature, it only captures unfairness in terms of predictive performance, whereas unfairness can manifest in other ways and in more nuanced forms than simple score differences; this calls for further exploration. The main objective of this thesis is thus to deepen the understanding and evaluation of algorithmic unfairness, and then to identify its potential presence in under-studied contexts, namely sensitive attributes and learner populations that have received little or no attention so far.

To this end, we designed a new algorithmic fairness metric, MADD for short, which is based on the distributions of the results of supervised learning models. This distribution-based approach additionally allows graphical analyses to better understand the unfairness quantified by MADD. We demonstrated, both theoretically and experimentally, the validity of this metric and found that potential unfairness observed in the data is not always reflected in the model outcomes, as was the case with gender bias in our experiments. Moreover, we developed a technique to mitigate unfairness using MADD, along with new methods to evaluate fairness with multiple sensitive attributes simultaneously. Indeed, the literature typically considers each attribute separately, whereas Crenshaw's (1989, 1991) theory of intersectionality argues that combined influences produce unique and distinct forms of discrimination for certain groups. Our experimental results show that some combinations of attributes increase, reduce, or maintain the level of unfairness initially observed.

Finally, we conducted fairness analyses for new sensitive attributes, whether demographic or related to the learning context, and with new learner populations from African countries, the Philippines, Haiti, and France, thanks to data collected from a MOOC (massive open online course) and the Canvas LMS platform. These experiments revealed unfairness that had not been discovered before, thus shedding light on potentially real unfairness in these contexts. To facilitate the replication of our work and the application of our methods in other contexts, we created an open-source Python library named maddlib. The data (except those from the Philippines) and our documented source code are also available online.
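The abstract above contrasts the usual performance-based comparison (e.g., an accuracy gap between two groups) with the distribution-based view taken by MADD. The following Python sketch, built on synthetic data and scikit-learn, illustrates both views; the distribution distance it computes is only an approximation in the spirit of MADD, not the metric's exact definition as implemented in maddlib, and all data and variable names are hypothetical.

# Illustrative sketch (synthetic data, not the exact MADD definition):
# compare a binary classifier across two groups, first by accuracy gap,
# then by the distance between the groups' predicted-probability distributions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical educational data: features, pass/fail labels, binary sensitive attribute.
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)
group = rng.integers(0, 2, size=1000)  # e.g., 0 = male, 1 = female

model = LogisticRegression().fit(X, y)

# Performance-based fairness: difference in accuracy between the two groups.
acc_gap = abs(
    accuracy_score(y[group == 0], model.predict(X[group == 0]))
    - accuracy_score(y[group == 1], model.predict(X[group == 1]))
)

# Distribution-based view: distance between the normalized histograms of
# predicted probabilities per group (the spirit of MADD, not its exact formula).
proba = model.predict_proba(X)[:, 1]
bins = np.linspace(0, 1, 21)
h0, _ = np.histogram(proba[group == 0], bins=bins)
h1, _ = np.histogram(proba[group == 1], bins=bins)
dist_gap = np.abs(h0 / h0.sum() - h1 / h1.sum()).sum()

print(f"accuracy gap: {acc_gap:.3f}, distribution distance: {dist_gap:.3f}")

A model can show a near-zero accuracy gap while still scoring the two groups from visibly different probability distributions, which is the kind of unfairness a distribution-based metric is designed to surface.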
2022-2024 Publications
2024
- V. Švábenský, M. Verger, M. M. T. Rodrigo, C. J. G. Monterozo, R. Baker, M. Saavedra, S. Lallé, A. Shimada : “Evaluating Algorithmic Bias in Models for Predicting Academic Performance of Filipino Students”, Proceedings of the 17th International Conference on Educational Data Mining (EDM 2024), Atlanta, GA, United States (2024)
- M. Verger, Ch. Fan, S. Lallé, F. Bouchet, V. Luengo : “A Comprehensive Study on Evaluating and Mitigating Algorithmic Unfairness with the MADD Metric”, Journal of Educational Data Mining, vol. 16 (1), pp. 365–409, (International Educational Data Mining Society) (2024)
- S. Lallé, F. Bouchet, M. Verger, V. Luengo : “Fairness of MOOC Completion Predictions Across Demographics and Contextual Variables”, Proceedings of the 25th International Conference on Artificial Intelligence in Education, vol. 14829, Lecture Notes in Computer Science, Recife, Brazil, pp. 379-393, (Springer Nature Switzerland) (2024)
2023
- M. Verger, Ch. Fan, S. Lallé, F. Bouchet, V. Luengo : “A Fair Post-Processing Method based on the MADD Metric for Predictive Student Models”, 1st International Tutorial and Workshop on Responsible Knowledge Discovery in Education (RKDE 2023) at ECML PKDD 2023, Turin, Italy (2023)
- M. Verger, S. Lallé, F. Bouchet, V. Luengo : “Is Your Model "MADD"? A Novel Metric to Evaluate Algorithmic Fairness for Predictive Student Models”, Proceedings of the 16th International Conference on Educational Data Mining, Bengaluru, India, (ISBN: 978-1-7336736-4-8) (2023)
- M. Verger, F. Bouchet, S. Lallé, V. Luengo : “Caractérisation et mesure des discriminations algorithmiques dans la prédiction de la réussite à des cours en ligne”, EIAH2023 : 11e Conférence sur les Environnements Informatiques pour l'Apprentissage Humain, Brest, France (2023)
2022
- M. Verger : “Investiguer la notion d’équité algorithmique dans les environnements informatiques pour l’apprentissage humain”, Actes des neuvièmes rencontres jeunes chercheur·e·s en EIAH, Lille, France, pp. 44-51 (2022)