GROLLEMUND Vincent
Supervision : Jean-François PRADAT-PEYRE
Data mining and modeling of poorly structured or unstructured data
Supervised learning models are usually trained on data with limited constraints. In real-world use cases, however, data are generally scarce, incomplete and biased, which hampers efficient model design. Such data can and should still be leveraged to discover relevant patterns, glean insight and draw meaningful conclusions. In this thesis, we investigate an unsupervised learning approach to isolate minority samples within a larger population. We study two use cases: Amyotrophic Lateral Sclerosis prognosis and the identification of potential innovation funding recipients. Despite their different purposes, both contexts face the same issue: little data is available, and the available samples are partial and unrepresentative. In both cases, the aim is to detect samples from a minority population: patients with a poorer 1-year prognosis and companies that are more likely to be successful funding applicants. Data are projected into a lower-dimensional space using Uniform Manifold Approximation and Projection (UMAP), a nonlinear dimension reduction technique. Differences in data distributions are then used to isolate the target minority population, using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and alpha shapes. Correlations between input and target variables become visible within the projection space, and minority samples are isolated from the remaining data. As a result, in spite of poor data quality, we provide additional insight with regard to recently diagnosed patients and potential funding applicants.
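The density-based clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the implementation used in the thesis: it is a plain-Python version of the textbook DBSCAN procedure, and the toy coordinates, `eps` and `min_samples` values are assumptions chosen for illustration. The points stand in for samples that have already been projected into a 2-D space (the UMAP projection itself is not shown).

```python
from math import dist

NOISE = -1  # label given to points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Hypothetical 2-D coordinates standing in for projected samples:
# a larger dense group, a small separate group, and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (3.0, 3.0), (3.1, 3.0), (3.0, 3.1),
    (10.0, 10.0),
]
labels = dbscan(points, eps=0.3, min_samples=3)
print(labels)
```

In this toy setting, the small separate group plays the role of a minority population: it receives its own cluster label, while the isolated point is flagged as noise. In practice a library implementation would normally be preferred over this quadratic-time sketch.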
Defence : 06/25/2021
Jury members :
Mme Hélène BLASCO, PU-PH, Université de Tours [rapporteur]
M Patrice BERTAIL, PU, Université de Nanterre [rapporteur]
Mme Emmanuelle ENCRENAZ, MCF-HDR, Sorbonne Université
M François DELBOT, MCF, Sorbonne Université
M Pierre-François PRADAT, PU-PH, Sorbonne Université
M Gaétan LE CHAT, Dr, FRS Consulting
M Jean-François PRADAT-PEYRE, Sorbonne Université
2019-2021 Publications
2021
- V. Grollemund : “Exploration et modélisation de données peu ou pas structurées”, thesis, PhD defence 06/25/2021, supervision Pradat-Peyre, Jean-François (2021)
- V. Grollemund, G. Le Chat, M.‑S. Secchi‑Buhour, F. Delbot, J.‑F. Pradat‑Peyre, P. Bede, P.‑F. Pradat : “Manifold learning for amyotrophic lateral sclerosis functional loss assessment”, Journal of Neurology, vol. 268 (3), pp. 825-850, (Springer Verlag) (2021)
2020
- V. Grollemund, G. Le Chat, M.‑S. Secchi‑Buhour, F. Delbot, J.‑F. Pradat‑Peyre, P. Bede, P.‑F. Pradat : “Development and validation of a 1-year survival prognosis estimation model for Amyotrophic Lateral Sclerosis using manifold learning algorithm UMAP”, Scientific Reports, vol. 10 (1), pp. 13378, (Nature Publishing Group) (2020)
- V. Grollemund, G. Le Chat, J.‑F. Pradat‑Peyre, F. Delbot : “Manifold Learning for Innovation Funding: Identification of Potential Funding Recipients”, Artificial Intelligence Applications and Innovations, vol. 583, IFIP Advances in Information and Communication Technology, Neos Marmaras, Greece, pp. 119-127, (Springer International Publishing) (2020)
2019
- V. Grollemund, P.‑F. Pradat, G. Querin, F. Delbot, G. Le Chat, J.‑F. Pradat‑Peyre, P. Bede : “Machine Learning in Amyotrophic Lateral Sclerosis: Achievements, Pitfalls, and Future Directions”, Frontiers in Neuroscience, vol. 13, pp. 135, (Frontiers) (2019)