SAULPIC David
Supervision : Christoph DÜRR
Co-supervision : COHEN-ADDAD Vincent
Approximation Algorithms and Sketches for Clustering
This thesis presents contributions to the theoretical study of clustering problems.
The broad objective of these problems is to partition a data set into groups, such that data in the same group are similar. The k-medians and k-means problems are common ways of formally modeling this problem, which we focus on in this thesis.
In the first part of the thesis, we show the first linear time approximation scheme when the input is in a constant dimensional Euclidean space (or, more generally, doubling space), i.e., a very fast algorithm that computes a very good approximation of the optimal solution. We extend the techniques used to treat the problem from the point of view of differential privacy.
In the second part, we aim at designing algorithm to compute simplified representations of the input, which preserve the structure of the problem: we introduce several techniques to reduce the number of input data, while ensuring that solving the problem after the reduction is almost equivalent to solving it on the original set. We also show that in several cases, our techniques are optimal.
In the particular case of Euclidean spaces, a different way to simplify the input is to reduce the dimension (preserving the structure of the input in the same way). We present the first deterministic algorithm to achieve near-optimal dimension. Finally, we show how to use those compression schemes in order to answer fast statistical queries.
Defence : 09/13/2022
Jury members :
Robert Krauthgamer, Weizmann Institute [Rapporteur]
Laurent Viennot, INRIA [Rapporteur]
Cristina Bazgan, Université Paris Dauphine
Monika Henzinger, Universität Wien
Chris Schwiegelshohn, Aarhus University
Vincent Cohen-Addad, Google
Christoph Dürr, Sorbonne Université
2019-2022 Publications
-
2022
- D. Saulpic : “Algorithmes d’Approximation et Sketches Pour les Problèmes de Clustering”, thesis, phd defence 09/13/2022, supervision Dürr, Christoph, co-supervision : Cohen-addad, Vincent (2022)
- Th. Ehrhard, S. Attias, E. Bampis, V. Cohen‑Addad, B. Escoffier, C. Mathieu, F. Pascual, A. Pass‑Lanneau, D. Saulpic : “Découpage électoral des circonscriptions législatives en France: déséquilibres démographiques et contraintes territoriales”, Revue Française de Science Politique, vol. Vol. 72 (3), pp. 333-364, (Presses de Sciences Po) (2022)
- V. Cohen‑Addad, A. Epasto, S. Lattanzi, V. Mirrokni, A. Munoz Medina, D. Saulpic, Ch. Schwiegelshohn, S. Vassilvitskii : “Scalable Differentially Private Clustering via Hierarchically Separated Trees”, KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington DC, United States, pp. 221-230, (ACM), (ISBN: 978-1-4503-9385-0) (2022)
- V. Cohen‑Addad, F. Mallmann‑Trenn, D. Saulpic : “A Massively Parallel Modularity-Maximizing Algorithm with Provable Guarantees”, PODC'22: Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing, Salerno, Italy, pp. 356-365, (ACM) (2022)
- V. Cohen‑Addad, F. Mallmann‑Trenn, D. Saulpic : “Community Recovery in the Degree-Heterogeneous Stochastic Block Model”, Proceedings of Machine Learning Research, vol. 178, Londres, United Kingdom, pp. 1662-1692, (PMLR) (2022)
- V. Cohen‑Addad, K. Larsen, D. Saulpic, Ch. Schwiegelshohn : “Towards Optimal Lower Bounds for k-median and k-means Coresets”, Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022, Rome, Italy, pp. 1038-1051, (Association for Computing Machinery), (ISBN: 9781450392648) (2022)
- V. Cohen‑Addad, A. Gupta, L. Hu, H. Oh, D. Saulpic : “An Improved Local Search Algorithm for
k -Median”, ACM-SIAM Symposium on Discrete Algorithms (SODA22), Alexandria (virtual event), VA, United States, pp. 1556-1612, (Society for Industrial and Applied Mathematics), (ISBN: 978-1-61197-707-3) (2022)
-
2021
- V. Cohen‑Addad, A. Feldmann, D. Saulpic : “Near-linear Time Approximation Schemes for Clustering in Doubling Metrics”, Journal of the ACM (JACM), vol. 68 (6), pp. 1-34, (Association for Computing Machinery) (2021)
- A. Feldmann, D. Saulpic : “Polynomial time approximation schemes for clustering in low highway dimension graphs”, Journal of Computer and System Sciences, vol. 122, pp. 72-93, (Elsevier) (2021)
- V. Cohen‑Addad, D. Saulpic, Ch. Schwiegelshohn : “A new coreset framework for clustering”, STOC 2021: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, Rome ( Virtual ), Italy, pp. 169-182, (ACM) (2021)
- V. Cohen‑Addad, D. Saulpic, Ch. Schwiegelshohn : “Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces”, Advances in Neural Information Processing Systems, vol. 34, Virtual, France, pp. 21085-21098, (Curran Associates, Inc.), (ISBN: 9781713845393) (2021)
-
2020
- A. Feldmann, D. Saulpic : “Polynomial Time Approximation Schemes for Clustering in Low Highway Dimension Graphs”, 28th Annual European Symposium on Algorithms (ESA 2020), vol. 173, Leibniz International Proceedings in Informatics (LIPIcs), Pisa, Italy, pp. 46:1-46:22, (Schloss Dagstuhl--Leibniz-Zentrum für Informatik) (2020)
-
2019
- V. Cohen‑Addad, N. Hjuler, N. Parotsidis, D. Saulpic, Ch. Schwiegelshohn : “Fully Dynamic Consistent Facility Location”, NeurIPS'19 - 33rd Conference on Neural Information Processing Systems, Vancouver, United States (2019)
- V. Cohen‑Addad, A. Feldmann, D. Saulpic : “Near-linear time approximations schemes for clustering in doubling metrics”, FOCS'19, Baltimore, United States (2019)