PhD topic: Efficient management of memory and storage for CRDTs

PhD Student: Saalik Hatia
Advisor: Marc Shapiro (Sorbonne-Université & Inria)


The main output of this PhD will be the design and implementation of a highly available, geo-replicated, industry-grade file system that combines the best features of Antidote and RingFS. Antidote is a CRDT-based database designed for geo-replication and high availability. Scality's RingFS is a battle-proven, high-throughput, failure-resilient file system. Managing memory and storage is one of the main challenges. Currently, Antidote stores its updates in an unbounded journal. In this project, we instead need to bound the size of the journal and to introduce persistent storage of checkpoints and of file system state; for performance reasons, we must also avoid redundant copying. To maintain correctness, we will identify the invariants of each individual component and the global invariants that link them together. For instance, a crucial invariant is that every version that may be needed by the file system must persist in the storage layers. We will then write pseudocode of the system and apply verification tools to guarantee that the invariants hold. This will be combined with a practical implementation that conforms to the pseudocode, which we will validate experimentally.

This project is a particular instance of a more general problem, which we plan to address: the efficient management of memory and storage for CRDTs in general. CRDTs are more challenging than classical objects because an update is not restricted to an assignment but can be an arbitrarily complex operation; moreover, concurrent updates are allowed and must be merged, which requires managing multiple versions that are not totally ordered. Managing this complexity while avoiding redundant copies is especially challenging.
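
To make the journal-bounding requirement and the persistence invariant concrete, here is a minimal sketch in Python. It is not Antidote's or RingFS's design: the class `BoundedJournal`, its fields, and the parameter `oldest_needed_ts` are hypothetical names chosen for illustration. It also makes a strong simplifying assumption, namely a single replica with totally ordered integer timestamps and a counter as the only object, so it sidesteps the concurrent, partially ordered versions that make the general problem hard.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedJournal:
    """Hypothetical journal of counter increments backed by one checkpoint.

    Assumption: a single replica with totally ordered integer timestamps.
    """
    checkpoint_ts: int = 0       # every update with ts <= checkpoint_ts is folded in
    checkpoint_value: int = 0    # materialised counter value at checkpoint_ts
    entries: list = field(default_factory=list)  # (ts, delta), ts strictly increasing

    def append(self, ts: int, delta: int) -> None:
        """Record an update; timestamps must move strictly forward."""
        last = self.entries[-1][0] if self.entries else self.checkpoint_ts
        if ts <= last:
            raise ValueError("timestamps must increase")
        self.entries.append((ts, delta))

    def read(self, ts: int) -> int:
        """Rebuild the version visible at ts from the checkpoint plus the journal."""
        if ts < self.checkpoint_ts:
            raise LookupError("version older than the checkpoint was truncated")
        return self.checkpoint_value + sum(d for t, d in self.entries if t <= ts)

    def truncate(self, oldest_needed_ts: int) -> None:
        """Fold entries up to oldest_needed_ts into the checkpoint, then drop them.

        Invariant kept: every version at or after oldest_needed_ts stays readable.
        """
        if oldest_needed_ts <= self.checkpoint_ts:
            return  # nothing new to fold in
        self.checkpoint_value = self.read(oldest_needed_ts)
        self.checkpoint_ts = oldest_needed_ts
        self.entries = [(t, d) for t, d in self.entries if t > oldest_needed_ts]
```

In this sketch, calling `truncate(2)` after appending updates at timestamps 1, 2 and 4 leaves a single journal entry, and reads at timestamps 2 and later return the same values as before truncation, while a read at timestamp 1 is rejected. Choosing `oldest_needed_ts` safely is exactly where the invariant matters: it must not exceed the oldest version the file system may still request.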
Marc.Shapiro =at=