FUGUET TORTOLERO Cesar

PhD student at Sorbonne University
Team : ALSOC
https://lip6.fr/Cesar.Fuguet-Tortolero

Supervision : Alain GREINER

Introduction of fault-tolerance mechanisms for permanent failures in coherent shared-memory many-core architectures

The ever-increasing performance demands of applications such as cryptography, scientific simulation, network packet dispatching, signal processing, and even general-purpose computing have made many-core architectures a necessary trend in processor design. These architectures can have hundreds or thousands of processor cores, so as to provide high computational throughput with reasonable power consumption.
However, their high transistor density makes many-core architectures more prone to hardware failures. Fabrication process variability and the stress factors on transistors are both increasing, which impacts manufacturing yield as well as lifetime. A potential solution to this problem is the introduction of fault-tolerance mechanisms that allow the processor to function in a degraded mode despite the presence of defective internal components.
We propose a complete, in-the-field, reconfiguration-based recovery mechanism for permanent failures in shared-memory many-core processors. This mechanism is based on firmware (stored in distributed on-chip read-only memories) that the internal processor cores execute at each hardware reset, without any external intervention. It consists of distributed software procedures that locate the faulty components (cores, memory banks, and network-on-chip routers), reconfigure the hardware architecture, and provide a description of the functional hardware infrastructure to the operating system.
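As a rough illustration of the last step only (describing the functional hardware to the operating system), the following minimal sketch filters self-test results into a list of usable components. It is not the thesis firmware: the real procedures are distributed and run on the cores themselves, whereas this is a centralized Python model. The Cluster type, the 2x2 mesh, the field names, and the rule that a cluster behind a faulty router is treated as unreachable are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Hypothetical per-cluster self-test results (names are assumptions)."""
    x: int
    y: int
    cores_ok: List[bool]  # one flag per core
    memory_ok: bool
    router_ok: bool


def build_hardware_description(clusters):
    """Return a description that lists only the functional components."""
    description = {"cores": [], "memory_banks": [], "bypassed_routers": []}
    for c in clusters:
        if not c.router_ok:
            # Assumption: a cluster behind a faulty router is unreachable,
            # so none of its cores or memory banks are reported.
            description["bypassed_routers"].append((c.x, c.y))
            continue
        for core_id, ok in enumerate(c.cores_ok):
            if ok:
                description["cores"].append((c.x, c.y, core_id))
        if c.memory_ok:
            description["memory_banks"].append((c.x, c.y))
    return description


# Hypothetical 2x2 mesh with one faulty core, one faulty memory bank,
# and one faulty router.
mesh = [
    Cluster(0, 0, [True, True], True, True),
    Cluster(0, 1, [True, False], True, True),
    Cluster(1, 0, [True, True], False, True),
    Cluster(1, 1, [True, True], True, False),
]
desc = build_hardware_description(mesh)
```

In this model the operating system would only ever see the entries of desc, which is one simple way to picture a processor running in degraded mode.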
Our proposal is evaluated using a cycle-accurate SystemC virtual prototype of an existing many-core architecture. We evaluate both its latency and its silicon cost.

Defence : 11/25/2015

Jury members :

M. Philippe Coussy, Lab-STICC, Univ Bretagne-Sud [Rapporteur]
M. Gilles Sassatelli, LIRMM, Univ Montpellier 2 [Rapporteur]
M. Fabien Clermidy, CEA
Mme. Agnès Fritsch, Thales Communications & Security
M. Lionel Lacassagne, LIP6, Univ Paris 6
M. Alain Greiner, LIP6, Univ Paris 6

Departure date : 11/25/2015

2014-2021 Publications