ROMERA Thomas

Supervision : Lionel LACASSAGNE

Co-supervision : Quentin MEUNIER

Algorithm-Architecture Adaptation for Optical Flow on Embedded GPUs

Over the past two decades, commercial cameras have seen major advances in image and video quality, mainly thanks to technological progress in various components such as optics, digital storage, image stabilization, circuitry and digital sensors. The most notable advances have been in digital light sensors. To further improve the quality of camera images, innovations in image processing and computer vision are needed.
One of the major algorithmic blocks in this field is the estimation of pixel motion in a video, also known as optical flow. This block provides temporal information between the frames of a video sequence, which can be used to stabilize, denoise, or deblur the video, or to increase its resolution. Most optical flow estimation algorithms produce high-quality estimates, but their long processing times limit real-time implementation on embedded platforms.
The work carried out in this thesis focuses on the optimization and efficient implementation of optical flow estimation algorithms on embedded graphics processors. Two iterative algorithms have been studied: the TV-L1 estimation method and the Horn-Schunck estimation method. The primary objective of this work is to achieve real-time processing (less than 40 ms per frame) on low-power platforms, while maintaining acceptable image resolution and flow estimation quality for the intended applications.
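To make the iterative nature of these methods concrete, here is a minimal NumPy sketch of the classical Horn-Schunck update. It is a plain CPU illustration, not the GPU implementation developed in the thesis; the function names, the smoothness weight `alpha`, the iteration count, and the synthetic test pattern are all assumptions chosen for the example.

```python
import numpy as np


def neighbor_average(f):
    """Average of the 4 direct neighbors, replicating values at the borders."""
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])


def horn_schunck(frame0, frame1, alpha=1.0, n_iters=50):
    """Estimate the flow (u, v) from frame0 to frame1 with Jacobi-style iterations.

    alpha (smoothness weight) and n_iters are illustrative values (assumptions).
    """
    f0 = np.asarray(frame0, dtype=np.float32)
    f1 = np.asarray(frame1, dtype=np.float32)

    # Spatial derivatives on the mean frame, temporal derivative between frames.
    mean = 0.5 * (f0 + f1)
    iy, ix = np.gradient(mean)  # axis 0 is y (rows), axis 1 is x (columns)
    it = f1 - f0

    denom = alpha ** 2 + ix ** 2 + iy ** 2
    u = np.zeros_like(f0)
    v = np.zeros_like(f0)
    for _ in range(n_iters):
        u_avg = neighbor_average(u)
        v_avg = neighbor_average(v)
        # Brightness-constancy residual of the averaged flow, scaled by the
        # regularized gradient magnitude.
        residual = (ix * u_avg + iy * v_avg + it) / denom
        u = u_avg - ix * residual
        v = v_avg - iy * residual
    return u, v


# Synthetic check: a pattern that varies only along x, shifted right by one pixel.
x = np.arange(64)
row = 100.0 * np.sin(2.0 * np.pi * x / 32.0)
frame0 = np.tile(row, (64, 1))
frame1 = np.roll(frame0, 1, axis=1)
u, v = horn_schunck(frame0, frame1)
```

On this synthetic pair the horizontal component `u` should come out close to +1 pixel and the vertical component `v` should stay at zero. Every iteration sweeps over the whole image several times, which is why data reuse and memory traffic dominate the cost on real frames.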
Various optimization strategies have been explored. High-level algorithmic transformations, such as operator fusion and pipelining, have been implemented to maximize data reuse and improve spatial and temporal locality. In addition, GPU-specific low-level optimizations have been applied, including the use of vector instructions and vector data types, as well as efficient management of memory accesses. The impact of the floating-point format (single precision versus half precision) has also been investigated.
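As a rough illustration of what the choice of floating-point format implies, the arithmetic below compares the storage needed for one dense flow field in single and half precision, together with the rounding granularity of each format. The 3840 x 2160 frame size is an assumption (the text only says "near-4K"), and the variable names are hypothetical.

```python
import numpy as np

# Assumed frame size; the thesis only states "near-4K".
HEIGHT, WIDTH = 2160, 3840
COMPONENTS = 2  # horizontal and vertical flow per pixel


def flow_field_bytes(dtype):
    """Bytes needed to store one dense flow field in the given format."""
    return HEIGHT * WIDTH * COMPONENTS * np.dtype(dtype).itemsize


bytes_fp32 = flow_field_bytes(np.float32)  # 4 bytes per value
bytes_fp16 = flow_field_bytes(np.float16)  # 2 bytes per value

# Machine epsilon: spacing between 1.0 and the next representable number.
eps_fp32 = float(np.finfo(np.float32).eps)  # 2**-23
eps_fp16 = float(np.finfo(np.float16).eps)  # 2**-10

# Real-time budget quoted in the text: 40 ms per frame.
FRAME_BUDGET_MS = 40.0
frames_per_second = 1000.0 / FRAME_BUDGET_MS
```

Half precision halves the bytes moved per value at the price of a much coarser rounding step, and the 40 ms budget corresponds to 25 frames per second.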
The implementations have been assessed in terms of execution time, power consumption, and optical flow accuracy. Beyond the speedups that enable real-time processing of near-4K images on embedded platforms, the half-precision implementations deliver higher-quality results than the single-precision implementations in the same amount of time.
This work has highlighted the importance of GPU-specific optimizations for computer vision algorithms, as well as the use of floating-point numbers with reduced precision. To the best of our knowledge, this work is a first concrete example of how reducing the precision of floating-point numbers can lead to higher-quality results.

Defence : 10/13/2023

Jury members :

David DEFOUR, Professor, LAMPS, Université de Perpignan Via Domitia [Reviewer]
Claude TADONKI, Research Scientist, CRI, Mines ParisTech [Reviewer]
Roselyne CHOTIN, Associate Professor, LIP6, Sorbonne Université
Olivier SENTIEYS, Professor, IRISA, INRIA, Université de Rennes
Daniel ETIEMBLE, Professor Emeritus, LRI, Université Paris-Saclay
Patrice MENARD, Technical Director, LERITY-Alcen
Lionel LACASSAGNE, Professor, LIP6, Sorbonne Université
Quentin MEUNIER, Associate Professor, LIP6, Sorbonne Université

Departure date : 12/31/2023

2018-2023 Publications