Staff directory

ROMERA Thomas

PhD Student at Sorbonne University
Team: ALSOC
https://scholar.google.com/citations?user=3a-ZCD8AAAAJ&hl=en

Supervision: Lionel LACASSAGNE
Co-supervision: Quentin MEUNIER

Algorithm-Architecture Adaptation for Optical Flow on Embedded GPUs

Over the past two decades, commercial cameras have seen major advances in image and video quality, mainly thanks to technological progress in various components such as optics, digital storage, image stabilization, circuitry and digital sensors. The most notable advances have been in digital light sensors. To further improve the quality of camera images, innovations in image processing and computer vision are needed.

One of the major algorithmic building blocks in this field is the estimation of pixel motion between video frames, known as optical flow. This block adds temporal information between the frames of a video sequence, which can be used for stabilization, denoising, deblurring, or resolution enhancement. Most optical flow estimation algorithms produce high-quality results, but their long processing times prevent real-time execution on embedded platforms.

The work carried out in this thesis focuses on the optimization and efficient implementation of optical flow estimation algorithms on embedded graphics processors (GPUs). Two iterative algorithms have been studied: the TV-L1 method and the Horn-Schunck method. The primary objective is to achieve real-time processing (less than 40 ms per frame, i.e. at least 25 frames per second) on low-power platforms, while keeping image resolution and flow estimation quality acceptable for the intended applications.
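The core of the Horn-Schunck method is a fixed-point iteration: each pixel's flow (u, v) is pulled toward the average flow of its neighbours (smoothness) and then corrected along the image gradient so that the brightness-constancy constraint I_x·u + I_y·v + I_t = 0 is better satisfied, with alpha weighting smoothness against that constraint. The following pure-Python sketch is a minimal, unoptimized reference, not the thesis implementation. It rests on simplifying assumptions: derivatives are central differences on the first frame only, the neighbour average uses a 4-neighbour stencil instead of the classic weighted 3x3 kernel, borders are clamped, and the name horn_schunck and its defaults (alpha=1.0, 100 iterations) are hypothetical.

```python
def horn_schunck(f1, f2, alpha=1.0, iterations=100):
    """Hypothetical minimal Horn-Schunck sketch (not the thesis code).

    f1, f2: two grayscale frames as lists of rows of floats, same shape.
    Returns (u, v): horizontal and vertical flow fields.
    """
    h, w = len(f1), len(f1[0])

    def at(img, y, x):
        # Clamp coordinates so border pixels reuse their nearest valid neighbour.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Simplification: spatial derivatives from central differences on f1 only.
    ix = [[(at(f1, y, x + 1) - at(f1, y, x - 1)) / 2 for x in range(w)] for y in range(h)]
    iy = [[(at(f1, y + 1, x) - at(f1, y - 1, x)) / 2 for x in range(w)] for y in range(h)]
    it = [[f2[y][x] - f1[y][x] for x in range(w)] for y in range(h)]

    u = [[0.0] * w for _ in range(h)]
    v = [[0.0] * w for _ in range(h)]
    a2 = alpha * alpha
    for _ in range(iterations):
        # Jacobi step: every update reads only the previous iterate (u, v).
        new_u = [[0.0] * w for _ in range(h)]
        new_v = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # Local flow averages over the 4-neighbour stencil.
                ub = (at(u, y, x - 1) + at(u, y, x + 1) + at(u, y - 1, x) + at(u, y + 1, x)) / 4
                vb = (at(v, y, x - 1) + at(v, y, x + 1) + at(v, y - 1, x) + at(v, y + 1, x)) / 4
                gx, gy = ix[y][x], iy[y][x]
                # Brightness-constancy residual divided by alpha^2 + |grad I|^2.
                t = (gx * ub + gy * vb + it[y][x]) / (a2 + gx * gx + gy * gy)
                new_u[y][x] = ub - gx * t
                new_v[y][x] = vb - gy * t
        u, v = new_u, new_v
    return u, v
```

For example, a horizontal intensity ramp shifted one pixel to the right between the two frames yields u close to 1 and v equal to 0 away from the borders. Because each Jacobi step reads only the previous iterate, every pixel update is independent, which is what makes this kind of iteration a natural fit for one-thread-per-pixel GPU kernels.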

Various optimization strategies have been explored. High-level algorithmic transformations, such as operator fusion and pipelining, have been implemented to maximize data reuse and improve spatial and temporal locality. GPU-specific low-level optimizations have also been applied, including the use of vector instructions and vector data types and efficient management of memory accesses. Finally, the impact of the floating-point representation (single precision versus half precision) has been investigated.
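Operator fusion merges consecutive processing steps into a single pass, so that intermediate values stay in registers instead of being written to and read back from memory. The pipeline below (a brightness gain followed by a clamp, both pointwise) is a hypothetical illustration, not one of the thesis kernels, and the function names are made up for the example: the unfused version materializes a full intermediate buffer, the fused version does not, and both return the same values.

```python
def scale_then_clamp_unfused(pixels, gain, lo, hi):
    # Pass 1: write a full-size intermediate buffer.
    scaled = [p * gain for p in pixels]
    # Pass 2: read the intermediate buffer back and clamp each value.
    return [min(max(s, lo), hi) for s in scaled]


def scale_then_clamp_fused(pixels, gain, lo, hi):
    # Single pass: the scaled value lives only in a local variable
    # (a register on a GPU) and is never stored to an intermediate array.
    out = []
    for p in pixels:
        s = p * gain
        out.append(min(max(s, lo), hi))
    return out


pixels = [0.0, 0.5, 1.0, 2.0, -1.0]
print(scale_then_clamp_unfused(pixels, 1.5, 0.0, 1.0))  # [0.0, 0.75, 1.0, 1.0, 0.0]
print(scale_then_clamp_fused(pixels, 1.5, 0.0, 1.0))    # [0.0, 0.75, 1.0, 1.0, 0.0]
```

For n pixels, the unfused version performs 4n array accesses (read input, write intermediate, read intermediate, write output) against 2n for the fused one, so a memory-bound kernel can roughly halve its memory traffic with this transformation.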

The implementations have been evaluated in terms of execution time, power consumption, and optical flow accuracy. Beyond the speedups, which enable real-time processing of near-4K images on embedded platforms, the half-precision implementations deliver higher-quality results than the single-precision ones within the same amount of time.
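One general property of half precision relevant on embedded GPUs is storage: an IEEE 754 binary16 value occupies 2 bytes instead of 4 for binary32, so a memory-bound kernel moves half as much data per pixel, at the cost of only about three significant decimal digits. The sketch below uses Python's struct formats 'e' (binary16) and 'f' (binary32) to show both effects. The 3840x2160 frame size and the two-component (u, v) flow field are illustrative assumptions standing in for "near-4K", not figures from the thesis, and the helper names are hypothetical.

```python
import struct


def round_trip(value, fmt):
    """Store a float in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def flow_field_bytes(width, height, fmt):
    """Bytes needed for a two-component (u, v) flow field stored in `fmt`."""
    return width * height * 2 * struct.calcsize(fmt)


# '<e' is IEEE 754 half precision (binary16), '<f' is single precision (binary32).
half = flow_field_bytes(3840, 2160, "<e")
single = flow_field_bytes(3840, 2160, "<f")
print(half, single)              # 33177600 66355200
print(round_trip(0.1, "<e"))     # 0.0999755859375
print(round_trip(0.1, "<f"))     # closer to 0.1, but still not exact
```

Halving the footprint of every buffer also halves the bandwidth needed per iteration of an iterative solver, which is one reason reduced precision can pay off on bandwidth-limited embedded GPUs.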

This work highlights the importance of GPU-specific optimizations for computer vision algorithms, as well as the benefits of reduced-precision floating-point numbers. To the best of our knowledge, it is the first concrete example of reducing floating-point precision leading to higher-quality results.

PhD defence: 10/13/2023

Jury members:

David DEFOUR, Professor, LAMPS, Université de Perpignan Via Domitia [Reviewer]
Claude TADONKI, Research Scientist, CRI, Mines ParisTech [Reviewer]
Roselyne CHOTIN, Associate Professor, LIP6, Sorbonne Université
Olivier SENTIEYS, Professor, IRISA, INRIA, Université de Rennes
Daniel ETIEMBLE, Emeritus Professor, LRI, Université Paris-Saclay
Patrice MENARD, Technical Director, LERITY-Alcen
Lionel LACASSAGNE, Professor, LIP6, Sorbonne Université
Quentin MEUNIER, Associate Professor, LIP6, Sorbonne Université

Departure date: 12/31/2023

2018-2023 Publications

2023
- Th. Romera : “Adéquation algorithme architecture pour flot optique sur GPU embarqué” [Algorithm-architecture adaptation for optical flow on embedded GPUs], PhD thesis, defended 10/13/2023, supervised by Lionel Lacassagne, co-supervised by Quentin Meunier (2023)
- Th. Romera, A. Petreto, F. Lemaitre, M. Bouyer, Q. L. Meunier, L. Lacassagne, D. Etiemble : “Optical flow algorithms optimized for speed, energy and accuracy on embedded GPUs”, Journal of Real-Time Image Processing, vol. 20 (2), pp. 32, (Springer Verlag) (2023)
2021
- Th. Romera, A. Petreto, F. Lemaitre, M. Bouyer, Q. Meunier, L. Lacassagne : “Implementations Impact on Iterative Image Processing for Embedded GPU”, European Signal Processing Conference (EUSIPCO), Dublin, Ireland (2021)
2020
- A. Petreto, Th. Romera, F. Lemaitre, M. Bouyer, B. Gaillard, P. Menard, Q. Meunier, L. Lacassagne : “Real-time embedded video denoiser prototype”, 9th International Symposium - Optronics in Defense and Security (Optro), Paris, France (2020)
2019
- Th. Romera, A. Brière, J. Denoulet : “Dynamically Reconfigurable RF-NoC with Distance-Aware Routing Algorithm”, 14th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC 2019), York, United Kingdom (2019)
- A. Petreto, Th. Romera, F. Lemaitre, I. Masliah, B. Gaillard, M. Bouyer, Q. Meunier, L. Lacassagne : “Débruitage temps réel embarqué pour vidéos fortement bruitées” [Real-time embedded denoising for heavily noisy videos], COMPAS 2019, Anglet, France (2019)
- A. Petreto, Th. Romera, I. Masliah, B. Gaillard, M. Bouyer, Q. Meunier, L. Lacassagne, F. Lemaitre : “A New Real-Time Embedded Video Denoising Algorithm”, DASIP 2019 - The Conference on Design and Architectures for Signal and Image Processing, Montréal, Canada (2019)
2018
- A. Petreto, A. Hennequin, Th. Koehler, Th. Romera, Y. Fargeix, B. Gaillard, M. Bouyer, Q. Meunier, L. Lacassagne : “Energy and Execution Time Comparison of Optical Flow Algorithms on SIMD and GPU Architectures”, Conference on Design and Architectures for Signal and Image Processing (DASIP 2018), Porto, Portugal (2018)
- N. Rambaux, D. Galayko, G. Guignan, J. Vaubaillon, L. Lacassagne, Ph. Keckhut, A. Levasseur-Regourd, A. Hauchecorne, M. Birlan, G. Augarde, S. Barnier, S. Ben Kemmoum, A. Bigot, P. Boisse, M. Capderou, A. Chu, F. Colas, F. Deshours, Y. Fargeix, A. Hennequin, Th. Koehler, M. Lumbroso, J.-F. Mariscal, D. Portela-Moreira, J. Raffard, J.-L. Rault, Th. Romera, C. Tob, B. Zanda : “METEORIX: a cubesat mission dedicated to the detection of meteors”, COSPAR 2018, 42nd Assembly, Pasadena, United States (2018)
- A. Petreto, A. Hennequin, Th. Koehler, Th. Romera, Y. Fargeix, B. Gaillard, M. Bouyer, Q. Meunier, L. Lacassagne : “Comparaison de la consommation énergétique et du temps d’exécution d’un algorithme de traitement d’images optimisé sur des architectures SIMD et GPU” [Comparison of the energy consumption and execution time of an optimized image processing algorithm on SIMD and GPU architectures], Conférence d’informatique en Parallélisme, Architecture et Système (COMPAS 2018), Toulouse, France (2018)
- A. Petreto, A. Hennequin, Th. Koehler, Th. Romera, Y. Fargeix, B. Gaillard, M. Bouyer, Q. Meunier, L. Lacassagne : “Comparaison de la consommation énergétique et du temps d’exécution d’un algorithme de traitement d’images optimisé sur des architectures SIMD et GPU” [Comparison of the energy consumption and execution time of an optimized image processing algorithm on SIMD and GPU architectures], GdR SOC2, Paris, France (2018)
- A. Brière, Th. Romera, J. Denoulet : “Modélisation et évaluation d’une architecture many-coeurs basée sur un réseau sur puce RF” [Modeling and evaluation of a many-core architecture based on an RF network-on-chip], 13th colloquium of the CNRS GDR SOC-SIP, Paris, France (2018)