GOH Hanlin
Supervision: Matthieu CORD
Co-supervision: LIM Joo-Hwee
Learning Deep Visual Representations
Recent advances in deep learning and visual information processing present an opportunity to unite the two fields. These complementary areas combine to tackle the problem of classifying images into their semantic categories. Deep learning brings learning and representational capabilities to a visual processing model adapted for image classification. This thesis addresses a series of problems that culminate in the proposal of learning deep visual representations for image classification.
The problem of deep learning is tackled on two fronts. The first is the unsupervised learning of latent representations from input data. The main focus is the integration of prior knowledge into the learning of restricted Boltzmann machines (RBMs) through regularization. Regularizers are proposed to induce sparsity, selectivity and topographic organization in the coding, improving discrimination and invariance. The second direction introduces the notion of gradually transitioning from unsupervised layer-wise learning to supervised deep learning. This is achieved by integrating bottom-up information with top-down signals. Two novel implementations supporting this notion are explored. The first method uses top-down regularization to train a deep network of RBMs. The second combines predictive and reconstructive loss functions to optimize a stack of encoder-decoder networks.
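As a rough illustration of the regularization idea, the sketch below penalizes the hidden activation probabilities of an RBM toward a target activation level, both across units within an example (sparsity) and across examples for each unit (selectivity). The cross-entropy penalty and the 0.1 target are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def sparsity_selectivity_penalty(H, target=0.1):
    """Penalty encouraging sparse and selective hidden activations.

    H      : hidden activation probabilities, shape (batch, units), in (0, 1)
    target : desired mean activation level (assumed value for illustration)
    """
    eps = 1e-8
    sparsity = H.mean(axis=1)     # per-example mean across units
    selectivity = H.mean(axis=0)  # per-unit mean across examples

    def ce(p, t):
        # cross-entropy between observed rates p and the target rate t
        return -(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps)).mean()

    return ce(sparsity, target) + ce(selectivity, target)
```

Adding such a penalty to the RBM's training objective biases the learned code toward few active units per example, each responding to few examples, which is the behavior the regularizers above aim for.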
The proposed deep learning techniques are applied to the image classification problem. The bag-of-words model is adopted for its strengths in image modeling through local image descriptors and spatial pooling schemes. Deep learning with spatial aggregation is used to learn a hierarchical visual dictionary that encodes image descriptors into mid-level representations. This method achieves leading image classification performance on object and scene datasets. The learned dictionaries are diverse and non-redundant, and inference is fast. Building on this, the subsequent pooling step is further optimized by introducing a differentiable pooling parameterization and applying the error backpropagation algorithm.
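One way to picture a differentiable pooling parameterization is a softmax-weighted pooling whose scalar parameter interpolates between average and max pooling, so the pooling behavior itself can be tuned by gradient descent. This particular parameterization is an assumed example for illustration, not necessarily the one used in the thesis.

```python
import numpy as np

def smooth_pool(X, beta):
    """Softmax-weighted pooling over one spatial region.

    X    : descriptor codes in the region, shape (n_descriptors, dim)
    beta : pooling parameter; beta = 0 gives average pooling,
           large beta approaches max pooling (per dimension)
    """
    W = np.exp(beta * X)                    # unnormalized softmax weights
    W /= W.sum(axis=0, keepdims=True)       # normalize over the region
    return (W * X).sum(axis=0)              # weighted sum of codes
```

Because the pooled output is smooth in `beta`, the parameter can be updated by backpropagation along with the rest of the model, which is the point of making the pooling step differentiable.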
This thesis represents one of the first attempts to synthesize deep learning and the bag-of-words model. This union results in many challenging research problems, leaving much room for further study in this area.
Defence: 07/12/2013
Jury members:
Frédéric Jurie - Université de Caen Basse-Normandie [rapporteur]
Alain Rakotomamonjy - Université de Rouen [rapporteur]
Yann LeCun - New York University, USA
Patrick Gallinari - Université Pierre et Marie Curie
Joo-Hwee Lim - Institute for Infocomm Research, Singapore
Matthieu Cord - Université Pierre et Marie Curie
Nicolas Thome - Université Pierre et Marie Curie
2010-2014 Publications
2014
- H. Goh, N. Thome, M. Cord, J.‑H. Lim : “Learning Deep Hierarchical Visual Feature Coding”, IEEE Transactions on Neural Networks and Learning Systems, vol. 25 (12), pp. 2212-2225, (IEEE) (2014)
2013
- H. Goh : “Learning Deep Visual Representations”, Ph.D. thesis, defended 07/12/2013, supervised by Matthieu Cord, co-supervised by Joo-Hwee Lim (2013)
- H. Goh, N. Thome, M. Cord, J.‑H. Lim : “Top-Down Regularization of Deep Belief Networks”, Advances in Neural Information Processing Systems 26, Lake Tahoe, United States, pp. 1878-1886 (2013)
2012
- H. Goh, N. Thome, M. Cord, J.‑H. Lim : “Unsupervised and supervised visual codes with restricted Boltzmann machines”, 12th European conference on Computer Vision, vol. 7576, Lecture Notes in Computer Science, Florence, Italy, pp. 298-311 (2012)
2011
- H. Goh, K. Łukasz, J.‑H. Lim, N. Thome, M. Cord : “Learning Invariant Color Features with Sparse Topographic Restricted Boltzmann Machines”, ICIP 2011 - IEEE International Conference on Image Processing, Brussels, Belgium, pp. 1241-1244 (2011)
2010
- H. Goh, N. Thome, M. Cord : “Biasing Restricted Boltzmann Machines to Manipulate Latent Selectivity and Sparsity”, NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning, Vancouver, Canada (2010)