LIP6 1998/016
- Thesis
Combinaison de Classifieurs Statistiques, Application à la Prédiction de la Structure Secondaire des Protéines - Y. Guermeur
- 164 pages - 04/29/1998 - document in English - http://www.lip6.fr/lip6/reports/1998/lip6.1998.016.ps.tar.gz - 503 KB
- Contact : Yann.Guermeur (at) lip6.fr
- Former theme : APA
- Keywords : Ensemble methods, complexity control, VC dimension, discrimination, Bayes error rate estimation, hierarchical models, protein secondary structure prediction, stacked regression, hybrid systems
- Publisher : Valerie.Mangin (at) lip6.fr
Model combination has recently been at the origin of significant improvements in the field of statistical learning, both for regression and pattern recognition tasks. However, fundamental questions remain virtually untackled. Few criteria have been developed to motivate the choice of a specific method, and no independent result has been derived in the field of discrimination.
This dissertation deals with one of the most commonly used combination techniques: linear regression. We first characterize the regularizing effect of the "stacked regression" method introduced by Breiman. We then study the application of the multivariate linear regression model to the combination of discriminant experts whose outputs are estimates of the class posterior probabilities. This question is considered first from the point of view of optimization and then from that of complexity control. The latter point involves the computation of generalized Vapnik-Chervonenkis dimensions.
The study is followed by the description of a nonparametric method for estimating the Bayes error rate.
Our ensemble method is assessed on an open problem in biological sequence processing: predicting the secondary structure of globular proteins. To perform this discrimination task, we introduce a hierarchical and modular approach in which combination is used at an intermediate level.
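As a rough illustration of the kind of combiner the abstract refers to, the sketch below fits a multivariate linear regression that maps the concatenated class posterior estimates of several experts to one-hot class targets. It is not the method of the dissertation: it uses plain least squares, without the cross-validated level-0 predictions or the constraints of stacked regression, and the function names (`fit_combiner`, `combine`, `make_toy_experts`) and the synthetic three-class data are assumptions made purely for the example.

```python
import numpy as np


def fit_combiner(expert_probs, labels, n_classes):
    """Least-squares fit of a linear combiner (illustrative only).

    expert_probs: list of (n_samples, n_classes) arrays, one per expert,
                  each row being an estimate of the class posteriors.
    labels:       integer class labels, shape (n_samples,).
    Returns a weight matrix of shape (n_experts * n_classes, n_classes).
    """
    X = np.hstack(expert_probs)        # concatenate the expert outputs
    Y = np.eye(n_classes)[labels]      # one-hot regression targets
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def combine(expert_probs, W):
    """Apply the fitted linear combiner to the expert outputs."""
    return np.hstack(expert_probs) @ W


def make_toy_experts(n=600, n_classes=3, n_experts=3, seed=0):
    """Hypothetical experts: softmax of noisy one-hot scores."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_classes, size=n)
    onehot = np.eye(n_classes)[labels]
    experts = []
    for noise in np.linspace(0.8, 1.6, n_experts):
        scores = onehot + noise * rng.normal(size=onehot.shape)
        e = np.exp(scores - scores.max(axis=1, keepdims=True))
        experts.append(e / e.sum(axis=1, keepdims=True))
    return experts, labels


if __name__ == "__main__":
    experts, labels = make_toy_experts()
    W = fit_combiner(experts, labels, n_classes=3)
    combined = combine(experts, W)
    for k, e in enumerate(experts):
        print(f"expert {k} accuracy:", np.mean(e.argmax(axis=1) == labels))
    print("combined accuracy:", np.mean(combined.argmax(axis=1) == labels))
```

Because the weights here are unconstrained, the combined outputs are not guaranteed to lie in [0, 1], and fitting and evaluating on the same samples gives an optimistic picture. These are the kinds of optimization and complexity-control issues the abstract says the dissertation addresses.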