LIP6 2000/004
- Thesis
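Partitionnement maximalement prédictif sous contrainte d'ordre total. Applications aux séquences génétiques
(Maximally predictive partitioning under a total order constraint. Applications to genetic sequences) - L. Guéguen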
- 175 pages - 01/18/2000 - document en - http://www.lip6.fr/lip6/reports/2000/lip6.2000.004.ps.gz - 686 KB
- Contact : gueguen (at) nullccr.jussieu.fr
- Former Theme : SYSDEF
- Keywords : classification, prediction, partition, sequences, genomes
- Publisher : Valerie.Mangin (at) nulllip6.fr
Maximal predictive partitioning under a total order constraint is a classification method that splits a sequence of qualitative objects into homogeneous segments. Homogeneity is defined by criteria based on the notion of prediction. For a given problem, a finite set of predictors is introduced; each predictor defines a prediction, which is an evaluation function on the objects of the sequence. The homogeneity of a segment is measured by the sum of the predictions on its elements, computed with a single predictor chosen to be optimal for that segment. The evaluation of a partition is the sum of the evaluations of its segments. The aim of this approach is to obtain a "good" summary of the sequence, given by the predictors of the segments of a "good" partition, and to reveal a possible structure in that sequence. The relevant number of segments in a good partition must also be determined.
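The evaluation described above can be illustrated with a minimal Python sketch. It assumes that a predictor can be modelled as a plain function from an object to a number and that a partition is given by the start indices of its segments after the first; the names `segment_score`, `partition_score` and `match` are hypothetical and do not come from the thesis.

```python
from typing import Callable, List, Sequence

# Assumption: a predictor is modelled as a function giving a numeric
# prediction for one object of the sequence.
Predictor = Callable[[str], float]


def match(letter: str) -> Predictor:
    """Toy predictor (hypothetical): 1 if the object equals `letter`, else 0."""
    return lambda x: 1 if x == letter else 0


def segment_score(segment: Sequence[str], predictors: List[Predictor]) -> float:
    """Sum of predictions over the segment, using its single best predictor."""
    return max(sum(f(x) for x in segment) for f in predictors)


def partition_score(seq: Sequence[str], starts: List[int],
                    predictors: List[Predictor]) -> float:
    """Sum of segment scores; `starts` lists where segments 2, 3, ... begin."""
    cuts = [0, *starts, len(seq)]
    return sum(segment_score(seq[a:b], predictors)
               for a, b in zip(cuts, cuts[1:]))
```

With the two toy predictors `match("A")` and `match("C")`, the sequence `"AAAACCCC"` scores higher when cut at index 4 than when left whole or cut at index 2, which is the sense in which a "good" partition summarizes the sequence by the predictors of its segments.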
We introduce an algorithm that builds the optimal predictive partitions of a given sequence into i classes, for a specified set of predictors and for every i between 1 and a given maximum. The result of this computation is called a partitioning. It lets us analyse the successive sets of partitions according to their number of segments, and we give some criteria for estimating the "good" number of segments.
The time complexity of this algorithm is linear in the length of the sequence, in the size of the set of predictors, and in the maximal number of segments. Partitioning can therefore be performed on very long sequences, such as biological ones. We present some applications of maximal predictive partitioning to genetic sequences.
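One way to reach this kind of complexity is a dynamic program over the positions of the sequence. The Python sketch below is an illustration under stated assumptions, not necessarily the algorithm of the thesis: predictors are modelled as functions from an object to a number, and the name `best_partition_scores` is hypothetical. For each number of segments and each predictor it keeps the best score of the prefix read so far whose last segment uses that predictor; each new object either extends that last segment or opens a new one.

```python
from typing import Callable, List, Sequence

NEG_INF = float("-inf")


def best_partition_scores(seq: Sequence[str],
                          predictors: List[Callable[[str], float]],
                          max_segments: int) -> List[float]:
    """Return a list whose entry i-1 is the best score over partitions of
    `seq` into i segments, for i = 1 .. min(max_segments, len(seq))."""
    k_max = min(max_segments, len(seq))
    n_pred = len(predictors)
    # score[i][p]: best score of the prefix read so far, split into i + 1
    # segments, with the last segment evaluated by predictor p.
    score = [[NEG_INF] * n_pred for _ in range(k_max)]
    for j, x in enumerate(seq):
        gains = [f(x) for f in predictors]
        # Best score per segment count before reading x, over all predictors.
        prev_best = [max(row) for row in score]
        new = [[NEG_INF] * n_pred for _ in range(k_max)]
        for i in range(k_max):
            if i == 0:
                # A first segment can only be opened on the first object.
                open_new = 0 if j == 0 else NEG_INF
            else:
                open_new = prev_best[i - 1]
            for p, g in enumerate(gains):
                # Either extend the last segment with the same predictor,
                # or start a new segment at x.
                new[i][p] = g + max(score[i][p], open_new)
        score = new
    return [max(row) for row in score]
```

Each object costs a number of operations proportional to the maximal number of segments times the number of predictors, so the total work is linear in each of the three quantities. This sketch returns only the optimal scores; recovering the partitions themselves would additionally require storing back-pointers.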