Data-driven penalty calibration: a case study for Gaussian mixture model selection

Abstract

In the companion paper [C. Maugis and B. Michel, A non asymptotic penalized criterion for Gaussian mixture model selection. ESAIM: P&S 15 (2011) 41–68], a penalized likelihood criterion is proposed to select a Gaussian mixture model among a specific model collection. This criterion depends on unknown constants that have to be calibrated in practical situations. A “slope heuristics” method is described and tested to address this practical problem. In a model-based clustering context, the specific form of the considered Gaussian mixtures allows us to detect the noisy variables in order to improve the data clustering and its interpretation. The behavior of our data-driven criterion is illustrated on simulated datasets, a curve clustering example and a genomics application.
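
As a rough illustration of what calibrating a penalty with the slope heuristics can look like, the sketch below uses a simplified generic variant and is not the paper's exact procedure. It assumes the penalty is proportional to the model dimension, estimates the slope of the maximized log-likelihood over the most complex models by a least-squares line, and then uses twice that slope as the penalty constant. The function name `slope_heuristic_select`, the `frac_largest` parameter and the toy inputs are all hypothetical.

```python
import numpy as np

def slope_heuristic_select(dims, loglik, frac_largest=0.5):
    """Select a model with a simplified slope heuristic (illustrative only).

    dims    : model dimensions D_m (number of free parameters)
    loglik  : maximized log-likelihood of each model
    Assumes a penalty of the form kappa * D_m with unknown kappa.
    """
    dims = np.asarray(dims, dtype=float)
    loglik = np.asarray(loglik, dtype=float)

    # Keep the most complex models, where the log-likelihood is
    # expected to grow roughly linearly with the dimension.
    order = np.argsort(dims)
    k = max(2, int(np.ceil(frac_largest * len(dims))))
    largest = order[-k:]

    # Least-squares line through (D_m, loglik_m); polyfit returns
    # coefficients from highest degree down, so the slope comes first.
    slope, _intercept = np.polyfit(dims[largest], loglik[largest], 1)

    # Penalized criterion with penalty 2 * slope * D_m (to be minimized).
    crit = -loglik + 2.0 * slope * dims
    return int(np.argmin(crit)), float(slope)

# Toy input (made up): real gain up to D = 5, then a linear drift of 0.5 per parameter.
toy_dims = np.arange(1, 21)
toy_loglik = 10.0 * np.minimum(toy_dims, 5) + 0.5 * toy_dims
best, kappa = slope_heuristic_select(toy_dims, toy_loglik)
print("selected dimension:", toy_dims[best], "estimated slope:", kappa)
```

On this toy input the linear regime is exact, so the fitted slope recovers the drift and the criterion picks the dimension where the real gain stops. With real data the choice of which models count as "most complex" matters, and it is worth checking that the selection is stable when `frac_largest` changes.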

Publication
ESAIM: Probability and Statistics, 15, pp. 320–339
Bertrand MICHEL
Professor