Mixture models have become a favourite method for modelling data that are not easily described as coming from a simple homogeneous population. Analyses of such data often refer to the components of the mixture as clusters and expend much effort in identifying the number and nature of these clusters. This usually proceeds in a conditional framework: first the likely number of clusters is identified, and then the individual component parameters are estimated given that number. From the point of view of cluster analysis, we are also very interested in partitioning the data set into the component clusters. Using a Bayesian approach, implemented via MCMC, we show how the distribution of the latent allocation variables that indicate cluster membership can help to illuminate the whole process of inference for such models and avoid the pitfalls to which the conditional approach is subject. However, understanding the nature of the allocation distribution is challenging.
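To make the role of the latent allocation variables concrete, the following is a minimal illustrative sketch and not the method of this work. It assumes a two-component normal mixture with unit variances, a uniform prior on the mixing weight and a N(0, 100) prior on each component mean; the function name `gibbs_allocations` and all settings are hypothetical. A Gibbs sampler draws the allocations at every sweep and summarises their distribution by the proportion of sweeps in which each pair of observations shares a component, a summary that does not depend on how the components are labelled.

```python
import math
import random


def gibbs_allocations(y, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for a two-component normal mixture with unit variances.

    Returns the estimated posterior co-clustering matrix, whose (i, j) entry
    is the fraction of retained sweeps in which y[i] and y[j] share a component.
    """
    rng = random.Random(seed)
    n = len(y)
    prior_mean, prior_var = 0.0, 100.0  # assumed N(0, 100) prior on each mean
    mu = [min(y), max(y)]               # crude starting values for the means
    w = 0.5                             # weight of component 0
    z = [0] * n                         # latent allocations, one per observation
    together = [[0] * n for _ in range(n)]
    kept = 0

    for sweep in range(n_iter):
        # 1. Allocations: P(z_i = k | rest) is proportional to w_k * N(y_i | mu_k, 1).
        for i, yi in enumerate(y):
            p0 = w * math.exp(-0.5 * (yi - mu[0]) ** 2)
            p1 = (1.0 - w) * math.exp(-0.5 * (yi - mu[1]) ** 2)
            z[i] = 0 if rng.random() * (p0 + p1) < p0 else 1

        # 2. Weight: Beta(1 + n0, 1 + n1) under the assumed uniform prior.
        n0 = z.count(0)
        w = rng.betavariate(1 + n0, 1 + (n - n0))

        # 3. Means: conjugate normal update given the current allocations.
        for k in (0, 1):
            members = [yi for yi, zi in zip(y, z) if zi == k]
            post_var = 1.0 / (1.0 / prior_var + len(members))
            post_mean = post_var * (prior_mean / prior_var + sum(members))
            mu[k] = rng.gauss(post_mean, math.sqrt(post_var))

        # Record which pairs share a component after burn-in.
        if sweep >= burn_in:
            kept += 1
            for i in range(n):
                for j in range(n):
                    if z[i] == z[j]:
                        together[i][j] += 1

    return [[count / kept for count in row] for row in together]


# Toy data (made up for illustration): two well-separated groups.
y = [-3.1, -2.9, -3.0, -2.8, 3.0, 3.2, 2.9, 3.1]
similarity = gibbs_allocations(y)
```

Entries of `similarity` near 1 mark pairs that are almost always allocated together, and entries near 0 mark pairs that are almost never allocated together. The number of components is fixed at two here purely for brevity, which is itself a form of the conditioning that the text cautions against.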
17 Oct 08