Expectation Maximization
The Expectation Maximization (EM) algorithm estimates the parameters of a multivariate probability density function in the form of a Gaussian mixture distribution with a specified number of mixtures.
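Consider the set of N feature vectors $\{x_1, x_2, \ldots, x_N\}$ from a d-dimensional Euclidean space drawn from a Gaussian mixture:

$$p(x; a_k, S_k, \pi_k) = \sum_{k=1}^{m} \pi_k p_k(x), \quad \pi_k \geq 0, \quad \sum_{k=1}^{m} \pi_k = 1,$$

$$p_k(x) = \varphi(x; a_k, S_k) = \frac{1}{(2\pi)^{d/2} |S_k|^{1/2}} \exp\left\{-\frac{1}{2}(x - a_k)^T S_k^{-1} (x - a_k)\right\},$$

where $m$ is the number of mixtures, $p_k$ is the normal distribution density with the mean $a_k$ and covariance matrix $S_k$, and $\pi_k$ is the weight of the k-th mixture. Given the number of mixtures $m$ and the samples $x_i$, $i = 1, \ldots, N$, the algorithm finds the maximum-likelihood estimates (MLE) of all the mixture parameters, that is, $a_k$, $S_k$, and $\pi_k$:

$$L(x, \theta) = \log p(x, \theta) = \sum_{i=1}^{N} \log\left(\sum_{k=1}^{m} \pi_k p_k(x_i)\right) \to \max_{\theta \in \Theta},$$

$$\Theta = \left\{(a_k, S_k, \pi_k) : a_k \in \mathbb{R}^d,\ S_k = S_k^T > 0,\ S_k \in \mathbb{R}^{d \times d},\ \pi_k \geq 0,\ \sum_{k=1}^{m} \pi_k = 1\right\}.$$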
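To make the mixture density and the log-likelihood objective concrete, here is a minimal sketch for the one-dimensional case, where each covariance matrix reduces to a scalar variance. The type `Component1D` and the functions `normalPdf`, `mixturePdf`, and `logLikelihood` are hypothetical names introduced for illustration only; they are not part of the EM class.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper type for this sketch (not an OpenCV type):
// one univariate mixture component with weight pi_k, mean a_k, variance S_k.
struct Component1D {
    double weight;
    double mean;
    double variance;
};

// Univariate normal density phi(x; mean, variance).
double normalPdf(double x, double mean, double variance) {
    assert(variance > 0.0);
    const double kPi = 3.14159265358979323846;
    const double diff = x - mean;
    return std::exp(-0.5 * diff * diff / variance) / std::sqrt(2.0 * kPi * variance);
}

// Mixture density p(x) = sum_k pi_k * p_k(x).
double mixturePdf(double x, const std::vector<Component1D>& mix) {
    double p = 0.0;
    for (const Component1D& c : mix)
        p += c.weight * normalPdf(x, c.mean, c.variance);
    return p;
}

// Log-likelihood L = sum_i log p(x_i), the quantity that EM maximizes.
double logLikelihood(const std::vector<double>& samples,
                     const std::vector<Component1D>& mix) {
    double L = 0.0;
    for (double x : samples)
        L += std::log(mixturePdf(x, mix));
    return L;
}
```

Comparing `logLikelihood` for two candidate parameter sets on the same samples shows which set the MLE criterion prefers.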
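The EM algorithm is an iterative procedure. Each iteration includes two steps. At the first step (Expectation step or E-step), you find a probability $p_{i,k}$ (denoted $\alpha_{ki}$ in the formula below) of sample $i$ to belong to mixture $k$ using the currently available mixture parameter estimates (weights $\pi_k$, means $a_k$, and covariance matrices $S_k$ of the $m$ mixtures, with $\varphi$ the normal density):

$$\alpha_{ki} = \frac{\pi_k \varphi(x_i; a_k, S_k)}{\sum_{j=1}^{m} \pi_j \varphi(x_i; a_j, S_j)}.$$

At the second step (Maximization step or M-step), the mixture parameter estimates are refined using the computed probabilities:

$$\pi_k = \frac{1}{N} \sum_{i=1}^{N} \alpha_{ki}, \quad a_k = \frac{\sum_{i=1}^{N} \alpha_{ki} x_i}{\sum_{i=1}^{N} \alpha_{ki}}, \quad S_k = \frac{\sum_{i=1}^{N} \alpha_{ki} (x_i - a_k)(x_i - a_k)^T}{\sum_{i=1}^{N} \alpha_{ki}}.$$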
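The two steps can be sketched for one-dimensional data as follows. This is an illustration under simplifying assumptions (scalar variances, no guard against empty or degenerate components), not the implementation used by the EM class; `Mixture1D`, `eStep`, and `mStep` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for this sketch (not an OpenCV type).
struct Mixture1D {
    std::vector<double> weights;    // pi_k
    std::vector<double> means;      // a_k
    std::vector<double> variances;  // S_k, a scalar in one dimension
};

double normalPdf(double x, double mean, double variance) {
    const double kPi = 3.14159265358979323846;
    const double diff = x - mean;
    return std::exp(-0.5 * diff * diff / variance) / std::sqrt(2.0 * kPi * variance);
}

// E-step: alpha[i][k] = pi_k * phi(x_i; a_k, S_k) / sum_j pi_j * phi(x_i; a_j, S_j).
std::vector<std::vector<double>> eStep(const std::vector<double>& x, const Mixture1D& m) {
    const std::size_t K = m.weights.size();
    std::vector<std::vector<double>> alpha(x.size(), std::vector<double>(K, 0.0));
    for (std::size_t i = 0; i < x.size(); ++i) {
        double total = 0.0;
        for (std::size_t k = 0; k < K; ++k) {
            alpha[i][k] = m.weights[k] * normalPdf(x[i], m.means[k], m.variances[k]);
            total += alpha[i][k];
        }
        for (std::size_t k = 0; k < K; ++k)
            alpha[i][k] /= total;  // normalize so that each row sums to 1
    }
    return alpha;
}

// M-step: re-estimate weights, means, and variances from the probabilities.
Mixture1D mStep(const std::vector<double>& x,
                const std::vector<std::vector<double>>& alpha) {
    assert(!x.empty() && alpha.size() == x.size());
    const std::size_t K = alpha[0].size();
    Mixture1D m;
    m.weights.assign(K, 0.0);
    m.means.assign(K, 0.0);
    m.variances.assign(K, 0.0);
    for (std::size_t k = 0; k < K; ++k) {
        double sumAlpha = 0.0, sumAlphaX = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            sumAlpha += alpha[i][k];
            sumAlphaX += alpha[i][k] * x[i];
        }
        m.weights[k] = sumAlpha / static_cast<double>(x.size());
        m.means[k] = sumAlphaX / sumAlpha;  // assumes the component is not empty
        double sumSq = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double diff = x[i] - m.means[k];
            sumSq += alpha[i][k] * diff * diff;
        }
        m.variances[k] = sumSq / sumAlpha;
    }
    return m;
}
```

Alternating `eStep` and `mStep` until the log-likelihood stops improving, or until a maximum iteration count is reached, gives the basic iteration.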
Alternatively, the algorithm may start with the M-step when the initial values for $p_{i,k}$ (the probabilities of sample $i$ belonging to mixture $k$) can be provided. Another alternative when $p_{i,k}$ are unknown is to use a simpler clustering algorithm to pre-cluster the input samples and thus obtain initial $p_{i,k}$. Often (including in machine learning) the k-means algorithm is used for that purpose.
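One way to turn a pre-clustering into initial probabilities is to give each sample probability 1 for its nearest cluster center and 0 for all others. The sketch below assumes one-dimensional samples and centers that were already computed (for example by k-means); `initialProbsFromCenters` is a hypothetical helper, not an OpenCV function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns an N x K matrix of one-hot rows: probs[i][k] = 1 if centers[k] is
// the center nearest to samples[i], and 0 otherwise. Ties go to the lowest index.
std::vector<std::vector<double>> initialProbsFromCenters(
        const std::vector<double>& samples, const std::vector<double>& centers) {
    assert(!centers.empty());
    std::vector<std::vector<double>> probs(
        samples.size(), std::vector<double>(centers.size(), 0.0));
    for (std::size_t i = 0; i < samples.size(); ++i) {
        std::size_t best = 0;
        for (std::size_t k = 1; k < centers.size(); ++k) {
            if (std::fabs(samples[i] - centers[k]) < std::fabs(samples[i] - centers[best]))
                best = k;
        }
        probs[i][best] = 1.0;  // hard assignment used as the initial probability
    }
    return probs;
}
```

A matrix of this shape (one row per sample, one column per mixture) is what an M-step needs as its starting point.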
One of the main problems of the EM algorithm is a large number of parameters to estimate. The majority of the parameters reside in covariance matrices, which are $d \times d$ elements each, where $d$ is the feature space dimensionality. However, in many practical problems, the covariance matrices are close to diagonal or even to $\mu_k I$, where $I$ is an identity matrix and $\mu_k$ is a mixture-dependent "scale" parameter. So, a robust computation scheme could start with harder constraints on the covariance matrices and then use the estimated parameters as an input for a less constrained optimization problem (often a diagonal covariance matrix is already a good enough approximation).
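The effect of these constraints on the parameter count can be shown with a short calculation. The enum `CovConstraint` and its values are hypothetical names for this sketch; the count for the unconstrained case uses the fact that a symmetric matrix has d(d+1)/2 distinct entries.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for the three levels of constraint discussed above.
enum class CovConstraint { Spherical, Diagonal, Generic };

// Number of free covariance parameters per mixture component.
std::size_t covParamsPerComponent(std::size_t d, CovConstraint c) {
    assert(d > 0);
    switch (c) {
        case CovConstraint::Spherical: return 1;                // one scale mu_k
        case CovConstraint::Diagonal:  return d;                // one variance per dimension
        case CovConstraint::Generic:   return d * (d + 1) / 2;  // symmetric d x d matrix
    }
    return 0;  // unreachable for valid enum values
}
```

For d = 10, this gives 1, 10, and 55 covariance parameters per component, which illustrates why starting from the more constrained forms is cheaper and more robust.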
References:
 [Bilmes98] J. A. Bilmes. A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report TR-97-021, International Computer Science Institute and Computer Science Division, University of California at Berkeley, April 1998.
EM

class EM : public StatModel
The class implements the EM algorithm as described in the beginning of this section.
EM::Params

class EM::Params
The class describes EM training parameters.
EM::Params::Params
The constructor

C++: EM::Params::Params(int nclusters=DEFAULT_NCLUSTERS, int covMatType=EM::COV_MAT_DIAGONAL, const TermCriteria& termCrit=TermCriteria(TermCriteria::COUNT+TermCriteria::EPS, EM::DEFAULT_MAX_ITERS, 1e-6))

EM::create
Creates empty EM model

C++: Ptr<EM> EM::create(const Params& params=Params())

The model should then be trained using the StatModel::train(traindata, flags) method. Alternatively, you can use one of the EM::train* methods or load the model from a file using StatModel::load<EM>(filename).
EM::train
Static methods that estimate the Gaussian mixture parameters from a sample set

C++: Ptr<EM> EM::train(InputArray samples, OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray(), const Params& params=Params())

C++: bool EM::train_startWithE(InputArray samples, InputArray means0, InputArray covs0=noArray(), InputArray weights0=noArray(), OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray(), const Params& params=Params())

C++: bool EM::train_startWithM(InputArray samples, InputArray probs0, OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray(), const Params& params=Params())
Parameters: 
 samples – Samples from which the Gaussian mixture model will be estimated. It should be a one-channel matrix, each row of which is a sample. If the matrix does not have CV_64F type, it will be converted to an internal matrix of that type for further computation.
 means0 – Initial means $a_k$ of mixture components. It is a one-channel matrix of $nclusters \times dims$ size. If the matrix does not have CV_64F type, it will be converted to an internal matrix of that type for further computation.
 covs0 – The vector of initial covariance matrices $S_k$ of mixture components. Each covariance matrix is a one-channel matrix of $dims \times dims$ size. If the matrices do not have CV_64F type, they will be converted to internal matrices of that type for further computation.
 weights0 – Initial weights $\pi_k$ of mixture components. It should be a one-channel floating-point matrix with $1 \times nclusters$ or $nclusters \times 1$ size.
 probs0 – Initial probabilities $p_{i,k}$ of sample $i$ to belong to mixture component $k$. It is a one-channel floating-point matrix of $nsamples \times nclusters$ size.
 logLikelihoods – The optional output matrix that contains a likelihood logarithm value for each sample. It has $nsamples \times 1$ size and CV_64FC1 type.
 labels – The optional output "class label" for each sample: $\texttt{labels}_i = \arg\max_k(p_{i,k}),\ i = 1..N$ (indices of the most probable mixture component for each sample). It has $nsamples \times 1$ size and CV_32SC1 type.
 probs – The optional output matrix that contains posterior probabilities of each Gaussian mixture component given each sample. It has $nsamples \times nclusters$ size and CV_64FC1 type.
 params – The Gaussian mixture params, see EM::Params description above.

The three versions of the training method differ in how the Gaussian mixture model parameters are initialized and in the start step:
 train  Starts with the Expectation step. Initial values of the model parameters are estimated by the k-means algorithm.
 train_startWithE  Starts with the Expectation step. You need to provide initial means of mixture components. Optionally, you can pass initial weights and covariance matrices of mixture components.
 train_startWithM  Starts with the Maximization step. You need to provide initial probabilities to use this option.
As the signatures above show, train returns a pointer to the resulting model, while train_startWithE and train_startWithM return true if the Gaussian mixture model was trained successfully and false otherwise.
Unlike many of the ML models, EM is an unsupervised learning algorithm and it does not take responses (class labels or function values) as input. Instead, it computes the Maximum Likelihood Estimate of the Gaussian mixture parameters from an input sample set, stores all the parameters inside the structure: $p_{i,k}$ in probs[i][k], $a_k$ in means[k], $S_k$ in covs[k], $\pi_k$ in weights[k], and optionally computes the output "class label" for each sample: $\texttt{labels}_i = \arg\max_k(p_{i,k}),\ i = 1..N$ (indices of the most probable mixture component for each sample).
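The relation between the posterior probabilities and the labels can be sketched in a few lines. The function `labelsFromProbs` is a hypothetical helper operating on plain nested vectors rather than on OpenCV matrices.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// For each sample (row), returns the index of the mixture component with the
// largest posterior probability. On ties, the lowest index is chosen.
std::vector<int> labelsFromProbs(const std::vector<std::vector<double>>& probs) {
    std::vector<int> labels;
    labels.reserve(probs.size());
    for (const std::vector<double>& row : probs) {
        assert(!row.empty());
        const auto best = std::max_element(row.begin(), row.end());
        labels.push_back(static_cast<int>(best - row.begin()));
    }
    return labels;
}
```

The tie-breaking rule here is a choice made for the sketch; the source does not state how ties are resolved.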
The trained model can be used further for prediction, just like any other classifier. The trained model is similar to the
NormalBayesClassifier.
EM::predict2
Returns a likelihood logarithm value and an index of the most probable mixture component for the given sample.

C++: Vec2d EM::predict2(InputArray sample, OutputArray probs=noArray()) const
Parameters: 
 sample – A sample for classification. It should be a one-channel matrix of $1 \times dims$ or $dims \times 1$ size.
 probs – Optional output matrix that contains posterior probabilities of each component given the sample. It has $1 \times nclusters$ size and CV_64FC1 type.

The method returns a two-element double vector. The element at index 0 is a likelihood logarithm value for the sample. The element at index 1 is an index of the most probable mixture component for the given sample.
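The shape of this return value can be illustrated with a one-dimensional sketch. It assumes that the likelihood logarithm is the log of the mixture density at the sample and that the most probable component is the one with the largest weighted density; `predictSketch` is a hypothetical function and does not reproduce the internals of EM::predict2.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns {log-likelihood of x under the mixture, index of the most probable
// component}, with the index stored as a double like in a two-element vector.
std::array<double, 2> predictSketch(double x,
                                    const std::vector<double>& weights,
                                    const std::vector<double>& means,
                                    const std::vector<double>& variances) {
    assert(!weights.empty());
    assert(weights.size() == means.size() && weights.size() == variances.size());
    const double kPi = 3.14159265358979323846;
    double total = 0.0;
    double bestJoint = -1.0;
    std::size_t best = 0;
    for (std::size_t k = 0; k < weights.size(); ++k) {
        const double diff = x - means[k];
        // Weighted component density pi_k * phi(x; a_k, S_k).
        const double joint = weights[k] * std::exp(-0.5 * diff * diff / variances[k])
                             / std::sqrt(2.0 * kPi * variances[k]);
        total += joint;
        if (joint > bestJoint) {
            bestJoint = joint;
            best = k;
        }
    }
    // Plain summation is used for brevity; it can underflow far from all means.
    return {{std::log(total), static_cast<double>(best)}};
}
```

A caller would read the component index by casting the second element back to an integer.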
EM::getMeans
Returns the cluster centers (means of the Gaussian mixture)

C++: Mat EM::getMeans() const
Returns a matrix with the number of rows equal to the number of mixtures and the number of columns equal to the space dimensionality.
EM::getWeights
Returns weights of the mixtures

C++: Mat EM::getWeights() const
Returns a vector with the number of elements equal to the number of mixtures.
EM::getCovs
Returns covariance matrices

C++: void EM::getCovs(std::vector<Mat>& covs) const
Returns a vector of covariance matrices. The number of matrices is the number of Gaussian mixtures; each matrix is a square floating-point matrix of size dims x dims, where dims is the space dimensionality.