Enumeration of splitting types that can be used as the splitting criterion when creating clusters in the X-Means algorithm.
Static Public Attributes
int BAYESIAN_INFORMATION_CRITERION = 0
    Bayesian information criterion (BIC) to approximate the correct number of clusters.
int MINIMUM_NOISELESS_DESCRIPTION_LENGTH = 1
    Minimum noiseless description length (MNDL) to approximate the correct number of clusters.
int BAYESIAN_INFORMATION_CRITERION = 0 [static]
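Bayesian information criterion (BIC) to approximate the correct number of clusters.
Kass's formula is used to calculate BIC:
\[ BIC(\theta) = L(D) - \frac{1}{2} p \ln(N) \]
Here \(N\) is the total number of points, \(K\) the number of clusters, \(M\) the dimension of the data, \(n_j\) the number of points in cluster \(j\), and \(\hat{C}_j\) the centroid of cluster \(j\).
The number of free parameters \(p\) is simply the sum of \(K - 1\) class probabilities, \(MK\) centroid coordinates, and one variance estimate:
\[ p = (K - 1) + MK + 1 \]
The log-likelihood of the data, evaluated for a cluster \(j\) that holds \(n_j\) points:
\[ L(D) = n_j \ln(n_j) - n_j \ln(N) - \frac{n_j}{2} \ln(2\pi) - \frac{n_j M}{2} \ln(\hat{\sigma}^2) - \frac{n_j - K}{2} \]
The maximum likelihood estimate (MLE) for the variance:
\[ \hat{\sigma}^2 = \frac{1}{N - K} \sum_{j} \sum_{i} \lVert x_{ij} - \hat{C}_j \rVert^2 \]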
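The BIC computation described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name `bic_score` and its arguments are hypothetical, the sketch assumes squared Euclidean distance for the variance estimate, and it assumes that the per-cluster log-likelihood terms are summed before the penalty term is subtracted once.

```python
import math


def bic_score(clusters, centers):
    """Return a BIC score for a clustering; a higher score means a better model.

    clusters: list of clusters, each a list of points (sequences of floats).
    centers:  list of centroids, one per cluster.
    (Hypothetical helper, not part of any library API.)
    """
    K = len(clusters)                      # number of clusters
    N = sum(len(c) for c in clusters)      # total number of points
    M = len(centers[0])                    # dimension of the data
    if any(len(c) == 0 for c in clusters) or N <= K:
        raise ValueError("need non-empty clusters and more points than clusters")

    # Variance MLE: squared distances of all points to their own centroid.
    squared = sum(
        sum((x - c) ** 2 for x, c in zip(point, center))
        for cluster, center in zip(clusters, centers)
        for point in cluster
    )
    sigma2 = squared / (N - K)
    if sigma2 <= 0.0:
        # Assumption of this sketch: identical points count as a perfect fit.
        return float("inf")

    # Free parameters: K - 1 class probabilities, M * K coordinates, 1 variance.
    p = (K - 1) + M * K + 1

    # Log-likelihood, accumulated cluster by cluster.
    log_likelihood = 0.0
    for cluster in clusters:
        n = len(cluster)
        log_likelihood += (
            n * math.log(n) - n * math.log(N)
            - 0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * n * M * math.log(sigma2)
            - 0.5 * (n - K)
        )

    return log_likelihood - 0.5 * p * math.log(N)


# Two tight, well-separated groups on a line.
data = [[0.0], [1.0], [0.5], [10.0], [11.0], [10.5]]
one_cluster = bic_score([data], [[5.5]])
two_clusters = bic_score([data[:3], data[3:]], [[0.5], [10.5]])
print(two_clusters > one_cluster)
```

On this toy data the two-cluster model scores higher than the single-cluster model, which is the situation in which a BIC-driven split would be accepted.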
int MINIMUM_NOISELESS_DESCRIPTION_LENGTH = 1 [static]
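Minimum noiseless description length (MNDL) to approximate the correct number of clusters.
Beheshti's formula is used to calculate the upper bound:
\[ Z = \frac{\sigma^2 \sqrt{2K}}{N} \left( \sqrt{2K} + \beta \right) + W - \sigma^2 + \frac{2 \alpha \sigma}{\sqrt{N}} \sqrt{\frac{\alpha^2 \sigma^2}{N} + W - \left( 1 - \frac{K}{N} \right) \frac{\sigma^2}{2}} + \frac{2 \alpha^2 \sigma^2}{N} \]
where \(\alpha\) and \(\beta\) represent the parameters for validation probability and confidence probability.
To improve clustering results, some modifications are introduced:
\[ W = \sum_{j} \frac{1}{n_j} \sum_{i} \lVert x_{ij} - \hat{C}_j \rVert \]
\[ \hat{\sigma}^2 = \frac{1}{N - K} \sum_{j} \sum_{i} \lVert x_{ij} - \hat{C}_j \rVert \]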
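The MNDL upper bound can be sketched in plain Python along the same lines. This is a hedged illustration, not the library's implementation: the function name `mndl_score`, its arguments, and the default values of `alpha` and `beta` are assumptions made for the example. The sketch also assumes plain (unsquared) Euclidean distance in both the within-cluster term W and the variance estimate, and it treats a lower bound value as the better model.

```python
import math


def mndl_score(clusters, centers, alpha=0.9, beta=0.9):
    """Return an MNDL-style upper bound for a clustering; lower is better.

    clusters: list of clusters, each a list of points (sequences of floats).
    centers:  list of centroids, one per cluster.
    alpha, beta: bound parameters (illustrative defaults, assumed here).
    (Hypothetical helper, not part of any library API.)
    """
    K = len(clusters)                      # number of clusters
    N = sum(len(c) for c in clusters)      # total number of points
    if any(len(c) == 0 for c in clusters) or N <= K:
        return float("inf")

    W = 0.0        # sum over clusters of the mean distance to the centroid
    total = 0.0    # sum of all point-to-centroid distances
    for cluster, center in zip(clusters, centers):
        distance = sum(math.dist(point, center) for point in cluster)
        total += distance
        W += distance / len(cluster)

    sigma2 = total / (N - K)
    sigma = math.sqrt(sigma2)

    # Term under the square root of the bound; it is never negative because
    # W >= total / N while (1 - K / N) * sigma2 / 2 == total / (2 * N).
    inner = alpha ** 2 * sigma2 / N + W - (1.0 - K / N) * sigma2 / 2.0

    return (
        sigma2 * math.sqrt(2.0 * K) / N * (math.sqrt(2.0 * K) + beta)
        + W - sigma2
        + 2.0 * alpha * sigma / math.sqrt(N) * math.sqrt(inner)
        + 2.0 * alpha ** 2 * sigma2 / N
    )


# Two tight, well-separated groups on a line.
data = [[0.0], [1.0], [0.5], [10.0], [11.0], [10.5]]
one_cluster = mndl_score([data], [[5.5]])
two_clusters = mndl_score([data[:3], data[3:]], [[0.5], [10.5]])
print(two_clusters < one_cluster)
```

On this toy data the two-cluster model yields the smaller bound, so a split would be preferred under this criterion.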