pyclustering.cluster.xmeans.splitting_type Class Reference

Enumeration of the splitting types (criteria) that the X-Means algorithm can use when deciding whether to split a cluster.

Static Public Attributes

int BAYESIAN_INFORMATION_CRITERION = 0
 Bayesian information criterion (BIC) to approximate the correct number of clusters.
 
int MINIMUM_NOISELESS_DESCRIPTION_LENGTH = 1
 Minimum noiseless description length (MNDL) to approximate the correct number of clusters.
 

Detailed Description

Enumeration of the splitting types (criteria) that the X-Means algorithm can use when deciding whether to split a cluster.

Definition at line 46 of file xmeans.py.

Member Data Documentation

◆ BAYESIAN_INFORMATION_CRITERION

int pyclustering.cluster.xmeans.splitting_type.BAYESIAN_INFORMATION_CRITERION = 0
static

Bayesian information criterion (BIC) to approximate the correct number of clusters.

Kass's formula is used to calculate BIC:

\[BIC(\theta) = L(D) - \frac{1}{2}p\ln(N)\]

The number of free parameters $p$ is simply the sum of $K - 1$ class probabilities, $MK$ centroid coordinates, and one variance estimate:

\[p = (K - 1) + MK + 1\]
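As a small illustration, the parameter count can be sketched in Python as follows. The function name is hypothetical and not part of pyclustering, and $M$ is assumed to be the number of coordinates per centroid, that is, the data dimension.

```python
def free_parameters(k: int, m: int) -> int:
    """Count free parameters p = (K - 1) + M*K + 1.

    k: number of clusters (K); m: data dimension (M, assumed).
    """
    class_probabilities = k - 1      # K - 1 independent class probabilities
    centroid_coordinates = m * k     # M coordinates for each of K centroids
    variance_estimates = 1           # one shared variance estimate
    return class_probabilities + centroid_coordinates + variance_estimates


# Three clusters in two dimensions: (3 - 1) + 2*3 + 1 = 9
print(free_parameters(3, 2))
```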

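The log-likelihood of the data in cluster $j$, where $n_j$ is the number of points in that cluster and $d$ is the data dimension (written $M$ above):

\[L(D) = n_j\ln(n_j) - n_j\ln(N) - \frac{n_j}{2}\ln(2\pi) - \frac{n_jd}{2}\ln(\hat{\sigma}^2) - \frac{n_j - K}{2}\]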

The maximum likelihood estimate (MLE) for the variance:

\[\hat{\sigma}^2 = \frac{1}{N - K}\sum\limits_{j}\sum\limits_{i}||x_{ij} - \hat{C}_j||^2\]
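To show how the formulas above fit together, here is a minimal pure-Python sketch. All function names are hypothetical and are not pyclustering's API. Two points are assumptions rather than statements from the text: $d$ is taken to be the data dimension, and the per-cluster log-likelihoods are summed before the penalty term is subtracted once, since the text does not say how the per-cluster terms are combined.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def variance_mle(clusters, centers):
    """Variance estimate: sum of squared distances divided by N - K."""
    n_total = sum(len(points) for points in clusters)
    k = len(clusters)
    if n_total <= k:
        raise ValueError("need more points than clusters")
    total = sum(squared_distance(p, center)
                for points, center in zip(clusters, centers)
                for p in points)
    return total / (n_total - k)


def cluster_log_likelihood(n_j, n_total, k, d, sigma2):
    """Log-likelihood L(D) of one non-empty cluster holding n_j points."""
    return (n_j * math.log(n_j)
            - n_j * math.log(n_total)
            - n_j / 2.0 * math.log(2.0 * math.pi)
            - n_j * d / 2.0 * math.log(sigma2)
            - (n_j - k) / 2.0)


def bic(clusters, centers):
    """BIC = L(D) - 0.5 * p * ln(N), with L(D) summed over clusters (assumption)."""
    n_total = sum(len(points) for points in clusters)
    k = len(clusters)
    d = len(centers[0])  # data dimension (assumed meaning of d)
    sigma2 = variance_mle(clusters, centers)
    if sigma2 <= 0.0:
        raise ValueError("variance estimate must be positive")
    p = (k - 1) + d * k + 1
    log_l = sum(cluster_log_likelihood(len(points), n_total, k, d, sigma2)
                for points in clusters)
    return log_l - 0.5 * p * math.log(n_total)


# Two well-separated groups, scored as two clusters and as one cluster.
two = bic([[(0, 0), (2, 0)], [(10, 0), (12, 0)]], [(1, 0), (11, 0)])
one = bic([[(0, 0), (2, 0), (10, 0), (12, 0)]], [(6, 0)])
print(two > one)
```

With this sign convention a larger score indicates a better model, so the comparison above favours the two-cluster description of the toy data.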

Definition at line 64 of file xmeans.py.

◆ MINIMUM_NOISELESS_DESCRIPTION_LENGTH

int pyclustering.cluster.xmeans.splitting_type.MINIMUM_NOISELESS_DESCRIPTION_LENGTH = 1
static

Minimum noiseless description length (MNDL) to approximate the correct number of clusters.

Beheshti's formula is used to calculate an upper bound:

\[Z = \frac{\sigma^2 \sqrt{2K} }{N}(\sqrt{2K} + \beta) + W - \sigma^2 + \frac{2\alpha\sigma}{\sqrt{N}}\sqrt{\frac{\alpha^2\sigma^2}{N} + W - \left(1 - \frac{K}{N}\right)\frac{\sigma^2}{2}} + \frac{2\alpha^2\sigma^2}{N}\]

where $\alpha$ and $\beta$ represent the parameters for validation probability and confidence probability.

To improve clustering results, some deviation from the original formulation is introduced:

\[W = \frac{1}{n_j}\sum\limits_{i}||x_{ij} - \hat{C}_j||\]

\[\hat{\sigma}^2 = \frac{1}{N - K}\sum\limits_{j}\sum\limits_{i}||x_{ij} - \hat{C}_j||\]
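The bound and the two quantities above can be sketched in plain Python as follows. The function names are hypothetical and are not pyclustering's API, the values passed for $\alpha$ and $\beta$ are purely illustrative, and raising an error when the expression under the square root is negative is an assumption of this sketch. The distances are deliberately not squared, following the definitions of $W$ and $\hat{\sigma}^2$ given here.

```python
import math


def distance(a, b):
    """Euclidean distance (not squared) between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_w(points, center):
    """W: mean distance from the points of one cluster to its center."""
    return sum(distance(p, center) for p in points) / len(points)


def sigma2_estimate(clusters, centers):
    """Sum of (unsquared) distances over all clusters, divided by N - K."""
    n_total = sum(len(points) for points in clusters)
    k = len(clusters)
    if n_total <= k:
        raise ValueError("need more points than clusters")
    total = sum(distance(p, center)
                for points, center in zip(clusters, centers)
                for p in points)
    return total / (n_total - k)


def mndl_upper_bound(w, sigma2, k, n, alpha, beta):
    """Upper bound Z from the formula above, evaluated term by term."""
    sigma = math.sqrt(sigma2)
    root_2k = math.sqrt(2.0 * k)
    radicand = (alpha ** 2) * sigma2 / n + w - (1.0 - k / n) * sigma2 / 2.0
    if radicand < 0.0:
        # Assumption of this sketch: treat a negative radicand as an error.
        raise ValueError("expression under the square root is negative")
    return (sigma2 * root_2k / n * (root_2k + beta)
            + w - sigma2
            + 2.0 * alpha * sigma / math.sqrt(n) * math.sqrt(radicand)
            + 2.0 * (alpha ** 2) * sigma2 / n)


clusters = [[(0, 0), (2, 0)], [(10, 0), (12, 0)]]
centers = [(1, 0), (11, 0)]
w = cluster_w(clusters[0], centers[0])
s2 = sigma2_estimate(clusters, centers)
print(mndl_upper_bound(w, s2, k=2, n=4, alpha=0.5, beta=0.5))
```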

Definition at line 75 of file xmeans.py.


The documentation for this class was generated from the following file: