pyclustering
0.10.1
pyclustering is a Python, C++ data mining library.
Enumeration of splitting types that can be used as the splitting criterion when creating clusters in the X-Means algorithm.
Static Public Attributes
int BAYESIAN_INFORMATION_CRITERION = 0
Bayesian information criterion (BIC) to approximate the correct number of clusters.
int MINIMUM_NOISELESS_DESCRIPTION_LENGTH = 1
Minimum noiseless description length (MNDL) to approximate the correct number of clusters [37].
Enumeration of splitting types that can be used as the splitting criterion when creating clusters in the X-Means algorithm.
static int BAYESIAN_INFORMATION_CRITERION = 0
Bayesian information criterion (BIC) to approximate the correct number of clusters.
Kass's formula is used to calculate BIC:
\[BIC(\theta) = L(D) - \frac{1}{2}p\ln(N)\]
The number of free parameters \(p\) is simply the sum of \(K - 1\) class probabilities, \(MK\) centroid coordinates, and one variance estimate:
\[p = (K - 1) + MK + 1\]
The log-likelihood of the data:
\[L(D) = n_j\ln(n_j) - n_j\ln(N) - \frac{n_j}{2}\ln(2\pi) - \frac{n_jd}{2}\ln(\hat{\sigma}^2) - \frac{n_j - K}{2}\]
The maximum likelihood estimate (MLE) for the variance:
\[\hat{\sigma}^2 = \frac{1}{N - K}\sum\limits_{j}\sum\limits_{i}||x_{ij} - \hat{C}_j||^2\]
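The formulas above can be sketched as a small scoring function. This is an illustrative transcription of Kass's formula, not pyclustering's internal implementation; the function name `bic_score` and its input layout (clusters as lists of points plus matching centroids) are assumptions for the example.

```python
import math

def bic_score(clusters, centers):
    """Kass-formula BIC for a clustering (illustrative sketch, higher = better).

    clusters: list of clusters, each a list of points (lists of coordinates).
    centers:  matching list of centroids.
    """
    K = len(clusters)
    N = sum(len(cluster) for cluster in clusters)
    M = len(centers[0])  # dimensionality (M in the parameter count, d in L(D))

    # MLE of the shared variance: sum of squared distances to own centroid
    sse = sum(
        sum((xk - ck) ** 2 for xk, ck in zip(point, center))
        for cluster, center in zip(clusters, centers)
        for point in cluster
    )
    sigma2 = max(sse / (N - K) if N > K else 1e-12, 1e-12)  # guard log(0)

    # free parameters: (K - 1) class probabilities + M*K coordinates + 1 variance
    p = (K - 1) + M * K + 1

    # log-likelihood L(D): sum the per-cluster terms from the formula above
    L = 0.0
    for cluster in clusters:
        n_j = len(cluster)
        L += (n_j * math.log(n_j) - n_j * math.log(N)
              - n_j / 2.0 * math.log(2.0 * math.pi)
              - n_j * M / 2.0 * math.log(sigma2)
              - (n_j - K) / 2.0)

    return L - p / 2.0 * math.log(N)
```

In X-Means this score is compared for a parent cluster versus its two children: two well-separated groups score higher as separate clusters than merged into one, which is what triggers a split.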
static int MINIMUM_NOISELESS_DESCRIPTION_LENGTH = 1
Minimum noiseless description length (MNDL) to approximate the correct number of clusters [37].
Beheshti's formula is used to calculate the upper bound:
\[Z = \frac{\sigma^2 \sqrt{2K} }{N}(\sqrt{2K} + \beta) + W - \sigma^2 + \frac{2\alpha\sigma}{\sqrt{N}}\sqrt{\frac{\alpha^2\sigma^2}{N} + W - \left(1 - \frac{K}{N}\right)\frac{\sigma^2}{2}} + \frac{2\alpha^2\sigma^2}{N}\]
where \(\alpha\) and \(\beta\) represent the parameters for validation probability and confidence probability.
To improve clustering results, a deliberate contradiction is introduced: the estimators below use non-squared distances.
\[W = \frac{1}{n_j}\sum\limits_{i}||x_{ij} - \hat{C}_j||\]
\[\hat{\sigma}^2 = \frac{1}{N - K}\sum\limits_{j}\sum\limits_{i}||x_{ij} - \hat{C}_j||\]
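A minimal sketch of the upper bound \(Z\) follows, transcribing the formula above directly. The function name `mndl_upper_bound`, the default values for \(\alpha\) and \(\beta\), the aggregation of the per-cluster \(W\) terms by summation, and the clamp on the radicand are all assumptions of this example, not pyclustering's actual implementation.

```python
import math

def mndl_upper_bound(clusters, centers, alpha=0.9, beta=0.9):
    """Beheshti's upper bound Z (illustrative transcription).

    alpha, beta: parameters for validation and confidence probability.
    """
    K = len(clusters)
    N = sum(len(cluster) for cluster in clusters)

    def dist(x, c):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

    # contradictory estimators: non-squared distances on purpose
    # (W summed over clusters is an assumption of this sketch)
    W = sum((1.0 / len(cluster)) * sum(dist(x, c) for x in cluster)
            for cluster, c in zip(clusters, centers))
    sigma2 = sum(dist(x, c)
                 for cluster, c in zip(clusters, centers)
                 for x in cluster) / (N - K)
    sigma = math.sqrt(sigma2)

    sqrt2k = math.sqrt(2.0 * K)
    # the radicand can go negative for poor clusterings; clamp at zero
    radicand = max(0.0,
                   alpha ** 2 * sigma2 / N + W - (1.0 - K / N) * sigma2 / 2.0)

    return (sigma2 * sqrt2k / N * (sqrt2k + beta)
            + W - sigma2
            + 2.0 * alpha * sigma / math.sqrt(N) * math.sqrt(radicand)
            + 2.0 * alpha ** 2 * sigma2 / N)
```

For a tight, well-separated clustering the bound stays small; X-Means uses such bounds to decide whether splitting a cluster is justified.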