pyclustering 0.10.1
pyclustering is a Python, C++ data mining library.
pyclustering.cluster.xmeans.splitting_type Class Reference

Enumeration of the splitting criteria that the X-Means algorithm can use to decide whether a cluster should be split.


## Static Public Attributes

int BAYESIAN_INFORMATION_CRITERION = 0
Bayesian information criterion (BIC) to approximate the correct number of clusters.

int MINIMUM_NOISELESS_DESCRIPTION_LENGTH = 1
Minimum noiseless description length (MNDL) to approximate the correct number of clusters [37].

## Detailed Description

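Enumeration of the splitting criteria that the X-Means algorithm can use to decide whether a cluster should be split. Two criteria are available: the Bayesian information criterion (BIC) and the minimum noiseless description length (MNDL).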

Definition at line 31 of file xmeans.py.
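As a minimal sketch of how calling code might refer to the two constants, the snippet below imports `splitting_type` from `pyclustering.cluster.xmeans` and maps each value to a short label. The fallback class is a stand-in that mirrors only the two documented values so that the snippet also runs where pyclustering is not installed, and `criterion_label` is a hypothetical helper, not part of the library.

```python
# Guarded import: use the real enumeration when pyclustering is available,
# otherwise fall back to a stand-in that mirrors the two documented values.
try:
    from pyclustering.cluster.xmeans import splitting_type
except ImportError:
    class splitting_type:  # stand-in, not the library class
        BAYESIAN_INFORMATION_CRITERION = 0
        MINIMUM_NOISELESS_DESCRIPTION_LENGTH = 1


def criterion_label(criterion):
    """Return a short label for a splitting criterion (hypothetical helper)."""
    labels = {
        splitting_type.BAYESIAN_INFORMATION_CRITERION: "BIC",
        splitting_type.MINIMUM_NOISELESS_DESCRIPTION_LENGTH: "MNDL",
    }
    return labels[criterion]


print(criterion_label(splitting_type.BAYESIAN_INFORMATION_CRITERION))
print(criterion_label(splitting_type.MINIMUM_NOISELESS_DESCRIPTION_LENGTH))
```

In the library itself, the chosen constant is normally handed to the X-Means clustering class when it is constructed; check the `xmeans` class reference for the exact argument name and default.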

## ◆ BAYESIAN_INFORMATION_CRITERION

 int pyclustering.cluster.xmeans.splitting_type.BAYESIAN_INFORMATION_CRITERION = 0
static

Bayesian information criterion (BIC) to approximate the correct number of clusters.

Kass's formula is used to calculate BIC:

$BIC(\theta) = L(D) - \frac{1}{2}p\ln(N)$

The number of free parameters $$p$$ is the sum of $$K - 1$$ class probabilities, $$MK$$ centroid coordinates, and one variance estimate, where $$K$$ is the number of clusters and $$M$$ is the dimension of the data:

$p = (K - 1) + MK + 1$
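The parameter count can be checked with a few lines of Python. This is only an illustration of the formula above; the function name `free_parameters` is hypothetical, and `M` is assumed to be the dimension of the data points.

```python
def free_parameters(K, M):
    """Number of free parameters p = (K - 1) + M*K + 1.

    K: number of clusters.
    M: assumed to be the dimension of the data points.
    """
    class_probabilities = K - 1    # K - 1 independent class probabilities
    centroid_coordinates = M * K   # M coordinates for each of the K centroids
    variance_estimates = 1         # one shared variance estimate
    return class_probabilities + centroid_coordinates + variance_estimates


# Three clusters in two dimensions: 2 + 6 + 1 free parameters.
p = free_parameters(K=3, M=2)
print(p)
```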

The log-likelihood term for cluster $$j$$, which contains $$n_j$$ of the $$N$$ points ($$M$$ is the dimension of the data):

$L(D) = n_j\ln(n_j) - n_j\ln(N) - \frac{n_j}{2}\ln(2\pi) - \frac{n_jM}{2}\ln(\hat{\sigma}^2) - \frac{n_j - K}{2}$

The maximum likelihood estimate (MLE) for the variance:

$\hat{\sigma}^2 = \frac{1}{N - K}\sum\limits_{j}\sum\limits_{i}||x_{ij} - \hat{C}_j||^2$
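The formulas above can be combined into a short, self-contained sketch. It is not the library's implementation: all function names are hypothetical, the data dimension is taken from the first point, and summing the per-cluster log-likelihood terms before applying the penalty once is an assumption of this sketch (the library may aggregate the terms differently).

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def variance_mle(clusters, centers):
    """Pooled variance estimate: squared distances summed over all clusters, divided by N - K."""
    N = sum(len(cluster) for cluster in clusters)
    K = len(clusters)
    total = sum(squared_distance(point, centers[j])
                for j, cluster in enumerate(clusters)
                for point in cluster)
    return total / (N - K)   # requires N > K


def cluster_log_likelihood(n_j, N, K, dim, sigma2):
    """Log-likelihood term of one cluster with n_j points (requires sigma2 > 0)."""
    return (n_j * math.log(n_j)
            - n_j * math.log(N)
            - n_j / 2.0 * math.log(2.0 * math.pi)
            - n_j * dim / 2.0 * math.log(sigma2)
            - (n_j - K) / 2.0)


def bic(clusters, centers):
    """BIC = L(D) - (p / 2) * ln(N), with L(D) summed over clusters (assumption)."""
    N = sum(len(cluster) for cluster in clusters)
    K = len(clusters)
    dim = len(centers[0])
    p = (K - 1) + dim * K + 1
    sigma2 = variance_mle(clusters, centers)
    likelihood = sum(cluster_log_likelihood(len(cluster), N, K, dim, sigma2)
                     for cluster in clusters)
    return likelihood - p / 2.0 * math.log(N)


# Two well-separated clusters of two points each (illustrative data).
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
centers = [(1.0, 0.0), (11.0, 0.0)]
print(variance_mle(clusters, centers), bic(clusters, centers))
```

With this sign convention a larger score indicates a better trade-off between fit and model complexity. Note that the sketch does not guard against a zero variance, which occurs when every point coincides with its cluster center.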

Definition at line 49 of file xmeans.py.

## ◆ MINIMUM_NOISELESS_DESCRIPTION_LENGTH

 int pyclustering.cluster.xmeans.splitting_type.MINIMUM_NOISELESS_DESCRIPTION_LENGTH = 1
static

Minimum noiseless description length (MNDL) to approximate the correct number of clusters [37].

Beheshti's formula is used to calculate an upper bound:

$Z = \frac{\sigma^2 \sqrt{2K} }{N}(\sqrt{2K} + \beta) + W - \sigma^2 + \frac{2\alpha\sigma}{\sqrt{N}}\sqrt{\frac{\alpha^2\sigma^2}{N} + W - \left(1 - \frac{K}{N}\right)\frac{\sigma^2}{2}} + \frac{2\alpha^2\sigma^2}{N}$

where $$\alpha$$ and $$\beta$$ represent the parameters for validation probability and confidence probability.

To improve clustering results, some deviation from the original formula is introduced:

$W = \frac{1}{n_j}\sum\limits_{i}||x_{ij} - \hat{C}_j||$

$\hat{\sigma}^2 = \frac{1}{N - K}\sum\limits_{j}\sum\limits_{i}||x_{ij} - \hat{C}_j||$
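The upper bound can be transcribed directly into Python. The sketch below is an illustration under stated assumptions rather than the library's implementation: the function names are hypothetical, the norm is taken to be the plain Euclidean distance as written in the formulas above, the per-cluster values of $$W$$ are summed into a single value, and the values chosen for $$\alpha$$ and $$\beta$$ are arbitrary.

```python
import math


def euclidean(a, b):
    """Plain (non-squared) Euclidean distance, as written in the formulas above."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_distance(points, center):
    """W for one cluster: mean distance of its points to the cluster center."""
    return sum(euclidean(point, center) for point in points) / len(points)


def pooled_sigma2(clusters, centers):
    """sigma^2: distances summed over all clusters, divided by N - K."""
    N = sum(len(cluster) for cluster in clusters)
    K = len(clusters)
    total = sum(euclidean(point, centers[j])
                for j, cluster in enumerate(clusters)
                for point in cluster)
    return total / (N - K)   # requires N > K


def mndl_upper_bound(W, sigma2, K, N, alpha, beta):
    """Direct transcription of the upper bound Z."""
    sigma = math.sqrt(sigma2)
    root_2k = math.sqrt(2.0 * K)
    # Expression under the square root; math.sqrt raises ValueError if it is negative.
    inner = alpha ** 2 * sigma2 / N + W - (1.0 - K / N) * sigma2 / 2.0
    return (sigma2 * root_2k / N * (root_2k + beta)
            + W - sigma2
            + 2.0 * alpha * sigma / math.sqrt(N) * math.sqrt(inner)
            + 2.0 * alpha ** 2 * sigma2 / N)


# Illustrative data: two clusters of two points each.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
centers = [(1.0, 0.0), (11.0, 0.0)]

# Assumption: the per-cluster W values are summed into a single W.
W = sum(mean_distance(cluster, centers[j]) for j, cluster in enumerate(clusters))
sigma2 = pooled_sigma2(clusters, centers)
N = sum(len(cluster) for cluster in clusters)

print(mndl_upper_bound(W, sigma2, K=len(clusters), N=N, alpha=0.9, beta=0.9))
```

Because $$Z$$ is an upper bound on a description length, smaller values are preferable when comparing candidate splits.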

Definition at line 60 of file xmeans.py.

The documentation for this class was generated from the file xmeans.py.