pyclustering
0.10.1
pyclustring is a Python, C++ data mining library.

Class represents clustering algorithm XMeans. More...
Public Member Functions  
def  __init__ (self, data, initial_centers=None, kmax=20, tolerance=0.001, criterion=splitting_type.BAYESIAN_INFORMATION_CRITERION, ccore=True, **kwargs) 
Constructor of clustering algorithm XMeans. More...  
def  process (self) 
Performs cluster analysis in line with rules of XMeans algorithm. More...  
def  predict (self, points) 
Calculates the closest cluster to each point. More...  
def  get_clusters (self) 
Returns list of allocated clusters, each cluster contains indexes of objects in list of data. More...  
def  get_centers (self) 
Returns list of centers for allocated clusters. More...  
def  get_cluster_encoding (self) 
Returns clustering result representation type that indicate how clusters are encoded. More...  
def  get_total_wce (self) 
Returns sum of Euclidean Squared metric errors (SSE  Sum of Squared Errors). More...  
Class represents clustering algorithm XMeans.
Xmeans clustering method starts with the assumption of having a minimum number of clusters, and then dynamically increases them. Xmeans uses specified splitting criterion to control the process of splitting clusters. Method KMeans++ can be used for calculation of initial centers.
CCORE implementation of the algorithm uses thread pool to parallelize the clustering process.
Here example how to perform cluster analysis using XMeans algorithm:
Visualization of clustering results that were obtained using code above and where XMeans algorithm allocates four clusters.
By default XMeans clustering algorithm uses Bayesian Information Criterion (BIC) to approximate the correct number of clusters. There is an example where another criterion Minimum Noiseless Description Length (MNDL) is used in order to find optimal amount of clusters:
As in many others algorithms, it is possible to specify metric that should be used for cluster analysis, for example, Chebyshev distance metric:
def pyclustering.cluster.xmeans.xmeans.__init__  (  self,  
data,  
initial_centers = None , 

kmax = 20 , 

tolerance = 0.001 , 

criterion = splitting_type.BAYESIAN_INFORMATION_CRITERION , 

ccore = True , 

**  kwargs  
) 
Constructor of clustering algorithm XMeans.
[in]  data  (array_like): Input data that is presented as list of points (objects), each point should be represented by list or tuple. 
[in]  initial_centers  (list): Initial coordinates of centers of clusters that are represented by list: [center1, center2, ...] , if it is not specified then XMeans starts from the random center. 
[in]  kmax  (uint): Maximum number of clusters that can be allocated. 
[in]  tolerance  (double): Stop condition for each iteration: if maximum value of change of centers of clusters is less than tolerance than algorithm will stop processing. 
[in]  criterion  (splitting_type): Type of splitting creation (by default splitting_type.BAYESIAN_INFORMATION_CRITERION ). 
[in]  ccore  (bool): Defines if C++ pyclustering library should be used instead of Python implementation. 
[in]  **kwargs  Arbitrary keyword arguments (available arguments: repeat , random_state , metric , alpha , beta ). 
Keyword Args:
1
). With larger repeat
values suggesting higher probability of finding global optimum.None
, current system time is used).def pyclustering.cluster.xmeans.xmeans.get_centers  (  self  ) 
Returns list of centers for allocated clusters.
def pyclustering.cluster.xmeans.xmeans.get_cluster_encoding  (  self  ) 
Returns clustering result representation type that indicate how clusters are encoded.
def pyclustering.cluster.xmeans.xmeans.get_clusters  (  self  ) 
Returns list of allocated clusters, each cluster contains indexes of objects in list of data.
Definition at line 314 of file xmeans.py.
Referenced by pyclustering.samples.answer_reader.get_cluster_lengths().
def pyclustering.cluster.xmeans.xmeans.get_total_wce  (  self  ) 
Returns sum of Euclidean Squared metric errors (SSE  Sum of Squared Errors).
Sum of metric errors is calculated using distance between point and its center:
\[error=\sum_{i=0}^{N}euclidean_square_distance(x_{i}center(x_{i}))\]
def pyclustering.cluster.xmeans.xmeans.predict  (  self,  
points  
) 
Calculates the closest cluster to each point.
[in]  points  (array_like): Points for which closest clusters are calculated. 
An example how to calculate (or predict) the closest cluster to specified points.
def pyclustering.cluster.xmeans.xmeans.process  (  self  ) 
Performs cluster analysis in line with rules of XMeans algorithm.
Definition at line 206 of file xmeans.py.
Referenced by pyclustering.cluster.xmeans.xmeans.get_total_wce().