pyclustering
0.10.1
pyclustering is a Python, C++ data mining library.

Class implements GMeans clustering algorithm.
Public Member Functions
def  __init__ (self, data, k_init=1, ccore=True, **kwargs)
Initializes GMeans algorithm.
def  process (self)
Performs cluster analysis in line with rules of GMeans algorithm.
def  predict (self, points)
Calculates the closest cluster to each point.
def  get_clusters (self)
Returns list of allocated clusters, each cluster contains indexes of objects in list of data.
def  get_centers (self)
Returns list of centers of allocated clusters.
def  get_total_wce (self)
Returns sum of metric errors that depends on metric that was used for clustering (by default SSE, the Sum of Squared Errors).
def  get_cluster_encoding (self)
Returns clustering result representation type that indicates how clusters are encoded.
Class implements GMeans clustering algorithm.
The GMeans algorithm starts with a small number of centers and grows that number. Each iteration of the GMeans algorithm splits in two those centers whose data appear not to come from a Gaussian distribution. GMeans makes this decision repeatedly, based on a statistical test applied to the data assigned to each center.
Implementation based on the paper [17].
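A minimal sketch of the statistical test described above, assuming the Anderson-Darling normality test on a one-dimensional projection of a cluster's points, as in the G-Means paper. The helper names (project, anderson_darling, looks_gaussian) and the critical value 1.8692 (significance level 0.0001) are assumptions of this sketch, not part of the pyclustering API.

```python
import math
import random

# Critical value at significance level 0.0001 (an assumption of this sketch).
CRITICAL_VALUE = 1.8692


def project(points, center_a, center_b):
    """Project points onto the vector that connects two candidate child centers."""
    v = [a - b for a, b in zip(center_a, center_b)]
    norm_sq = sum(c * c for c in v)
    return [sum(p_i * v_i for p_i, v_i in zip(p, v)) / norm_sq for p in points]


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def anderson_darling(values):
    """Corrected Anderson-Darling statistic for a non-degenerate 1-D sample."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

    # Standardize, sort, and map through the normal CDF.
    # Values are clamped so that log() never receives 0.
    eps = 1e-12
    cdf = [min(max(normal_cdf((v - mean) / std), eps), 1.0 - eps)
           for v in sorted(values)]

    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n

    # Correction used when mean and variance are estimated from the sample.
    return a2 * (1.0 + 4.0 / n - 25.0 / (n * n))


def looks_gaussian(values):
    """True if the sample does not reject the Gaussian hypothesis (keep the center)."""
    return anderson_darling(values) <= CRITICAL_VALUE


rng = random.Random(0)
one_blob = [rng.gauss(0.0, 1.0) for _ in range(200)]
two_blobs = ([rng.gauss(-5.0, 1.0) for _ in range(100)]
             + [rng.gauss(5.0, 1.0) for _ in range(100)])

keep_center = looks_gaussian(one_blob)        # a single Gaussian blob is kept
split_center = not looks_gaussian(two_blobs)  # a bimodal blob is split in two
```

In this sketch a center is split only when the projected data fail the test, which mirrors the grow-by-splitting behaviour described above.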
Example #1. In this example, GMeans starts the analysis from a single cluster.
Example #2. Sometimes GMeans might find a local optimum. The repeat value can be used to increase the probability of finding the global optimum. The argument repeat defines how many times KMeans clustering with KMeans++ initialization should be run in order to find optimal clusters.
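The idea behind repeat can be sketched in plain Python: run KMeans with KMeans++ seeding several times and keep the run with the lowest total error. This is an illustrative sketch only; the function names (kmeans_pp_init, kmeans, best_of) are hypothetical and the code does not reproduce pyclustering's internal implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(data, k, rng):
    """KMeans++ seeding: each next center is drawn with probability
    proportional to the squared distance to the nearest chosen center.
    Assumes the data contain at least k distinct points."""
    centers = [rng.choice(data)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in data]
        centers.append(rng.choices(data, weights=weights, k=1)[0])
    return centers


def kmeans(data, centers, iterations=100):
    """Plain Lloyd iterations; returns final centers and the total error (SSE)."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in data:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    wce = sum(min(sq_dist(p, c) for c in centers) for p in data)
    return centers, wce


def best_of(data, k, repeat, seed=None):
    """Run KMeans `repeat` times and keep the run with the lowest error."""
    rng = random.Random(seed)
    runs = [kmeans(data, kmeans_pp_init(data, k, rng)) for _ in range(repeat)]
    return min(runs, key=lambda run: run[1])


data = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [10.0, 0.0], [9.9, 0.2]]
centers, wce = best_of(data, k=3, repeat=5, seed=1)
```

A larger repeat costs proportionally more time, but the selected run can never be worse than the first one.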
If labels are required instead of the default representation of clustering results, CLUSTER_INDEX_LIST_SEPARATION, the clusters can be converted to a list of labels.
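A minimal sketch of the conversion from the index-list representation to labels, written in plain Python. pyclustering ships its own encoder in pyclustering.cluster.encoder; the helper clusters_to_labels below is hypothetical and only illustrates what the two representations look like.

```python
def clusters_to_labels(clusters, n_points):
    """Convert a list of clusters (each a list of point indexes)
    into one label per point."""
    labels = [None] * n_points
    for cluster_id, indexes in enumerate(clusters):
        for index in indexes:
            labels[index] = cluster_id
    return labels


# Index-list representation: two clusters over ten points.
clusters = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
labels = clusters_to_labels(clusters, 10)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```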
def pyclustering.cluster.gmeans.gmeans.__init__ (self, data, k_init=1, ccore=True, **kwargs)
Initializes GMeans algorithm.
[in]  data  (array_like): Input data that is presented as array of points (objects), each point should be represented by array_like data structure. 
[in]  k_init  (uint): Initial number of centers (by default the search starts from 1). 
[in]  ccore  (bool): Defines whether CCORE library (C/C++ part of the library) should be used instead of Python code. 
[in]  **kwargs  Arbitrary keyword arguments (available arguments: tolerance , repeat , k_max , random_state ). 
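Keyword Args:
k_max: Maximum number of clusters that might be allocated; by default it is not restricted (k_max is -1).
random_state: Seed for the random state (by default None, current system time is used).
def pyclustering.cluster.gmeans.gmeans.get_centers  (  self  ) 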
Returns list of centers of allocated clusters.
def pyclustering.cluster.gmeans.gmeans.get_cluster_encoding  (  self  ) 
Returns clustering result representation type that indicates how clusters are encoded.
def pyclustering.cluster.gmeans.gmeans.get_clusters  (  self  ) 
Returns list of allocated clusters, each cluster contains indexes of objects in list of data.
Definition at line 218 of file gmeans.py.
Referenced by pyclustering.samples.answer_reader.get_cluster_lengths(), and pyclustering.cluster.optics.optics.process().
def pyclustering.cluster.gmeans.gmeans.get_total_wce  (  self  ) 
Returns sum of metric errors that depends on metric that was used for clustering (by default SSE, the Sum of Squared Errors).
Sum of metric errors is calculated using distance between point and its center:
\[error=\sum_{i=0}^{N}distance(x_{i}, center(x_{i}))\]
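The sum above can be written out as a short sketch, assuming the default metric (squared Euclidean distance, which gives SSE). The function total_wce is a hypothetical stand-alone helper, not the library method.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_wce(data, clusters, centers):
    """Sum of squared distances from every point to the center of its cluster.

    clusters[c] holds the indexes (into data) of the points in cluster c.
    """
    return sum(
        sq_dist(data[i], centers[c])
        for c, cluster in enumerate(clusters)
        for i in cluster
    )


data = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
clusters = [[0, 1], [2, 3]]
centers = [[1.0, 0.0], [11.0, 0.0]]

error = total_wce(data, clusters, centers)
print(error)  # 4.0, every point lies at squared distance 1 from its center
```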
def pyclustering.cluster.gmeans.gmeans.predict (self, points)
Calculates the closest cluster to each point.
[in]  points  (array_like): Points for which closest clusters are calculated. 
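What "closest cluster" means can be illustrated with a small sketch that assigns each point to the nearest center. It assumes Euclidean distance; nearest_cluster is a hypothetical helper, not the library's predict method.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_cluster(points, centers):
    """Return, for each point, the index of the closest center."""
    return [
        min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        for p in points
    ]


centers = [[1.0, 0.0], [11.0, 0.0]]
points = [[0.0, 1.0], [9.0, 0.0], [6.5, 0.0]]

closest = nearest_cluster(points, centers)
print(closest)  # [0, 1, 1]
```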
def pyclustering.cluster.gmeans.gmeans.process  (  self  ) 
Performs cluster analysis in line with rules of GMeans algorithm.
Definition at line 150 of file gmeans.py.
Referenced by pyclustering.cluster.gmeans.gmeans.get_cluster_encoding().