pyclustering  0.10.1
pyclustering is a Python, C++ data mining library.
xmeans.py
1 """!
2 
3 @brief Cluster analysis algorithm: X-Means
4 @details Implementation based on papers @cite article::xmeans::1, @cite article::xmeans::mndl
5 
6 @authors Andrei Novikov (pyclustering@yandex.ru)
7 @date 2014-2020
8 @copyright BSD-3-Clause
9 
10 """
11 
12 
13 import copy
14 import numpy
15 
16 from enum import IntEnum
17 from math import log
18 
19 from pyclustering.cluster.encoder import type_encoding
20 from pyclustering.cluster.kmeans import kmeans
21 from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
22 
23 from pyclustering.core.metric_wrapper import metric_wrapper
24 from pyclustering.core.wrapper import ccore_library
25 
26 import pyclustering.core.xmeans_wrapper as wrapper
27 
28 from pyclustering.utils import distance_metric, type_metric
29 
30 
31 class splitting_type(IntEnum):
32  """!
33  @brief Enumeration of splitting types that can be used as the splitting criterion for clusters in the X-Means algorithm.
34 
35  """
36 
37 
49  BAYESIAN_INFORMATION_CRITERION = 0
50 
51 
60  MINIMUM_NOISELESS_DESCRIPTION_LENGTH = 1
61 
62 
63 class xmeans:
64  """!
65  @brief Class represents clustering algorithm X-Means.
66  @details X-means clustering method starts with the assumption of having a minimum number of clusters,
67  and then dynamically increases them. X-means uses specified splitting criterion to control
68  the process of splitting clusters. Method K-Means++ can be used for calculation of initial centers.
69 
70  CCORE implementation of the algorithm uses thread pool to parallelize the clustering process.
71 
72  Here is an example of how to perform cluster analysis using the X-Means algorithm:
73  @code
74  from pyclustering.cluster import cluster_visualizer
75  from pyclustering.cluster.xmeans import xmeans
76  from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
77  from pyclustering.utils import read_sample
78  from pyclustering.samples.definitions import SIMPLE_SAMPLES
79 
80  # Read sample 'simple3' from file.
81  sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)
82 
83  # Prepare initial centers - amount of initial centers defines amount of clusters from which X-Means will
84  # start analysis.
85  amount_initial_centers = 2
86  initial_centers = kmeans_plusplus_initializer(sample, amount_initial_centers).initialize()
87 
88  # Create instance of X-Means algorithm. The algorithm will start analysis from 2 clusters, the maximum
89  # number of clusters that can be allocated is 20.
90  xmeans_instance = xmeans(sample, initial_centers, 20)
91  xmeans_instance.process()
92 
93  # Extract clustering results: clusters and their centers
94  clusters = xmeans_instance.get_clusters()
95  centers = xmeans_instance.get_centers()
96 
97  # Print total sum of metric errors
98  print("Total WCE:", xmeans_instance.get_total_wce())
99 
100  # Visualize clustering results
101  visualizer = cluster_visualizer()
102  visualizer.append_clusters(clusters, sample)
103  visualizer.append_cluster(centers, None, marker='*', markersize=10)
104  visualizer.show()
105  @endcode
106 
107  Visualization of the clustering results obtained with the code above, where the X-Means algorithm allocates four clusters.
108  @image html xmeans_clustering_simple3.png "Fig. 1. X-Means clustering results (data 'Simple3')."
109 
110  By default the X-Means clustering algorithm uses the Bayesian Information Criterion (BIC) to approximate the correct number
111  of clusters. The following example uses another criterion, Minimum Noiseless Description Length (MNDL), to find
112  the optimal number of clusters:
113  @code
114  from pyclustering.cluster import cluster_visualizer
115  from pyclustering.cluster.xmeans import xmeans, splitting_type
116  from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
117  from pyclustering.utils import read_sample
118  from pyclustering.samples.definitions import FCPS_SAMPLES
119 
120  # Read sample 'Target'.
121  sample = read_sample(FCPS_SAMPLES.SAMPLE_TARGET)
122 
123  # Prepare initial centers - amount of initial centers defines amount of clusters from which X-Means will start analysis.
124  random_seed = 1000
125  amount_initial_centers = 3
126  initial_centers = kmeans_plusplus_initializer(sample, amount_initial_centers, random_state=random_seed).initialize()
127 
128  # Create instance of X-Means algorithm with MNDL splitting criterion.
129  xmeans_mndl = xmeans(sample, initial_centers, 20, criterion=splitting_type.MINIMUM_NOISELESS_DESCRIPTION_LENGTH, random_state=random_seed)
130  xmeans_mndl.process()
131 
132  # Extract X-Means MNDL clustering results.
133  mndl_clusters = xmeans_mndl.get_clusters()
134 
135  # Visualize clustering results.
136  visualizer = cluster_visualizer(titles=['X-Means with MNDL criterion'])
137  visualizer.append_clusters(mndl_clusters, sample)
138  visualizer.show()
139  @endcode
140 
141  @image html xmeans_clustering_mndl_target.png "Fig. 2. X-Means MNDL clustering results (data 'Target')."
142 
143  As in many other algorithms, it is possible to specify the metric that should be used for cluster analysis, for
144  example, the Chebyshev distance metric:
145  @code
146  # Create instance of X-Means algorithm with Chebyshev distance metric.
147  chebyshev_metric = distance_metric(type_metric.CHEBYSHEV)
148  xmeans_instance = xmeans(sample, initial_centers, max_clusters_amount, metric=chebyshev_metric).process()
149  @endcode
150 
151  @see center_initializer
152 
153  """
154 
155  def __init__(self, data, initial_centers=None, kmax=20, tolerance=0.001, criterion=splitting_type.BAYESIAN_INFORMATION_CRITERION, ccore=True, **kwargs):
156  """!
157  @brief Constructor of clustering algorithm X-Means.
158 
159  @param[in] data (array_like): Input data that is presented as list of points (objects), each point should be represented by list or tuple.
160  @param[in] initial_centers (list): Initial coordinates of centers of clusters that are represented by list: `[center1, center2, ...]`,
161  if it is not specified then X-Means starts from two centers chosen by the K-Means++ initializer.
162  @param[in] kmax (uint): Maximum number of clusters that can be allocated.
163  @param[in] tolerance (double): Stop condition for each iteration: if the maximum change of the cluster centers is less than tolerance, then the algorithm stops processing.
164  @param[in] criterion (splitting_type): Type of splitting criterion (by default `splitting_type.BAYESIAN_INFORMATION_CRITERION`).
165  @param[in] ccore (bool): Defines if C++ pyclustering library should be used instead of Python implementation.
166  @param[in] **kwargs: Arbitrary keyword arguments (available arguments: `repeat`, `random_state`, `metric`, `alpha`, `beta`).
167 
168  <b>Keyword Args:</b><br>
169  - repeat (uint): How many times K-Means should be run to improve parameters (by default is `1`).
170  Larger `repeat` values give a higher probability of finding the global optimum.
171  - random_state (int): Seed for random state (by default is `None`, current system time is used).
172  - metric (distance_metric): Metric that is used for distance calculation between two points (by default
173  euclidean square distance).
174  - alpha (double): Parameter in the range [0.0, 1.0] for alpha probabilistic bound \f$Q\left(\alpha\right)\f$.
175  The parameter is used only in case of MNDL splitting criterion, in all other cases this value is ignored.
176  - beta (double): Parameter in the range [0.0, 1.0] for beta probabilistic bound \f$Q\left(\beta\right)\f$.
177  The parameter is used only in case of MNDL splitting criterion, in all other cases this value is ignored.
178 
179  """
180 
181  self.__pointer_data = numpy.array(data)
182  self.__clusters = []
183  self.__random_state = kwargs.get('random_state', None)
184  self.__metric = copy.copy(kwargs.get('metric', distance_metric(type_metric.EUCLIDEAN_SQUARE)))
185 
186  if initial_centers is not None:
187  self.__centers = numpy.array(initial_centers)
188  else:
189  self.__centers = kmeans_plusplus_initializer(data, 2, random_state=self.__random_state).initialize()
190 
191  self.__kmax = kmax
192  self.__tolerance = tolerance
193  self.__criterion = criterion
194  self.__total_wce = 0.0
195  self.__repeat = kwargs.get('repeat', 1)
196  self.__alpha = kwargs.get('alpha', 0.9)
197  self.__beta = kwargs.get('beta', 0.9)
198 
199  self.__ccore = ccore and self.__metric.get_type() != type_metric.USER_DEFINED
200  if self.__ccore is True:
201  self.__ccore = ccore_library.workable()
202 
203  self.__verify_arguments()
204 
205 
206  def process(self):
207  """!
208  @brief Performs cluster analysis in line with rules of X-Means algorithm.
209 
210  @return (xmeans) Returns itself (X-Means instance).
211 
212  @see get_clusters()
213  @see get_centers()
214 
215  """
216 
217  if self.__ccore is True:
218  self.__process_by_ccore()
219 
220  else:
221  self.__process_by_python()
222 
223  return self
224 
225 
226  def __process_by_ccore(self):
227  """!
228  @brief Performs cluster analysis using CCORE (C/C++ part of pyclustering library).
229 
230  """
231 
232  ccore_metric = metric_wrapper.create_instance(self.__metric)
233 
234  result = wrapper.xmeans(self.__pointer_data, self.__centers, self.__kmax, self.__tolerance, self.__criterion,
235  self.__alpha, self.__beta, self.__repeat, self.__random_state,
236  ccore_metric.get_pointer())
237 
238  self.__clusters = result[0]
239  self.__centers = result[1]
240  self.__total_wce = result[2][0]
241 
242 
243  def __process_by_python(self):
244  """!
245  @brief Performs cluster analysis using python code.
246 
247  """
248 
249  self.__clusters = []
250  while len(self.__centers) <= self.__kmax:
251  current_cluster_number = len(self.__centers)
252 
253  self.__clusters, self.__centers, _ = self.__improve_parameters(self.__centers)
254  allocated_centers = self.__improve_structure(self.__clusters, self.__centers)
255 
256  if current_cluster_number == len(allocated_centers):
257  break
258  else:
259  self.__centers = allocated_centers
260 
261  self.__clusters, self.__centers, self.__total_wce = self.__improve_parameters(self.__centers)
262 
263 
264  def predict(self, points):
265  """!
266  @brief Calculates the closest cluster to each point.
267 
268  @param[in] points (array_like): Points for which closest clusters are calculated.
269 
270  @return (list) List of closest clusters for each point. Each cluster is denoted by index. Returns an empty
271  collection if the 'process()' method was not called.
272 
273  An example of how to calculate (or predict) the closest cluster to specified points.
274  @code
275  from pyclustering.cluster.xmeans import xmeans
276  from pyclustering.samples.definitions import SIMPLE_SAMPLES
277  from pyclustering.utils import read_sample
278 
279  # Load list of points for cluster analysis.
280  sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)
281 
282  # Initial centers for sample 'Simple3'.
283  initial_centers = [[0.2, 0.1], [4.0, 1.0], [2.0, 2.0], [2.3, 3.9]]
284 
285  # Create instance of X-Means algorithm with prepared centers.
286  xmeans_instance = xmeans(sample, initial_centers)
287 
288  # Run cluster analysis.
289  xmeans_instance.process()
290 
291  # Calculate the closest cluster to following two points.
292  points = [[0.25, 0.2], [2.5, 4.0]]
293  closest_clusters = xmeans_instance.predict(points)
294  print(closest_clusters)
295  @endcode
296 
297  """
298  nppoints = numpy.array(points)
299  if len(self.__clusters) == 0:
300  return []
301 
302  self.__metric.enable_numpy_usage()
303 
304  npcenters = numpy.array(self.__centers)
305  differences = numpy.zeros((len(nppoints), len(npcenters)))
306  for index_point in range(len(nppoints)):
307  differences[index_point] = self.__metric(nppoints[index_point], npcenters)
308 
309  self.__metric.disable_numpy_usage()
310 
311  return numpy.argmin(differences, axis=1)
312 
313 
314  def get_clusters(self):
315  """!
316  @brief Returns list of allocated clusters, each cluster contains indexes of objects in list of data.
317 
318  @return (list) List of allocated clusters.
319 
320  @see process()
321  @see get_centers()
322  @see get_total_wce()
323 
324  """
325 
326  return self.__clusters
327 
328 
329  def get_centers(self):
330  """!
331  @brief Returns list of centers for allocated clusters.
332 
333  @return (list) List of centers for allocated clusters.
334 
335  @see process()
336  @see get_clusters()
337  @see get_total_wce()
338 
339  """
340 
341  return self.__centers
342 
343 
344  def get_cluster_encoding(self):
345  """!
346  @brief Returns clustering result representation type that indicates how clusters are encoded.
347 
348  @return (type_encoding) Clustering result representation.
349 
350  @see get_clusters()
351 
352  """
353 
354  return type_encoding.CLUSTER_INDEX_LIST_SEPARATION
355 
356 
357  def get_total_wce(self):
358  """!
359  @brief Returns sum of Euclidean Squared metric errors (SSE - Sum of Squared Errors).
360  @details Sum of metric errors is calculated using distance between point and its center:
361  \f[error=\sum_{i=0}^{N}euclidean\_square\_distance(x_{i}, center(x_{i}))\f]
362 
363  @see process()
364  @see get_clusters()
365 
366  """
367 
368  return self.__total_wce
369 
370 
371  def __search_optimial_parameters(self, local_data):
372  """!
373  @brief Splits data of the region into two clusters and tries to find global optimum by running k-means clustering
374  several times (defined by 'repeat' argument).
375 
376  @param[in] local_data (list): Points of a region that should be split into two clusters.
377 
378  @return (tuple) List of allocated clusters, list of centers and total WCE (clusters, centers, wce).
379 
380  """
381  optimal_wce, optimal_centers, optimal_clusters = float('+inf'), None, None
382 
383  for _ in range(self.__repeat):
384  candidates = 5
385  if len(local_data) < candidates:
386  candidates = len(local_data)
387 
388  local_centers = kmeans_plusplus_initializer(local_data, 2, candidates, random_state=self.__random_state).initialize()
389 
390  kmeans_instance = kmeans(local_data, local_centers, tolerance=self.__tolerance, ccore=False, metric=self.__metric)
391  kmeans_instance.process()
392 
393  local_wce = kmeans_instance.get_total_wce()
394  if local_wce < optimal_wce:
395  optimal_centers = kmeans_instance.get_centers()
396  optimal_clusters = kmeans_instance.get_clusters()
397  optimal_wce = local_wce
398 
399  return optimal_clusters, optimal_centers, optimal_wce
400 
401 
402  def __improve_parameters(self, centers, available_indexes=None):
403  """!
404  @brief Performs k-means clustering in the specified region.
405 
406  @param[in] centers (list): Cluster centers; if None, two centers are generated automatically using the center initialization method.
407  @param[in] available_indexes (list): Indexes that define which points can be used for k-means clustering; if None, all points are used.
408 
409  @return (tuple) List of allocated clusters, list of centers and total WCE (clusters, centers, wce).
410 
411  """
412 
413  if available_indexes and len(available_indexes) == 1:
414  index_center = available_indexes[0]
415  return [available_indexes], self.__pointer_data[index_center], 0.0
416 
417  local_data = self.__pointer_data
418  if available_indexes:
419  local_data = [self.__pointer_data[i] for i in available_indexes]
420 
421  local_centers = centers
422  if centers is None:
423  clusters, local_centers, local_wce = self.__search_optimial_parameters(local_data)
424  else:
425  kmeans_instance = kmeans(local_data, local_centers, tolerance=self.__tolerance, ccore=False, metric=self.__metric).process()
426 
427  local_wce = kmeans_instance.get_total_wce()
428  local_centers = kmeans_instance.get_centers()
429  clusters = kmeans_instance.get_clusters()
430 
431  if available_indexes:
432  clusters = self.__local_to_global_clusters(clusters, available_indexes)
433 
434  return clusters, local_centers, local_wce
435 
436 
437  def __local_to_global_clusters(self, local_clusters, available_indexes):
438  """!
439  @brief Converts clusters in local region defined by 'available_indexes' to global clusters.
440 
441  @param[in] local_clusters (list): Local clusters in specific region.
442  @param[in] available_indexes (list): Map between local and global point's indexes.
443 
444  @return Global clusters.
445 
446  """
447 
448  clusters = []
449  for local_cluster in local_clusters:
450  current_cluster = []
451  for index_point in local_cluster:
452  current_cluster.append(available_indexes[index_point])
453 
454  clusters.append(current_cluster)
455 
456  return clusters
457 
458 
459  def __improve_structure(self, clusters, centers):
460  """!
461  @brief Checks for the best structure: divides each cluster into two and checks for best results using the splitting criterion.
462 
463  @param[in] clusters (list): Clusters that have been allocated (each cluster contains indexes of points from data).
464  @param[in] centers (list): Centers of clusters.
465 
466  @return (list) Allocated centers for clustering.
467 
468  """
469 
470  allocated_centers = []
471  amount_free_centers = self.__kmax - len(centers)
472 
473  for index_cluster in range(len(clusters)):
474  # solve k-means problem for children where data of parent are used.
475  (parent_child_clusters, parent_child_centers, _) = self.__improve_parameters(None, clusters[index_cluster])
476 
477  # If it's possible to split current data
478  if len(parent_child_clusters) > 1:
479  # Calculate splitting criterion
480  parent_scores = self.__splitting_criterion([clusters[index_cluster]], [centers[index_cluster]])
481  child_scores = self.__splitting_criterion([parent_child_clusters[0], parent_child_clusters[1]], parent_child_centers)
482 
483  split_require = False
484 
485  # Reallocate number of centers (clusters) in line with scores
486  if self.__criterion == splitting_type.BAYESIAN_INFORMATION_CRITERION:
487  if parent_scores < child_scores:
488  split_require = True
489 
490  elif self.__criterion == splitting_type.MINIMUM_NOISELESS_DESCRIPTION_LENGTH:
491  # If its score for the split structure with two children is smaller than that for the parent structure,
492  # then representing the data samples with two clusters is more accurate in comparison to a single parent cluster.
493  if parent_scores > child_scores:
494  split_require = True
495 
496  if (split_require is True) and (amount_free_centers > 0):
497  allocated_centers.append(parent_child_centers[0])
498  allocated_centers.append(parent_child_centers[1])
499 
500  amount_free_centers -= 1
501  else:
502  allocated_centers.append(centers[index_cluster])
503 
504  else:
505  allocated_centers.append(centers[index_cluster])
506 
507  return allocated_centers
508 
509 
510  def __splitting_criterion(self, clusters, centers):
511  """!
512  @brief Calculates splitting criterion for input clusters.
513 
514  @param[in] clusters (list): Clusters for which splitting criterion should be calculated.
515  @param[in] centers (list): Centers of the clusters.
516 
517  @return (double) Returns splitting criterion. For BIC a higher value means that the current structure is better;
518  for MNDL a lower value means that the current structure is better.
519 
520  @see __bayesian_information_criterion(clusters, centers)
521  @see __minimum_noiseless_description_length(clusters, centers)
522 
523  """
524 
525  if self.__criterion == splitting_type.BAYESIAN_INFORMATION_CRITERION:
526  return self.__bayesian_information_criterion(clusters, centers)
527 
528  elif self.__criterion == splitting_type.MINIMUM_NOISELESS_DESCRIPTION_LENGTH:
529  return self.__minimum_noiseless_description_length(clusters, centers)
530 
531  else:
532  assert 0
533 
534 
535  def __minimum_noiseless_description_length(self, clusters, centers):
536  """!
537  @brief Calculates splitting criterion for input clusters using minimum noiseless description length criterion.
538 
539  @param[in] clusters (list): Clusters for which splitting criterion should be calculated.
540  @param[in] centers (list): Centers of the clusters.
541 
542  @return (double) Returns splitting criterion in line with minimum noiseless description length criterion.
543  Low value of splitting criterion means that current structure is much better.
544 
545  @see __bayesian_information_criterion(clusters, centers)
546 
547  """
548 
549  score = float('inf')
550 
551  W = 0.0
552  K = len(clusters)
553  N = 0.0
554 
555  sigma_square = 0.0
556 
557  alpha = self.__alpha
558  alpha_square = alpha * alpha
559  beta = self.__beta
560 
561  for index_cluster in range(0, len(clusters), 1):
562  Ni = len(clusters[index_cluster])
563  if Ni == 0:
564  return float('inf')
565 
566  Wi = 0.0
567  for index_object in clusters[index_cluster]:
568  Wi += self.__metric(self.__pointer_data[index_object], centers[index_cluster])
569 
570  sigma_square += Wi
571  W += Wi / Ni
572  N += Ni
573 
574  if N - K > 0:
575  sigma_square /= (N - K)
576  sigma = sigma_square ** 0.5
577 
578  Kw = (1.0 - K / N) * sigma_square
579  Ksa = (2.0 * alpha * sigma / (N ** 0.5)) * (alpha_square * sigma_square / N + W - Kw / 2.0) ** 0.5
580  UQa = W - Kw + 2.0 * alpha_square * sigma_square / N + Ksa
581 
582  score = sigma_square * K / N + UQa + sigma_square * beta * ((2.0 * K) ** 0.5) / N
583 
584  return score
585 
586 
587  def __bayesian_information_criterion(self, clusters, centers):
588  """!
589  @brief Calculates splitting criterion for input clusters using bayesian information criterion.
590 
591  @param[in] clusters (list): Clusters for which splitting criterion should be calculated.
592  @param[in] centers (list): Centers of the clusters.
593 
594  @return (double) Splitting criterion in line with bayesian information criterion.
595  High value of splitting criterion means that current structure is much better.
596 
597  @see __minimum_noiseless_description_length(clusters, centers)
598 
599  """
600 
601  scores = [float('inf')] * len(clusters) # splitting criterion
602  dimension = len(self.__pointer_data[0])
603 
604  # estimation of the noise variance in the data set
605  sigma_sqrt = 0.0
606  K = len(clusters)
607  N = 0.0
608 
609  for index_cluster in range(0, len(clusters), 1):
610  for index_object in clusters[index_cluster]:
611  sigma_sqrt += self.__metric(self.__pointer_data[index_object], centers[index_cluster])
612 
613  N += len(clusters[index_cluster])
614 
615  if N - K > 0:
616  sigma_sqrt /= (N - K)
617  p = (K - 1) + dimension * K + 1
618 
619  # in case of the same points, sigma_sqrt can be zero (issue: #407)
620  sigma_multiplier = 0.0
621  if sigma_sqrt <= 0.0:
622  sigma_multiplier = float('-inf')
623  else:
624  sigma_multiplier = dimension * 0.5 * log(sigma_sqrt)
625 
626  # splitting criterion
627  for index_cluster in range(0, len(clusters), 1):
628  n = len(clusters[index_cluster])
629 
630  L = n * log(n) - n * log(N) - n * 0.5 * log(2.0 * numpy.pi) - n * sigma_multiplier - (n - K) * 0.5
631 
632  # BIC calculation
633  scores[index_cluster] = L - p * 0.5 * log(N)
634 
635  return sum(scores)
636 
637 
638  def __verify_arguments(self):
639  """!
640  @brief Verifies input parameters for the algorithm and raises an exception if any of them is incorrect.
641 
642  """
643  if len(self.__pointer_data) == 0:
644  raise ValueError("Input data is empty (size: '%d')." % len(self.__pointer_data))
645 
646  if len(self.__centers) == 0:
647  raise ValueError("Initial centers are empty (size: '%d')." % len(self.__centers))
648 
649  if self.__tolerance < 0:
650  raise ValueError("Tolerance (current value: '%f') should be greater than or equal to 0." %
651  self.__tolerance)
652 
653  if self.__repeat <= 0:
654  raise ValueError("Repeat (current value: '%d') should be greater than 0." %
655  self.__repeat)
656 
657  if self.__alpha < 0.0 or self.__alpha > 1.0:
658  raise ValueError("Parameter for the probabilistic bound Q(alpha) should be in the following range [0, 1] "
659  "(current value: '%f')." % self.__alpha)
660 
661  if self.__beta < 0.0 or self.__beta > 1.0:
662  raise ValueError("Parameter for the probabilistic bound Q(beta) should be in the following range [0, 1] "
663  "(current value: '%f')." % self.__beta)
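
As a rough, standalone illustration of the arithmetic in `__bayesian_information_criterion` and of the BIC branch in `__improve_structure`, the sketch below recomputes the score outside the class and compares a parent cluster with its two children. The helper names `squared_euclidean` and `bic_score` and the toy data are assumptions made for this example only; they are not part of pyclustering. The squared Euclidean metric (the default in the listing) is assumed.

```python
from math import log, pi


def squared_euclidean(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bic_score(data, clusters, centers):
    """BIC of a clustering, following the same arithmetic as the listing.

    data     -- list of points
    clusters -- list of clusters, each a list of indexes into data
    centers  -- one center per cluster
    """
    dimension = len(data[0])
    K = len(clusters)
    N = sum(len(cluster) for cluster in clusters)

    # As in the listing, the score stays infinite when N - K <= 0.
    if N - K <= 0:
        return float('inf')

    # Estimate of the noise variance over all clusters.
    variance = sum(squared_euclidean(data[i], centers[k])
                   for k, cluster in enumerate(clusters)
                   for i in cluster) / (N - K)
    p = (K - 1) + dimension * K + 1  # number of free parameters

    # Identical points give zero variance, so log() must be avoided.
    if variance <= 0.0:
        sigma_multiplier = float('-inf')
    else:
        sigma_multiplier = dimension * 0.5 * log(variance)

    score = 0.0
    for cluster in clusters:
        n = len(cluster)
        likelihood = (n * log(n) - n * log(N) - n * 0.5 * log(2.0 * pi)
                      - n * sigma_multiplier - (n - K) * 0.5)
        score += likelihood - p * 0.5 * log(N)
    return score


# Toy data (hypothetical): two well-separated groups of four points.
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]

parent_score = bic_score(data, [list(range(8))], [[5.5, 5.5]])
child_score = bic_score(data, [[0, 1, 2, 3], [4, 5, 6, 7]],
                        [[0.5, 0.5], [10.5, 10.5]])

# With BIC the split is accepted when the children score higher.
print("split required:", parent_score < child_score)
```

For this toy data the two-cluster structure scores higher than the single parent cluster, so the BIC rule would request a split, provided a free center is still available under `kmax`.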
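
The nearest-center lookup performed by `predict` can be sketched outside the class with plain NumPy. This is a minimal illustration, not the library's implementation: the helper name `closest_centers` is hypothetical, squared Euclidean distance is assumed, and the centers are fixed values borrowed from the `predict` docstring example as stand-ins for centers produced by `process()`.

```python
import numpy


def closest_centers(points, centers):
    """Return, for every point, the index of the nearest center."""
    points = numpy.asarray(points, dtype=float)
    centers = numpy.asarray(centers, dtype=float)

    # deltas[i, j] is the coordinate-wise difference between point i and center j.
    deltas = points[:, numpy.newaxis, :] - centers[numpy.newaxis, :, :]

    # differences[i, j] is the squared Euclidean distance from point i to center j.
    differences = numpy.sum(deltas ** 2, axis=2)

    # The closest cluster is the column with the smallest distance in each row.
    return numpy.argmin(differences, axis=1)


# Stand-in centers (assumed, not fitted) and the two query points.
centers = [[0.2, 0.1], [4.0, 1.0], [2.0, 2.0], [2.3, 3.9]]
points = [[0.25, 0.2], [2.5, 4.0]]

labels = closest_centers(points, centers)
print(labels)
```

Each label is an index into the list of centers, which matches how `predict` denotes clusters by index.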