pyclustering 0.10.1
pyclustering is a Python, C++ data mining library.
silhouette.py
"""!

@brief Silhouette - method of interpretation and validation of consistency.
@details Implementation based on paper @cite article::cluster::silhouette::1.

@authors Andrei Novikov (pyclustering@yandex.ru)
@date 2014-2020
@copyright BSD-3-Clause

"""


from enum import IntEnum

import numpy

from pyclustering.cluster.kmeans import kmeans
from pyclustering.cluster.kmedians import kmedians
from pyclustering.cluster.kmedoids import kmedoids
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer

from pyclustering.utils.metric import distance_metric, type_metric

from pyclustering.core.wrapper import ccore_library
from pyclustering.core.metric_wrapper import metric_wrapper

import pyclustering.core.silhouette_wrapper as wrapper


class silhouette:
    """!
    @brief Represents the Silhouette method that is used for interpretation and validation of consistency.
    @details The silhouette value is a measure of how similar an object is to its own cluster compared to other clusters.
             Be aware that the silhouette method is applicable to the K-algorithm family, such as K-Means, K-Medians,
             K-Medoids, X-Means, etc., but not to DBSCAN, OPTICS, CURE, etc. The Silhouette value is
             calculated using the following formula:
             \f[s\left ( i \right )=\frac{ b\left ( i \right ) - a\left ( i \right ) }{ \max\left \{ a\left ( i \right ), b\left ( i \right ) \right \}}\f]
             where \f$a\left ( i \right )\f$ is the average distance from object i to the other objects in its own cluster, and
             \f$b\left ( i \right )\f$ is the average distance from object i to the objects in the nearest other cluster (the cluster
             with the smallest such average distance).

    Here is an example where the Silhouette score is calculated for a K-Means clustering result:
    @code
        from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
        from pyclustering.cluster.kmeans import kmeans
        from pyclustering.cluster.silhouette import silhouette

        from pyclustering.samples.definitions import SIMPLE_SAMPLES
        from pyclustering.utils import read_sample

        # Read data 'SampleSimple3' from Simple Sample collection.
        sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)

        # Prepare initial centers.
        centers = kmeans_plusplus_initializer(sample, 4).initialize()

        # Perform cluster analysis.
        kmeans_instance = kmeans(sample, centers)
        kmeans_instance.process()
        clusters = kmeans_instance.get_clusters()

        # Calculate Silhouette score for each point.
        score = silhouette(sample, clusters).process().get_score()
    @endcode

    Let's cluster the same sample with the K-Means algorithm using different `K` values (2, 4, 6 and 8) and
    estimate each clustering result using the Silhouette method.
    @code
        from pyclustering.cluster.kmeans import kmeans
        from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
        from pyclustering.cluster.silhouette import silhouette

        from pyclustering.samples.definitions import SIMPLE_SAMPLES
        from pyclustering.utils import read_sample

        import matplotlib.pyplot as plt

        def get_score(sample, amount_clusters):
            # Prepare initial centers for K-Means algorithm.
            centers = kmeans_plusplus_initializer(sample, amount_clusters).initialize()

            # Perform cluster analysis.
            kmeans_instance = kmeans(sample, centers)
            kmeans_instance.process()
            clusters = kmeans_instance.get_clusters()

            # Calculate Silhouette score.
            return silhouette(sample, clusters).process().get_score()

        def draw_score(figure, position, title, score):
            ax = figure.add_subplot(position)
            ax.bar(range(0, len(score)), score, width=0.7)
            ax.set_title(title)
            ax.set_xlim(0, len(score))
            ax.set_xticklabels([])
            ax.grid()

        # Read data 'SampleSimple3' from Simple Sample collection.
        sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)

        # Perform cluster analysis and estimation by Silhouette.
        score_2 = get_score(sample, 2)  # K = 2 (amount of clusters).
        score_4 = get_score(sample, 4)  # K = 4 - optimal.
        score_6 = get_score(sample, 6)  # K = 6.
        score_8 = get_score(sample, 8)  # K = 8.

        # Visualize results.
        figure = plt.figure()

        # Visualize each result separately.
        draw_score(figure, 221, 'K = 2', score_2)
        draw_score(figure, 222, 'K = 4 (optimal)', score_4)
        draw_score(figure, 223, 'K = 6', score_6)
        draw_score(figure, 224, 'K = 8', score_8)

        # Show a plot with visualized results.
        plt.show()
    @endcode

    The results of the Silhouette method are visualized below. `K = 4` is the optimal amount of clusters according
    to the Silhouette method, because the score of each point is close to `1.0` and the average score for `K = 4` is
    the largest among all tested `K` values.

    @image html silhouette_score_for_various_K.png "Fig. 1. Silhouette scores for various K."

    @see kmeans, kmedoids, kmedians, xmeans, elbow

    """

    def __init__(self, data, clusters, **kwargs):
        """!
        @brief Initializes Silhouette method for analysis.

        @param[in] data (array_like): Input data that was used for cluster analysis, presented as a list of points
                    or as a distance matrix (defined by parameter 'data_type'; by default data is considered a list
                    of points).
        @param[in] clusters (list): Clusters that have been obtained after cluster analysis.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'metric', 'data_type', 'ccore').

        <b>Keyword Args:</b><br>
            - metric (distance_metric): Metric that was used for cluster analysis and should be used for Silhouette
               score calculation (by default squared Euclidean distance).
            - data_type (string): Data type of input sample 'data' that is processed by the algorithm ('points', 'distance_matrix').
            - ccore (bool): If True then CCORE (C++ implementation of pyclustering library) is used (by default True).
               CCORE is never used when the metric is user-defined.

        """
        self.__data = data
        self.__clusters = clusters
        self.__metric = kwargs.get('metric', distance_metric(type_metric.EUCLIDEAN_SQUARE))
        self.__data_type = kwargs.get('data_type', 'points')

        if self.__metric.get_type() != type_metric.USER_DEFINED:
            self.__metric.enable_numpy_usage()
        else:
            self.__metric.disable_numpy_usage()

        self.__score = [0.0] * len(data)

        self.__ccore = kwargs.get('ccore', True) and self.__metric.get_type() != type_metric.USER_DEFINED
        if self.__ccore:
            self.__ccore = ccore_library.workable()

        if self.__ccore is False:
            self.__data = numpy.array(data)

        self.__verify_arguments()


    def process(self):
        """!
        @brief Calculates Silhouette score for each object from input data.

        @return (silhouette) Instance of the method (self).

        """
        if self.__ccore is True:
            self.__process_by_ccore()
        else:
            self.__process_by_python()

        return self


    def __process_by_ccore(self):
        """!
        @brief Performs processing using CCORE (C/C++ part of pyclustering library).

        """
        ccore_metric = metric_wrapper.create_instance(self.__metric)
        self.__score = wrapper.silhoeutte(self.__data, self.__clusters, ccore_metric.get_pointer(), self.__data_type)


    def __process_by_python(self):
        """!
        @brief Performs processing using Python code.

        """
        for index_cluster in range(len(self.__clusters)):
            for index_point in self.__clusters[index_cluster]:
                self.__score[index_point] = self.__calculate_score(index_point, index_cluster)


    def get_score(self):
        """!
        @brief Returns Silhouette score for each object from input data.

        @return (list) Silhouette scores, where the index of a score corresponds to the index of the object in input data.

        @see process

        """
        return self.__score


    def __calculate_score(self, index_point, index_cluster):
        """!
        @brief Calculates Silhouette score for the specific object defined by index_point.

        @param[in] index_point (uint): Index of the point in input data for which Silhouette score should be calculated.
        @param[in] index_cluster (uint): Index of the cluster to which the point belongs.

        @return (float) Silhouette score for the object.

        """
        if self.__data_type == 'points':
            difference = self.__calculate_dataset_difference(index_point)
        else:
            difference = self.__data[index_point]

        a_score = self.__calculate_within_cluster_score(index_cluster, difference)
        b_score = self.__caclulate_optimal_neighbor_cluster_score(index_cluster, difference)

        return (b_score - a_score) / max(a_score, b_score)


    def __calculate_within_cluster_score(self, index_cluster, difference):
        """!
        @brief Calculates 'A' score for the specific object in the cluster to which it belongs.

        @param[in] index_cluster (uint): Index of the cluster to which the point belongs.
        @param[in] difference (array_like): Distances from the object to each object in input data.

        @return (float) 'A' score for the object ('nan' if the cluster consists of a single object).

        """

        score = self.__calculate_cluster_difference(index_cluster, difference)
        if len(self.__clusters[index_cluster]) == 1:
            return float('nan')

        # The distance from the object to itself is assumed to be zero, so dividing by (size - 1)
        # averages over the other objects of the cluster only.
        return score / (len(self.__clusters[index_cluster]) - 1)


    def __calculate_cluster_score(self, index_cluster, difference):
        """!
        @brief Calculates 'B*' score for the specific object for a specific cluster.

        @param[in] index_cluster (uint): Index of the cluster for which 'B*' score should be calculated.
        @param[in] difference (array_like): Distances from the object to each object in input data.

        @return (float) 'B*' score of the object for the specified cluster (average distance to its objects).

        """

        score = self.__calculate_cluster_difference(index_cluster, difference)
        return score / len(self.__clusters[index_cluster])


    def __caclulate_optimal_neighbor_cluster_score(self, index_cluster, difference):
        """!
        @brief Calculates 'B' score for the specific object for the nearest cluster.

        @param[in] index_cluster (uint): Index of the cluster to which the point belongs.
        @param[in] difference (array_like): Distances from the object to each object in input data.

        @return (float) 'B' score for the object (-1.0 if there are no other clusters).

        """

        optimal_score = float('inf')
        for index_neighbor_cluster in range(len(self.__clusters)):
            if index_cluster != index_neighbor_cluster:
                candidate_score = self.__calculate_cluster_score(index_neighbor_cluster, difference)
                if candidate_score < optimal_score:
                    optimal_score = candidate_score

        if optimal_score == float('inf'):
            optimal_score = -1.0

        return optimal_score


    def __calculate_cluster_difference(self, index_cluster, difference):
        """!
        @brief Calculates the sum of distances from the specified object to each object in the specified cluster.

        @param[in] index_cluster (uint): Index of the cluster whose objects are considered.
        @param[in] difference (array_like): Distances from the object to each object in input data.

        @return (float) Sum of distances from the specified object to each object in the specified cluster.

        """
        cluster_difference = 0.0
        for index_point in self.__clusters[index_cluster]:
            cluster_difference += difference[index_point]

        return cluster_difference


    def __calculate_dataset_difference(self, index_point):
        """!
        @brief Calculates distances from the specified object to each object in input data.

        @param[in] index_point (uint): Index of the point for which distances to the other points are calculated.

        @return (array_like) Distances from the specified object to each object in input data.

        """

        if self.__metric.get_type() != type_metric.USER_DEFINED:
            dataset_differences = self.__metric(self.__data, self.__data[index_point])
        else:
            dataset_differences = [self.__metric(point, self.__data[index_point]) for point in self.__data]

        return dataset_differences


    def __verify_arguments(self):
        """!
        @brief Verifies input parameters for the algorithm and throws an exception in case of incorrectness.

        """
        if len(self.__data) == 0:
            raise ValueError("Input data is empty (size: '%d')." % len(self.__data))

        if len(self.__clusters) == 0:
            raise ValueError("Input clusters are empty (size: '%d')." % len(self.__clusters))



class silhouette_ksearch_type(IntEnum):
    """!
    @brief Defines algorithms that can be used to find the optimal number of clusters using the Silhouette method.

    @see silhouette_ksearch

    """


    KMEANS = 0


    KMEDIANS = 1


    KMEDOIDS = 2

    def get_type(self):
        """!
        @brief Returns algorithm type that corresponds to the specified enumeration value.

        @return (type) Algorithm type for cluster analysis.

        """
        if self == silhouette_ksearch_type.KMEANS:
            return kmeans
        elif self == silhouette_ksearch_type.KMEDIANS:
            return kmedians
        elif self == silhouette_ksearch_type.KMEDOIDS:
            return kmedoids
        else:
            return None



class silhouette_ksearch:
    """!
    @brief Represents an algorithm for searching the optimal number of clusters using a specified K-algorithm (K-Means,
            K-Medians, K-Medoids) that is based on the Silhouette method.

    @details This algorithm uses the average Silhouette score for estimation and is applicable to clusters that are
              well separated. Here is an example where clusters are well separated (sample 'Hepta'):
    @code
        from pyclustering.cluster import cluster_visualizer
        from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
        from pyclustering.cluster.kmeans import kmeans
        from pyclustering.cluster.silhouette import silhouette_ksearch_type, silhouette_ksearch
        from pyclustering.samples.definitions import FCPS_SAMPLES
        from pyclustering.utils import read_sample

        sample = read_sample(FCPS_SAMPLES.SAMPLE_HEPTA)
        search_instance = silhouette_ksearch(sample, 2, 10, algorithm=silhouette_ksearch_type.KMEANS).process()

        amount = search_instance.get_amount()
        scores = search_instance.get_scores()

        print("Scores: '%s'" % str(scores))

        initial_centers = kmeans_plusplus_initializer(sample, amount).initialize()
        kmeans_instance = kmeans(sample, initial_centers).process()

        clusters = kmeans_instance.get_clusters()

        visualizer = cluster_visualizer()
        visualizer.append_clusters(clusters, sample)
        visualizer.show()
    @endcode

    Obtained Silhouette scores for each K (K = 10 is not examined because the upper bound `kmax` is exclusive):
    @code
        Scores: '{2: 0.418434, 3: 0.450906, 4: 0.534709, 5: 0.689970, 6: 0.588460, 7: 0.882674, 8: 0.804725, 9: 0.780189}'
    @endcode

    K = 7 has the highest average Silhouette score, which means that it is the optimal amount of clusters:
    @image html silhouette_ksearch_hepta.png "Silhouette ksearch's analysis with further K-Means clustering (sample 'Hepta')."

    @see silhouette_ksearch_type

    """

    def __init__(self, data, kmin, kmax, **kwargs):
        """!
        @brief Initializes Silhouette search algorithm to find out the optimal amount of clusters.

        @param[in] data (array_like): Input data that is used for searching the optimal amount of clusters.
        @param[in] kmin (uint): Minimum amount of clusters that might be allocated. Should be equal to or greater than `2`.
        @param[in] kmax (uint): Upper bound of the amount of clusters (exclusive: K values from `kmin` to `kmax - 1`
                    are examined). Should be equal to or less than the amount of points in input data.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: `algorithm`, `random_state`, `ccore`).

        <b>Keyword Args:</b><br>
            - algorithm (silhouette_ksearch_type): Defines the algorithm that is used for searching the optimal number of
               clusters (by default K-Means).
            - random_state (int): Seed for the random number generator that is used for initial center selection
               (by default None).
            - ccore (bool): If True then CCORE (C++ implementation of pyclustering library) is used (by default True).

        """
        self.__data = data
        self.__kmin = kmin
        self.__kmax = kmax

        self.__algorithm = kwargs.get('algorithm', silhouette_ksearch_type.KMEANS)
        self.__random_state = kwargs.get('random_state', None)
        self.__return_index = self.__algorithm == silhouette_ksearch_type.KMEDOIDS

        self.__amount = -1
        self.__score = -1.0
        self.__scores = {}

        self.__verify_arguments()

        self.__ccore = kwargs.get('ccore', True)
        if self.__ccore:
            self.__ccore = ccore_library.workable()


    def process(self):
        """!
        @brief Performs analysis to find the optimal amount of clusters.

        @see get_amount, get_score, get_scores

        @return (silhouette_ksearch) The instance itself (self).

        """
        if self.__ccore is True:
            self.__process_by_ccore()
        else:
            self.__process_by_python()

        return self


    def __process_by_ccore(self):
        """!
        @brief Performs processing using CCORE (C/C++ part of pyclustering library).

        """
        results = wrapper.silhoeutte_ksearch(self.__data, self.__kmin, self.__kmax, self.__algorithm, self.__random_state)

        self.__amount = results[0]
        self.__score = results[1]

        scores_list = results[2]
        self.__scores = {}
        for i in range(len(scores_list)):
            self.__scores[self.__kmin + i] = scores_list[i]


    def __process_by_python(self):
        """!
        @brief Performs processing using Python code.

        """
        self.__scores = {}

        for k in range(self.__kmin, self.__kmax):
            clusters = self.__calculate_clusters(k)
            if len(clusters) != k:
                self.__scores[k] = float('nan')
                continue

            score = silhouette(self.__data, clusters).process().get_score()

            self.__scores[k] = sum(score) / len(score)

            if self.__scores[k] > self.__score:
                self.__score = self.__scores[k]
                self.__amount = k


    def get_amount(self):
        """!
        @brief Returns the optimal amount of clusters that has been found during analysis.

        @return (uint) Optimal amount of clusters.

        @see process

        """
        return self.__amount


    def get_score(self):
        """!
        @brief Returns the Silhouette score that belongs to the optimal amount of clusters (k).

        @return (float) Score that belongs to the optimal amount of clusters.

        @see process, get_scores

        """
        return self.__score


    def get_scores(self):
        """!
        @brief Returns the Silhouette score for each K value (amount of clusters).

        @return (dict) Silhouette score for each K value, where the key is a K value and the value is a Silhouette score.

        @see process, get_score

        """
        return self.__scores


    def __calculate_clusters(self, k):
        """!
        @brief Performs cluster analysis using the specified K value.

        @param[in] k (uint): Amount of clusters that should be allocated.

        @return (array_like) Allocated clusters.

        """
        # K-Medoids expects indexes of initial medoids instead of coordinates, hence 'return_index'.
        initial_values = kmeans_plusplus_initializer(self.__data, k, random_state=self.__random_state).initialize(return_index=self.__return_index)
        algorithm_type = self.__algorithm.get_type()
        return algorithm_type(self.__data, initial_values).process().get_clusters()


    def __verify_arguments(self):
        """!
        @brief Checks the algorithm's arguments and throws an exception if any of them is incorrect.

        """
        if self.__kmax > len(self.__data):
            raise ValueError("K max value '" + str(self.__kmax) + "' is bigger than amount of objects '" +
                             str(len(self.__data)) + "' in input data.")

        if self.__kmin <= 1:
            raise ValueError("K min value '" + str(self.__kmin) + "' should be greater than 1 (impossible to provide "
                             "silhouette score for only one cluster).")
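The Silhouette formula described in the `silhouette` class docstring can be checked by hand without pyclustering. The following is a minimal NumPy sketch, not the library's implementation. The helper names `silhouette_from_matrix` and `silhouette_from_points` are hypothetical. They mirror the module's two input modes ('distance_matrix' and 'points'). The points path assumes the module's default squared Euclidean metric. Singleton clusters yield NaN, as in `__calculate_within_cluster_score`. Unlike the module, which falls back to b(i) = -1.0 when there is only one cluster, this sketch simply rejects fewer than two clusters.

```python
import math

import numpy as np


def silhouette_from_matrix(distances, clusters):
    """Per-object silhouette scores from a precomputed distance matrix (hypothetical helper).

    distances: square matrix, distances[i][j] is the distance between objects i and j.
    clusters: list of clusters, each a list of object indexes (pyclustering's cluster format).
    """
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required to compute b(i)")

    distances = np.asarray(distances, dtype=float)
    scores = [0.0] * len(distances)
    for own, members in enumerate(clusters):
        for i in members:
            if len(members) == 1:
                # a(i) is undefined for a singleton cluster; the module returns NaN here too.
                scores[i] = math.nan
                continue
            # a(i): d(i, i) is zero, so dividing the sum by (size - 1) averages over the other members.
            a = distances[i, members].sum() / (len(members) - 1)
            # b(i): smallest average distance from i to the objects of any other cluster.
            b = min(distances[i, other].mean() for k, other in enumerate(clusters) if k != own)
            scores[i] = (b - a) / max(a, b)
    return scores


def silhouette_from_points(points, clusters):
    """Builds a squared Euclidean distance matrix (assumed default metric) and scores it."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    distances = (diff ** 2).sum(axis=-1)
    return silhouette_from_matrix(distances, clusters)
```

For the 1-D points `[[0.0], [1.0], [10.0], [11.0]]` split as `[[0, 1], [2, 3]]`, object 0 has a(0) = 1 and b(0) = (100 + 121) / 2 = 110.5 under squared Euclidean distance. Its score is therefore 109.5 / 110.5.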
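The rule that `silhouette_ksearch.__process_by_python` uses to choose K can be isolated as a small helper. The function `pick_optimal_k` is hypothetical and not part of pyclustering. It reproduces the module's comparisons:
- the best score starts at -1.0;
- K values are visited in increasing order;
- only a strictly greater average replaces the current best, so ties keep the smaller K;
- NaN entries never win, because comparisons with NaN are false.

```python
def pick_optimal_k(scores):
    """Return (k, score) with the largest average silhouette score (hypothetical helper).

    scores: dict {k: average silhouette score}, shaped like silhouette_ksearch.get_scores().
    Returns (-1, -1.0) when no K has a valid score, matching the module's initial values.
    """
    best_k, best_score = -1, -1.0
    for k in sorted(scores):
        # Strict comparison: ties keep the smaller K, and NaN never compares greater.
        if scores[k] > best_score:
            best_k, best_score = k, scores[k]
    return best_k, best_score


# Scores printed in the 'Hepta' example (kmin=2, kmax=10, so K = 2..9 are examined).
hepta_scores = {2: 0.418434, 3: 0.450906, 4: 0.534709, 5: 0.689970,
                6: 0.588460, 7: 0.882674, 8: 0.804725, 9: 0.780189}
print(pick_optimal_k(hepta_scores))  # (7, 0.882674)
```

Applied to the scores listed in the `silhouette_ksearch` docstring, the helper selects K = 7, in line with the text there.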