pyclustering
0.10.1
pyclustring is a Python, C++ data mining library.

Represents Silhouette method that is used interpretation and validation of consistency. More...
Public Member Functions  
def  __init__ (self, data, clusters, **kwargs) 
Initializes Silhouette method for analysis. More...  
def  process (self) 
Calculates Silhouette score for each object from input data. More...  
def  get_score (self) 
Returns Silhouette score for each object from input data. More...  
Represents Silhouette method that is used interpretation and validation of consistency.
The silhouette value is a measure of how similar an object is to its own cluster compared to other clusters. Be aware that silhouette method is applicable for K algorithm family, such as KMeans, KMedians, KMedoids, XMeans, etc., not not applicable for DBSCAN, OPTICS, CURE, etc. The Silhouette value is calculated using following formula:
\[s\left ( i \right )=\frac{ b\left ( i \right )  a\left ( i \right ) }{ max\left \{ a\left ( i \right ), b\left ( i \right ) \right \}}\]
where \(a\left ( i \right )\)  is average distance from object i to objects in its own cluster, \(b\left ( i \right )\)  is average distance from object i to objects in the nearest cluster (the appropriate among other clusters).
Here is an example where Silhouette score is calculated for KMeans's clustering result:
Let's perform clustering of the same sample by KMeans algorithm using different K
values (2, 4, 6 and 8) and estimate clustering results using Silhouette method.
There is visualized results that were done by Silhouette method. K = 4
is the optimal amount of clusters in line with Silhouette method because the score for each point is close to 1.0
and the average score for K = 4
is biggest value among others K
.
Definition at line 30 of file silhouette.py.
def pyclustering.cluster.silhouette.silhouette.__init__  (  self,  
data,  
clusters,  
**  kwargs  
) 
Initializes Silhouette method for analysis.
[in]  data  (array_like): Input data that was used for cluster analysis and that is presented as list of points or distance matrix (defined by parameter 'data_type', by default data is considered as a list of points). 
[in]  clusters  (list): Clusters that have been obtained after cluster analysis. 
[in]  **kwargs  Arbitrary keyword arguments (available arguments: 'metric'). 
Keyword Args:
Definition at line 129 of file silhouette.py.
def pyclustering.cluster.silhouette.silhouette.get_score  (  self  ) 
Returns Silhouette score for each object from input data.
Definition at line 202 of file silhouette.py.
def pyclustering.cluster.silhouette.silhouette.process  (  self  ) 
Calculates Silhouette score for each object from input data.
Definition at line 168 of file silhouette.py.