pyclustering.cluster.silhouette.silhouette Class Reference

Represents the Silhouette method, which is used for interpretation and validation of consistency within clusters.

Public Member Functions

def __init__ (self, data, clusters, **kwargs)
 Initializes Silhouette method for analysis.
 
def process (self)
 Calculates Silhouette score for each object from input data.
 
def get_score (self)
 Returns Silhouette score for each object from input data.
 

Detailed Description

Represents the Silhouette method, which is used for interpretation and validation of consistency within clusters.

The silhouette value is a measure of how similar an object is to its own cluster compared to other clusters. Be aware that the Silhouette method is applicable to the K algorithm family, such as K-Means, K-Medians, K-Medoids, X-Means, etc., but is not applicable to DBSCAN, OPTICS, CURE, etc. The Silhouette value is calculated using the following formula:

\[s\left ( i \right )=\frac{ b\left ( i \right ) - a\left ( i \right ) }{ \max\left \{ a\left ( i \right ), b\left ( i \right ) \right \}}\]

where $a\left ( i \right )$ is the average distance from object i to objects in its own cluster, and $b\left ( i \right )$ is the average distance from object i to objects in the nearest cluster (the smallest such average among the other clusters).
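The formula can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names (`euclidean`, `mean_distance`, `silhouette_scores`) are hypothetical, clusters are assumed to be lists of indices into the data, plain Euclidean distance is used for readability, a(i) is assumed to exclude object i itself, and a score of 0 for single-object clusters is an assumed convention.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two points (illustrative choice of metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def mean_distance(point, indices, data, metric):
    """Average distance from `point` to the objects of `data` listed in `indices`."""
    return sum(metric(point, data[j]) for j in indices) / len(indices)


def silhouette_scores(data, clusters, metric=euclidean):
    """Return s(i) for every object; `clusters` holds lists of indices into `data`."""
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = [0.0] * len(data)
    for k, own in enumerate(clusters):
        for i in own:
            others = [j for j in own if j != i]
            if not others:
                continue  # single-object cluster: score stays 0.0 (assumed convention)

            # a(i): average distance to the other objects of the same cluster.
            a = mean_distance(data[i], others, data, metric)
            # b(i): smallest average distance to the objects of any other cluster.
            b = min(mean_distance(data[i], other, data, metric)
                    for m, other in enumerate(clusters) if m != k and other)

            denominator = max(a, b)
            scores[i] = (b - a) / denominator if denominator > 0 else 0.0
    return scores


data = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
good = silhouette_scores(data, [[0, 1], [2, 3]])  # compact, well-separated clusters
bad = silhouette_scores(data, [[0, 2], [1, 3]])   # clusters that mix distant points
```

With this toy data, the compact clustering yields scores close to 1, while the clustering that pairs distant points yields negative scores.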

Here is an example where the Silhouette score is calculated for a K-Means clustering result:

from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.kmeans import kmeans
from pyclustering.cluster.silhouette import silhouette
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample
# Read data 'SampleSimple3' from Simple Sample collection.
sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)
# Prepare initial centers
centers = kmeans_plusplus_initializer(sample, 4).initialize()
# Perform cluster analysis
kmeans_instance = kmeans(sample, centers)
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
# Calculate Silhouette score
score = silhouette(sample, clusters).process().get_score()
See also
kmeans, kmedoids, kmedians, xmeans, elbow

Definition at line 45 of file silhouette.py.

Constructor & Destructor Documentation

◆ __init__()

def pyclustering.cluster.silhouette.silhouette.__init__ (   self,
  data,
  clusters,
  **kwargs 
)

Initializes Silhouette method for analysis.

Parameters
[in] data (array_like): Input data that was used for cluster analysis, presented either as a list of points or as a distance matrix (selected by the keyword argument 'data_type'; by default the data is treated as a list of points).
[in] clusters (list): Clusters that have been obtained from cluster analysis.
[in] **kwargs: Arbitrary keyword arguments (available arguments: 'metric', 'data_type', 'ccore').

Keyword Args:

  • metric (distance_metric): Metric that was used for cluster analysis and that should be used for Silhouette score calculation (squared Euclidean distance by default).
  • data_type (string): Data type of the input sample 'data' that is processed by the algorithm ('points' or 'distance_matrix'; 'points' by default).
  • ccore (bool): If True, CCORE (the C++ implementation of the pyclustering library) is used (True by default).
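To make the 'distance_matrix' data type concrete, the following sketch builds a pairwise distance matrix from a list of points using squared Euclidean distance. The helpers `squared_euclidean` and `to_distance_matrix` are hypothetical names introduced for illustration and are not part of pyclustering; the commented call at the end only indicates how such a matrix would presumably be passed, based on the keyword arguments listed above.

```python
def squared_euclidean(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def to_distance_matrix(points, metric=squared_euclidean):
    """Return a square matrix whose entry [i][j] is the distance between points i and j."""
    return [[metric(p, q) for q in points] for p in points]


points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
matrix = to_distance_matrix(points)

# Assumed usage with the class documented here (not executed in this sketch):
# score = silhouette(matrix, clusters, data_type='distance_matrix').process().get_score()
```

The matrix is symmetric with zeros on the diagonal, so entry [i][j] can be looked up instead of recomputing the metric for each pair of objects.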

Definition at line 84 of file silhouette.py.

Member Function Documentation

◆ get_score()

def pyclustering.cluster.silhouette.silhouette.get_score (   self)

Returns Silhouette score for each object from input data.
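Because the result holds one score per object, a common follow-up step is to aggregate it. The sketch below assumes the scores are a flat list ordered like the input data and that clusters are lists of object indices; the function `summarize_scores` and the sample values are hypothetical.

```python
from statistics import mean


def summarize_scores(scores, clusters):
    """Aggregate per-object scores into an overall mean and one mean per cluster."""
    overall = mean(scores)
    per_cluster = [mean(scores[i] for i in cluster) for cluster in clusters]
    # Objects with a negative score are closer, on average, to another cluster.
    suspicious = [i for i, s in enumerate(scores) if s < 0]
    return overall, per_cluster, suspicious


# Hypothetical scores for four objects split into two clusters.
scores = [0.8, 0.7, 0.9, -0.1]
clusters = [[0, 1], [2, 3]]
overall, per_cluster, suspicious = summarize_scores(scores, clusters)
```

The overall mean gives a single quality figure for the clustering, while the per-cluster means and the list of negative-score objects point to where the allocation is weakest.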

See also
process

Definition at line 157 of file silhouette.py.

◆ process()

def pyclustering.cluster.silhouette.silhouette.process (   self)

Calculates Silhouette score for each object from input data.

Returns
(silhouette) Instance of the method (self).
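Returning the instance itself is what allows calls to be chained, as in `silhouette(sample, clusters).process().get_score()`. The pattern can be sketched with a small stand-in class; `MeanAnalyser` is hypothetical and unrelated to the library's internals.

```python
class MeanAnalyser:
    """Hypothetical analyser that mimics the process()/getter call pattern."""

    def __init__(self, values):
        self._values = values
        self._result = None

    def process(self):
        """Compute the result, store it, and return the instance to allow chaining."""
        self._result = sum(self._values) / len(self._values)
        return self

    def get_result(self):
        """Return the value computed by process(), or None if it has not run yet."""
        return self._result


# Chained form: process() returns the object, so the getter can follow directly.
chained = MeanAnalyser([1, 2, 3]).process().get_result()

# Equivalent step-by-step form.
analyser = MeanAnalyser([1, 2, 3])
analyser.process()
stepwise = analyser.get_result()
```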

Definition at line 123 of file silhouette.py.


The documentation for this class was generated from the following file: