Represents the Silhouette method, which is used for interpretation and validation of consistency within clusters.

Public Member Functions
def	__init__ (self, data, clusters, **kwargs)
	Initializes Silhouette method for analysis.

def	process (self)
	Calculates Silhouette score for each object from input data.

def	get_score (self)
	Returns Silhouette score for each object from input data.

Detailed Description

Represents the Silhouette method, which is used for interpretation and validation of consistency within clusters.

The silhouette value is a measure of how similar an object is to its own cluster compared to other clusters. Be aware that the Silhouette method is applicable to the K family of algorithms, such as K-Means, K-Medians, K-Medoids, and X-Means, and is not applicable to DBSCAN, OPTICS, CURE, etc. The Silhouette value is calculated using the following formula:

\[s(i)=\frac{b(i)-a(i)}{\max\left\{a(i),\, b(i)\right\}}\]

where \(a(i)\) is the average distance from object \(i\) to the objects in its own cluster, and \(b(i)\) is the average distance from object \(i\) to the objects in the nearest other cluster (the smallest such average among all other clusters).
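To make the formula concrete, here is a minimal pure-Python sketch of the calculation. It is not the library's implementation: the helper names `squared_euclidean` and `silhouette_scores` are hypothetical, and it assumes that \(a(i)\) excludes the object itself and that objects in single-element clusters receive a score of 0.

```python
def squared_euclidean(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def silhouette_scores(data, clusters, metric=squared_euclidean):
    """Return s(i) for every object, indexed like `data`.

    `clusters` is a list of clusters, each a list of indices into `data`.
    """
    scores = [0.0] * len(data)
    for own_id, own in enumerate(clusters):
        # Non-empty clusters other than the one being processed.
        others = [c for k, c in enumerate(clusters) if k != own_id and c]
        if len(own) < 2 or not others:
            continue  # assumption: such objects keep the score 0.0

        for i in own:
            # a(i): average distance to the other objects of the own cluster.
            a = sum(metric(data[i], data[j]) for j in own if j != i) / (len(own) - 1)
            # b(i): smallest average distance to the objects of another cluster.
            b = min(sum(metric(data[i], data[j]) for j in c) / len(c) for c in others)

            denominator = max(a, b)
            scores[i] = (b - a) / denominator if denominator > 0 else 0.0
    return scores


# Two well-separated pairs of points (illustrative data, not a library sample).
data = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
clusters = [[0, 1], [2, 3]]
scores = silhouette_scores(data, clusters)
```

For this toy input every object has \(a(i) = 1\) and \(b(i) = 25.5\) under the squared Euclidean distance, so each score is \(24.5 / 25.5 \approx 0.96\), which reflects the clear separation of the two clusters.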

Here is an example where the Silhouette score is calculated for a K-Means clustering result:

from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.kmeans import kmeans
from pyclustering.cluster.silhouette import silhouette
 
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample
 
# Read data 'SampleSimple3' from Simple Sample collection.
sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)
 
# Prepare initial centers
centers = kmeans_plusplus_initializer(sample, 4).initialize()
 
# Perform cluster analysis
kmeans_instance = kmeans(sample, centers)
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
 
# Calculate Silhouette score
score = silhouette(sample, clusters).process().get_score()

Let's cluster the same sample with the K-Means algorithm using different K values (2, 4, 6 and 8) and evaluate the clustering results using the Silhouette method.

from pyclustering.cluster.kmeans import kmeans
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.silhouette import silhouette
 
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample
 
import matplotlib.pyplot as plt
 
def get_score(sample, amount_clusters):
    # Prepare initial centers for K-Means algorithm.
    centers = kmeans_plusplus_initializer(sample, amount_clusters).initialize()
 
    # Perform cluster analysis.
    kmeans_instance = kmeans(sample, centers)
    kmeans_instance.process()
    clusters = kmeans_instance.get_clusters()
 
    # Calculate Silhouette score.
    return silhouette(sample, clusters).process().get_score()
 
def draw_score(figure, position, title, score):
    ax = figure.add_subplot(position)
    ax.bar(range(0, len(score)), score, width=0.7)
    ax.set_title(title)
    ax.set_xlim(0, len(score))
    ax.set_xticklabels([])
    ax.grid()
 
# Read data 'SampleSimple3' from Simple Sample collection.
sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)
 
# Perform cluster analysis and estimation by Silhouette.
score_2 = get_score(sample, 2)  # K = 2 (amount of clusters).
score_4 = get_score(sample, 4)  # K = 4 - optimal.
score_6 = get_score(sample, 6)  # K = 6.
score_8 = get_score(sample, 8)  # K = 8.
 
# Visualize results.
figure = plt.figure()
 
# Visualize each result separately.
draw_score(figure, 221, 'K = 2', score_2)
draw_score(figure, 222, 'K = 4 (optimal)', score_4)
draw_score(figure, 223, 'K = 6', score_6)
draw_score(figure, 224, 'K = 8', score_8)
 
# Show a plot with visualized results.
plt.show()

The Silhouette scores obtained for each K are visualized below. K = 4 is the optimal number of clusters according to the Silhouette method, because the score for each point is close to 1.0 and the average score for K = 4 is the largest among all tested values of K.

Fig. 1. Silhouette scores for various K.
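The comparison of average scores described above can also be done numerically instead of by inspecting the plot. The sketch below is only an illustration: `average_score` and `best_k` are hypothetical helpers, and the score lists are made-up values, not results computed from 'SampleSimple3'.

```python
def average_score(scores):
    """Mean Silhouette score over all objects."""
    return sum(scores) / len(scores)


def best_k(scores_by_k):
    """Return the K whose per-object scores have the largest mean."""
    return max(scores_by_k, key=lambda k: average_score(scores_by_k[k]))


# Hypothetical per-object scores for three values of K (illustrative only).
scores_by_k = {
    2: [0.55, 0.60, 0.40, 0.45],
    4: [0.90, 0.85, 0.95, 0.90],
    6: [0.70, 0.20, 0.65, 0.30],
}

averages = {k: average_score(v) for k, v in scores_by_k.items()}
chosen = best_k(scores_by_k)
```

In practice the dictionary would be filled with the lists returned by `get_score(sample, k)` from the example above, one entry per tested K.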

See also: kmeans, kmedoids, kmedians, xmeans, elbow

Definition at line 30 of file silhouette.py.

Constructor & Destructor Documentation

◆ __init__()

def pyclustering.cluster.silhouette.silhouette.__init__ ( self, data, clusters, **kwargs )

Initializes Silhouette method for analysis.

Parameters

[in]	data	(array_like): Input data that was used for cluster analysis, presented as a list of points or as a distance matrix (defined by the parameter 'data_type'; by default the data is treated as a list of points).
[in]	clusters	(list): Clusters that were obtained from the cluster analysis.
[in]	**kwargs	Arbitrary keyword arguments (available arguments: 'metric', 'data_type', 'ccore').

Keyword Args:

metric (distance_metric): Metric that was used for cluster analysis and should be used for Silhouette score calculation (by default, squared Euclidean distance).
data_type (string): Data type of the input sample 'data' that is processed by the algorithm ('points' or 'distance_matrix').
ccore (bool): If True, CCORE (the C++ implementation of the pyclustering library) is used (by default True).
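When 'data_type' is 'distance_matrix', the input is a square matrix of pairwise distances rather than a list of points. The following sketch shows one way such a matrix could be prepared in plain Python. The helper `build_distance_matrix` is hypothetical, and the use of squared Euclidean distance for the entries is an assumption made to mirror the default metric named above.

```python
def squared_euclidean(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def build_distance_matrix(points, metric=squared_euclidean):
    """Return a square matrix where entry [i][j] is metric(points[i], points[j])."""
    return [[metric(p, q) for q in points] for p in points]


# Illustrative points, not a library sample.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0]]
matrix = build_distance_matrix(points)
```

With pyclustering installed, the matrix would then take the place of the point list, for example `silhouette(matrix, clusters, data_type='distance_matrix')`. Whether the entries should hold squared or plain distances depends on the metric used during clustering, so this is worth checking against the clustering step.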

Definition at line 129 of file silhouette.py.

Member Function Documentation

◆ get_score()

def pyclustering.cluster.silhouette.silhouette.get_score ( self )

Returns Silhouette score for each object from input data.

See also: process

Definition at line 202 of file silhouette.py.

◆ process()

def pyclustering.cluster.silhouette.silhouette.process ( self )

Calculates Silhouette score for each object from input data.

Returns: (silhouette) Instance of the method (self).

Definition at line 168 of file silhouette.py.

The documentation for this class was generated from the following file:

pyclustering/cluster/silhouette.py