Class represents K-Means clustering algorithm.

Public Member Functions
def	__init__ (self, data, initial_centers, tolerance=0.001, ccore=True, **kwargs)
	Constructor of clustering algorithm K-Means.

def	process (self)
	Performs cluster analysis in line with rules of K-Means algorithm.

def	get_clusters (self)
	Returns list of allocated clusters, each cluster contains indexes of objects in list of data.

def	get_centers (self)
	Returns list of centers of allocated clusters.

def	get_total_wce (self)
	Returns sum of metric errors that depends on the metric that was used for clustering (by default SSE - Sum of Squared Errors).

def	get_cluster_encoding (self)
	Returns clustering result representation type that indicates how clusters are encoded.

Detailed Description

Class represents K-Means clustering algorithm.

CCORE implementation of the algorithm uses thread pool to parallelize the clustering process.

K-Means clustering results depend on the initial centers. The K-Means++ algorithm from module 'pyclustering.cluster.center_initializer' can be used to initialize them.
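To illustrate why the choice of initial centers matters, here is a minimal pure-Python sketch of the K-Means++ seeding idea: the first center is picked uniformly at random, and each further center is picked with probability proportional to its squared distance from the nearest center chosen so far. The names `kmeans_plusplus_sketch` and `squared_euclidean` are hypothetical helpers written for this illustration; this is not the implementation behind `kmeans_plusplus_initializer`.

```python
import random


def squared_euclidean(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plusplus_sketch(data, amount_centers, seed=None):
    """Pick `amount_centers` initial centers from `data` (illustrative only).

    Assumes `data` contains at least `amount_centers` distinct points.
    """
    rng = random.Random(seed)
    # The first center is chosen uniformly at random.
    centers = [list(rng.choice(data))]
    while len(centers) < amount_centers:
        # Weight of each point: squared distance to its nearest chosen center.
        weights = [min(squared_euclidean(p, c) for c in centers) for p in data]
        # Points far from all current centers are more likely to be chosen;
        # already chosen points have zero weight and are never picked again.
        chosen = rng.choices(data, weights=weights, k=1)[0]
        centers.append(list(chosen))
    return centers


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(kmeans_plusplus_sketch(points, 2, seed=1))
```

Because selection is random, different seeds can give different initial centers, which is one reason repeated runs of K-Means may produce different clusterings.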

K-Means clustering results. At the left - 'Simple03.data' sample, at the right - 'Lsun.data' sample.

Example #1 - Clustering using K-Means++ for center initialization:

from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample
# Load list of points for cluster analysis.
sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)
# Prepare initial centers using K-Means++ method.
initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()
# Create instance of K-Means algorithm with prepared centers.
kmeans_instance = kmeans(sample, initial_centers)
# Run cluster analysis and obtain results.
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
final_centers = kmeans_instance.get_centers()
# Visualize obtained results
kmeans_visualizer.show_clusters(sample, clusters, final_centers)

Example #2 - Clustering using specific distance metric, for example, Manhattan distance:

from pyclustering.cluster.kmeans import kmeans
from pyclustering.utils.metric import distance_metric, type_metric
# Prepare input data and initial centers for cluster analysis using K-Means
# (here 'sample' and 'initial_centers' are reused from Example #1).
# Create metric that will be used for clustering.
manhattan_metric = distance_metric(type_metric.MANHATTAN)
# Create instance of K-Means using specific distance metric.
kmeans_instance = kmeans(sample, initial_centers, metric=manhattan_metric)
# Run cluster analysis and obtain results.
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
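
The results returned by get_clusters() are lists of indexes into the input data, and get_total_wce() reports a sum of errors (SSE by default). The following is a small sketch of how such a sum could be computed from index-encoded clusters, assuming the squared Euclidean distance. The function `total_wce_sketch` is a hypothetical helper for illustration and does not reproduce the library's internal code.

```python
def total_wce_sketch(data, clusters, centers):
    """Sum of squared distances from each point to the center of its cluster.

    `clusters` holds lists of indexes into `data`, one list per center.
    """
    total = 0.0
    for cluster, center in zip(clusters, centers):
        for index in cluster:
            point = data[index]
            total += sum((x - c) ** 2 for x, c in zip(point, center))
    return total


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
clusters = [[0, 1], [2, 3]]                  # index encoding: positions in `points`
centers = [[0.0, 0.5], [10.0, 10.5]]
print(total_wce_sketch(points, clusters, centers))  # prints 1.0
```

With a different metric, such as the Manhattan distance above, the per-point error would be computed with that metric instead.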

See also: center_initializer

Definition at line 272 of file kmeans.py.

Constructor & Destructor Documentation

◆ __init__()

def pyclustering.cluster.kmeans.kmeans.__init__	(	self,
		data,
		initial_centers,
		tolerance = 0.001,
		ccore = True,
		**kwargs
	)

Constructor of clustering algorithm K-Means.

Center initializer can be used for creating initial centers, for example, K-Means++ method.

Parameters

[in]	data	(array_like): Input data that is presented as array of points (objects), each point should be represented by array_like data structure.
[in]	initial_centers	(array_like): Initial coordinates of centers of clusters that are represented by array_like data structure: [center1, center2, ...].
[in]	tolerance	(double): Stop condition: the algorithm stops processing when the maximum change of the cluster centers is less than tolerance.
[in]	ccore	(bool): Defines whether the CCORE library (C++ pyclustering library) should be used instead of Python code.
[in]	**kwargs	Arbitrary keyword arguments (available arguments: 'observer', 'metric', 'itermax').

Keyword Args:

observer (kmeans_observer): Observer of the algorithm to collect information about clustering process on each iteration.
metric (distance_metric): Metric that is used for distance calculation between two points (by default squared Euclidean distance).
itermax (uint): Maximum number of iterations that is used for the clustering process (by default: 200).
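
The interplay of 'tolerance' and 'itermax' can be sketched with a minimal pure-Python K-Means loop. This is an illustration under stated assumptions, not the library's code: the change of a center is measured here as the Euclidean distance between its old and new position, empty clusters keep their previous center, and `kmeans_sketch` is a hypothetical name.

```python
import math


def squared_euclidean(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sketch(data, initial_centers, tolerance=0.001, itermax=200):
    """Return (clusters, centers); clusters are lists of indexes into `data`."""
    centers = [list(c) for c in initial_centers]
    clusters = []
    for _ in range(itermax):  # 'itermax' bounds the number of iterations
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for index, point in enumerate(data):
            nearest = min(range(len(centers)),
                          key=lambda k: squared_euclidean(point, centers[k]))
            clusters[nearest].append(index)
        # Update step: move every center to the mean of its points.
        new_centers = []
        for k, members in enumerate(clusters):
            if not members:  # assumption: an empty cluster keeps its center
                new_centers.append(centers[k])
                continue
            dimension = len(centers[k])
            new_centers.append([sum(data[i][d] for i in members) / len(members)
                                for d in range(dimension)])
        # Stop condition: the largest center shift is below 'tolerance'.
        max_change = max(math.sqrt(squared_euclidean(old, new))
                         for old, new in zip(centers, new_centers))
        centers = new_centers
        if max_change < tolerance:
            break
    return clusters, centers


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(kmeans_sketch(points, [[0, 0], [10, 10]]))
```

A smaller 'tolerance' generally means more iterations before the loop stops, while 'itermax' caps the work even if the centers never settle.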