Class that implements the K-Means clustering algorithm.

Public Member Functions
def	__init__ (self, data, initial_centers, tolerance=0.001, ccore=True, **kwargs)
	Constructor of the K-Means clustering algorithm.

def	process (self)
	Performs cluster analysis according to the rules of the K-Means algorithm.

def	get_clusters (self)
	Returns the list of allocated clusters; each cluster contains indexes of objects in the input data list.

def	get_centers (self)
	Returns the list of centers of the allocated clusters.

def	get_total_wce (self)
	Returns the sum of metric errors, which depends on the metric used for clustering (by default SSE, the Sum of Squared Errors).

def	get_cluster_encoding (self)
	Returns the clustering result representation type, which indicates how clusters are encoded.

Detailed Description

Class that implements the K-Means clustering algorithm.

The CCORE option enables the pyclustering core, a C/C++ shared library, for processing, which significantly increases performance.

The CCORE implementation of the algorithm uses a thread pool to parallelize the clustering process.

K-Means clustering results depend on the initial centers. The K-Means++ algorithm from the module 'pyclustering.cluster.center_initializer' can be used to initialize them.
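
To make the role of the initial centers more concrete, the following is a minimal pure-Python sketch of the general K-Means++ seeding idea: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The function name `kmeans_plusplus_sketch` and its `seed` parameter are hypothetical; this is not the code of 'pyclustering.cluster.center_initializer', and it assumes the data contains at least `amount` distinct points.

```python
import random


def kmeans_plusplus_sketch(data, amount, seed=None):
    """Illustrative K-Means++ seeding (hypothetical helper, not pyclustering code)."""
    rng = random.Random(seed)
    # First center: a point chosen uniformly at random.
    centers = [list(rng.choice(data))]
    while len(centers) < amount:
        # Squared Euclidean distance from every point to its nearest chosen center.
        weights = [
            min(sum((x - c) ** 2 for x, c in zip(point, center)) for center in centers)
            for point in data
        ]
        # Next center: sampled with probability proportional to that distance,
        # so points that are already centers (weight 0) are never picked again.
        next_center = rng.choices(data, weights=weights, k=1)[0]
        centers.append(list(next_center))
    return centers


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
initial_centers = kmeans_plusplus_sketch(points, 2, seed=1)
```

Because distant points receive larger weights, the chosen centers tend to be spread across the data, which usually gives K-Means a better starting position than purely uniform sampling.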

Figure: K-Means clustering results. Left: 'Simple03.data' sample; right: 'Lsun.data' sample.

Example #1 - Trivial clustering:

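from pyclustering.cluster.kmeans import kmeans
from pyclustering.utils import read_sample

# load list of points for cluster analysis ('path' is the path to a sample file)
sample = read_sample(path)
# create instance of K-Means algorithm with two initial centers
kmeans_instance = kmeans(sample, [ [0.0, 0.1], [2.5, 2.6] ])
# run cluster analysis and obtain results
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()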

Example #2 - Clustering using K-Means++ for center initialization:

from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.kmeans import kmeans
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample

# load list of points for cluster analysis
sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE2)
# prepare three initial centers using K-Means++ method
initial_centers = kmeans_plusplus_initializer(sample, 3).initialize()
# create instance of K-Means algorithm with prepared centers
kmeans_instance = kmeans(sample, initial_centers)
# run cluster analysis and obtain results
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
final_centers = kmeans_instance.get_centers()
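
As a rough illustration of how these results fit together: get_clusters() returns lists of indexes into the input data, and get_centers() returns one center per cluster. Assuming the default metric, the value reported by get_total_wce() should correspond to the sum of squared errors computed from those two results. The sketch below recomputes that quantity in plain Python; `total_sse` and the toy data are hypothetical and only mirror the shape of the real results.

```python
def total_sse(data, clusters, centers):
    """Sum of squared Euclidean distances from each point to its cluster center."""
    total = 0.0
    for members, center in zip(clusters, centers):
        for index in members:
            # 'index' refers to a position in 'data', as in get_clusters().
            total += sum((x - c) ** 2 for x, c in zip(data[index], center))
    return total


# Hypothetical toy data shaped like the input and outputs of the examples above.
data = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
clusters = [[0, 1], [2, 3]]          # index lists, one per cluster
centers = [[0.0, 0.5], [5.0, 5.5]]   # one center per cluster
sse = total_sse(data, clusters, centers)
```

Each of the four points lies 0.5 away from its center, so every point contributes 0.25 and the total is 1.0.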

See also: center_initializer

Definition at line 272 of file kmeans.py.

Constructor & Destructor Documentation

◆ __init__()

def pyclustering.cluster.kmeans.kmeans.__init__	(	self,
		data,
		initial_centers,
		tolerance = 0.001,
		ccore = True,
		**kwargs
	)

Constructor of the K-Means clustering algorithm.

A center initializer, for example the K-Means++ method, can be used to create the initial centers.

Parameters

[in]	data	(array_like): Input data presented as an array of points (objects); each point should be an array_like data structure.
[in]	initial_centers	(array_like): Initial coordinates of the cluster centers, given as an array_like data structure: [center1, center2, ...].
[in]	tolerance	(double): Stop condition: the algorithm stops when the maximum change of the cluster centers is less than tolerance.
[in]	ccore	(bool): Whether the CCORE library (the C++ pyclustering library) is used instead of the Python code.
[in]	**kwargs	Arbitrary keyword arguments (available arguments: 'observer', 'metric', 'itermax').

Keyword Args:

observer (kmeans_observer): Observer of the algorithm that collects information about the clustering process on each iteration.
metric (distance_metric): Metric used to calculate the distance between two points (by default, squared Euclidean distance).
itermax (uint): Maximum number of iterations used for the clustering process (by default, 200).
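
The interaction between 'tolerance' and 'itermax' can be illustrated with a minimal pure-Python sketch of a Lloyd-style iteration. It is not the library's implementation: the names `kmeans_sketch` and `squared_euclidean` are hypothetical, measuring the center change with the squared Euclidean distance is an assumption of this sketch, and keeping the center of an empty cluster unchanged is a simplification.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sketch(data, initial_centers, tolerance=0.001, itermax=200):
    """Lloyd-style iteration that stops on 'tolerance' or after 'itermax' passes."""
    centers = [list(center) for center in initial_centers]
    clusters = [[] for _ in centers]
    for _ in range(itermax):
        # Assignment step: put the index of each point into its nearest cluster.
        clusters = [[] for _ in centers]
        for index, point in enumerate(data):
            nearest = min(range(len(centers)),
                          key=lambda c: squared_euclidean(point, centers[c]))
            clusters[nearest].append(index)

        # Update step: move each center to the mean of its members.
        new_centers = []
        for center, members in zip(centers, clusters):
            if not members:
                new_centers.append(center)  # sketch choice: keep an empty cluster's center
                continue
            new_centers.append([sum(data[i][d] for i in members) / len(members)
                                for d in range(len(center))])

        # Stop condition: the largest center change is less than 'tolerance'.
        max_change = max(squared_euclidean(old, new)
                         for old, new in zip(centers, new_centers))
        centers = new_centers
        if max_change < tolerance:
            break
    return clusters, centers


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
clusters, centers = kmeans_sketch(points, [[0.0, 0.1], [2.5, 2.6]])
```

On this toy input the centers move during the first pass and stay fixed during the second, so the loop exits through the tolerance check long before 'itermax' is reached. A smaller 'itermax' only matters when the centers keep moving by more than 'tolerance'.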