pyclustering.cluster.kmeans.kmeans Class Reference

Class represents K-Means clustering algorithm.

Public Member Functions

def __init__ (self, data, initial_centers, tolerance=0.001, ccore=True, **kwargs)
 Constructor of clustering algorithm K-Means.
 
def process (self)
 Performs cluster analysis in line with rules of K-Means algorithm.
 
def get_clusters (self)
 Returns list of allocated clusters; each cluster contains indexes of objects in the data list.
 
def get_centers (self)
 Returns list of centers of allocated clusters.
 
def get_total_wce (self)
 Returns the sum of metric errors, which depends on the metric used for clustering (by default SSE, Sum of Squared Errors).
 
def get_cluster_encoding (self)
 Returns clustering result representation type that indicates how clusters are encoded.
 

Detailed Description

Class represents K-Means clustering algorithm.

CCORE implementation of the algorithm uses thread pool to parallelize the clustering process.

K-Means clustering results depend on initial centers. The K-Means++ algorithm from module 'pyclustering.cluster.center_initializer' can be used to initialize the centers.
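
To illustrate why the choice of initial centers matters, here is a minimal pure-Python sketch of the general K-Means++ seeding idea: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to its squared distance from the nearest center chosen so far. This is not the code of 'pyclustering.cluster.center_initializer'; the helper names squared_euclidean and kmeans_plusplus_sketch are hypothetical and exist only for this example.

```python
import random


def squared_euclidean(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plusplus_sketch(points, amount, seed=None):
    """Pick `amount` initial centers from `points` (hypothetical helper)."""
    rng = random.Random(seed)
    # First center: chosen uniformly at random from the data.
    centers = [list(rng.choice(points))]
    while len(centers) < amount:
        # Weight of each point: squared distance to its nearest chosen center.
        weights = [min(squared_euclidean(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # All points coincide with chosen centers: fall back to uniform choice.
            centers.append(list(rng.choice(points)))
            continue
        # Points far from the existing centers are more likely to be picked.
        centers.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centers


points = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]]
initial_centers = kmeans_plusplus_sketch(points, 2, seed=1)
```

A point that is already a center has weight zero, so with distinct input points the sketch never selects the same point twice.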

Figure (kmeans_example_clustering.png): K-Means clustering results. Left: 'Simple03.data' sample; right: 'Lsun.data' sample.

Example #1 - Clustering using K-Means++ for center initialization:

from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample
# Load list of points for cluster analysis.
sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)
# Prepare initial centers using K-Means++ method.
initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()
# Create instance of K-Means algorithm with prepared centers.
kmeans_instance = kmeans(sample, initial_centers)
# Run cluster analysis and obtain results.
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
final_centers = kmeans_instance.get_centers()
# Visualize obtained results
kmeans_visualizer.show_clusters(sample, clusters, final_centers)

Example #2 - Clustering using specific distance metric, for example, Manhattan distance:

from pyclustering.cluster.kmeans import kmeans
from pyclustering.utils.metric import distance_metric, type_metric
# Prepare input data ('sample') and initial centers ('initial_centers') as in Example #1.
# Create metric that will be used for clustering.
manhattan_metric = distance_metric(type_metric.MANHATTAN)
# Create instance of K-Means using specific distance metric.
kmeans_instance = kmeans(sample, initial_centers, metric=manhattan_metric)
# Run cluster analysis and obtain results.
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
See also
center_initializer

Definition at line 272 of file kmeans.py.

Constructor & Destructor Documentation

◆ __init__()

def pyclustering.cluster.kmeans.kmeans.__init__ (   self,
  data,
  initial_centers,
  tolerance = 0.001,
  ccore = True,
  **kwargs 
)

Constructor of clustering algorithm K-Means.

Center initializer can be used for creating initial centers, for example, K-Means++ method.

Parameters
[in] data (array_like): Input data presented as an array of points (objects); each point should be represented by an array_like data structure.
[in] initial_centers (array_like): Initial coordinates of the cluster centers, represented by an array_like data structure: [center1, center2, ...].
[in] tolerance (double): Stop condition: the algorithm stops when the maximum change of the cluster centers is less than tolerance.
[in] ccore (bool): Defines whether the CCORE library (C++ pyclustering library) is used instead of the Python code.
[in] **kwargs: Arbitrary keyword arguments (available arguments: 'observer', 'metric', 'itermax').

Keyword Args:

  • observer (kmeans_observer): Observer of the algorithm to collect information about clustering process on each iteration.
  • metric (distance_metric): Metric that is used for distance calculation between two points (by default squared Euclidean distance).
  • itermax (uint): Maximum number of iterations that is used for clustering process (by default: 200).
See also
center_initializer

Definition at line 326 of file kmeans.py.
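
The interplay of the 'tolerance' and 'itermax' arguments can be sketched in pure Python. The loop below is an illustration under stated assumptions, not the library's implementation: the function name kmeans_sketch is hypothetical, the change of a center is measured with the squared Euclidean distance, and an empty cluster simply keeps its previous center.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sketch(data, initial_centers, tolerance=0.001, itermax=200):
    """Illustrative K-Means loop (hypothetical helper, not pyclustering code)."""
    centers = [list(c) for c in initial_centers]
    clusters = []
    for _ in range(itermax):  # 'itermax' bounds the number of iterations
        # Assignment step: attach each point index to its nearest center.
        clusters = [[] for _ in centers]
        for index, point in enumerate(data):
            nearest = min(range(len(centers)),
                          key=lambda k: squared_euclidean(point, centers[k]))
            clusters[nearest].append(index)

        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for center, cluster in zip(centers, clusters):
            if not cluster:
                # Assumption of this sketch: an empty cluster keeps its center.
                new_centers.append(center)
                continue
            new_centers.append([
                sum(data[i][d] for i in cluster) / len(cluster)
                for d in range(len(center))
            ])

        # Stop condition: maximum change of centers is less than 'tolerance'.
        max_change = max(squared_euclidean(old, new)
                         for old, new in zip(centers, new_centers))
        centers = new_centers
        if max_change < tolerance:
            break
    return clusters, centers


data = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
clusters, centers = kmeans_sketch(data, [[0.0, 0.0], [6.0, 6.0]])
```

A smaller tolerance lets the loop run until the centers barely move, while itermax guarantees termination even if that threshold is never reached.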

Member Function Documentation

◆ get_centers()

def pyclustering.cluster.kmeans.kmeans.get_centers (   self)

Returns list of centers of allocated clusters.

See also
process()
get_clusters()

Definition at line 447 of file kmeans.py.

◆ get_cluster_encoding()

def pyclustering.cluster.kmeans.kmeans.get_cluster_encoding (   self)

Returns clustering result representation type that indicates how clusters are encoded.

Returns
(type_encoding) Clustering result representation.
See also
get_clusters()

Definition at line 476 of file kmeans.py.

◆ get_clusters()

def pyclustering.cluster.kmeans.kmeans.get_clusters (   self)

Returns list of allocated clusters; each cluster contains indexes of objects in the data list.

See also
process()
get_centers()

Definition at line 435 of file kmeans.py.

Referenced by pyclustering.samples.answer_reader.get_cluster_lengths(), and pyclustering.cluster.optics.optics.process().
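
Because each cluster is a list of indexes into the input data, the result often needs converting to one label per point before it can be compared with other tools. The following sketch shows one way to do that; clusters_to_labels is a hypothetical helper written for this example and is not part of pyclustering.

```python
def clusters_to_labels(clusters, amount_points):
    """Convert a list of index lists into one cluster label per point."""
    labels = [None] * amount_points  # None marks points that belong to no cluster
    for cluster_index, cluster in enumerate(clusters):
        for point_index in cluster:
            labels[point_index] = cluster_index
    return labels


# Shape of the value described for get_clusters(): indexes grouped by cluster.
clusters = [[0, 3], [1, 2, 4]]
labels = clusters_to_labels(clusters, 5)
print(labels)  # [0, 1, 1, 0, 1]
```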

◆ get_total_wce()

def pyclustering.cluster.kmeans.kmeans.get_total_wce (   self)

Returns the sum of metric errors, which depends on the metric used for clustering (by default SSE, Sum of Squared Errors).

Sum of metric errors is calculated using distance between point and its center:

\[error=\sum_{i=0}^{N}distance(x_{i}, center(x_{i}))\]

See also
process()
get_clusters()

Definition at line 462 of file kmeans.py.
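
As a worked illustration of this sum, the sketch below adds up the distance between every point and the center of the cluster it belongs to. It assumes clusters are given as lists of point indexes and uses the squared Euclidean distance by default; total_wce_sketch, squared_euclidean and manhattan are hypothetical helpers for this example, not library functions.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    """Manhattan (city-block) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_wce_sketch(data, clusters, centers, distance=squared_euclidean):
    """Sum the distance between each point and the center of its cluster."""
    return sum(
        distance(data[index], center)
        for cluster, center in zip(clusters, centers)
        for index in cluster
    )


data = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [14.0, 0.0]]
clusters = [[0, 1], [2, 3]]
centers = [[1.0, 0.0], [12.0, 0.0]]

sse = total_wce_sketch(data, clusters, centers)                 # 1 + 1 + 4 + 4 = 10.0
city_block = total_wce_sketch(data, clusters, centers, manhattan)  # 1 + 1 + 2 + 2 = 6.0
```

The same clustering yields different totals under different metrics, so values are only comparable between runs that use the same metric.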

◆ process()

def pyclustering.cluster.kmeans.kmeans.process (   self)

Performs cluster analysis in line with rules of K-Means algorithm.

Returns
(kmeans) Returns itself (K-Means instance).
Remarks
Results of clustering can be obtained using corresponding get methods.
See also
get_clusters()
get_centers()

Definition at line 365 of file kmeans.py.


The documentation for this class was generated from the following file: