pyclustering  0.10.1
pyclustering is a Python, C++ data mining library.
pyclustering.cluster.kmedians.kmedians Class Reference

Class represents clustering algorithm K-Medians.

Public Member Functions

def __init__ (self, data, initial_medians, tolerance=0.001, ccore=True, **kwargs)
 Constructor of clustering algorithm K-Medians.
 
def process (self)
 Performs cluster analysis in line with rules of K-Medians algorithm.
 
def predict (self, points)
 Calculates the closest cluster to each point.
 
def get_clusters (self)
 Returns the list of allocated clusters; each cluster contains the indexes of objects in the input data list.
 
def get_medians (self)
 Returns list of centers of allocated clusters.
 
def get_total_wce (self)
 Returns the sum of metric errors, which depends on the metric used for clustering (by default SSE, Sum of Squared Errors).
 
def get_cluster_encoding (self)
 Returns the clustering result representation type, which indicates how clusters are encoded.
 

Detailed Description

Class represents clustering algorithm K-Medians.

The algorithm is less sensitive to outliers than K-Means. Medians are calculated instead of centroids.
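
The difference between a centroid and a median can be illustrated with a small standalone sketch. It does not use pyclustering; the data and the helper names `centroid` and `coordinate_median` are made up for illustration, and the median is assumed to be taken per coordinate.

```python
from statistics import mean, median

# Illustrative 2-D cluster with one extreme outlier (last point).
points = [[1.0, 1.0], [2.0, 1.5], [3.0, 2.0], [4.0, 2.5], [100.0, 50.0]]


def centroid(cluster):
    """Per-coordinate mean, as used by K-Means."""
    return [mean(column) for column in zip(*cluster)]


def coordinate_median(cluster):
    """Per-coordinate median (assumed here as the K-Medians center)."""
    return [median(column) for column in zip(*cluster)]


# The mean is dragged toward the outlier, the median is not.
print(centroid(points))
print(coordinate_median(points))  # [3.0, 2.0]
```

The median stays inside the bulk of the data, while the centroid moves far toward the single outlier.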

Example:

from pyclustering.cluster.kmedians import kmedians
from pyclustering.cluster import cluster_visualizer
from pyclustering.utils import read_sample
from pyclustering.samples.definitions import FCPS_SAMPLES
# Load list of points for cluster analysis.
sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)
# Create instance of K-Medians algorithm.
initial_medians = [[0.0, 0.1], [2.5, 0.7]]
kmedians_instance = kmedians(sample, initial_medians)
# Run cluster analysis and obtain results.
kmedians_instance.process()
clusters = kmedians_instance.get_clusters()
medians = kmedians_instance.get_medians()
# Visualize clustering results.
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, sample)
visualizer.append_cluster(initial_medians, marker='*', markersize=10)
visualizer.append_cluster(medians, marker='*', markersize=10)
visualizer.show()

Definition at line 26 of file kmedians.py.

Constructor & Destructor Documentation

◆ __init__()

def pyclustering.cluster.kmedians.kmedians.__init__ (self, data, initial_medians, tolerance=0.001, ccore=True, **kwargs)

Constructor of clustering algorithm K-Medians.

Parameters
[in] data (list): Input data presented as a list of points (objects); each point should be represented by a list or tuple.
[in] initial_medians (list): Initial coordinates of the cluster medians, represented by a list: [center1, center2, ...].
[in] tolerance (double): Stop condition: the algorithm stops when the maximum change of the cluster centers is less than tolerance.
[in] ccore (bool): Defines whether the CCORE library (C++ pyclustering library) is used instead of the Python code.
[in] **kwargs: Arbitrary keyword arguments (available arguments: 'metric', 'itermax').

Keyword Args:

  • metric (distance_metric): Metric that is used for distance calculation between two points.
  • itermax (uint): Maximum number of iterations for cluster analysis.
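
How `tolerance` and `itermax` bound the iteration can be seen in a minimal pure-Python sketch of the K-Medians loop. This is not the library's implementation: the function `kmedians_sketch`, the squared Euclidean change measure, the default `itermax=100`, and the handling of empty clusters are all assumptions made for illustration.

```python
from statistics import median


def squared_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmedians_sketch(data, initial_medians, tolerance=0.001, itermax=100):
    """Hypothetical K-Medians loop; returns (clusters, medians)."""
    medians = [list(m) for m in initial_medians]
    clusters = []
    for _ in range(itermax):  # 'itermax' caps the number of iterations
        # Assignment step: put each point index into its nearest cluster.
        clusters = [[] for _ in medians]
        for index, point in enumerate(data):
            nearest = min(range(len(medians)),
                          key=lambda k: squared_euclidean(point, medians[k]))
            clusters[nearest].append(index)

        # Update step: per-coordinate median of every non-empty cluster.
        new_medians = []
        for k, cluster in enumerate(clusters):
            if cluster:
                columns = zip(*(data[i] for i in cluster))
                new_medians.append([median(column) for column in columns])
            else:
                new_medians.append(medians[k])  # assumption: keep old median

        # Stop condition: maximum change of the centers below 'tolerance'.
        change = max(squared_euclidean(old, new)
                     for old, new in zip(medians, new_medians))
        medians = new_medians
        if change < tolerance:
            break
    return clusters, medians


data = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
clusters, medians = kmedians_sketch(data, [[0.0, 0.0], [6.0, 6.0]])
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
print(medians)   # [[1.0, 1.0], [5.0, 5.0]]
```

On this toy data the loop stops after the second iteration, because the medians no longer move.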

Definition at line 60 of file kmedians.py.

Member Function Documentation

◆ get_cluster_encoding()

def pyclustering.cluster.kmedians.kmedians.get_cluster_encoding (self)

Returns the clustering result representation type, which indicates how clusters are encoded.

Returns
(type_encoding) Clustering result representation.
See also
get_clusters()

Definition at line 219 of file kmedians.py.

◆ get_clusters()

def pyclustering.cluster.kmedians.kmedians.get_clusters (self)

Returns the list of allocated clusters; each cluster contains the indexes of objects in the input data list.

See also
process()
get_medians()

Definition at line 178 of file kmedians.py.

Referenced by pyclustering.samples.answer_reader.get_cluster_lengths(), and pyclustering.cluster.optics.optics.process().
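
Because clusters are returned as lists of indexes rather than lists of points, a small conversion step is often useful. The sketch below does not call pyclustering; the `data` and `clusters` values are made up and only mimic the shape described above.

```python
# Illustrative input data and a result in the index-list shape described above.
data = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8], [5.2, 4.9]]
clusters = [[0, 2], [1, 3]]

# Resolve the indexes back to the actual points of each cluster.
clusters_as_points = [[data[i] for i in cluster] for cluster in clusters]

# Build one cluster label per point, in the order of the input data.
labels = [None] * len(data)
for cluster_id, cluster in enumerate(clusters):
    for i in cluster:
        labels[i] = cluster_id

print(clusters_as_points)  # [[[1.0, 1.0], [1.2, 0.8]], [[5.0, 5.0], [5.2, 4.9]]]
print(labels)              # [0, 1, 0, 1]
```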

◆ get_medians()

def pyclustering.cluster.kmedians.kmedians.get_medians (self)

Returns list of centers of allocated clusters.

See also
process()
get_clusters()

Definition at line 190 of file kmedians.py.

◆ get_total_wce()

def pyclustering.cluster.kmedians.kmedians.get_total_wce (self)

Returns the sum of metric errors, which depends on the metric used for clustering (by default SSE, Sum of Squared Errors).

The sum of metric errors is calculated using the distance between each point and its center:

\[error=\sum_{i=0}^{N}distance(x_{i}, center(x_{i}))\]

See also
process()
get_clusters()

Definition at line 205 of file kmedians.py.
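
The error sum can be reproduced by hand with a short sketch. It is independent of pyclustering: the function `total_wce`, the sample values, and the choice of squared Euclidean distance (matching the SSE default mentioned above) are assumptions for illustration.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_wce(data, clusters, medians, distance=squared_euclidean):
    """Sum of distances between every point and the center of its cluster."""
    return sum(distance(data[i], medians[k])
               for k, cluster in enumerate(clusters)
               for i in cluster)


# Illustrative values: two clusters of two points each.
data = [[1.0, 1.0], [1.0, 3.0], [5.0, 5.0], [7.0, 5.0]]
clusters = [[0, 1], [2, 3]]
medians = [[1.0, 2.0], [6.0, 5.0]]

print(total_wce(data, clusters, medians))  # 4.0
```

Each point lies at squared distance 1.0 from its center, so the four terms add up to 4.0.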

◆ predict()

def pyclustering.cluster.kmedians.kmedians.predict (self, points)

Calculates the closest cluster to each point.

Parameters
[in]points(array_like): Points for which closest clusters are calculated.
Returns
(list) List of closest clusters for each point. Each cluster is denoted by its index. Returns an empty collection if the 'process()' method was not called.

An example of how to calculate (or predict) the closest cluster to specified points.

from pyclustering.cluster.kmedians import kmedians
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample
# Load list of points for cluster analysis.
sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)
# Initial centers for sample 'Simple3'.
initial_medians = [[0.2, 0.1], [4.0, 1.0], [2.0, 2.0], [2.3, 3.9]]
# Create instance of K-Medians algorithm with prepared centers.
kmedians_instance = kmedians(sample, initial_medians)
# Run cluster analysis.
kmedians_instance.process()
# Calculate the closest cluster to following two points.
points = [[0.25, 0.2], [2.5, 4.0]]
closest_clusters = kmedians_instance.predict(points)
print(closest_clusters)

Definition at line 133 of file kmedians.py.
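
What the prediction step amounts to can be sketched without the library: each point is assigned the index of its nearest median. The function `predict_sketch`, the squared Euclidean distance, and the fixed `medians` below are assumptions for illustration; in real use the medians come from a processed K-Medians instance.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_sketch(points, medians):
    """Return, for each point, the index of the nearest median."""
    return [min(range(len(medians)),
                key=lambda k: squared_euclidean(point, medians[k]))
            for point in points]


# Illustrative medians; not the result of an actual clustering run.
medians = [[0.2, 0.1], [4.0, 1.0], [2.0, 2.0], [2.3, 3.9]]
points = [[0.25, 0.2], [2.5, 4.0]]

print(predict_sketch(points, medians))  # [0, 3]
```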

◆ process()

def pyclustering.cluster.kmedians.kmedians.process (self)

Performs cluster analysis in line with rules of K-Medians algorithm.

Returns
(kmedians) Returns itself (K-Medians instance).
Remarks
Results of clustering can be obtained using corresponding get methods.
See also
get_clusters()
get_medians()

Definition at line 93 of file kmedians.py.


The documentation for this class was generated from the following file:
kmedians.py