pyclustering.cluster.kmedoids.kmedoids Class Reference

Implements the K-Medoids clustering algorithm, also known as PAM (Partitioning Around Medoids).

Public Member Functions

def __init__ (self, data, initial_index_medoids, tolerance=0.001, ccore=True, **kwargs)
 Constructor of the K-Medoids clustering algorithm.
 
def process (self)
 Performs cluster analysis according to the rules of the K-Medoids algorithm.
 
def get_clusters (self)
 Returns the list of allocated clusters; each cluster contains indexes of objects in the input data.
 
def get_medoids (self)
 Returns the list of medoids of the allocated clusters, represented by indexes into the input data.
 
def get_cluster_encoding (self)
 Returns the clustering result representation type, which indicates how clusters are encoded.
 

Detailed Description

Implements the K-Medoids clustering algorithm, also known as PAM (Partitioning Around Medoids).

The algorithm is less sensitive to outliers than K-Means. The principal difference between K-Medoids and K-Medians is that K-Medoids uses existing points from the input data space as medoids, whereas a median in K-Medians can be an artificial object (one that is not in the input data space).
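
The difference between a medoid and a median can be illustrated with a small standalone sketch. It does not use pyclustering; the helper names `medoid` and `coordinate_median` are hypothetical, and it assumes Euclidean distance and a coordinate-wise median:

```python
import math

def medoid(points):
    """Return the input point with the smallest total distance to all points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def coordinate_median(points):
    """Return the per-coordinate median, which need not be an input point."""
    result = []
    for axis in zip(*points):
        ordered = sorted(axis)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            result.append(ordered[mid])
        else:
            result.append((ordered[mid - 1] + ordered[mid]) / 2)
    return tuple(result)

points = [(0.0, 0.0), (1.0, 5.0), (2.0, 1.0), (10.0, 10.0)]
center_medoid = medoid(points)             # always one of the input points
center_median = coordinate_median(points)  # may lie between input points
print(center_medoid, center_median)
```

Here the medoid is an element of `points`, while the median is a new point that does not occur in the data.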

The CCORE option delegates processing to the pyclustering core, a C/C++ shared library, which significantly increases performance.

Clustering example:

from pyclustering.cluster.kmedoids import kmedoids
from pyclustering.utils import read_sample

# load list of points for cluster analysis ('path' is the path to a sample file)
sample = read_sample(path)
# set initial medoids (indexes of points in the sample, chosen arbitrarily here)
initial_medoids = [1, 10]
# create instance of K-Medoids algorithm
kmedoids_instance = kmedoids(sample, initial_medoids)
# run cluster analysis and obtain results
kmedoids_instance.process()
clusters = kmedoids_instance.get_clusters()
# show allocated clusters
print(clusters)

The metric used to calculate the distance between points can be specified with the additional 'metric' parameter:

from pyclustering.utils.metric import distance_metric, type_metric

# create Minkowski distance metric with degree equal to 2
metric = distance_metric(type_metric.MINKOWSKI, degree=2)
# create K-Medoids algorithm with specific distance metric
kmedoids_instance = kmedoids(sample, initial_medoids, metric=metric)
# run cluster analysis and obtain results
kmedoids_instance.process()
clusters = kmedoids_instance.get_clusters()

A distance matrix can be used instead of a sequence of points to increase performance; in that case the 'data_type' parameter should be set to 'distance_matrix':

from pyclustering.utils import read_sample, calculate_distance_matrix

# calculate distance matrix for sample
sample = read_sample(path_to_sample)
matrix = calculate_distance_matrix(sample)
# create K-Medoids algorithm for processing distance matrix instead of points
kmedoids_instance = kmedoids(matrix, initial_medoids, data_type='distance_matrix')
# run cluster analysis and obtain results
kmedoids_instance.process()
clusters = kmedoids_instance.get_clusters()
medoids = kmedoids_instance.get_medoids()
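
To make the expected shape of such a matrix concrete, the following standalone sketch builds one by hand. It assumes Euclidean distance, and the helper name `pairwise_distance_matrix` is hypothetical; whether `calculate_distance_matrix` produces exactly the same values is an assumption, not something stated here:

```python
import math

def pairwise_distance_matrix(points):
    """Build a square matrix whose entry [i][j] is the distance between points i and j."""
    return [[math.dist(p, q) for q in points] for p in points]

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
matrix = pairwise_distance_matrix(points)

# The matrix is square, symmetric, and has zeros on the diagonal.
for row in matrix:
    print(row)
```

With a matrix as input, the initial medoid indexes refer to rows (equivalently, columns) of the matrix rather than to entries of a point list.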

Definition at line 41 of file kmedoids.py.

Constructor & Destructor Documentation

◆ __init__()

def pyclustering.cluster.kmedoids.kmedoids.__init__ (   self,
  data,
  initial_index_medoids,
  tolerance = 0.001,
  ccore = True,
  **kwargs 
)

Constructor of the K-Medoids clustering algorithm.

Parameters
[in] data (list): Input data presented as a list of points (objects); each point should be represented by a list or tuple.
[in] initial_index_medoids (list): Indexes of initial medoids (indexes of points in the input data).
[in] tolerance (double): Stop condition: if the maximum distance change of the cluster medoids is less than the tolerance, the algorithm stops processing.
[in] ccore (bool): If True, the CCORE library (C++ pyclustering library) is used for clustering instead of Python code.
[in] **kwargs: Arbitrary keyword arguments (available arguments: 'metric', 'data_type').

Keyword Args:

  • metric (distance_metric): Metric that is used for distance calculation between two points.
  • data_type (string): Data type of input sample 'data' that is processed by the algorithm ('points', 'distance_matrix').
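
The stop condition described for 'tolerance' can be sketched as follows. This is an illustration only: the function name `medoids_converged` is hypothetical, Euclidean distance is assumed, and how the library itself measures the change of medoids is not specified here:

```python
import math

def medoids_converged(previous_medoids, current_medoids, tolerance=0.001):
    """Return True when no medoid moved by `tolerance` or more between iterations."""
    max_change = max(
        math.dist(old, new)
        for old, new in zip(previous_medoids, current_medoids)
    )
    return max_change < tolerance

previous = [(1.0, 1.0), (5.0, 5.0)]
unchanged = [(1.0, 1.0), (5.0, 5.0)]
moved = [(1.0, 1.0), (5.0, 5.5)]

stop_now = medoids_converged(previous, unchanged)  # no medoid moved
keep_going = medoids_converged(previous, moved)    # one medoid moved by 0.5
```

A smaller tolerance makes the check stricter, so more iterations may run before processing stops.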

Definition at line 101 of file kmedoids.py.

Member Function Documentation

◆ get_cluster_encoding()

def pyclustering.cluster.kmedoids.kmedoids.get_cluster_encoding (   self)

Returns the clustering result representation type, which indicates how clusters are encoded.

Returns
(type_encoding) Clustering result representation.
See also
get_clusters()
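
As an illustration of what a different encoding looks like, the sketch below converts clusters given as lists of point indexes (the form described for get_clusters()) into one label per point. The helper name `clusters_to_labels` and the cluster values are hypothetical and are not part of pyclustering:

```python
def clusters_to_labels(clusters, number_of_points):
    """Convert clusters given as index lists into one cluster label per point."""
    labels = [None] * number_of_points
    for cluster_id, cluster in enumerate(clusters):
        for point_index in cluster:
            labels[point_index] = cluster_id
    return labels

# Hypothetical result in index-list form: two clusters over five points.
clusters = [[0, 2, 3], [1, 4]]
labels = clusters_to_labels(clusters, 5)
print(labels)
```

Points that belong to no cluster keep the label `None` in this sketch.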

Definition at line 187 of file kmedoids.py.

◆ get_clusters()

def pyclustering.cluster.kmedoids.kmedoids.get_clusters (   self)

Returns the list of allocated clusters; each cluster contains indexes of objects in the input data.

See also
process()
get_medoids()

Definition at line 163 of file kmedoids.py.

Referenced by pyclustering.samples.answer_reader.get_cluster_lengths(), and pyclustering.cluster.optics.optics.process().

◆ get_medoids()

def pyclustering.cluster.kmedoids.kmedoids.get_medoids (   self)

Returns the list of medoids of the allocated clusters, represented by indexes into the input data.

See also
process()
get_clusters()
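
Because both get_clusters() and get_medoids() return indexes rather than points, the results usually need to be mapped back to the input data. A minimal sketch, using hypothetical sample values and hypothetical results in place of a real clustering run:

```python
# Input data: a list of points, as passed to the constructor.
sample = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.3, 7.9]]

# Hypothetical results in the shape returned by get_clusters() and get_medoids().
clusters = [[0, 1], [2, 3]]
medoids = [0, 3]

# Look up the actual points behind each index.
cluster_points = [[sample[i] for i in cluster] for cluster in clusters]
medoid_points = [sample[i] for i in medoids]

print(cluster_points)
print(medoid_points)
```

This lookup applies when the input is a list of points; with a distance matrix as input, the indexes identify rows of the matrix instead.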

Definition at line 175 of file kmedoids.py.

◆ process()

def pyclustering.cluster.kmedoids.kmedoids.process (   self)

Performs cluster analysis according to the rules of the K-Medoids algorithm.

Returns
(kmedoids) Returns itself (K-Medoids instance).
Remarks
Clustering results can be obtained using the corresponding get methods.
See also
get_clusters()
get_medoids()
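
For orientation, the general idea of clustering around medoids can be sketched in plain Python. This is a simplified alternating assign/update heuristic, not the PAM swap search and not the library's implementation; the function names are hypothetical, Euclidean distance is assumed, and it stops when the medoids no longer change rather than using a tolerance:

```python
import math

def assign_to_medoids(data, medoids):
    """Assign every point index to the cluster of its nearest medoid."""
    clusters = [[] for _ in medoids]
    for index, point in enumerate(data):
        nearest = min(range(len(medoids)),
                      key=lambda k: math.dist(point, data[medoids[k]]))
        clusters[nearest].append(index)
    return clusters

def best_medoid(data, cluster, current):
    """Pick the cluster member with the smallest total distance to the other members."""
    if not cluster:
        return current  # keep the old medoid if the cluster is empty
    return min(cluster,
               key=lambda c: sum(math.dist(data[c], data[o]) for o in cluster))

def simple_kmedoids(data, initial_index_medoids, max_iterations=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(initial_index_medoids)
    for _ in range(max_iterations):
        clusters = assign_to_medoids(data, medoids)
        new_medoids = [best_medoid(data, cluster, current)
                       for cluster, current in zip(clusters, medoids)]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Recompute the assignment so clusters always match the returned medoids.
    return assign_to_medoids(data, medoids), medoids

data = [(1.0, 1.0), (1.5, 1.2), (1.2, 0.8), (8.0, 8.0), (8.4, 8.1), (7.9, 8.3)]
clusters, medoids = simple_kmedoids(data, [0, 1])
print(clusters, medoids)
```

As with the class documented here, both results are expressed as indexes into the input data.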

Definition at line 130 of file kmedoids.py.


The documentation for this class was generated from the following file: