"""!

@brief Cluster analysis algorithm: K-Medoids (PAM - Partitioning Around Medoids).
@details Implementation based on papers @cite book::algorithms_for_clustering_data, @cite book::finding_groups_in_data.

@authors Andrei Novikov (pyclustering@yandex.ru)
@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program. If not, see <http://www.gnu.org/licenses/>.
@endcond

"""


from pyclustering.cluster.encoder import type_encoding

from pyclustering.utils.metric import distance_metric, type_metric

import pyclustering.core.kmedoids_wrapper as wrapper

from pyclustering.core.wrapper import ccore_library
from pyclustering.core.metric_wrapper import metric_wrapper
class kmedoids:
    """!
    @brief Class represents clustering algorithm K-Medoids (also known as PAM - Partitioning Around Medoids).
    @details The algorithm is less sensitive to outliers than K-Means. The principal difference between K-Medoids and K-Medians is that
             K-Medoids uses existing points from the input data space as medoids, whereas a median in K-Medians can be a virtual
             object that does not belong to the input data space.

             The CCORE option runs the processing in the core of pyclustering, a C/C++ shared library, which significantly
             increases performance.

    @code
        # load list of points for cluster analysis
        sample = read_sample(path)

        # set initial medoids (indexes of points in the sample)
        initial_medoids = [1, 10]

        # create instance of K-Medoids algorithm
        kmedoids_instance = kmedoids(sample, initial_medoids)

        # run cluster analysis and obtain results
        kmedoids_instance.process()
        clusters = kmedoids_instance.get_clusters()

        # show allocated clusters
    @endcode

    The metric used to calculate the distance between points can be specified by the additional parameter 'metric':
    @code
        # create Minkowski distance metric with degree equal to 2
        metric = distance_metric(type_metric.MINKOWSKI, degree=2)

        # create K-Medoids algorithm with specific distance metric
        kmedoids_instance = kmedoids(sample, initial_medoids, metric=metric)

        # run cluster analysis and obtain results
        kmedoids_instance.process()
        clusters = kmedoids_instance.get_clusters()
    @endcode

    A distance matrix can be used instead of a sequence of points to increase performance; for that purpose, the parameter 'data_type' should be used:
    @code
        # calculate distance matrix for sample
        sample = read_sample(path_to_sample)
        matrix = calculate_distance_matrix(sample)

        # create K-Medoids algorithm for processing distance matrix instead of points
        kmedoids_instance = kmedoids(matrix, initial_medoids, data_type='distance_matrix')

        # run cluster analysis and obtain results
        kmedoids_instance.process()

        clusters = kmedoids_instance.get_clusters()
        medoids = kmedoids_instance.get_medoids()
    @endcode

    """


    def __init__(self, data, initial_index_medoids, tolerance=0.001, ccore=True, **kwargs):
        """!
        @brief Constructor of clustering algorithm K-Medoids.

        @param[in] data (list): Input data presented as a list of points (objects), where each point is represented by a list or tuple, or as a distance matrix when 'data_type' is 'distance_matrix'.
        @param[in] initial_index_medoids (list): Indexes of initial medoids (indexes of points in the input data).
        @param[in] tolerance (double): Stop condition: if the maximum change of distance of cluster medoids is less than the tolerance, then the algorithm stops processing.
        @param[in] ccore (bool): If True, then the CCORE library (C++ pyclustering library) is used for clustering instead of Python code; it is not used with a user-defined metric.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'metric', 'data_type').

        <b>Keyword Args:</b><br>
            - metric (distance_metric): Metric that is used for distance calculation between two points.
            - data_type (string): Data type of input sample 'data' that is processed by the algorithm ('points', 'distance_matrix').

        """
        self.__data_type = kwargs.get('data_type', 'points')

        self.__ccore = ccore and self.__metric.get_type() != type_metric.USER_DEFINED
            self.__ccore = ccore_library.workable()
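The 'metric' keyword expects a pyclustering distance_metric object. The class docstring creates one with type_metric.MINKOWSKI and degree=2. As an illustration only, the plain-Python function below shows what a Minkowski metric of a given degree computes. The name minkowski_distance is hypothetical and not part of pyclustering. With degree 2 it gives the Euclidean distance, and with degree 1 the Manhattan distance.

```python
def minkowski_distance(point1, point2, degree=2):
    """Hypothetical helper (not the pyclustering API): Minkowski distance of the given degree."""
    if len(point1) != len(point2):
        raise ValueError("points must have the same dimension")
    # sum of absolute coordinate differences raised to 'degree', then the degree-th root
    total = sum(abs(a - b) ** degree for a, b in zip(point1, point2))
    return total ** (1.0 / degree)


print(minkowski_distance([0.0, 0.0], [3.0, 4.0]))            # Euclidean: 5.0
print(minkowski_distance([0.0, 0.0], [3.0, 4.0], degree=1))  # Manhattan: 7.0
```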
    def process(self):
        """!
        @brief Performs cluster analysis according to the rules of the K-Medoids algorithm.

        @return (kmedoids) Returns itself (K-Medoids instance).

        @remark Results of clustering can be obtained using the corresponding get methods.

        """
            ccore_metric = metric_wrapper.create_instance(self.__metric)

            changes = float('inf')

            while changes > stop_condition:
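The loop keeps iterating while `changes` exceeds the stop condition. Per the constructor documentation, that quantity is the maximum change of distance of the cluster medoids between iterations.

A minimal sketch of that measure follows. It assumes points are sequences of numbers and uses the Euclidean distance from math.dist (Python 3.8+) in place of the configured metric. The name medoid_shift is a hypothetical helper.

```python
import math


def medoid_shift(points, old_medoids, new_medoids, distance=math.dist):
    """Hypothetical helper: the largest distance between an old medoid and its
    replacement, i.e. the value compared against the tolerance."""
    return max(distance(points[old], points[new])
               for old, new in zip(old_medoids, new_medoids))


points = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
tolerance = 0.001

# medoid 0 moves from point 0 to point 2, medoid 1 stays at point 1
changes = medoid_shift(points, [0, 1], [2, 1])
print(changes > tolerance)  # True, so another iteration would run

# no medoid moves, so the loop would stop
print(medoid_shift(points, [2, 1], [2, 1]) > tolerance)  # False
```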
    def get_clusters(self):
        """!
        @brief Returns the list of allocated clusters; each cluster contains indexes of objects in the input data.

        """


    def get_medoids(self):
        """!
        @brief Returns the list of medoids of allocated clusters, represented by indexes from the input data.

        """


    def get_cluster_encoding(self):
        """!
        @brief Returns the clustering result representation type that indicates how clusters are encoded.

        @return (type_encoding) Clustering result representation.

        """

        return type_encoding.CLUSTER_INDEX_LIST_SEPARATION
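The encoding type_encoding.CLUSTER_INDEX_LIST_SEPARATION means that get_clusters() returns one list of point indexes per cluster. Some tools expect one label per point instead, and the conversion is straightforward.

The sketch below is plain Python. The name index_lists_to_labels is a hypothetical helper, not pyclustering's encoder API. Points that appear in no cluster keep the label None.

```python
def index_lists_to_labels(clusters, size):
    """Hypothetical helper: convert index-list separation, e.g. [[0, 2], [1]],
    into one cluster label per point, e.g. [0, 1, 0]."""
    labels = [None] * size  # points missing from every cluster stay None
    for label, cluster in enumerate(clusters):
        for index_point in cluster:
            labels[index_point] = label
    return labels


print(index_lists_to_labels([[0, 2], [1]], 3))  # [0, 1, 0]
```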
    def __create_distance_calculator(self):
        """!
        @brief Creates a distance calculator in line with the algorithm's parameters.

        @return (callable) Distance calculator.

        """
                return lambda index1, index2: self.__pointer_data.item((index1, index2))

            raise TypeError("Unknown type of data is specified '%s'" % self.__data_type)
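The calculator returned by this method hides the input format from the rest of the algorithm. Callers pass two point indexes and get a distance back, whether 'data' holds points or a precomputed distance matrix.

A plain-Python sketch of the same idea is given below. The helper names are hypothetical, and math.dist (Euclidean) stands in for the configured metric. The listing reads the matrix through `.item((index1, index2))`; this sketch simply indexes nested lists instead.

```python
import math


def distance_matrix_sketch(points, distance=math.dist):
    """Hypothetical helper: symmetric matrix of pairwise distances."""
    return [[distance(p, q) for q in points] for p in points]


def create_distance_calculator(data, data_type='points', distance=math.dist):
    """Hypothetical helper: return a callable (index1, index2) -> distance."""
    if data_type == 'points':
        # distances are computed on demand from the coordinates
        return lambda index1, index2: distance(data[index1], data[index2])
    if data_type == 'distance_matrix':
        # distances are looked up in the precomputed matrix
        return lambda index1, index2: data[index1][index2]
    raise TypeError("Unknown type of data is specified '%s'" % data_type)


points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
from_points = create_distance_calculator(points)
from_matrix = create_distance_calculator(distance_matrix_sketch(points), 'distance_matrix')
print(from_points(0, 2) == from_matrix(0, 2))  # True: same distance either way
```

Precomputing the matrix costs memory quadratic in the number of points. In exchange, every later distance query becomes a lookup, which is the performance benefit the class documentation refers to.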
    def __update_clusters(self):
        """!
        @brief Calculates the distance from each point to the medoid of each cluster.
        @details Each point is captured by the cluster with the nearest medoid, and as a result the clusters are updated.

        @return (list) Updated clusters as a list of clusters, where each cluster contains indexes of objects from the data.

        """
            dist_optim = float('Inf')

                if dist < dist_optim:

            clusters[index_optim].append(index_point)
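The assignment step described in this docstring can be sketched in plain Python. The sketch follows the visible comparison `dist < dist_optim`: because the comparison is strict, a point that is equally close to several medoids joins the first of them.

Three things are assumptions rather than details shown in the listing:
- assign_to_nearest_medoid is a hypothetical helper;
- each medoid seeds its own cluster (the initialization is not shown);
- math.dist (Euclidean) stands in for the configured metric.

```python
import math


def assign_to_nearest_medoid(points, medoid_indexes, distance=math.dist):
    """Hypothetical helper: build clusters of point indexes around the medoids."""
    clusters = [[medoid] for medoid in medoid_indexes]  # assumption: each medoid seeds its cluster
    for index_point, point in enumerate(points):
        if index_point in medoid_indexes:
            continue
        index_optim, dist_optim = -1, float('inf')
        for index, medoid in enumerate(medoid_indexes):
            dist = distance(point, points[medoid])
            if dist < dist_optim:  # strict comparison: ties go to the first medoid
                index_optim, dist_optim = index, dist
        clusters[index_optim].append(index_point)
    return clusters


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(assign_to_nearest_medoid(points, [0, 3]))  # [[0, 1, 2], [3, 4, 5]]
```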
    def __update_medoids(self):
        """!
        @brief Finds the medoids of clusters in line with the objects they contain.

        @return (list) List of medoids for the current number of clusters.

        """
            medoid_indexes[index] = medoid_index

        return medoid_indexes
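The listing omits how each cluster's medoid is computed. One common definition, used in the hypothetical sketch below, picks the member whose total distance to all other members of its cluster is smallest. Here math.dist (Euclidean) stands in for the configured metric, and ties go to the first member listed.

```python
import math


def update_medoids(points, clusters, distance=math.dist):
    """Hypothetical helper: pick, for each cluster, the member with the smallest
    summed distance to the other members of that cluster."""
    medoid_indexes = [-1] * len(clusters)
    for index, cluster in enumerate(clusters):
        # min() returns the first candidate with the minimal summed distance
        medoid_indexes[index] = min(
            cluster,
            key=lambda candidate: sum(distance(points[candidate], points[other])
                                      for other in cluster))
    return medoid_indexes


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(update_medoids(points, [[0, 2], [1, 3, 4, 5]]))  # [0, 3]
```

The process() loop alternates a step like this with the nearest-medoid assignment. It stops once the medoids stop moving by more than the tolerance.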