3 @brief Cluster analysis algorithm: K-Medoids. 4 @details Implementation based on papers @cite book::algorithms_for_clustering_data, @cite book::finding_groups_in_data. 6 @authors Andrei Novikov (pyclustering@yandex.ru) 8 @copyright GNU Public License 10 @cond GNU_PUBLIC_LICENSE 11 PyClustering is free software: you can redistribute it and/or modify 12 it under the terms of the GNU General Public License as published by 13 the Free Software Foundation, either version 3 of the License, or 14 (at your option) any later version. 16 PyClustering is distributed in the hope that it will be useful, 17 but WITHOUT ANY WARRANTY; without even the implied warranty of 18 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 GNU General Public License for more details. 21 You should have received a copy of the GNU General Public License 22 along with this program. If not, see <http://www.gnu.org/licenses/>. 35 import pyclustering.core.kmedoids_wrapper
as wrapper
37 from pyclustering.core.wrapper
import ccore_library
38 from pyclustering.core.metric_wrapper
import metric_wrapper
43 @brief Class represents clustering algorithm K-Medoids. 44 @details The algorithm is less sensitive to outliers tham K-Means. The principle difference between K-Medoids and K-Medians is that 45 K-Medoids uses existed points from input data space as medoids, but median in K-Medians can be unreal object (not from 50 from pyclustering.cluster.kmedoids import kmedoids 51 from pyclustering.cluster import cluster_visualizer 52 from pyclustering.utils import read_sample 53 from pyclustering.samples.definitions import FCPS_SAMPLES 55 # Load list of points for cluster analysis. 56 sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS) 58 # Set random initial medoids. 59 initial_medoids = [1, 500] 61 # Create instance of K-Medoids algorithm. 62 kmedoids_instance = kmedoids(sample, initial_medoids) 64 # Run cluster analysis and obtain results. 65 kmedoids_instance.process() 66 clusters = kmedoids_instance.get_clusters() 68 # Show allocated clusters. 72 visualizer = cluster_visualizer() 73 visualizer.append_clusters(clusters, sample) 77 Metric for calculation distance between points can be specified by parameter additional 'metric': 79 # create Minkowski distance metric with degree equals to '2' 80 metric = distance_metric(type_metric.MINKOWSKI, degree=2) 82 # create K-Medoids algorithm with specific distance metric 83 kmedoids_instance = kmedoids(sample, initial_medoids, metric=metric) 85 # run cluster analysis and obtain results 86 kmedoids_instance.process() 87 clusters = kmedoids_instance.get_clusters() 90 Distance matrix can be used instead of sequence of points to increase performance and for that purpose parameter 'data_type' should be used: 92 # calculate distance matrix for sample 93 sample = read_sample(path_to_sample) 94 matrix = calculate_distance_matrix(sample) 96 # create K-Medoids algorithm for processing distance matrix instead of points 97 kmedoids_instance = kmedoids(matrix, initial_medoids, data_type='distance_matrix') 99 # run cluster analysis and obtain results 100 kmedoids_instance.process() 102 clusters = kmedoids_instance.get_clusters() 103 medoids = kmedoids_instance.get_medoids() 109 def __init__(self, data, initial_index_medoids, tolerance=0.001, ccore=True, **kwargs):
111 @brief Constructor of clustering algorithm K-Medoids. 113 @param[in] data (list): Input data that is presented as list of points (objects), each point should be represented by list or tuple. 114 @param[in] initial_index_medoids (list): Indexes of intial medoids (indexes of points in input data). 115 @param[in] tolerance (double): Stop condition: if maximum value of distance change of medoids of clusters is less than tolerance than algorithm will stop processing. 116 @param[in] ccore (bool): If specified than CCORE library (C++ pyclustering library) is used for clustering instead of Python code. 117 @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'metric', 'data_type', 'itermax'). 119 <b>Keyword Args:</b><br> 120 - metric (distance_metric): Metric that is used for distance calculation between two points. 121 - data_type (string): Data type of input sample 'data' that is processed by the algorithm ('points', 'distance_matrix'). 122 - itermax (uint): Maximum number of iteration for cluster analysis. 131 self.
__data_type = kwargs.get(
'data_type',
'points')
132 self.
__itermax = kwargs.get(
'itermax', 200)
136 self.
__ccore = ccore
and self.
__metric.get_type() != type_metric.USER_DEFINED
138 self.
__ccore = ccore_library.workable()
143 @brief Performs cluster analysis in line with rules of K-Medoids algorithm. 145 @return (kmedoids) Returns itself (K-Medoids instance). 147 @remark Results of clustering can be obtained using corresponding get methods. 155 ccore_metric = metric_wrapper.create_instance(self.
__metric)
159 changes = float(
'inf')
177 @brief Returns list of allocated clusters, each cluster contains indexes of objects in list of data. 189 @brief Returns list of medoids of allocated clusters represented by indexes from the input data. 201 @brief Returns clustering result representation type that indicate how clusters are encoded. 203 @return (type_encoding) Clustering result representation. 209 return type_encoding.CLUSTER_INDEX_LIST_SEPARATION
212 def __create_distance_calculator(self):
214 @brief Creates distance calculator in line with algorithms parameters. 216 @return (callable) Distance calculator. 224 return lambda index1, index2: self.
__pointer_data.item((index1, index2))
229 raise TypeError(
"Unknown type of data is specified '%s'" % self.
__data_type)
232 def __update_clusters(self):
234 @brief Calculate distance to each point from the each cluster. 235 @details Nearest points are captured by according clusters and as a result clusters are updated. 237 @return (list) updated clusters as list of clusters where each cluster contains indexes of objects from data. 247 dist_optim = float(
'Inf')
252 if dist < dist_optim:
256 clusters[index_optim].append(index_point)
261 def __update_medoids(self):
263 @brief Find medoids of clusters in line with contained objects. 265 @return (list) list of medoids for current number of clusters. 273 medoid_indexes[index] = medoid_index
275 return medoid_indexes
Module provides various distance metrics - abstraction of the notion of distance in a metric space...
def __update_clusters(self)
Calculate distance to each point from the each cluster.
Utils that are used by modules of pyclustering.
def __update_medoids(self)
Find medoids of clusters in line with contained objects.
def get_cluster_encoding(self)
Returns clustering result representation type that indicate how clusters are encoded.
Module for representing clustering results.
Distance metric performs distance calculation between two points in line with encapsulated function...
def __init__(self, data, initial_index_medoids, tolerance=0.001, ccore=True, kwargs)
Constructor of clustering algorithm K-Medoids.
def __create_distance_calculator(self)
Creates distance calculator in line with algorithms parameters.
Class represents clustering algorithm K-Medoids.
def get_clusters(self)
Returns list of allocated clusters, each cluster contains indexes of objects in list of data...
def process(self)
Performs cluster analysis in line with rules of K-Medoids algorithm.
def get_medoids(self)
Returns list of medoids of allocated clusters represented by indexes from the input data...