3 @brief Cluster analysis algorithm: agglomerative algorithm. 4 @details Implementation based on paper @cite book::algorithms_for_clustering_data::1988. 6 @authors Andrei Novikov (pyclustering@yandex.ru) 8 @copyright GNU Public License 10 @cond GNU_PUBLIC_LICENSE 11 PyClustering is free software: you can redistribute it and/or modify 12 it under the terms of the GNU General Public License as published by 13 the Free Software Foundation, either version 3 of the License, or 14 (at your option) any later version. 16 PyClustering is distributed in the hope that it will be useful, 17 but WITHOUT ANY WARRANTY; without even the implied warranty of 18 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 GNU General Public License for more details. 21 You should have received a copy of the GNU General Public License 22 along with this program. If not, see <http://www.gnu.org/licenses/>. 28 from enum
import IntEnum;
34 from pyclustering.core.wrapper
import ccore_library
36 import pyclustering.core.agglomerative_wrapper
as wrapper;
41 @brief Enumerator of types of link between clusters. 60 @brief Class represents agglomerative algorithm for cluster analysis. 61 @details Agglomerative algorithm considers each data point (object) as a separate cluster at the beggining and 62 step by step finds the best pair of clusters for merge until required amount of clusters is obtained. 64 CCORE option can be used to use the pyclustering core - C/C++ shared library for processing that significantly increases performance. 66 Example of agglomerative algorithm where centroid link is used: 68 # sample for cluster analysis (represented by list) 69 sample = read_sample(path_to_sample); 71 # create object that uses python code only 72 agglomerative_instance = agglomerative(sample, 2, link_type.CENTROID_LINK) 75 agglomerative_instance.process(); 77 # obtain results of clustering 78 clusters = agglomerative_instance.get_clusters(); 81 Algorithm performance can be improved if 'ccore' flag is on. In this case C++ library will be called for clustering. 82 There is example of clustering 'LSUN' sample when usage of single or complete link will take a lot of resources and 83 when core usage is prefereble. 85 # sample Lsun for cluster analysis 86 lsun_sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN); 88 # create instance of the algorithm that will use ccore library (the last argument) 89 agglomerative_instance = agglomerative(lsun_sample, 3, link_type.SINGLE_LINK, True); 92 agglomerative_instance.process(); 94 # get result and visualize it 95 lsun_clusters = agglomerative_instance.get_clusters(); 96 visualizer = cluster_visualizer(); 97 visualizer.append_clusters(lsun_clusters, lsun_sample); 101 Example of agglomerative clustering using different links: 102 @image html agglomerative_lsun_clustering_single_link.png 106 def __init__(self, data, number_clusters, link = None, ccore = True):
108 @brief Constructor of agglomerative hierarchical algorithm. 110 @param[in] data (list): Input data that is presented as a list of points (objects), each point should be represented by list, for example 111 [[0.1, 0.2], [0.4, 0.5], [1.3, 0.9]]. 112 @param[in] number_clusters (uint): Number of clusters that should be allocated. 113 @param[in] link (type_link): Link type that is used for calculation similarity between objects and clusters, if it is not specified centroid link will be used by default. 114 @param[in] ccore (bool): Defines should be CCORE (C++ pyclustering library) used instead of Python code or not (by default it is 'False'). 128 self.
__ccore = ccore_library.workable();
136 @brief Performs cluster analysis in line with rules of agglomerative algorithm and similarity. 148 current_number_clusters = len(self.
__clusters);
152 current_number_clusters = len(self.
__clusters);
157 @brief Returns list of allocated clusters, each cluster contains indexes of objects in list of data. 159 @remark Results of clustering can be obtained using corresponding gets methods. 161 @return (list) List of allocated clusters, each cluster contains indexes of objects in list of data. 172 @brief Returns clustering result representation type that indicate how clusters are encoded. 174 @return (type_encoding) Clustering result representation. 180 return type_encoding.CLUSTER_INDEX_LIST_SEPARATION;
183 def __merge_similar_clusters(self):
185 @brief Merges the most similar clusters in line with link type. 202 raise NameError(
'Not supported similarity is used');
205 def __merge_by_average_link(self):
207 @brief Merges the most similar clusters in line with average link type. 211 minimum_average_distance = float(
'Inf');
213 for index_cluster1
in range(0, len(self.
__clusters)):
214 for index_cluster2
in range(index_cluster1 + 1, len(self.
__clusters)):
217 candidate_average_distance = 0.0;
218 for index_object1
in self.
__clusters[index_cluster1]:
219 for index_object2
in self.
__clusters[index_cluster2]:
222 candidate_average_distance /= (len(self.
__clusters[index_cluster1]) + len(self.
__clusters[index_cluster2]));
224 if (candidate_average_distance < minimum_average_distance):
225 minimum_average_distance = candidate_average_distance;
226 indexes = [index_cluster1, index_cluster2];
232 def __merge_by_centroid_link(self):
234 @brief Merges the most similar clusters in line with centroid link type. 238 minimum_centroid_distance = float(
'Inf');
241 for index1
in range(0, len(self.
__centers)):
242 for index2
in range(index1 + 1, len(self.
__centers)):
244 if (distance < minimum_centroid_distance):
245 minimum_centroid_distance = distance;
246 indexes = [index1, index2];
255 def __merge_by_complete_link(self):
257 @brief Merges the most similar clusters in line with complete link type. 261 minimum_complete_distance = float(
'Inf');
264 for index_cluster1
in range(0, len(self.
__clusters)):
265 for index_cluster2
in range(index_cluster1 + 1, len(self.
__clusters)):
268 if (candidate_maximum_distance < minimum_complete_distance):
269 minimum_complete_distance = candidate_maximum_distance;
270 indexes = [index_cluster1, index_cluster2];
276 def __calculate_farthest_distance(self, index_cluster1, index_cluster2):
278 @brief Finds two farthest objects in two specified clusters in terms and returns distance between them. 280 @param[in] (uint) Index of the first cluster. 281 @param[in] (uint) Index of the second cluster. 283 @return The farthest euclidean distance between two clusters. 286 candidate_maximum_distance = 0.0;
287 for index_object1
in self.
__clusters[index_cluster1]:
288 for index_object2
in self.
__clusters[index_cluster2]:
291 if (distance > candidate_maximum_distance):
292 candidate_maximum_distance = distance;
294 return candidate_maximum_distance;
297 def __merge_by_signle_link(self):
299 @brief Merges the most similar clusters in line with single link type. 303 minimum_single_distance = float(
'Inf');
306 for index_cluster1
in range(0, len(self.
__clusters)):
307 for index_cluster2
in range(index_cluster1 + 1, len(self.
__clusters)):
310 if (candidate_minimum_distance < minimum_single_distance):
311 minimum_single_distance = candidate_minimum_distance;
312 indexes = [index_cluster1, index_cluster2];
318 def __calculate_nearest_distance(self, index_cluster1, index_cluster2):
320 @brief Finds two nearest objects in two specified clusters and returns distance between them. 322 @param[in] (uint) Index of the first cluster. 323 @param[in] (uint) Index of the second cluster. 325 @return The nearest euclidean distance between two clusters. 328 candidate_minimum_distance = float(
'Inf');
330 for index_object1
in self.
__clusters[index_cluster1]:
331 for index_object2
in self.
__clusters[index_cluster2]:
333 if (distance < candidate_minimum_distance):
334 candidate_minimum_distance = distance;
336 return candidate_minimum_distance;
339 def __calculate_center(self, cluster):
341 @brief Calculates new center. 343 @return (list) New value of the center of the specified cluster. 348 center = [0] * dimension;
349 for index_point
in cluster:
350 for index_dimension
in range(0, dimension):
351 center[index_dimension] += self.
__pointer_data[index_point][index_dimension];
353 for index_dimension
in range(0, dimension):
354 center[index_dimension] /= len(cluster);
def __calculate_farthest_distance(self, index_cluster1, index_cluster2)
Finds two farthest objects in two specified clusters in terms and returns distance between them...
def __merge_by_signle_link(self)
Merges the most similar clusters in line with single link type.
Utils that are used by modules of pyclustering.
Module for representing clustering results.
Enumerator of types of link between clusters.
def __merge_by_centroid_link(self)
Merges the most similar clusters in line with centroid link type.
Class represents agglomerative algorithm for cluster analysis.
def __init__(self, data, number_clusters, link=None, ccore=True)
Constructor of agglomerative hierarchical algorithm.
def __merge_by_average_link(self)
Merges the most similar clusters in line with average link type.
def __calculate_center(self, cluster)
Calculates new center.
def process(self)
Performs cluster analysis in line with rules of agglomerative algorithm and similarity.
def get_cluster_encoding(self)
Returns clustering result representation type that indicate how clusters are encoded.
def __calculate_nearest_distance(self, index_cluster1, index_cluster2)
Finds two nearest objects in two specified clusters and returns distance between them.
def __merge_similar_clusters(self)
Merges the most similar clusters in line with link type.
def get_clusters(self)
Returns list of allocated clusters, each cluster contains indexes of objects in list of data...
def __merge_by_complete_link(self)
Merges the most similar clusters in line with complete link type.