3 @brief Cluster analysis algorithm: CLARANS. 4 @details Implementation based on paper @cite article::clarans::1. 6 @authors Andrei Novikov (pyclustering@yandex.ru) 8 @copyright GNU Public License 10 @cond GNU_PUBLIC_LICENSE 11 PyClustering is free software: you can redistribute it and/or modify 12 it under the terms of the GNU General Public License as published by 13 the Free Software Foundation, either version 3 of the License, or 14 (at your option) any later version. 16 PyClustering is distributed in the hope that it will be useful, 17 but WITHOUT ANY WARRANTY; without even the implied warranty of 18 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 GNU General Public License for more details. 21 You should have received a copy of the GNU General Public License 22 along with this program. If not, see <http://www.gnu.org/licenses/>. 37 @brief Class represents clustering algorithm CLARANS (a method for clustering objects for spatial data mining). 41 def __init__(self, data, number_clusters, numlocal, maxneighbor):
43 @brief Constructor of clustering algorithm CLARANS. 44 @details The higher the value of maxneighbor, the closer is CLARANS to K-Medoids, and the longer is each search of a local minima. 46 @param[in] data (list): Input data that is presented as list of points (objects), each point should be represented by list or tuple. 47 @param[in] number_clusters (uint): Amount of clusters that should be allocated. 48 @param[in] numlocal (uint): The number of local minima obtained (amount of iterations for solving the problem). 49 @param[in] maxneighbor (uint): The maximum number of neighbors examined. 68 def __verify_arguments(self):
70 @brief Verify input parameters for the algorithm and throw exception in case of incorrectness. 74 raise ValueError(
"Input data is empty (size: '%d')." % len(self.
__pointer_data))
77 raise ValueError(
"Amount of cluster (current value: '%d') for allocation should be greater than 0." %
81 raise ValueError(
"Local minima (current value: '%d') should be greater or equal to 0." % self.
__numlocal)
84 raise ValueError(
"Maximum number of neighbors (current value: '%d') should be greater or " 90 @brief Performs cluster analysis in line with rules of CLARANS algorithm. 92 @return (clarans) Returns itself (CLARANS instance). 123 @brief Returns allocated clusters by the algorithm. 125 @remark Allocated clusters can be returned only after data processing (use method process()), otherwise empty list is returned. 127 @return (list) List of allocated clusters, each cluster contains indexes of objects in list of data. 139 @brief Returns list of medoids of allocated clusters. 151 @brief Returns clustering result representation type that indicate how clusters are encoded. 153 @return (type_encoding) Clustering result representation. 159 return type_encoding.CLUSTER_INDEX_LIST_SEPARATION
162 def __update_clusters(self, medoids):
164 @brief Forms cluster in line with specified medoids by calculation distance from each point to medoids. 169 self.
__clusters = [[]
for i
in range(len(medoids))]
174 for index
in range(len(medoids)):
177 if (dist < dist_optim)
or (index
is 0):
181 self.
__clusters[index_optim].append(index_point)
182 self.
__belong[index_point] = index_optim
188 def __optimize_configuration(self):
190 @brief Finds quasi-optimal medoids and updates in line with them clusters in line with algorithm's rules. 197 current_medoid_cluster_index = self.
__belong[current_medoid_index]
200 candidate_medoid_index = random.randint(0, len(self.
__pointer_data) - 1)
202 while candidate_medoid_index
in self.
__current:
203 candidate_medoid_index = random.randint(0, len(self.
__pointer_data) - 1)
209 point_cluster_index = self.
__belong[point_index]
210 point_medoid_index = self.
__current[point_cluster_index]
214 other_medoid_cluster_index = self.
__belong[other_medoid_index]
224 distance_nearest = float(
'inf')
225 if ( (point_medoid_index != candidate_medoid_index)
and (point_medoid_index != current_medoid_cluster_index) ):
229 if (point_cluster_index == current_medoid_cluster_index):
231 if (distance_candidate >= distance_nearest):
232 candidate_cost += distance_nearest - distance_current
236 candidate_cost += distance_candidate - distance_current
238 elif (point_cluster_index == other_medoid_cluster_index):
240 if (distance_candidate > distance_nearest):
245 candidate_cost += distance_candidate - distance_nearest
247 if (candidate_cost < 0):
249 self.
__current[current_medoid_cluster_index] = candidate_medoid_index
261 def __find_another_nearest_medoid(self, point_index, current_medoid_index):
263 @brief Finds the another nearest medoid for the specified point that is differ from the specified medoid. 265 @param[in] point_index: index of point in dataspace for that searching of medoid in current list of medoids is perfomed. 266 @param[in] current_medoid_index: index of medoid that shouldn't be considered as a nearest. 268 @return (uint) index of the another nearest medoid for the point. 271 other_medoid_index = -1
272 other_distance_nearest = float(
'inf')
274 if (index_medoid != current_medoid_index):
277 if other_distance_candidate < other_distance_nearest:
278 other_distance_nearest = other_distance_candidate
279 other_medoid_index = index_medoid
281 return other_medoid_index
284 def __calculate_estimation(self):
286 @brief Calculates estimation (cost) of the current clusters. The lower the estimation, 287 the more optimally configuration of clusters. 289 @return (double) estimation of current clusters. 293 for index_cluster
in range(0, len(self.
__clusters)):
295 index_medoid = self.
__current[index_cluster]
296 for index_point
in cluster:
def get_clusters(self)
Returns allocated clusters by the algorithm.
def __update_clusters(self, medoids)
Forms cluster in line with specified medoids by calculation distance from each point to medoids...
def __verify_arguments(self)
Verify input parameters for the algorithm and throw exception in case of incorrectness.
Utils that are used by modules of pyclustering.
def __init__(self, data, number_clusters, numlocal, maxneighbor)
Constructor of clustering algorithm CLARANS.
Module for representing clustering results.
def __calculate_estimation(self)
Calculates estimation (cost) of the current clusters.
def __optimize_configuration(self)
Finds quasi-optimal medoids and updates in line with them clusters in line with algorithm's rules...
def get_medoids(self)
Returns list of medoids of allocated clusters.
def __find_another_nearest_medoid(self, point_index, current_medoid_index)
Finds the another nearest medoid for the specified point that is differ from the specified medoid...
def get_cluster_encoding(self)
Returns clustering result representation type that indicate how clusters are encoded.
def process(self)
Performs cluster analysis in line with rules of CLARANS algorithm.
Class represents clustering algorithm CLARANS (a method for clustering objects for spatial data minin...