3 @brief Cluster analysis algorithm: CLARANS. 4 @details Implementation based on paper @cite article::clarans::1. 6 @authors Andrei Novikov (pyclustering@yandex.ru) 8 @copyright GNU Public License 10 @cond GNU_PUBLIC_LICENSE 11 PyClustering is free software: you can redistribute it and/or modify 12 it under the terms of the GNU General Public License as published by 13 the Free Software Foundation, either version 3 of the License, or 14 (at your option) any later version. 16 PyClustering is distributed in the hope that it will be useful, 17 but WITHOUT ANY WARRANTY; without even the implied warranty of 18 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 GNU General Public License for more details. 21 You should have received a copy of the GNU General Public License 22 along with this program. If not, see <http://www.gnu.org/licenses/>. 37 @brief Class represents clustering algorithm CLARANS (a method for clustering objects for spatial data mining). 41 def __init__(self, data, number_clusters, numlocal, maxneighbor):
43 @brief Constructor of clustering algorithm CLARANS. 44 @details The higher the value of maxneighbor, the closer is CLARANS to K-Medoids (PAM - Partitioning Around Medoids), and the longer is each search of a local minima. 46 @param[in] data (list): Input data that is presented as list of points (objects), each point should be represented by list or tuple. 47 @param[in] number_clusters (uint): amount of clusters that should be allocated. 48 @param[in] numlocal (uint): the number of local minima obtained (amount of iterations for solving the problem). 49 @param[in] maxneighbor (uint): the maximum number of neighbors examined. 68 @brief Performs cluster analysis in line with rules of CLARANS algorithm. 98 @brief Returns allocated clusters by the algorithm. 100 @remark Allocated clusters can be returned only after data processing (use method process()), otherwise empty list is returned. 102 @return (list) List of allocated clusters, each cluster contains indexes of objects in list of data. 114 @brief Returns list of medoids of allocated clusters. 126 @brief Returns clustering result representation type that indicate how clusters are encoded. 128 @return (type_encoding) Clustering result representation. 134 return type_encoding.CLUSTER_INDEX_LIST_SEPARATION;
137 def __update_clusters(self, medoids):
139 @brief Forms cluster in line with specified medoids by calculation distance from each point to medoids. 144 self.
__clusters = [[]
for i
in range(len(medoids))];
149 for index
in range(len(medoids)):
152 if ( (dist < dist_optim)
or (index
is 0)):
156 self.
__clusters[index_optim].append(index_point);
157 self.
__belong[index_point] = index_optim;
163 def __optimize_configuration(self):
165 @brief Finds quasi-optimal medoids and updates in line with them clusters in line with algorithm's rules. 172 current_medoid_cluster_index = self.
__belong[current_medoid_index];
175 candidate_medoid_index = random.randint(0, len(self.
__pointer_data) - 1);
177 while (candidate_medoid_index
in self.
__current):
178 candidate_medoid_index = random.randint(0, len(self.
__pointer_data) - 1);
180 candidate_cost = 0.0;
184 point_cluster_index = self.
__belong[point_index];
185 point_medoid_index = self.
__current[point_cluster_index];
189 other_medoid_cluster_index = self.
__belong[other_medoid_index];
199 distance_nearest = float(
'inf');
200 if ( (point_medoid_index != candidate_medoid_index)
and (point_medoid_index != current_medoid_cluster_index) ):
204 if (point_cluster_index == current_medoid_cluster_index):
206 if (distance_candidate >= distance_nearest):
207 candidate_cost += distance_nearest - distance_current;
211 candidate_cost += distance_candidate - distance_current;
213 elif (point_cluster_index == other_medoid_cluster_index):
215 if (distance_candidate > distance_nearest):
220 candidate_cost += distance_candidate - distance_nearest;
222 if (candidate_cost < 0):
224 self.
__current[current_medoid_cluster_index] = candidate_medoid_index;
236 def __find_another_nearest_medoid(self, point_index, current_medoid_index):
238 @brief Finds the another nearest medoid for the specified point that is differ from the specified medoid. 240 @param[in] point_index: index of point in dataspace for that searching of medoid in current list of medoids is perfomed. 241 @param[in] current_medoid_index: index of medoid that shouldn't be considered as a nearest. 243 @return (uint) index of the another nearest medoid for the point. 246 other_medoid_index = -1;
247 other_distance_nearest = float(
'inf');
249 if (index_medoid != current_medoid_index):
252 if (other_distance_candidate < other_distance_nearest):
253 other_distance_nearest = other_distance_candidate;
254 other_medoid_index = index_medoid;
256 return other_medoid_index;
259 def __calculate_estimation(self):
261 @brief Calculates estimation (cost) of the current clusters. The lower the estimation, 262 the more optimally configuration of clusters. 264 @return (double) estimation of current clusters. 268 for index_cluster
in range(0, len(self.
__clusters)):
270 index_medoid = self.
__current[index_cluster];
271 for index_point
in cluster:
def get_clusters(self)
Returns allocated clusters by the algorithm.
def __update_clusters(self, medoids)
Forms cluster in line with specified medoids by calculation distance from each point to medoids...
Utils that are used by modules of pyclustering.
def __init__(self, data, number_clusters, numlocal, maxneighbor)
Constructor of clustering algorithm CLARANS.
Module for representing clustering results.
def __calculate_estimation(self)
Calculates estimation (cost) of the current clusters.
def __optimize_configuration(self)
Finds quasi-optimal medoids and updates in line with them clusters in line with algorithm's rules...
def get_medoids(self)
Returns list of medoids of allocated clusters.
def __find_another_nearest_medoid(self, point_index, current_medoid_index)
Finds the another nearest medoid for the specified point that is differ from the specified medoid...
def get_cluster_encoding(self)
Returns clustering result representation type that indicate how clusters are encoded.
def process(self)
Performs cluster analysis in line with rules of CLARANS algorithm.
Class represents clustering algorithm CLARANS (a method for clustering objects for spatial data minin...