@brief Cluster analysis algorithm: K-Medians
@details Implementation based on paper @cite book::algorithms_for_clustering_data.

@authors Andrei Novikov (pyclustering@yandex.ru)
@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program. If not, see <http://www.gnu.org/licenses/>.

import pyclustering.core.kmedians_wrapper as wrapper

from pyclustering.core.wrapper import ccore_library
from pyclustering.core.metric_wrapper import metric_wrapper
    @brief Class represents clustering algorithm K-Medians.
    @details The algorithm is less sensitive to outliers than K-Means. Medians are calculated instead of centroids.

    The CCORE option enables the pyclustering core, a C/C++ shared library that significantly increases performance.

        # load list of points for cluster analysis
        sample = read_sample(path)

        # create instance of K-Medians algorithm
        kmedians_instance = kmedians(sample, [[0.0, 0.1], [2.5, 2.6]])

        # run cluster analysis and obtain results
        kmedians_instance.process()
        kmedians_instance.get_clusters()

    def __init__(self, data, initial_centers, tolerance=0.001, ccore=True, **kwargs):
        @brief Constructor of clustering algorithm K-Medians.

        @param[in] data (list): Input data presented as a list of points (objects); each point should be represented by a list or tuple.
        @param[in] initial_centers (list): Initial coordinates of the cluster medians, represented by a list: [center1, center2, ...].
        @param[in] tolerance (double): Stop condition: if the maximum change of the cluster centers is less than tolerance, then the algorithm stops processing.
        @param[in] ccore (bool): Defines whether the CCORE library (C++ pyclustering library) should be used instead of Python code.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'metric').

        <b>Keyword Args:</b><br>
            - metric (distance_metric): Metric that is used for distance calculation between two points.

        self.__ccore = ccore and self.__metric.get_type() != type_metric.USER_DEFINED

            self.__ccore = ccore_library.workable()
        @brief Performs cluster analysis in line with rules of K-Medians algorithm.

        @return (kmedians) Returns itself (K-Medians instance).

        @remark Results of clustering can be obtained using corresponding get methods.

        ccore_metric = metric_wrapper.create_instance(self.__metric)

        changes = float('inf')

        raise NameError('Dimension of the input data and dimension of the initial medians must be equal.')

        changes = max([self.__metric(self.__medians[index], updated_centers[index]) for index in range(len(updated_centers))])
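
The stop condition used by `process()` (largest movement of any median compared against `tolerance`) can be sketched in isolation. This is a minimal illustration, not the library's code: `max_center_shift` and `euclidean_square` are hypothetical helper names, and the squared Euclidean metric is an assumption chosen only to make the example concrete.

```python
def euclidean_square(a, b):
    # Hypothetical stand-in metric (assumption): squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def max_center_shift(old_centers, new_centers, metric):
    # Largest distance that any center moved between two iterations.
    return max(metric(old, new) for old, new in zip(old_centers, new_centers))


old = [[0.0, 0.0], [2.0, 2.0]]
new = [[0.0, 0.1], [2.5, 2.0]]

shift = max_center_shift(old, new, euclidean_square)
tolerance = 0.001

# Iteration would continue while the shift is not below the tolerance.
keep_iterating = shift >= tolerance
```

Here the second center moves farther than the first, so it alone determines the value compared against `tolerance`.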
        @brief Returns list of allocated clusters, each cluster contains indexes of objects in list of data.

        @brief Returns list of centers of allocated clusters.

        @brief Returns clustering result representation type that indicates how clusters are encoded.

        @return (type_encoding) Clustering result representation.

        return type_encoding.CLUSTER_INDEX_LIST_SEPARATION
    def __update_clusters(self):
        @brief Calculate Manhattan distance from each point to each cluster.
        @details Nearest points are captured by the corresponding clusters and, as a result, clusters are updated.

        @return (list) Updated clusters as a list of clusters, where each cluster contains indexes of objects from data.

        clusters = [[] for i in range(len(self.__medians))]

                if (dist < dist_optim) or (index == 0):

            clusters[index_optim].append(index_point)

        clusters = [cluster for cluster in clusters if len(cluster) > 0]
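
The assignment step described above can be sketched as a standalone function. This is a hedged illustration under stated assumptions: `assign_to_nearest` and `manhattan` are hypothetical names that do not appear in the listing, and the Manhattan metric is taken from the `@brief` text rather than from the elided code.

```python
def manhattan(a, b):
    # Hypothetical stand-in metric (assumption): Manhattan distance.
    return sum(abs(x - y) for x, y in zip(a, b))


def assign_to_nearest(points, medians, metric):
    # One bucket of point indexes per median.
    clusters = [[] for _ in medians]

    for point_index, point in enumerate(points):
        distances = [metric(point, median) for median in medians]
        nearest = distances.index(min(distances))
        clusters[nearest].append(point_index)

    # Drop medians that captured no points, as the listing does.
    return [cluster for cluster in clusters if len(cluster) > 0]


points = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
medians = [[0.0, 0.1], [2.5, 2.6], [10.0, 10.0]]

clusters = assign_to_nearest(points, medians, manhattan)
```

The third median is far from every point, so its empty cluster is filtered out and only two clusters remain.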
    def __update_medians(self):
        @brief Calculate medians of clusters in line with contained objects.

        @return (list) list of medians for current number of clusters.

        medians = [[] for i in range(len(self.__clusters))]

            medians[index] = [0.0 for i in range(len(self.__pointer_data[0]))]

                relative_index_median = int(math.floor((length_cluster - 1) / 2))
                index_median = sorted_cluster[relative_index_median]

                if (length_cluster % 2) == 0:
                    index_median_second = sorted_cluster[relative_index_median + 1]
                    medians[index][index_dimension] = (self.__pointer_data[index_median][index_dimension] + self.__pointer_data[index_median_second][index_dimension]) / 2.0

                    medians[index][index_dimension] = self.__pointer_data[index_median][index_dimension]
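
The per-dimension median computation above can be sketched for a single cluster as follows. This is a minimal sketch, not the library's implementation: `coordinate_median` is a hypothetical name, and sorting the cluster's point indexes by the current coordinate is an assumption about how `sorted_cluster` is produced in the elided lines.

```python
def coordinate_median(points, indexes):
    # Median of one cluster, computed independently in every dimension.
    dimension = len(points[0])
    median = [0.0] * dimension

    for d in range(dimension):
        # Assumption: the cluster's indexes are ordered by coordinate d.
        ordered = sorted(indexes, key=lambda i: points[i][d])
        middle = (len(ordered) - 1) // 2

        if len(ordered) % 2 == 0:
            # Even size: average the two middle values.
            lower = points[ordered[middle]][d]
            upper = points[ordered[middle + 1]][d]
            median[d] = (lower + upper) / 2.0
        else:
            # Odd size: take the single middle value.
            median[d] = points[ordered[middle]][d]

    return median


points = [[1.0, 10.0], [2.0, 0.0], [100.0, 4.0], [3.0, 6.0]]

even_median = coordinate_median(points, [0, 1, 2, 3])
odd_median = coordinate_median(points, [0, 1, 2])
```

The outlier coordinate `100.0` shifts neither result far from the bulk of the data, which is the robustness to outliers that the class description attributes to medians.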