3 @brief Cluster analysis algorithm: K-Medians 4 @details Implementation based on paper @cite book::algorithms_for_clustering_data. 6 @authors Andrei Novikov (pyclustering@yandex.ru) 8 @copyright GNU Public License 10 @cond GNU_PUBLIC_LICENSE 11 PyClustering is free software: you can redistribute it and/or modify 12 it under the terms of the GNU General Public License as published by 13 the Free Software Foundation, either version 3 of the License, or 14 (at your option) any later version. 16 PyClustering is distributed in the hope that it will be useful, 17 but WITHOUT ANY WARRANTY; without even the implied warranty of 18 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 GNU General Public License for more details. 21 You should have received a copy of the GNU General Public License 22 along with this program. If not, see <http://www.gnu.org/licenses/>. 34 import pyclustering.core.kmedians_wrapper
as wrapper
36 from pyclustering.core.wrapper
import ccore_library
37 from pyclustering.core.metric_wrapper
import metric_wrapper
42 @brief Class represents clustering algorithm K-Medians. 43 @details The algorithm is less sensitive to outliers than K-Means. Medians are calculated instead of centroids. 47 from pyclustering.cluster.kmedians import kmedians 48 from pyclustering.cluster import cluster_visualizer 49 from pyclustering.utils import read_sample 50 from pyclustering.samples.definitions import FCPS_SAMPLES 52 # Load list of points for cluster analysis. 53 sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS) 55 # Create instance of K-Medians algorithm. 56 initial_medians = [[0.0, 0.1], [2.5, 0.7]] 57 kmedians_instance = kmedians(sample, initial_medians) 59 # Run cluster analysis and obtain results. 60 kmedians_instance.process() 61 clusters = kmedians_instance.get_clusters() 62 medians = kmedians_instance.get_medians() 64 # Visualize clustering results. 65 visualizer = cluster_visualizer() 66 visualizer.append_clusters(clusters, sample) 67 visualizer.append_cluster(initial_medians, marker='*', markersize=10) 68 visualizer.append_cluster(medians, marker='*', markersize=10) 74 def __init__(self, data, initial_centers, tolerance=0.001, ccore=True, **kwargs):
76 @brief Constructor of clustering algorithm K-Medians. 78 @param[in] data (list): Input data that is presented as list of points (objects), each point should be represented by list or tuple. 79 @param[in] initial_centers (list): Initial coordinates of medians of clusters that are represented by list: [center1, center2, ...]. 80 @param[in] tolerance (double): Stop condition: if maximum value of change of centers of clusters is less than tolerance than algorithm will stop processing 81 @param[in] ccore (bool): Defines should be CCORE library (C++ pyclustering library) used instead of Python code or not. 82 @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'metric', 'itermax'). 84 <b>Keyword Args:</b><br> 85 - metric (distance_metric): Metric that is used for distance calculation between two points. 86 - itermax (uint): Maximum number of iterations for cluster analysis. 94 self.
__itermax = kwargs.get(
'itermax', 100)
99 self.
__ccore = ccore
and self.
__metric.get_type() != type_metric.USER_DEFINED
101 self.
__ccore = ccore_library.workable()
106 @brief Performs cluster analysis in line with rules of K-Medians algorithm. 108 @return (kmedians) Returns itself (K-Medians instance). 110 @remark Results of clustering can be obtained using corresponding get methods. 118 ccore_metric = metric_wrapper.create_instance(self.
__metric)
122 changes = float(
'inf')
126 raise NameError(
'Dimension of the input data and dimension of the initial medians must be equal.')
133 changes = max([self.
__metric(self.
__medians[index], updated_centers[index])
for index
in range(len(updated_centers))])
144 @brief Returns list of allocated clusters, each cluster contains indexes of objects in list of data. 156 @brief Returns list of centers of allocated clusters. 168 @brief Returns clustering result representation type that indicate how clusters are encoded. 170 @return (type_encoding) Clustering result representation. 176 return type_encoding.CLUSTER_INDEX_LIST_SEPARATION
179 def __update_clusters(self):
181 @brief Calculate Manhattan distance to each point from the each cluster. 182 @details Nearest points are captured by according clusters and as a result clusters are updated. 184 @return (list) updated clusters as list of clusters where each cluster contains indexes of objects from data. 188 clusters = [[]
for i
in range(len(self.
__medians))]
196 if (dist < dist_optim)
or (index == 0):
200 clusters[index_optim].append(index_point)
203 clusters = [cluster
for cluster
in clusters
if len(cluster) > 0]
208 def __update_medians(self):
210 @brief Calculate medians of clusters in line with contained objects. 212 @return (list) list of medians for current number of clusters. 216 medians = [[]
for i
in range(len(self.
__clusters))]
219 medians[index] = [0.0
for i
in range(len(self.
__pointer_data[0]))]
225 relative_index_median = int(math.floor((length_cluster - 1) / 2))
226 index_median = sorted_cluster[relative_index_median]
228 if (length_cluster % 2) == 0:
229 index_median_second = sorted_cluster[relative_index_median + 1]
230 medians[index][index_dimension] = (self.
__pointer_data[index_median][index_dimension] + self.
__pointer_data[index_median_second][index_dimension]) / 2.0
233 medians[index][index_dimension] = self.
__pointer_data[index_median][index_dimension]
Module provides various distance metrics - abstraction of the notion of distance in a metric space...
Module for representing clustering results.
Distance metric performs distance calculation between two points in line with encapsulated function...