"""!

@brief Cluster analysis algorithm: BSAS (Basic Sequential Algorithmic Scheme).
@details Implementation based on paper @cite book::pattern_recognition::2009.

@authors Andrei Novikov (pyclustering@yandex.ru)
@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program. If not, see <http://www.gnu.org/licenses/>.
@endcond

"""


from pyclustering.core.wrapper import ccore_library
from pyclustering.core.bsas_wrapper import bsas as bsas_wrapper
from pyclustering.core.metric_wrapper import metric_wrapper


class bsas_visualizer:
    """!
    @brief Visualizer of BSAS algorithm's results.
    @details BSAS visualizer provides visualization services that are specific for BSAS algorithm.

    """

    @staticmethod
    def show_clusters(sample, clusters, representatives, **kwargs):
        """!
        @brief Display BSAS clustering results.

        @param[in] sample (list): Dataset that was used for clustering.
        @param[in] clusters (array_like): Clusters that were allocated by the algorithm.
        @param[in] representatives (array_like): Allocated representatives that correspond to the clusters.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'figure', 'display', 'offset').

        <b>Keyword Args:</b><br>
            - figure (figure): If 'None' then a new figure is created, otherwise the specified figure is used for visualization.
            - display (bool): If 'True' then the figure is shown by the method, otherwise it should be shown manually using the matplotlib function 'plt.show()'.
            - offset (uint): Specify axes index on the figure where results should be drawn (only if argument 'figure' is specified).

        @return (figure) Figure where clusters were drawn.

        """

        figure = kwargs.get('figure', None)
        display = kwargs.get('display', True)
        offset = kwargs.get('offset', 0)

        # ...
        visualizer.append_clusters(clusters, sample, canvas=offset)

        for cluster_index in range(len(clusters)):
            visualizer.append_cluster_attribute(offset, cluster_index, [representatives[cluster_index]], '*', 10)

        return visualizer.show(figure=figure, display=display)


class bsas:
    """!
    @brief Class represents BSAS clustering algorithm - basic sequential algorithmic scheme.
    @details Algorithm has two mandatory parameters: maximum allowable number of clusters and threshold
              of dissimilarity or in other words maximum distance between points. A distance metric can also
              be specified using the 'metric' parameter; by default 'Manhattan' distance is used.
              BSAS uses the following rule for updating a cluster representative:

    \vec{m}_{C_{k}}^{new}=\frac{ \left ( n_{C_{k}^{new}} - 1 \right )\vec{m}_{C_{k}}^{old} + \vec{x} }{n_{C_{k}^{new}}}

    Clustering results of this algorithm depend on the order of objects in the input data.

        # Read data sample from 'Simple02.data'.
        sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE2)

        # Prepare algorithm's parameters.
        # ...

        # Create instance of BSAS algorithm.
        bsas_instance = bsas(sample, max_clusters, threshold)
        bsas_instance.process()

        # Get clustering results.
        clusters = bsas_instance.get_clusters()
        representatives = bsas_instance.get_representatives()

        bsas_visualizer.show_clusters(sample, clusters, representatives)

    @see pyclustering.cluster.mbsas, pyclustering.cluster.ttsas

    """

    def __init__(self, data, maximum_clusters, threshold, ccore=True, **kwargs):
        """!
        @brief Creates classical BSAS algorithm.

        @param[in] data (list): Input data that is presented as list of points (objects), each point should be represented by list or tuple.
        @param[in] maximum_clusters: Maximum allowable number of clusters that can be allocated during processing.
        @param[in] threshold: Threshold of dissimilarity (maximum distance) between points.
        @param[in] ccore (bool): If True then the CCORE library (C++ implementation) is used for solving.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'metric').

        <b>Keyword Args:</b><br>
            - metric (distance_metric): Metric that is used for distance calculation between two points.

        """

        # ...
        self._amount = maximum_clusters
        # ...
        self._ccore = ccore and self._metric.get_type() != type_metric.USER_DEFINED

        # ...
            self._ccore = ccore_library.workable()

    def process(self):
        """!
        @brief Performs cluster analysis in line with rules of BSAS algorithm.

        @remark Results of clustering can be obtained using corresponding get methods.

        @see get_representatives()

        """
        # ...

    def __process_by_ccore(self):
        ccore_metric = metric_wrapper.create_instance(self._metric)
        # ...

    def __prcess_by_python(self):
        # ...
        for i in range(1, len(self._data)):
            point = self._data[i]
            # ...

    def get_clusters(self):
        """!
        @brief Returns list of allocated clusters, each cluster contains indexes of objects in list of data.

        @see get_representatives()

        """
        # ...

    def get_representatives(self):
        """!
        @brief Returns list of representatives of allocated clusters.

        """
        # ...

    def get_cluster_encoding(self):
        """!
        @brief Returns clustering result representation type that indicates how clusters are encoded.

        @return (type_encoding) Clustering result representation.

        """

        return type_encoding.CLUSTER_INDEX_LIST_SEPARATION

    def _find_nearest_cluster(self, point):
        """!
        @brief Find nearest cluster to the specified point.

        @param[in] point (list): Point from dataset.

        @return (uint, double) Index of nearest cluster and distance to it.

        """
        # ...
        nearest_distance = float('inf')

        # ...
            if distance < nearest_distance:
                index_cluster = index
                nearest_distance = distance

        return index_cluster, nearest_distance

    def _update_representative(self, index_cluster, point):
        """!
        @brief Update cluster representative in line with new cluster size and added point to it.

        @param[in] index_cluster (uint): Index of cluster whose representative should be updated.
        @param[in] point (list): Point that was added to cluster.

        """
        length = len(self._clusters[index_cluster])
        # ...
        for dimension in range(len(rep)):
            rep[dimension] = ((length - 1) * rep[dimension] + point[dimension]) / length
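
The update in `_update_representative` is an incremental mean: the old representative is weighted by the previous cluster size and the new point is folded in. A minimal standalone sketch of that rule, assuming points are plain Python lists; the helper name `update_representative` is hypothetical and not part of pyclustering:

```python
def update_representative(rep, point, new_size):
    """Return the running mean after adding `point` to a cluster.

    `rep` is the mean of the first `new_size - 1` points and `new_size`
    counts the cluster's points including `point`.
    Hypothetical helper for illustration only.
    """
    return [((new_size - 1) * r + x) / new_size for r, x in zip(rep, point)]


# Feed three 2-D points one at a time; the first point is its own representative.
points = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
rep = list(points[0])
for size, point in enumerate(points[1:], start=2):
    rep = update_representative(rep, point, size)

# Batch mean of the same points, computed coordinate by coordinate.
batch_mean = [sum(coords) / len(points) for coords in zip(*points)]
```

Here `rep` and `batch_mean` agree, which is why the representative can be maintained without revisiting earlier points.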
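
The listing above omits several lines of the pure-Python path, so the following is only a rough sketch of the sequential scheme the docstrings describe, not pyclustering's implementation. It assumes the textbook BSAS rule (open a new cluster when the nearest representative is farther than `threshold` and fewer than `maximum_clusters` clusters exist, otherwise join the nearest cluster) and uses Euclidean distance as an arbitrary choice. The names `bsas_sketch` and `euclidean` are hypothetical:

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points (arbitrary metric for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bsas_sketch(data, maximum_clusters, threshold):
    """Sequentially assign points to clusters (illustrative, not pyclustering's code)."""
    # The first point seeds the first cluster and is its own representative.
    clusters = [[0]]
    representatives = [list(data[0])]

    for i in range(1, len(data)):
        point = data[i]

        # Find the representative closest to the current point.
        distances = [euclidean(point, rep) for rep in representatives]
        nearest = min(range(len(distances)), key=distances.__getitem__)

        if distances[nearest] > threshold and len(clusters) < maximum_clusters:
            # Too far from every cluster and room is left: open a new cluster.
            clusters.append([i])
            representatives.append(list(point))
        else:
            # Join the nearest cluster and refresh its running mean.
            clusters[nearest].append(i)
            size = len(clusters[nearest])
            rep = representatives[nearest]
            for d in range(len(rep)):
                rep[d] = ((size - 1) * rep[d] + point[d]) / size

    return clusters, representatives


data = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
clusters, representatives = bsas_sketch(data, maximum_clusters=2, threshold=2.0)
```

Clusters are returned as lists of point indexes, matching the index-list encoding described for `get_clusters()`. Because points are processed in order, reordering `data` can change the result.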