@brief pyclustering module for cluster analysis.

@authors Andrei Novikov (pyclustering@yandex.ru)
@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program. If not, see <http://www.gnu.org/licenses/>.

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

warnings.warn("Impossible to import matplotlib (please, install 'matplotlib'), pyclustering's visualization "
              "functionality is not available.")
@brief Description of cluster for representation on canvas.

def __init__(self, cluster, data, marker, markersize, color):
    @brief Constructor of cluster representation on the canvas.

    @param[in] cluster (list): Single cluster that consists of objects or indexes from data.
    @param[in] data (list): Objects that should be displayed, can be None if clusters consist of objects instead of indexes.
    @param[in] marker (string): Type of marker that is used for drawing objects.
    @param[in] markersize (uint): Size of marker that is used for drawing objects.
    @param[in] color (string): Color of the marker that is used for drawing objects.


@brief Visualizer for clusters in multi-dimensional data.
@details This cluster visualizer is useful for clusters in data whose dimension is greater than 3. The
          multidimensional visualizer overcomes the limitation of 'cluster_visualizer', which can display
          clusters only in 1D, 2D or 3D data space.

Example of clustering results visualization where 'Iris' is used:

    from pyclustering.utils import read_sample
    from pyclustering.samples.definitions import FAMOUS_SAMPLES
    from pyclustering.cluster import cluster_visualizer_multidim
    from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
    from pyclustering.cluster.xmeans import xmeans

    # load 4D data sample 'Iris'
    sample_4d = read_sample(FAMOUS_SAMPLES.SAMPLE_IRIS)

    # initialize 3 initial centers using K-Means++ algorithm
    centers = kmeans_plusplus_initializer(sample_4d, 3).initialize()

    # perform cluster analysis using X-Means
    xmeans_instance = xmeans(sample_4d, centers)
    xmeans_instance.process()
    clusters = xmeans_instance.get_clusters()

    # visualize obtained clusters in multi-dimensional space
    visualizer = cluster_visualizer_multidim()
    visualizer.append_clusters(clusters, sample_4d)
    visualizer.show(max_row_size=3)

Visualized clustering results of 'Iris' data (multi-dimensional data):
@image html xmeans_clustering_famous_iris.png "Fig. 1. X-Means clustering results (data 'Iris')."

Sometimes there is no need to display results in all dimensions. Parameter 'pair_filter' can be used to display only
the coordinate pairs of interest. Here is an example of visualization of coordinate pairs (x0, x1) and (x0, x2) for
the previous clustering results:

    visualizer = cluster_visualizer_multidim()
    visualizer.append_clusters(clusters, sample_4d)
    visualizer.show(pair_filter=[[0, 1], [0, 2]])

Visualized results of specified coordinate pairs:
@image html xmeans_clustering_famous_iris_filtered.png "Fig. 2. X-Means clustering results (x0, x1) and (x0, x2) (data 'Iris')."

def __init__(self):
    @brief Constructs cluster visualizer for multidimensional data.
    @details The visualizer is suitable for data whose dimension is greater than 3.

def append_cluster(self, cluster, data = None, marker = '.', markersize = None, color = None):
    @brief Appends cluster for visualization.

    @param[in] cluster (list): Cluster that may consist of indexes of objects from the data or of the objects themselves.
    @param[in] data (list): If specified, each element of the cluster is considered as an index of an object from the data.
    @param[in] marker (string): Marker that is used for displaying objects from cluster on the canvas.
    @param[in] markersize (uint): Size of marker.
    @param[in] color (string): Color of marker.

    @return Returns index of cluster descriptor on the canvas.

    if len(cluster) == 0:
        raise ValueError("Empty cluster is provided.")

    markersize = markersize or 5

    index_color = len(self.__clusters) % len(color_list.TITLES)
    color = color_list.TITLES[index_color]
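
The default color above is chosen by taking the number of already registered clusters modulo the palette size, so colors repeat once the palette is exhausted. A minimal sketch of that idea, assuming a hypothetical `PALETTE` list in place of `color_list.TITLES` and a hypothetical helper `pick_color` (neither is part of pyclustering):

```python
# Hypothetical stand-in for color_list.TITLES; the real palette is defined elsewhere in pyclustering.
PALETTE = ['red', 'blue', 'green', 'orange']


def pick_color(amount_registered_clusters, palette=PALETTE):
    """Return the palette color for the next cluster, wrapping around when the palette is exhausted."""
    return palette[amount_registered_clusters % len(palette)]


# Colors assigned to six clusters appended one after another.
colors = [pick_color(index) for index in range(6)]
print(colors)
```

With a four-color palette the fifth cluster gets the same color as the first, which is worth keeping in mind when many clusters share one canvas.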
def append_clusters(self, clusters, data = None, marker = '.', markersize = None):
    @brief Appends a list of clusters for visualization.

    @param[in] clusters (list): List of clusters where each cluster may consist of indexes of objects from the data or of the objects themselves.
    @param[in] data (list): If specified, each element of a cluster is considered as an index of an object from the data.
    @param[in] marker (string): Marker that is used for displaying objects from clusters on the canvas.
    @param[in] markersize (uint): Size of marker.

    for cluster in clusters:
def show(self, pair_filter=None, **kwargs):
    @brief Shows clusters (visualize) in multi-dimensional space.

    @param[in] pair_filter (list): List of coordinate pairs that should be displayed. This argument is used as a filter.
    @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'visible_axis', 'visible_labels', 'visible_grid', 'max_row_size').

    <b>Keyword Args:</b><br>
        - visible_axis (bool): Defines visibility of axes on each canvas, if True - axes are visible.
           By default axes of each canvas are not displayed.
        - visible_labels (bool): Defines visibility of labels on each canvas, if True - labels are displayed.
           By default labels of each canvas are displayed.
        - visible_grid (bool): Defines visibility of grid on each canvas, if True - grid is displayed.
           By default grid of each canvas is displayed.
        - max_row_size (uint): Maximum number of canvases on one row.

    raise ValueError("There is no non-empty clusters for visualization.")

    dimension = len(cluster_data[0])

    acceptable_pairs = pair_filter or []

    amount_axis = len(pairs)

    for index in range(amount_axis):
        axis_storage.append(ax)
def __create_grid_spec(self, amount_axis, max_row_size):
    @brief Create grid specification for figure to place canvases.

    @param[in] amount_axis (uint): Amount of canvases that should be organized by the created grid specification.
    @param[in] max_row_size (uint): Maximum number of canvases on one row.

    @return (gridspec.GridSpec) Grid specification to place canvases on figure.

    row_size = amount_axis
    if row_size > max_row_size:
        row_size = max_row_size

    col_size = math.ceil(amount_axis / row_size)
    return gridspec.GridSpec(col_size, row_size)
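
The layout arithmetic above can be checked without matplotlib. The sketch below uses a hypothetical helper `grid_shape` (not part of pyclustering) that returns the two numbers passed to `gridspec.GridSpec`: the row count (named `col_size` in the listing) and the number of canvases per row. It assumes `amount_axis` is at least 1.

```python
import math


def grid_shape(amount_axis, max_row_size):
    """Return (rows, canvases_per_row) for a grid that holds `amount_axis` canvases."""
    canvases_per_row = min(amount_axis, max_row_size)
    rows = math.ceil(amount_axis / canvases_per_row)
    return rows, canvases_per_row


# Six coordinate pairs (4D data) with at most three canvases per row.
print(grid_shape(6, 3))  # (2, 3)

# Fewer canvases than the row limit: a single, shorter row.
print(grid_shape(2, 3))  # (1, 2)
```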
def __create_pairs(self, dimension, acceptable_pairs):
    @brief Create coordinate pairs that should be displayed.

    @param[in] dimension (uint): Data-space dimension.
    @param[in] acceptable_pairs (list): List of coordinate pairs that should be displayed.

    @return (list) List of coordinate pairs that should be displayed.

    if len(acceptable_pairs) > 0:
        return acceptable_pairs

    return list(itertools.combinations(range(dimension), 2))
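
To make the pair selection concrete, here is a small sketch of the same logic as a free function. The name `coordinate_pairs` is hypothetical and used only for illustration; the behavior mirrors the listing: a non-empty filter is returned as given, otherwise every unordered pair of coordinate indexes is generated.

```python
import itertools


def coordinate_pairs(dimension, acceptable_pairs=None):
    """Return the coordinate pairs to display for data with the given dimension."""
    if acceptable_pairs:
        # A non-empty filter wins: only the requested pairs are displayed.
        return list(acceptable_pairs)

    # No filter: all unordered pairs of coordinate indexes.
    return list(itertools.combinations(range(dimension), 2))


# 4D data such as 'Iris' yields six canvases when no filter is given.
print(coordinate_pairs(4))

# With a filter only the listed pairs are kept.
print(coordinate_pairs(4, [[0, 1], [0, 2]]))
```

The number of unfiltered pairs grows as dimension * (dimension - 1) / 2, which is why filtering becomes useful for high-dimensional data.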
def __create_canvas(self, dimension, pairs, position, **kwargs):
    @brief Create new canvas with user defined parameters to display cluster or chunk of cluster on it.

    @param[in] dimension (uint): Data-space dimension.
    @param[in] pairs (list): Pairs of coordinates that will be displayed on the canvas. If empty, then labels will not
                be displayed on the canvas.
    @param[in] position (uint): Index position of canvas on a grid.
    @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'visible_axis', 'visible_labels', 'visible_grid').

    <b>Keyword Args:</b><br>
        - visible_axis (bool): Defines visibility of axes on each canvas, if True - axes are visible.
           By default axes are not displayed.
        - visible_labels (bool): Defines visibility of labels on each canvas, if True - labels are displayed.
           By default labels are displayed.
        - visible_grid (bool): Defines visibility of grid on each canvas, if True - grid is displayed.
           By default grid is displayed.

    @return (matplotlib.Axis) Canvas to display cluster or chunk of cluster.

    visible_grid = kwargs.get('visible_grid', True)
    visible_labels = kwargs.get('visible_labels', True)
    visible_axis = kwargs.get('visible_axis', False)

    ax.set_xlabel("x%d" % pairs[position][0])
    ax.set_ylabel("x%d" % pairs[position][1])

    ax.set_ylim(-0.5, 0.5)
    ax.set_yticklabels([])

    ax.set_yticklabels([])
    ax.set_xticklabels([])
def __draw_canvas_cluster(self, axis_storage, cluster_descr, pairs):
    @brief Draw clusters.

    @param[in] axis_storage (list): List of matplotlib axes where cluster dimensional chunks are displayed.
    @param[in] cluster_descr (canvas_cluster_descr): Canvas cluster descriptor that should be displayed.
    @param[in] pairs (list): List of coordinates that should be displayed.

    for index_axis in range(len(axis_storage)):
        for item in cluster_descr.cluster:
def __draw_cluster_item_multi_dimension(self, ax, pair, item, cluster_descr):
    @brief Draw cluster chunk defined by pair coordinates in data space with dimension greater than 1.

    @param[in] ax (axis): Matplotlib axis that is used to display chunk of cluster point.
    @param[in] pair (list): Pair of coordinate indexes of the point that should be displayed.
    @param[in] item (list): Data point or index of data point.
    @param[in] cluster_descr (canvas_cluster_descr): Cluster description whose point is visualized.

    index_dimension1 = pair[0]
    index_dimension2 = pair[1]

    if cluster_descr.data is None:
        ax.plot(item[index_dimension1], item[index_dimension2],
                color=cluster_descr.color, marker=cluster_descr.marker, markersize=cluster_descr.markersize)
    else:
        ax.plot(cluster_descr.data[item][index_dimension1], cluster_descr.data[item][index_dimension2],
                color=cluster_descr.color, marker=cluster_descr.marker, markersize=cluster_descr.markersize)
def __draw_cluster_item_one_dimension(self, ax, item, cluster_descr):
    @brief Draw cluster point in one-dimensional data space.

    @param[in] ax (axis): Matplotlib axis that is used to display chunk of cluster point.
    @param[in] item (list): Data point or index of data point.
    @param[in] cluster_descr (canvas_cluster_descr): Cluster description whose point is visualized.

    if cluster_descr.data is None:
        ax.plot(item[0], 0.0,
                color=cluster_descr.color, marker=cluster_descr.marker, markersize=cluster_descr.markersize)
    else:
        ax.plot(cluster_descr.data[item][0], 0.0,
                color=cluster_descr.color, marker=cluster_descr.marker, markersize=cluster_descr.markersize)
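
Both drawing helpers above branch on whether `data` is None: without data a cluster item is the point itself, with data it is an index into the data. The sketch below isolates that convention in a hypothetical helper `resolve_point` (not part of pyclustering), using a small made-up sample.

```python
def resolve_point(item, data=None):
    """Return the coordinates of a cluster item.

    If `data` is None the item is already a point; otherwise it is an index into `data`.
    """
    return item if data is None else data[item]


# Illustrative sample: three 2D points.
sample = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# The same clustering expressed as indexes and as objects.
clusters_by_index = [[0, 2], [1]]
clusters_by_object = [[[1.0, 2.0], [5.0, 6.0]], [[3.0, 4.0]]]

from_indexes = [[resolve_point(i, sample) for i in cluster] for cluster in clusters_by_index]
from_objects = [[resolve_point(p) for p in cluster] for cluster in clusters_by_object]
print(from_indexes == from_objects)
```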
@brief Common visualizer of clusters on 1D, 2D or 3D surface.
@details Use 'cluster_visualizer_multidim' visualizer if the data dimension is greater than 3.

@see cluster_visualizer_multidim

def __init__(self, number_canvases = 1, size_row = 1, titles = None):
    @brief Constructor of cluster visualizer.

    @param[in] number_canvases (uint): Number of canvases that are used for visualization.
    @param[in] size_row (uint): Amount of canvases that can be placed in one row.
    @param[in] titles (list): List of canvas titles.

        from pyclustering.utils import read_sample
        from pyclustering.samples.definitions import SIMPLE_SAMPLES, FCPS_SAMPLES
        from pyclustering.cluster import cluster_visualizer
        from pyclustering.cluster.dbscan import dbscan

        # load 2D data sample
        sample_2d = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE1)

        # load 3D data sample
        sample_3d = read_sample(FCPS_SAMPLES.SAMPLE_HEPTA)

        # extract clusters from the first sample using DBSCAN algorithm
        dbscan_instance = dbscan(sample_2d, 0.4, 2, False)
        dbscan_instance.process()
        clusters_sample_2d = dbscan_instance.get_clusters()

        # extract clusters from the second sample using DBSCAN algorithm
        dbscan_instance = dbscan(sample_3d, 1, 3, True)
        dbscan_instance.process()
        clusters_sample_3d = dbscan_instance.get_clusters()

        # create plot with two canvases where each row contains 2 canvases
        size = 2
        row_size = 2
        visualizer = cluster_visualizer(size, row_size)

        # place clustering result of sample_2d to the first canvas
        visualizer.append_clusters(clusters_sample_2d, sample_2d, 0, markersize = 5)

        # place clustering result of sample_3d to the second canvas
        visualizer.append_clusters(clusters_sample_3d, sample_3d, 1, markersize = 30)

    if titles is not None:
def append_cluster(self, cluster, data = None, canvas = 0, marker = '.', markersize = None, color = None):
    @brief Appends cluster to canvas for drawing.

    @param[in] cluster (list): Cluster that may consist of indexes of objects from the data or of the objects themselves.
    @param[in] data (list): If specified, each element of the cluster is considered as an index of an object from the data.
    @param[in] canvas (uint): Index of the canvas that should be used for displaying the cluster.
    @param[in] marker (string): Marker that is used for displaying objects from cluster on the canvas.
    @param[in] markersize (uint): Size of marker.
    @param[in] color (string): Color of marker.

    @return Returns index of cluster descriptor on the canvas.

    if len(cluster) == 0:

    # Both values are passed as one tuple: '%' binds tighter than 'or', so the
    # original expression supplied a single argument for two placeholders.
    raise ValueError("Canvas index '%d' is out of range [0; %d]." % (canvas, self.__number_canvases))

    color = color_list.TITLES[index_color]

    dimension = len(cluster[0])

    raise ValueError("Only clusters with the same dimension of objects can be displayed on canvas.")

    dimension = len(data[0])

    raise ValueError("Only clusters with the same dimension of objects can be displayed on canvas.")

    if (dimension < 1) or (dimension > 3):
        raise ValueError("Only objects with size dimension 1 (1D plot), 2 (2D plot) or 3 (3D plot) "
                         "can be displayed. For multi-dimensional data use 'cluster_visualizer_multidim'.")

    if markersize is None:
        if (dimension == 1) or (dimension == 2):
def append_cluster_attribute(self, index_canvas, index_cluster, data, marker = None, markersize = None):
    @brief Append cluster attribute for cluster on specific canvas.
    @details An attribute is data that is visualized for a specific cluster using the cluster's color, and the cluster's
              marker and markersize if the last two are not specified.

    @param[in] index_canvas (uint): Index of the canvas where the cluster is located.
    @param[in] index_cluster (uint): Index of the cluster whose attribute should be added.
    @param[in] data (list): List of points (data) that represents attribute.
    @param[in] marker (string): Marker that is used for displaying objects from cluster on the canvas.
    @param[in] markersize (uint): Size of marker.

    attribute_marker = marker
    if attribute_marker is None:
        attribute_marker = cluster_descr.marker

    attribute_markersize = markersize
    if attribute_markersize is None:
        attribute_markersize = cluster_descr.markersize

    attribute_color = cluster_descr.color

    added_attribute_cluster_descriptor = canvas_cluster_descr(data, None, attribute_marker, attribute_markersize, attribute_color)
    self.__canvas_clusters[index_canvas][index_cluster].attributes.append(added_attribute_cluster_descriptor)
def append_clusters(self, clusters, data = None, canvas = 0, marker = '.', markersize = None):
    @brief Appends a list of clusters to canvas for drawing.

    @param[in] clusters (list): List of clusters where each cluster may consist of indexes of objects from the data or of the objects themselves.
    @param[in] data (list): If specified, each element of a cluster is considered as an index of an object from the data.
    @param[in] canvas (uint): Index of the canvas that should be used for displaying clusters.
    @param[in] marker (string): Marker that is used for displaying objects from clusters on the canvas.
    @param[in] markersize (uint): Size of marker.

    for cluster in clusters:
def set_canvas_title(self, text, canvas = 0):
    @brief Set title for specified canvas.

    @param[in] text (string): Title for canvas.
    @param[in] canvas (uint): Index of canvas where title should be displayed.

    # str() is required: concatenating the integer index directly raises TypeError.
    raise NameError('Canvas ' + str(canvas) + ' does not exist.')
def get_cluster_color(self, index_cluster, index_canvas):
    @brief Returns cluster color on specified canvas.

def show(self, figure=None, invisible_axis=True, visible_grid=True, display=True, shift=None):
    @brief Shows clusters (visualize).

    @param[in] figure (fig): Defines requirement to use specified figure, if None - new figure is created for drawing clusters.
    @param[in] invisible_axis (bool): Defines visibility of axes on each canvas, if True - axes are invisible.
    @param[in] visible_grid (bool): Defines visibility of grid on each canvas, if True - grid is displayed.
    @param[in] display (bool): Defines requirement to display clusters on a stage, if True - clusters are displayed,
                if False - plt.show() should be called by user.
    @param[in] shift (uint): Force canvas shift value - defines canvas index from which clusters should be visualized.

    @return (fig) Figure where clusters are shown.

    if canvas_shift is None:
        if figure is not None:
            canvas_shift = len(figure.get_axes())

    if figure is not None:
        cluster_figure = figure
    else:
        cluster_figure = plt.figure()

    maximum_rows = math.ceil((self.__number_canvases + canvas_shift) / maximum_cols)

    grid_spec = gridspec.GridSpec(maximum_rows, maximum_cols)

    if len(canvas_data) == 0:

    if (dimension == 1) or (dimension == 2):
        ax = cluster_figure.add_subplot(grid_spec[index_canvas + canvas_shift])
    else:
        ax = cluster_figure.add_subplot(grid_spec[index_canvas + canvas_shift], projection='3d')

    if len(canvas_data) == 0:
        plt.setp(ax, visible=False)

    for cluster_descr in canvas_data:

        for attribute_descr in cluster_descr.attributes:

    if invisible_axis is True:
        ax.xaxis.set_ticklabels([])
        ax.yaxis.set_ticklabels([])

        ax.zaxis.set_ticklabels([])

    ax.grid(visible_grid)

    return cluster_figure
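
When an existing figure is passed to `show`, the canvases are placed after the axes already present: the shift is taken from `len(figure.get_axes())` and each canvas lands at grid position `index_canvas + canvas_shift`. A minimal sketch of the resulting row count and positions, using hypothetical helpers `canvas_rows` and `canvas_positions` (not part of pyclustering):

```python
import math


def canvas_rows(number_canvases, canvas_shift, maximum_cols):
    """Rows needed to hold the already occupied grid cells plus the new canvases."""
    return math.ceil((number_canvases + canvas_shift) / maximum_cols)


def canvas_positions(number_canvases, canvas_shift):
    """Flat grid indexes used by the new canvases, placed after the shifted cells."""
    return [index_canvas + canvas_shift for index_canvas in range(number_canvases)]


# Two new canvases on a fresh figure with two canvases per row.
print(canvas_rows(2, 0, 2), canvas_positions(2, 0))

# The same two canvases appended to a figure that already has one axis.
print(canvas_rows(2, 1, 2), canvas_positions(2, 1))
```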
def __draw_canvas_cluster(self, ax, dimension, cluster_descr):
    @brief Draw canvas cluster descriptor.

    @param[in] ax (Axis): Axis of the canvas where canvas cluster descriptor should be displayed.
    @param[in] dimension (uint): Canvas dimension.
    @param[in] cluster_descr (canvas_cluster_descr): Canvas cluster descriptor that should be displayed.

    @return (fig) Figure where clusters are shown.

    cluster = cluster_descr.cluster
    data = cluster_descr.data
    marker = cluster_descr.marker
    markersize = cluster_descr.markersize
    color = cluster_descr.color

    ax.plot(item[0], 0.0, color = color, marker = marker, markersize = markersize)

    ax.plot(data[item][0], 0.0, color = color, marker = marker, markersize = markersize)

    ax.plot(item[0], item[1], color = color, marker = marker, markersize = markersize)

    ax.plot(data[item][0], data[item][1], color = color, marker = marker, markersize = markersize)

    ax.scatter(item[0], item[1], item[2], c = color, marker = marker, s = markersize)

    ax.scatter(data[item][0], data[item][1], data[item][2], c = color, marker = marker, s = markersize)