@brief pyclustering module for cluster analysis.

@authors Andrei Novikov (pyclustering@yandex.ru)

@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

    import matplotlib.pyplot as plt
    import matplotlib.gridspec as gridspec

    warnings.warn("Impossible to import matplotlib (please install 'matplotlib'); pyclustering's visualization "
                  "functionality is not available.")
    @brief Description of cluster for representation on canvas.

    def __init__(self, cluster, data, marker, markersize, color):
        @brief Constructor of cluster representation on the canvas.

        @param[in] cluster (list): Single cluster that consists of objects or indexes from data.
        @param[in] data (list): Objects that should be displayed, can be None if clusters consist of objects instead of indexes.
        @param[in] marker (string): Type of marker that is used for drawing objects.
        @param[in] markersize (uint): Size of marker that is used for drawing objects.
        @param[in] color (string): Color of the marker that is used for drawing objects.

    @brief Visualizer for cluster in multi-dimensional data.
    @details This cluster visualizer is useful for clusters in data whose dimension is greater than 3. The
              multidimensional visualizer overcomes the main limitation of 'cluster_visualizer', which can display
              clusters only in 1D, 2D or 3D data space.

    Example of clustering results visualization where 'Iris' is used:

        from pyclustering.utils import read_sample
        from pyclustering.samples.definitions import FAMOUS_SAMPLES
        from pyclustering.cluster import cluster_visualizer_multidim
        from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
        from pyclustering.cluster.xmeans import xmeans

        # load 4D data sample 'Iris'
        sample_4d = read_sample(FAMOUS_SAMPLES.SAMPLE_IRIS)

        # initialize 3 initial centers using K-Means++ algorithm
        centers = kmeans_plusplus_initializer(sample_4d, 3).initialize()

        # perform cluster analysis using X-Means
        xmeans_instance = xmeans(sample_4d, centers)
        xmeans_instance.process()
        clusters = xmeans_instance.get_clusters()

        # visualize obtained clusters in multi-dimensional space
        visualizer = cluster_visualizer_multidim()
        visualizer.append_clusters(clusters, sample_4d)
        visualizer.show(max_row_size=3)

    Visualized clustering results of 'Iris' data (multi-dimensional data):
    @image html xmeans_clustering_famous_iris.png "Fig. 1. X-Means clustering results (data 'Iris')."

    Sometimes there is no need to display results in all dimensions. Parameter 'pair_filter' can be used to display only
    the coordinate pairs of interest. Here is an example of visualizing the coordinate pairs (x0, x1) and (x0, x2) for the
    previous clustering results:

        visualizer = cluster_visualizer_multidim()
        visualizer.append_clusters(clusters, sample_4d)
        visualizer.show(pair_filter=[[0, 1], [0, 2]])

    Visualized results of specified coordinate pairs:
    @image html xmeans_clustering_famous_iris_filtered.png "Fig. 2. X-Means clustering results (x0, x1) and (x0, x2) (data 'Iris')."

        @brief Constructs cluster visualizer for multidimensional data.
        @details The visualizer is suitable for data whose dimension is greater than 3.

    def append_cluster(self, cluster, data=None, marker='.', markersize=None, color=None):
        @brief Appends cluster for visualization.

        @param[in] cluster (list): Cluster that may consist of indexes of objects from the data or of the objects themselves.
        @param[in] data (list): If defined, each element of the cluster is considered an index of an object in the data.
        @param[in] marker (string): Marker that is used for displaying objects from the cluster on the canvas.
        @param[in] markersize (uint): Size of marker.
        @param[in] color (string): Color of marker.

        @return Returns index of cluster descriptor on the canvas.

        if len(cluster) == 0:
            raise ValueError("Empty cluster is provided.")

        markersize = markersize or 5

            index_color = len(self.__clusters) % len(color_list.TITLES)
            color = color_list.TITLES[index_color]
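The color fallback above cycles through the palette by taking the number of already appended clusters modulo the palette length, so once the palette is exhausted the next cluster reuses the first color. A minimal standalone sketch of that selection, together with the empty-cluster check and the `markersize or 5` default. `PALETTE` and `describe_cluster` are hypothetical: `PALETTE` stands in for `color_list.TITLES`, whose contents are not shown here, and the sketch assumes the fallback only applies when no explicit color is passed.

```python
# Hypothetical stand-in for pyclustering's color_list.TITLES (real contents not shown here).
PALETTE = ["red", "blue", "darkgreen", "gold", "violet"]


def describe_cluster(existing, cluster, marker=".", markersize=None, color=None):
    """Sketch: build a drawing description for a new cluster."""
    if len(cluster) == 0:
        raise ValueError("Empty cluster is provided.")

    # Note: 'or' also replaces an explicit markersize of 0 with the default.
    markersize = markersize or 5

    if color is None:
        # Assumption: only clusters without an explicit color take one from the palette.
        color = PALETTE[len(existing) % len(PALETTE)]

    return {"cluster": cluster, "marker": marker, "markersize": markersize, "color": color}


existing = []
for cluster in ([0, 1], [2, 3], [4]):
    existing.append(describe_cluster(existing, cluster))

print([descr["color"] for descr in existing])  # ['red', 'blue', 'darkgreen']
```

Because the default is chosen with `or`, a caller cannot request a marker size of 0; a `markersize is None` test would be the stricter alternative.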
        @brief Appends a list of clusters for visualization.

        @param[in] clusters (list): List of clusters where each cluster may consist of indexes of objects from the data or of the objects themselves.
        @param[in] data (list): If defined, each element of a cluster is considered an index of an object in the data.
        @param[in] marker (string): Marker that is used for displaying objects from clusters on the canvas.
        @param[in] markersize (uint): Size of marker.

        for cluster in clusters:

    def save(self, filename, **kwargs):
        @brief Saves figure to the specified file.

        @param[in] filename (string): File where the visualized clusters should be stored.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'visible_axis', 'visible_labels', 'visible_grid', 'max_row_size').

        <b>Keyword Args:</b><br>
            - visible_axis (bool): Defines visibility of axes on each canvas, if True - axes are visible.
               By default, axes of each canvas are not displayed.
            - visible_labels (bool): Defines visibility of labels on each canvas, if True - labels are displayed.
               By default, labels of each canvas are displayed.
            - visible_grid (bool): Defines visibility of grid on each canvas, if True - grid is displayed.
               By default, grid of each canvas is displayed.
            - max_row_size (uint): Maximum number of canvases on one row. By default the maximum value is 4.

        if len(filename) == 0:
            raise ValueError("Impossible to save visualization to file: empty file path is specified.")

                  visible_axis=kwargs.get('visible_axis', False),
                  visible_labels=kwargs.get('visible_labels', True),
                  visible_grid=kwargs.get('visible_grid', True),
                  max_row_size=kwargs.get('max_row_size', 4))
        plt.savefig(filename)
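`save()` reads its options with `kwargs.get(name, default)`, so every option the caller omits falls back to a fixed default. A small sketch that gathers those defaults in one place; `DEFAULT_OPTIONS`, `resolve_options` and `check_filename` are hypothetical and only mirror the defaults and the empty-path guard visible in the method above.

```python
# Defaults copied from the kwargs.get() calls in save(); the helpers themselves are hypothetical.
DEFAULT_OPTIONS = {
    "visible_axis": False,
    "visible_labels": True,
    "visible_grid": True,
    "max_row_size": 4,
}


def resolve_options(**kwargs):
    """Return every known option, taking the caller's value when present."""
    return {name: kwargs.get(name, default) for name, default in DEFAULT_OPTIONS.items()}


def check_filename(filename):
    """Mirror of the empty-path guard in save()."""
    if len(filename) == 0:
        raise ValueError("Impossible to save visualization to file: empty file path is specified.")
    return filename


options = resolve_options(max_row_size=3, visible_axis=True)
print(options)
# {'visible_axis': True, 'visible_labels': True, 'visible_grid': True, 'max_row_size': 3}
```

One consequence of the `kwargs.get` pattern is that a misspelled option such as `row_size` is silently ignored rather than rejected, so the default of 4 canvases per row still applies.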
    def show(self, pair_filter=None, **kwargs):
        @brief Shows clusters (visualize) in multi-dimensional space.

        @param[in] pair_filter (list): List of coordinate pairs that should be displayed. This argument is used as a filter.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'visible_axis', 'visible_labels', 'visible_grid', 'max_row_size', 'show').

        <b>Keyword Args:</b><br>
            - visible_axis (bool): Defines visibility of axes on each canvas, if True - axes are visible.
               By default, axes of each canvas are not displayed.
            - visible_labels (bool): Defines visibility of labels on each canvas, if True - labels are displayed.
               By default, labels of each canvas are displayed.
            - visible_grid (bool): Defines visibility of grid on each canvas, if True - grid is displayed.
               By default, grid of each canvas is displayed.
            - max_row_size (uint): Maximum number of canvases on one row. By default the maximum value is 4.
            - show (bool): If True - displays the visualized clusters. By default, `True`.

            raise ValueError("There are no non-empty clusters for visualization.")

        dimension = len(cluster_data[0])

        acceptable_pairs = pair_filter or []

        amount_axis = len(pairs)

        for index in range(amount_axis):
            axis_storage.append(ax)

        if kwargs.get('show', True):
    def __create_grid_spec(self, amount_axis, max_row_size):
        @brief Create grid specification for figure to place canvases.

        @param[in] amount_axis (uint): Amount of canvases that should be organized by the created grid specification.
        @param[in] max_row_size (uint): Maximum number of canvases on one row.

        @return (gridspec.GridSpec) Grid specification to place canvases on figure.

        row_size = amount_axis
        if row_size > max_row_size:
            row_size = max_row_size

        col_size = math.ceil(amount_axis / row_size)
        return gridspec.GridSpec(col_size, row_size)
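`__create_grid_spec()` caps the number of canvases per row at `max_row_size` and then adds as many rows as needed. In that method `row_size` is the number of columns and `col_size` the number of rows, which matches the `GridSpec(nrows, ncols)` argument order. The same arithmetic without matplotlib, as a sketch: `grid_shape` is a hypothetical helper, and its explicit guard against zero canvases is an addition, since the original arithmetic would divide by zero in that case.

```python
import math


def grid_shape(amount_axis, max_row_size):
    """Return (rows, columns) for amount_axis canvases, at most max_row_size per row."""
    if amount_axis < 1:
        # Added guard: the original arithmetic would divide by zero here.
        raise ValueError("At least one canvas is required.")

    row_size = min(amount_axis, max_row_size)      # canvases per row (columns)
    col_size = math.ceil(amount_axis / row_size)   # number of rows
    return col_size, row_size


# 4D data without a pair filter gives 6 coordinate pairs, i.e. 6 canvases.
print(grid_shape(6, 3))  # (2, 3)
print(grid_shape(6, 4))  # (2, 4)
print(grid_shape(2, 4))  # (1, 2)
```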
    def __create_pairs(self, dimension, acceptable_pairs):
        @brief Create coordinate pairs that should be displayed.

        @param[in] dimension (uint): Data-space dimension.
        @param[in] acceptable_pairs (list): List of coordinate pairs that should be displayed.

        @return (list) List of coordinate pairs that should be displayed.

        if len(acceptable_pairs) > 0:
            return acceptable_pairs

        return list(itertools.combinations(range(dimension), 2))
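Without a filter, `__create_pairs()` enumerates every unordered pair of coordinates, so a d-dimensional data set produces d(d-1)/2 canvases (6 for the 4D 'Iris' sample). A sketch reproducing that logic with `itertools.combinations`; `create_pairs` is a hypothetical free-function version of the private method that returns a copy of the user's filter instead of the same list object.

```python
import itertools


def create_pairs(dimension, acceptable_pairs=None):
    """Return the user's pairs if given, otherwise all coordinate pairs (i, j) with i < j."""
    if acceptable_pairs:
        return list(acceptable_pairs)
    return list(itertools.combinations(range(dimension), 2))


print(create_pairs(4))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(create_pairs(4, [[0, 1], [0, 2]]))
# [[0, 1], [0, 2]]
```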
    def __create_canvas(self, dimension, pairs, position, **kwargs):
        @brief Create a new canvas with user-defined parameters to display a cluster or a chunk of a cluster on it.

        @param[in] dimension (uint): Data-space dimension.
        @param[in] pairs (list): Pairs of coordinates that will be displayed on the canvas. If empty, then labels will not
                    be displayed on the canvas.
        @param[in] position (uint): Index position of canvas on a grid.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'visible_axis', 'visible_labels', 'visible_grid').

        <b>Keyword Args:</b><br>
            - visible_axis (bool): Defines visibility of axes on each canvas, if True - axes are visible.
               By default, axes are not displayed.
            - visible_labels (bool): Defines visibility of labels on each canvas, if True - labels are displayed.
               By default, labels are displayed.
            - visible_grid (bool): Defines visibility of grid on each canvas, if True - grid is displayed.
               By default, grid is displayed.

        @return (matplotlib.axes.Axes) Canvas to display a cluster or a chunk of a cluster.

        visible_grid = kwargs.get('visible_grid', True)
        visible_labels = kwargs.get('visible_labels', True)
        visible_axis = kwargs.get('visible_axis', False)

            ax.set_xlabel("x%d" % pairs[position][0])
            ax.set_ylabel("x%d" % pairs[position][1])

            ax.set_ylim(-0.5, 0.5)
            ax.set_yticklabels([])

            ax.set_yticklabels([])
            ax.set_xticklabels([])
    def __draw_canvas_cluster(self, axis_storage, cluster_descr, pairs):
        @brief Draw clusters.

        @param[in] axis_storage (list): List of matplotlib axes where dimensional chunks of the cluster are displayed.
        @param[in] cluster_descr (canvas_cluster_descr): Canvas cluster descriptor that should be displayed.
        @param[in] pairs (list): List of coordinate pairs that should be displayed.

        for index_axis in range(len(axis_storage)):
            for item in cluster_descr.cluster:
    def __draw_cluster_item_multi_dimension(self, ax, pair, item, cluster_descr):
        @brief Draw a cluster chunk defined by a pair of coordinates in data space with dimension greater than 1.

        @param[in] ax (axis): Matplotlib axis that is used to display a chunk of the cluster point.
        @param[in] pair (list): Pair of coordinates of the point that should be displayed.
        @param[in] item (list): Data point or index of data point.
        @param[in] cluster_descr (canvas_cluster_descr): Cluster description whose point is visualized.

        index_dimension1 = pair[0]
        index_dimension2 = pair[1]

        if cluster_descr.data is None:
            ax.plot(item[index_dimension1], item[index_dimension2],
                    color=cluster_descr.color, marker=cluster_descr.marker, markersize=cluster_descr.markersize)

            ax.plot(cluster_descr.data[item][index_dimension1], cluster_descr.data[item][index_dimension2],
                    color=cluster_descr.color, marker=cluster_descr.marker, markersize=cluster_descr.markersize)
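Each item drawn by `__draw_cluster_item_multi_dimension()` is either a point (when the descriptor has no `data`) or an index into `data`; the method then picks the two coordinates named by `pair`. A sketch of that resolution step in isolation; `project_item` is a hypothetical helper and the sample values are illustrative.

```python
def project_item(item, pair, data=None):
    """Return the (x, y) coordinates of a cluster item for a given coordinate pair.

    item is a point when data is None, otherwise an index into data.
    """
    point = item if data is None else data[item]
    return point[pair[0]], point[pair[1]]


data = [[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4]]

print(project_item([5.1, 3.5, 1.4, 0.2], (0, 2)))  # (5.1, 1.4)
print(project_item(1, (1, 3), data))               # (3.2, 1.4)
```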
    def __draw_cluster_item_one_dimension(self, ax, item, cluster_descr):
        @brief Draw cluster point in one-dimensional data space.

        @param[in] ax (axis): Matplotlib axis that is used to display a chunk of the cluster point.
        @param[in] item (list): Data point or index of data point.
        @param[in] cluster_descr (canvas_cluster_descr): Cluster description whose point is visualized.

        if cluster_descr.data is None:
            ax.plot(item[0], 0.0,
                    color=cluster_descr.color, marker=cluster_descr.marker, markersize=cluster_descr.markersize)

            ax.plot(cluster_descr.data[item][0], 0.0,
                    color=cluster_descr.color, marker=cluster_descr.marker, markersize=cluster_descr.markersize)
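Both draw helpers above call `ax.plot()` once per point, and one-dimensional points are placed on the line y = 0. An alternative, sketched here and not what the library does, is to collect all coordinates of a cluster first so that one plotting call per cluster and canvas suffices. `cluster_coordinates` is hypothetical, and passing `pair=None` stands for the one-dimensional case.

```python
def cluster_coordinates(cluster, pair=None, data=None):
    """Collect x and y lists for all items of a cluster.

    pair=None means one-dimensional data: x is coordinate 0 and y is fixed at 0.0.
    """
    xs, ys = [], []
    for item in cluster:
        point = item if data is None else data[item]
        if pair is None:
            xs.append(point[0])
            ys.append(0.0)
        else:
            xs.append(point[pair[0]])
            ys.append(point[pair[1]])
    return xs, ys


data = [[1.1], [1.7], [3.7], [5.3]]
print(cluster_coordinates([0, 1, 2], data=data))                 # ([1.1, 1.7, 3.7], [0.0, 0.0, 0.0])
print(cluster_coordinates([[1, 2, 3], [4, 5, 6]], pair=(0, 2)))  # ([1, 4], [3, 6])
```

If the two lists were passed to a single `ax.plot(xs, ys, ...)` call, it would also need `linestyle='none'`; otherwise matplotlib connects the markers with lines, which the per-point calls never do.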
    @brief Common visualizer of clusters on 1D, 2D or 3D surface.
    @details Use the 'cluster_visualizer_multidim' visualizer if the data dimension is greater than 3.

    @see cluster_visualizer_multidim

    def __init__(self, number_canvases=1, size_row=1, titles=None):
        @brief Constructor of cluster visualizer.

        @param[in] number_canvases (uint): Number of canvases that are used for visualization.
        @param[in] size_row (uint): Number of canvases that can be placed in one row.
        @param[in] titles (list): List of canvas titles.

            from pyclustering.cluster import cluster_visualizer
            from pyclustering.cluster.dbscan import dbscan
            from pyclustering.samples.definitions import SIMPLE_SAMPLES, FCPS_SAMPLES
            from pyclustering.utils import read_sample

            # load 2D data sample
            sample_2d = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE1)

            # load 3D data sample
            sample_3d = read_sample(FCPS_SAMPLES.SAMPLE_HEPTA)

            # extract clusters from the first sample using DBSCAN algorithm
            dbscan_instance = dbscan(sample_2d, 0.4, 2, False)
            dbscan_instance.process()
            clusters_sample_2d = dbscan_instance.get_clusters()

            # extract clusters from the second sample using DBSCAN algorithm
            dbscan_instance = dbscan(sample_3d, 1, 3, True)
            dbscan_instance.process()
            clusters_sample_3d = dbscan_instance.get_clusters()

            # create plot with two canvases where each row contains 2 canvases.
            visualizer = cluster_visualizer(size, row_size)

            # place clustering result of sample_2d to the first canvas
            visualizer.append_clusters(clusters_sample_2d, sample_2d, 0, markersize=5)

            # place clustering result of sample_3d to the second canvas
            visualizer.append_clusters(clusters_sample_3d, sample_3d, 1, markersize=30)

        if titles is not None:
    def append_cluster(self, cluster, data=None, canvas=0, marker='.', markersize=None, color=None):
        @brief Appends cluster to canvas for drawing.

        @param[in] cluster (list): Cluster that may consist of indexes of objects from the data or of the objects themselves.
        @param[in] data (list): If defined, each element of the cluster is considered an index of an object in the data.
        @param[in] canvas (uint): Index of the canvas that should be used for displaying the cluster.
        @param[in] marker (string): Marker that is used for displaying objects from the cluster on the canvas.
        @param[in] markersize (uint): Size of marker.
        @param[in] color (string): Color of marker.

        @return Returns index of cluster descriptor on the canvas.

        if len(cluster) == 0:
            raise ValueError("Canvas index '%d' is out of range [0; %d]." % (canvas, self.__number_canvases - 1))
            color = color_list.TITLES[index_color]

            dimension = len(cluster[0])
                raise ValueError("Only clusters with the same dimension of objects can be displayed on canvas.")

            dimension = len(data[0])
                raise ValueError("Only clusters with the same dimension of objects can be displayed on canvas.")

        if (dimension < 1) or (dimension > 3):
            raise ValueError("Only objects with dimension 1 (1D plot), 2 (2D plot) or 3 (3D plot) "
                             "can be displayed. For multi-dimensional data use 'cluster_visualizer_multidim'.")

        if markersize is None:
            if (dimension == 1) or (dimension == 2):
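The guards in `append_cluster()` reject empty clusters, out-of-range canvas indexes, clusters whose dimension differs from the one already on the canvas, and dimensions outside 1..3. A sketch that combines them in one function, assuming the first cluster placed on a canvas fixes its dimension (the source lines that store and compare the canvas dimension are not shown here). `validate_cluster` and the `canvas_dimensions` list are hypothetical. Note that a message with two `%d` placeholders needs a tuple of two values on the right of `%`.

```python
def validate_cluster(cluster, canvas, canvas_dimensions, data=None):
    """Sketch: return the cluster dimension after checks similar to append_cluster()."""
    if len(cluster) == 0:
        raise ValueError("Empty cluster is provided.")

    number_canvases = len(canvas_dimensions)
    if canvas < 0 or canvas >= number_canvases:
        # Two %d placeholders require a tuple of two values.
        raise ValueError("Canvas index '%d' is out of range [0; %d]." % (canvas, number_canvases - 1))

    dimension = len(cluster[0]) if data is None else len(data[0])

    if dimension < 1 or dimension > 3:
        raise ValueError("Only objects with dimension 1, 2 or 3 can be displayed. "
                         "For multi-dimensional data use 'cluster_visualizer_multidim'.")

    # Assumption: the first cluster on a canvas fixes that canvas's dimension.
    if canvas_dimensions[canvas] is None:
        canvas_dimensions[canvas] = dimension
    elif canvas_dimensions[canvas] != dimension:
        raise ValueError("Only clusters with the same dimension of objects can be displayed on canvas.")

    return dimension


canvas_dimensions = [None, None]  # two empty canvases
print(validate_cluster([[1.0, 2.0], [3.0, 4.0]], 0, canvas_dimensions))  # 2
print(canvas_dimensions)                                                  # [2, None]
```

The range check runs before the canvas dimension is recorded, so a rejected 4D cluster does not leave the canvas marked as 4-dimensional.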
        @brief Append cluster attribute for a cluster on a specific canvas.
        @details An attribute is data that is visualized for a specific cluster using its color, and also its marker and markersize if the last two are not specified.

        @param[in] index_canvas (uint): Index of the canvas where the cluster is located.
        @param[in] index_cluster (uint): Index of the cluster whose attribute should be added.
        @param[in] data (list): List of points (data) that represents the attribute.
        @param[in] marker (string): Marker that is used for displaying objects from the cluster on the canvas.
        @param[in] markersize (uint): Size of marker.

        attribute_marker = marker
        if attribute_marker is None:
            attribute_marker = cluster_descr.marker

        attribute_markersize = markersize
        if attribute_markersize is None:
            attribute_markersize = cluster_descr.markersize

        attribute_color = cluster_descr.color

        added_attribute_cluster_descriptor = canvas_cluster_descr(data, None, attribute_marker, attribute_markersize, attribute_color)
        self.__canvas_clusters[index_canvas][index_cluster].attributes.append(added_attribute_cluster_descriptor)
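An attribute descriptor always takes the owning cluster's color and falls back to its marker and marker size when none are passed; its points are stored directly as objects (the `data` slot is `None`). A sketch of that inheritance with a dataclass; `ClusterDescr` and `append_attribute` are hypothetical stand-ins for `canvas_cluster_descr` and the method above, using the same five constructor fields plus an `attributes` list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClusterDescr:
    """Hypothetical stand-in for canvas_cluster_descr."""
    cluster: list
    data: Optional[list]
    marker: str
    markersize: int
    color: str
    attributes: List["ClusterDescr"] = field(default_factory=list)


def append_attribute(owner, data, marker=None, markersize=None):
    """Attach attribute points to owner, inheriting color and, if omitted, marker style."""
    attribute = ClusterDescr(
        cluster=data,
        data=None,  # attribute points are stored as objects, not indexes
        marker=owner.marker if marker is None else marker,
        markersize=owner.markersize if markersize is None else markersize,
        color=owner.color,  # color is always inherited
    )
    owner.attributes.append(attribute)
    return attribute


owner = ClusterDescr([[0.0, 0.0], [1.0, 1.0]], None, ".", 5, "red")
center = append_attribute(owner, [[0.5, 0.5]], marker="*", markersize=12)
print(center.marker, center.markersize, center.color)  # * 12 red
```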
    def append_clusters(self, clusters, data=None, canvas=0, marker='.', markersize=None):
        @brief Appends a list of clusters to a canvas for drawing.

        @param[in] clusters (list): List of clusters where each cluster may consist of indexes of objects from the data or of the objects themselves.
        @param[in] data (list): If defined, each element of a cluster is considered an index of an object in the data.
        @param[in] canvas (uint): Index of the canvas that should be used for displaying clusters.
        @param[in] marker (string): Marker that is used for displaying objects from clusters on the canvas.
        @param[in] markersize (uint): Size of marker.

        for cluster in clusters:
        @brief Set title for specified canvas.

        @param[in] text (string): Title for canvas.
        @param[in] canvas (uint): Index of canvas where title should be displayed.

            raise NameError("Canvas '%d' does not exist." % canvas)
        @brief Returns cluster color on specified canvas.

    def save(self, filename, **kwargs):
        @brief Saves figure to the specified file.

        @param[in] filename (string): File where the visualized clusters should be stored.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'invisible_axis', 'visible_grid').

        <b>Keyword Args:</b><br>
            - invisible_axis (bool): Defines visibility of axes on each canvas, if `True` - axes are invisible.
               By default, axes are invisible.
            - visible_grid (bool): Defines visibility of grid on each canvas, if `True` - grid is displayed.
               By default, grid of each canvas is displayed.

        Here is an example of how to save visualized clusters to a PNG file without showing them on a screen:

            from pyclustering.cluster import cluster_visualizer

            data = [[1.1], [1.7], [3.7], [5.3], [2.5], [-1.5], [-0.9], [6.3], [6.5], [8.1]]
            clusters = [[0, 1, 2, 4, 5, 6], [3, 7, 8, 9]]

            visualizer = cluster_visualizer()
            visualizer.append_clusters(clusters, data)
            visualizer.save("1-dimensional-clustering.png")

        if len(filename) == 0:
            raise ValueError("Impossible to save visualization to file: empty file path is specified.")

        invisible_axis = kwargs.get('invisible_axis', True)
        visible_grid = kwargs.get('visible_grid', True)

        self.show(None, invisible_axis, visible_grid, False)
        plt.savefig(filename)
    def show(self, figure=None, invisible_axis=True, visible_grid=True, display=True, shift=None):
        @brief Shows clusters (visualize).

        @param[in] figure (fig): Figure that should be used for drawing clusters; if None, a new figure is created.
        @param[in] invisible_axis (bool): Defines visibility of axes on each canvas, if True - axes are invisible.
        @param[in] visible_grid (bool): Defines visibility of grid on each canvas, if True - grid is displayed.
        @param[in] display (bool): Defines whether clusters are displayed on the screen; if True - clusters are displayed,
                    if False - plt.show() should be called by the user.
        @param[in] shift (uint): Forced canvas shift value; defines the canvas index from which clusters should be visualized.

        @return (fig) Figure where clusters are shown.

        if canvas_shift is None:
            if figure is not None:
                canvas_shift = len(figure.get_axes())

        if figure is not None:
            cluster_figure = figure
            cluster_figure = plt.figure()

        maximum_rows = math.ceil((self.__number_canvases + canvas_shift) / maximum_cols)

        grid_spec = gridspec.GridSpec(maximum_rows, maximum_cols)

            if len(canvas_data) == 0:

            if (dimension == 1) or (dimension == 2):
                ax = cluster_figure.add_subplot(grid_spec[index_canvas + canvas_shift])
                ax = cluster_figure.add_subplot(grid_spec[index_canvas + canvas_shift], projection='3d')

            if len(canvas_data) == 0:
                plt.setp(ax, visible=False)

            for cluster_descr in canvas_data:
                for attribute_descr in cluster_descr.attributes:

            if invisible_axis is True:
                ax.xaxis.set_ticklabels([])
                ax.yaxis.set_ticklabels([])
                    ax.zaxis.set_ticklabels([])

            ax.grid(visible_grid)

        return cluster_figure
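When `show()` draws into an existing figure, the canvas shift defaults to the number of axes already in that figure, and the grid is made tall enough that the new canvases follow the old ones. A sketch of that index arithmetic; `canvas_layout` is hypothetical, and it assumes that `maximum_cols` equals the constructor's `size_row` and that the shift is 0 when no figure is given (those source lines are not shown here).

```python
import math


def canvas_layout(number_canvases, size_row, existing_axes=None, shift=None):
    """Return (rows, cols, grid positions) for the canvases drawn by one show() call.

    existing_axes is the number of axes in a reused figure (None means a new figure).
    """
    canvas_shift = shift
    if canvas_shift is None:
        # Assumption: a new figure starts at position 0.
        canvas_shift = existing_axes if existing_axes is not None else 0

    maximum_cols = size_row  # assumption: columns come from the constructor's size_row
    maximum_rows = math.ceil((number_canvases + canvas_shift) / maximum_cols)
    positions = [index_canvas + canvas_shift for index_canvas in range(number_canvases)]
    return maximum_rows, maximum_cols, positions


print(canvas_layout(2, 2))                   # (1, 2, [0, 1])
print(canvas_layout(2, 2, existing_axes=2))  # (2, 2, [2, 3])
print(canvas_layout(3, 2, shift=1))          # (2, 2, [1, 2, 3])
```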
    def __draw_canvas_cluster(self, ax, dimension, cluster_descr):
        @brief Draw canvas cluster descriptor.

        @param[in] ax (Axis): Axis of the canvas where canvas cluster descriptor should be displayed.
        @param[in] dimension (uint): Canvas dimension.
        @param[in] cluster_descr (canvas_cluster_descr): Canvas cluster descriptor that should be displayed.

        @return (fig) Figure where clusters are shown.

        cluster = cluster_descr.cluster
        data = cluster_descr.data
        marker = cluster_descr.marker
        markersize = cluster_descr.markersize
        color = cluster_descr.color

                ax.plot(item[0], 0.0, color=color, marker=marker, markersize=markersize)
                ax.plot(data[item][0], 0.0, color=color, marker=marker, markersize=markersize)

                ax.plot(item[0], item[1], color=color, marker=marker, markersize=markersize)
                ax.plot(data[item][0], data[item][1], color=color, marker=marker, markersize=markersize)

                ax.scatter(item[0], item[1], item[2], c=color, marker=marker, s=markersize)
                ax.scatter(data[item][0], data[item][1], data[item][2], c=color, marker=marker, s=markersize)
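`__draw_canvas_cluster()` uses `ax.plot()` for 1D and 2D canvases but `ax.scatter()` for 3D ones, and the two size arguments use different units: matplotlib's `markersize` is given in points, while scatter's `s` is an area in points squared (matplotlib's own default for `s` is `rcParams['lines.markersize'] ** 2`). That is likely why the class examples pass `markersize=5` for 2D data but 30 for 3D data (30 is roughly 5.5 squared). A sketch of the coordinate dispatch and of converting a plot-style size into a comparable scatter area; `plot_coordinates` and `scatter_area` are hypothetical helpers.

```python
def plot_coordinates(point, dimension):
    """Return the coordinates a canvas of the given dimension would draw for one point."""
    if dimension == 1:
        return (point[0], 0.0)  # 1D points lie on the line y = 0
    if dimension == 2:
        return (point[0], point[1])
    if dimension == 3:
        return (point[0], point[1], point[2])
    raise ValueError("Only dimensions 1, 2 or 3 are supported.")


def scatter_area(markersize):
    """Convert a plot() marker size (points) into a scatter() area (points squared)."""
    return markersize ** 2


print(plot_coordinates([2.5], 1))            # (2.5, 0.0)
print(plot_coordinates([1.0, 2.0, 3.0], 3))  # (1.0, 2.0, 3.0)
print(scatter_area(5))                       # 25
```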