3 @brief Cluster analysis algorithm: CLIQUE 4 @details Implementation based on paper @cite article::clique::1. 6 @authors Andrei Novikov (pyclustering@yandex.ru) 8 @copyright GNU Public License 10 @cond GNU_PUBLIC_LICENSE 11 PyClustering is free software: you can redistribute it and/or modify 12 it under the terms of the GNU General Public License as published by 13 the Free Software Foundation, either version 3 of the License, or 14 (at your option) any later version. 16 PyClustering is distributed in the hope that it will be useful, 17 but WITHOUT ANY WARRANTY; without even the implied warranty of 18 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 GNU General Public License for more details. 21 You should have received a copy of the GNU General Public License 22 along with this program. If not, see <http://www.gnu.org/licenses/>. 33 from pyclustering.core.wrapper
import ccore_library
35 import pyclustering.core.clique_wrapper
as wrapper
40 import matplotlib.gridspec
as gridspec
41 import matplotlib.pyplot
as plt
42 import matplotlib.patches
as patches
43 import matplotlib.animation
as animation
44 except Exception
as error_instance:
46 warnings.warn(
"Impossible to import matplotlib (please, install 'matplotlib'), pyclustering's visualization " 47 "functionality is not available (details: '%s')." % str(error_instance))
52 @brief Visualizer of CLIQUE algorithm's results. 53 @details CLIQUE visualizer provides visualization services that are specific for CLIQUE algorithm, for example, 54 to display grid and its density. 58 __maximum_density_alpha = 0.6
63 @brief Show CLIQUE blocks as a grid in data space. 64 @details Each block contains points and according to this density is displayed. CLIQUE grid helps to visualize 65 grid that was used for clustering process. 67 @param[in] cells (list): List of cells that is produced by CLIQUE algorithm. 68 @param[in] data (array_like): Input data that was used for clustering process. 71 dimension = cells[0].dimensions
75 amount_canvases = int(dimension * (dimension - 1) / 2)
78 grid_spec = gridspec.GridSpec(1, amount_canvases)
80 pairs = list(itertools.combinations(range(dimension), 2))
81 if len(pairs) == 0: pairs = [(0, 0)]
83 for index
in range(amount_canvases):
84 ax = figure.add_subplot(grid_spec[index])
85 clique_visualizer.__draw_cells(ax, cells, pairs[index])
86 clique_visualizer.__draw_two_dimension_data(ax, data, pairs[index])
94 @brief Display CLIQUE clustering results. 96 @param[in] data (list): Data that was used for clustering. 97 @param[in] clusters (array_like): Clusters that were allocated by the algorithm. 98 @param[in] noise (array_like): Noise that were allocated by the algorithm. 102 visualizer.append_clusters(clusters, data)
103 visualizer.append_cluster(noise
or [], data, marker=
'x')
108 def __draw_two_dimension_data(ax, data, pair):
110 @brief Display data in two-dimensional canvas. 112 @param[in] ax (Axis): Canvas where data should be displayed. 113 @param[in] data (list): Data points that should be displayed. 114 @param[in] pair (tuple): Pair of dimension indexes. 117 ax.set_xlabel(
"x%d" % pair[0])
118 ax.set_ylabel(
"x%d" % pair[1])
122 ax.plot(point[pair[0]], point[pair[1]], color=
'red', marker=
'.')
124 ax.plot(point[pair[0]], 0, color=
'red', marker=
'.')
125 ax.yaxis.set_ticklabels([])
129 def __draw_cells(ax, cells, pair):
132 density_scale = max(len(cell.points)
for cell
in cells)
134 clique_visualizer.__draw_cell(ax, pair, cell, density_scale)
138 def __draw_cell(ax, pair, cell, density_scale):
139 max_corner, min_corner = clique_visualizer.__get_rectangle_description(cell, pair)
141 belong_cluster = (len(cell.points) > 0)
143 if density_scale != 0.0:
144 density_scale = clique_visualizer.__maximum_density_alpha * len(cell.points) / density_scale
146 face_color = matplotlib.colors.to_rgba(
'blue', alpha=density_scale)
147 edge_color = matplotlib.colors.to_rgba(
'black', alpha=1.0)
149 rect = patches.Rectangle(min_corner, max_corner[0] - min_corner[0], max_corner[1] - min_corner[1],
151 facecolor=face_color,
152 edgecolor=edge_color,
159 def __get_rectangle_description(cell, pair):
160 max_corner, min_corner = cell.spatial_location.get_corners()
162 max_corner = [max_corner[pair[0]], max_corner[pair[1]]]
163 min_corner = [min_corner[pair[0]], min_corner[pair[1]]]
166 max_corner[1], min_corner[1] = 1.0, -1.0
168 return max_corner, min_corner
174 @brief Geometrical description of CLIQUE block in data space. 175 @details Provides services related to spatial functionality. 183 @brief Creates spatial block in data space. 185 @param[in] max_corner (array_like): Maximum corner coordinates of the block. 186 @param[in] min_corner (array_like): Minimal corner coordinates of the block. 195 @brief Returns string block description. 197 @return String representation of the block. 205 @brief Point is considered as contained if it lies in block (belong to it). 207 @return (bool) True if point is in block, otherwise False. 210 for i
in range(len(point)):
219 @brief Return spatial description of current block. 221 @return (tuple) Pair of maximum and minimum corners (max_corner, min_corner). 230 @brief CLIQUE block contains information about its logical location in grid, spatial location in data space and 231 points that are covered by the block. 235 def __init__(self, logical_location=None, spatial_location=None, points=None, visited=False):
237 @brief Initializes CLIQUE block. 239 @param[in] logical_location (list): Logical location of the block in CLIQUE grid. 240 @param[in] spatial_location (spatial_block): Spatial location in data space. 241 @param[in] points (array_like): Points that belong to this block (can be obtained by method 'capture_points', 242 this parameter is used by CLIQUE in case of processing by C++ implementation when clustering 243 result are passed back to Python code. 244 @param[in] visited (bool): Marks if block is visited during clustering process. 254 @brief Returns string representation of the block using its logical location in CLIQUE grid. 261 @brief Returns string representation of the block using its logical location in CLIQUE grid. 269 @brief Logical location is represented by coordinates in CLIQUE grid, for example, in case of 2x2 grid blocks 270 may have following coordinates: [0, 0], [0, 1], [1, 0], [1, 1]. 271 @return (list) Logical location of the block in CLIQUE grid. 276 @logical_location.setter
279 @brief Assign logical location to CLIQUE block. 281 @param[in] location (list): New logical location of the block in CLIQUE grid. 289 @brief Spatial location is represented by real data space coordinates. 290 @return (spatial_block) Spatial block that describes location in data space. 295 @spatial_location.setter
298 @brief Assign spatial location to CLIQUE block. 300 @param[in] location (spatial_block): New spatial location of the block. 308 @brief Amount of dimensions where CLIQUE block is located. 309 @return (uint) Amount of dimensions where CLIQUE block is located. 317 @brief Points that belong to the CLIQUE block. 318 @details Points are represented by indexes that correspond to points in input data space. 320 @return (array_like) Points that belong to the CLIQUE block. 330 @brief Defines whether block is visited during cluster analysis. 331 @details If cluster analysis has not been performed then value will False. 333 @return (bool) True if block has been visited during processing, False otherwise. 341 @brief Marks or unmarks block as a visited. 342 @details This setter is used by CLIQUE algorithm. 344 @param[in] visited (bool): New visited state for the CLIQUE block. 352 @brief Finds points that belong to this block using availability map to reduce computational complexity by 353 checking whether the point belongs to the block. 354 @details Algorithm complexity of this method is O(n). 356 @param[in] data (array_like): Data where points are represented as coordinates. 357 @param[in] point_availability (array_like): Contains boolean values that denote whether point is already belong 358 to another CLIQUE block. 361 for index_point
in range(len(data)):
362 if (point_availability[index_point]
is True)
and (data[index_point]
in self.
__spatial_location):
364 point_availability[index_point] =
False 369 @brief Forms list of logical location of each neighbor for this particular CLIQUE block. 371 @param[in] edge (uint): Amount of intervals in each dimension that is used for clustering process. 373 @return (list) Logical location of each neighbor for this particular CLIQUE block. 381 position[index_dimension] += 1
382 neighbors.append(position)
386 position[index_dimension] -= 1
387 neighbors.append(position)
395 @brief Coordinate iterator is used to generate logical location description for each CLIQUE block. 396 @details This class is used by CLIQUE algorithm for clustering process. 402 @brief Initializes coordinate iterator for CLIQUE algorithm. 404 @param[in] dimension (uint): Amount of dimensions in input data space. 405 @param[in] intervals (uint): Amount of intervals in each dimension. 415 @brief Returns current block coordinate. 423 @brief Forms logical location for next block. 439 @brief Class implements CLIQUE grid based clustering algorithm. 440 @details CLIQUE automatically finnds subspaces with high-density clusters. It produces identical results 441 irrespective of the order in which the input records are presented and it does not presume any canonical 442 distribution for input data @cite article::clique::1. 444 Here is an example where data in two-dimensional space is clustered using CLIQUE algorithm: 446 from pyclustering.cluster.clique import clique, clique_visualizer 447 from pyclustering.utils import read_sample 448 from pyclustering.samples.definitions import FCPS_SAMPLES 450 # read two-dimensional input data 'Target' 451 data = read_sample(FCPS_SAMPLES.SAMPLE_TARGET) 453 # create CLIQUE algorithm for processing 454 intervals = 10 # defines amount of cells in grid in each dimension 455 threshold = 0 # lets consider each point as non-outlier 456 clique_instance = clique(data, intervals, threshold) 458 # start clustering process and obtain results 459 clique_instance.process() 460 clusters = clique_instance.get_clusters() # allocated clusters 461 noise = clique_instance.get_noise() # points that are considered as outliers (in this example should be empty) 462 cells = clique_instance.get_cells() # CLIQUE blocks that forms grid 464 print("Amount of clusters:", len(clusters)) 466 # visualize clustering results 467 clique_visualizer.show_grid(cells, data) # show grid that has been formed by the algorithm 468 clique_visualizer.show_clusters(data, clusters, noise) # show clustering results 471 In this example 6 clusters are allocated including four small cluster where each such small cluster consists of 472 three points. There are visualized clustering results - grid that has been formed by CLIQUE algorithm with 473 density and clusters itself: 474 @image html clique_clustering_target.png "Fig. 1. CLIQUE clustering results (grid and clusters itself)." 476 Sometimes such small clusters should be considered as outliers taking into account fact that two clusters in the 477 central are relatively huge. To treat them as a noise threshold value should be increased: 480 threshold = 3 # block that contains 3 or less points is considered as a outlier as well as its points 481 clique_instance = clique(data, intervals, threshold) 484 Two clusters are allocated, but in this case some points in cluster-"circle" are also considered as outliers, 485 because CLIQUE operates with blocks, not with points: 486 @image html clique_clustering_with_noise.png "Fig. 2. Noise allocation by CLIQUE." 490 def __init__(self, data, amount_intervals, density_threshold, **kwargs):
492 @brief Create CLIQUE clustering algorithm. 494 @param[in] data (list): Input data (list of points) that should be clustered. 495 @param[in] amount_intervals (uint): Amount of intervals in each dimension that defines amount of CLIQUE block 496 as \f[N_{blocks} = intervals^{dimensions}\f]. 497 @param[in] density_threshold (uint): Minimum number of points that should contain CLIQUE block to consider its 498 points as non-outliers. 499 @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'ccore'). 501 <b>Keyword Args:</b><br> 502 - ccore (bool): By default is True. If True then C++ implementation is used for cluster analysis, otherwise 503 Python implementation is used. 510 self.
__ccore = kwargs.get(
'ccore',
True)
512 self.
__ccore = ccore_library.workable()
525 @brief Performs clustering process in line with rules of CLIQUE clustering algorithm. 527 @return (clique) Returns itself (CLIQUE instance). 545 @brief Returns allocated clusters. 547 @remark Allocated clusters are returned only after data processing (method process()). Otherwise empty list is returned. 549 @return (list) List of allocated clusters, each cluster contains indexes of objects in list of data. 560 @brief Returns allocated noise. 562 @remark Allocated noise is returned only after data processing (method process()). Otherwise empty list is returned. 564 @return (list) List of indexes that are marked as a noise. 575 @brief Returns CLIQUE blocks that are formed during clustering process. 576 @details CLIQUE blocks can be used for visualization purposes. Each CLIQUE block contain its logical location 577 in grid, spatial location in data space and points that belong to block. 579 @return (list) List of CLIQUE blocks. 587 @brief Returns clustering result representation type that indicate how clusters are encoded. 589 @return (type_encoding) Clustering result representation. 595 return type_encoding.CLUSTER_INDEX_LIST_SEPARATION
598 def __process_by_ccore(self):
600 @brief Performs cluster analysis using C++ implementation of CLIQUE algorithm that is used by default if 601 user's target platform is supported. 604 (self.
__clusters, self.
__noise, block_logical_locations, block_max_corners, block_min_corners, block_points) = \
607 amount_cells = len(block_logical_locations)
608 for i
in range(amount_cells):
615 def __process_by_python(self):
617 @brief Performs cluster analysis using Python implementation of CLIQUE algorithm. 626 def __validate_arguments(self):
628 @brief Check input arguments of CLIQUE algorithm and if one of them is not correct then appropriate exception 634 raise ValueError(
"Empty input data. Data should contain at least one point.")
637 raise ValueError(
"Incorrect amount of intervals '%d'. Amount of intervals value should be greater than 0." % self.
__amount_intervals)
640 raise ValueError(
"Incorrect density threshold '%f'. Density threshold should not be negative." % self.
__density_threshold)
643 def __allocate_clusters(self):
645 @brief Performs cluster analysis using formed CLIQUE blocks. 649 if cell.visited
is False:
653 def __expand_cluster(self, cell):
655 @brief Tries to expand cluster from specified cell. 656 @details During expanding points are marked as noise or append to new cluster. 658 @param[in] cell (clique_block): CLIQUE block from that cluster should be expanded. 664 if len(cell.points) > 0:
665 self.
__noise.extend(cell.points)
669 cluster = cell.points[:]
672 for neighbor
in neighbors:
674 cluster.extend(neighbor.points)
677 elif len(neighbor.points) > 0:
678 self.
__noise.extend(neighbor.points)
683 def __get_neighbors(self, cell):
685 @brief Returns neighbors for specified CLIQUE block as clique_block objects. 687 @return (list) Neighbors as clique_block objects. 693 for i
in range(len(location_neighbors)):
697 if not candidate_neighbor.visited:
698 candidate_neighbor.visited =
True 699 neighbors.append(candidate_neighbor)
704 def __create_grid(self):
706 @brief Creates CLIQUE grid that consists of CLIQUE blocks for clustering process. 710 dimension = len(self.
__data[0])
712 cell_sizes = [dimension_length / self.
__amount_intervals for dimension_length
in data_sizes]
717 point_availability = [
True] * len(self.
__data)
719 for index_cell
in range(len(self.
__cells)):
720 logical_location = iterator.get_coordinate()
723 self.
__cells[index_cell].logical_location = logical_location[:]
725 cur_max_corner, cur_min_corner = self.
__get_spatial_location(logical_location, min_corner, max_corner, cell_sizes)
728 self.
__cells[index_cell].capture_points(self.
__data, point_availability)
733 def __location_to_key(self, location):
735 @brief Forms key using logical location of a CLIQUE block. 737 @return (string) Key for CLIQUE block map. 740 return ''.join(str(e) +
'.' for e
in location)
743 def __get_spatial_location(self, logical_location, min_corner, max_corner, cell_sizes):
745 @brief Calculates spatial location for CLIQUE block with logical coordinates defined by logical_location. 747 @param[in] logical_location (list): Logical location of CLIQUE block for that spatial location should be calculated. 748 @param[in] min_corner (list): Minimum corner of an input data. 749 @param[in] max_corner (list): Maximum corner of an input data. 750 @param[in] cell_sizes (list): Size of CLIQUE block in each dimension. 752 @return (list, list): Maximum and minimum corners for the specified CLIQUE block. 755 cur_min_corner = min_corner[:]
756 cur_max_corner = min_corner[:]
757 dimension = len(self.
__data[0])
758 for index_dimension
in range(dimension):
759 cur_min_corner[index_dimension] += cell_sizes[index_dimension] * logical_location[index_dimension]
762 cur_max_corner[index_dimension] = max_corner[index_dimension]
764 cur_max_corner[index_dimension] = cur_min_corner[index_dimension] + cell_sizes[index_dimension]
766 return cur_max_corner, cur_min_corner
769 def __get_data_size_derscription(self):
771 @brief Calculates input data description that is required to create CLIQUE grid. 773 @return (list, list, list): Data size in each dimension, minimum and maximum corners. 776 min_corner = self.
__data[0][:]
777 max_corner = self.
__data[0][:]
779 dimension = len(self.
__data[0])
781 for index_point
in range(1, len(self.
__data)):
782 for index_dimension
in range(dimension):
783 coordinate = self.
__data[index_point][index_dimension]
784 if coordinate > max_corner[index_dimension]:
785 max_corner[index_dimension] = coordinate
787 if coordinate < min_corner[index_dimension]:
788 min_corner[index_dimension] = coordinate
790 data_sizes = [0.0] * dimension
791 for index_dimension
in range(dimension):
792 data_sizes[index_dimension] = max_corner[index_dimension] - min_corner[index_dimension]
794 return data_sizes, min_corner, max_corner
Common visualizer of clusters on 1D, 2D or 3D surface.
pyclustering module for cluster analysis.
Geometrical description of CLIQUE block in data space.
def spatial_location(self)
Spatial location is represented by real data space coordinates.
def points(self)
Points that belong to the CLIQUE block.
def __get_data_size_derscription(self)
Calculates input data description that is required to create CLIQUE grid.
def __str__(self)
Returns string block description.
Class implements CLIQUE grid based clustering algorithm.
def __validate_arguments(self)
Check input arguments of CLIQUE algorithm and if one of them is not correct then appropriate exceptio...
def capture_points(self, data, point_availability)
Finds points that belong to this block using availability map to reduce computational complexity by c...
def get_cells(self)
Returns CLIQUE blocks that are formed during clustering process.
def __init__(self, data, amount_intervals, density_threshold, kwargs)
Create CLIQUE clustering algorithm.
Visualizer of CLIQUE algorithm's results.
def __process_by_ccore(self)
Performs cluster analysis using C++ implementation of CLIQUE algorithm that is used by default if use...
def process(self)
Performs clustering process in line with rules of CLIQUE clustering algorithm.
Module for representing clustering results.
def show_grid(cells, data)
Show CLIQUE blocks as a grid in data space.
def logical_location(self)
Logical location is represented by coordinates in CLIQUE grid, for example, in case of 2x2 grid block...
def __get_spatial_location(self, logical_location, min_corner, max_corner, cell_sizes)
Calculates spatial location for CLIQUE block with logical coordinates defined by logical_location.
def __allocate_clusters(self)
Performs cluster analysis using formed CLIQUE blocks.
def show_clusters(data, clusters, noise=None)
Display CLIQUE clustering results.
def __process_by_python(self)
Performs cluster analysis using Python implementation of CLIQUE algorithm.
def __expand_cluster(self, cell)
Tries to expand cluster from specified cell.
CLIQUE block contains information about its logical location in grid, spatial location in data space ...
def increment(self)
Forms logical location for next block.
def __repr__(self)
Returns string representation of the block using its logical location in CLIQUE grid.
def get_clusters(self)
Returns allocated clusters.
def dimensions(self)
Amount of dimensions where CLIQUE block is located.
def get_noise(self)
Returns allocated noise.
def __get_neighbors(self, cell)
Returns neighbors for specified CLIQUE block as clique_block objects.
def get_location_neighbors(self, edge)
Forms list of logical location of each neighbor for this particular CLIQUE block. ...
def __init__(self, logical_location=None, spatial_location=None, points=None, visited=False)
Initializes CLIQUE block.
def visited(self)
Defines whether block is visited during cluster analysis.
def __location_to_key(self, location)
Forms key using logical location of a CLIQUE block.
def __str__(self)
Returns string representation of the block using its logical location in CLIQUE grid.
def __create_grid(self)
Creates CLIQUE grid that consists of CLIQUE blocks for clustering process.
def __contains__(self, point)
Point is considered as contained if it lies in block (belong to it).
def get_corners(self)
Return spatial description of current block.
def __init__(self, dimension, intervals)
Initializes coordinate iterator for CLIQUE algorithm.
def get_coordinate(self)
Returns current block coordinate.
def __init__(self, max_corner, min_corner)
Creates spatial block in data space.
def get_cluster_encoding(self)
Returns clustering result representation type that indicate how clusters are encoded.
Coordinate iterator is used to generate logical location description for each CLIQUE block...