"""!

@brief Cluster analysis algorithm: ROCK
@details Implementation based on paper @cite inproceedings::rock::1.

@authors Andrei Novikov (pyclustering@yandex.ru)
@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
@endcond

"""

from pyclustering.core.wrapper import ccore_library

import pyclustering.core.rock_wrapper as wrapper

class rock:
    """!
    @brief Class represents clustering algorithm ROCK.

    Example:
    @code
        from pyclustering.cluster import cluster_visualizer
        from pyclustering.cluster.rock import rock
        from pyclustering.samples.definitions import FCPS_SAMPLES
        from pyclustering.utils import read_sample

        # Read sample for clustering from file.
        sample = read_sample(FCPS_SAMPLES.SAMPLE_HEPTA)

        # Create instance of ROCK algorithm for cluster analysis. Seven clusters should be allocated.
        rock_instance = rock(sample, 1.0, 7)

        # Run cluster analysis.
        rock_instance.process()

        # Obtain results of clustering.
        clusters = rock_instance.get_clusters()

        # Visualize clustering results.
        visualizer = cluster_visualizer()
        visualizer.append_clusters(clusters, sample)
    @endcode

    """

    def __init__(self, data, eps, number_clusters, threshold = 0.5, ccore = True):
        """!
        @brief Constructor of clustering algorithm ROCK.

        @param[in] data (list): Input data, a list of points where each point is represented by a list of coordinates.
        @param[in] eps (double): Connectivity radius (similarity threshold); points are neighbors if the distance between them does not exceed the connectivity radius.
        @param[in] number_clusters (uint): Number of clusters that should be allocated from the input data set.
        @param[in] threshold (double): Value that defines the degree of normalization, which influences the choice of clusters to merge during processing.
        @param[in] ccore (bool): Whether CCORE (C++ pyclustering library) should be used instead of Python code.

        """

        self.__ccore = ccore_library.workable()

    def process(self):
        """!
        @brief Performs cluster analysis according to the rules of the ROCK algorithm.

        @remark Clustering results can be obtained with the corresponding get methods, for example get_clusters().

        """

        if indexes != [-1, -1]:
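
Only a fragment of the merge loop in `process()` is visible in this listing. As a minimal sketch, assuming the pure-Python path starts from one cluster per point and keeps merging the pair with the highest goodness until `number_clusters` remain or no linked pair is left, the procedure might look like the code below. The name `rock_sketch` and its local variables are hypothetical, and the normalization exponent follows the usual ROCK formulation rather than anything shown in this listing.

```python
import math


def rock_sketch(points, eps, number_clusters, threshold=0.5):
    """Illustrative agglomerative loop; not the library implementation."""
    size = len(points)

    # Neighbor matrix: 1 when two distinct points lie within eps of each other.
    adjacency = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            if math.dist(points[i], points[j]) <= eps:
                adjacency[i][j] = adjacency[j][i] = 1

    # Assumption: exponent 1 + 2 * f(threshold), f(t) = (1 - t) / (1 + t).
    # threshold must be below 1.0, otherwise the divider below becomes zero.
    exponent = 1.0 + 2.0 * (1.0 - threshold) / (1.0 + threshold)

    # Start with every point in its own cluster (clusters hold point indexes).
    clusters = [[index] for index in range(size)]

    while len(clusters) > number_clusters:
        best_value, best_pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                links = sum(adjacency[a][b] for a in clusters[i] for b in clusters[j])
                n1, n2 = len(clusters[i]), len(clusters[j])
                divider = (n1 + n2) ** exponent - n1 ** exponent - n2 ** exponent
                value = links / divider
                if value > best_value:
                    best_value, best_pair = value, (i, j)

        if best_pair is None:
            break  # no links between any clusters, nothing left to merge

        i, j = best_pair
        clusters[i] += clusters[j]
        del clusters[j]  # j > i, so index i stays valid

    return clusters


points = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [5.0, 5.0], [5.5, 5.0], [5.0, 5.5]]
result = rock_sketch(points, eps=1.0, number_clusters=2)
```

The early `break` mirrors the `[-1, -1]` check in the fragment above: when no pair of clusters shares a link, fewer merges happen and more than `number_clusters` clusters may remain.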

    def get_clusters(self):
        """!
        @brief Returns list of allocated clusters, where each cluster contains indexes of objects in the input data.

        @return (list) List of allocated clusters, where each cluster contains indexes of objects in the input data.

        """

    def get_cluster_encoding(self):
        """!
        @brief Returns the clustering result representation type that indicates how clusters are encoded.

        @return (type_encoding) Clustering result representation.

        """

        return type_encoding.CLUSTER_INDEX_LIST_SEPARATION

    def __find_pair_clusters(self, clusters):
        """!
        @brief Returns pair of clusters that are best candidates for merging in line with goodness measure.
               The pair of clusters for which the goodness measure is maximum is the best pair of clusters to be merged.

        @param[in] clusters (list): List of clusters that have been allocated during processing, each cluster is represented by a list of point indexes from the input data set.

        @return (list) List that contains two indexes of clusters (from list 'clusters') that should be merged at this step.
                It equals [-1, -1] when there are no links between clusters.

        """

        maximum_goodness = 0.0
        cluster_indexes = [-1, -1]

        for i in range(0, len(clusters)):
            for j in range(i + 1, len(clusters)):
                goodness = self.__calculate_goodness(clusters[i], clusters[j])
                if goodness > maximum_goodness:
                    maximum_goodness = goodness
                    cluster_indexes = [i, j]

        return cluster_indexes

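
The pair search can be tried in isolation. The sketch below is a standalone variant that takes the scoring function as a parameter; `find_best_pair`, `toy_scores`, and `toy_goodness` are hypothetical names used only for illustration, and the scores are made-up numbers rather than real goodness values.

```python
def find_best_pair(clusters, goodness):
    """Return [i, j] of the highest-scoring pair, or [-1, -1] if every score is 0."""
    best_value = 0.0
    best_pair = [-1, -1]
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            value = goodness(clusters[i], clusters[j])
            if value > best_value:  # strict comparison: the first maximum wins
                best_value = value
                best_pair = [i, j]
    return best_pair


# Made-up pairwise scores keyed by the first point index of each cluster.
toy_scores = {(0, 1): 0.2, (0, 2): 0.0, (1, 2): 0.7}


def toy_goodness(cluster1, cluster2):
    return toy_scores[(cluster1[0], cluster2[0])]


clusters = [[0], [1], [2]]
best = find_best_pair(clusters, toy_goodness)
unlinked = find_best_pair(clusters, lambda c1, c2: 0.0)
```

Because the maximum starts at 0.0 and the comparison is strict, a pair with zero goodness is never selected, which is how the `[-1, -1]` result arises.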
    def __calculate_links(self, cluster1, cluster2):
        """!
        @brief Returns number of links between two clusters.
        @details A link between objects (points) exists only if the distance between them does not exceed the connectivity radius.

        @param[in] cluster1 (list): The first cluster.
        @param[in] cluster2 (list): The second cluster.

        @return (uint) Number of links between two clusters.

        """

        for index1 in cluster1:
            for index2 in cluster2:
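
The loop body is not part of this listing. A minimal sketch of the counting step, assuming links are read from a 0/1 adjacency matrix indexed by point index (the function name `count_links` and the sample matrix are hypothetical):

```python
def count_links(adjacency, cluster1, cluster2):
    """Sum adjacency entries over every cross-cluster pair of point indexes."""
    return sum(adjacency[index1][index2] for index1 in cluster1 for index2 in cluster2)


# Hypothetical neighbor matrix for four points: 0-1, 0-2 and 2-3 are neighbors.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

links = count_links(adjacency, [0, 1], [2, 3])
```

Only the pair (0, 2) crosses the two clusters, so `links` is 1 here.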

    def __create_adjacency_matrix(self):
        """!
        @brief Creates 2D adjacency matrix (list of lists) where each element describes the existence of a link between two points (that is, whether the points are neighbors).

        """

        self.__adjacency_matrix = [[0 for i in range(size_data)] for j in range(size_data)]
        for i in range(0, size_data):
            for j in range(i + 1, size_data):
                if distance <= self.__eps:
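
The distance computation and the matrix assignments are not shown in this listing. As a sketch under the assumption that the distance is Euclidean and the matrix is symmetric with 0/1 entries, the construction could be written as follows; `build_adjacency_matrix` is a hypothetical standalone function, and `math.dist` requires Python 3.8 or later.

```python
import math


def build_adjacency_matrix(points, eps):
    """Return a symmetric 0/1 matrix; 1 marks two points within eps of each other."""
    size = len(points)
    matrix = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):  # upper triangle only, then mirror
            if math.dist(points[i], points[j]) <= eps:
                matrix[i][j] = 1
                matrix[j][i] = 1
    return matrix


points = [[0.0, 0.0], [0.5, 0.0], [3.0, 3.0]]
adjacency = build_adjacency_matrix(points, eps=1.0)
```

Starting the inner loop at `i + 1` leaves the diagonal at 0, so a point is never counted as its own neighbor.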

    def __calculate_goodness(self, cluster1, cluster2):
        """!
        @brief Calculates the goodness measure between two clusters. The coefficient defines how suitable the clusters are for merging.

        @param[in] cluster1 (list): The first cluster.
        @param[in] cluster2 (list): The second cluster.

        @return Goodness measure between two clusters.

        """

        return number_links / devider
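
The computation of `devider` is not visible in this listing. A sketch of the goodness measure as it is usually stated for ROCK, with f(threshold) = (1 - threshold) / (1 + threshold), is given below; that this listing uses exactly this normalization is an assumption, and `goodness_measure` is a hypothetical standalone function.

```python
def goodness_measure(number_links, size1, size2, threshold=0.5):
    """Links between two clusters, normalized by the expected number of cross links."""
    # Assumed normalization exponent: 1 + 2 * f(threshold).
    exponent = 1.0 + 2.0 * (1.0 - threshold) / (1.0 + threshold)
    # Equals zero when threshold == 1.0, so threshold is expected to stay below 1.0.
    divider = (size1 + size2) ** exponent - size1 ** exponent - size2 ** exponent
    return number_links / divider


# With threshold 0.0 the exponent is 3, so two singletons give 2**3 - 1 - 1 = 6.
value = goodness_measure(number_links=3, size1=1, size2=1, threshold=0.0)
```

The normalization keeps large clusters from winning every merge merely because they have more cross-cluster point pairs.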