3 @brief Cluster analysis algorithm: DBSCAN. 4 @details Implementation based on paper @cite inproceedings::dbscan::1. 6 @authors Andrei Novikov (pyclustering@yandex.ru) 8 @copyright GNU Public License 10 @cond GNU_PUBLIC_LICENSE 11 PyClustering is free software: you can redistribute it and/or modify 12 it under the terms of the GNU General Public License as published by 13 the Free Software Foundation, either version 3 of the License, or 14 (at your option) any later version. 16 PyClustering is distributed in the hope that it will be useful, 17 but WITHOUT ANY WARRANTY; without even the implied warranty of 18 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 GNU General Public License for more details. 21 You should have received a copy of the GNU General Public License 22 along with this program. If not, see <http://www.gnu.org/licenses/>. 32 from pyclustering.core.wrapper
import ccore_library
34 import pyclustering.core.dbscan_wrapper
as wrapper
39 @brief Class represents clustering algorithm DBSCAN. 40 @details This DBSCAN algorithm is KD-tree optimized. 42 CCORE option can be used to use the pyclustering core - C/C++ shared library for processing that significantly increases performance. 46 # sample for cluster analysis (represented by list) 47 sample = read_sample(path_to_sample); 49 # create object that uses CCORE for processing 50 dbscan_instance = dbscan(sample, 0.5, 3, True); 53 dbscan_instance.process(); 55 # obtain results of clustering 56 clusters = dbscan_instance.get_clusters(); 57 noise = dbscan_instance.get_noise(); 62 def __init__(self, data, eps, neighbors, ccore = True, **kwargs):
64 @brief Constructor of clustering algorithm DBSCAN. 66 @param[in] data (list): Input data that is presented as list of points (objects), each point should be represented by list or tuple. 67 @param[in] eps (double): Connectivity radius between points, points may be connected if distance between them less then the radius. 68 @param[in] neighbors (uint): minimum number of shared neighbors that is required for establish links between points. 69 @param[in] ccore (bool): if True than DLL CCORE (C++ solution) will be used for solving the problem. 70 @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'data_type'). 72 <b>Keyword Args:</b><br> 73 - data_type (string): Data type of input sample 'data' that is processed by the algorithm ('points', 'distance_matrix'). 86 self.
__data_type = kwargs.get(
'data_type',
'points')
95 self.
__ccore = ccore_library.workable()
100 @brief Performs cluster analysis in line with rules of DBSCAN algorithm. 117 if cluster
is not None:
127 @brief Returns allocated clusters. 129 @remark Allocated clusters can be returned only after data processing (use method process()). Otherwise empty list is returned. 131 @return (list) List of allocated clusters, each cluster contains indexes of objects in list of data. 143 @brief Returns allocated noise. 145 @remark Allocated noise can be returned only after data processing (use method process() before). Otherwise empty list is returned. 147 @return (list) List of indexes that are marked as a noise. 159 @brief Returns clustering result representation type that indicate how clusters are encoded. 161 @return (type_encoding) Clustering result representation. 167 return type_encoding.CLUSTER_INDEX_LIST_SEPARATION
170 def __create_neighbor_searcher(self, data_type):
172 @brief Returns neighbor searcher in line with data type. 174 @param[in] data_type (string): Data type (points or distance matrix). 177 if data_type ==
'points':
179 elif data_type ==
'distance_matrix':
182 raise TypeError(
"Unknown type of data is specified '%s'" % data_type)
185 def __expand_cluster(self, index_point):
187 @brief Expands cluster from specified point in the input data space. 189 @param[in] index_point (list): Index of a point from the data. 191 @return (list) Return tuple of list of indexes that belong to the same cluster and list of points that are marked as noise: (cluster, noise), or None if nothing has been expanded. 200 cluster = [ index_point ]
211 neighbors += [k
for k
in next_neighbors
if ( (k
in neighbors) ==
False)
and k != index_point]
220 def __neighbor_indexes_points(self, index_point):
222 @brief Return neighbors of the specified object in case of sequence of points. 224 @param[in] index_point (uint): Index point whose neighbors are should be found. 226 @return (list) List of indexes of neighbors in line the connectivity radius. 230 return [node_tuple[1].payload
for node_tuple
in kdnodes
if node_tuple[1].payload != index_point]
233 def __neighbor_indexes_distance_matrix(self, index_point):
235 @brief Return neighbors of the specified object in case of distance matrix. 237 @param[in] index_point (uint): Index point whose neighbors are should be found. 239 @return (list) List of indexes of neighbors in line the connectivity radius. 243 return [index_neighbor
for index_neighbor
in range(len(distances))
244 if ((distances[index_neighbor] <= self.
__eps)
and (index_neighbor != index_point))]
Class represents clustering algorithm DBSCAN.
def get_clusters(self)
Returns allocated clusters.
Module for representing clustering results.
Represents KD Tree that is a space-partitioning data structure for organizing points in a k-dimension...
def __init__(self, data, eps, neighbors, ccore=True, kwargs)
Constructor of clustering algorithm DBSCAN.
def process(self)
Performs cluster analysis in line with rules of DBSCAN algorithm.
def get_cluster_encoding(self)
Returns clustering result representation type that indicate how clusters are encoded.
def __expand_cluster(self, index_point)
Expands cluster from specified point in the input data space.
def __neighbor_indexes_points(self, index_point)
Return neighbors of the specified object in case of sequence of points.
def __create_neighbor_searcher(self, data_type)
Returns neighbor searcher in line with data type.
def get_noise(self)
Returns allocated noise.
def __neighbor_indexes_distance_matrix(self, index_point)
Return neighbors of the specified object in case of distance matrix.