"""!

@brief Module provides various distance metrics - abstraction of the notion of distance in a metric space.

@authors Andrei Novikov (pyclustering@yandex.ru)
@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
@endcond

"""


import numpy

from enum import IntEnum


class type_metric(IntEnum):
    """!
    @brief Enumeration of supported metrics in the module for distance calculation between two points.

    """

    EUCLIDEAN = 0
    EUCLIDEAN_SQUARE = 1
    MANHATTAN = 2
    CHEBYSHEV = 3
    MINKOWSKI = 4
    CANBERRA = 5
    CHI_SQUARE = 6
    GOWER = 7
    USER_DEFINED = 1000


class distance_metric:
    """!
    @brief Distance metric performs distance calculation between two points in line with the encapsulated function,
            for example, Euclidean distance or Chebyshev distance, or even a user-defined one.

    Example of Euclidean distance metric:
    @code
        metric = distance_metric(type_metric.EUCLIDEAN)
        distance = metric([1.0, 2.5], [-1.2, 3.4])
    @endcode

    Example of Chebyshev distance metric:
    @code
        metric = distance_metric(type_metric.CHEBYSHEV)
        distance = metric([0.0, 0.0], [2.5, 6.0])
    @endcode

    In the following example an additional argument is specified that is specific to the Minkowski distance
    (generally, 'degree' is an optional argument that is equal to '2' by default):
    @code
        metric = distance_metric(type_metric.MINKOWSKI, degree=4)
        distance = metric([4.0, 9.2, 1.0], [3.4, 2.5, 6.2])
    @endcode

    A user may define their own function for distance calculation. In this case the input is two points. For example,
    to implement your own scaled version of Manhattan distance:
    @code
        from pyclustering.utils.metric import distance_metric, type_metric

        def my_manhattan(point1, point2):
            dimension = len(point1)
            result = 0.0
            for i in range(dimension):
                result += abs(point1[i] - point2[i]) * 0.1

            return result

        metric = distance_metric(type_metric.USER_DEFINED, func=my_manhattan)
        distance = metric([2.0, 3.0], [1.0, 3.0])
    @endcode

    """
    def __init__(self, metric_type, **kwargs):
        """!
        @brief Creates distance metric instance for calculating distance between two points.

        @param[in] metric_type (type_metric): Type of distance metric to use.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'numpy_usage', 'func' and the
                    corresponding additional arguments for specific metric types).

        <b>Keyword Args:</b><br>
            - func (callable): Callable object with two arguments (point #1 and point #2) or (object #1 and object #2)
               in case of numpy usage. This argument is used only if metric is 'type_metric.USER_DEFINED'.
            - degree (numeric): Only for 'type_metric.MINKOWSKI' - degree of Minkowski equation.
            - max_range (array_like): Only for 'type_metric.GOWER' - max range in each dimension. 'data' can be used
               instead of this parameter.
            - data (array_like): Only for 'type_metric.GOWER' - input data that is used for 'max_range' calculation.
               'max_range' can be used instead of this parameter.
            - numpy_usage (bool): If True then numpy is used for calculation (by default is False).

        """
        self.__type = metric_type
        self.__args = kwargs
        self.__func = self.__args.get('func', None)
        self.__numpy = self.__args.get('numpy_usage', False)

        self.__calculator = self.__create_distance_calculator()


    def __call__(self, point1, point2):
        """!
        @brief Calculates distance between two points.

        @param[in] point1 (list): The first point.
        @param[in] point2 (list): The second point.

        @return (double) Distance between two points.

        """
        return self.__calculator(point1, point2)


    def get_type(self):
        """!
        @brief Return type of distance metric that is used.

        @return (type_metric) Type of distance metric.

        """
        return self.__type


    def get_arguments(self):
        """!
        @brief Return additional arguments that are used by distance metric.

        @return (dict) Additional arguments.

        """
        return self.__args


    def get_function(self):
        """!
        @brief Return the user-defined function used to calculate the distance metric.

        @return (callable): User-defined distance metric function.

        """
        return self.__func


    def enable_numpy_usage(self):
        """!
        @brief Start using numpy for distance calculation.
        @details Useful when operating on matrices, to increase performance. No effect in case of
                  type_metric.USER_DEFINED type.

        """
        self.__numpy = True
        if self.__type != type_metric.USER_DEFINED:
            self.__calculator = self.__create_distance_calculator()


    def disable_numpy_usage(self):
        """!
        @brief Stop using numpy for distance calculation.
        @details Useful for a large number of small data portions, when the overhead of a numpy call exceeds the
                  calculation itself. No effect in case of type_metric.USER_DEFINED type.

        """
        self.__numpy = False
        self.__calculator = self.__create_distance_calculator()


    def __create_distance_calculator(self):
        """!
        @brief Creates distance metric calculator.

        @return (callable) Callable object of distance metric calculator.

        """
        if self.__numpy is True:
            return self.__create_distance_calculator_numpy()

        return self.__create_distance_calculator_basic()


    def __create_distance_calculator_basic(self):
        """!
        @brief Creates distance metric calculator that does not use numpy.

        @return (callable) Callable object of distance metric calculator.

        """
        if self.__type == type_metric.EUCLIDEAN:
            return euclidean_distance

        elif self.__type == type_metric.EUCLIDEAN_SQUARE:
            return euclidean_distance_square

        elif self.__type == type_metric.MANHATTAN:
            return manhattan_distance

        elif self.__type == type_metric.CHEBYSHEV:
            return chebyshev_distance

        elif self.__type == type_metric.MINKOWSKI:
            return lambda point1, point2: minkowski_distance(point1, point2, self.__args.get('degree', 2))

        elif self.__type == type_metric.CANBERRA:
            return canberra_distance

        elif self.__type == type_metric.CHI_SQUARE:
            return chi_square_distance

        elif self.__type == type_metric.GOWER:
            max_range = self.__get_gower_max_range()
            return lambda point1, point2: gower_distance(point1, point2, max_range)

        elif self.__type == type_metric.USER_DEFINED:
            return self.__func

        else:
            # Format the message explicitly; passing the value as a second argument leaves '%d' unformatted.
            raise ValueError("Unknown type of metric: '%d'" % self.__type)


    def __get_gower_max_range(self):
        """!
        @brief Returns max range for Gower distance using input parameters ('max_range' or 'data').

        @return (numpy.array) Max range for Gower distance.

        """
        max_range = self.__args.get('max_range', None)
        if max_range is None:
            data = self.__args.get('data', None)
            if data is None:
                raise ValueError("Gower distance requires 'data' or 'max_range' argument to construct metric.")

            max_range = numpy.max(data, axis=0) - numpy.min(data, axis=0)
            self.__args['max_range'] = max_range

        return max_range


    def __create_distance_calculator_numpy(self):
        """!
        @brief Creates distance metric calculator that uses numpy.

        @return (callable) Callable object of distance metric calculator.

        """
        if self.__type == type_metric.EUCLIDEAN:
            return euclidean_distance_numpy

        elif self.__type == type_metric.EUCLIDEAN_SQUARE:
            return euclidean_distance_square_numpy

        elif self.__type == type_metric.MANHATTAN:
            return manhattan_distance_numpy

        elif self.__type == type_metric.CHEBYSHEV:
            return chebyshev_distance_numpy

        elif self.__type == type_metric.MINKOWSKI:
            return lambda object1, object2: minkowski_distance_numpy(object1, object2, self.__args.get('degree', 2))

        elif self.__type == type_metric.CANBERRA:
            return canberra_distance_numpy

        elif self.__type == type_metric.CHI_SQUARE:
            return chi_square_distance_numpy

        elif self.__type == type_metric.GOWER:
            max_range = self.__get_gower_max_range()
            return lambda object1, object2: gower_distance_numpy(object1, object2, max_range)

        elif self.__type == type_metric.USER_DEFINED:
            return self.__func

        else:
            # Format the message explicitly; passing the value as a second argument leaves '%d' unformatted.
            raise ValueError("Unknown type of metric: '%d'" % self.__type)
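
The type-to-callable selection performed by the calculator factories can also be expressed as a table lookup. The following is a minimal sketch under stated assumptions, not the module's implementation: `Metric`, `CALCULATORS`, `create_calculator` and the two small distance helpers are hypothetical names introduced only for this illustration.

```python
from enum import IntEnum


class Metric(IntEnum):
    """Hypothetical stand-in for type_metric with only two members."""
    MANHATTAN = 0
    CHEBYSHEV = 1


def manhattan(point1, point2):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(point1, point2))


def chebyshev(point1, point2):
    # Largest absolute coordinate difference.
    return max(abs(a - b) for a, b in zip(point1, point2))


# Lookup table that replaces the if/elif chain (illustrative only).
CALCULATORS = {
    Metric.MANHATTAN: manhattan,
    Metric.CHEBYSHEV: chebyshev,
}


def create_calculator(metric_type):
    """Return the callable registered for metric_type or raise ValueError."""
    try:
        return CALCULATORS[metric_type]
    except KeyError:
        raise ValueError("Unknown type of metric: '%d'" % metric_type) from None


calculator = create_calculator(Metric.MANHATTAN)
distance = calculator([2.0, 3.0], [1.0, 3.0])
print(distance)  # 1.0
```

A table keeps the error path in one place, while the explicit chain makes it easier to build closures that capture extra arguments such as 'degree' or 'max_range'.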


def euclidean_distance(point1, point2):
    """!
    @brief Calculate Euclidean distance between two vectors.
    @details The Euclidean distance between vectors (points) a and b is calculated by the following formula:

    \f[
    dist(a, b) = \sqrt{ \sum_{i=0}^{N-1}(a_{i} - b_{i})^{2} };
    \f]

    where N is the length of each vector.

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.

    @return (double) Euclidean distance between two vectors.

    @see euclidean_distance_square, manhattan_distance, chebyshev_distance

    """
    distance = euclidean_distance_square(point1, point2)
    return distance ** 0.5


def euclidean_distance_numpy(object1, object2):
    """!
    @brief Calculate Euclidean distance between two objects using numpy.

    @param[in] object1 (array_like): The first array_like object.
    @param[in] object2 (array_like): The second array_like object.

    @return (double) Euclidean distance between two objects.

    """
    # The square root is applied to the summed squares; a per-coordinate root would yield Manhattan distance.
    return numpy.sqrt(numpy.sum(numpy.square(object1 - object2), axis=1)).T
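
As a quick sanity check of the Euclidean definition, the sketch below computes the distance from a single point to each row of a small data set in plain Python. The helper name `euclidean_rows` and the sample values are assumptions made for illustration and are not part of the module.

```python
import math


def euclidean_rows(point, rows):
    """Return the Euclidean distance from point to every row in rows.

    Hypothetical helper for illustration only.
    """
    return [
        math.sqrt(sum((p - r) ** 2 for p, r in zip(point, row)))
        for row in rows
    ]


distances = euclidean_rows([0.0, 0.0], [[3.0, 4.0], [6.0, 8.0]])
print(distances)  # [5.0, 10.0]
```

The square root is taken once per row, after the squared differences are summed; taking it per coordinate would produce the Manhattan distance (7.0 and 14.0 here) instead.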


def euclidean_distance_square(point1, point2):
    """!
    @brief Calculate square Euclidean distance between two vectors.

    \f[
    dist(a, b) = \sum_{i=0}^{N-1}(a_{i} - b_{i})^{2};
    \f]

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.

    @return (double) Square Euclidean distance between two vectors.

    @see euclidean_distance, manhattan_distance, chebyshev_distance

    """
    distance = 0.0
    for i in range(len(point1)):
        distance += (point1[i] - point2[i]) ** 2.0

    return distance


def euclidean_distance_square_numpy(object1, object2):
    """!
    @brief Calculate square Euclidean distance between two objects using numpy.

    @param[in] object1 (array_like): The first array_like object.
    @param[in] object2 (array_like): The second array_like object.

    @return (double) Square Euclidean distance between two objects.

    """
    return numpy.sum(numpy.square(object1 - object2), axis=1).T


def manhattan_distance(point1, point2):
    """!
    @brief Calculate Manhattan distance between two vectors.

    \f[
    dist(a, b) = \sum_{i=0}^{N-1}\left | a_{i} - b_{i} \right |;
    \f]

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.

    @return (double) Manhattan distance between two vectors.

    @see euclidean_distance_square, euclidean_distance, chebyshev_distance

    """
    distance = 0.0
    dimension = len(point1)

    for i in range(dimension):
        distance += abs(point1[i] - point2[i])

    return distance


def manhattan_distance_numpy(object1, object2):
    """!
    @brief Calculate Manhattan distance between two objects using numpy.

    @param[in] object1 (array_like): The first array_like object.
    @param[in] object2 (array_like): The second array_like object.

    @return (double) Manhattan distance between two objects.

    """
    return numpy.sum(numpy.absolute(object1 - object2), axis=1).T


def chebyshev_distance(point1, point2):
    """!
    @brief Calculate Chebyshev distance (maximum metric) between two vectors.
    @details Chebyshev distance is a metric defined on a vector space where the distance between two vectors is the
              greatest of their differences along any coordinate dimension.

    \f[
    dist(a, b) = \max_{i}\left (\left | a_{i} - b_{i} \right |\right );
    \f]

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.

    @return (double) Chebyshev distance between two vectors.

    @see euclidean_distance_square, euclidean_distance, minkowski_distance

    """
    distance = 0.0
    dimension = len(point1)

    for i in range(dimension):
        distance = max(distance, abs(point1[i] - point2[i]))

    return distance


def chebyshev_distance_numpy(object1, object2):
    """!
    @brief Calculate Chebyshev distance between two objects using numpy.

    @param[in] object1 (array_like): The first array_like object.
    @param[in] object2 (array_like): The second array_like object.

    @return (double) Chebyshev distance between two objects.

    """
    return numpy.max(numpy.absolute(object1 - object2), axis=1).T
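
To make the "greatest coordinate difference" definition concrete, here is a minimal plain-Python sketch evaluated on the points from the Chebyshev usage example of the distance_metric class. The function name `chebyshev` is an assumption for this illustration, not a module function.

```python
def chebyshev(point1, point2):
    """Largest absolute difference along any coordinate (illustrative helper)."""
    return max(abs(a - b) for a, b in zip(point1, point2))


# Coordinate differences are 2.5 and 6.0, so the maximum is 6.0.
result = chebyshev([0.0, 0.0], [2.5, 6.0])
print(result)  # 6.0
```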


def minkowski_distance(point1, point2, degree=2):
    """!
    @brief Calculate Minkowski distance between two vectors.

    \f[
    dist(a, b) = \sqrt[p]{ \sum_{i=0}^{N-1}\left | a_{i} - b_{i} \right |^{p} };
    \f]

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.
    @param[in] degree (numeric): Degree that is used for Minkowski distance.

    @return (double) Minkowski distance between two vectors.

    @see euclidean_distance

    """
    distance = 0.0
    for i in range(len(point1)):
        # The absolute value keeps every term non-negative for odd degrees.
        distance += abs(point1[i] - point2[i]) ** degree

    return distance ** (1.0 / degree)


def minkowski_distance_numpy(object1, object2, degree=2):
    """!
    @brief Calculate Minkowski distance between objects using numpy.

    @param[in] object1 (array_like): The first array_like object.
    @param[in] object2 (array_like): The second array_like object.
    @param[in] degree (numeric): Degree that is used for Minkowski distance.

    @return (double) Minkowski distance between two objects.

    """
    # The root is applied to the summed powers, not to each coordinate separately.
    return numpy.power(numpy.sum(numpy.power(numpy.absolute(object1 - object2), degree), axis=1), 1.0 / degree).T
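
The Minkowski distance generalizes several other metrics in this module: degree 1 gives the Manhattan distance, degree 2 gives the Euclidean distance, and large degrees approach the Chebyshev distance. The sketch below illustrates this in plain Python; the helper name `minkowski` and the sample points are assumptions for the illustration.

```python
def minkowski(point1, point2, degree=2):
    """Minkowski distance with absolute differences (illustrative helper)."""
    total = sum(abs(a - b) ** degree for a, b in zip(point1, point2))
    return total ** (1.0 / degree)


a, b = [0.0, 0.0], [3.0, 4.0]

manhattan_like = minkowski(a, b, degree=1)   # |3| + |4|
euclidean_like = minkowski(a, b, degree=2)   # sqrt(9 + 16)
chebyshev_like = minkowski(a, b, degree=64)  # close to max(3, 4)

print(manhattan_like)  # 7.0
print(euclidean_like)  # 5.0
```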


def canberra_distance(point1, point2):
    """!
    @brief Calculate Canberra distance between two vectors.

    \f[
    dist(a, b) = \sum_{i=0}^{N-1}\frac{\left | a_{i} - b_{i} \right |}{\left | a_{i} \right | + \left | b_{i} \right |};
    \f]

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.

    @return (float) Canberra distance between two objects.

    """
    distance = 0.0
    for i in range(len(point1)):
        divider = abs(point1[i]) + abs(point2[i])
        if divider == 0.0:
            continue

        distance += abs(point1[i] - point2[i]) / divider

    return distance


def canberra_distance_numpy(object1, object2):
    """!
    @brief Calculate Canberra distance between two objects using numpy.

    @param[in] object1 (array_like): The first vector.
    @param[in] object2 (array_like): The second vector.

    @return (float) Canberra distance between two objects.

    """
    with numpy.errstate(divide='ignore', invalid='ignore'):
        result = numpy.divide(numpy.abs(object1 - object2), numpy.abs(object1) + numpy.abs(object2))

    if len(result.shape) > 1:
        return numpy.sum(numpy.nan_to_num(result), axis=1).T

    return numpy.sum(numpy.nan_to_num(result))
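
The Canberra term is undefined when both coordinates are zero (0/0); the numpy version maps the resulting NaN to zero. A minimal plain-Python sketch of the same convention follows, where the function name `canberra` and the sample points are assumptions for illustration.

```python
def canberra(point1, point2):
    """Canberra distance that skips coordinates where both values are zero."""
    distance = 0.0
    for a, b in zip(point1, point2):
        divider = abs(a) + abs(b)
        if divider == 0.0:
            continue  # a 0/0 term contributes nothing
        distance += abs(a - b) / divider
    return distance


# Terms: skipped (0/0), |1 - 3| / (1 + 3) = 0.5, |2 - 2| / (2 + 2) = 0.0
result = canberra([0.0, 1.0, 2.0], [0.0, 3.0, 2.0])
print(result)  # 0.5
```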


def chi_square_distance(point1, point2):
    """!
    @brief Calculate Chi square distance between two vectors.

    \f[
    dist(a, b) = \sum_{i=0}^{N-1}\frac{\left ( a_{i} - b_{i} \right )^{2}}{\left | a_{i} \right | + \left | b_{i} \right |};
    \f]

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.

    @return (float) Chi square distance between two objects.

    """
    distance = 0.0
    for i in range(len(point1)):
        divider = abs(point1[i]) + abs(point2[i])
        if divider != 0.0:
            distance += ((point1[i] - point2[i]) ** 2.0) / divider

    return distance


def chi_square_distance_numpy(object1, object2):
    """!
    @brief Calculate Chi square distance between two vectors using numpy.

    @param[in] object1 (array_like): The first vector.
    @param[in] object2 (array_like): The second vector.

    @return (float) Chi square distance between two objects.

    """
    with numpy.errstate(divide='ignore', invalid='ignore'):
        result = numpy.divide(numpy.power(object1 - object2, 2), numpy.abs(object1) + numpy.abs(object2))

    if len(result.shape) > 1:
        return numpy.sum(numpy.nan_to_num(result), axis=1).T

    return numpy.sum(numpy.nan_to_num(result))
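
A short worked example of the Chi square formula, written in plain Python. The function name `chi_square` and the sample points are assumptions made for this sketch; coordinates whose denominator is zero are skipped, matching the zero-handling used by the functions above.

```python
def chi_square(point1, point2):
    """Chi square distance that skips zero denominators (illustrative helper)."""
    distance = 0.0
    for a, b in zip(point1, point2):
        divider = abs(a) + abs(b)
        if divider != 0.0:
            distance += ((a - b) ** 2.0) / divider
    return distance


# Terms: skipped (0/0), (1 - 3)^2 / (1 + 3) = 1.0, (2 - 2)^2 / (2 + 2) = 0.0
result = chi_square([0.0, 1.0, 2.0], [0.0, 3.0, 2.0])
print(result)  # 1.0
```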


def gower_distance(point1, point2, max_range):
    """!
    @brief Calculate Gower distance between two vectors.
    @details Implementation is based on the paper @cite article::utils::metric::gower. Gower distance is calculated
              using the following formula:

    \f[
    dist\left ( a, b \right )=\frac{1}{p}\sum_{i=0}^{p-1}\frac{\left | a_{i} - b_{i} \right |}{R_{i}},
    \f]

    where \f$R_{i}\f$ is the max range for the i-th dimension. \f$R\f$ is defined by the following formula:

    \f[
    R=max\left ( X \right )-min\left ( X \right )
    \f]

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.
    @param[in] max_range (array_like): Max range in each data dimension.

    @return (float) Gower distance between two objects.

    """
    distance = 0.0
    dimensions = len(point1)
    for i in range(dimensions):
        if max_range[i] != 0.0:
            distance += abs(point1[i] - point2[i]) / max_range[i]

    return distance / dimensions


def gower_distance_numpy(point1, point2, max_range):
    """!
    @brief Calculate Gower distance between two vectors using numpy.

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.
    @param[in] max_range (array_like): Max range in each data dimension.

    @return (float) Gower distance between two objects.

    """
    with numpy.errstate(divide='ignore', invalid='ignore'):
        result = numpy.divide(numpy.abs(point1 - point2), max_range)

    if len(result.shape) > 1:
        # Normalize by the number of dimensions (columns), not by the number of rows.
        return numpy.sum(numpy.nan_to_num(result), axis=1).T / result.shape[1]

    return numpy.sum(numpy.nan_to_num(result)) / len(point1)
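
The Gower distance depends on the per-dimension range R = max(X) - min(X), which can be derived from the data when 'max_range' is not supplied. The sketch below walks through both steps in plain Python; the names `column_ranges` and `gower` and the sample data are assumptions for this illustration, not module functions.

```python
def column_ranges(data):
    """Per-dimension range max(X) - min(X) over all rows (illustrative helper)."""
    return [max(column) - min(column) for column in zip(*data)]


def gower(point1, point2, max_range):
    """Mean of range-normalized absolute differences; zero ranges are skipped."""
    total = 0.0
    for a, b, r in zip(point1, point2, max_range):
        if r != 0.0:
            total += abs(a - b) / r
    return total / len(point1)


data = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]
max_range = column_ranges(data)
print(max_range)  # [4.0, 20.0]

# (|1 - 3| / 4 + |10 - 30| / 20) / 2 = (0.5 + 1.0) / 2
result = gower(data[0], data[1], max_range)
print(result)  # 0.75
```

Because every term is divided by its own range, dimensions measured on very different scales contribute comparably to the result.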
def __create_distance_calculator_basic(self)
Creates distance metric calculator that does not use numpy.
def chi_square_distance(point1, point2)
Calculate Chi square distance between two vectors.
def get_arguments(self)
Return additional arguments that are used by distance metric.
def euclidean_distance_square(point1, point2)
Calculate square Euclidean distance between two vectors.
def minkowski_distance_numpy(object1, object2, degree=2)
Calculate Minkowski distance between objects using numpy.
def chi_square_distance_numpy(object1, object2)
Calculate Chi square distance between two vectors using numpy.
def __create_distance_calculator(self)
Creates distance metric calculator.
def get_type(self)
Return type of distance metric that is used.
def chebyshev_distance_numpy(object1, object2)
Calculate Chebyshev distance between two objects using numpy.
Distance metric performs distance calculation between two points in line with encapsulated function...
def manhattan_distance_numpy(object1, object2)
Calculate Manhattan distance between two objects using numpy.
def __init__(self, metric_type, **kwargs)
Creates distance metric instance for calculating distance between two points.
def gower_distance(point1, point2, max_range)
Calculate Gower distance between two vectors.
def get_function(self)
Return the user-defined function used to calculate the distance metric.
def gower_distance_numpy(point1, point2, max_range)
Calculate Gower distance between two vectors using numpy.
def disable_numpy_usage(self)
Stop using numpy for distance calculation.
def canberra_distance(point1, point2)
Calculate Canberra distance between two vectors.
def canberra_distance_numpy(object1, object2)
Calculate Canberra distance between two objects using numpy.
def euclidean_distance_square_numpy(object1, object2)
Calculate square Euclidean distance between two objects using numpy.
def __call__(self, point1, point2)
Calculates distance between two points.
def __create_distance_calculator_numpy(self)
Creates distance metric calculator that uses numpy.
def euclidean_distance(point1, point2)
Calculate Euclidean distance between two vectors.
def manhattan_distance(point1, point2)
Calculate Manhattan distance between two vectors.
def minkowski_distance(point1, point2, degree=2)
Calculate Minkowski distance between two vectors.
def euclidean_distance_numpy(object1, object2)
Calculate Euclidean distance between two objects using numpy.
def enable_numpy_usage(self)
Start using numpy for distance calculation.
def __get_gower_max_range(self)
Returns max range for Gower distance using input parameters ('max_range' or 'data').
Enumeration of supported metrics in the module for distance calculation between two points...
def chebyshev_distance(point1, point2)
Calculate Chebyshev distance (maximum metric) between two vectors.