"""!

@brief This module provides various distance metrics, i.e. abstractions of the notion of distance in a metric space.

@authors Andrei Novikov (pyclustering@yandex.ru)
@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program. If not, see <http://www.gnu.org/licenses/>.
@endcond

"""


import numpy

from enum import IntEnum
class type_metric(IntEnum):
    """!
    @brief Enumeration of the metrics supported by the module for calculating the distance between two points.

    """
    ...


class distance_metric:
    """!
    @brief Distance metric performs distance calculation between two points in line with the encapsulated function, for
            example, Euclidean distance, Chebyshev distance, or even a user-defined one.

    @details
    Example of Euclidean distance metric:
    @code
        metric = distance_metric(type_metric.EUCLIDEAN)
        distance = metric([1.0, 2.5], [-1.2, 3.4])
    @endcode

    Example of Chebyshev distance metric:
    @code
        metric = distance_metric(type_metric.CHEBYSHEV)
        distance = metric([0.0, 0.0], [2.5, 6.0])
    @endcode

    In the following example an additional argument that is specific to the Minkowski distance is specified
    ('degree' is an optional argument that is equal to 2 by default):
    @code
        metric = distance_metric(type_metric.MINKOWSKI, degree=4)
        distance = metric([4.0, 9.2, 1.0], [3.4, 2.5, 6.2])
    @endcode

    Users may define their own function for distance calculation. In this case the input is two points. For example,
    to implement your own version of the Manhattan distance:
    @code
        from pyclustering.utils.metric import distance_metric, type_metric

        def my_manhattan(point1, point2):
            dimension = len(point1)
            result = 0.0
            for i in range(dimension):
                result += abs(point1[i] - point2[i]) * 0.1
            return result

        metric = distance_metric(type_metric.USER_DEFINED, func=my_manhattan)
        distance = metric([2.0, 3.0], [1.0, 3.0])
    @endcode

    """

    def __init__(self, metric_type, **kwargs):
        """!
        @brief Creates a distance metric instance for calculating the distance between two points.

        @param[in] metric_type (type_metric): Type of the metric that is used for distance calculation.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'numpy_usage', 'func' and the corresponding
                    additional argument for specific metric types).

        <b>Keyword Args:</b><br>
            - func (callable): Callable object with two arguments (point #1 and point #2), or (object #1 and object #2) in case of numpy usage.
                                This argument is used only if the metric is 'type_metric.USER_DEFINED'.
            - degree (numeric): Only for 'type_metric.MINKOWSKI': the degree of the Minkowski equation.
            - numpy_usage (bool): If True then numpy is used for calculation (False by default).

        """
        ...

    def __call__(self, point1, point2):
        """!
        @brief Calculates the distance between two points.

        @param[in] point1 (list): The first point.
        @param[in] point2 (list): The second point.

        @return (double) Distance between two points.

        """
        ...

    def get_type(self):
        """!
        @brief Returns the type of distance metric that is used.

        @return (type_metric) Type of distance metric.

        """
        ...

    def get_arguments(self):
        """!
        @brief Returns the additional arguments that are used by the distance metric.

        @return (dict) Additional arguments.

        """
        ...

    def get_function(self):
        """!
        @brief Returns the user-defined function for distance calculation.

        @return (callable) User-defined distance metric function.

        """
        ...

    def enable_numpy_usage(self):
        """!
        @brief Starts using numpy for distance calculation.
        @details Useful for increasing performance when processing matrices. Has no effect for the type_metric.USER_DEFINED type.

        """
        if self.__type != type_metric.USER_DEFINED:
            ...
    def disable_numpy_usage(self):
        """!
        @brief Stops using numpy for distance calculation.
        @details Useful when processing many small portions of data, where the overhead of a numpy call exceeds
                  the cost of the calculation itself. Has no effect for the type_metric.USER_DEFINED type.

        """
        ...

    def __create_distance_calculator(self):
        """!
        @brief Creates the distance metric calculator.

        @return (callable) Callable object of the distance metric calculator.

        """
        ...

    def __create_distance_calculator_basic(self):
        """!
        @brief Creates a distance metric calculator that does not use numpy.

        @return (callable) Callable object of the distance metric calculator.

        """
        if self.__type == type_metric.EUCLIDEAN:
            return euclidean_distance

        elif self.__type == type_metric.EUCLIDEAN_SQUARE:
            return euclidean_distance_square

        elif self.__type == type_metric.MANHATTAN:
            return manhattan_distance

        elif self.__type == type_metric.CHEBYSHEV:
            return chebyshev_distance

        elif self.__type == type_metric.MINKOWSKI:
            ...

        elif self.__type == type_metric.CANBERRA:
            return canberra_distance

        elif self.__type == type_metric.CHI_SQUARE:
            return chi_square_distance

        elif self.__type == type_metric.USER_DEFINED:
            ...

        else:
            # Format the message; passing the value as a second argument would leave '%d' unformatted
            raise ValueError("Unknown type of metric: '%d'" % self.__type)
    def __create_distance_calculator_numpy(self):
        """!
        @brief Creates a distance metric calculator that uses numpy.

        @return (callable) Callable object of the distance metric calculator.

        """
        if self.__type == type_metric.EUCLIDEAN:
            return euclidean_distance_numpy

        elif self.__type == type_metric.EUCLIDEAN_SQUARE:
            return euclidean_distance_square_numpy

        elif self.__type == type_metric.MANHATTAN:
            return manhattan_distance_numpy

        elif self.__type == type_metric.CHEBYSHEV:
            return chebyshev_distance_numpy

        elif self.__type == type_metric.MINKOWSKI:
            ...

        elif self.__type == type_metric.CANBERRA:
            return canberra_distance_numpy

        elif self.__type == type_metric.CHI_SQUARE:
            return chi_square_distance_numpy

        elif self.__type == type_metric.USER_DEFINED:
            ...

        else:
            # Format the message; passing the value as a second argument would leave '%d' unformatted
            raise ValueError("Unknown type of metric: '%d'" % self.__type)
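The if/elif chains in the calculator factories select a plain function for each metric type. As an illustration only, the same selection can be written as a lookup table. `Metric`, `minkowski` and `make_calculator` below are hypothetical names that are not part of pyclustering. The Minkowski entry assumes an optional `degree` keyword that defaults to 2, as the class documentation describes:

```python
from enum import IntEnum
from functools import partial


class Metric(IntEnum):
    # Hypothetical stand-in for type_metric; member values are arbitrary here
    EUCLIDEAN = 0
    MANHATTAN = 1
    CHEBYSHEV = 2
    MINKOWSKI = 3


def minkowski(p1, p2, degree=2):
    # p-th root of the sum of |a_i - b_i|^p
    return sum(abs(a - b) ** degree for a, b in zip(p1, p2)) ** (1.0 / degree)


def make_calculator(metric_type, **kwargs):
    """Return a two-argument distance callable for metric_type (hypothetical helper)."""
    table = {
        Metric.EUCLIDEAN: partial(minkowski, degree=2),
        Metric.MANHATTAN: partial(minkowski, degree=1),
        Metric.CHEBYSHEV: lambda p1, p2: max(abs(a - b) for a, b in zip(p1, p2)),
        # Extra arguments are bound once, when the calculator is created
        Metric.MINKOWSKI: partial(minkowski, degree=kwargs.get("degree", 2)),
    }
    try:
        return table[metric_type]
    except KeyError:
        raise ValueError("Unknown type of metric: '%s'" % metric_type) from None


calc = make_calculator(Metric.MINKOWSKI, degree=3)
print(calc([0.0, 0.0], [3.0, 4.0]))
```

A table keeps the type-to-function mapping in one place. This sketch rebuilds the table on every call, which is acceptable for a factory that runs once per metric object.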
def euclidean_distance(point1, point2):
    """!
    @brief Calculate the Euclidean distance between two vectors.
    @details The Euclidean distance between vectors (points) a and b is calculated by the following formula:

    \f[
    dist(a, b) = \sqrt{ \sum_{i=0}^{N-1}(a_{i} - b_{i})^{2} };
    \f]

    where N is the length of each vector.

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.

    @return (double) Euclidean distance between two vectors.

    @see euclidean_distance_square, manhattan_distance, chebyshev_distance

    """
    distance = euclidean_distance_square(point1, point2)
    return distance ** 0.5
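As a worked check of the formula, the sketch below evaluates it for the points used in the class documentation's Euclidean example. It is a standalone re-implementation written for illustration, cross-checked against `math.dist` from the standard library (Python 3.8+):

```python
import math


def euclidean_distance(point1, point2):
    # Square root of the summed squared coordinate differences
    return sum((a - b) ** 2 for a, b in zip(point1, point2)) ** 0.5


# Differences are 2.2 and -0.9, so the sum of squares is 4.84 + 0.81 = 5.65
d = euclidean_distance([1.0, 2.5], [-1.2, 3.4])
assert math.isclose(d, math.dist([1.0, 2.5], [-1.2, 3.4]))
print(d)
```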
def euclidean_distance_numpy(object1, object2):
    """!
    @brief Calculate the Euclidean distance between two objects using numpy.

    @param[in] object1 (array_like): The first array_like object.
    @param[in] object2 (array_like): The second array_like object.

    @return (double) Euclidean distance between two objects.

    """
    # Sum the squared differences along each row first, then take the square root
    return numpy.sqrt(numpy.sum(numpy.square(object1 - object2), axis=1)).T
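For the numpy variant, the order of operations matters: the square root has to be applied after summing along axis 1. Applied element-wise before the sum, it turns each term into |a_i - b_i|, which gives the Manhattan distance instead. The sketch below is a standalone function written for illustration. It compares both orders on a small matrix of points measured against one center, relying on numpy broadcasting:

```python
import numpy


def euclidean_distance_numpy(object1, object2):
    # Row-wise: sum squared differences along axis 1, then take the square root
    return numpy.sqrt(numpy.sum(numpy.square(object1 - object2), axis=1)).T


points = numpy.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
center = numpy.array([0.0, 0.0])

# One distance per row of `points`
print(euclidean_distance_numpy(points, center))

# Taking the square root before summing yields sum(|a_i - b_i|), i.e. Manhattan distance
manhattan_like = numpy.sum(numpy.sqrt(numpy.square(points - center)), axis=1)
print(manhattan_like)
```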
def euclidean_distance_square(point1, point2):
    """!
    @brief Calculate the squared Euclidean distance between two vectors.

    \f[
    dist(a, b) = \sum_{i=0}^{N-1}(a_{i} - b_{i})^{2};
    \f]

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.

    @return (double) Squared Euclidean distance between two vectors.

    @see euclidean_distance, manhattan_distance, chebyshev_distance

    """
    distance = 0.0
    for i in range(len(point1)):
        distance += (point1[i] - point2[i]) ** 2.0

    return distance
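The square root is monotonic on non-negative numbers, so the squared Euclidean distance ranks candidates in the same order as the Euclidean distance while skipping the root. This makes it a natural choice for comparisons such as nearest-neighbour search. A small standalone sketch, where the query and candidate points are made up for illustration:

```python
def euclidean_distance_square(point1, point2):
    # Sum of squared coordinate differences, without the square root
    return sum((a - b) ** 2 for a, b in zip(point1, point2))


query = [1.0, 1.0]
candidates = [[4.0, 5.0], [2.0, 0.0], [0.0, 3.0]]

squared = [euclidean_distance_square(query, c) for c in candidates]
rooted = [s ** 0.5 for s in squared]

# The index of the nearest candidate is identical under both measures
nearest_sq = min(range(len(candidates)), key=lambda i: squared[i])
nearest = min(range(len(candidates)), key=lambda i: rooted[i])
print(squared, nearest_sq, nearest)
```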
def euclidean_distance_square_numpy(object1, object2):
    """!
    @brief Calculate the squared Euclidean distance between two objects using numpy.

    @param[in] object1 (array_like): The first array_like object.
    @param[in] object2 (array_like): The second array_like object.

    @return (double) Squared Euclidean distance between two objects.

    """
    return numpy.sum(numpy.square(object1 - object2), axis=1).T
def manhattan_distance(point1, point2):
    """!
    @brief Calculate the Manhattan distance between two vectors.

    \f[
    dist(a, b) = \sum_{i=0}^{N-1}\left | a_{i} - b_{i} \right |;
    \f]

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.

    @return (double) Manhattan distance between two vectors.

    @see euclidean_distance_square, euclidean_distance, chebyshev_distance

    """
    distance = 0.0
    dimension = len(point1)

    for i in range(dimension):
        distance += abs(point1[i] - point2[i])

    return distance
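For the pair of points used in the class documentation's user-defined example, the Manhattan distance is |2 - 1| + |3 - 3| = 1. The `my_manhattan` function from that example scales every term by 0.1, so it returns one tenth of that value. Both functions below are standalone re-implementations written for illustration:

```python
def manhattan_distance(point1, point2):
    # Sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(point1, point2))


def my_manhattan(point1, point2):
    # The scaled variant from the distance_metric documentation example
    return sum(abs(a - b) * 0.1 for a, b in zip(point1, point2))


p, q = [2.0, 3.0], [1.0, 3.0]
print(manhattan_distance(p, q), my_manhattan(p, q))
```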
def manhattan_distance_numpy(object1, object2):
    """!
    @brief Calculate the Manhattan distance between two objects using numpy.

    @param[in] object1 (array_like): The first array_like object.
    @param[in] object2 (array_like): The second array_like object.

    @return (double) Manhattan distance between two objects.

    """
    return numpy.sum(numpy.absolute(object1 - object2), axis=1).T
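The numpy variants that sum with `axis=1` expect at least one 2-D operand, typically a matrix of points compared against a single point via broadcasting. The sketch below uses a standalone copy of the function written for illustration. It shows the row-wise result, how to promote a single 1-D pair with `numpy.atleast_2d`, and the error that a pure 1-D pair raises:

```python
import numpy


def manhattan_distance_numpy(object1, object2):
    # Row-wise sum of absolute differences; axis=1 assumes 2-D input
    return numpy.sum(numpy.absolute(object1 - object2), axis=1).T


data = numpy.array([[1.0, 2.0], [4.0, 6.0]])
point = numpy.array([1.0, 1.0])

# Broadcasting subtracts `point` from every row of `data`
row_distances = manhattan_distance_numpy(data, point)

# A single pair of 1-D vectors has no axis 1, so promote the first one to 2-D
single = manhattan_distance_numpy(numpy.atleast_2d([0.0, 0.0]), numpy.array([3.0, 4.0]))

try:
    manhattan_distance_numpy(numpy.array([0.0, 0.0]), numpy.array([3.0, 4.0]))
except ValueError as error:  # numpy's AxisError derives from ValueError
    print("1-D input rejected:", error)
```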
def chebyshev_distance(point1, point2):
    """!
    @brief Calculate the Chebyshev distance between two vectors.

    \f[
    dist(a, b) = \max_{i}\left(\left| a_{i} - b_{i} \right|\right);
    \f]

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.

    @return (double) Chebyshev distance between two vectors.

    @see euclidean_distance_square, euclidean_distance, minkowski_distance

    """
    distance = 0.0
    dimension = len(point1)

    for i in range(dimension):
        distance = max(distance, abs(point1[i] - point2[i]))

    return distance
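The Manhattan, Euclidean and Chebyshev distances are the Minkowski distance with degree 1, degree 2 and the limit of large degree, respectively. The distance never increases as the degree grows. A standalone sketch written for illustration, using the points from the class documentation's Chebyshev example:

```python
def minkowski(p1, p2, degree):
    # p-th root of the sum of |a_i - b_i|^p
    return sum(abs(a - b) ** degree for a, b in zip(p1, p2)) ** (1.0 / degree)


def chebyshev(p1, p2):
    # Largest absolute coordinate difference
    return max(abs(a - b) for a, b in zip(p1, p2))


a, b = [0.0, 0.0], [2.5, 6.0]
vals = [minkowski(a, b, d) for d in (1, 2, 4, 16, 64)]

# degree 1 -> Manhattan (8.5), degree 2 -> Euclidean (6.5), large degree -> Chebyshev (6.0)
for degree, value in zip((1, 2, 4, 16, 64), vals):
    print(degree, value)
print("chebyshev", chebyshev(a, b))
```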
def chebyshev_distance_numpy(object1, object2):
    """!
    @brief Calculate the Chebyshev distance between two objects using numpy.

    @param[in] object1 (array_like): The first array_like object.
    @param[in] object2 (array_like): The second array_like object.

    @return (double) Chebyshev distance between two objects.

    """
    return numpy.max(numpy.absolute(object1 - object2), axis=1).T
def minkowski_distance(point1, point2, degree=2):
    """!
    @brief Calculate the Minkowski distance between two vectors.

    \f[
    dist(a, b) = \sqrt[p]{ \sum_{i=0}^{N-1}\left| a_{i} - b_{i} \right|^{p} };
    \f]

    where p is the degree.

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.
    @param[in] degree (numeric): Degree that is used for the Minkowski distance.

    @return (double) Minkowski distance between two vectors.

    @see euclidean_distance

    """
    distance = 0.0
    for i in range(len(point1)):
        # abs() is required: for odd degrees, signed differences would cancel out
        distance += abs(point1[i] - point2[i]) ** degree

    return distance ** (1.0 / degree)
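The absolute value in the Minkowski formula matters whenever the degree is odd. Without it, differences of opposite sign cancel, and two distinct points can end up at distance zero. A negative sum would even produce a complex number under Python 3's `**` operator. A standalone sketch written for illustration:

```python
def minkowski_no_abs(p1, p2, degree):
    # Without abs(): odd powers keep the sign of each difference
    return sum((a - b) ** degree for a, b in zip(p1, p2)) ** (1.0 / degree)


def minkowski_distance(p1, p2, degree=2):
    # Correct form: |a_i - b_i|^p summed, then the p-th root
    return sum(abs(a - b) ** degree for a, b in zip(p1, p2)) ** (1.0 / degree)


x, y = [0.0, 0.0], [1.0, -1.0]
print(minkowski_distance(x, y, 3))  # cube root of 2
print(minkowski_no_abs(x, y, 3))    # (-1)**3 + 1**3 cancels to 0
```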
def minkowski_distance_numpy(object1, object2, degree=2):
    """!
    @brief Calculate the Minkowski distance between two objects using numpy.

    @param[in] object1 (array_like): The first array_like object.
    @param[in] object2 (array_like): The second array_like object.
    @param[in] degree (numeric): Degree that is used for the Minkowski distance.

    @return (double) Minkowski distance between two objects.

    """
    # Raise |diff| to the degree, sum along each row, then take the degree-th root of the sum
    return numpy.power(numpy.sum(numpy.power(numpy.absolute(object1 - object2), degree), axis=1), 1.0 / degree).T
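In the vectorized form, the root has to be taken once per row, after the sum. Taking the root of each term first collapses every degree to the Manhattan distance. The sketch below is a standalone function written for illustration. It checks the row-wise result against values computed by hand, and with degree 2 it reproduces the Euclidean distance:

```python
import numpy


def minkowski_distance_numpy(object1, object2, degree=2):
    # Row-wise: |diff|^p summed along axis 1, then the p-th root of each row sum
    return numpy.power(numpy.sum(numpy.power(numpy.absolute(object1 - object2), degree), axis=1), 1.0 / degree).T


data = numpy.array([[0.0, 0.0], [3.0, 4.0], [1.0, -1.0]])
origin = numpy.zeros(2)

d2 = minkowski_distance_numpy(data, origin)            # equals the Euclidean distance
d3 = minkowski_distance_numpy(data, origin, degree=3)  # cube root of summed |diff|^3
print(d2, d3)
```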
def canberra_distance(point1, point2):
    """!
    @brief Calculate the Canberra distance between two vectors.

    \f[
    dist(a, b) = \sum_{i=0}^{N-1}\frac{\left | a_{i} - b_{i} \right |}{\left | a_{i} \right | + \left | b_{i} \right |};
    \f]

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.

    @return (float) Canberra distance between two vectors.

    """
    distance = 0.0
    for i in range(len(point1)):
        divider = abs(point1[i]) + abs(point2[i])
        if divider == 0.0:
            # Both coordinates are zero: skip the 0/0 term
            continue

        distance += abs(point1[i] - point2[i]) / divider

    return distance
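Each Canberra term lies between 0 and 1. A term reaches 1 when the two coordinates have opposite signs or one of them is zero, so the distance between N-dimensional vectors is at most N. Coordinates that are both zero contribute nothing. A standalone sketch written for illustration, which skips 0/0 terms in the same way:

```python
def canberra_distance(point1, point2):
    distance = 0.0
    for a, b in zip(point1, point2):
        divider = abs(a) + abs(b)
        if divider == 0.0:
            # Both coordinates are zero: the 0/0 term is treated as 0
            continue
        distance += abs(a - b) / divider
    return distance


# Terms: skipped, |1 - 3| / 4 = 0.5, |2 + 2| / 4 = 1.0
print(canberra_distance([0.0, 1.0, 2.0], [0.0, 3.0, -2.0]))
# Opposite signs in every coordinate give the maximum value N = 2
print(canberra_distance([1.0, 2.0], [-1.0, -2.0]))
```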
def canberra_distance_numpy(object1, object2):
    """!
    @brief Calculate the Canberra distance between two objects using numpy.

    @param[in] object1 (array_like): The first vector.
    @param[in] object2 (array_like): The second vector.

    @return (float) Canberra distance between two objects.

    """
    with numpy.errstate(divide='ignore', invalid='ignore'):
        result = numpy.divide(numpy.abs(object1 - object2), numpy.abs(object1) + numpy.abs(object2))

    if len(result.shape) > 1:
        return numpy.sum(numpy.nan_to_num(result), axis=1).T

    return numpy.sum(numpy.nan_to_num(result))
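In the numpy variant, a 0/0 term becomes `nan` (the warning is silenced by `numpy.errstate`), and `numpy.nan_to_num` maps it to 0 before summing. The result therefore matches skipping such terms. The shape check lets the function accept both a single 1-D pair and a matrix of rows. A quick standalone check written for illustration:

```python
import numpy


def canberra_distance_numpy(object1, object2):
    with numpy.errstate(divide='ignore', invalid='ignore'):
        result = numpy.divide(numpy.abs(object1 - object2), numpy.abs(object1) + numpy.abs(object2))
    # 0/0 yields nan; nan_to_num replaces it with 0 before summing
    if len(result.shape) > 1:
        return numpy.sum(numpy.nan_to_num(result), axis=1).T
    return numpy.sum(numpy.nan_to_num(result))


a = numpy.array([0.0, 1.0, 2.0])
b = numpy.array([0.0, 3.0, -2.0])

single = canberra_distance_numpy(a, b)               # 1-D pair -> scalar
rows = canberra_distance_numpy(numpy.array([a, b]), a)  # 2-D input -> one value per row
print(single, rows)
```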
def chi_square_distance(point1, point2):
    """!
    @brief Calculate the Chi square distance between two vectors.

    \f[
    dist(a, b) = \sum_{i=0}^{N-1}\frac{\left ( a_{i} - b_{i} \right )^{2}}{\left | a_{i} \right | + \left | b_{i} \right |};
    \f]

    @param[in] point1 (array_like): The first vector.
    @param[in] point2 (array_like): The second vector.

    @return (float) Chi square distance between two vectors.

    """
    distance = 0.0
    for i in range(len(point1)):
        divider = abs(point1[i]) + abs(point2[i])
        if divider == 0.0:
            # Both coordinates are zero: skip the 0/0 term
            continue

        distance += ((point1[i] - point2[i]) ** 2.0) / divider

    return distance
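The Chi square distance is commonly used to compare non-negative vectors such as histograms. Each squared difference is normalised by the total mass of that bin. A worked example with two made-up three-bin histograms, using a standalone re-implementation that skips empty bins (0/0 terms):

```python
def chi_square_distance(point1, point2):
    distance = 0.0
    for a, b in zip(point1, point2):
        divider = abs(a) + abs(b)
        if divider == 0.0:
            continue  # empty bin in both histograms: skip the 0/0 term
        distance += (a - b) ** 2.0 / divider
    return distance


hist1 = [4.0, 0.0, 6.0]
hist2 = [2.0, 0.0, 8.0]

# Terms: (4 - 2)^2 / 6, skipped, (6 - 8)^2 / 14
print(chi_square_distance(hist1, hist2))
```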
def chi_square_distance_numpy(object1, object2):
    """!
    @brief Calculate the Chi square distance between two objects using numpy.

    @param[in] object1 (array_like): The first vector.
    @param[in] object2 (array_like): The second vector.

    @return (float) Chi square distance between two objects.

    """
    with numpy.errstate(divide='ignore', invalid='ignore'):
        result = numpy.divide(numpy.power(object1 - object2, 2), numpy.abs(object1) + numpy.abs(object2))

    if len(result.shape) > 1:
        return numpy.sum(numpy.nan_to_num(result), axis=1).T

    return numpy.sum(numpy.nan_to_num(result))