3 @brief Data Structure: CF-Tree 4 @details Implementation based on paper @cite article::birch::1. 6 @authors Andrei Novikov (pyclustering@yandex.ru) 8 @copyright GNU Public License 10 @cond GNU_PUBLIC_LICENSE 11 PyClustering is free software: you can redistribute it and/or modify 12 it under the terms of the GNU General Public License as published by 13 the Free Software Foundation, either version 3 of the License, or 14 (at your option) any later version. 16 PyClustering is distributed in the hope that it will be useful, 17 but WITHOUT ANY WARRANTY; without even the implied warranty of 18 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 GNU General Public License for more details. 21 You should have received a copy of the GNU General Public License 22 along with this program. If not, see <http://www.gnu.org/licenses/>. 33 from pyclustering.utils import list_math_addition, list_math_subtraction, list_math_multiplication
36 from enum
import IntEnum
41 @brief Enumeration of measurement types for CF-Tree. 48 CENTROID_EUCLIDEAN_DISTANCE = 0
51 CENTROID_MANHATTAN_DISTANCE = 1
54 AVERAGE_INTER_CLUSTER_DISTANCE = 2
57 AVERAGE_INTRA_CLUSTER_DISTANCE = 3
60 VARIANCE_INCREASE_DISTANCE = 4
65 @brief Enumeration of CF-Node types that are used by CF-Tree. 84 @brief Clustering feature representation. 94 @brief Returns number of points that are encoded. 96 @return (uint) Number of encoded points. 104 @brief Returns linear sum. 106 @return (list) Linear sum. 115 @brief Returns square sum. 117 @return (double) Square sum. 123 def __init__(self, number_points, linear_sum, square_sum):
125 @brief CF-entry constructor. 127 @param[in] number_points (uint): Number of objects that is represented by the entry. 128 @param[in] linear_sum (list): Linear sum of values that represent objects in each dimension. 129 @param[in] square_sum (double): Square sum of values that represent objects. 144 @returns (cfentry) Makes copy of the cfentry instance. 152 @return (string) Default cfentry representation. 160 @brief Default cfentry string representation. 168 @brief Overloaded operator add. Performs addition of two clustering features. 170 @param[in] entry (cfentry): Entry that is added to the current. 172 @return (cfentry) Result of addition of two clustering features. 177 result_linear_sum = list_math_addition(self.
linear_sum, entry.linear_sum)
178 result_square_sum = self.
square_sum + entry.square_sum
180 return cfentry(number_points, result_linear_sum, result_square_sum)
185 @brief Overloaded operator sub. Performs substraction of two clustering features. 186 @details Substraction can't be performed with clustering feature whose description is less then substractor. 188 @param[in] entry (cfentry): Entry that is substracted from the current. 190 @return (cfentry) Result of substraction of two clustering features. 195 result_linear_sum = list_math_subtraction(self.
linear_sum, entry.linear_sum)
196 result_square_sum = self.
square_sum - entry.square_sum
198 if (number_points < 0)
or (result_square_sum < 0):
199 raise NameError(
"Substruction with negative result is not possible for clustering features.")
201 return cfentry(number_points, result_linear_sum, result_square_sum)
206 @brief Overloaded operator eq. 207 @details Performs comparison of two clustering features. 209 @param[in] entry (cfentry): Entry that is used for comparison with current. 211 @return (bool) True is both clustering features are equals in line with tolerance, otherwise False. 218 result &= ((self.
square_sum + tolerance > entry.square_sum)
and (self.
square_sum - tolerance < entry.square_sum))
220 for index_dimension
in range(0, len(self.
linear_sum)):
221 result &= ((self.
linear_sum[index_dimension] + tolerance > entry.linear_sum[index_dimension])
and (self.
linear_sum[index_dimension] - tolerance < entry.linear_sum[index_dimension]))
228 @brief Calculates distance between two clusters in line with measurement type. 230 @details In case of usage CENTROID_EUCLIDIAN_DISTANCE square euclidian distance will be returned. 231 Square root should be taken from the result for obtaining real euclidian distance between 234 @param[in] entry (cfentry): Clustering feature to which distance should be obtained. 235 @param[in] type_measurement (measurement_type): Distance measurement algorithm between two clusters. 237 @return (double) Distance between two clusters. 241 if type_measurement
is measurement_type.CENTROID_EUCLIDEAN_DISTANCE:
242 return euclidean_distance_square(entry.get_centroid(), self.
get_centroid())
244 elif type_measurement
is measurement_type.CENTROID_MANHATTAN_DISTANCE:
245 return manhattan_distance(entry.get_centroid(), self.
get_centroid())
247 elif type_measurement
is measurement_type.AVERAGE_INTER_CLUSTER_DISTANCE:
250 elif type_measurement
is measurement_type.AVERAGE_INTRA_CLUSTER_DISTANCE:
253 elif type_measurement
is measurement_type.VARIANCE_INCREASE_DISTANCE:
257 raise ValueError(
"Unsupported type of measurement '%s' is specified." % type_measurement)
262 @brief Calculates centroid of cluster that is represented by the entry. 263 @details It's calculated once when it's requested after the last changes. 265 @return (list) Centroid of cluster that is represented by the entry. 273 for index_dimension
in range(0, len(self.
linear_sum)):
281 @brief Calculates radius of cluster that is represented by the entry. 282 @details It's calculated once when it's requested after the last changes. 284 @return (double) Radius of cluster that is represented by the entry. 295 if type(centroid) == list:
296 radius_part_2 = 2.0 * sum(list_math_multiplication(self.
linear_sum, centroid))
297 radius_part_3 = self.
number_points * sum(list_math_multiplication(centroid, centroid))
299 radius_part_2 = 2.0 * self.
linear_sum * centroid
302 self.
__radius = ((1.0 / self.
number_points) * (radius_part_1 - radius_part_2 + radius_part_3)) ** 0.5
308 @brief Calculates diameter of cluster that is represented by the entry. 309 @details It's calculated once when it's requested after the last changes. 311 @return (double) Diameter of cluster that is represented by the entry. 327 def __get_average_inter_cluster_distance(self, entry):
329 @brief Calculates average inter cluster distance between current and specified clusters. 331 @param[in] entry (cfentry): Clustering feature to which distance should be obtained. 333 @return (double) Average inter cluster distance. 337 linear_part_distance = sum(list_math_multiplication(self.
linear_sum, entry.linear_sum))
342 def __get_average_intra_cluster_distance(self, entry):
344 @brief Calculates average intra cluster distance between current and specified clusters. 346 @param[in] entry (cfentry): Clustering feature to which distance should be obtained. 348 @return (double) Average intra cluster distance. 352 linear_part_first = list_math_addition(self.
linear_sum, entry.linear_sum)
353 linear_part_second = linear_part_first
355 linear_part_distance = sum(list_math_multiplication(linear_part_first, linear_part_second))
357 general_part_distance = 2.0 * (self.
number_points + entry.number_points) * (self.
square_sum + entry.square_sum) - 2.0 * linear_part_distance
359 return (general_part_distance / ((self.
number_points + entry.number_points) * (self.
number_points + entry.number_points - 1.0))) ** 0.5
362 def __get_variance_increase_distance(self, entry):
364 @brief Calculates variance increase distance between current and specified clusters. 366 @param[in] entry (cfentry): Clustering feature to which distance should be obtained. 368 @return (double) Variance increase distance. 372 linear_part_12 = list_math_addition(self.
linear_sum, entry.linear_sum)
373 variance_part_first = (self.
square_sum + entry.square_sum) - \
374 2.0 * sum(list_math_multiplication(linear_part_12, linear_part_12)) / (self.
number_points + entry.number_points) + \
375 (self.
number_points + entry.number_points) * sum(list_math_multiplication(linear_part_12, linear_part_12)) / (self.
number_points + entry.number_points)**2.0
380 linear_part_22 = sum(list_math_multiplication(entry.linear_sum, entry.linear_sum))
381 variance_part_third = -(entry.square_sum - (2.0 / entry.number_points) * linear_part_22 + entry.number_points * (1.0 / entry.number_points ** 2.0) * linear_part_22)
383 return variance_part_first + variance_part_second + variance_part_third
388 @brief Representation of node of CF-Tree. 394 @brief Constructor of abstract CF node. 396 @param[in] feature (cfentry): Clustering feature of the created node. 397 @param[in] parent (cfnode): Parent of the created node. 398 @param[in] payload (*): Data that is stored by the node. 409 self.
type = cfnode_type.CFNODE_DUMMY
417 @return (string) Default representation of CF node. 421 return 'CF node %s, parent %s, feature %s' % (hex(id(self)), self.
parent, self.
feature)
426 @return (string) String representation of CF node. 434 @brief Calculates distance between nodes in line with specified type measurement. 436 @param[in] node (cfnode): CF-node that is used for calculation distance to the current node. 437 @param[in] type_measurement (measurement_type): Measurement type that is used for calculation distance. 439 @return (double) Distance between two nodes. 448 @brief Representation of clustering feature non-leaf node. 455 @return (list) List of successors of the node. 461 def __init__(self, feature, parent, successors, payload):
463 @brief Create CF Non-leaf node. 465 @param[in] feature (cfentry): Clustering feature of the created node. 466 @param[in] parent (non_leaf_node): Parent of the created node. 467 @param[in] successors (list): List of successors of the node. 468 @param[in] payload (*): Data that is stored by the node. 472 super().
__init__(feature, parent, payload)
475 self.
type = cfnode_type.CFNODE_NONLEAF
482 @return (string) Representation of non-leaf node representation. 485 return 'Non-leaf node %s, parent %s, feature %s, successors: %d' % (hex(id(self)), self.
parent, self.
feature, len(self.
successors))
490 @return (string) String non-leaf representation. 498 @brief Insert successor to the node. 500 @param[in] successor (cfnode): Successor for adding. 504 self.
feature += successor.feature
507 successor.parent = self
512 @brief Remove successor from the node. 514 @param[in] successor (cfnode): Successor for removing. 518 self.
feature -= successor.feature
521 successor.parent = self
526 @brief Merge non-leaf node to the current. 528 @param[in] node (non_leaf_node): Non-leaf node that should be merged with current. 534 for child
in node.successors:
541 @brief Find pair of farthest successors of the node in line with measurement type. 543 @param[in] type_measurement (measurement_type): Measurement type that is used for obtaining farthest successors. 545 @return (list) Pair of farthest successors represented by list [cfnode1, cfnode2]. 549 farthest_node1 =
None 550 farthest_node2 =
None 551 farthest_distance = 0.0
558 candidate_distance = candidate1.get_distance(candidate2, type_measurement)
560 if candidate_distance > farthest_distance:
561 farthest_distance = candidate_distance
562 farthest_node1 = candidate1
563 farthest_node2 = candidate2
565 return [farthest_node1, farthest_node2]
570 @brief Find pair of nearest successors of the node in line with measurement type. 572 @param[in] type_measurement (measurement_type): Measurement type that is used for obtaining nearest successors. 574 @return (list) Pair of nearest successors represented by list. 580 nearest_distance = float(
"Inf")
587 candidate_distance = candidate1.get_distance(candidate2, type_measurement)
589 if candidate_distance < nearest_distance:
590 nearest_distance = candidate_distance
591 nearest_node1 = candidate1
592 nearest_node2 = candidate2
594 return [nearest_node1, nearest_node2]
599 @brief Represents clustering feature leaf node. 606 @return (list) List of leaf nodes. 612 def __init__(self, feature, parent, entries, payload):
614 @brief Create CF Leaf node. 616 @param[in] feature (cfentry): Clustering feature of the created node. 617 @param[in] parent (non_leaf_node): Parent of the created node. 618 @param[in] entries (list): List of entries of the node. 619 @param[in] payload (*): Data that is stored by the node. 623 super().
__init__(feature, parent, payload)
626 self.
type = cfnode_type.CFNODE_LEAF
633 @return (string) Default leaf node represenation. 638 text_entries +=
"\t" + str(entry) +
"\n" 640 return 'Leaf-node %s, parent %s, feature %s, entries: %d %s' % (hex(id(self)), self.
parent, self.
feature, len(self.
entries), text_entries)
645 @return (string) String leaf node representation. 653 @brief Insert new clustering feature to the leaf node. 655 @param[in] entry (cfentry): Clustering feature. 665 @brief Remove clustering feature from the leaf node. 667 @param[in] entry (cfentry): Clustering feature. 677 @brief Merge leaf node to the current. 679 @param[in] node (leaf_node): Leaf node that should be merged with current. 686 for entry
in node.entries:
692 @brief Find pair of farthest entries of the node. 694 @param[in] type_measurement (measurement_type): Measurement type that is used for obtaining farthest entries. 696 @return (list) Pair of farthest entries of the node that are represented by list. 700 farthest_entity1 =
None 701 farthest_entity2 =
None 702 farthest_distance = 0
704 for i
in range(0, len(self.
entries)):
707 for j
in range(i + 1, len(self.
entries)):
709 candidate_distance = candidate1.get_distance(candidate2, type_measurement)
711 if candidate_distance > farthest_distance:
712 farthest_distance = candidate_distance
713 farthest_entity1 = candidate1
714 farthest_entity2 = candidate2
716 return [farthest_entity1, farthest_entity2]
721 @brief Find nearest index of nearest entry of node for the specified entry. 723 @param[in] entry (cfentry): Entry that is used for calculation distance. 724 @param[in] type_measurement (measurement_type): Measurement type that is used for obtaining nearest entry to the specified. 726 @return (uint) Index of nearest entry of node for the specified entry. 730 minimum_distance = float(
'Inf')
733 for candidate_index
in range(0, len(self.
entries)):
735 if candidate_distance < minimum_distance:
736 nearest_index = candidate_index
743 @brief Find nearest entry of node for the specified entry. 745 @param[in] entry (cfentry): Entry that is used for calculation distance. 746 @param[in] type_measurement (measurement_type): Measurement type that is used for obtaining nearest entry to the specified. 748 @return (cfentry) Nearest entry of node for the specified entry. 752 min_key =
lambda cur_entity: cur_entity.get_distance(entry, type_measurement)
758 @brief CF-Tree representation. 759 @details A CF-tree is a height-balanced tree with two parameters: branching factor and threshold. 766 @return (cfnode) Root of the tree. 775 @return (list) List of all non-leaf nodes in the tree. 784 @return (unit) Number of nodes (leaf and non-leaf) in the tree. 793 @return (uint) Number of entries in the tree. 802 @return (uint) Height of the tree. 811 @return (uint) Branching factor of the tree. 812 @details Branching factor defines maximum number of successors in each non-leaf node. 821 @return (double) Threshold of the tree that represents maximum diameter of sub-clusters that is formed by leaf node entries. 830 @return (uint) Maximum number of entries in each leaf node. 839 @return (measurement_type) Type that is used for measuring. 845 def __init__(self, branch_factor, max_entries, threshold, type_measurement = measurement_type.CENTROID_EUCLIDEAN_DISTANCE):
847 @brief Create CF-tree. 849 @param[in] branch_factor (uint): Maximum number of children for non-leaf nodes. 850 @param[in] max_entries (uint): Maximum number of entries for leaf nodes. 851 @param[in] threshold (double): Maximum diameter of feature clustering for each leaf node. 852 @param[in] type_measurement (measurement_type): Measurement type that is used for calculation distance metrics. 877 @brief Traverses CF-tree to obtain nodes at the specified level. 879 @param[in] level (uint): CF-tree level from that nodes should be returned. 881 @return (list) List of CF-nodes that are located on the specified level of the CF-tree. 892 def __recursive_get_level_nodes(self, level, node):
894 @brief Traverses CF-tree to obtain nodes at the specified level recursively. 896 @param[in] level (uint): Current CF-tree level. 897 @param[in] node (cfnode): CF-node from that traversing is performed. 899 @return (list) List of CF-nodes that are located on the specified level of the CF-tree. 905 level_nodes.append(node);
908 for sucessor
in node.successors:
916 @brief Insert cluster that is represented as list of points where each point is represented by list of coordinates. 917 @details Clustering feature is created for that cluster and inserted to the tree. 919 @param[in] cluster (list): Cluster that is represented by list of points that should be inserted to the tree. 923 entry =
cfentry(len(cluster), linear_sum(cluster), square_sum(cluster));
929 @brief Insert clustering feature to the tree. 931 @param[in] entry (cfentry): Clustering feature that should be inserted. 936 node =
leaf_node(entry,
None, [ entry ],
None);
947 if (child_node_updation
is True):
955 @brief Search nearest leaf to the specified clustering feature. 957 @param[in] entry (cfentry): Clustering feature. 958 @param[in] search_node (cfnode): Node from that searching should be started, if None then search process will be started for the root. 960 @return (leaf_node) Nearest node to the specified clustering feature. 964 if (search_node
is None):
965 search_node = self.
__root;
967 nearest_node = search_node;
969 if (search_node.type == cfnode_type.CFNODE_NONLEAF):
970 min_key =
lambda child_node: child_node.feature.get_distance(entry, self.
__type_measurement);
971 nearest_child_node = min(search_node.successors, key = min_key);
978 def __recursive_insert(self, entry, search_node):
980 @brief Recursive insert of the entry to the tree. 981 @details It performs all required procedures during insertion such as splitting, merging. 983 @param[in] entry (cfentry): Clustering feature. 984 @param[in] search_node (cfnode): Node from that insertion should be started. 986 @return (bool) True if number of nodes at the below level is changed, otherwise False. 991 if (search_node.type == cfnode_type.CFNODE_NONLEAF):
999 def __insert_for_leaf_node(self, entry, search_node):
1001 @brief Recursive insert entry from leaf node to the tree. 1003 @param[in] entry (cfentry): Clustering feature. 1004 @param[in] search_node (cfnode): None-leaf node from that insertion should be started. 1006 @return (bool) True if number of nodes at the below level is changed, otherwise False. 1010 node_amount_updation =
False;
1013 index_nearest_entry = search_node.get_nearest_index_entry(entry, self.
__type_measurement);
1014 merged_entry = search_node.entries[index_nearest_entry] + entry;
1017 if (merged_entry.get_diameter() > self.
__threshold):
1019 search_node.insert_entry(entry);
1024 node_amount_updation =
True;
1030 search_node.entries[index_nearest_entry] = merged_entry;
1031 search_node.feature += entry;
1033 return node_amount_updation;
1036 def __insert_for_noneleaf_node(self, entry, search_node):
1038 @brief Recursive insert entry from none-leaf node to the tree. 1040 @param[in] entry (cfentry): Clustering feature. 1041 @param[in] search_node (cfnode): None-leaf node from that insertion should be started. 1043 @return (bool) True if number of nodes at the below level is changed, otherwise False. 1047 node_amount_updation =
False;
1049 min_key =
lambda child_node: child_node.get_distance(search_node, self.
__type_measurement);
1050 nearest_child_node = min(search_node.successors, key = min_key);
1055 search_node.feature += entry;
1061 if (search_node
is self.
__root):
1063 search_node.parent = self.
__root;
1072 parent = search_node.parent;
1073 parent.successors.remove(search_node);
1074 parent.successors.append(new_node1);
1075 parent.successors.append(new_node2);
1079 node_amount_updation =
True;
1081 elif (child_node_updation
is True):
1086 return node_amount_updation;
1089 def __merge_nearest_successors(self, node):
1091 @brief Find nearest sucessors and merge them. 1093 @param[in] node (non_leaf_node): Node whose two nearest successors should be merged. 1095 @return (bool): True if merging has been successfully performed, otherwise False. 1099 merging_result =
False;
1101 if (node.successors[0].type == cfnode_type.CFNODE_NONLEAF):
1102 [nearest_child_node1, nearest_child_node2] = node.get_nearest_successors(self.
__type_measurement);
1104 if (len(nearest_child_node1.successors) + len(nearest_child_node2.successors) <= self.
__branch_factor):
1105 node.successors.remove(nearest_child_node2);
1106 if (nearest_child_node2.type == cfnode_type.CFNODE_LEAF):
1107 self.
__leafes.remove(nearest_child_node2);
1109 nearest_child_node1.merge(nearest_child_node2);
1111 merging_result =
True;
1113 return merging_result;
1116 def __split_procedure(self, split_node):
1118 @brief Starts node splitting procedure in the CF-tree from the specify node. 1120 @param[in] split_node (cfnode): CF-tree node that should be splitted. 1123 if (split_node
is self.
__root):
1125 split_node.parent = self.
__root;
1138 parent = split_node.parent;
1139 parent.successors.remove(split_node);
1140 parent.successors.append(new_node1);
1141 parent.successors.append(new_node2);
1147 def __split_nonleaf_node(self, node):
1149 @brief Performs splitting of the specified non-leaf node. 1151 @param[in] node (non_leaf_node): Non-leaf node that should be splitted. 1153 @return (list) New pair of non-leaf nodes [non_leaf_node1, non_leaf_node2]. 1157 [farthest_node1, farthest_node2] = node.get_farthest_successors(self.
__type_measurement);
1160 new_node1 =
non_leaf_node(farthest_node1.feature, node.parent, [ farthest_node1 ],
None);
1161 new_node2 =
non_leaf_node(farthest_node2.feature, node.parent, [ farthest_node2 ],
None);
1163 farthest_node1.parent = new_node1;
1164 farthest_node2.parent = new_node2;
1167 for successor
in node.successors:
1168 if ( (successor
is not farthest_node1)
and (successor
is not farthest_node2) ):
1172 if (distance1 < distance2):
1173 new_node1.insert_successor(successor);
1175 new_node2.insert_successor(successor);
1177 return [new_node1, new_node2];
1180 def __split_leaf_node(self, node):
1182 @brief Performs splitting of the specified leaf node. 1184 @param[in] node (leaf_node): Leaf node that should be splitted. 1186 @return (list) New pair of leaf nodes [leaf_node1, leaf_node2]. 1188 @warning Splitted node is transformed to non_leaf. 1193 [farthest_entity1, farthest_entity2] = node.get_farthest_entries(self.
__type_measurement);
1196 new_node1 =
leaf_node(farthest_entity1, node.parent, [ farthest_entity1 ],
None);
1197 new_node2 =
leaf_node(farthest_entity2, node.parent, [ farthest_entity2 ],
None);
1200 for entity
in node.entries:
1201 if ( (entity
is not farthest_entity1)
and (entity
is not farthest_entity2) ):
1205 if (distance1 < distance2):
1206 new_node1.insert_entry(entity);
1208 new_node2.insert_entry(entity);
1210 return [new_node1, new_node2];
1215 @brief Shows feature distribution. 1216 @details Only features in 1D, 2D, 3D space can be visualized. 1218 @param[in] data (list): List of points that will be used for visualization, if it not specified than feature will be displayed only. 1225 if (data
is not None):
1226 visualizer.append_cluster(data, marker =
'x');
1228 for level
in range(0, self.
height):
1231 centers = [ node.feature.get_centroid()
for node
in level_nodes ];
1232 visualizer.append_cluster(centers,
None, markersize = (self.
height - level + 1) * 5);
Common visualizer of clusters on 1D, 2D or 3D surface.
pyclustering module for cluster analysis.
def __init__(self, feature, parent, entries, payload)
Create CF Leaf node.
def linear_sum(self)
Returns linear sum.
Represents clustering feature leaf node.
def get_distance(self, entry, type_measurement)
Calculates distance between two clusters in line with measurement type.
def remove_entry(self, entry)
Remove clustering feature from the leaf node.
def show_feature_destibution(self, data=None)
Shows feature distribution.
def insert_entry(self, entry)
Insert new clustering feature to the leaf node.
def get_farthest_successors(self, type_measurement)
Find pair of farthest successors of the node in line with measurement type.
def __str__(self)
Default cfentry string representation.
def __add__(self, entry)
Overloaded operator add.
def __insert_for_noneleaf_node(self, entry, search_node)
Recursive insert entry from none-leaf node to the tree.
Utils that are used by modules of pyclustering.
def __split_procedure(self, split_node)
Starts node splitting procedure in the CF-tree from the specify node.
def __recursive_get_level_nodes(self, level, node)
Traverses CF-tree to obtain nodes at the specified level recursively.
def __get_average_inter_cluster_distance(self, entry)
Calculates average inter cluster distance between current and specified clusters. ...
def __split_nonleaf_node(self, node)
Performs splitting of the specified non-leaf node.
def merge(self, node)
Merge non-leaf node to the current.
def insert_successor(self, successor)
Insert successor to the node.
feature
Clustering feature of the node.
Representation of clustering feature non-leaf node.
def __get_variance_increase_distance(self, entry)
Calculates variance increase distance between current and specified clusters.
def get_nearest_index_entry(self, entry, type_measurement)
Find nearest index of nearest entry of node for the specified entry.
def merge(self, node)
Merge leaf node to the current.
def __init__(self, feature, parent, payload)
Constructor of abstract CF node.
def get_diameter(self)
Calculates diameter of cluster that is represented by the entry.
def __insert_for_leaf_node(self, entry, search_node)
Recursive insert entry from leaf node to the tree.
def insert(self, entry)
Insert clustering feature to the tree.
def find_nearest_leaf(self, entry, search_node=None)
Search nearest leaf to the specified clustering feature.
def __merge_nearest_successors(self, node)
Find nearest sucessors and merge them.
def type_measurement(self)
def __eq__(self, entry)
Overloaded operator eq.
def insert_cluster(self, cluster)
Insert cluster that is represented as list of points where each point is represented by list of coord...
def get_radius(self)
Calculates radius of cluster that is represented by the entry.
def get_farthest_entries(self, type_measurement)
Find pair of farthest entries of the node.
def get_nearest_entry(self, entry, type_measurement)
Find nearest entry of node for the specified entry.
def square_sum(self)
Returns square sum.
parent
Pointer to the parent node (None for root).
def __init__(self, number_points, linear_sum, square_sum)
CF-entry constructor.
def get_distance(self, node, type_measurement)
Calculates distance between nodes in line with specified type measurement.
def __sub__(self, entry)
Overloaded operator sub.
Enumeration of measurement types for CF-Tree.
type
Type node (leaf or non-leaf).
Clustering feature representation.
def remove_successor(self, successor)
Remove successor from the node.
Representation of node of CF-Tree.
def number_points(self)
Returns number of points that are encoded.
def __init__(self, branch_factor, max_entries, threshold, type_measurement=measurement_type.CENTROID_EUCLIDEAN_DISTANCE)
Create CF-tree.
Enumeration of CF-Node types that are used by CF-Tree.
def __recursive_insert(self, entry, search_node)
Recursive insert of the entry to the tree.
payload
Payload of node where user data can be stored.
def get_centroid(self)
Calculates centroid of cluster that is represented by the entry.
def __split_leaf_node(self, node)
Performs splitting of the specified leaf node.
def get_level_nodes(self, level)
Traverses CF-tree to obtain nodes at the specified level.
def get_nearest_successors(self, type_measurement)
Find pair of nearest successors of the node in line with measurement type.
def __get_average_intra_cluster_distance(self, entry)
Calculates average intra cluster distance between current and specified clusters. ...
def __init__(self, feature, parent, successors, payload)
Create CF Non-leaf node.