3 @brief Data Structure: CF-Tree 4 @details Implementation based on paper @cite article::birch::1. 6 @authors Andrei Novikov (pyclustering@yandex.ru) 8 @copyright GNU Public License 10 @cond GNU_PUBLIC_LICENSE 11 PyClustering is free software: you can redistribute it and/or modify 12 it under the terms of the GNU General Public License as published by 13 the Free Software Foundation, either version 3 of the License, or 14 (at your option) any later version. 16 PyClustering is distributed in the hope that it will be useful, 17 but WITHOUT ANY WARRANTY; without even the implied warranty of 18 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 GNU General Public License for more details. 21 You should have received a copy of the GNU General Public License 22 along with this program. If not, see <http://www.gnu.org/licenses/>. 33 from pyclustering.utils import list_math_addition, list_math_subtraction, list_math_multiplication
36 from enum
import IntEnum
41 @brief Enumeration of measurement types for CF-Tree. 48 CENTROID_EUCLIDEAN_DISTANCE = 0;
51 CENTROID_MANHATTAN_DISTANCE = 1;
54 AVERAGE_INTER_CLUSTER_DISTANCE = 2;
57 AVERAGE_INTRA_CLUSTER_DISTANCE = 3;
60 VARIANCE_INCREASE_DISTANCE = 4;
65 @brief Enumeration of CF-Node types that are used by CF-Tree. 84 @brief Clustering feature representation. 94 @brief Returns number of points that are encoded. 96 @return (uint) Number of encoded points. 104 @brief Returns linear sum. 106 @return (list) Linear sum. 115 @brief Returns square sum. 117 @return (double) Square sum. 123 def __init__(self, number_points, linear_sum, square_sum):
125 @brief CF-entry constructor. 127 @param[in] number_points (uint): Number of objects that is represented by the entry. 128 @param[in] linear_sum (list): Linear sum of values that represent objects in each dimension. 129 @param[in] square_sum (double): Square sum of values that represent objects. 144 @returns (cfentry) Makes copy of the cfentry instance. 152 @return (string) Default cfentry representation. 160 @brief Default cfentry string representation. 168 @brief Overloaded operator add. Performs addition of two clustering features. 170 @param[in] entry (cfentry): Entry that is added to the current. 172 @return (cfentry) Result of addition of two clustering features. 177 result_linear_sum = list_math_addition(self.
linear_sum, entry.linear_sum);
178 result_square_sum = self.
square_sum + entry.square_sum;
180 return cfentry(number_points, result_linear_sum, result_square_sum);
185 @brief Overloaded operator sub. Performs substraction of two clustering features. 186 @details Substraction can't be performed with clustering feature whose description is less then substractor. 188 @param[in] entry (cfentry): Entry that is substracted from the current. 190 @return (cfentry) Result of substraction of two clustering features. 195 result_linear_sum = list_math_subtraction(self.
linear_sum, entry.linear_sum);
196 result_square_sum = self.
square_sum - entry.square_sum;
198 if ( (number_points < 0)
or (result_square_sum < 0) ):
199 raise NameError(
"Substruction with negative result is not possible for clustering features.");
201 return cfentry(number_points, result_linear_sum, result_square_sum);
206 @brief Overloaded operator eq. 207 @details Performs comparison of two clustering features. 209 @param[in] entry (cfentry): Entry that is used for comparison with current. 211 @return (bool) True is both clustering features are equals in line with tolerance, otherwise False. 218 result &= ( (self.
square_sum + tolerance > entry.square_sum)
and (self.
square_sum - tolerance < entry.square_sum) );
220 for index_dimension
in range(0, len(self.
linear_sum)):
221 result &= ( (self.
linear_sum[index_dimension] + tolerance > entry.linear_sum[index_dimension])
and (self.
linear_sum[index_dimension] - tolerance < entry.linear_sum[index_dimension]) );
228 @brief Calculates distance between two clusters in line with measurement type. 230 @details In case of usage CENTROID_EUCLIDIAN_DISTANCE square euclidian distance will be returned. 231 Square root should be taken from the result for obtaining real euclidian distance between 234 @param[in] entry (cfentry): Clustering feature to which distance should be obtained. 235 @param[in] type_measurement (measurement_type): Distance measurement algorithm between two clusters. 237 @return (double) Distance between two clusters. 241 if (type_measurement
is measurement_type.CENTROID_EUCLIDEAN_DISTANCE):
242 return euclidean_distance_square(entry.get_centroid(), self.
get_centroid());
244 elif (type_measurement
is measurement_type.CENTROID_MANHATTAN_DISTANCE):
245 return manhattan_distance(entry.get_centroid(), self.
get_centroid());
247 elif (type_measurement
is measurement_type.AVERAGE_INTER_CLUSTER_DISTANCE):
250 elif (type_measurement
is measurement_type.AVERAGE_INTRA_CLUSTER_DISTANCE):
253 elif (type_measurement
is measurement_type.VARIANCE_INCREASE_DISTANCE):
262 @brief Calculates centroid of cluster that is represented by the entry. 263 @details It's calculated once when it's requested after the last changes. 265 @return (list) Centroid of cluster that is represented by the entry. 273 for index_dimension
in range(0, len(self.
linear_sum)):
281 @brief Calculates radius of cluster that is represented by the entry. 282 @details It's calculated once when it's requested after the last changes. 284 @return (double) Radius of cluster that is represented by the entry. 298 if (type(centroid) == list):
299 radius_part_2 = 2.0 * sum(list_math_multiplication(self.
linear_sum, centroid));
300 radius_part_3 = self.
number_points * sum(list_math_multiplication(centroid, centroid));
302 radius_part_2 = 2.0 * self.
linear_sum * centroid;
305 self.
__radius = ( (1.0 / self.
number_points) * (radius_part_1 - radius_part_2 + radius_part_3) ) ** 0.5;
311 @brief Calculates diameter of cluster that is represented by the entry. 312 @details It's calculated once when it's requested after the last changes. 314 @return (double) Diameter of cluster that is represented by the entry. 331 def __get_average_inter_cluster_distance(self, entry):
333 @brief Calculates average inter cluster distance between current and specified clusters. 335 @param[in] entry (cfentry): Clustering feature to which distance should be obtained. 337 @return (double) Average inter cluster distance. 341 linear_part_distance = sum(list_math_multiplication(self.
linear_sum, entry.linear_sum));
346 def __get_average_intra_cluster_distance(self, entry):
348 @brief Calculates average intra cluster distance between current and specified clusters. 350 @param[in] entry (cfentry): Clustering feature to which distance should be obtained. 352 @return (double) Average intra cluster distance. 356 linear_part_first = list_math_addition(self.
linear_sum, entry.linear_sum);
357 linear_part_second = linear_part_first;
359 linear_part_distance = sum(list_math_multiplication(linear_part_first, linear_part_second));
361 general_part_distance = 2.0 * (self.
number_points + entry.number_points) * (self.
square_sum + entry.square_sum) - 2.0 * linear_part_distance;
363 return (general_part_distance / ( (self.
number_points + entry.number_points) * (self.
number_points + entry.number_points - 1.0) )) ** 0.5;
366 def __get_variance_increase_distance(self, entry):
368 @brief Calculates variance increase distance between current and specified clusters. 370 @param[in] entry (cfentry): Clustering feature to which distance should be obtained. 372 @return (double) Variance increase distance. 376 linear_part_12 = list_math_addition(self.
linear_sum, entry.linear_sum);
377 variance_part_first = (self.
square_sum + entry.square_sum) - \
378 2.0 * sum(list_math_multiplication(linear_part_12, linear_part_12)) / (self.
number_points + entry.number_points) + \
379 (self.
number_points + entry.number_points) * sum(list_math_multiplication(linear_part_12, linear_part_12)) / (self.
number_points + entry.number_points)**2.0;
385 linear_part_22 = sum(list_math_multiplication(entry.linear_sum, entry.linear_sum));
386 variance_part_third = -( entry.square_sum - (2.0 / entry.number_points) * linear_part_22 + entry.number_points * (1.0 / entry.number_points ** 2.0) * linear_part_22 );
388 return (variance_part_first + variance_part_second + variance_part_third);
393 @brief Representation of node of CF-Tree. 399 @brief Constructor of abstract CF node. 401 @param[in] feature (cfentry): Clustering feature of the created node. 402 @param[in] parent (cfnode): Parent of the created node. 403 @param[in] payload (*): Data that is stored by the node. 414 self.
type = cfnode_type.CFNODE_DUMMY;
422 @return (string) Default representation of CF node. 426 return 'CF node %s, parent %s, feature %s' % ( hex(id(self)), self.
parent, self.
feature );
431 @return (string) String representation of CF node. 439 @brief Calculates distance between nodes in line with specified type measurement. 441 @param[in] node (cfnode): CF-node that is used for calculation distance to the current node. 442 @param[in] type_measurement (measurement_type): Measurement type that is used for calculation distance. 444 @return (double) Distance between two nodes. 453 @brief Representation of clustering feature non-leaf node. 460 @return (list) List of successors of the node. 466 def __init__(self, feature, parent, successors, payload):
468 @brief Create CF Non-leaf node. 470 @param[in] feature (cfentry): Clustering feature of the created node. 471 @param[in] parent (non_leaf_node): Parent of the created node. 472 @param[in] successors (list): List of successors of the node. 473 @param[in] payload (*): Data that is stored by the node. 477 super().
__init__(feature, parent, payload);
480 self.
type = cfnode_type.CFNODE_NONLEAF;
487 @return (string) Representation of non-leaf node representation. 490 return 'Non-leaf node %s, parent %s, feature %s, successors: %d' % ( hex(id(self)), self.
parent, self.
feature, len(self.
successors) );
495 @return (string) String non-leaf representation. 503 @brief Insert successor to the node. 505 @param[in] successor (cfnode): Successor for adding. 509 self.
feature += successor.feature;
512 successor.parent = self;
517 @brief Remove successor from the node. 519 @param[in] successor (cfnode): Successor for removing. 523 self.
feature -= successor.feature;
526 successor.parent = self;
531 @brief Merge non-leaf node to the current. 533 @param[in] node (non_leaf_node): Non-leaf node that should be merged with current. 539 for child
in node.successors:
546 @brief Find pair of farthest successors of the node in line with measurement type. 548 @param[in] type_measurement (measurement_type): Measurement type that is used for obtaining farthest successors. 550 @return (list) Pair of farthest successors represented by list [cfnode1, cfnode2]. 554 farthest_node1 =
None;
555 farthest_node2 =
None;
556 farthest_distance = 0;
563 candidate_distance = candidate1.get_distance(candidate2, type_measurement);
565 if (candidate_distance > farthest_distance):
566 farthest_distance = candidate_distance;
567 farthest_node1 = candidate1;
568 farthest_node2 = candidate2;
570 return [farthest_node1, farthest_node2];
575 @brief Find pair of nearest successors of the node in line with measurement type. 577 @param[in] type_measurement (measurement_type): Measurement type that is used for obtaining nearest successors. 579 @return (list) Pair of nearest successors represented by list. 583 nearest_node1 =
None;
584 nearest_node2 =
None;
585 nearest_distance = float(
"Inf");
592 candidate_distance = candidate1.get_distance(candidate2, type_measurement);
594 if (candidate_distance < nearest_distance):
595 nearest_distance = candidate_distance;
596 nearest_node1 = candidate1;
597 nearest_node2 = candidate2;
599 return [nearest_node1, nearest_node2];
604 @brief Represents clustering feature leaf node. 611 @return (list) List of leaf nodes. 617 def __init__(self, feature, parent, entries, payload):
619 @brief Create CF Leaf node. 621 @param[in] feature (cfentry): Clustering feature of the created node. 622 @param[in] parent (non_leaf_node): Parent of the created node. 623 @param[in] entries (list): List of entries of the node. 624 @param[in] payload (*): Data that is stored by the node. 628 super().
__init__(feature, parent, payload);
631 self.
type = cfnode_type.CFNODE_LEAF;
638 @return (string) Default leaf node represenation. 643 text_entries +=
"\t" + str(entry) +
"\n";
645 return 'Leaf-node %s, parent %s, feature %s, entries: %d %s' % ( hex(id(self)), self.
parent, self.
feature, len(self.
entries), text_entries );
650 @return (string) String leaf node representation. 658 @brief Insert new clustering feature to the leaf node. 660 @param[in] entry (cfentry): Clustering feature. 670 @brief Remove clustering feature from the leaf node. 672 @param[in] entry (cfentry): Clustering feature. 682 @brief Merge leaf node to the current. 684 @param[in] node (leaf_node): Leaf node that should be merged with current. 691 for entry
in node.entries:
697 @brief Find pair of farthest entries of the node. 699 @param[in] type_measurement (measurement_type): Measurement type that is used for obtaining farthest entries. 701 @return (list) Pair of farthest entries of the node that are represented by list. 705 farthest_entity1 =
None;
706 farthest_entity2 =
None;
707 farthest_distance = 0;
709 for i
in range(0, len(self.
entries)):
712 for j
in range(i + 1, len(self.
entries)):
714 candidate_distance = candidate1.get_distance(candidate2, type_measurement);
716 if (candidate_distance > farthest_distance):
717 farthest_distance = candidate_distance;
718 farthest_entity1 = candidate1;
719 farthest_entity2 = candidate2;
721 return [farthest_entity1, farthest_entity2];
726 @brief Find nearest index of nearest entry of node for the specified entry. 728 @param[in] entry (cfentry): Entry that is used for calculation distance. 729 @param[in] type_measurement (measurement_type): Measurement type that is used for obtaining nearest entry to the specified. 731 @return (uint) Index of nearest entry of node for the specified entry. 735 minimum_distance = float(
'Inf');
738 for candidate_index
in range(0, len(self.
entries)):
740 if (candidate_distance < minimum_distance):
741 nearest_index = candidate_index;
743 return nearest_index;
748 @brief Find nearest entry of node for the specified entry. 750 @param[in] entry (cfentry): Entry that is used for calculation distance. 751 @param[in] type_measurement (measurement_type): Measurement type that is used for obtaining nearest entry to the specified. 753 @return (cfentry) Nearest entry of node for the specified entry. 757 min_key =
lambda cur_entity: cur_entity.get_distance(entry, type_measurement);
758 return min(self.
__entries, key = min_key);
763 @brief CF-Tree representation. 764 @details A CF-tree is a height-balanced tree with two parameters: branching factor and threshold. 771 @return (cfnode) Root of the tree. 780 @return (list) List of all non-leaf nodes in the tree. 789 @return (unit) Number of nodes (leaf and non-leaf) in the tree. 798 @return (uint) Number of entries in the tree. 807 @return (uint) Height of the tree. 816 @return (uint) Branching factor of the tree. 817 @details Branching factor defines maximum number of successors in each non-leaf node. 826 @return (double) Threshold of the tree that represents maximum diameter of sub-clusters that is formed by leaf node entries. 835 @return (uint) Maximum number of entries in each leaf node. 844 @return (measurement_type) Type that is used for measuring. 850 def __init__(self, branch_factor, max_entries, threshold, type_measurement = measurement_type.CENTROID_EUCLIDEAN_DISTANCE):
852 @brief Create CF-tree. 854 @param[in] branch_factor (uint): Maximum number of children for non-leaf nodes. 855 @param[in] max_entries (uint): Maximum number of entries for leaf nodes. 856 @param[in] threshold (double): Maximum diameter of feature clustering for each leaf node. 857 @param[in] type_measurement (measurement_type): Measurement type that is used for calculation distance metrics. 883 @brief Traverses CF-tree to obtain nodes at the specified level. 885 @param[in] level (uint): CF-tree level from that nodes should be returned. 887 @return (list) List of CF-nodes that are located on the specified level of the CF-tree. 898 def __recursive_get_level_nodes(self, level, node):
900 @brief Traverses CF-tree to obtain nodes at the specified level recursively. 902 @param[in] level (uint): Current CF-tree level. 903 @param[in] node (cfnode): CF-node from that traversing is performed. 905 @return (list) List of CF-nodes that are located on the specified level of the CF-tree. 911 level_nodes.append(node);
914 for sucessor
in node.successors:
922 @brief Insert cluster that is represented as list of points where each point is represented by list of coordinates. 923 @details Clustering feature is created for that cluster and inserted to the tree. 925 @param[in] cluster (list): Cluster that is represented by list of points that should be inserted to the tree. 929 entry =
cfentry(len(cluster), linear_sum(cluster), square_sum(cluster));
935 @brief Insert clustering feature to the tree. 937 @param[in] entry (cfentry): Clustering feature that should be inserted. 942 node =
leaf_node(entry,
None, [ entry ],
None);
953 if (child_node_updation
is True):
961 @brief Search nearest leaf to the specified clustering feature. 963 @param[in] entry (cfentry): Clustering feature. 964 @param[in] search_node (cfnode): Node from that searching should be started, if None then search process will be started for the root. 966 @return (leaf_node) Nearest node to the specified clustering feature. 970 if (search_node
is None):
971 search_node = self.
__root;
973 nearest_node = search_node;
975 if (search_node.type == cfnode_type.CFNODE_NONLEAF):
976 min_key =
lambda child_node: child_node.feature.get_distance(entry, self.
__type_measurement);
977 nearest_child_node = min(search_node.successors, key = min_key);
984 def __recursive_insert(self, entry, search_node):
986 @brief Recursive insert of the entry to the tree. 987 @details It performs all required procedures during insertion such as splitting, merging. 989 @param[in] entry (cfentry): Clustering feature. 990 @param[in] search_node (cfnode): Node from that insertion should be started. 992 @return (bool) True if number of nodes at the below level is changed, otherwise False. 997 if (search_node.type == cfnode_type.CFNODE_NONLEAF):
1005 def __insert_for_leaf_node(self, entry, search_node):
1007 @brief Recursive insert entry from leaf node to the tree. 1009 @param[in] entry (cfentry): Clustering feature. 1010 @param[in] search_node (cfnode): None-leaf node from that insertion should be started. 1012 @return (bool) True if number of nodes at the below level is changed, otherwise False. 1016 node_amount_updation =
False;
1019 index_nearest_entry = search_node.get_nearest_index_entry(entry, self.
__type_measurement);
1020 merged_entry = search_node.entries[index_nearest_entry] + entry;
1023 if (merged_entry.get_diameter() > self.
__threshold):
1025 search_node.insert_entry(entry);
1030 node_amount_updation =
True;
1036 search_node.entries[index_nearest_entry] = merged_entry;
1037 search_node.feature += entry;
1039 return node_amount_updation;
1042 def __insert_for_noneleaf_node(self, entry, search_node):
1044 @brief Recursive insert entry from none-leaf node to the tree. 1046 @param[in] entry (cfentry): Clustering feature. 1047 @param[in] search_node (cfnode): None-leaf node from that insertion should be started. 1049 @return (bool) True if number of nodes at the below level is changed, otherwise False. 1053 node_amount_updation =
False;
1055 min_key =
lambda child_node: child_node.get_distance(search_node, self.
__type_measurement);
1056 nearest_child_node = min(search_node.successors, key = min_key);
1061 search_node.feature += entry;
1067 if (search_node
is self.
__root):
1069 search_node.parent = self.
__root;
1078 parent = search_node.parent;
1079 parent.successors.remove(search_node);
1080 parent.successors.append(new_node1);
1081 parent.successors.append(new_node2);
1085 node_amount_updation =
True;
1087 elif (child_node_updation
is True):
1092 return node_amount_updation;
1095 def __merge_nearest_successors(self, node):
1097 @brief Find nearest sucessors and merge them. 1099 @param[in] node (non_leaf_node): Node whose two nearest successors should be merged. 1101 @return (bool): True if merging has been successfully performed, otherwise False. 1105 merging_result =
False;
1107 if (node.successors[0].type == cfnode_type.CFNODE_NONLEAF):
1108 [nearest_child_node1, nearest_child_node2] = node.get_nearest_successors(self.
__type_measurement);
1110 if (len(nearest_child_node1.successors) + len(nearest_child_node2.successors) <= self.
__branch_factor):
1111 node.successors.remove(nearest_child_node2);
1112 if (nearest_child_node2.type == cfnode_type.CFNODE_LEAF):
1113 self.
__leafes.remove(nearest_child_node2);
1115 nearest_child_node1.merge(nearest_child_node2);
1117 merging_result =
True;
1119 return merging_result;
1122 def __split_procedure(self, split_node):
1124 @brief Starts node splitting procedure in the CF-tree from the specify node. 1126 @param[in] split_node (cfnode): CF-tree node that should be splitted. 1129 if (split_node
is self.
__root):
1131 split_node.parent = self.
__root;
1144 parent = split_node.parent;
1145 parent.successors.remove(split_node);
1146 parent.successors.append(new_node1);
1147 parent.successors.append(new_node2);
1153 def __split_nonleaf_node(self, node):
1155 @brief Performs splitting of the specified non-leaf node. 1157 @param[in] node (non_leaf_node): Non-leaf node that should be splitted. 1159 @return (list) New pair of non-leaf nodes [non_leaf_node1, non_leaf_node2]. 1163 [farthest_node1, farthest_node2] = node.get_farthest_successors(self.
__type_measurement);
1166 new_node1 =
non_leaf_node(farthest_node1.feature, node.parent, [ farthest_node1 ],
None);
1167 new_node2 =
non_leaf_node(farthest_node2.feature, node.parent, [ farthest_node2 ],
None);
1169 farthest_node1.parent = new_node1;
1170 farthest_node2.parent = new_node2;
1173 for successor
in node.successors:
1174 if ( (successor
is not farthest_node1)
and (successor
is not farthest_node2) ):
1178 if (distance1 < distance2):
1179 new_node1.insert_successor(successor);
1181 new_node2.insert_successor(successor);
1183 return [new_node1, new_node2];
1186 def __split_leaf_node(self, node):
1188 @brief Performs splitting of the specified leaf node. 1190 @param[in] node (leaf_node): Leaf node that should be splitted. 1192 @return (list) New pair of leaf nodes [leaf_node1, leaf_node2]. 1194 @warning Splitted node is transformed to non_leaf. 1199 [farthest_entity1, farthest_entity2] = node.get_farthest_entries(self.
__type_measurement);
1202 new_node1 =
leaf_node(farthest_entity1, node.parent, [ farthest_entity1 ],
None);
1203 new_node2 =
leaf_node(farthest_entity2, node.parent, [ farthest_entity2 ],
None);
1206 for entity
in node.entries:
1207 if ( (entity
is not farthest_entity1)
and (entity
is not farthest_entity2) ):
1211 if (distance1 < distance2):
1212 new_node1.insert_entry(entity);
1214 new_node2.insert_entry(entity);
1216 return [new_node1, new_node2];
1221 @brief Shows feature distribution. 1222 @details Only features in 1D, 2D, 3D space can be visualized. 1224 @param[in] data (list): List of points that will be used for visualization, if it not specified than feature will be displayed only. 1231 if (data
is not None):
1232 visualizer.append_cluster(data, marker =
'x');
1234 for level
in range(0, self.
height):
1237 centers = [ node.feature.get_centroid()
for node
in level_nodes ];
1238 visualizer.append_cluster(centers,
None, markersize = (self.
height - level + 1) * 5);
Common visualizer of clusters on 1D, 2D or 3D surface.
pyclustering module for cluster analysis.
def __init__(self, feature, parent, entries, payload)
Create CF Leaf node.
def linear_sum(self)
Returns linear sum.
Represents clustering feature leaf node.
def get_distance(self, entry, type_measurement)
Calculates distance between two clusters in line with measurement type.
def remove_entry(self, entry)
Remove clustering feature from the leaf node.
def show_feature_destibution(self, data=None)
Shows feature distribution.
def insert_entry(self, entry)
Insert new clustering feature to the leaf node.
def get_farthest_successors(self, type_measurement)
Find pair of farthest successors of the node in line with measurement type.
def __str__(self)
Default cfentry string representation.
def __add__(self, entry)
Overloaded operator add.
def __insert_for_noneleaf_node(self, entry, search_node)
Recursive insert entry from none-leaf node to the tree.
Utils that are used by modules of pyclustering.
def __split_procedure(self, split_node)
Starts node splitting procedure in the CF-tree from the specify node.
def __recursive_get_level_nodes(self, level, node)
Traverses CF-tree to obtain nodes at the specified level recursively.
def __get_average_inter_cluster_distance(self, entry)
Calculates average inter cluster distance between current and specified clusters. ...
def __split_nonleaf_node(self, node)
Performs splitting of the specified non-leaf node.
def merge(self, node)
Merge non-leaf node to the current.
def insert_successor(self, successor)
Insert successor to the node.
feature
Clustering feature of the node.
Representation of clustering feature non-leaf node.
def __get_variance_increase_distance(self, entry)
Calculates variance increase distance between current and specified clusters.
def get_nearest_index_entry(self, entry, type_measurement)
Find nearest index of nearest entry of node for the specified entry.
def merge(self, node)
Merge leaf node to the current.
def __init__(self, feature, parent, payload)
Constructor of abstract CF node.
def get_diameter(self)
Calculates diameter of cluster that is represented by the entry.
def __insert_for_leaf_node(self, entry, search_node)
Recursive insert entry from leaf node to the tree.
def insert(self, entry)
Insert clustering feature to the tree.
def find_nearest_leaf(self, entry, search_node=None)
Search nearest leaf to the specified clustering feature.
def __merge_nearest_successors(self, node)
Find nearest sucessors and merge them.
def type_measurement(self)
def __eq__(self, entry)
Overloaded operator eq.
def insert_cluster(self, cluster)
Insert cluster that is represented as list of points where each point is represented by list of coord...
def get_radius(self)
Calculates radius of cluster that is represented by the entry.
def get_farthest_entries(self, type_measurement)
Find pair of farthest entries of the node.
def get_nearest_entry(self, entry, type_measurement)
Find nearest entry of node for the specified entry.
def square_sum(self)
Returns square sum.
parent
Pointer to the parent node (None for root).
def __init__(self, number_points, linear_sum, square_sum)
CF-entry constructor.
def get_distance(self, node, type_measurement)
Calculates distance between nodes in line with specified type measurement.
def __sub__(self, entry)
Overloaded operator sub.
Enumeration of measurement types for CF-Tree.
type
Type node (leaf or non-leaf).
Clustering feature representation.
def remove_successor(self, successor)
Remove successor from the node.
Representation of node of CF-Tree.
def number_points(self)
Returns number of points that are encoded.
def __init__(self, branch_factor, max_entries, threshold, type_measurement=measurement_type.CENTROID_EUCLIDEAN_DISTANCE)
Create CF-tree.
Enumeration of CF-Node types that are used by CF-Tree.
def __recursive_insert(self, entry, search_node)
Recursive insert of the entry to the tree.
payload
Payload of node where user data can be stored.
def get_centroid(self)
Calculates centroid of cluster that is represented by the entry.
def __split_leaf_node(self, node)
Performs splitting of the specified leaf node.
def get_level_nodes(self, level)
Traverses CF-tree to obtain nodes at the specified level.
def get_nearest_successors(self, type_measurement)
Find pair of nearest successors of the node in line with measurement type.
def __get_average_intra_cluster_distance(self, entry)
Calculates average intra cluster distance between current and specified clusters. ...
def __init__(self, feature, parent, successors, payload)
Create CF Non-leaf node.