pyclustering
0.10.1
pyclustering is a Python and C++ data mining library.
Class represents the clustering algorithm BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
Public Member Functions
def __init__(self, data, number_clusters, branching_factor=50, max_node_entries=200, diameter=0.5, type_measurement=measurement_type.CENTROID_EUCLIDEAN_DISTANCE, entry_size_limit=500, diameter_multiplier=1.5, ccore=True)
    Constructor of clustering algorithm BIRCH.
def process(self)
    Performs cluster analysis according to the rules of the BIRCH algorithm.
def get_clusters(self)
    Returns the list of allocated clusters; each cluster is represented by a list of indexes, where each index corresponds to a point in the input dataset.
def get_cf_entries(self)
    Returns the CF-entries that encode the input dataset.
def get_cf_cluster(self)
    Returns the list of allocated CF-entry clusters; each cluster is represented by a list of indexes, where each index corresponds to a CF-entry.
def get_cluster_encoding(self)
    Returns the clustering result representation type, which indicates how clusters are encoded.
BIRCH is suitable for large databases. The algorithm incrementally and dynamically clusters incoming multi-dimensional metric data points using the concepts of Clustering Feature (CF) and CF tree. A Clustering Feature summarizes the information that is maintained about a cluster and is defined as a triple:
\[CF=\left ( N, \overrightarrow{LS}, SS \right )\]
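The triple can be illustrated with a few lines of plain Python. This is a minimal sketch and not the library's implementation: it assumes the usual BIRCH reading of the symbols, where N is the number of points, LS is the per-dimension linear sum of the points, and SS is the scalar sum of their squared coordinates. The helper names `make_cf`, `merge_cf` and `centroid` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed layout of a Clustering Feature: (N, LS, SS).
CF = Tuple[int, List[float], float]


def make_cf(points: Sequence[Sequence[float]]) -> CF:
    """Summarize a non-empty group of points as (N, LS, SS)."""
    n = len(points)
    dim = len(points[0])
    ls = [sum(p[d] for p in points) for d in range(dim)]  # linear sum per dimension
    ss = sum(x * x for p in points for x in p)            # sum of squared coordinates
    return n, ls, ss


def merge_cf(a: CF, b: CF) -> CF:
    """Merge two CFs by adding their components (CF additivity)."""
    return a[0] + b[0], [x + y for x, y in zip(a[1], b[1])], a[2] + b[2]


def centroid(cf: CF) -> List[float]:
    """Centroid of the summarized points, recovered from N and LS only."""
    n, ls, _ = cf
    return [x / n for x in ls]


# Made-up points, used only for illustration.
left = make_cf([[1.0, 2.0], [3.0, 4.0]])
right = make_cf([[5.0, 6.0]])
whole = make_cf([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Merging `left` and `right` gives the same triple as summarizing all three points at once, which is why clusters can be combined without revisiting the raw data.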
Example of how to extract clusters from the 'OldFaithful' sample using the BIRCH algorithm.
[Figure: clustering result produced by the BIRCH algorithm.]
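A minimal sketch of this workflow is given below. It uses only the constructor, `process()` and `get_clusters()` listed on this page; the wrapper name `cluster_with_birch` is hypothetical, and the import is placed inside the function on the assumption that pyclustering may not be installed everywhere the module is loaded.

```python
from typing import List, Sequence


def cluster_with_birch(points: Sequence[Sequence[float]],
                       number_clusters: int,
                       diameter: float = 0.5) -> List[List[int]]:
    """Run BIRCH on `points` and return clusters as lists of point indexes."""
    # Lazy import: pyclustering is only required when the function is called.
    from pyclustering.cluster.birch import birch

    # The input is passed as a list of points, each point a list of coordinates.
    data = [list(point) for point in points]

    instance = birch(data, number_clusters, diameter=diameter)
    instance.process()              # build the CF-tree and cluster the entries
    return instance.get_clusters()  # one list of point indexes per cluster
```

A call such as `cluster_with_birch([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]], 2)` (made-up points) would be expected to return two lists of indexes into the input. A suitable `diameter` depends on the scale of the data.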
The methods 'get_cf_entries' and 'get_cf_cluster' can be used to obtain information about how the input data is encoded. The encoding information can be extracted and visualized.
[Figure: visualization of the CF-entry encoding produced by the BIRCH algorithm.]
def pyclustering.cluster.birch.birch.__init__(self, data, number_clusters, branching_factor=50, max_node_entries=200, diameter=0.5, type_measurement=measurement_type.CENTROID_EUCLIDEAN_DISTANCE, entry_size_limit=500, diameter_multiplier=1.5, ccore=True)
Constructor of clustering algorithm BIRCH.
[in] | data | (list): Input data represented as a list of points (objects), where each point is represented by a list of coordinates. |
[in] | number_clusters | (uint): Number of clusters that should be allocated. |
[in] | branching_factor | (uint): Maximum number of successors that each non-leaf node in the CF-Tree may contain. |
[in] | max_node_entries | (uint): Maximum number of entries that each leaf node in the CF-Tree may contain. |
[in] | diameter | (double): CF-entry diameter used for CF-Tree construction; it may be increased if 'entry_size_limit' is exceeded. |
[in] | type_measurement | (measurement_type): Type of measurement used to calculate distance metrics. |
[in] | entry_size_limit | (uint): Maximum number of entries that can be stored in the CF-Tree; if it is exceeded during construction, 'diameter' is increased and the CF-Tree is rebuilt. |
[in] | diameter_multiplier | (double): Multiplier used to increase 'diameter' when 'entry_size_limit' is exceeded. |
[in] | ccore | (bool): If True, the C++ part of the library is used for processing. |
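The interplay of 'diameter', 'entry_size_limit' and 'diameter_multiplier' can be sketched without a tree. The code below is a deliberately simplified, flat stand-in and not the library's CF-Tree: a point is absorbed by the nearest entry if the entry's diameter stays within the threshold, otherwise it starts a new entry, and the whole encoding is rebuilt with a larger diameter whenever the entry count exceeds the limit. The names `Entry`, `build_entries` and `encode` are hypothetical, and the diameter is assumed to be the root of the average squared pairwise distance computed from (N, LS, SS).

```python
import math
from typing import List, Sequence, Tuple


class Entry:
    """Flat stand-in for a CF-entry holding (N, LS, SS)."""

    def __init__(self, point: Sequence[float]) -> None:
        self.n = 1
        self.ls = list(point)
        self.ss = sum(x * x for x in point)

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def diameter_if_added(self, point: Sequence[float]) -> float:
        """Diameter the entry would have after absorbing `point`."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        ls_sq = sum(x * x for x in ls)
        # D^2 = (2*N*SS - 2*|LS|^2) / (N*(N-1)); clamp tiny negative rounding errors.
        return math.sqrt(max(0.0, (2.0 * n * ss - 2.0 * ls_sq) / (n * (n - 1))))

    def add(self, point: Sequence[float]) -> None:
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)


def build_entries(points: Sequence[Sequence[float]], diameter: float) -> List[Entry]:
    """Single pass: absorb each point into the nearest entry or open a new one."""
    entries: List[Entry] = []
    for p in points:
        nearest = min(entries, key=lambda e: math.dist(e.centroid(), p), default=None)
        if nearest is not None and nearest.diameter_if_added(p) <= diameter:
            nearest.add(p)
        else:
            entries.append(Entry(p))
    return entries


def encode(points: Sequence[Sequence[float]], diameter: float = 0.5,
           entry_size_limit: int = 500,
           diameter_multiplier: float = 1.5) -> Tuple[List[Entry], float]:
    """Rebuild with a growing diameter until the entry count fits the limit."""
    if diameter <= 0 or diameter_multiplier <= 1 or entry_size_limit < 1:
        raise ValueError("need diameter > 0, diameter_multiplier > 1, entry_size_limit >= 1")
    entries = build_entries(points, diameter)
    while len(entries) > entry_size_limit:
        diameter *= diameter_multiplier          # relax the threshold
        entries = build_entries(points, diameter)  # and rebuild from scratch
    return entries, diameter
```

A smaller 'diameter' yields more, tighter entries and a finer encoding; 'entry_size_limit' caps that growth at the cost of coarser entries.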
def pyclustering.cluster.birch.birch.get_cf_cluster(self)
Returns the list of allocated CF-entry clusters; each cluster is represented by a list of indexes, where each index corresponds to a CF-entry.
def pyclustering.cluster.birch.birch.get_cf_entries(self)
Returns the CF-entries that encode the input dataset.
def pyclustering.cluster.birch.birch.get_cluster_encoding(self)
Returns the clustering result representation type, which indicates how clusters are encoded.
def pyclustering.cluster.birch.birch.get_clusters(self)
Returns the list of allocated clusters; each cluster is represented by a list of indexes, where each index corresponds to a point in the input dataset.
Definition at line 166 of file birch.py.
Referenced by pyclustering.samples.answer_reader.get_cluster_lengths(), pyclustering.cluster.birch.birch.process(), and pyclustering.cluster.optics.optics.process().
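Because clusters are returned as lists of indexes rather than as points or labels, a small conversion step is often useful. The helpers below are a sketch in plain Python; the names `clusters_to_points` and `clusters_to_labels` are hypothetical and not part of the library, and the sample values are made up.

```python
from typing import List, Sequence


def clusters_to_points(clusters: Sequence[Sequence[int]],
                       data: Sequence[Sequence[float]]) -> List[List[Sequence[float]]]:
    """Replace each index with the point it refers to in the input dataset."""
    return [[data[index] for index in cluster] for cluster in clusters]


def clusters_to_labels(clusters: Sequence[Sequence[int]],
                       number_points: int) -> List[int]:
    """Build one label per point; points in no cluster keep the label -1."""
    labels = [-1] * number_points
    for label, cluster in enumerate(clusters):
        for index in cluster:
            labels[index] = label
    return labels


# Made-up input and a result shaped like the index lists described above.
data = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.2], [5.1, 4.9]]
clusters = [[0, 2], [1, 3]]

grouped = clusters_to_points(clusters, data)
labels = clusters_to_labels(clusters, len(data))
```

The label form is convenient for comparing against reference labels, while the grouped form is convenient for plotting each cluster separately.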
def pyclustering.cluster.birch.birch.process(self)
Performs cluster analysis according to the rules of the BIRCH algorithm.