pyclustering  0.10.1
pyclustering is a Python and C++ data mining library.
pyclustering.cluster.encoder.cluster_encoder Class Reference

Provides a service for changing the representation of clustering results.

Public Member Functions

def __init__ (self, encoding, clusters, data)
 Constructor of the clustering result representor.
 
def get_encoding (self)
 Returns the current cluster representation.
 
def get_clusters (self)
 Returns clusters represented according to the type defined by get_encoding().
 
def get_data (self)
 Returns data that was used for cluster analysis.
 
def set_encoding (self, encoding)
 Changes the cluster encoding to the specified type (Index List, Object List, Labeling).
 

Detailed Description

Provides a service for changing the representation of clustering results.

There are three general types of representation:

  1. Index List Separation, defined by CLUSTER_INDEX_LIST_SEPARATION, for example [[0, 1, 2], [3, 4], [5, 6, 7]].
  2. Index Labeling, defined by CLUSTER_INDEX_LABELING, for example [0, 0, 0, 1, 1, 2, 2, 2].
  3. Object List Separation, defined by CLUSTER_OBJECT_LIST_SEPARATION, for example [[obj1, obj2, obj3], [obj4, obj5], [obj6, obj7, obj8]].

Here is an example of how to convert the default Index List Separation to the other types:

from pyclustering.utils import read_sample
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.cluster.encoder import type_encoding, cluster_encoder
from pyclustering.cluster.kmeans import kmeans
# load list of points for cluster analysis
sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE1)
# create instance of K-Means algorithm
kmeans_instance = kmeans(sample, [[3.0, 5.1], [6.5, 8.6]])
# run cluster analysis and obtain results
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
print("Index List Separation:", clusters)
# by default k-means returns representation CLUSTER_INDEX_LIST_SEPARATION
type_repr = kmeans_instance.get_cluster_encoding()
encoder = cluster_encoder(type_repr, clusters, sample)
# change representation from index list to label list
encoder.set_encoding(type_encoding.CLUSTER_INDEX_LABELING)
print("Index Labeling:", encoder.get_clusters())
# change representation from label to object list
encoder.set_encoding(type_encoding.CLUSTER_OBJECT_LIST_SEPARATION)
print("Object List Separation:", encoder.get_clusters())

The output of the code above is as follows:

Index List Separation: [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
Index Labeling: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
Object List Separation: [[[3.522979, 5.487981], [3.768699, 5.364477], [3.423602, 5.4199], [3.803905, 5.389491], [3.93669, 5.663041]], [[6.968136, 7.755556], [6.750795, 7.269541], [6.593196, 7.850364], [6.978178, 7.60985], [6.554487, 7.498119]]]
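The relationship between the three representations can also be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names index_list_to_labels and index_list_to_objects are hypothetical, and the data values are placeholders chosen to match the generic examples in the list of representation types.

```python
import math


def index_list_to_labels(clusters, data_size):
    """Index List Separation -> Index Labeling (hypothetical helper).

    Points that appear in no cluster keep the initial NaN label, which is
    the behaviour this class documents for Index Labeling.
    """
    labels = [math.nan] * data_size
    for label, cluster in enumerate(clusters):
        for index in cluster:
            labels[index] = label
    return labels


def index_list_to_objects(clusters, data):
    """Index List Separation -> Object List Separation (hypothetical helper)."""
    return [[data[index] for index in cluster] for cluster in clusters]


# Placeholder data: eight objects split into three clusters by index.
data = ["obj1", "obj2", "obj3", "obj4", "obj5", "obj6", "obj7", "obj8"]
clusters = [[0, 1, 2], [3, 4], [5, 6, 7]]

labels = index_list_to_labels(clusters, len(data))
objects = index_list_to_objects(clusters, data)

print("Index Labeling:", labels)
print("Object List Separation:", objects)
```

Each label is simply the position of the cluster in the index list, so the labeling carries the same information as the index list as long as every point belongs to exactly one cluster.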

If an object exists in the input data but does not appear in any cluster (by index or by value), it is marked as NaN in the Index Labeling representation. Here is an example:

from pyclustering.cluster.encoder import type_encoding, cluster_encoder
# An input data.
sample = [[1.0, 1.2], [1.2, 2.3], [114.3, 54.1], [2.2, 1.4], [5.3, 1.3]]
# Clusters do not contain the object with index 2 ([114.3, 54.1]) because it is an outlier.
clusters = [[0, 1], [3, 4]]
encoder = cluster_encoder(type_encoding.CLUSTER_INDEX_LIST_SEPARATION, clusters, sample)
encoder.set_encoding(type_encoding.CLUSTER_INDEX_LABELING)
print("Index Labeling:", encoder.get_clusters())

Here is the output of the code above. Note the NaN value for the object with index 2, [114.3, 54.1].

Index Labeling: [0, 0, nan, 1, 1]
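When consuming such a label list, keep in mind that NaN never compares equal to itself, so a test like label == float('nan') is always false. The sketch below uses math.isnan to separate assigned from unassigned points. It works on a hand-written label list that mirrors the output above rather than on a value returned by the library, and the variable names are illustrative.

```python
import math

# Hand-written labels mirroring the output above; index 2 is unassigned.
labels = [0, 0, math.nan, 1, 1]
sample = [[1.0, 1.2], [1.2, 2.3], [114.3, 54.1], [2.2, 1.4], [5.3, 1.3]]

# Indices of points that belong to no cluster.
unassigned = [index for index, label in enumerate(labels) if math.isnan(label)]

# Points that do have a cluster, paired with their integer label.
assigned = [
    (sample[index], int(label))
    for index, label in enumerate(labels)
    if not math.isnan(label)
]

print("Unassigned indices:", unassigned)
print("Unassigned points:", [sample[index] for index in unassigned])
print("Number of assigned points:", len(assigned))
```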

Definition at line 32 of file encoder.py.

Constructor & Destructor Documentation

◆ __init__()

def pyclustering.cluster.encoder.cluster_encoder.__init__ (   self,
  encoding,
  clusters,
  data 
)

Constructor of the clustering result representor.

Parameters
[in] encoding (type_encoding): Type of clusters representation (Index List, Object List or Labels).
[in] clusters (list): Clusters that were allocated from the input data.
[in] data (list): Data that was used for cluster analysis.
See also
type_encoding

Definition at line 103 of file encoder.py.

Member Function Documentation

◆ get_clusters()

def pyclustering.cluster.encoder.cluster_encoder.get_clusters (   self)

Returns clusters represented according to the type defined by get_encoding().

See also
get_encoding()

Definition at line 129 of file encoder.py.

Referenced by pyclustering.samples.answer_reader.get_cluster_lengths(), and pyclustering.cluster.optics.optics.process().

◆ set_encoding()

def pyclustering.cluster.encoder.cluster_encoder.set_encoding (   self,
  encoding 
)

Changes the cluster encoding to the specified type (Index List, Object List, Labeling).

Parameters
[in] encoding (type_encoding): New type of clusters representation.
Returns
(cluster_encoder) Returns itself.

Definition at line 147 of file encoder.py.
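Because set_encoding() returns the encoder itself, a call to it can be chained directly with get_clusters(). The sketch below shows that pattern with a hypothetical stand-in class, TinyEncoder, which is not the pyclustering class: it supports only two string-named encodings ("index_list" and "labeling") invented for this illustration, and its conversion logic is an assumption about how such an encoder could work.

```python
import math


class TinyEncoder:
    """Hypothetical stand-in that mimics the documented method names."""

    def __init__(self, encoding, clusters, data):
        self._encoding = encoding
        self._clusters = clusters
        self._data = data

    def get_encoding(self):
        return self._encoding

    def get_clusters(self):
        return self._clusters

    def set_encoding(self, encoding):
        if encoding != self._encoding:
            if encoding == "labeling":
                # Index list -> labels; unassigned points stay NaN.
                labels = [math.nan] * len(self._data)
                for label, cluster in enumerate(self._clusters):
                    for index in cluster:
                        labels[index] = label
                self._clusters = labels
            elif encoding == "index_list":
                # Labels -> index list; NaN labels are skipped.
                groups = {}
                for index, label in enumerate(self._clusters):
                    if not math.isnan(label):
                        groups.setdefault(label, []).append(index)
                self._clusters = [groups[label] for label in sorted(groups)]
            else:
                raise ValueError("unknown encoding: " + str(encoding))
            self._encoding = encoding
        # Returning the instance is what makes chaining possible.
        return self


data = [[1.0, 1.2], [1.2, 2.3], [2.2, 1.4], [5.3, 1.3], [5.1, 1.1]]
encoder = TinyEncoder("index_list", [[0, 1], [2, 3, 4]], data)

# Convert and read the result in a single expression.
labels = encoder.set_encoding("labeling").get_clusters()
round_trip = encoder.set_encoding("index_list").get_clusters()

print("Index Labeling:", labels)
print("Index List Separation:", round_trip)
```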


The documentation for this class was generated from the file encoder.py.