pyclustering.cluster.optics.optics Class Reference

Class represents the clustering algorithm OPTICS (Ordering Points To Identify Clustering Structure) with KD-tree optimization (the ccore option is supported).

Public Member Functions

def __init__ (self, sample, eps, minpts, amount_clusters=None, ccore=True, **kwargs)
 Constructor of clustering algorithm OPTICS.
 
def process (self)
 Performs cluster analysis in line with rules of OPTICS algorithm.
 
def get_clusters (self)
 Returns the list of allocated clusters, where each cluster is a list of indexes of objects from the input data.
 
def get_noise (self)
 Returns the list of noise, which contains indexes of objects from the input data.
 
def get_ordering (self)
 Returns clustering ordering information about the input data set.
 
def get_optics_objects (self)
 Returns OPTICS objects where each object contains information about index of point from processed data, core distance and reachability distance.
 
def get_radius (self)
 Returns connectivity radius that is calculated and used for clustering by the algorithm.
 
def get_cluster_encoding (self)
 Returns clustering result representation type that indicates how clusters are encoded.
 

Detailed Description

Class represents the clustering algorithm OPTICS (Ordering Points To Identify Clustering Structure) with KD-tree optimization (the ccore option is supported).

OPTICS is a density-based algorithm. The purpose of the algorithm is not to provide explicit clusters, but to create a cluster-ordering representation of the input data. The cluster ordering describes the internal structure of the data set in terms of density, and a proper connectivity radius for allocating the required amount of clusters can be obtained from this ordering diagram. When the additional input parameter 'amount of clusters' is used, the connectivity radius should be larger than the actual one, because the algorithm recalculates the radius if the requested amount of clusters is not allocated.

The CCORE option enables the pyclustering core, a C/C++ shared library, for processing, which significantly increases performance.

Figure (optics_example_clustering.png): Scheme of how OPTICS works. At the beginning only one cluster is allocated, although two are requested. At the second step OPTICS calculates the connectivity radius using the cluster ordering and performs the final cluster allocation.
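The idea of deriving a connectivity radius from the cluster ordering can be illustrated with a minimal sketch. It is not the library's implementation: the helpers `count_clusters` and `find_radius` and the reachability values are hypothetical, and a cluster is simply assumed to be a run of consecutive reachability distances below the radius.

```python
def count_clusters(reachability, radius):
    """Count the valleys of a reachability plot that lie below `radius`."""
    clusters = 0
    inside_valley = False
    for distance in reachability:
        if distance < radius:
            if not inside_valley:
                # A new run of densely reachable points starts here.
                clusters += 1
                inside_valley = True
        else:
            # A peak separates two valleys.
            inside_valley = False
    return clusters


def find_radius(reachability, amount_clusters, steps=100):
    """Scan thresholds from high to low and return the first one that
    yields the requested amount of clusters, or None if none does."""
    upper = max(reachability)
    for step in range(steps, 0, -1):
        radius = upper * step / steps
        if count_clusters(reachability, radius) == amount_clusters:
            return radius
    return None


# Hypothetical reachability distances in cluster order:
# two dense valleys separated by one peak.
reachability = [0.2, 0.25, 0.2, 0.3, 1.5, 0.3, 0.25, 0.2]

too_large_radius = 2.0
clusters_before = count_clusters(reachability, too_large_radius)
corrected_radius = find_radius(reachability, amount_clusters=2)
clusters_after = count_clusters(reachability, corrected_radius)
```

With the overly large radius the whole ordering forms a single valley, which corresponds to the first step in the figure; lowering the radius until the peak separates the valleys corresponds to the second step.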

Example:

# Import required packages
from pyclustering.cluster.optics import optics, ordering_analyser, ordering_visualizer
from pyclustering.utils import read_sample
# Read sample for clustering from some file (path_sample is the path to that file)
sample = read_sample(path_sample)
# Create OPTICS algorithm for cluster analysis
optics_instance = optics(sample, 0.5, 6)
# Run cluster analysis
optics_instance.process()
# Obtain results of clustering
clusters = optics_instance.get_clusters()
noise = optics_instance.get_noise()
# Obtain reachability distances
ordering = ordering_analyser(optics_instance.get_ordering())
# Visualization of cluster ordering in line with reachability distance
ordering_visualizer.show_ordering_diagram(ordering)

The amount of clusters that should be allocated can also be specified. In this case the connectivity radius should be greater than the actual one, for example:

# Import required packages
from pyclustering.cluster.optics import optics
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample
# Read sample for clustering from some file
sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)
# Run cluster analysis where connectivity radius is bigger than the actual one
radius = 2.0
neighbors = 3
amount_of_clusters = 3
optics_instance = optics(sample, radius, neighbors, amount_of_clusters)
optics_instance.process()
# Obtain results of clustering
clusters = optics_instance.get_clusters()
noise = optics_instance.get_noise()

Definition at line 284 of file optics.py.

Constructor & Destructor Documentation

◆ __init__()

def pyclustering.cluster.optics.optics.__init__ (   self,
  sample,
  eps,
  minpts,
  amount_clusters = None,
  ccore = True,
  **kwargs 
)

Constructor of clustering algorithm OPTICS.

Parameters
[in] sample (list): Input data that is presented as a list of points (objects), where each point is represented by a list or tuple.
[in] eps (double): Connectivity radius between points; points may be connected if the distance between them is less than the radius.
[in] minpts (uint): Minimum number of shared neighbors that is required for establishing links between points.
[in] amount_clusters (uint): Optional parameter that specifies the amount of clusters that should be allocated. When 'amount_clusters' is used, the connectivity radius can be greater than the actual one; in other words, there is room for error in the choice of the connectivity radius.
[in] ccore (bool): If True, then the CCORE shared library (C++ solution) will be used for solving the problem.
[in] **kwargs: Arbitrary keyword arguments (available arguments: 'data_type').

Keyword Args:

  • data_type (string): Data type of the input sample that is processed by the algorithm ('points', 'distance_matrix').
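As a minimal sketch of what a 'distance_matrix' input could look like, the snippet below builds a symmetric Euclidean distance matrix in plain Python. The helper `build_distance_matrix` and the sample points are hypothetical; the commented call at the end only uses the constructor parameters documented above and assumes pyclustering is installed.

```python
import math


def build_distance_matrix(points):
    """Return a square matrix where element [i][j] is the Euclidean
    distance between points[i] and points[j]."""
    return [[math.dist(a, b) for b in points] for a in points]


# Hypothetical sample: three 2D points.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
matrix = build_distance_matrix(points)

# The matrix would then be passed instead of the points, for example:
# optics_instance = optics(matrix, 6.0, 1, data_type='distance_matrix')
```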

Definition at line 342 of file optics.py.

Member Function Documentation

◆ get_cluster_encoding()

def pyclustering.cluster.optics.optics.get_cluster_encoding (   self)

Returns clustering result representation type that indicates how clusters are encoded.

Returns
(type_encoding) Clustering result representation.
See also
get_clusters()

Definition at line 565 of file optics.py.

◆ get_clusters()

def pyclustering.cluster.optics.optics.get_clusters (   self)

Returns the list of allocated clusters, where each cluster is a list of indexes of objects from the input data.

Returns
(list) List of allocated clusters.
See also
process()
get_noise()
get_ordering()
get_radius()

Definition at line 472 of file optics.py.

Referenced by pyclustering.samples.answer_reader.get_cluster_lengths(), and pyclustering.cluster.optics.optics.process().

◆ get_noise()

def pyclustering.cluster.optics.optics.get_noise (   self)

Returns the list of noise, which contains indexes of objects from the input data.

Returns
(list) List of allocated noise objects.
See also
process()
get_clusters()
get_ordering()
get_radius()

Definition at line 488 of file optics.py.
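Because clusters and noise are both reported as lists of indexes into the input data, they can be combined into one label per point. The following is a minimal sketch under that assumption; the helper `to_labels`, the use of -1 for noise, and the sample index lists are hypothetical and not part of the library.

```python
def to_labels(clusters, noise, amount_points):
    """Convert index lists into one label per point.

    Points in clusters[k] get label k, noise points get label -1.
    """
    labels = [None] * amount_points
    for label, cluster in enumerate(clusters):
        for index in cluster:
            labels[index] = label
    for index in noise:
        labels[index] = -1
    return labels


# Hypothetical result for a data set of five points.
clusters = [[0, 2], [1, 4]]
noise = [3]
labels = to_labels(clusters, noise, amount_points=5)
```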

◆ get_optics_objects()

def pyclustering.cluster.optics.optics.get_optics_objects (   self)

Returns OPTICS objects where each object contains information about index of point from processed data, core distance and reachability distance.

Returns
(list) OPTICS objects.
See also
get_ordering()
get_clusters()
get_noise()
optics_descriptor

Definition at line 531 of file optics.py.
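The two distances stored in each OPTICS object can be sketched from the textbook definitions of the algorithm. This is an illustration only, not the library's code: the helpers `core_distance` and `reachability_distance` are hypothetical, the point itself is assumed not to count as its own neighbor, and a neighbor is assumed to lie within the radius inclusively. Implementations may differ on both details.

```python
import math


def core_distance(points, index, eps, minpts):
    """Distance to the minpts-th nearest neighbor within eps,
    or None if the point has fewer than minpts such neighbors."""
    distances = sorted(
        math.dist(points[index], other)
        for i, other in enumerate(points)
        if i != index
    )
    neighbors = [d for d in distances if d <= eps]
    if len(neighbors) < minpts:
        return None
    return neighbors[minpts - 1]


def reachability_distance(points, origin, target, eps, minpts):
    """Reachability of `target` from `origin`: the larger of the core
    distance of `origin` and the actual distance between both points.
    Undefined (None) if `origin` is not a core point."""
    core = core_distance(points, origin, eps, minpts)
    if core is None:
        return None
    return max(core, math.dist(points[origin], points[target]))


# Hypothetical sample: three close points and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]

core_of_first = core_distance(points, 0, eps=2.0, minpts=2)
core_of_outlier = core_distance(points, 3, eps=2.0, minpts=2)
reach_first_to_second = reachability_distance(points, 0, 1, eps=2.0, minpts=2)
```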

◆ get_ordering()

def pyclustering.cluster.optics.optics.get_ordering (   self)

Returns clustering ordering information about the input data set.

Clustering ordering of data-set contains the information about the internal clustering structure in line with connectivity radius.

Returns
(ordering_analyser) Analyser of clustering ordering.
See also
process()
get_clusters()
get_noise()
get_radius()
get_optics_objects()

Definition at line 504 of file optics.py.

Referenced by pyclustering.cluster.optics.optics.process().

◆ get_radius()

def pyclustering.cluster.optics.optics.get_radius (   self)

Returns connectivity radius that is calculated and used for clustering by the algorithm.

The connectivity radius may be changed by the algorithm only when the additional parameter, the amount of clusters for allocation, is used.

Returns
(double) Connectivity radius.
See also
get_ordering()
get_clusters()
get_noise()
get_optics_objects()

Definition at line 548 of file optics.py.

◆ process()

def pyclustering.cluster.optics.optics.process (   self)

Performs cluster analysis in line with rules of OPTICS algorithm.

Remarks
Results of clustering can be obtained using the corresponding getter methods.
See also
get_clusters()
get_noise()
get_ordering()

Definition at line 381 of file optics.py.


The documentation for this class was generated from the following file: