pyclustering
0.10.1
pyclustering is a Python, C++ data mining library.
Utils that are used by modules of pyclustering.
Namespaces | |
color | |
Colors used by pyclustering library for visualization. | |
graph | |
Graph representation (uses format GRPR). | |
metric | |
Module provides various distance metrics - abstraction of the notion of distance in a metric space. | |
sampling | |
Module provides various random sampling algorithms. | |
Functions | |
def | read_sample (filename) |
Returns data sample from simple text file. More... | |
def | calculate_distance_matrix (sample, metric=distance_metric(type_metric.EUCLIDEAN)) |
Calculates distance matrix for data sample (sequence of points) using specified metric (by default Euclidean distance). More... | |
def | read_image (filename) |
Returns image as N-dimension (depends on the input image) matrix, where one element of list describes pixel. More... | |
def | rgb2gray (image_rgb_array) |
Returns image as 1-dimension (gray colored) matrix, where one element of list describes pixel. More... | |
def | stretch_pattern (image_source) |
Returns stretched content as 1-dimension (gray colored) matrix with size of input image. More... | |
def | gray_pattern_borders (image) |
Returns coordinates of gray image content on the input image. More... | |
def | average_neighbor_distance (points, num_neigh) |
Returns the average distance required to establish links between the specified number of nearest neighbors. More... | |
def | medoid (data, indexes=None, **kwargs) |
Calculate medoid for input points. More... | |
def | euclidean_distance (a, b) |
Calculate Euclidean distance between vector a and b. More... | |
def | euclidean_distance_square (a, b) |
Calculate squared Euclidean distance between vectors a and b. More... | |
def | manhattan_distance (a, b) |
Calculate Manhattan distance between vector a and b. More... | |
def | average_inter_cluster_distance (cluster1, cluster2, data=None) |
Calculates average inter-cluster distance between two clusters. More... | |
def | average_intra_cluster_distance (cluster1, cluster2, data=None) |
Calculates average intra-cluster distance between two clusters. More... | |
def | variance_increase_distance (cluster1, cluster2, data=None) |
Calculates variance increase distance between two clusters. More... | |
def | calculate_ellipse_description (covariance, scale=2.0) |
Calculates description of ellipse using covariance matrix. More... | |
def | data_corners (data, data_filter=None) |
Finds maximum and minimum corner in each dimension of the specified data. More... | |
def | norm_vector (vector) |
Calculates norm of an input vector that is known as a vector length. More... | |
def | heaviside (value) |
Calculates Heaviside function that represents step function. More... | |
def | timedcall (executable_function, *args, **kwargs) |
Executes specified method or function with measuring of execution time. More... | |
def | extract_number_oscillations (osc_dyn, index=0, amplitude_threshold=1.0) |
Extracts number of oscillations of specified oscillator. More... | |
def | allocate_sync_ensembles (dynamic, tolerance=0.1, threshold=1.0, ignore=None) |
Allocate clusters in line with ensembles of synchronous oscillators where each synchronous ensemble corresponds to only one cluster. More... | |
def | draw_clusters (data, clusters, noise=[], marker_descr='.', hide_axes=False, axes=None, display_result=True) |
Displays clusters for data in 2D or 3D. More... | |
def | draw_dynamics (t, dyn, x_title=None, y_title=None, x_lim=None, y_lim=None, x_labels=True, y_labels=True, separate=False, axes=None) |
Draw dynamics of neurons (oscillators) in the network. More... | |
def | set_ax_param (ax, x_title=None, y_title=None, x_lim=None, y_lim=None, x_labels=True, y_labels=True, grid=True) |
Sets parameters for matplotlib ax. More... | |
def | draw_dynamics_set (dynamics, xtitle=None, ytitle=None, xlim=None, ylim=None, xlabels=False, ylabels=False) |
Draw lists of dynamics of neurons (oscillators) in the network. More... | |
def | draw_image_color_segments (source, clusters, hide_axes=True) |
Shows image segments using colored image. More... | |
def | draw_image_mask_segments (source, clusters, hide_axes=True) |
Shows image segments using black masks. More... | |
def | find_left_element (sorted_data, right, comparator) |
Returns the element's index at the left side from the right border with the same value as the last element in the range sorted_data . More... | |
def | linear_sum (list_vector) |
Calculates linear sum of vector that is represented by list, each element can be represented by list - multidimensional elements. More... | |
def | square_sum (list_vector) |
Calculates square sum of vector that is represented by list, each element can be represented by list - multidimensional elements. More... | |
def | list_math_subtraction (a, b) |
Calculates subtraction of two lists. More... | |
def | list_math_substraction_number (a, b) |
Calculates subtraction between list and number. More... | |
def | list_math_addition (a, b) |
Addition of two lists. More... | |
def | list_math_addition_number (a, b) |
Addition between list and number. More... | |
def | list_math_division_number (a, b) |
Division between list and number. More... | |
def | list_math_division (a, b) |
Division of two lists. More... | |
def | list_math_multiplication_number (a, b) |
Multiplication between list and number. More... | |
def | list_math_multiplication (a, b) |
Multiplication of two lists. More... | |
Variables | |
float | pi = 3.1415926535 |
The number \(\pi\) is a mathematical constant, the ratio of a circle's circumference to its diameter. | |
Utils that are used by modules of pyclustering.
def pyclustering.utils.allocate_sync_ensembles | ( | dynamic, | |
tolerance = 0.1 , |
|||
threshold = 1.0 , |
|||
ignore = None |
|||
) |
Allocate clusters in line with ensembles of synchronous oscillators where each synchronous ensemble corresponds to only one cluster.
[in] | dynamic | (dynamic): Dynamic of each oscillator. |
[in] | tolerance | (double): Maximum error for allocation of synchronous ensemble oscillators. |
[in] | threshold | (double): Amplitude trigger above which a spike is taken into account. |
[in] | ignore | (set): Set of indexes that shouldn't be taken into account. |
Definition at line 631 of file __init__.py.
Referenced by pyclustering.nnet.fsync.fsync_dynamic.allocate_sync_ensembles().
def pyclustering.utils.average_inter_cluster_distance | ( | cluster1, | |
cluster2, | |||
data = None |
|||
) |
Calculates average inter-cluster distance between two clusters.
Clusters can be represented by list of coordinates (in this case data shouldn't be specified), or by list of indexes of points from the data (represented by list of points), in this case data should be specified.
[in] | cluster1 | (list): The first cluster where each element can represent index from the data or object itself. |
[in] | cluster2 | (list): The second cluster where each element can represent index from the data or object itself. |
[in] | data | (list): If specified, then elements of the clusters are used as indexes; otherwise elements of the clusters are treated as points. |
Definition at line 331 of file __init__.py.
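The two ways of passing clusters described above (points directly, or indexes into data) can be illustrated with a small pure-Python sketch. The helper names `resolve_cluster` and `mean_pairwise_distance` are hypothetical and not part of pyclustering, and the sketch simply averages plain Euclidean distances over all cross-cluster pairs, which may differ from how the library aggregates distances.

```python
import math

def resolve_cluster(cluster, data=None):
    """Return cluster points, whether given as points or as indexes into data."""
    if data is None:
        return list(cluster)            # elements are already points
    return [data[i] for i in cluster]   # elements are indexes into data

def mean_pairwise_distance(cluster1, cluster2, data=None):
    """Average Euclidean distance over all pairs (p, q), p in cluster1, q in cluster2."""
    points1 = resolve_cluster(cluster1, data)
    points2 = resolve_cluster(cluster2, data)
    total = sum(math.dist(p, q) for p in points1 for q in points2)
    return total / (len(points1) * len(points2))

data = [[0.0, 0.0], [0.0, 2.0], [3.0, 0.0], [3.0, 2.0]]
by_index = mean_pairwise_distance([0, 1], [2, 3], data)    # clusters as indexes
by_points = mean_pairwise_distance(data[:2], data[2:])     # clusters as points
```

Both calls describe the same two clusters, so they should produce the same value.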
def pyclustering.utils.average_intra_cluster_distance | ( | cluster1, | |
cluster2, | |||
data = None |
|||
) |
Calculates average intra-cluster distance between two clusters.
Clusters can be represented by list of coordinates (in this case data shouldn't be specified), or by list of indexes of points from the data (represented by list of points), in this case data should be specified.
[in] | cluster1 | (list): The first cluster. |
[in] | cluster2 | (list): The second cluster. |
[in] | data | (list): If specified, then elements of the clusters are used as indexes; otherwise elements of the clusters are treated as points. |
Definition at line 362 of file __init__.py.
def pyclustering.utils.average_neighbor_distance | ( | points, | |
num_neigh | |||
) |
Returns the average distance required to establish links between the specified number of nearest neighbors.
[in] | points | (list): Input data, list of points where each point represented by list. |
[in] | num_neigh | (uint): Number of neighbors that should be used for distance calculation. |
Definition at line 180 of file __init__.py.
def pyclustering.utils.calculate_distance_matrix | ( | sample, | |
metric = distance_metric(type_metric.EUCLIDEAN) |
|||
) |
Calculates distance matrix for data sample (sequence of points) using specified metric (by default Euclidean distance).
[in] | sample | (array_like): Data points that are used for distance calculation. |
[in] | metric | (distance_metric): Metric that is used for distance calculation between two points. |
Definition at line 54 of file __init__.py.
Referenced by pyclustering.utils.sampling.reservoir_x().
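As a rough illustration of what a distance matrix over a sample looks like, here is a pure-Python sketch. It is not the library's implementation: the function name `distance_matrix` is hypothetical, and a plain callable (`math.dist`) stands in for the library's `distance_metric` object.

```python
import math

def distance_matrix(sample, metric=math.dist):
    """Return a square matrix where entry [i][j] is metric(sample[i], sample[j])."""
    return [[metric(a, b) for b in sample] for a in sample]

matrix = distance_matrix([[0, 0], [3, 4], [6, 8]])
```

With a symmetric metric the result is symmetric and has zeros on the diagonal.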
def pyclustering.utils.calculate_ellipse_description | ( | covariance, | |
scale = 2.0 |
|||
) |
Calculates description of ellipse using covariance matrix.
[in] | covariance | (numpy.array): Covariance matrix for which ellipse area should be calculated. |
[in] | scale | (float): Scale of the ellipse. |
Definition at line 482 of file __init__.py.
def pyclustering.utils.data_corners | ( | data, | |
data_filter = None |
|||
) |
Finds maximum and minimum corner in each dimension of the specified data.
[in] | data | (list): List of points that should be analysed. |
[in] | data_filter | (list): List of indexes of the data that should be analysed, if it is 'None' then whole 'data' is analysed to obtain corners. |
Definition at line 506 of file __init__.py.
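A minimal sketch of the corner search, written in plain Python for illustration. The function name `corners` and the `(minimum, maximum)` return order are assumptions and are not taken from the library.

```python
def corners(data, data_filter=None):
    """Return per-dimension minimum and maximum over the selected points."""
    # Use the whole data set when no filter is given, otherwise only the listed indexes.
    points = data if data_filter is None else [data[i] for i in data_filter]
    dimensions = len(points[0])
    minimum = [min(p[d] for p in points) for d in range(dimensions)]
    maximum = [max(p[d] for p in points) for d in range(dimensions)]
    return minimum, maximum

data = [[1.0, 5.0], [4.0, 2.0], [-3.0, 9.0]]
all_corners = corners(data)
filtered_corners = corners(data, data_filter=[0, 1])
```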
def pyclustering.utils.draw_clusters | ( | data, | |
clusters, | |||
noise = [] , |
|||
marker_descr = '.' , |
|||
hide_axes = False , |
|||
axes = None , |
|||
display_result = True |
|||
) |
Displays clusters for data in 2D or 3D.
[in] | data | (list): Points, each described by its coordinates. |
[in] | clusters | (list): Clusters that are represented by lists of indexes where each index corresponds to a point in data. |
[in] | noise | (list): Points that are regarded as noise. |
[in] | marker_descr | (string): Marker for displaying points. |
[in] | hide_axes | (bool): If True, axes are not displayed. |
[in] | axes | (ax): Matplotlib axes where clusters should be drawn; if not specified (None), a new plot is created. |
[in] | display_result | (bool): If True, the plot is shown once drawing is finished. |
Definition at line 727 of file __init__.py.
Referenced by pyclustering.utils.sampling.reservoir_x().
def pyclustering.utils.draw_dynamics | ( | t, | |
dyn, | |||
x_title = None , |
|||
y_title = None , |
|||
x_lim = None , |
|||
y_lim = None , |
|||
x_labels = True , |
|||
y_labels = True , |
|||
separate = False , |
|||
axes = None |
|||
) |
Draw dynamics of neurons (oscillators) in the network.
The plot is shown if matplotlib axes are not specified (None); otherwise showing it should be performed manually.
[in] | t | (list): Values of time (used by x axis). |
[in] | dyn | (list): Values of output of oscillators (used by y axis). |
[in] | x_title | (string): Title for X. |
[in] | y_title | (string): Title for Y. |
[in] | x_lim | (double): X limit. |
[in] | y_lim | (double): Y limit. |
[in] | x_labels | (bool): If True - shows X labels. |
[in] | y_labels | (bool): If True - shows Y labels. |
[in] | separate | (list): Consists of lists of oscillators where each such list consists of oscillator indexes that will be shown on separated stage. |
[in] | axes | (ax): If specified then matplotlib axes will be used for drawing and plot will not be shown. |
Definition at line 829 of file __init__.py.
Referenced by pyclustering.utils.draw_dynamics_set(), pyclustering.utils.sampling.reservoir_x(), and pyclustering.nnet.fsync.fsync_visualizer.show_output_dynamic().
def pyclustering.utils.draw_dynamics_set | ( | dynamics, | |
xtitle = None , |
|||
ytitle = None , |
|||
xlim = None , |
|||
ylim = None , |
|||
xlabels = False , |
|||
ylabels = False |
|||
) |
Draw lists of dynamics of neurons (oscillators) in the network.
[in] | dynamics | (list): List of network outputs that are represented by values of output of oscillators (used by y axis). |
[in] | xtitle | (string): Title for X. |
[in] | ytitle | (string): Title for Y. |
[in] | xlim | (double): X limit. |
[in] | ylim | (double): Y limit. |
[in] | xlabels | (bool): If True - shows X labels. |
[in] | ylabels | (bool): If True - shows Y labels. |
Definition at line 957 of file __init__.py.
Referenced by pyclustering.nnet.fsync.fsync_visualizer.show_output_dynamics().
def pyclustering.utils.draw_image_color_segments | ( | source, | |
clusters, | |||
hide_axes = True |
|||
) |
Shows image segments using colored image.
Each color on the result image represents an allocated segment. The first image is the initial one and the other is the result of segmentation.
[in] | source | (string): Path to image. |
[in] | clusters | (list): List of clusters (allocated segments of image) where each cluster consists of indexes of pixel from source image. |
[in] | hide_axes | (bool): If True then axes will not be displayed. |
Definition at line 1002 of file __init__.py.
def pyclustering.utils.draw_image_mask_segments | ( | source, | |
clusters, | |||
hide_axes = True |
|||
) |
Shows image segments using black masks.
Each black mask of allocated segment is presented on separate plot. The first image is initial and others are black masks of segments.
[in] | source | (string): Path to image. |
[in] | clusters | (list): List of clusters (allocated segments of image) where each cluster consists of indexes of pixel from source image. |
[in] | hide_axes | (bool): If True then axes will not be displayed. |
Definition at line 1054 of file __init__.py.
Referenced by pyclustering.utils.sampling.reservoir_x().
def pyclustering.utils.euclidean_distance | ( | a, | |
b | |||
) |
Calculate Euclidean distance between vector a and b.
The Euclidean distance between vectors (points) a and b is calculated by the following formula:
\[ dist(a, b) = \sqrt{ \sum_{i=0}^{N-1}(b_{i} - a_{i})^{2} }; \]
where N is the length of each vector.
[in] | a | (list): The first vector. |
[in] | b | (list): The second vector. |
Definition at line 263 of file __init__.py.
Referenced by pyclustering.utils.average_neighbor_distance().
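The formula above can be sketched in a few lines of plain Python. The function names `squared_euclidean` and `euclidean` are hypothetical stand-ins used only for illustration; they mirror the formula rather than reproduce the library code.

```python
import math

def squared_euclidean(a, b):
    """Sum of squared coordinate differences between vectors a and b."""
    return sum((bi - ai) ** 2 for ai, bi in zip(a, b))

def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))

squared = squared_euclidean([0, 0], [3, 4])
distance = euclidean([0, 0], [3, 4])
```

Working with the squared value avoids the square root when distances only need to be compared.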
def pyclustering.utils.euclidean_distance_square | ( | a, | |
b | |||
) |
Calculate squared Euclidean distance between vectors a and b.
[in] | a | (list): The first vector. |
[in] | b | (list): The second vector. |
Definition at line 287 of file __init__.py.
Referenced by pyclustering.utils.average_inter_cluster_distance(), pyclustering.utils.average_intra_cluster_distance(), pyclustering.utils.euclidean_distance(), and pyclustering.utils.variance_increase_distance().
def pyclustering.utils.extract_number_oscillations | ( | osc_dyn, | |
index = 0 , |
|||
amplitude_threshold = 1.0 |
|||
) |
Extracts number of oscillations of specified oscillator.
[in] | osc_dyn | (list): Dynamic of oscillators. |
[in] | index | (uint): Index of oscillator in dynamic. |
[in] | amplitude_threshold | (double): Amplitude threshold when oscillation is taken into account, for example, when oscillator amplitude is greater than threshold then oscillation is incremented. |
Definition at line 592 of file __init__.py.
Referenced by pyclustering.nnet.fsync.fsync_dynamic.extract_number_oscillations().
def pyclustering.utils.find_left_element | ( | sorted_data, | |
right, | |||
comparator | |||
) |
Returns the index of the leftmost element, counted from the right border, that has the same value as the last element in the range sorted_data.
The element at the right border is the search target. sorted_data must be a sorted collection. The complexity of the algorithm is O(log(n)). The algorithm is based on binary search.
[in] | sorted_data | input data to find the element. |
[in] | right | the index of the right element from which the search is started. |
[in] | comparator | comparison function object which returns True if the first argument is less than the second. |
Definition at line 1132 of file __init__.py.
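The idea can be sketched as a standard lower-bound binary search. This is an illustrative reading of the description, not the library's code: the name `find_leftmost_equal` is hypothetical, and the comparator is assumed to behave like a strict "less than".

```python
def find_leftmost_equal(sorted_data, right, less):
    """Return the smallest index in [0, right] whose value equals sorted_data[right]."""
    target = sorted_data[right]
    low, high = 0, right
    while low < high:
        middle = (low + high) // 2
        if less(sorted_data[middle], target):
            low = middle + 1    # middle is strictly smaller, so the answer lies to the right
        else:
            high = middle       # middle equals the target, so keep it as a candidate
    return low

values = [1, 2, 2, 2, 5]
first_two = find_leftmost_equal(values, 3, lambda x, y: x < y)
first_five = find_leftmost_equal(values, 4, lambda x, y: x < y)
```

Each iteration halves the search interval, which gives the O(log(n)) complexity mentioned above.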
def pyclustering.utils.gray_pattern_borders | ( | image | ) |
Returns coordinates of gray image content on the input image.
[in] | image | (Image): PIL Image instance that is processed. |
Definition at line 138 of file __init__.py.
Referenced by pyclustering.utils.stretch_pattern().
def pyclustering.utils.heaviside | ( | value | ) |
Calculates Heaviside function that represents step function.
If input value is greater than 0 then returns 1, otherwise returns 0.
[in] | value | (double): Argument of Heaviside function. |
Definition at line 557 of file __init__.py.
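Taken literally, the rule above translates to the following sketch. The function name `step` is hypothetical, and the behaviour at exactly 0 (returning 0) follows the description given here rather than any inspection of the library code.

```python
def step(value):
    """Return 1.0 for strictly positive input, otherwise 0.0."""
    return 1.0 if value > 0 else 0.0

outputs = [step(v) for v in (-2.5, 0.0, 0.1)]
```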
def pyclustering.utils.linear_sum | ( | list_vector | ) |
Calculates linear sum of vector that is represented by list, each element can be represented by list - multidimensional elements.
[in] | list_vector | (list): Input vector. |
Definition at line 1169 of file __init__.py.
def pyclustering.utils.list_math_addition | ( | a, | |
b | |||
) |
Addition of two lists.
Each element from list 'a' is added to element from list 'b' accordingly.
[in] | a | (list): List of elements that support mathematical addition. |
[in] | b | (list): List of elements that support mathematical addition. |
Definition at line 1246 of file __init__.py.
Referenced by pyclustering.utils.variance_increase_distance().
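The element-wise list operations in this module follow one pattern, sketched below in plain Python for addition. The names `add_lists` and `add_number` are hypothetical; the sketch assumes both lists have the same length (`zip` would silently truncate otherwise).

```python
def add_lists(a, b):
    """Add corresponding elements of two equally long lists."""
    return [x + y for x, y in zip(a, b)]

def add_number(a, b):
    """Add the number b to every element of list a."""
    return [x + b for x in a]

pairwise = add_lists([1, 2, 3], [10, 20, 30])
shifted = add_number([1, 2, 3], 0.5)
```

Subtraction, multiplication, and division can be written the same way by swapping the operator.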
def pyclustering.utils.list_math_addition_number | ( | a, | |
b | |||
) |
Addition between list and number.
Each element from list 'a' is added to number 'b'.
[in] | a | (list): List of elements that supports mathematic addition. |
[in] | b | (double): Value that supports mathematic addition. |
Definition at line 1260 of file __init__.py.
def pyclustering.utils.list_math_division | ( | a, | |
b | |||
) |
Division of two lists.
Each element from list 'a' is divided by element from list 'b' accordingly.
[in] | a | (list): List of elements that supports mathematic division. |
[in] | b | (list): List of elements that supports mathematic division. |
Definition at line 1288 of file __init__.py.
def pyclustering.utils.list_math_division_number | ( | a, | |
b | |||
) |
Division between list and number.
Each element from list 'a' is divided by number 'b'.
[in] | a | (list): List of elements that supports mathematic division. |
[in] | b | (double): Value that supports mathematic division. |
Definition at line 1274 of file __init__.py.
Referenced by pyclustering.utils.variance_increase_distance().
def pyclustering.utils.list_math_multiplication | ( | a, | |
b | |||
) |
Multiplication of two lists.
Each element from list 'a' is multiplied by element from list 'b' accordingly.
[in] | a | (list): List of elements that supports mathematic multiplication. |
[in] | b | (list): List of elements that supports mathematic multiplication. |
Definition at line 1316 of file __init__.py.
Referenced by pyclustering.utils.square_sum().
def pyclustering.utils.list_math_multiplication_number | ( | a, | |
b | |||
) |
Multiplication between list and number.
Each element from list 'a' is multiplied by number 'b'.
[in] | a | (list): List of elements that support mathematical multiplication. |
[in] | b | (double): Number that supports mathematical multiplication. |
Definition at line 1302 of file __init__.py.
def pyclustering.utils.list_math_substraction_number | ( | a, | |
b | |||
) |
Calculates subtraction between list and number.
Number 'b' is subtracted from each element of list 'a'.
[in] | a | (list): List of elements that support mathematical subtraction. |
[in] | b | (double): Value that supports mathematical subtraction. |
Definition at line 1232 of file __init__.py.
def pyclustering.utils.list_math_subtraction | ( | a, | |
b | |||
) |
Calculates subtraction of two lists.
Each element of list 'b' is subtracted from the corresponding element of list 'a'.
[in] | a | (list): List of elements that supports mathematical subtraction. |
[in] | b | (list): List of elements that supports mathematical subtraction. |
Definition at line 1218 of file __init__.py.
def pyclustering.utils.manhattan_distance | ( | a, | |
b | |||
) |
Calculate Manhattan distance between vector a and b.
[in] | a | (list): The first vector. |
[in] | b | (list): The second vector. |
Definition at line 308 of file __init__.py.
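No formula is given for this function here, so the sketch below uses the standard definition of Manhattan distance, the sum of absolute coordinate differences. The function name `manhattan` is a hypothetical stand-in, not the library function.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences between vectors a and b."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

distance = manhattan([1, 2], [4, 6])
```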
def pyclustering.utils.medoid | ( | data, | |
indexes = None , |
|||
** | kwargs | ||
) |
Calculate medoid for input points.
[in] | data | (list): Set of points for which the medoid should be calculated. |
[in] | indexes | (list): Indexes of the input set of points that will be taken into account during medoid calculation. |
[in] | **kwargs | Arbitrary keyword arguments (available arguments: 'metric', 'data_type'). |
Definition at line 213 of file __init__.py.
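A medoid is the point of a set whose total distance to all other points of the set is smallest. The sketch below illustrates that definition in plain Python; the name `medoid_index`, the choice to return an index, and the plain `metric` callable are assumptions made for illustration, not the library's interface.

```python
import math

def medoid_index(data, indexes=None, metric=math.dist):
    """Return the index of the point with the smallest total distance to the others."""
    candidates = range(len(data)) if indexes is None else indexes
    best_index, best_total = None, float("inf")
    for i in candidates:
        total = sum(metric(data[i], data[j]) for j in candidates)
        if total < best_total:      # strict comparison keeps the first of equal candidates
            best_index, best_total = i, total
    return best_index

data = [[0.0], [1.0], [10.0]]
whole_set = medoid_index(data)
subset = medoid_index(data, indexes=[0, 2])
```

This brute-force search needs a quadratic number of distance evaluations, which is fine for small sets.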
def pyclustering.utils.norm_vector | ( | vector | ) |
Calculates norm of an input vector that is known as a vector length.
[in] | vector | (list): The input vector whose length is calculated. |
Definition at line 538 of file __init__.py.
def pyclustering.utils.read_image | ( | filename | ) |
Returns image as N-dimension (depends on the input image) matrix, where one element of list describes pixel.
[in] | filename | (string): Path to image. |
Definition at line 69 of file __init__.py.
Referenced by pyclustering.utils.sampling.reservoir_x().
def pyclustering.utils.read_sample | ( | filename | ) |
Returns data sample from simple text file.
This function should be used for text files in which each line holds the coordinates of one point, separated by whitespace.
[in] | filename | (string): Path to file with data. |
Definition at line 30 of file __init__.py.
Referenced by pyclustering.utils.sampling.reservoir_x().
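For illustration, a reader for such a simple text file might look like the sketch below. It assumes one point per line with whitespace-separated numeric coordinates, and the name `read_points` is hypothetical; it is not the library function.

```python
import os
import tempfile

def read_points(filename):
    """Parse each non-empty line into a list of floats (one point per line)."""
    with open(filename) as file:
        return [[float(value) for value in line.split()]
                for line in file if line.strip()]

# Write a small sample file, read it back, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1.0 2.0\n3.5 4.5\n\n")
    path = tmp.name

points = read_points(path)
os.remove(path)
```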
def pyclustering.utils.rgb2gray | ( | image_rgb_array | ) |
Returns image as 1-dimension (gray colored) matrix, where one element of list describes pixel.
Luma coding is used for transformation and that is calculated directly from gamma-compressed primary intensities as a weighted sum:
\[Y = 0.2989R + 0.587G + 0.114B\]
[in] | image_rgb_array | (list): Image represented by RGB list. |
Definition at line 84 of file __init__.py.
Referenced by pyclustering.utils.sampling.reservoir_x(), and pyclustering.utils.stretch_pattern().
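The weighted sum above can be applied pixel by pixel, as in this plain-Python sketch. The name `luma` is hypothetical, and the sketch assumes each pixel is an (R, G, B) triple without an alpha channel.

```python
def luma(pixels):
    """Convert (R, G, B) pixels to gray values using Y = 0.2989R + 0.587G + 0.114B."""
    return [0.2989 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]

gray = luma([(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 0, 0)])
```

Green contributes most to the result and blue least, reflecting the weights in the formula.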
def pyclustering.utils.set_ax_param | ( | ax, | |
x_title = None , |
|||
y_title = None , |
|||
x_lim = None , |
|||
y_lim = None , |
|||
x_labels = True , |
|||
y_labels = True , |
|||
grid = True |
|||
) |
Sets parameters for matplotlib ax.
[in] | ax | (Axes): Axes to which the parameters should be applied. |
[in] | x_title | (string): Title for X. |
[in] | y_title | (string): Title for Y. |
[in] | x_lim | (double): X limit. |
[in] | y_lim | (double): Y limit. |
[in] | x_labels | (bool): If True - shows X labels. |
[in] | y_labels | (bool): If True - shows Y labels. |
[in] | grid | (bool): If True - shows grid. |
Definition at line 913 of file __init__.py.
Referenced by pyclustering.utils.draw_dynamics().
def pyclustering.utils.square_sum | ( | list_vector | ) |
Calculates square sum of vector that is represented by list, each element can be represented by list - multidimensional elements.
[in] | list_vector | (list): Input vector. |
Definition at line 1196 of file __init__.py.
def pyclustering.utils.stretch_pattern | ( | image_source | ) |
Returns stretched content as 1-dimension (gray colored) matrix with size of input image.
[in] | image_source | (Image): PIL Image instance. |
Definition at line 113 of file __init__.py.
def pyclustering.utils.timedcall | ( | executable_function, | |
* | args, | ||
** | kwargs | ||
) |
Executes specified method or function with measuring of execution time.
[in] | executable_function | (pointer): Pointer to a function or method that should be called. |
[in] | *args | Arguments of the called function or method. |
[in] | **kwargs | Arbitrary keyword arguments of the called function or method. |
Definition at line 573 of file __init__.py.
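A timing wrapper of this kind can be sketched with the standard library. The name `timed` is hypothetical, and returning an `(elapsed_seconds, result)` tuple is an assumption made for this example; the return value is not documented here.

```python
import time

def timed(function, *args, **kwargs):
    """Call function(*args, **kwargs) and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = function(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return elapsed, result

elapsed, total = timed(sum, range(1000))
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.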
def pyclustering.utils.variance_increase_distance | ( | cluster1, | |
cluster2, | |||
data = None |
|||
) |
Calculates variance increase distance between two clusters.
Clusters can be represented by list of coordinates (in this case data shouldn't be specified), or by list of indexes of points from the data (represented by list of points), in this case data should be specified.
[in] | cluster1 | (list): The first cluster. |
[in] | cluster2 | (list): The second cluster. |
[in] | data | (list): If specified, then elements of the clusters are used as indexes; otherwise elements of the clusters are treated as points. |
Definition at line 413 of file __init__.py.