3 @brief Cluster analysis algorithm: Fuzzy C-Means
4 @details Implementation based on paper @cite book::pattern_recognition_with_fuzzy.
6 @authors Andrei Novikov (pyclustering@yandex.ru)
8 @copyright BSD-3-Clause
15 import pyclustering.core.fcm_wrapper
as wrapper
17 from pyclustering.core.wrapper
import ccore_library
22 @brief Class represents Fuzzy C-means (FCM) clustering algorithm.
23 @details Fuzzy clustering is a form of clustering in which each data point can belong to more than one cluster.
25 Fuzzy C-Means algorithm uses two general formulas for cluster analysis. The first is to updated membership of each
27 \f[w_{ij}=\frac{1}{\sum_{k=0}^{c}\left ( \frac{\left \| x_{i}-c_{j} \right \|}{\left \| x_{i}-c_{k} \right \|} \right )^{\frac{2}{m-1}}}\f]
29 The second formula is used to update centers in line with obtained centers:
30 \f[c_{k}=\frac{\sum_{i=0}^{N}w_{k}\left ( x_{i} \right )^{m}x_{i}}{\sum_{i=0}^{N}w_{k}\left ( x_{i} \right )^{m}}\f]
32 Fuzzy C-Means clustering results depend on initial centers. Algorithm K-Means++ can used for center initialization
33 from module 'pyclustering.cluster.center_initializer'.
35 CCORE implementation of the algorithm uses thread pool to parallelize the clustering process.
37 Here is an example how to perform cluster analysis using Fuzzy C-Means algorithm:
39 from pyclustering.samples.definitions import FAMOUS_SAMPLES
40 from pyclustering.cluster import cluster_visualizer
41 from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
42 from pyclustering.cluster.fcm import fcm
43 from pyclustering.utils import read_sample
45 # load list of points for cluster analysis
46 sample = read_sample(FAMOUS_SAMPLES.SAMPLE_OLD_FAITHFUL)
49 initial_centers = kmeans_plusplus_initializer(sample, 2, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize()
51 # create instance of Fuzzy C-Means algorithm
52 fcm_instance = fcm(sample, initial_centers)
54 # run cluster analysis and obtain results
55 fcm_instance.process()
56 clusters = fcm_instance.get_clusters()
57 centers = fcm_instance.get_centers()
59 # visualize clustering results
60 visualizer = cluster_visualizer()
61 visualizer.append_clusters(clusters, sample)
62 visualizer.append_cluster(centers, marker='*', markersize=10)
66 The next example shows how to perform image segmentation using Fuzzy C-Means algorithm:
68 from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
69 from pyclustering.cluster.fcm import fcm
70 from pyclustering.utils import read_image, draw_image_mask_segments
72 # load list of points for cluster analysis
73 data = read_image("stpetersburg_admiral.jpg")
76 initial_centers = kmeans_plusplus_initializer(data, 3, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize()
78 # create instance of Fuzzy C-Means algorithm
79 fcm_instance = fcm(data, initial_centers)
81 # run cluster analysis and obtain results
82 fcm_instance.process()
83 clusters = fcm_instance.get_clusters()
85 # visualize segmentation results
86 draw_image_mask_segments("stpetersburg_admiral.jpg", clusters)
89 @image html fcm_segmentation_stpetersburg.png "Image segmentation using Fuzzy C-Means algorithm."
93 def __init__(self, data, initial_centers, **kwargs):
95 @brief Initialize Fuzzy C-Means algorithm.
97 @param[in] data (array_like): Input data that is presented as array of points (objects), each point should be represented by array_like data structure.
98 @param[in] initial_centers (array_like): Initial coordinates of centers of clusters that are represented by array_like data structure: [center1, center2, ...].
99 @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'tolerance', 'itermax', 'm').
101 <b>Keyword Args:</b><br>
102 - ccore (bool): Defines should be CCORE library (C++ pyclustering library) used instead of Python code or not.
103 - tolerance (float): Stop condition: if maximum value of change of centers of clusters is less than tolerance then algorithm stops processing.
104 - itermax (uint): Maximum number of iterations that is used for clustering process (by default: 200).
105 - m (float): Hyper-parameter that controls how fuzzy the cluster will be. The higher it is, the fuzzier the cluster will be in the end.
106 This parameter should be greater than 1 (by default: 2).
116 self.
__itermax = kwargs.get(
'itermax', 200)
117 self.
__m = kwargs.get(
'm', 2)
121 self.
__ccore = kwargs.get(
'ccore',
True)
123 self.
__ccore = ccore_library.workable()
130 @brief Performs cluster analysis in line with Fuzzy C-Means algorithm.
132 @return (fcm) Returns itself (Fuzzy C-Means instance).
136 @see get_membership()
149 @brief Returns allocated clusters that consists of points that most likely (in line with membership) belong to
152 @remark Allocated clusters can be returned only after data processing (use method process()). Otherwise empty list is returned.
154 @return (list) List of allocated clusters, each cluster contains indexes from input data.
158 @see get_membership()
166 @brief Returns list of centers of allocated clusters.
168 @return (array_like) Cluster centers.
172 @see get_membership()
180 @brief Returns cluster membership (probability) for each point in data.
182 @return (array_like) Membership for each point in format [[Px1(c1), Px1(c2), ...], [Px2(c1), Px2(c2), ...], ...],
183 where [Px1(c1), Px1(c2), ...] membership for point x1.
193 def __process_by_ccore(self):
195 @brief Performs cluster analysis using C/C++ implementation.
200 self.
__clusters = result[wrapper.fcm_package_indexer.INDEX_CLUSTERS]
201 self.
__centers = result[wrapper.fcm_package_indexer.INDEX_CENTERS]
202 self.
__membership = result[wrapper.fcm_package_indexer.INDEX_MEMBERSHIP]
205 def __process_by_python(self):
207 @brief Performs cluster analysis using Python implementation.
215 change = float(
'inf')
229 def __calculate_centers(self):
231 @brief Calculate center using membership of each cluster.
233 @return (list) Updated clusters as list of clusters. Each cluster contains indexes of objects from data.
235 @return (numpy.array) Updated centers.
238 dimension = self.
__data.shape[1]
239 centers = numpy.zeros((len(self.
__centers), dimension))
248 def __update_membership(self):
250 @brief Update membership for each point in line with current cluster centers.
256 data_difference[i] = numpy.sum(numpy.square(self.
__data - self.
__centers[i]), axis=1)
258 for i
in range(len(self.
__data)):
260 divider = sum([pow(data_difference[j][i] / data_difference[k][i], self.
__degree)
for k
in range(len(self.
__centers))
if data_difference[k][i] != 0.0])
268 def __calculate_changes(self, updated_centers):
270 @brief Calculate changes between centers.
272 @return (float) Maximum change between centers.
275 changes = numpy.sum(numpy.square(self.
__centers - updated_centers), axis=1).T
276 return numpy.max(changes)
279 def __extract_clusters(self):
283 for i
in range(len(belongs)):
287 def __verify_arguments(self):
289 @brief Verify input parameters for the algorithm and throw exception in case of incorrectness.
293 raise ValueError(
"Input data is empty (size: '%d')." % len(self.
__data))
296 raise ValueError(
"Initial centers are empty (size: '%d')." % len(self.
__centers))