"""!

@brief Cluster analysis algorithm: Fuzzy C-Means
@details Implementation based on paper @cite book::pattern_recognition_with_fuzzy.

@authors Andrei Novikov (pyclustering@yandex.ru)
@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
@endcond

"""


import numpy

import pyclustering.core.fcm_wrapper as wrapper

from pyclustering.core.wrapper import ccore_library
class fcm:
    """!
    @brief Class represents Fuzzy C-Means (FCM) clustering algorithm.
    @details Fuzzy clustering is a form of clustering in which each data point can belong to more than one cluster.

    Fuzzy C-Means algorithm uses two general formulas for cluster analysis. The first updates the membership of each
    point:
    \f[w_{ij}=\frac{1}{\sum_{k=1}^{c}\left ( \frac{\left \| x_{i}-c_{j} \right \|}{\left \| x_{i}-c_{k} \right \|} \right )^{\frac{2}{m-1}}}\f]

    The second formula updates the centers in line with the obtained memberships:
    \f[c_{j}=\frac{\sum_{i=1}^{N}w_{ij}^{m}x_{i}}{\sum_{i=1}^{N}w_{ij}^{m}}\f]

    Fuzzy C-Means clustering results depend on initial centers. Algorithm K-Means++ from module
    'pyclustering.cluster.center_initializer' can be used for center initialization.

    CCORE implementation of the algorithm uses thread pool to parallelize the clustering process.

    Here is an example of how to perform cluster analysis using Fuzzy C-Means algorithm:
    @code
        from pyclustering.samples.definitions import FAMOUS_SAMPLES
        from pyclustering.cluster import cluster_visualizer
        from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
        from pyclustering.cluster.fcm import fcm
        from pyclustering.utils import read_sample

        # load list of points for cluster analysis
        sample = read_sample(FAMOUS_SAMPLES.SAMPLE_OLD_FAITHFUL)

        # choose initial centers using K-Means++
        initial_centers = kmeans_plusplus_initializer(sample, 2, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize()

        # create instance of Fuzzy C-Means algorithm
        fcm_instance = fcm(sample, initial_centers)

        # run cluster analysis and obtain results
        fcm_instance.process()
        clusters = fcm_instance.get_clusters()
        centers = fcm_instance.get_centers()

        # visualize clustering results
        visualizer = cluster_visualizer()
        visualizer.append_clusters(clusters, sample)
        visualizer.append_cluster(centers, marker='*', markersize=10)
        visualizer.show()
    @endcode

    The next example shows how to perform image segmentation using Fuzzy C-Means algorithm:
    @code
        from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
        from pyclustering.cluster.fcm import fcm
        from pyclustering.utils import read_image, draw_image_mask_segments

        # load image pixels as a list of points for cluster analysis
        data = read_image("stpetersburg_admiral.jpg")

        # choose initial centers using K-Means++
        initial_centers = kmeans_plusplus_initializer(data, 3, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize()

        # create instance of Fuzzy C-Means algorithm
        fcm_instance = fcm(data, initial_centers)

        # run cluster analysis and obtain results
        fcm_instance.process()
        clusters = fcm_instance.get_clusters()

        # visualize segmentation results
        draw_image_mask_segments("stpetersburg_admiral.jpg", clusters)
    @endcode

    @image html fcm_segmentation_stpetersburg.png "Image segmentation using Fuzzy C-Means algorithm."

    """

    def __init__(self, data, initial_centers, **kwargs):
        """!
        @brief Initialize Fuzzy C-Means algorithm.

        @param[in] data (array_like): Input data presented as an array of points (objects); each point should be represented by an array_like data structure.
        @param[in] initial_centers (array_like): Initial coordinates of cluster centers, represented by an array_like data structure: [center1, center2, ...].
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'ccore', 'tolerance', 'itermax', 'm').

        <b>Keyword Args:</b><br>
            - ccore (bool): Defines whether the CCORE library (C++ pyclustering library) should be used instead of Python code.
            - tolerance (float): Stop condition: the algorithm stops processing when the maximum change of cluster centers is less than tolerance.
            - itermax (uint): Maximum number of iterations used for the clustering process (by default: 200).
            - m (float): Hyper-parameter that controls how fuzzy the clusters will be. The higher it is, the fuzzier the clusters will be in the end.
               This parameter should be greater than 1 (by default: 2).

        """
        self.__itermax = kwargs.get('itermax', 200)
        self.__m = kwargs.get('m', 2)
        self.__ccore = kwargs.get('ccore', True)
        if self.__ccore is True:
            self.__ccore = ccore_library.workable()
    def process(self):
        """!
        @brief Performs cluster analysis in line with Fuzzy C-Means algorithm.

        @return (fcm) Returns itself (Fuzzy C-Means instance).

        @see get_membership()

        """

    def get_clusters(self):
        """!
        @brief Returns allocated clusters, each consisting of the points that most likely (in line with membership) belong to it.

        @remark Allocated clusters can be returned only after data processing (use method process()). Otherwise an empty list is returned.

        @return (list) List of allocated clusters, each cluster contains indexes from input data.

        @see get_membership()

        """

    def get_centers(self):
        """!
        @brief Returns list of centers of allocated clusters.

        @return (array_like) Cluster centers.

        @see get_membership()

        """

    def get_membership(self):
        """!
        @brief Returns cluster membership (probability) for each point in data.

        @return (array_like) Membership for each point in format [[Px1(c1), Px1(c2), ...], [Px2(c1), Px2(c2), ...], ...],
                 where [Px1(c1), Px1(c2), ...] is the membership of point x1.

        """

    def __process_by_ccore(self):
        """!
        @brief Performs cluster analysis using C/C++ implementation.

        """
        self.__clusters = result[wrapper.fcm_package_indexer.INDEX_CLUSTERS]
        self.__centers = result[wrapper.fcm_package_indexer.INDEX_CENTERS]
        self.__membership = result[wrapper.fcm_package_indexer.INDEX_MEMBERSHIP]
    def __process_by_python(self):
        """!
        @brief Performs cluster analysis using Python implementation.

        """
        change = float('inf')
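The listing shows only the initialization of `change`. As a rough illustration of how such an iteration can proceed, here is a minimal one-dimensional sketch in plain Python. The function name `fuzzy_c_means_1d`, the order of the steps, and the default tolerance are assumptions made for this example, not the library's implementation.

```python
def fuzzy_c_means_1d(values, centers, m=2.0, tolerance=1e-3, itermax=200):
    """Hypothetical 1-D Fuzzy C-Means loop: alternate membership and center updates."""
    exponent = 2.0 / (m - 1.0)
    change = float('inf')
    iteration = 0
    membership = []

    while change > tolerance and iteration < itermax:
        # Membership step: w_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        membership = []
        for x in values:
            distances = [abs(x - c) for c in centers]
            if 0.0 in distances:
                # Assumption: a point lying exactly on a center belongs fully to it.
                hits = [1.0 if d == 0.0 else 0.0 for d in distances]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((dj / dk) ** exponent for dk in distances)
                       for dj in distances]
            membership.append(row)

        # Center step: mean of the values weighted by membership ** m.
        updated = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in membership]
            updated.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))

        # Stop criterion: largest squared displacement of any center.
        change = max((old - new) ** 2 for old, new in zip(centers, updated))
        centers = updated
        iteration += 1

    return centers, membership, iteration


values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, membership, iterations = fuzzy_c_means_1d(values, [0.0, 6.0])
```

Starting `change` at infinity guarantees that the loop body runs at least once, which matches the initialization shown in the listing.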
    def __calculate_centers(self):
        """!
        @brief Calculate cluster centers using the current membership of each point.

        @return (numpy.array) Updated centers.

        """
        dimension = self.__data.shape[1]
        centers = numpy.zeros((len(self.__centers), dimension))
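The center update described in the class documentation is a weighted mean of the points, with membership raised to the power `m` as the weight. A minimal sketch using plain Python lists follows; `update_centers` and its parameters are hypothetical names for this example, and it assumes every cluster has a non-zero total weight.

```python
def update_centers(data, membership, m=2.0):
    """Weighted mean of the points per cluster, with weights membership ** m.

    data: list of points, each a list of floats.
    membership: membership[i][j] is the degree of point i in cluster j.
    """
    dimension = len(data[0])
    cluster_count = len(membership[0])
    centers = []
    for j in range(cluster_count):
        weights = [row[j] ** m for row in membership]
        total = sum(weights)  # assumed non-zero in this sketch
        center = [
            sum(w * point[d] for w, point in zip(weights, data)) / total
            for d in range(dimension)
        ]
        centers.append(center)
    return centers


points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
# Crisp memberships: the first two points in cluster 0, the last one in cluster 1.
weights = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
centers = update_centers(points, weights)
```

With crisp memberships the result reduces to the ordinary mean of each group, which makes the example easy to check by hand.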
    def __update_membership(self):
        """!
        @brief Update membership for each point in line with current cluster centers.

        """
        for i in range(len(self.__centers)):
            data_difference[i] = numpy.sum(numpy.square(self.__data - self.__centers[i]), axis=1)

        for i in range(len(self.__data)):
            for j in range(len(self.__centers)):
                divider = sum([pow(data_difference[j][i] / data_difference[k][i], self.__degree) for k in range(len(self.__centers)) if data_difference[k][i] != 0.0])
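To make the membership formula from the class documentation concrete, here is a small sketch in plain Python. It uses Euclidean distances directly, as the formula is written, and treats a point that coincides with a center as belonging fully to it. Both the function name `update_membership` and that zero-distance rule are assumptions of this example, not necessarily what the listing does.

```python
import math


def update_membership(data, centers, m=2.0):
    """Compute w_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)) for every point."""
    exponent = 2.0 / (m - 1.0)
    membership = []
    for point in data:
        distances = [math.dist(point, center) for center in centers]
        if 0.0 in distances:
            # Assumption: a point sitting exactly on a center belongs fully to it.
            hits = [1.0 if d == 0.0 else 0.0 for d in distances]
            membership.append([h / sum(hits) for h in hits])
            continue
        row = [
            1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
            for d_j in distances
        ]
        membership.append(row)
    return membership


centers = [[0.0, 0.0], [4.0, 0.0]]
membership = update_membership([[1.0, 0.0], [2.0, 0.0], [4.0, 0.0]], centers)
```

Each row of the result sums to 1, so the values can be read as the degree to which a point belongs to each cluster.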
    def __calculate_changes(self, updated_centers):
        """!
        @brief Calculate changes between current and updated centers.

        @return (float) Maximum change between centers, measured as squared Euclidean distance.

        """
        changes = numpy.sum(numpy.square(self.__centers - updated_centers), axis=1).T
        return numpy.max(changes)
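The same quantity can be written without NumPy to show what is being measured: the largest squared Euclidean displacement of any center. The name `max_center_change` and the tolerance value are hypothetical and used only for this sketch.

```python
def max_center_change(old_centers, new_centers):
    """Largest squared Euclidean displacement among the centers."""
    return max(
        sum((a - b) ** 2 for a, b in zip(old, new))
        for old, new in zip(old_centers, new_centers)
    )


old = [[0.0, 0.0], [5.0, 5.0]]
new = [[0.0, 0.1], [5.0, 5.0]]
change = max_center_change(old, new)

# Assumed tolerance for illustration; the change is a squared distance.
converged = change < 0.001
```

Because no square root is taken, a tolerance compared against this value is effectively expressed in squared units.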
    def __extract_clusters(self):
        for i in range(len(belongs)):
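Since the clusters are described as containing the points that most likely belong to them, one plausible way to derive them is to assign each point to the cluster with its largest membership. The sketch below assumes exactly that; `extract_clusters` is a hypothetical standalone function, not the method from the listing.

```python
def extract_clusters(membership):
    """Assign each point index to the cluster where its membership is largest."""
    cluster_count = len(membership[0])
    clusters = [[] for _ in range(cluster_count)]
    for index, row in enumerate(membership):
        best = max(range(cluster_count), key=row.__getitem__)
        clusters[best].append(index)
    return clusters


clusters = extract_clusters([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
```

The result holds point indexes rather than coordinates, in line with the documented return value of `get_clusters()`.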
    def __verify_arguments(self):
        """!
        @brief Verify input parameters for the algorithm and raise an exception if they are incorrect.

        """
        if len(self.__data) == 0:
            raise ValueError("Input data is empty (size: '%d')." % len(self.__data))

        if len(self.__centers) == 0:
            raise ValueError("Initial centers are empty (size: '%d')." % len(self.__centers))