3 @brief Cluster analysis algorithm: Genetic clustering algorithm (GA). 4 @details Implementation based on papers @cite article::ga::1, @cite article::ga::2. 6 @authors Andrei Novikov, Aleksey Kukushkin (pyclustering@yandex.ru) 8 @copyright GNU Public License 10 @cond GNU_PUBLIC_LICENSE 11 PyClustering is free software: you can redistribute it and/or modify 12 it under the terms of the GNU General Public License as published by 13 the Free Software Foundation, either version 3 of the License, or 14 (at your option) any later version. 16 PyClustering is distributed in the hope that it will be useful, 17 but WITHOUT ANY WARRANTY; without even the implied warranty of 18 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 GNU General Public License for more details. 21 You should have received a copy of the GNU General Public License 22 along with this program. If not, see <http://www.gnu.org/licenses/>. 33 import matplotlib.pyplot
as plt
34 import matplotlib.animation
as animation
35 except Exception
as error_instance:
36 warnings.warn(
"Impossible to import matplotlib (please, install 'matplotlib'), pyclustering's visualization " 37 "functionality is not available (details: '%s')." % str(error_instance))
40 from pyclustering.cluster.ga_maths
import ga_math
46 @brief Genetic algorithm observer that is used to collect information about clustering process on each iteration. 50 def __init__(self, need_global_best=False, need_population_best=False, need_mean_ff=False):
52 @brief Constructs genetic algorithm observer to collect specific information. 54 @param[in] need_global_best (bool): If 'True' then the best chromosomes and its fitness function value (global optimum) for each iteration are stored. 55 @param[in] need_population_best (bool): If 'True' then current (on each iteration) best chromosomes and its fitness function value (local optimum) are stored. 56 @param[in] need_mean_ff (bool): If 'True' then average value of fitness function on each iteration is stored. 77 @brief Returns amount of iterations that genetic algorithm was observed. 84 return max(global_length, local_length, average_length)
89 @brief Stores the best chromosome and its fitness function's value. 91 @param[in] best_chromosome (list): The best chromosome that were observed. 92 @param[in] best_fitness_function (float): Fitness function value of the best chromosome. 105 @brief Stores the best chromosome for current specific iteration and its fitness function's value. 107 @param[in] best_chromosome (list): The best chromosome on specific iteration. 108 @param[in] best_fitness_function (float): Fitness function value of the chromosome. 121 @brief Stores average value of fitness function among chromosomes on specific iteration. 123 @param[in] fitness_functions (float): Average value of fitness functions among chromosomes. 135 @return (dict) Returns dictionary with keys 'chromosome' and 'fitness_function' where evolution of the best chromosome 136 and its fitness function's value (evolution of global optimum) are stored in lists. 144 @brief (dict) Returns dictionary with keys 'chromosome' and 'fitness_function' where evolution of the current best chromosome 145 and its fitness function's value (evolution of local optimum) are stored in lists. 153 @brief (list) Returns fitness function's values on each iteration. 162 @brief Genetic algorithm visualizer is used to show clustering results that are specific for 163 this particular algorithm: clusters, evolution of global and local optimum. 164 @details The visualizer requires 'ga_observer' that collects evolution of clustering process in 165 genetic algorithm. The observer is created by user and passed to genetic algorithm. There 166 is usage example of the visualizer using the observer: 168 from pyclustering.cluster.ga import genetic_algorithm, ga_observer, ga_visualizer 169 from pyclustering.utils import read_sample 170 from pyclustering.samples.definitions import SIMPLE_SAMPLES 173 # Read data for clustering 174 sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE1) 176 # Create instance of observer that will collect all information: 177 observer_instance = ga_observer(True, True, True) 179 # Create genetic algorithm where observer will collect information: 180 ga_instance = genetic_algorithm(data=sample, 184 count_mutation_gens=1, 185 observer=observer_instance) 188 ga_instance.process() 191 clusters = ga_instance.get_clusters() 193 # Print cluster to console 194 print("Amount of clusters: '%d'. Clusters: '%s'" % (len(clusters), clusters)) 196 # Show cluster using observer: 197 ga_visualizer.show_clusters(sample, observer_instance) 200 @see cluster_visualizer 205 def show_evolution(observer, start_iteration = 0, stop_iteration=None, ax=None, display=True):
207 @brief Displays evolution of fitness function for the best chromosome, for the current best chromosome and 208 average value among all chromosomes. 210 @param[in] observer (ga_observer): Genetic algorithm observer that was used for collecting evolution in the algorithm and 211 where whole required information for visualization is stored. 212 @param[in] start_iteration (uint): Iteration from that evolution should be shown. 213 @param[in] stop_iteration (uint): Iteration after that evolution shouldn't be shown. 214 @param[in] ax (Axes): Canvas where evolution should be displayed. 215 @param[in] display (bool): If 'True' then visualization of the evolution will be shown by the function. 216 This argument should be 'False' if you want to add something else to the canvas and display it later. 218 @return (Axis) Canvas where evolution was shown. 223 _, ax = plt.subplots(1)
224 ax.set_title(
"Evolution")
226 if stop_iteration
is None:
227 stop_iteration = len(observer)
229 line_best, = ax.plot(observer.get_global_best()[
'fitness_function'][start_iteration:stop_iteration],
'r') 230 line_current, = ax.plot(observer.get_population_best()['fitness_function'][start_iteration:stop_iteration],
'k')
231 line_mean, = ax.plot(observer.get_mean_fitness_function()[start_iteration:stop_iteration],
'c')
233 if start_iteration < (stop_iteration - 1):
234 ax.set_xlim([start_iteration, (stop_iteration - 1)])
236 ax.set_xlabel(
"Iteration")
237 ax.set_ylabel(
"Fitness function")
238 ax.legend([line_best, line_current, line_mean], [
"The best pop.",
"Cur. best pop.",
"Average"], prop={
'size': 10})
250 @brief Shows allocated clusters by the genetic algorithm. 252 @param[in] data (list): Input data that was used for clustering process by the algorithm. 253 @param[in] observer (ga_observer): Observer that was used for collection information about clustering process. 254 @param[in] marker (char): Type of marker that should be used for object (point) representation. 255 @param[in] markersize (uint): Size of the marker that is used for object (point) representation. 257 @note If you have clusters instead of observer then 'cluster_visualizer' can be used for visualization purposes. 259 @see cluster_visualizer 263 figure = plt.figure()
264 ax1 = figure.add_subplot(121)
266 clusters = ga_math.get_clusters_representation(observer.get_global_best()[
'chromosome'][-1])
269 visualizer.append_clusters(clusters, data, 0, marker, markersize)
270 visualizer.show(figure, display=
False)
272 ga_visualizer.show_evolution(observer, 0,
None, ax1,
True)
278 @brief Animate clustering process of genetic clustering algorithm. 279 @details This method can be also used for rendering movie of clustering process and 'ffmpeg' is required for that purpuse. 281 @param[in] data (list): Input data that was used for clustering process by the algorithm. 282 @param[in] observer (ga_observer): Observer that was used for collection information about clustering process. 283 Be sure that whole information was collected by the observer. 284 @param[in] animation_velocity (uint): Interval between frames in milliseconds (for run-time animation only). 285 @param[in] movie_fps (uint): Defines frames per second (for rendering movie only). 286 @param[in] save_movie (string): If it is specified then animation will be stored to file that is specified in this parameter. 290 figure = plt.figure()
293 return frame_generation(0)
295 def frame_generation(index_iteration):
298 figure.suptitle(
"Clustering genetic algorithm (iteration: " + str(index_iteration) +
")", fontsize=18, fontweight=
'bold')
300 visualizer =
cluster_visualizer(4, 2, [
"The best pop. on step #" + str(index_iteration),
"The best population"])
302 local_minimum_clusters = ga_math.get_clusters_representation(observer.get_population_best()[
'chromosome'][index_iteration])
303 visualizer.append_clusters(local_minimum_clusters, data, 0)
305 global_minimum_clusters = ga_math.get_clusters_representation(observer.get_global_best()[
'chromosome'][index_iteration])
306 visualizer.append_clusters(global_minimum_clusters, data, 1)
308 ax1 = plt.subplot2grid((2, 2), (1, 0), colspan=2)
309 ga_visualizer.show_evolution(observer, 0, index_iteration + 1, ax1,
False)
311 visualizer.show(figure, shift=0, display=
False)
312 figure.subplots_adjust(top=0.85)
314 return [figure.gca()]
316 iterations = len(observer)
317 cluster_animation = animation.FuncAnimation(figure, frame_generation, iterations, interval=animation_velocity, init_func=init_frame, repeat_delay=5000)
319 if save_movie
is not None:
320 cluster_animation.save(save_movie, writer=
'ffmpeg', fps=movie_fps, bitrate=1500)
327 @brief Class represents Genetic clustering algorithm. 328 @details The searching capability of genetic algorithms is exploited in order to search for appropriate 331 Example of clustering using genetic algorithm: 333 from pyclustering.cluster.ga import genetic_algorithm, ga_observer 334 from pyclustering.utils import read_sample 335 from pyclustering.samples.definitions import SIMPLE_SAMPLES 338 # Read input data for clustering 339 sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE4) 341 # Create instance of observer that will collect all information: 342 observer_instance = ga_observer(True, True, True) 344 # Create genetic algorithm for clustering 345 ga_instance = genetic_algorithm(data=sample, 347 chromosome_count=100, 348 population_count=200, 349 count_mutation_gens=1) 352 ga_instance.process() 355 clusters = ga_instance.get_clusters() 357 # Print cluster to console 358 print("Amount of clusters: '%d'. Clusters: '%s'" % (len(clusters), clusters)) 361 There is an example of clustering results (fitness function evolution and allocated clusters) that were 362 visualized by 'ga_visualizer': 364 @image html ga_clustering_sample_simple_04.png 371 def __init__(self, data, count_clusters, chromosome_count, population_count, **kwargs):
373 @brief Initialize genetic clustering algorithm for cluster analysis. 375 @param[in] data (numpy.array|list): Input data for clustering that is represented by two dimensional array 376 where each row is a point, for example, [[0.0, 2.1], [0.1, 2.0], [-0.2, 2.4]]. 377 @param[in] count_clusters (uint): Amount of clusters that should be allocated in the data. 378 @param[in] chromosome_count (uint): Amount of chromosomes in each population. 379 @param[in] population_count (uint): Amount of populations. 380 @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'count_mutation_gens', 381 'coeff_mutation_count', 'select_coeff', 'crossover_rate', 'observer', 'random_state'). 383 <b>Keyword Args:</b><br> 384 - count_mutation_gens (uint): Amount of genes in chromosome that is mutated on each step. 385 - coeff_mutation_count (float): Percent of chromosomes for mutation, distributed in range (0, 1] and 386 thus amount of chromosomes is defined as follows: 'chromosome_count' * 'coeff_mutation_count'. 387 - select_coeff (float): Exponential coefficient for selection procedure that is used as follows: 388 math.exp(1 + fitness(chromosome) * select_coeff). 389 - crossover_rate (float): Crossover rate. 390 - observer (ga_observer): Observer that is used for collecting information of about clustering process on each step. 391 - random_state (int): Seed for random state (by default is `None`, current system time is used). 396 np.random.seed(kwargs.get(
'random_state',
None))
399 self.
_data = np.array(data)
424 'best_fitness_function': 0.0}
434 @brief Perform clustering procedure in line with rule of genetic clustering algorithm. 444 best_chromosome, best_ff, first_fitness_functions \
449 self.
_observer.collect_global_best(best_chromosome, best_ff)
450 self.
_observer.collect_population_best(best_chromosome, best_ff)
451 self.
_observer.collect_mean(first_fitness_functions)
466 new_best_chromosome, new_best_ff, fitness_functions \
470 if new_best_ff < best_ff:
471 best_ff = new_best_ff
472 best_chromosome = new_best_chromosome
476 self.
_observer.collect_global_best(best_chromosome, best_ff)
477 self.
_observer.collect_population_best(new_best_chromosome, new_best_ff)
478 self.
_observer.collect_mean(fitness_functions)
484 return best_chromosome, best_ff
489 @brief Returns genetic algorithm observer. 497 @brief Returns list of allocated clusters, each cluster contains indexes of objects from the data. 499 @return (list) List of allocated clusters. 509 def _select(chromosomes, data, count_clusters, select_coeff):
511 @brief Performs selection procedure where new chromosomes are calculated. 513 @param[in] chromosomes (numpy.array): Chromosomes 518 centres = ga_math.get_centres(chromosomes, data, count_clusters)
521 fitness = genetic_algorithm._calc_fitness_function(centres, data, chromosomes)
523 for _idx
in range(len(fitness)):
524 fitness[_idx] = math.exp(1 + fitness[_idx] * select_coeff)
527 probabilities = ga_math.calc_probability_vector(fitness)
530 new_chromosomes = np.zeros(chromosomes.shape, dtype=np.int)
533 for _idx
in range(len(chromosomes)):
534 new_chromosomes[_idx] = chromosomes[ga_math.get_uniform(probabilities)]
536 return new_chromosomes
540 def _crossover(chromosomes):
542 @brief Crossover procedure. 547 pairs_to_crossover = np.array(range(len(chromosomes)))
550 np.random.shuffle(pairs_to_crossover)
553 offset_in_pair = int(len(pairs_to_crossover) / 2)
556 for _idx
in range(offset_in_pair):
559 crossover_mask = genetic_algorithm._get_crossover_mask(len(chromosomes[_idx]))
562 genetic_algorithm._crossover_a_pair(chromosomes[pairs_to_crossover[_idx]],
563 chromosomes[pairs_to_crossover[_idx + offset_in_pair]],
568 def _mutation(chromosomes, count_clusters, count_gen_for_mutation, coeff_mutation_count):
570 @brief Mutation procedure. 575 count_gens = len(chromosomes[0])
578 random_idx_chromosomes = np.array(range(len(chromosomes)))
579 np.random.shuffle(random_idx_chromosomes)
582 for _idx_chromosome
in range(int(len(random_idx_chromosomes) * coeff_mutation_count)):
585 for _
in range(count_gen_for_mutation):
588 gen_num = np.random.randint(count_gens)
591 chromosomes[random_idx_chromosomes[_idx_chromosome]][gen_num] = np.random.randint(count_clusters)
595 def _crossover_a_pair(chromosome_1, chromosome_2, mask):
597 @brief Crossovers a pair of chromosomes. 599 @param[in] chromosome_1 (numpy.array): The first chromosome for crossover. 600 @param[in] chromosome_2 (numpy.array): The second chromosome for crossover. 601 @param[in] mask (numpy.array): Crossover mask that defines which genes should be swapped. 605 for _idx
in range(len(chromosome_1)):
609 chromosome_1[_idx], chromosome_2[_idx] = chromosome_2[_idx], chromosome_1[_idx]
613 def _get_crossover_mask(mask_length):
615 @brief Crossover mask to crossover a pair of chromosomes. 617 @param[in] mask_length (uint): Length of the mask. 622 mask = np.zeros(mask_length)
625 mask[:int(int(mask_length) / 2)] = 1
628 np.random.shuffle(mask)
634 def _init_population(count_clusters, count_data, chromosome_count):
636 @brief Returns first population as a uniform random choice. 638 @param[in] count_clusters (uint): Amount of clusters that should be allocated. 639 @param[in] count_data (uint): Data size that is used for clustering process. 640 @param[in] chromosome_count (uint):Amount of chromosome that is used for clustering. 644 population = np.random.randint(count_clusters, size=(chromosome_count, count_data))
650 def _get_best_chromosome(chromosomes, data, count_clusters):
652 @brief Returns the current best chromosome. 654 @param[in] chromosomes (list): Chromosomes that are used for searching. 655 @param[in] data (list): Input data that is used for clustering process. 656 @param[in] count_clusters (uint): Amount of clusters that should be allocated. 658 @return (list, float, list) The best chromosome, its fitness function value and fitness function values for 664 centres = ga_math.get_centres(chromosomes, data, count_clusters)
667 fitness_functions = genetic_algorithm._calc_fitness_function(centres, data, chromosomes)
670 best_chromosome_idx = fitness_functions.argmin()
673 return chromosomes[best_chromosome_idx], fitness_functions[best_chromosome_idx], fitness_functions
677 def _calc_fitness_function(centres, data, chromosomes):
679 @brief Calculate fitness function values for chromosomes. 681 @param[in] centres (list): Cluster centers. 682 @param[in] data (list): Input data that is used for clustering process. 683 @param[in] chromosomes (list): Chromosomes whose fitness function's values are calculated. 685 @return (list) Fitness function value for each chromosome correspondingly. 690 count_chromosome = len(chromosomes)
693 fitness_function = np.zeros(count_chromosome)
696 for _idx_chromosome
in range(count_chromosome):
699 centres_data = np.zeros(data.shape)
702 for _idx
in range(len(data)):
703 centres_data[_idx] = centres[_idx_chromosome][chromosomes[_idx_chromosome][_idx]]
706 fitness_function[_idx_chromosome] += np.sum(abs(data - centres_data))
708 return fitness_function
711 def _verify_arguments(self):
713 @brief Verify input parameters for the algorithm and throw exception in case of incorrectness. 716 if len(self.
_data) == 0:
717 raise ValueError(
"Input data is empty (size: '%d')." % len(self.
_data))
720 raise ValueError(
"Amount of cluster (current value: '%d') for allocation should be greater than 0." %
Common visualizer of clusters on 1D, 2D or 3D surface.
def animate_cluster_allocation(data, observer, animation_velocity=75, movie_fps=5, save_movie=None)
Animate clustering process of genetic clustering algorithm.
pyclustering module for cluster analysis.
Genetic algorithm visualizer is used to show clustering results that are specific for this particular...
def get_global_best(self)
def __init__(self, need_global_best=False, need_population_best=False, need_mean_ff=False)
Constructs genetic algorithm observer to collect specific information.
def collect_global_best(self, best_chromosome, best_fitness_function)
Stores the best chromosome and its fitness function's value.
def collect_population_best(self, best_chromosome, best_fitness_function)
Stores the best chromosome for current specific iteration and its fitness function's value...
def process(self)
Perform clustering procedure in line with rule of genetic clustering algorithm.
def show_evolution(observer, start_iteration=0, stop_iteration=None, ax=None, display=True)
Displays evolution of fitness function for the best chromosome, for the current best chromosome and a...
def __len__(self)
Returns amount of iterations that genetic algorithm was observed.
def _crossover(chromosomes)
Crossover procedure.
Class represents Genetic clustering algorithm.
def _get_best_chromosome(chromosomes, data, count_clusters)
Returns the current best chromosome.
def _init_population(count_clusters, count_data, chromosome_count)
Returns first population as a uniform random choice.
def _select(chromosomes, data, count_clusters, select_coeff)
Performs selection procedure where new chromosomes are calculated.
def __init__(self, data, count_clusters, chromosome_count, population_count, kwargs)
Initialize genetic clustering algorithm for cluster analysis.
Genetic algorithm observer that is used to collect information about clustering process on each itera...
def collect_mean(self, fitness_functions)
Stores average value of fitness function among chromosomes on specific iteration. ...
def get_population_best(self)
(dict) Returns dictionary with keys 'chromosome' and 'fitness_function' where evolution of the curren...
def _verify_arguments(self)
Verify input parameters for the algorithm and throw exception in case of incorrectness.
def get_observer(self)
Returns genetic algorithm observer.
def get_mean_fitness_function(self)
(list) Returns fitness function's values on each iteration.
def show_clusters(data, observer, marker='.', markersize=None)
Shows allocated clusters by the genetic algorithm.
def get_clusters(self)
Returns list of allocated clusters, each cluster contains indexes of objects from the data...
def _mutation(chromosomes, count_clusters, count_gen_for_mutation, coeff_mutation_count)
Mutation procedure.