3 @brief Cluster analysis algorithm: Genetic clustering algorithm (GA). 4 @details Implementation based on papers @cite article::ga::1, @cite article::ga::2. 6 @authors Andrey Novikov, Aleksey Kukushkin (pyclustering@yandex.ru) 8 @copyright GNU Public License 10 @cond GNU_PUBLIC_LICENSE 11 PyClustering is free software: you can redistribute it and/or modify 12 it under the terms of the GNU General Public License as published by 13 the Free Software Foundation, either version 3 of the License, or 14 (at your option) any later version. 16 PyClustering is distributed in the hope that it will be useful, 17 but WITHOUT ANY WARRANTY; without even the implied warranty of 18 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 GNU General Public License for more details. 21 You should have received a copy of the GNU General Public License 22 along with this program. If not, see <http://www.gnu.org/licenses/>. 33 import matplotlib.pyplot
as plt
34 import matplotlib.animation
as animation
35 except Exception
as error_instance:
36 warnings.warn(
"Impossible to import matplotlib (please, install 'matplotlib'), pyclustering's visualization " 37 "functionality is not available (details: '%s')." % str(error_instance))
40 from pyclustering.cluster.ga_maths
import ga_math
46 @brief Genetic algorithm observer that is used to collect information about clustering process on each iteration. 50 def __init__(self, need_global_best=False, need_population_best=False, need_mean_ff=False):
52 @brief Constructs genetic algorithm observer to collect specific information. 54 @param[in] need_global_best (bool): If 'True' then the best chromosomes and its fitness function value (global optimum) for each iteration are stored. 55 @param[in] need_population_best (bool): If 'True' then current (on each iteration) best chromosomes and its fitness function value (local optimum) are stored. 56 @param[in] need_mean_ff (bool): If 'True' then average value of fitness function on each iteration is stored. 77 @brief Returns amount of iterations that genetic algorithm was observed. 84 return max(global_length, local_length, average_length);
89 @brief Stores the best chromosome and its fitness function's value. 91 @param[in] best_chromosome (list): The best chromosome that were observed. 92 @param[in] best_fitness_function (float): Fitness function value of the best chromosome. 105 @brief Stores the best chromosome for current specific iteration and its fitness function's value. 107 @param[in] best_chromosome (list): The best chromosome on specific iteration. 108 @param[in] best_fitness_function (float): Fitness function value of the chromosome. 121 @brief Stores average value of fitness function among chromosomes on specific iteration. 123 @param[in] fitness_functions (float): Average value of fitness functions among chromosomes. 135 @return (dict) Returns dictionary with keys 'chromosome' and 'fitness_function' where evolution of the best chromosome 136 and its fitness function's value (evolution of global optimum) are stored in lists. 144 @brief (dict) Returns dictionary with keys 'chromosome' and 'fitness_function' where evolution of the current best chromosome 145 and its fitness function's value (evolution of local optimum) are stored in lists. 153 @brief (list) Returns fitness function's values on each iteration. 162 @brief Genetic algorithm visualizer is used to show clustering results that are specific for 163 this particular algorithm: clusters, evolution of global and local optimum. 164 @details The visualizer requires 'ga_observer' that collects evolution of clustering process in 165 genetic algorithm. The observer is created by user and passed to genetic algorithm. There 166 is usage example of the visualizer using the observer: 168 # Read data for clustering 169 sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE1); 171 # Create instance of observer that will collect all information: 172 observer_instance = ga_observer(True, True, True); 174 # Create genetic algorithm where observer will collect information: 175 ga_instance = genetic_algorithm(data=sample, 179 count_mutation_gens=1, 180 observer=observer_instance); 183 ga_instance.process(); 185 # Show cluster using observer: 186 ga_visualizer.show_clusters(sample, observer_instance); 189 @see cluster_visualizer 194 def show_evolution(observer, start_iteration = 0, stop_iteration = None, ax = None, display = True):
196 @brief Displays evolution of fitness function for the best chromosome, for the current best chromosome and 197 average value among all chromosomes. 199 @param[in] observer (ga_observer): Genetic algorithm observer that was used for collecting evolution in the algorithm and 200 where whole required information for visualization is stored. 201 @param[in] start_iteration (uint): Iteration from that evolution should be shown. 202 @param[in] stop_iteration (uint): Iteration after that evolution shouldn't be shown. 203 @param[in] ax (Axes): Canvas where evolution should be displayed. 204 @param[in] display (bool): If 'True' then visualization of the evolution will be shown by the function. 205 This argument should be 'False' if you want to add something else to the canvas and display it later. 207 @return (Axis) Canvas where evolution was shown. 212 _, ax = plt.subplots(1);
213 ax.set_title(
"Evolution");
215 if (stop_iteration
is None):
216 stop_iteration = len(observer);
218 line_best, = ax.plot(observer.get_global_best()[
'fitness_function'][start_iteration:stop_iteration],
'r'); 219 line_current, = ax.plot(observer.get_population_best()['fitness_function'][start_iteration:stop_iteration],
'k');
220 line_mean, = ax.plot(observer.get_mean_fitness_function()[start_iteration:stop_iteration],
'c');
222 if (start_iteration < (stop_iteration - 1)):
223 ax.set_xlim([start_iteration, (stop_iteration - 1)]);
225 ax.set_xlabel(
"Iteration");
226 ax.set_ylabel(
"Fitness function");
227 ax.legend([line_best, line_current, line_mean], [
"The best pop.",
"Cur. best pop.",
"Average"], prop={
'size': 10});
230 if (display
is True):
239 @brief Shows allocated clusters by the genetic algorithm. 241 @param[in] data (list): Input data that was used for clustering process by the algorithm. 242 @param[in] observer (ga_observer): Observer that was used for collection information about clustering process. 243 @param[in] marker (char): Type of marker that should be used for object (point) representation. 244 @param[in] markersize (uint): Size of the marker that is used for object (point) representation. 246 @note If you have clusters instead of observer then 'cluster_visualizer' can be used for visualization purposes. 248 @see cluster_visualizer 252 figure = plt.figure();
253 ax1 = figure.add_subplot(121);
255 clusters = ga_math.get_clusters_representation(observer.get_global_best()[
'chromosome'][-1]);
258 visualizer.append_clusters(clusters, data, 0, marker, markersize);
259 visualizer.show(figure, display =
False);
261 ga_visualizer.show_evolution(observer, 0,
None, ax1,
True);
267 @brief Animate clustering process of genetic clustering algorithm. 268 @details This method can be also used for rendering movie of clustering process and 'ffmpeg' is required for that purpuse. 270 @param[in] data (list): Input data that was used for clustering process by the algorithm. 271 @param[in] observer (ga_observer): Observer that was used for collection information about clustering process. 272 Be sure that whole information was collected by the observer. 273 @param[in] animation_velocity (uint): Interval between frames in milliseconds (for run-time animation only). 274 @param[in] movie_fps (uint): Defines frames per second (for rendering movie only). 275 @param[in] save_movie (string): If it is specified then animation will be stored to file that is specified in this parameter. 279 figure = plt.figure();
282 return frame_generation(0);
284 def frame_generation(index_iteration):
287 figure.suptitle(
"Clustering genetic algorithm (iteration: " + str(index_iteration) +
")", fontsize = 18, fontweight =
'bold');
289 visualizer =
cluster_visualizer(4, 2, [
"The best pop. on step #" + str(index_iteration),
"The best population"]);
291 local_minimum_clusters = ga_math.get_clusters_representation(observer.get_population_best()[
'chromosome'][index_iteration]);
292 visualizer.append_clusters(local_minimum_clusters, data, 0);
294 global_minimum_clusters = ga_math.get_clusters_representation(observer.get_global_best()[
'chromosome'][index_iteration]);
295 visualizer.append_clusters(global_minimum_clusters, data, 1);
297 ax1 = plt.subplot2grid((2, 2), (1, 0), colspan = 2);
298 ga_visualizer.show_evolution(observer, 0, index_iteration + 1, ax1,
False);
300 visualizer.show(figure, shift = 0, display =
False);
301 figure.subplots_adjust(top = 0.85);
303 return [ figure.gca() ];
305 iterations = len(observer);
306 cluster_animation = animation.FuncAnimation(figure, frame_generation, iterations, interval = animation_velocity, init_func = init_frame, repeat_delay = 5000);
308 if (save_movie
is not None):
309 cluster_animation.save(save_movie, writer =
'ffmpeg', fps = movie_fps, bitrate = 1500);
316 @brief Class represents Genetic clustering algorithm. 317 @details The searching capability of genetic algorithms is exploited in order to search for appropriate 320 Example of clustering using genetic algorithm: 322 # Read input data for clustering 323 sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE4); 325 # Create genetic algorithm for clustering 326 ga_instance = genetic_algorithm(data=sample, 328 chromosome_count=100, 329 population_count=200, 330 count_mutation_gens=1); 332 # Start clustering process 333 ga_instance.process(); 335 # Extract extracted clusters by the algorithm 336 clusters = ga_instance.get_clusters(); 340 There is an example of clustering results (fitness function evolution and allocated clusters) that were 341 visualized by 'ga_visualizer': 343 @image html ga_clustering_sample_simple_04.png 350 def __init__(self, data, count_clusters, chromosome_count, population_count, count_mutation_gens=2,
351 coeff_mutation_count=0.25, select_coeff=1.0, observer=ga_observer()):
353 @brief Initialize genetic clustering algorithm for cluster analysis. 355 @param[in] data (numpy.array|list): Input data for clustering that is represented by two dimensional array 356 where each row is a point, for example, [[0.0, 2.1], [0.1, 2.0], [-0.2, 2.4]]. 357 @param[in] count_clusters (uint): Amount of clusters that should be allocated in the data. 358 @param[in] chromosome_count (uint): Amount of chromosomes in each population. 359 @param[in] population_count (uint): Amount of populations. 360 @param[in] count_mutation_gens (uint): Amount of genes in chromosome that is mutated on each step. 361 @param[in] coeff_mutation_count (float): Percent of chromosomes for mutation, destributed in range (0, 1] and 362 thus amount of chromosomes is defined as follows: 'chromosome_count' * 'coeff_mutation_count'. 363 @param[in] select_coeff (float): Exponential coefficient for selection procedure that is used as follows: 364 math.exp(1 + fitness(chromosome) * select_coeff). 365 @param[in] observer (ga_observer): Observer that is used for collecting information of about clustering process on each step. 373 if type(data)
is list:
374 self.
_data = np.array(data)
401 'best_fitness_function': 0.0}
408 @brief Perform clustering procedure in line with rule of genetic clustering algorithm. 418 best_chromosome, best_ff, first_fitness_functions \
423 self.
_observer.collect_global_best(best_chromosome, best_ff)
424 self.
_observer.collect_population_best(best_chromosome, best_ff)
425 self.
_observer.collect_mean(first_fitness_functions)
440 new_best_chromosome, new_best_ff, fitness_functions \
444 if new_best_ff < best_ff:
445 best_ff = new_best_ff
446 best_chromosome = new_best_chromosome
450 self.
_observer.collect_global_best(best_chromosome, best_ff)
451 self.
_observer.collect_population_best(new_best_chromosome, new_best_ff)
452 self.
_observer.collect_mean(fitness_functions)
458 return best_chromosome, best_ff
463 @brief Returns genetic algorithm observer. 471 @brief Returns list of allocated clusters, each cluster contains indexes of objects from the data. 473 @return (list) List of allocated clusters. 483 def _select(chromosomes, data, count_clusters, select_coeff):
485 @brief Performs selection procedure where new chromosomes are calculated. 487 @param[in] chromosomes (numpy.array): Chromosomes 492 centres = ga_math.get_centres(chromosomes, data, count_clusters)
495 fitness = genetic_algorithm._calc_fitness_function(centres, data, chromosomes)
497 for _idx
in range(len(fitness)):
498 fitness[_idx] = math.exp(1 + fitness[_idx] * select_coeff)
501 probabilities = ga_math.calc_probability_vector(fitness)
504 new_chromosomes = np.zeros(chromosomes.shape, dtype=np.int)
507 for _idx
in range(len(chromosomes)):
508 new_chromosomes[_idx] = chromosomes[ga_math.get_uniform(probabilities)]
510 return new_chromosomes
514 def _crossover(chromosomes):
516 @brief Crossover procedure. 521 pairs_to_crossover = np.array(range(len(chromosomes)))
524 np.random.shuffle(pairs_to_crossover)
527 offset_in_pair = int(len(pairs_to_crossover) / 2)
530 for _idx
in range(offset_in_pair):
533 crossover_mask = genetic_algorithm._get_crossover_mask(len(chromosomes[_idx]))
536 genetic_algorithm._crossover_a_pair(chromosomes[pairs_to_crossover[_idx]],
537 chromosomes[pairs_to_crossover[_idx + offset_in_pair]],
542 def _mutation(chromosomes, count_clusters, count_gen_for_mutation, coeff_mutation_count):
544 @brief Mutation procedure. 549 count_gens = len(chromosomes[0])
552 random_idx_chromosomes = np.array(range(len(chromosomes)))
553 np.random.shuffle(random_idx_chromosomes)
556 for _idx_chromosome
in range(int(len(random_idx_chromosomes) * coeff_mutation_count)):
559 for _
in range(count_gen_for_mutation):
562 gen_num = np.random.randint(count_gens)
565 chromosomes[random_idx_chromosomes[_idx_chromosome]][gen_num] = np.random.randint(count_clusters)
569 def _crossover_a_pair(chromosome_1, chromosome_2, mask):
571 @brief Crossovers a pair of chromosomes. 573 @param[in] chromosome_1 (numpy.array): The first chromosome for crossover. 574 @param[in] chromosome_2 (numpy.array): The second chromosome for crossover. 575 @param[in] mask (numpy.array): Crossover mask that defines which genes should be swapped. 579 for _idx
in range(len(chromosome_1)):
583 chromosome_1[_idx], chromosome_2[_idx] = chromosome_2[_idx], chromosome_1[_idx]
587 def _get_crossover_mask(mask_length):
589 @brief Crossover mask to crossover a pair of chromosomes. 591 @param[in] mask_length (uint): Length of the mask. 596 mask = np.zeros(mask_length)
599 mask[:int(int(mask_length) / 6)] = 1
602 np.random.shuffle(mask)
608 def _init_population(count_clusters, count_data, chromosome_count):
610 @brief Returns first population as a uniform random choice. 612 @param[in] count_clusters (uint): Amount of clusters that should be allocated. 613 @param[in] count_data (uint): Data size that is used for clustering process. 614 @param[in] chromosome_count (uint):Amount of chromosome that is used for clustering. 618 population = np.random.randint(count_clusters, size=(chromosome_count, count_data))
624 def _get_best_chromosome(chromosomes, data, count_clusters):
626 @brief Returns the current best chromosome. 628 @param[in] chromosomes (list): Chromosomes that are used for searching. 629 @param[in] data (list): Input data that is used for clustering process. 630 @param[in] count_clusters (uint): Amount of clusters that should be allocated. 632 @return (list, float, list) The best chromosome, its fitness function value and fitness function values for 638 centres = ga_math.get_centres(chromosomes, data, count_clusters)
641 fitness_functions = genetic_algorithm._calc_fitness_function(centres, data, chromosomes)
644 best_chromosome_idx = fitness_functions.argmin()
647 return chromosomes[best_chromosome_idx], fitness_functions[best_chromosome_idx], fitness_functions
651 def _calc_fitness_function(centres, data, chromosomes):
653 @brief Calculate fitness function values for chromosomes. 655 @param[in] centres (list): Cluster centers. 656 @param[in] data (list): Input data that is used for clustering process. 657 @param[in] chromosomes (list): Chromosomes whose fitness function's values are calculated. 659 @return (list) Fitness function value for each chromosome correspondingly. 664 count_chromosome = len(chromosomes)
667 fitness_function = np.zeros(count_chromosome)
670 for _idx_chromosome
in range(count_chromosome):
673 centres_data = np.zeros(data.shape)
676 for _idx
in range(len(data)):
677 centres_data[_idx] = centres[_idx_chromosome][chromosomes[_idx_chromosome][_idx]]
680 fitness_function[_idx_chromosome] += np.sum(abs(data - centres_data))
682 return fitness_function
Common visualizer of clusters on 1D, 2D or 3D surface.
def animate_cluster_allocation(data, observer, animation_velocity=75, movie_fps=5, save_movie=None)
Animate clustering process of genetic clustering algorithm.
pyclustering module for cluster analysis.
Genetic algorithm visualizer is used to show clustering results that are specific for this particular...
def get_global_best(self)
def __init__(self, need_global_best=False, need_population_best=False, need_mean_ff=False)
Constructs genetic algorithm observer to collect specific information.
def collect_global_best(self, best_chromosome, best_fitness_function)
Stores the best chromosome and its fitness function's value.
def collect_population_best(self, best_chromosome, best_fitness_function)
Stores the best chromosome for current specific iteration and its fitness function's value...
def process(self)
Perform clustering procedure in line with rule of genetic clustering algorithm.
def show_evolution(observer, start_iteration=0, stop_iteration=None, ax=None, display=True)
Displays evolution of fitness function for the best chromosome, for the current best chromosome and a...
def __len__(self)
Returns amount of iterations that genetic algorithm was observed.
def _crossover(chromosomes)
Crossover procedure.
Class represents Genetic clustering algorithm.
def _get_best_chromosome(chromosomes, data, count_clusters)
Returns the current best chromosome.
def _init_population(count_clusters, count_data, chromosome_count)
Returns first population as a uniform random choice.
def _select(chromosomes, data, count_clusters, select_coeff)
Performs selection procedure where new chromosomes are calculated.
Genetic algorithm observer that is used to collect information about clustering process on each itera...
def collect_mean(self, fitness_functions)
Stores average value of fitness function among chromosomes on specific iteration. ...
def get_population_best(self)
(dict) Returns dictionary with keys 'chromosome' and 'fitness_function' where evolution of the curren...
def get_observer(self)
Returns genetic algorithm observer.
def get_mean_fitness_function(self)
(list) Returns fitness function's values on each iteration.
def show_clusters(data, observer, marker='.', markersize=None)
Shows allocated clusters by the genetic algorithm.
def __init__(self, data, count_clusters, chromosome_count, population_count, count_mutation_gens=2, coeff_mutation_count=0.25, select_coeff=1.0, observer=ga_observer())
Initialize genetic clustering algorithm for cluster analysis.
def get_clusters(self)
Returns list of allocated clusters, each cluster contains indexes of objects from the data...
def _mutation(chromosomes, count_clusters, count_gen_for_mutation, coeff_mutation_count)
Mutation procedure.