
Fogel (1995b) and Fogel and Ghozeil (1996) empirically examined the distribution of fitness scores attained under different variation operators for specific parameter settings on three continuous optimization problems (sphere, Rosenbrock, Bohachevsky) and one discrete problem (the traveling salesman problem, TSP). For the continuous problems, the variation operators included zero-mean Gaussian mutations and different forms of recombination (one-point, intermediate). In contrast, a variable-length reversal of a segment of the list of cities to be visited was tested for the TSP. Experiments involved repeated Monte Carlo application of variation operators to parents from an initial generation, with the probability of improvement and the expected amount of improvement (i.e. the reduction in error) being recorded for each trial. The mean behavior of the operators as a function of their parametrization was depicted graphically (see Fig. 4) and consistently showed that the expected progress obtainable with a particular operator could be maximized by adjusting its control parameter, along with its associated probability of improvement. The results demonstrated the possibility of optimizing variation operators even when no analytic derivation of optimal parameter settings is possible. Following Fogel and Ghozeil (1996), the method is further developed here and used to examine the appropriate settings for scaling three different types of single-parent variation operators across a set of four continuous function optimization problems in 2, 5, and 10 dimensions. The results indicate that the expected improvement of an operator can be estimated for various control parameters; however, in contrast to the 1/5 rule, it may be insufficient to use the probability of improvement as a surrogate variable when seeking to maximize the expected improvement.

Fig. 4. Fogel and Ghozeil (1996) showed the expected improvement on the well-known Bohachevsky functions (see Fogel and Stayton, 1994) from the best offspring as a function of the standard deviation σ of a zero-mean Gaussian mutation in each dimension, given selection of the best parent from 100 samples. The expected improvement is standardized to have a mean of zero across σ in the range [0, 2]. The curves indicate that the best setting for the standard deviation of the mutation varies with the dimension of the problem. They also indicate that the expected progress is more sensitive to the proper setting of the standard deviation in two dimensions than in ten dimensions.

3. Methods

To begin, three sets of experiments were performed to investigate the properties of three variation operators that are common in the application of evolutionary algorithms to continuous function optimization problems. The framework for selection is based on the (1, 100) model, indicating that a single parent generates 100 offspring, and the best of these offspring is then selected to be the parent for the next generation. Attention was given to the probability of improvement (PI) and the expected improvement (EI) attained by the application of a particular variation operator. The algorithm proceeded as follows:

(i) The trial number, t, was set to 1.

(ii) 100 initial solutions (parents), x_i(t), i = 1, …, 100, were sampled uniformly from an interval [a, b]^n.

(iii) Each parent was evaluated in light of the objective function F(x) defined below.

(iv) The best parent, x*(t) (the parent with the lowest objective value), was used to generate 100 offspring, x_i'(t), i = 1, …, 100, through 100 independent applications of a variation operator ν. The variation ν was accomplished in the form:

$$x_i'(t) = x^{*}(t) + \nu \qquad (6)$$

where ν was a random variable with one of three possible probability density functions (pdfs): (1) a zero-mean Gaussian random variable with standard deviation (scaling parameter) s; (2) a standard Cauchy random variable scaled by s; (3) a convolution of (1) and (2). These variation operators follow typical implementations in evolutionary computation for real-parameter optimization (Bäck, 1996; Rudolph, 1997; Chellapilla, 1998b). Fig. 5 indicates the pdf for each choice.

(v) Each offspring was evaluated in light of F(x).

(vi) The fraction of offspring, f(t), that were strictly better (i.e. lower in error) than x*(t) was computed.

(vii) The offspring with the lowest error, x**(t), was used to compute the improvement during trial t using

$$I(t) = F(x^{*}(t)) - F(x^{**}(t)) \qquad (7)$$

Note that I(t) could be negative if the best of the 100 offspring was worse than the parent that generated it.

(viii) The trial number, t, was incremented.

Fig. 5. The pdfs of the standard Gaussian and standard Cauchy distributions in comparison with their convolution. The convolution of the pdfs is equivalent to taking the mean of the random variables. There is a trade-off among the three pdfs between the probabilities of generating very small (0.0–0.6), small (0.6–1.2), medium (1.2–2.0), large (2.0–4.8), and very large (>4.8) mutations. These were the pdfs used for generating 100 offspring from the best parent x*(t) (Eq. (6)).

Steps (ii)–(viii) were repeated for 5000 trials in a Monte Carlo fashion, whereupon the means of f(t) and I(t) were recorded as estimates of PI and EI for the variation operator ν with scaling term s. For convenience, PI and EI are used to denote these estimates in the following discourse. The values of s are identified later in this section. Each experiment was conducted on four test functions, F_1–F_4, given by:

$$F_1(\mathbf{x}) = \sum_{i=1}^{n} x_i^2 \qquad (8)$$

$$F_2(\mathbf{x}) = -20\exp\!\left(-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}\right) - \exp\!\left(\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)\right) + 20 + e \qquad (9)$$

$$F_3(\mathbf{x}) = \sum_{i=1}^{n}\left(x_i^2 - 10\cos(2\pi x_i) + 10\right) \qquad (10)$$

$$F_4(\mathbf{x}) = \sum_{i=1}^{n} \lfloor x_i \rfloor \qquad (11)$$

Function F_1 is the sphere, F_2 is a modified version of the Ackley function, F_3 is the Rastrigin function, and F_4 is the generalized step function.
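For concreteness, the four test functions can be written as short NumPy routines. This is a minimal sketch under our own naming (f1 through f4, not the authors'); the vectorized forms follow Eqs. (8)–(11) directly.

```python
import numpy as np

def f1(x):
    """Sphere function, Eq. (8)."""
    return np.sum(x**2)

def f2(x):
    """Modified Ackley function, Eq. (9)."""
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.mean(x**2)))
            - np.exp(np.mean(np.cos(2 * np.pi * x)))
            + 20.0 + np.e)

def f3(x):
    """Rastrigin function, Eq. (10)."""
    return np.sum(x**2 - 10.0 * np.cos(2 * np.pi * x) + 10.0)

def f4(x):
    """Generalized step function, Eq. (11)."""
    return np.sum(np.floor(x))
```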
The three different variation operators were performed as:

$$x_{i,j}' = x_j^{*} + s\,N_j(0,1) \qquad (12)$$

$$x_{i,j}' = x_j^{*} + s\,C_j(0,1) \qquad (13)$$

$$x_{i,j}' = x_j^{*} + 0.5\,s\left(N_j(0,1) + C_j(0,1)\right) \qquad (14)$$

where N(0,1) is a standard normal random variable, C(0,1) is a standard Cauchy random variable, j is an index for the jth dimension, and i is an index for the ith offspring generated from x*(t). For the case of Gaussian mutation, s is the standard deviation σ; recall, however, that the standard deviation of a Cauchy pdf is undefined, so s is best viewed simply as a scaling factor.

Throughout the remainder of the paper, these three variations are described as the Gaussian, Cauchy, and mean mutation operators (GMO, CMO, and MMO, respectively). For each function F_1–F_4, 200 separate experiments (each of 5000 trials) were conducted by stepping the value of s from 0.01 to 4.00 in increments of 0.02. Initial solutions were distributed uniformly over [−4, 4]^n, n = 2, 5, and 10, which is symmetric about the optimum solution.
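To make the protocol concrete, the sketch below estimates PI and EI for one operator at one setting of s under the (1, 100) scheme of steps (i)–(viii); a full experiment would loop this over the 200 values of s, the three operators, each test function, and n = 2, 5, and 10. The names gmo, cmo, mmo, and estimate_pi_ei are ours, and f1 refers to the sphere function from the previous sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def gmo(x, s):
    """Gaussian mutation operator, Eq. (12)."""
    return x + s * rng.standard_normal(x.size)

def cmo(x, s):
    """Cauchy mutation operator, Eq. (13)."""
    return x + s * rng.standard_cauchy(x.size)

def mmo(x, s):
    """Mean mutation operator, Eq. (14)."""
    return x + 0.5 * s * (rng.standard_normal(x.size) + rng.standard_cauchy(x.size))

def estimate_pi_ei(f, mutate, s, n, trials=5000, lam=100, low=-4.0, high=4.0):
    """Monte Carlo estimates of PI and EI for operator `mutate` with scale s."""
    pi_sum = ei_sum = 0.0
    for _ in range(trials):
        # Steps (ii)-(iii): sample and evaluate 100 uniform parents.
        parents = rng.uniform(low, high, size=(lam, n))
        parent_errs = np.array([f(p) for p in parents])
        best = parents[np.argmin(parent_errs)]   # step (iv): best parent x*(t)
        best_err = parent_errs.min()
        # Steps (iv)-(v): generate and evaluate 100 offspring of x*(t).
        off_errs = np.array([f(mutate(best, s)) for _ in range(lam)])
        pi_sum += np.mean(off_errs < best_err)   # step (vi): fraction improved, f(t)
        ei_sum += best_err - off_errs.min()      # step (vii): improvement I(t), Eq. (7)
    return pi_sum / trials, ei_sum / trials

# Example: PI and EI for the GMO on the 2-D sphere (f1 from the earlier sketch).
pi, ei = estimate_pi_ei(f1, gmo, s=0.47, n=2)
```

Note that, consistent with Eq. (7), the EI estimate may be negative for large s, whereas the PI estimate is bounded in [0, 1].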

Fig. 6. The EI for the GMO, CMO, and MMO across the settings of the step size, s, for the 2-dimensional sphere function F_1. The CMO EI curve peaks first, followed by those for the GMO and MMO. The maximum EI occurs at s = 0.47, 0.49, and 0.21 for the GMO, MMO, and CMO, respectively. The corresponding peak EI values were 0.20, 0.19, and 0.20 for the GMO, MMO, and CMO, respectively. The GMO curve has the largest bandwidth, followed by those for the MMO and CMO.

Fig. 8. The PI for the GMO, CMO, and MMO across the settings of the step size, s, for the 10-dimensional sphere function F_1. For all three operators, the PI values decrease with increasing s. Since the quadratic function is continuous and unimodal, the PI attains a peak value of 0.5 as s tends to 0, which is the PI value on an inclined plane. The PI curve for the CMO drops the fastest, followed by those for the MMO and GMO. Paralleling the estimates for EI (Fig. 7), the GMO offers the greatest PI for any fixed value of s, followed by the MMO and CMO, respectively. As s → 0, PI → 0.5, and as s becomes large, PI tends to zero.

4. Results