
expected losses, trials should be allocated to [1 c], but this would then preclude discovering the best solution [00]. The second respect is more fundamental: the claim that the analysis in Holland (1975) leads to an optimal sampling plan has been shown to be mathematically flawed, both by counterexample (Rudolph, 1997) and by direct analysis (Macready and Wolpert, 1998). Proportional selection does not minimize expected losses, so even if this criterion is given preference, the development in Holland (1975) does not support the use of proportional selection in evolutionary algorithms. This form of selection is just one among many options, and the choice should be based on the dependencies posed by the particular problem.

1.5. A new direction

Certainly, the above list could be extended (e.g. inversion was offered to reorder schemata for effective processing as building blocks by one-point crossover (Holland, 1975, pp. 106-109), but this has had no general empirical support (Davis, 1991; Mitchell, 1996; Lobo et al., 1998)). In light of these missteps, it would appear appropriate to investigate new methods for assessing the fundamental nature of evolutionary search and optimization. The formulation in Eq. (1) leads directly to a Markov chain view of evolutionary algorithms in which a time-invariant, memoryless probability transition matrix describes the likelihood of transitioning to each possible population configuration given each possible configuration (Fogel, 1994; Rudolph, 1994; and others). Such a description immediately leads to answers regarding questions about the asymptotic behavior of various algorithms (e.g. typical instances of evolution strategies and evolutionary programming exhibit asymptotic global convergence (Fogel, 1995a), whereas the canonical genetic algorithm (Holland, 1975) is not convergent due to its reliance on proportional selection (Rudolph, 1994)). Further, as shown above, De Jong et al. (1995) used Markov chains and brute-force computation to analyze the exact transient behavior of genetic algorithms under small populations (e.g. size five) and small chromosomes (e.g. two or three bits), concentrating on the expected waiting time until the global optimum is found for the first time. But this procedure appears at present to be too computationally intensive to be useful in designing more effective (in terms of quality of evolved solution) and efficient (in terms of rate of convergence) evolutionary algorithms for real problems.

The description offered by Eq. (1), however, suggests that some level of understanding of the behavior of an evolutionary algorithm can be garnered by examining the stochastic effects of the operators s and v on a population x at time t. Of interest is the probabilistic description of the fitness of the solutions contained in x[t + 1]. Recent efforts (Altenberg, 1995; Fogel, 1995a; Grefenstette, 1995; Fogel and Ghozeil, 1996) have been directed at generalized expressions describing the relationship between offspring and parent fitness under particular variation operators, or at empirical determination of the fitness of offspring for a given random variation technique. This paper offers evidence that this approach to describing the behavior of an evolutionary algorithm can be used to design more efficient and effective optimization techniques.
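The expected-waiting-time calculation referenced above can be illustrated on a toy example. The sketch below is not the model of De Jong et al. (1995); the three-state chain, its transition probabilities, and the use of numpy are assumptions made purely for illustration. It computes the expected number of generations until an absorbing Markov chain over population configurations first reaches the state containing the global optimum, using the standard fundamental-matrix identity E[T] = (I - Q)^(-1) 1 over the transient states.

```python
import numpy as np

# Hypothetical 3-state Markov chain over population configurations.
# State 2 is the absorbing configuration containing the global optimum.
# Each row gives the probability of moving to each state in one generation.
P = np.array([
    [0.70, 0.25, 0.05],   # transient state 0
    [0.10, 0.60, 0.30],   # transient state 1
    [0.00, 0.00, 1.00],   # absorbing state: optimum found
])

# Q restricts P to the transient states (0 and 1).
Q = P[:2, :2]

# Expected generations to absorption from each transient state:
# E[T] = (I - Q)^(-1) * 1  (fundamental matrix of an absorbing chain).
expected_waiting_time = np.linalg.solve(np.eye(2) - Q, np.ones(2))
print(expected_waiting_time)  # approximately [6.84, 4.21] generations
```

Even for this two-transient-state toy, the calculation requires the full transition matrix; for realistic populations and chromosome lengths the number of configurations grows combinatorially, which is the computational burden noted above.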

2. Background on methods to relate parent and offspring fitness

Altenberg (1995) offered the conjecture that, rather than rely on the schema theorem, the performance of an evolutionary algorithm could be better estimated by examining the probability mass function:

Pr(W = w(x) | w(y), w(z))   (4)

where W is a random variable, x is the offspring of y and z, and w(.) returns the fitness of the argument. Eq. (4) is defined as the transmission function in the fitness domain. Altenberg (1995) noted that empirical estimation of this function could be problematic, particularly if the response surface being searched has multiple domains of attraction. The feasibility of this approach was recognized to depend on some level of regularity in the response surface.

Grefenstette (1995) offered a similar notion for assessing the suitability of various genetic operators. Attention was focused on the mean fitness of the offspring generated by applying a genetic operator to a parent, conditioned on the parents' fitness. That is, the fitness distribution of an operator was defined as:

FD_op(F_p) = Pr(F_c | F_p)   (5)

where the fitness distribution of an operator, FD_op, is the family of probability distributions of the fitness of the offspring, F_c, indexed by the mean fitness of the parents, F_p. It was shown that the mean of the fitness distribution for some genetic operators could be described by simple linear functions of F_p.

This analysis, although potentially insightful, suffered from two important drawbacks. First, attention was unfortunately limited to the case of proportional selection; second, and more importantly, the analysis turns on the relevance of the correlation in fitness between parent and offspring. For example, Grefenstette (1995) offered that if the fitness distribution of an operator were shown to be independent of the parent's fitness, then poor performance ('failure') should be expected. But this can be contradicted by counterexample. For a Newton-Gauss search on a quadratic bowl, regardless of the position of the parent, and therefore its fitness, the offspring generated will be at the global optimum and have minimum error. Thus offspring fitness is independent of parental fitness, yet the algorithm is as successful as possible on this function.

The emphasis on correlation between parental fitness and offspring fitness goes back at least to Manderick et al. (1991). The utility of this approach can suffer when attention is focused on the correlation between mean parental fitness and mean offspring fitness. For example, for the case of linear fitness functions under real-valued representations, the use of zero-mean Gaussian mutations yields zero mean difference between parent and offspring fitness regardless of the setting of the step-size control parameter σ (the standard deviation). But the expected rate of convergence for these methods depends crucially on the setting of σ, as summarized in Bäck (1996) and Fogel (1995a).

In contrast, rather than examine mean parental fitness and how it correlates to mean offspring fitness for a particular search operator, attention can be more fruitfully given to the expected rate of improvement in terms of mean progress toward the optimum, as was offered in Rechenberg (1973). For the case of searching in R^n using zero-mean Gaussian mutations, Rechenberg (1973) noted that the maximum expected rate of convergence was attained for two simple functions, the sphere and corridor models, when the probability of a successful mutation was approximately 0.2.
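As a concrete illustration of Eq. (5), the fitness distribution of a simple variation operator can be estimated empirically by repeated sampling. The sketch below is only a minimal illustration and not the procedure of any of the cited studies; the sphere function, the parent locations, and the value of σ are arbitrary assumptions. It draws many offspring from each of several parents by zero-mean Gaussian mutation and reports the mean offspring fitness conditioned on the parent's fitness.

```python
import numpy as np

def sphere(x):
    """Sphere objective: f(x) = sum(x_i ** 2); lower values are better."""
    return float(np.dot(x, x))

def mean_offspring_fitness(parent, sigma, trials=20_000, rng=None):
    """Monte Carlo estimate of the mean of FD_op(F_p) for zero-mean
    Gaussian mutation applied to a single parent (cf. Eq. (5))."""
    rng = rng or np.random.default_rng(1)
    children = parent + rng.normal(0.0, sigma, size=(trials, parent.size))
    return np.mean([sphere(c) for c in children])

sigma = 0.3
for scale in (0.5, 1.0, 2.0, 4.0):          # parents at increasing distance
    parent = scale * np.ones(2)
    f_p = sphere(parent)
    f_c = mean_offspring_fitness(parent, sigma)
    print(f"F_p = {f_p:6.2f}   mean F_c = {f_c:6.2f}")

# For this operator/function pair the estimated mean offspring fitness is
# close to F_p + n * sigma**2, i.e. a simple linear function of the parent
# fitness, consistent with the kind of relationship Grefenstette reported.
```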
Thus, the 1/5 rule was suggested: the ratio of successful mutations to all mutations should be 1/5. If this ratio is greater than 1/5, increase the variance; if it is less, decrease the variance. Schwefel (1981) suggested measuring the success probability on-line over 10n trials (where there are n dimensions) and adjusting σ at iteration t by:

σ(t) = σ(t − n) · δ,  if p_s < 0.2;
σ(t) = σ(t − n) / δ,  if p_s > 0.2;
σ(t) = σ(t − n),      if p_s = 0.2;

where δ = 0.85 and p_s equals the number of successes in 10n trials divided by 10n. This allowed for a general solution to setting the step size, but the robustness of this procedure remains unknown in general.

The 1/5 rule is a static heuristic that was derived from the analysis of only two classes of objective functions to generate suitable rates of improvement, and as such is of limited utility. An alternative procedure that assesses the fitness distribution of a variation operator without relying on the mean parental fitness can be used to yield potential insights and useful comparisons between various operators. Attention can be devoted to the expected improvement over the specific parent's fitness (or, in light of the selection method employed, alternative criteria such as the fitness of the best parent in the population), and to the probability of improvement.

Fogel (1995b) and Fogel and Ghozeil (1996) empirically examined the distribution of fitness scores attained under different variation operators for specific parameter settings on three continuous optimization problems (sphere, Rosenbrock, Bohachevsky) and one discrete problem (the travelling salesman problem, TSP). For the continuous problems, the variation operators included zero-mean Gaussian mutations and different forms of recombination (one-point, intermediate). In contrast, a variable-length reversal of a segment of the list of cities to be visited was tested for the TSP. Experiments involved repeated Monte Carlo application of variation operators to parents from an initial generation, with the probability of improvement and the expected amount of improvement (i.e. the reduction in error) being recorded for each trial. The mean behavior of the operators as a function of their parametrization was depicted graphically (see Fig. 4), and consistently showed the potential for maximizing the expected progress that could be obtained with a particular operator by adjusting its control parameter, along with its associated probability of improvement. The results demonstrated the possibility of optimizing variation operators even when no analytic derivation of optimal parameter settings may be possible.

Following Fogel and Ghozeil (1996), the method is further developed here and used to examine the appropriate settings for scaling three different types of single-parent variation operators across a set of four continuous function optimization problems in 2, 5, and 10 dimensions. The results indicate that the expected improvement of an operator can be estimated for various control parameters; however, in contrast to the 1/5 rule, it may be insufficient to use the probability of improvement as a surrogate variable to maximize the expected improvement.
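To make this last point concrete, the sketch below sweeps the mutation step size and records both the probability of improvement and the expected improvement over the parent. It is a hypothetical illustration on the sphere function, not the experimental protocol of Section 3; the parent location, the grid of σ values, and the sample size are assumptions. In this setting the two quantities peak at different values of σ, which is why the probability of improvement can be a poor surrogate when the goal is to maximize the expected improvement.

```python
import numpy as np

def sphere(x):
    """Sphere objective: lower values are better."""
    return float(np.dot(x, x))

def improvement_statistics(parent, sigma, trials=20_000, rng=None):
    """Estimate Pr(improvement) and the expected improvement (reduction in
    error) for zero-mean Gaussian mutation with step size sigma."""
    rng = rng or np.random.default_rng(2)
    children = parent + rng.normal(0.0, sigma, size=(trials, parent.size))
    f_p = sphere(parent)
    f_c = np.array([sphere(c) for c in children])
    gain = np.maximum(f_p - f_c, 0.0)           # improvement, zero if worse
    return np.mean(f_c < f_p), gain.mean()

parent = np.array([3.0, 3.0])                   # arbitrary parent in R^2
for sigma in (0.1, 0.5, 1.0, 2.0, 4.0):
    p_improve, expected_gain = improvement_statistics(parent, sigma)
    print(f"sigma = {sigma:4.1f}   Pr(improve) = {p_improve:.3f}"
          f"   E[improvement] = {expected_gain:.3f}")

# In this example a small sigma yields the highest probability of improving
# on the parent but only tiny gains, while the expected improvement tends to
# be largest at an intermediate sigma.
```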

3. Methods