expected losses, trials should be allocated to the observed best alternative, but this would then preclude discovering the best solution. The second respect is more fundamental: the claim that the analysis in Holland (1975) leads to an optimal sampling plan has been shown to be mathematically flawed both by counterexample (Rudolph, 1997) and by direct analysis (Macready and Wolpert, 1998). Proportional selection does not minimize expected losses, so even if this criterion is given preference, the development in Holland (1975) does not support the use of proportional selection in evolutionary algorithms.
This form of selection is just one among many options, and the choice should be based on the
dependencies posed by the particular problem.
1.5. A new direction

Certainly, the above list could be extended (e.g. inversion was offered to reorder schemata for effective processing as building blocks by one-point crossover (Holland, 1975; pp. 106–109), but this has had no general empirical support (Davis, 1991; Mitchell, 1996; Lobo et al., 1998)). In light of these missteps, it would appear appropriate to
investigate new methods for assessing the fundamental nature of evolutionary search and optimization. The formulation in Eq. 1 leads directly to a Markov chain view of evolutionary algorithms in which a time-invariant, memoryless probability transition matrix describes the likelihood of transitioning to each possible population configuration given each possible configuration (Fogel, 1994; Rudolph, 1994; and others). Such a description immediately leads to answers regarding questions about the asymptotic behavior of various algorithms (e.g. typical instances of evolution strategies and evolutionary programming exhibit asymptotic global convergence (Fogel, 1995a), whereas the canonical genetic algorithm (Holland, 1975) is not convergent due to its reliance on proportional selection (Rudolph, 1994)). Further, as shown above, De Jong et al. (1995) used Markov chains and brute-force computation to analyze the exact transient behavior of genetic algorithms under small populations (e.g. size five) and small chromosomes (e.g. two or three bits), concentrating on the expected waiting time until the global optimum is found for the first time. But this procedure appears at present to be too computationally intensive to be useful in designing more effective (in terms of quality of evolved solution) and efficient (in terms of rate of convergence) evolutionary algorithms for real problems.
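For an absorbing chain, the expected-waiting-time computation just described reduces to a standard fundamental-matrix calculation. A minimal sketch, using an invented four-state transition matrix (not the actual chains analyzed by De Jong et al., 1995): the state containing the global optimum is made absorbing, and the expected number of steps to reach it from each transient state is obtained by solving (I − Q)t = 1, where Q is the transition matrix restricted to the transient states.

```python
import numpy as np

# Hypothetical transition matrix over four population configurations;
# state 3 (population contains the global optimum) is absorbing.
P = np.array([
    [0.6, 0.3, 0.1, 0.0],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.4, 0.3],
    [0.0, 0.0, 0.0, 1.0],   # absorbing state: optimum found
])

# Q restricts P to the transient states; the expected waiting time to
# absorption from each transient state solves (I - Q) t = 1.
Q = P[:3, :3]
t = np.linalg.solve(np.eye(3) - Q, np.ones(3))
print(t)   # expected waiting times from states 0, 1, 2
```

The computational burden noted in the text comes from the size of P: the number of possible population configurations explodes combinatorially with population size and chromosome length, so this linear solve is only feasible for very small instances.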
The description offered by Eq. 1, however, suggests that some level of understanding of the behavior of an evolutionary algorithm can be garnered by examining the stochastic effects of the operators s and ν on a population x at time t. Of interest is the probabilistic description of the fitness of the solutions contained in x[t + 1]. Recent efforts (Altenberg, 1995; Fogel, 1995a; Grefenstette, 1995; Fogel and Ghozeil, 1996) have been directed at generalized expressions describing the relationship between offspring and parent fitness under particular variation operators, or at the empirical determination of the fitness of offspring for a given random variation technique. This paper offers evidence that this approach to describing the behavior of an evolutionary algorithm can be used to design more efficient and effective optimization techniques.
2. Background on methods to relate parent and offspring fitness
Altenberg (1995) offered the conjecture that, rather than relying on the schema theorem, the performance of an evolutionary algorithm could be better estimated by examining the probability mass function:

Pr(W = w(x); w(y), w(z))    (4)

where W is a random variable, x is the offspring of y and z, and w(·) returns the fitness of its argument. Eq. 4 is defined as the transmission function in the fitness domain. Altenberg (1995) noted that empirical estimation of this function could be problematic, particularly if the response surface being searched has multiple domains of attraction. The feasibility of this approach was recognized to depend on some level of regularity in the response surface.
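In principle the transmission function can be estimated by Monte Carlo sampling: fix two parents (and hence their fitnesses), apply the variation operator repeatedly, and tabulate the fitness of the resulting offspring. A minimal sketch under invented assumptions; the sphere fitness function and the operator (intermediate recombination plus Gaussian mutation) are illustrative choices, not taken from Altenberg (1995):

```python
import random

def fitness(v):
    # Sphere model: w(v) = sum of squared components (to be minimized).
    return sum(c * c for c in v)

def offspring(y, z, sigma=0.1):
    # Intermediate recombination followed by zero-mean Gaussian mutation.
    return [(a + b) / 2.0 + random.gauss(0.0, sigma) for a, b in zip(y, z)]

random.seed(1)
y, z = [1.0, 1.0], [-1.0, 0.5]   # fixed parents, so w(y) and w(z) are fixed
samples = [fitness(offspring(y, z)) for _ in range(10_000)]

# Empirical summary of the conditional offspring-fitness distribution.
mean_w = sum(samples) / len(samples)
print(f"w(y)={fitness(y):.2f}  w(z)={fitness(z):.2f}  E[W]~{mean_w:.3f}")
```

The `samples` list is an empirical draw from Eq. 4 for this one parental condition; mapping out the full transmission function requires repeating this over the space of parental fitnesses, which is exactly where the multimodality concern raised above bites.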
Grefenstette (1995) offered a similar notion for assessing the suitability of various genetic operators. Attention was focused on the mean fitness of the offspring generated by applying a genetic operator to a parent, conditioned on the parent's fitness. That is, the fitness distribution of an operator was defined as:

FD_op(F_p) = Pr(F_c | F_p)    (5)

where the fitness distribution of an operator, FD_op, is the family of probability distributions of the fitness of the offspring, F_c, indexed by the mean fitness of the parents, F_p. It was shown that the mean of the fitness distribution for some genetic operators could be described by simple linear functions of F_p. This analysis, although potentially insightful,
suffered from two important drawbacks. First, attention was unfortunately limited to the case of proportional selection; second, and more importantly, the analysis turns on the relevance of the correlation in fitness between parent and offspring. For example, Grefenstette (1995) offered that if the fitness distribution of an operator were shown to be independent of the parent's fitness, then poor performance ('failure') should be expected. But this can be contradicted by counterexample. For a Newton–Gauss search on a quadratic bowl, regardless of the position of the parent, and therefore its fitness, the offspring generated will be at the global optimum and have minimum error. Thus offspring fitness is independent of parental fitness, yet the algorithm is as successful as possible on this function.
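The counterexample can be made concrete. In the sketch below, the quadratic bowl and parent positions are arbitrary illustrative choices: a single Newton step on any positive-definite quadratic lands exactly at the minimizer regardless of where the parent sits, so the child's fitness carries no correlation with the parent's.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive-definite matrix
x_star = np.array([1.0, -2.0])           # global minimizer

def f(x):
    # Quadratic bowl with minimum value 0 at x_star.
    d = x - x_star
    return d @ A @ d

def newton_step(x):
    grad = 2.0 * A @ (x - x_star)        # analytic gradient of f
    hess = 2.0 * A                       # constant Hessian of f
    return x - np.linalg.solve(hess, grad)

parents = [np.array([10.0, 10.0]), np.array([-5.0, 0.3])]
for parent in parents:
    child = newton_step(parent)
    print(f"parent fitness {f(parent):10.2f} -> child fitness {f(child):.2e}")
```

Both parents, despite wildly different fitnesses, produce a child at the optimum: zero correlation between parent and offspring fitness, yet ideal search performance.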
The emphasis on correlation between parental fitness and offspring fitness goes back at least to Manderick et al. (1991). The utility of this approach can suffer when attention is focused on the correlation between mean parental fitness and mean offspring fitness. For example, for the case of linear fitness functions under real-valued representations, the use of zero-mean Gaussian mutations yields zero mean difference between parent and offspring fitness regardless of the setting for the step size control parameter σ (the standard deviation). But the expected rate of convergence for these methods depends crucially on the setting of σ, as summarized in Bäck (1996) and Fogel (1995a).

In contrast, rather than examine mean parental
fitness and how it correlates to mean offspring fitness for a particular search operator, attention can be more fruitfully given to the expected rate of improvement in terms of mean progress toward the optimum, as was offered in Rechenberg (1973). For the case of searching in R^n using zero-mean Gaussian mutations, Rechenberg (1973) noted that the maximum expected rate of convergence was attained for two simple functions, the sphere and corridor models, when the probability of a successful mutation was approximately 0.2. Thus, the 1/5 rule was suggested:

The ratio of successful mutations to all mutations should be 1/5. If this ratio is greater than 1/5, increase the variance; if it is less, decrease the variance.
Schwefel (1981) suggested measuring the success probability on-line over 10n trials (where there are n dimensions) and adjusting σ at iteration t by:

σ(t) = σ(t − n)·δ, if p_S < 0.2;
σ(t) = σ(t − n)/δ, if p_S > 0.2;
σ(t) = σ(t − n),   if p_S = 0.2;

where δ = 0.85 and p_S equals the number of successes in 10n trials divided by 10n. This allowed for a general solution to setting the step size, but the robustness of this procedure remains unknown in general.
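Schwefel's adjustment scheme can be sketched as a simple (1+1) search loop. The sphere objective, the generation budget, and the per-generation update cadence below are illustrative assumptions, not Schwefel's exact algorithm: the success rate p_S is measured over 10n trials and σ is then multiplied or divided by δ = 0.85 accordingly.

```python
import random

def one_fifth_rule(f, x, sigma=1.0, delta=0.85, generations=40):
    # (1+1)-style search with step-size adaptation via the 1/5 rule.
    n = len(x)
    for _ in range(generations):
        successes = 0
        for _ in range(10 * n):                # measure p_S over 10n trials
            y = [xi + random.gauss(0.0, sigma) for xi in x]
            if f(y) < f(x):                    # success: offspring improves
                x, successes = y, successes + 1
        p_s = successes / (10 * n)
        if p_s < 0.2:
            sigma *= delta                     # too few successes: shrink step
        elif p_s > 0.2:
            sigma /= delta                     # too many successes: grow step
    return x, sigma

def sphere(v):
    return sum(c * c for c in v)

random.seed(7)
x, sigma = one_fifth_rule(sphere, [5.0, 5.0, 5.0])
print(f"final fitness {sphere(x):.3e}, final sigma {sigma:.3f}")
```

The multiplicative update keeps σ roughly tracking the distance to the optimum on the sphere; on objectives unlike the sphere or corridor models, as the text notes, there is no guarantee this tracking behavior yields suitable rates of improvement.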
The 1/5 rule is a static heuristic that was derived from the analysis of only two classes of objective functions chosen to generate suitable rates of improvement, and as such is of limited utility. An alternative procedure, one that assesses the fitness distribution of a variation operator without relying on the mean parental fitness, can be used to yield potential insights and useful comparisons between various operators. Attention can be devoted to the expected improvement over the specific parent's fitness (or, in light of the selection method employed, alternative criteria such as the fitness of the best parent in the population), and to the probability of improvement.
Fogel (1995b) and Fogel and Ghozeil (1996) empirically examined the distribution of fitness scores attained under different variation operators for specific parameter settings on three continuous optimization problems (sphere, Rosenbrock, Bohachevsky) and one discrete problem (travelling salesman problem, TSP). For the continuous problems, the variation operators included zero-mean Gaussian mutations and different forms of recombination (one-point, intermediate). In contrast, a variable-length reversal of a segment of the list of cities to be visited was tested for the TSP. Experiments involved repeated Monte Carlo application of variation operators to parents from an initial generation, with the probability of improvement and the expected amount of improvement (i.e. the reduction in error) being recorded for each trial. The mean behavior of the operators as a function of their parametrization was depicted graphically (see Fig. 4), and consistently showed the potential for maximizing the expected progress obtainable with a particular operator by adjusting its control parameter and its associated probability of improvement. The results demonstrated the possibility of optimizing variation operators even when no analytic derivation of optimal parameter settings may be possible.
Following Fogel and Ghozeil (1996), the method is further developed here and used to examine the appropriate settings for scaling three different types of single-parent variation operators across a set of four continuous function optimization problems in 2, 5, and 10 dimensions. The results indicate that the expected improvement of an operator can be estimated for various control parameters; however, in contrast to the 1/5 rule, it may be insufficient to use the probability of improvement as a surrogate variable to maximize the expected improvement.
3. Methods