Operations Research Letters 26 (2000) 1–8
www.elsevier.com/locate/orms
A variable target value method for nondifferentiable optimization
Hanif D. Sherali ∗ , Gyunghyun Choi, Cihan H. Tuncbilek
Department of Industrial and Systems Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
Received 1 March 1998; received in revised form 1 July 1999
∗ Corresponding author. Fax: +1-540-231-3322. E-mail address: [email protected] (H.D. Sherali).
Abstract
This paper presents a new variable target value method (VTVM) that can be used in conjunction with pure or deflected subgradient strategies. The proposed procedure assumes no a priori knowledge regarding bounds on the optimal value. The target values are updated iteratively whenever necessary, depending on the information obtained in the process of the algorithm. Moreover, convergence of the sequence of incumbent solution values to a near-optimum is proved using popular, practically desirable step-length rules. In addition, the method also allows a wide flexibility in designing subgradient deflection strategies by imposing only mild conditions on the deflection parameter. Some preliminary computational results are reported on a set of standard test problems in order to demonstrate the viability of this approach. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: Nondifferentiable optimization; Subgradient algorithm; Conjugate subgradient methods; Deflected subgradient directions; Variable target value method
1. Introduction

Consider the nondifferentiable optimization problem

NDO: Minimize {f(x): x ∈ X},  (1)

where f is a convex function that is not necessarily differentiable, and X is a nonempty, closed, convex subset of R^n. We assume that (1) has an optimum x* and that the set of subgradients of f over X is a bounded set.
One approach to solve such problems is to use subgradient optimization methods.
Here, given an iterate x_k ∈ X, k ≥ 1, a direction of motion d_k is generated based on the set of subgradients of f at or about x_k, and a step-length λ_k is taken along this direction. The new iterate x_{k+1} is then computed according to x_{k+1} = P_X[x_k + λ_k d_k], where P_X[·] denotes the projection operation onto the set X. The effectiveness of this scheme is strongly dependent on the choice of d_k and λ_k. In this paper, we will focus on the more popular pure or deflected subgradient strategies that have been spurred by Lagrangian relaxation applications, in which d_k is selected as either −g_k or −g_k + α_k d_{k−1}, respectively, where g_k is a subgradient of f at x_k, and where α_k ≥ 0 is a suitable deflection parameter associated with the previous direction of motion, with d_0 ≡ 0. (See [3,6,10,15,17–19] for various subgradient deflection strategies.) Note that in our analysis,
we permit the use of any de
ection strategy so long as
the de
ection parameters k ¿0 are chosen such that
kdk k ¡ M for all k; for some suciently
large number M:
(2)
This should be easy to ensure, given our assumption
on the boundedness of the set of subgradients of f
over X . In particular, both the modied gradient technique (MGT ) of Camerini et al. [3] and the average
direction strategy (ADS) of Sherali and Ulular [18]
satisfy condition (2).
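For illustration, the sketch below (ours, not the authors') computes a deflected direction of the form −g_k + α_k d_{k−1}; the ADS and MGT coefficient formulas shown are our recollection of the cited rules and should be checked against [3,18]. Under either choice, ‖d_k‖ is bounded by a constant multiple of ‖g_k‖, so condition (2) follows from the assumed boundedness of the subgradients.

```python
import numpy as np

def deflected_direction(g_k, d_prev, rule="ADS", mgt_coef=1.5):
    """Return d_k = -g_k + alpha_k * d_prev.

    The ADS and MGT coefficient formulas below are our recollection of the
    strategies cited in the text and should be verified against [3,18].
    """
    nd = np.linalg.norm(d_prev)
    if nd == 0.0:                       # d_0 = 0: the first step is a pure subgradient step
        return -g_k
    if rule == "ADS":                   # average direction strategy [18]: bisect -g_k and d_{k-1}
        alpha = np.linalg.norm(g_k) / nd
    elif rule == "MGT":                 # modified gradient technique [3] (coefficient as we recall it)
        s = float(g_k @ d_prev)
        alpha = mgt_coef * s / nd**2 if s > 0.0 else 0.0
    else:                               # pure subgradient
        alpha = 0.0
    return -g_k + alpha * d_prev
```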
As far as related step-length rules are concerned, these play an important role not only in ensuring convergence, but also in governing the practical rate of convergence to optimality (see [4,7,17] for example). The more effective step-length rules, popularized by Held and Karp [6] and Held et al. [7], are of the type

λ_k = β_k (f(x_k) − w)/‖d_k‖²,  (3)

where 0 < β ≤ β_k ≤ β̂ < 2, and where w is a target value.
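As a concrete illustration of rule (3), the following sketch takes one projected step; the box constraints used for P_X are an assumption made purely for this example.

```python
import numpy as np

def projected_target_step(x_k, f_k, g_k, d_k, w, beta_k, lower, upper):
    """One iteration x_{k+1} = P_X[x_k + lambda_k d_k] using rule (3).

    P_X is illustrated for a box X = {x: lower <= x <= upper}; the box is an
    assumption made only for this sketch.  Requires f_k > w so that the
    step-length is positive.
    """
    lam = beta_k * (f_k - w) / float(d_k @ d_k)    # step-length rule (3)
    return np.clip(x_k + lam * d_k, lower, upper)  # projection onto the box
```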
Often, w is taken as a fixed lower bound on the problem. On the other hand, Bazaraa and Sherali [1] present some rules to choose the target value as a convex combination of a fixed lower bound and the current best objective value. A similar idea is used by Kim et al. [9] in their variable target value method for minimizing strongly convex functions. However, both these papers assume that some initial lower bound estimate is available, and moreover, Kim et al. [9] additionally assume that an upper bound on ‖x_1 − x*‖ is known, where x* is an optimal solution to NDO.
Our motivation in this paper is to present a new variable target value method (VTVM) for general nondifferentiable optimization problems that adopts an effective subgradient step-length rule of type (3) and that assumes no a priori knowledge whatsoever regarding the optimal objective value. Brannlund [2] describes an alternate scheme in his dissertation for such problems, Goffin and Kiwiel [5] present a convergence analysis for a variant of Brannlund's method, and Kulikov and Fazylov [14] propose a fixed, short-step variant of our algorithm. For a more extensive discussion on related subgradient projection methods, we refer the reader to Kiwiel [12,13].
The remainder of this paper is organized as follows. In Section 2, we present the proposed algorithm VTVM, and we analyze its convergence properties in Section 3. Section 4 provides some preliminary computational results using a variety of standard test problems from the literature.
2. The algorithm VTVM

The proposed algorithm operates in an inner loop and an outer loop. The inner loop involves the main process of generating a sequence of iterates. Depending on the progress during such inner loop iterations, the target value and other related parameters are periodically updated in an outer loop adjustment step. The algorithm is designed to theoretically converge in objective value to within any a priori specified tolerance ε > 0 of the optimal value, while preserving a reasonable degree of computational effectiveness in practice. It accomplishes the latter by permitting the use of a variety of subgradient deflection direction strategies, and by employing the practically effective step-length rule (3).
Below, we highlight our notation and then present a statement of the algorithm. The principal algorithmic parameters are described in the Initialization Step, and recommended values of these parameters are provided later in Remark 1.
Notation:
• Counters: k ≡ total iteration counter, ℓ ≡ outer loop iteration counter, t ≡ current inner loop's iteration counter, and ν ≡ counter of ongoing consecutive nonimprovements.
• For any iteration k: x_k = iterate; f_k = f(x_k); g_k = subgradient of f at x_k; d_k = direction; λ_k = step-length; β_k = step-length parameter; and z_k = incumbent solution value. Also, Δ_k = accumulated improvements within the current set of inner loop iterations until the beginning of iteration k.
• For any outer loop ℓ: w_ℓ = target value, and ε_ℓ = acceptance tolerance for declaring that the current incumbent value is close enough to the target value w_ℓ.
• Optimum: x* ∈ argmin{f(x): x ∈ X}, and f* ≡ f(x*). Also, x̂ = incumbent solution and ĝ = available subgradient of f at x̂.
• Algorithmic parameters: γ = acceptance interval parameter, η = fraction of cumulative inner loop improvement that is used to decrease the target value, T = maximum allowable iterations in the inner loop without coming within the acceptance tolerance of the target value, and ν̄ = maximum consecutive nonimprovements permitted within the inner loop.
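To make this notation concrete, a possible container for the algorithmic parameters is sketched below; the field names are our labels for the quantities above, and the default values follow the recommendations given later in Remark 1.

```python
from dataclasses import dataclass

@dataclass
class VTVMParams:
    """Algorithmic parameters; field names are our labels for the notation
    above, and the defaults follow the values recommended in Remark 1."""
    gamma: float = 0.15     # acceptance interval parameter, in (0, 1/3]
    eta: float = 0.75       # fraction of cumulative inner loop improvement, in (0, 1]
    inner_max: int = 75     # max inner loop iterations without reaching the tolerance (T)
    nonimp_max: int = 20    # max consecutive nonimprovements (grown to 50 per Remark 1)
    tau: float = 1e-6       # step-length parameter tolerance, in (0, 1]
    eps0: float = 1e-6      # tolerance on subgradient norms
    eps: float = 0.1        # overall convergence tolerance
    beta1: float = 0.95     # initial step-length parameter, in [tau, 1]
    k_max: int = 2000       # limit on the total number of iterations
```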
Algorithm (VTVM)

Initialization. Select the step-length parameter tolerance 0 < τ ≤ 1, and termination parameters ε₀ ≥ 0 for the tolerance on subgradient norms, ε > 0 for the overall convergence tolerance, and k_max ≤ ∞ for the limit on the maximum number of iterations. Select values for the algorithmic parameters γ ∈ (0, 1/3], η ∈ (0, 1], T, ν̄, and let β_1 ∈ [τ, 1] (see Remark 1 below for recommended values).
Select a starting solution x_1 ∈ X, compute f_1 ≡ f(x_1) and let d_1 = −g_1. If ‖g_1‖ ≤ ε₀, then stop with x_1 as the prescribed solution. Otherwise, set x̂ = x_1 and ĝ = g_1, and record z_1 = f_1 as the best known objective function value. Initialize the target value w_1 = max{LB, f_1 − ‖g_1‖²/2} and the acceptance tolerance ε_1 = γ(f_1 − w_1), where LB is any known lower bound on f*, being taken as −∞ if no such lower bound is available. (Note that any reasonable value < f_1 would suffice for the second term in the maximand for w_1. The stated value corresponds to the minimum value of a second-order approximation of f at x_1 with assumed gradient g_1 and an identity Hessian.) Initialize k = ℓ = t = 1, ν = 0, and Δ_1 = 0.

Step 1 (Inner loop main iteration). If k > k_max, stop. Else, determine d_k = −g_k + α_k d_{k−1}, where α_k ≥ 0 is selected via any suitable strategy so long as (2) holds true, and where d_0 ≡ 0. If ‖d_k‖ ≤ ε₀, then set d_k = −g_k. Also, compute the step-length

λ_k = β_k (f_k − w_ℓ)/‖d_k‖².  (4)

Find the new iterate x_{k+1} = P_X[x_k + λ_k d_k], and determine f_{k+1} and g_{k+1}. If ‖g_{k+1}‖ ≤ ε₀, terminate the algorithm with x_{k+1} as the prescribed solution. Update Δ_{k+1} = Δ_k + max{0, z_k − f_{k+1}}. If f_{k+1} < z_k, go to Step 2(a), and otherwise, go to Step 2(b).

Step 2(a) (Improvement in the inner loop). Put ν = 0, z_{k+1} = f_{k+1}, and update x̂ = x_{k+1} and ĝ = g_{k+1}. If z_{k+1} ≤ w_ℓ + ε_ℓ, then go to Step 3(a). Otherwise, if t ≥ T, go to Step 3(b); else, set β_{k+1} = β_k, increment k and t by one, and return to Step 1.

Step 2(b) (Nonimprovement in the inner loop). Put z_{k+1} = z_k, and increment ν by one. If ν ≥ ν̄ or t ≥ T, go to Step 3(b). Otherwise, set β_{k+1} = β_k, increment k and t by one, and return to Step 1.

Step 3(a) (Outer loop success iteration: z_{k+1} ≤ w_ℓ + ε_ℓ). Compute

w_{ℓ+1} = z_{k+1} − ε_ℓ − ηΔ_{k+1}  and  ε_{ℓ+1} = max{γ(z_{k+1} − w_{ℓ+1}), ε}.  (5)

(See Remark 1 below.) Put t = 1, Δ_{k+1} = 0, β_{k+1} = β_k, increment ℓ and k by one, and return to Step 1.

Step 3(b) (Outer loop failure iteration: z_{k+1} > w_ℓ + ε_ℓ). Compute

w_{ℓ+1} = [(z_{k+1} − ε_ℓ) + w_ℓ]/2  and  ε_{ℓ+1} = max{γ(z_{k+1} − w_{ℓ+1}), ε}.  (6)

(Optionally, if ν ≥ ν̄, adjust ν̄ as recommended in Remark 1 and restart with the incumbent solution as suggested in Remark 2 (no restarts are to be performed after some finite number of iterations); also, adjust the step-length parameter β_k to β_{k+1} ∈ [τ, 1] if necessary, as described in Remark 2.) Put ν = 0, t = 1, Δ_{k+1} = 0, increment ℓ and k by one, and return to Step 1.
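The following Python sketch assembles the steps stated above into a single routine. It is our reading of the algorithm, not the authors' code: the symbol names mirror the reconstruction above, the optional restart and step-length-parameter adjustment of Remark 2 are omitted, and f, subgrad, project, and deflect are user-supplied callables (for instance, the deflected-direction sketch given in Section 1).

```python
import numpy as np

def vtvm(f, subgrad, project, x1, params, deflect=None, LB=-np.inf):
    """Sketch of Algorithm VTVM as reconstructed above (our reading, not the
    authors' code).  f(x) -> value, subgrad(x) -> a subgradient, project(y)
    -> P_X[y], deflect(g, d_prev) -> new direction (default: pure -g).
    The restart and beta-adjustment options of Remark 2 are omitted."""
    p = params
    x = np.asarray(x1, dtype=float)
    fx, g = f(x), subgrad(x)
    if np.linalg.norm(g) <= p.eps0:
        return x, fx
    x_best, z = x.copy(), fx                   # incumbent solution and value
    d = -g                                     # d_1 = -g_1 (d_0 = 0)
    w = max(LB, fx - 0.5 * float(g @ g))       # initial target w_1
    eps_l = p.gamma * (fx - w)                 # acceptance tolerance eps_1
    beta = p.beta1
    t, nonimp, delta = 1, 0, 0.0               # inner counter, nonimprovements, accrued improvement
    for k in range(1, p.k_max + 1):
        # Step 1: direction, step-length (4), projected step
        if k > 1:
            d = deflect(g, d) if deflect is not None else -g
            if np.linalg.norm(d) <= p.eps0:
                d = -g
        lam = beta * (fx - w) / float(d @ d)
        x = project(x + lam * d)
        fx, g = f(x), subgrad(x)
        if np.linalg.norm(g) <= p.eps0:
            return x, fx
        delta += max(0.0, z - fx)              # Delta_{k+1}
        if fx < z:                             # Step 2(a): improvement
            nonimp, z, x_best = 0, fx, x.copy()
            if z <= w + eps_l:                 # Step 3(a): target (nearly) reached
                w = z - eps_l - p.eta * delta
                eps_l = max(p.gamma * (z - w), p.eps)
                t, delta = 1, 0.0
                continue
            if t < p.inner_max:
                t += 1
                continue
        else:                                  # Step 2(b): nonimprovement
            nonimp += 1
            if nonimp < p.nonimp_max and t < p.inner_max:
                t += 1
                continue
        # Step 3(b): pull the target halfway back toward the incumbent
        w = 0.5 * ((z - eps_l) + w)
        eps_l = max(p.gamma * (z - w), p.eps)
        nonimp, t, delta = 0, 1, 0.0
    return x_best, z
```

For instance, with project = lambda y: np.maximum(y, 0.0) for X = R^n_+ and deflect wired to an ADS-type rule, this mirrors the structure used for Run 1 in Section 4.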
Remark 1. From a practical efficiency point of view, to ensure adequate step-lengths during improving phases of the algorithm, we can replace the target value update in (5) by w_{ℓ+1} = z_{k+1} − max{ε_ℓ + ηΔ_{k+1}, r|z_{k+1}|}, where 0 < r < 1, and where r is divided by some ρ > 1 whenever w_{ℓ+1} is determined by the second term in this maximand. (We used r = 0.08 and ρ = 1.08 in our computations, and this strategy gave an improved computational performance.) The convergence analysis of Section 3 continues to hold with this modification. Other recommended values of the parameters are τ = 10⁻⁶, ε₀ = 10⁻⁶, ε ∈ [10⁻⁶, 10⁻¹] (we used ε = 0.1 in our computations), k_max = 2000, β_1 = 0.95, γ = 0.15, η ∈ [0.75, 0.95] (we used η = 0.75), T = 75, and ν̄ being initialized at 20 and then incremented by 10 each time this limit is reached at Step 3(b), up to a maximum value of 50.
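A minimal sketch of this Remark 1 variant of update (5), with our variable names, is:

```python
def target_update_3a(z_next, eps_l, eta, delta_next, r, shrink=1.08):
    """Remark 1 variant of update (5): force a decrease of at least r*|z_{k+1}|.
    Variable names are ours; the paper used r = 0.08 and a shrink factor of 1.08."""
    drop = max(eps_l + eta * delta_next, r * abs(z_next))
    if r * abs(z_next) > eps_l + eta * delta_next:   # second term decided the drop
        r /= shrink                                  # shrink r for subsequent updates
    return z_next - drop, r
```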
Remark 2 (Some practical considerations). Note that a restarting technique is often an important computational ingredient of subgradient procedures (see [1,7,18]). In the same spirit, for VTVM, whenever the target needs to be increased at Step 3(b) due to ν̄ consecutive failures, we restart the algorithm by setting x_k = x̂, g_k = ĝ, and f_k = z_k at the end of Step 3(b), and then at the next visit to Step 1, we adopt d_k = −g_k. (For convergence purposes, no restarts are performed after some finite number of iterations.) Also, if w_{ℓ+1} − w_ℓ ≤ 0.1 max{1, |z_{k+1}|}, we replace β_k by β_{k+1} = max{β_k/2, τ}.
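In code terms, these Remark 2 adjustments at the end of Step 3(b) could look like the following sketch (variable names are ours):

```python
def remark2_adjust(x_hat, g_hat, z, w_new, w_old, beta, tau=1e-6):
    """Remark 2 options at the end of Step 3(b): restart from the incumbent
    and halve beta when the target increase is small.  Returns the restarted
    iterate data (x_k, g_k, f_k, d_k) and the updated beta; names are ours."""
    x_k, g_k, f_k = x_hat, g_hat, z          # restart from the incumbent solution
    d_k = -g_k                               # next visit to Step 1 uses a pure subgradient step
    if w_new - w_old <= 0.1 * max(1.0, abs(z)):
        beta = max(beta / 2.0, tau)          # damp the step-length parameter
    return x_k, g_k, f_k, d_k, beta
```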
3. Convergence analysis

In this section, we establish the convergence of Algorithm VTVM under k_max = ∞ and ε₀ = 0.

Lemma 1. Consider Algorithm VTVM, and for convenience, let us denote the target value at any (inner) iteration k by ŵ_k. Suppose that there exist values w̲ and w̄ such that

f* ≤ w̲ < ŵ_k < w̄ ≤ f_k for all k.  (7)

Then we must have both {ŵ_k} → w̄ and {f_k} → w̄.

Proof. Let us first show that

d^t_{k−1}(x* − x_k) ≥ 0 for all k.  (8)

Note that this is trivially true for k = 1, since d_0 = 0. By induction, consider any k ≥ 2, and assume that d^t_{k−2}(x* − x_{k−1}) ≥ 0. Using the definition of d_{k−1}, the induction hypothesis, the convexity of f, Eq. (7), the fact that β_{k−1} ∈ [τ, 1], and the Cauchy–Schwarz inequality, we obtain the following string of relations:

d^t_{k−1}(x* − x_k) = d^t_{k−1}(x* − x_{k−1}) + d^t_{k−1}(x_{k−1} − x_k)
  = −g^t_{k−1}(x* − x_{k−1}) + α_{k−1} d^t_{k−2}(x* − x_{k−1}) + d^t_{k−1}(x_{k−1} − x_k)
  ≥ (f_{k−1} − f*) + d^t_{k−1}(x_{k−1} − x_k)
  ≥ β_{k−1}(f_{k−1} − w̲) − ‖d_{k−1}‖ ‖x_k − x_{k−1}‖
  = β_{k−1}(f_{k−1} − w̲) − ‖d_{k−1}‖ ‖P_X(x_{k−1} + λ_{k−1} d_{k−1}) − x_{k−1}‖.

Now, since x_{k−1} ∈ X, we have, using (4), (7) and the nonexpansiveness of P_X, that

d^t_{k−1}(x* − x_k) ≥ β_{k−1}(f_{k−1} − w̲) − ‖d_{k−1}‖ ‖x_{k−1} + λ_{k−1} d_{k−1} − x_{k−1}‖
  = β_{k−1}(f_{k−1} − w̲) − λ_{k−1}‖d_{k−1}‖²
  = β_{k−1}(f_{k−1} − w̲) − β_{k−1}(f_{k−1} − ŵ_{k−1})
  = β_{k−1}(ŵ_{k−1} − w̲) > 0.

Hence, assertion (8) holds true. Using the definition of d_k, we get

‖x* − x_{k+1}‖² = ‖x* − P_X(x_k + λ_k d_k)‖²
  ≤ ‖x* − x_k − λ_k d_k‖²
  = ‖x* − x_k‖² + λ_k²‖d_k‖² − 2λ_k d^t_k(x* − x_k)
  = ‖x* − x_k‖² + λ_k²‖d_k‖² + 2λ_k g^t_k(x* − x_k) − 2λ_k α_k d^t_{k−1}(x* − x_k)
  ≤ ‖x* − x_k‖² + λ_k²‖d_k‖² + 2λ_k(f* − f_k).  (9)

The last inequality holds from (8) and the definition of a subgradient. Now, from (4) and (7), we get λ_k(f* − f_k) ≤ λ_k(ŵ_k − f_k) = −β_k(f_k − ŵ_k)²/‖d_k‖². Hence, using this in (9) along with (4) and the assumption on β_k, we have

‖x* − x_{k+1}‖² ≤ ‖x* − x_k‖² + β_k²(f_k − ŵ_k)²/‖d_k‖² − 2β_k(f_k − ŵ_k)²/‖d_k‖²
  = ‖x* − x_k‖² + β_k(β_k − 2)(f_k − ŵ_k)²/‖d_k‖²
  < ‖x* − x_k‖².

Hence, {‖x* − x_k‖²} is a bounded monotone decreasing sequence and is therefore convergent. Consequently, we have lim_{k→∞} (f_k − ŵ_k)²/‖d_k‖² = 0. From (2) and (7), this can happen only if both {f_k} → w̄ and {ŵ_k} → w̄, and this completes the proof.

It is interesting to note that this analysis requires β_k ≤ 1, while the ordinary pure subgradient approach with the step-length (4) permits 1 < β_k ≤ β̂ < 2 as well. Now, observe that {z_k} is a monotone nonincreasing sequence that is bounded below by f*, and is hence a convergent sequence. Let z̄ be the limit of this sequence, and consider the following results.
Lemma 2. For Algorithm VTVM, there exists an outer iteration L such that ε_ℓ ≡ ε for all ℓ ≥ L.

Proof. Consider Algorithm VTVM for iterations k ≥ K such that z_k < z̄ + δ for all k ≥ K, where 0 < δ < ε. Let us examine any outer loop ℓ for which k ≥ K. Given w_ℓ and ε_ℓ, note that w_{ℓ+1} and ε_{ℓ+1} for the subsequent outer loop iteration are given by either (5) or (6). In the case of (5), having computed w_{ℓ+1} = z_{k+1} − ε_ℓ − ηΔ_{k+1}, and noting that ηΔ_{k+1} ≤ Δ_{k+1} < δ since all objective values are confined in [z̄, z̄ + δ), we get

γ(z_{k+1} − w_{ℓ+1}) = γ(ε_ℓ + ηΔ_{k+1}) < γ(ε_ℓ + δ) < γ(ε_ℓ + ε) ≤ (2/3)ε_ℓ.  (10)

Similarly, in the case of (6), having computed w_{ℓ+1}, we have that

γ(z_{k+1} − w_{ℓ+1}) = γ[z_{k+1} − ((z_{k+1} − ε_ℓ) + w_ℓ)/2] = γε_ℓ/2 + γ(z_{k+1} − w_ℓ)/2.  (11)

Case (i): Suppose that ε_ℓ = ε. Then from (5) or (6) at the previous outer loop for some iteration k′ < k, we must have had γ(z_{k′+1} − w_ℓ) ≤ ε, which implies that

γ(z_{k+1} − w_ℓ) ≤ γ(z_{k′+1} − w_ℓ) ≤ ε.  (12)

Now, if ε_{ℓ+1} is given by (5), we have from (10) that ε_{ℓ+1} = ε. On the other hand, if ε_{ℓ+1} is given by (6), we have from (11) and (12) that γ(z_{k+1} − w_{ℓ+1}) ≤ ε/6 + ε/2 = 2ε/3, and so ε_{ℓ+1} = ε as well. Hence, we have shown that ε_ℓ = ε implies that ε_{ℓ+1} = ε also, once k ≥ K for such outer loops.

Case (ii): Suppose that ε_ℓ > ε. Hence, we have from (5) or (6) at the previous outer loop that for some k′ < k,

ε_ℓ = γ(z_{k′+1} − w_ℓ) ≥ γ(z_{k+1} − w_ℓ).  (13)

Now, if ε_{ℓ+1} is given by (5), then we either have ε_{ℓ+1} = ε, or else from (10), we get ε_{ℓ+1} ≤ 2ε_ℓ/3. Similarly, if ε_{ℓ+1} is given by (6), then we either have ε_{ℓ+1} = ε, or else from (11) and (13), we get ε_{ℓ+1} = γ(z_{k+1} − w_{ℓ+1}) ≤ ε_ℓ/6 + ε_ℓ/2 = 2ε_ℓ/3. In either case, {ε_ℓ} decreases at least at a geometric rate, ultimately becoming equal to ε at some finite outer loop iteration L, after which it remains at ε from Case (i) above. This completes the proof.
Lemma 3. Suppose that for Algorithm VTVM, we have z̄ − ε − f* > δ > 0 for some δ satisfying 0 < δ < ε. Then there exists an outer iteration ℓ̃ such that

z̄ − ε − δ < w_ℓ < z̄ − ε + δ for all ℓ ≥ ℓ̃.  (14)

Proof. Following the proof of Lemma 2, let us consider the algorithmic process once we have k ≥ K that is sufficiently large so that z_k < z̄ + δ for all k ≥ K, and that the outer loop index ℓ ≥ L, where L is large enough so that ε_ℓ ≡ ε for all ℓ ≥ L.
Suppose that w_ℓ ≤ z̄ − ε − δ for some outer loop ℓ. Then z_{k+1} > z_{k+1} − δ ≥ z̄ − δ ≥ w_ℓ + ε for all corresponding inner loop iterations k. Hence, the algorithm continues to increase the inner loop counter t, and ultimately increases the target value at Step 3(b). By successively increasing the target value according to (6) in this fashion, we will reach an outer iteration ℓ̃ such that w_ℓ̃ > z̄ − ε − δ. Moreover, since any such increase via (6) while w_ℓ ≤ z̄ − ε − δ and z_{k+1} < z̄ + δ yields

w_{ℓ+1} < [(z̄ + δ) − ε + (z̄ − ε − δ)]/2 = z̄ − ε < z̄ − ε + δ,

we also have w_ℓ̃ < z̄ − ε + δ.
On the other hand, suppose that w_ℓ ≥ z̄ − ε + δ for some outer loop ℓ ≥ L. Note that when (k + 1) ≥ K, we have z_{k+1} < z̄ + δ ≤ w_ℓ + ε ≡ w_ℓ + ε_ℓ. But the condition z_{k+1} < w_ℓ + ε_ℓ occurs in the algorithm only when an improvement in objective value at Step 2(a) causes z_{k+1} to fall below w_ℓ + ε_ℓ. Therefore in this case, we must have had ε_ℓ = ε and w_ℓ ≥ z̄ − ε + δ when, for the first time after k + 1 ≥ K, an improvement caused z_{k+1} to fall below z̄ + δ, and hence below w_ℓ + ε, thereby triggering a transfer to Step 3(a). Using (5) with ε_ℓ = ε, the consequent decrease in the target value yields w_{ℓ+1} = z_{k+1} − ε − ηΔ_{k+1} < (z̄ + δ) − ε − ηΔ_{k+1} < z̄ − ε + δ. Incrementing ℓ by 1, either this revised w_ℓ then satisfies the lower bound in (14) as well, or else we have w_ℓ ≤ z̄ − ε − δ. In the latter case, as above, we will again obtain (14) holding for some ℓ̃ after successive increases in the target value. Therefore, we have shown thus far that for some inner iteration k̃ ≥ K during an outer iteration ℓ̃, we have

z̄ − ε − δ < w_ℓ̃ < z̄ − ε + δ.  (15)

Now, it remains to show that (15) continues to hold for all ℓ ≥ ℓ̃. Let j(1) be the first iteration after k̃ at which the target value is changed at the next outer loop, so that either the target value is decreased via (5) or it is increased via (6) to yield, respectively,

w_{ℓ̃+1} = (z_{j(1)} − ε) − ηΔ_{j(1)}  or  w_{ℓ̃+1} = [(z_{j(1)} − ε) + w_ℓ̃]/2.  (16)

For the first case in (16), we have ηΔ_{j(1)} ≤ Δ_{j(1)} < δ because for k ≥ K the objective value improves by less than δ. Hence, z̄ − ε + δ > w_ℓ̃ ≥ w_{ℓ̃+1} = (z_{j(1)} − ε) − ηΔ_{j(1)} > z̄ − ε − δ. The second case in (16) yields, using (15), that

z̄ − ε − δ < w_ℓ̃ < w_{ℓ̃+1} = [(z_{j(1)} − ε) + w_ℓ̃]/2 < [(z̄ + δ − ε) + (z̄ − ε + δ)]/2 = z̄ − ε + δ.

Hence, in either case, w_{ℓ̃+1} continues to satisfy (15). By induction, this completes the proof.
Theorem 1. Algorithm VTVM generates a sequence {z_k} → z̄, where z̄ − ε ≤ f*.

Proof. Consider the algorithm after the final restart has been performed. Assume on the contrary that z̄ − ε > f*. We can therefore choose a δ satisfying ε > δ > 0 such that z̄ − ε − f* > δ > 0. By Lemma 3, we can find an outer iteration ℓ̃ such that w̲ ≡ z̄ − ε − δ < w_ℓ < z̄ − ε + δ ≡ w̄ for all ℓ ≥ ℓ̃. Since f* ≤ w̲ < w_ℓ < w̄ < z̄ ≤ f_k for all k and ℓ sufficiently large, we get by Lemma 1 that {f_k} → w̄. But this is a contradiction because w̄ < z̄ since ε > δ, and so the proof is complete.
Remark 3. As evident from the proofs of Lemma 3 and Theorem 1, Algorithm VTVM can be operated with ε_ℓ held fixed at ε for all ℓ, and we would still have {z_k} → z̄ where z̄ − ε ≤ f*. However, from the viewpoint of computational efficiency (as verified by our experiments), it is important to permit variable acceptance tolerances ε_ℓ, as prescribed by the stated procedure. This is so because a small, fixed acceptance tolerance of ε_ℓ = ε can possibly lead to a sequence of increases in the target value until the gap (f_k − w_ℓ) becomes relatively small, thereby resulting in small step-lengths via (4), and inducing slow progress at iterates that are as yet remote from optimality.
4. Computational experience

We now present some preliminary computational results using 15 convex test problems from the literature. Table 1 gives the sizes and the sources of these problems, along with the standard, prescribed starting solutions.
Table 2 gives the results obtained. All the algorithms tested were coded in C and run on an IBM RS/6000 computer. Run 1 corresponds to Algorithm VTVM with the ADS deflection strategy (see Section 1) and with a maximum limit of k_max = 2000 iterations. Note that we have used a fixed set of parameter values as prescribed in Remark 1, along with the strategies of Remarks 1 and 2, and with an initial lower bound LB = −∞ for all the runs. Run 2 corresponds to Kim et al.'s [9] algorithm run for 2000 iterations using their recommended parameters, including a strong convexity constant of 1 as used in their runs, and with an initial lower bound of −10⁶ for all the problems. Run 3 is the same as Run 1, but with an additional improvement-based stopping criterion. Note that as stated, the algorithm is terminated whenever the iteration count exceeds k_max or when the norm of the current subgradient becomes sufficiently small. Additionally, we can terminate the algorithm based on its progress in improving the incumbent value. Hence, in Run 3, we terminate the algorithm when each of the following conditions holds: (i) k > 500, (ii) the algorithm has executed Step 3(a) (outer loop success iteration) at least once, and (iii) Step 3(b) is visited four consecutive times via Step 2(b), with the average of the relative improvements Δ_{k+1}/(z_{k+1} − w_ℓ) over these four visits being less than or equal to 0.05.
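In code form, the Run 3 test might be expressed as follows (the bookkeeping of the last four Step 3(b) visits, and the variable names, are our own illustration):

```python
def run3_stop(k, executed_3a, last_ratios, k_min=500, window=4, tol=0.05):
    """Run 3 stopping test: k > k_min, Step 3(a) executed at least once, and
    the last `window` visits to Step 3(b) all arrived via Step 2(b) with an
    average relative improvement Delta_{k+1}/(z_{k+1} - w_l) of at most tol.
    `last_ratios` holds those ratios (it is kept empty until the condition on
    consecutive Step 2(b) entries is met)."""
    return (k > k_min and executed_3a
            and len(last_ratios) >= window
            and sum(last_ratios[-window:]) / window <= tol)
```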
The results indicate that Algorithm VTVM is fairly robust and viable for a variety of problems, yielding near-optimal solutions with reasonable effort.
Table 1
Test problems

Problem   n    Starting solution x1              Source(s)
 1        5    (0, 0, 0, 0, 1)                   Shor's Problem [11]: Test 1
 2       10    (1, ..., 1)                       Lemarechal and Mifflin [16, p. 151] (MAXQUAD)
 3       50    (i − (n + 1)/2, i = 1, ..., n)    Goffin's polyhedral problem [11]: Test 3
 4       48    (0, ..., 0)                       Lemarechal and Mifflin [16, p. 161] (TR48)
 5       48    (0, ..., 0)                       Lemarechal and Mifflin [16, p. 165] (A48)
 6       50    (0, ..., 0)                       Kiwiel [11]: Test 5
 7       30    (0, ..., 0)                       Kiwiel [11]: Test 6
 8        4    (0, 0, 0, 0)                      Streit's Problem no. 1, Kiwiel [11]: Test 8
 9        4    (0, 0, 0, 0)                      Streit's Problem no. 2, Kiwiel [11]: Test 9
10        6    (0, 0, 0, 0, 0, 0)                Streit's Problem no. 3, Kiwiel [11]: Test 10
11        5    (0, 0, 0, 0, 1)                   Hock and Schittkowski [8, p. 105]
12        4    (0, 0, 0, 0)                      Hock and Schittkowski [8, p. 66]
13        4    (15, 22, 26, 11)                  Chatelon et al.'s Minimax Location Problem [11]: Test 13
14        6    (0, 0, 0, 0, 0, 0)                Chatelon et al.'s Minimax Location Problem [11]: Test 14
15       10    (0, ..., 0)                       Kiwiel [11]: Test 7
Table 2
Computational results for runs 1, 2, and 3
Run 1: VTVM+ADS.  Run 2: Kim et al. (1991).  Run 3: Run 1 + improvement-based stopping criterion.

Problem   f(x*)         Run 1 f(xbest)   cpu s   Run 2 f(xbest)   cpu s   Run 3 f(xbest)   Iters
 1          22.60016      22.600162      0.05      22.685788      0.04      22.600162      1995
 2          −0.841408     −0.801792      0.10      −0.786540      0.09      −0.801792      1971
 3           0.0           0.000002      0.06     481.154344      0.05       0.000002      1999
 4     −638565         −626175.71        1.15  −560439.57         1.16  −626175.71         1071
 5       −9870           −9870.00        1.12    −9546.11         1.13    −9870.00         1541
 6           0.0           0.025849      0.67       1.055688      0.67       0.025849      1982
 7           0.0           0.398131      0.18       0.058670      0.16       0.407191      1994
 8           0.707107      0.707107      0.05       0.794324      0.04       0.707107      1992
 9           1.014214      1.014214      0.04       1.014360      0.04       1.014214      1959
10           0.014706      0.128475      0.67       0.119763      0.68       0.128475      1966
11         −32.348679    −32.3254        0.04     −28.144825      0.03     −32.317630      1967
12         −44.0         −43.950579      0.04     −43.971590      0.01     −43.74894       1990
13          23.886767     24.818542      0.42      26.708094      0.39      24.818605       276
14          68.82856      68.831859      0.60      68.836753      0.63      68.832059       796
15          −0.368166     −0.339827      0.13      −0.299131      0.12      −0.339827      1980

Legend: f(x*): optimal objective value. f(xbest): best objective value obtained by the corresponding algorithm. cpu s: total execution time (in seconds) on an IBM RS/6000 computer. Iters: total number of iterations until termination occurred for Run 3.
This might be acceptable in Lagrangian relaxation applications, for example. Moreover, it is simple to implement. In contrast, the bundle method implemented in Kiwiel [11] typically yields more accurate solutions in significantly fewer iterations, although each iteration is more complex in that it requires the solution of a quadratic program.
The results for Run 2 indicate that Kim et al.'s algorithm is quite sensitive to the strong convexity assumption, which does not necessarily hold for these
test problems. The results for Run 3 indicate that the improvement-based stopping criterion prescribed above offers a reasonable alternative to the criterion based simply on the maximum number of iterations. Note that for most test problems, progress that is acceptable to this criterion continues until close to the limit of 2000 iterations.
Finally, we comment that for Run 1, we also attempted the MGT and the pure subgradient strategies (see Section 1). The ADS strategy performed the same or better than the MGT (respectively, the pure subgradient) strategy on 12 (respectively, 13) out of the 15 test problems. Also, Algorithm VTVM performed the same or better on 13 out of the 15 test problems than its variant in which ε_ℓ is held fixed at the value ε (see Remark 3). In addition, we attempted to solve some larger sized, randomly generated, dual assignment test problems. For example, using test cases of sizes 200 × 200, 250 × 250, and 300 × 300 having optimal values −2157, −2697, and −3246, respectively, Algorithm VTVM terminated within 500 iterations in each case, finding solutions of objective values −2156.46, −2696.78, and −3245.88, consuming a total of 26.8, 42.4 and 61.3 cpu s, respectively.
Acknowledgements

This material is based upon work supported by the National Science Foundation under Grants No. DMI-9521398 and DMI-9812047, and by the Air Force Office of Scientific Research under Grant No. F-49620-96-1-0274.
References
[1] M.S. Bazaraa, H.D. Sherali, On the choice of step sizes
in subgradient optimization, Eur. J. Oper. Res. 7 (1981)
380–388.
[2] U.G. Brannlund, On relaxation methods for nonsmooth
convex optimization, Ph.D. Thesis, Kungliga Tekniska
Hogskolan, S-100 44 Stockholm, Sweden, 1993.
[3] P.M. Camerini, L. Fratta, F. Maffioli, On improving relaxation methods by modified gradient techniques, Math. Programm. Study 3 (1975) 26–34.
[4] J.L. Goffin, On the convergence rate of subgradient methods, Math. Programm. 13 (1977) 329–347.
[5] J.L. Goffin, K.C. Kiwiel, Convergence of a simple subgradient level method, GERAD Report G-96-56, 1998, Math. Programm., to appear.
[6] M. Held, R.M. Karp, The traveling salesman problem and
minimum spanning trees: part II, Math. Programm. 1 (1971)
6–25.
[7] M. Held, P. Wolfe, H.D. Crowder, Validation of subgradient
optimization, Math. Programm. 6 (1974) 62–88.
[8] W. Hock, K. Schittkowski, Test examples for nonlinear
programming codes, Lecture Notes in Economics and
Mathematical Systems, Vol. 187, Springer, Berlin, 1981.
[9] S. Kim, H. Ahn, S. Cho, Variable target value subgradient
method, Math. Programm. 49 (1991) 359–369.
[10] K.C. Kiwiel, Methods of Descent for Nondifferentiable Optimization, Lecture Notes in Mathematics, Vol. 1133, Springer, Berlin, 1985.
[11] K.C. Kiwiel, Proximity control in bundle methods for convex nondifferentiable minimization, Math. Programm. 46 (1990) 105–122.
[12] K.C. Kiwiel, The efficiency of subgradient projection methods for convex optimization, part I: general level methods; part II: implementations and extensions, SIAM J. Control Optim. 34 (2) (1996) 660–697.
[13] K.C. Kiwiel, Subgradient method with entropic projections for convex nondifferentiable minimization, J. Optim. Theory Appl. 96 (1) (1998) 159–173.
[14] A.N. Kulikov, V.R. Fazylov, Convex optimization with prescribed accuracy, USSR Comput. Maths. Math. Phys. 30 (3) (1990) 16–22. (Zh. Vychisl. Mat. i Fiz. 30 (5) (1990) 663–671.)
[15] C. Lemarechal, Numerical experiments in nonsmooth optimization, in: E.A. Nurminski (Ed.), Progress in Nondifferentiable Optimization, Pergamon Press, Oxford, 1982, pp. 61–84.
[16] C. Lemarechal, R. Mifflin, Nonsmooth optimization, Proceedings of an IIASA Workshop, March 28–April 8, 1977, Pergamon Press, Oxford, 1978.
[17] B.T. Polyak, A general method for solving extremal problems,
Sov. Math. 8 (1967) 593–597.
[18] H.D. Sherali, O. Ulular, A primal-dual conjugate subgradient
algorithm for specially structured linear and convex
programming problems, Appl. Math. Optim. 20 (1989) 193–
221.
[19] H.D. Sherali, G. Choi, Z. Ansari, Memoryless and limited
memory space dilation and reduction algorithms, Department
of Industrial and Systems Engineering, Virginia Polytechnic
Institute and State University, Blacksburg, 1998.