Advances in Water Resources 23 (2000) 867–880
www.elsevier.com/locate/advwatres
A least-squares penalty method algorithm for inverse problems of
steady-state aquifer models
Hailong Li *, Qichang Yang
Institute of Biomathematics, Anshan Normal College, Anshan 114005, People's Republic of China
Abstract
Based on the generalized Gauss–Newton method, a new algorithm is proposed to minimize the objective function of the penalty method of Bentley (Adv Water Resour 1993;14:137–48) for inverse problems of steady-state aquifer models. Through a detailed analysis of the "built-in" but irregular weighting effects of the coefficient matrix on the residuals of the discrete governing equations, a so-called scaling matrix is introduced to balance these irregular weighting effects adaptively in every Gauss–Newton iteration. Numerical results demonstrate that if the scaling matrix equals the identity matrix (i.e., the irregular weighting effects of the coefficient matrix are not balanced), the algorithm does not perform well: the computational cost is higher than that of the traditional method and, worse, the calculations fail to converge for some initial values of the unknown parameters. This poor situation changes dramatically for the better if the scaling matrix is slightly improved and a simple preconditioning technique is adopted. For naturally chosen simple diagonal forms of the scaling matrix and the preconditioner, the method performs well and gives accurate results at low computational cost, just like the traditional methods, with improvements in: (1) widening the range of initial values of the unknown parameters within which the minimizing iterations converge; (2) reducing the computational cost of every Gauss–Newton iteration; (3) balancing the irregular weighting effects of the coefficient matrix of the discrete governing equations. Consequently, the example inverse problem in Bentley (loc. cit.) is solved with the same accuracy, less computational effort and without the regularization term containing prior information on the unknown parameters. Moreover, a numerical example shows that this method can solve the inverse problem of the quasilinear Boussinesq equation almost as fast as the linear one.
In every Gauss–Newton iteration of the proposed algorithm, one needs to solve a linear least-squares system for the corrections of both the parameters and the groundwater heads on all the discrete nodes only once. In comparison, every Gauss–Newton iteration of the traditional method has to solve the discrete governing equations as many times as one plus the number of unknown parameters or head observation wells (Yeh WW-G. Water Resour Res 1986;22:95–108).
All these facts demonstrate the potential of the algorithm to solve inverse problems of more complicated non-linear aquifer models naturally and quickly, on the basis of finding suitable forms of the scaling matrix and the preconditioner. © 2000 Elsevier Science Ltd. All rights reserved.
Keywords: Penalty method; Inverse problem; Scaling matrix; Preconditioner; Aquifer models; LSQR algorithm; Parameter identification; Gauss–Newton method; Finite element method
1. Introduction
The traditional least-squares methods for distributed parameter identification of groundwater models minimize a quadratic objective function with respect to the parameters subject to the discrete governing equations of an elliptic or parabolic aquifer model (see e.g., [4–6,8,9,14–16,21]).
* Corresponding author. Present address: Department of Earth Sciences, The University of Hong Kong, Hong Kong SAR, People's Republic of China. Tel.: +852-2857-8278; fax: +852-2517-6912. E-mail addresses: [email protected], [email protected] (H. Li).
This means that the discrete equations
of the aquifer models are regarded as an exact description of the real aquifer. In practice, however, there exist not only measurement errors in the heads but also errors in the conceptual models, in the discretization of the models, in the determination of boundary conditions and source terms, etc. Hence the concept of measurement error must be generalized to include the other errors mentioned above. Such a generalization of the measurement-error concept is particularly important in least-squares and maximum-likelihood methods for groundwater inverse problems. For this reason, exact enforcement of the discrete governing equations may
0309-1708/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0309-1708(00)00018-X
Notation
A: n × n coefficient matrix of the discrete aquifer model (see (9), (10a)–(10c)).
C: (n + np) × (n + np) preconditioning matrix.
Dc2: a diagonal form of C defined by (37) and (38).
H: the piezometric head or watertable vector (see (5)).
h(x, y): the piezometric head or watertable, m.
J: (n + nm + np) × (n + np) Jacobian matrix defined by (32) and (39).
Jg: weighted square sum of the discrete governing equation residuals (see (15) and (20)).
Jgmp: the objective function of the penalty method defined in (19).
Jm: weighted square sum of the head observation residuals (see (13) and (17)).
Jp: weighted square sum of the parameter prior information residuals (see (14) and (18)).
Jt: traditional objective function defined in (16).
n: the number of discrete nodes.
nm: the number of observation wells.
np: the number of unknown parameters (see (4a), (4b) and (11)).
𝒫: n × np Jacobian matrix defined in (39), (41) and (42).
P: np-dimensional unknown parameter (log-transmissivity) vector defined by (11).
Rg: n-dimensional discrete governing equation residual vector defined by (15).
not necessarily be appropriate, and may actually degrade the head and parameter estimates [2,15]. This is the first motivation of this paper.
Bentley [2] formed a penalty-method least-squares objective function by adding a general positive definite quadratic form of the discrete governing equation residuals to the traditional objective function, and minimized the objective function without any constraints using a "damped direction-search and linesearch algorithm". He then suggested at the end of his paper (p. 147) that
Convergence, as well as selecting an optimum solution technique, remain topics for future investigation.
Rgm: an nm-dimensional vector composed of the components of Rg whose indices equal those of the nodes coincident with the head observation wells.
Rgr: an (n − nm)-dimensional vector composed of the remaining components of Rg, i.e., those not in Rgm.
Rm: nm-dimensional head observation residual vector defined by (13).
T(x, y) and Ti: unknown transmissivity, m²/day (see (1), (3), (4a), (4b) and (11)).
U: n × n scaling matrix (see (15)).
UE: a diagonal form of U defined by (25).
U∞: a diagonal form of U defined by (24).
W: block diagonal matrix Diag(λg² Wg, Wm, λp Wp).
Wg: n × n weighting matrix of the residual vector Rg.
Wm: nm × nm weighting matrix of the residual vector Rm.
Wp: np × np weighting matrix of the parameter prior information residual vector Rp.
Yi (1 ≤ i ≤ np): log-transmissivity (Yi = log Ti).
Γd and Γn: the Dirichlet and Neumann boundaries.
ΔP: the difference vector between the computed and true values of the parameter vector P.
λg²: weighting factor of the penalty term Jg (see (19) and (20)).
λp: weighting factor of the regularization term Jp (see (16) and (18)).
v: (n + np)-dimensional vector defined by (H^T, P^T)^T.
Hence, it is necessary to find an effective algorithm to minimize the penalty-method objective function. This is the second motivation of this paper.
The algorithm proposed in this paper can be briefly introduced as follows.
First, in a departure from [2], the residuals of the discrete governing equations are scaled by multiplying them by a "scaling matrix" determined adaptively from the current coefficient matrix of the discrete governing equations. With the scaled residuals, a penalty-method objective function similar to that in [2] is formed. Detailed numerical analysis shows that the discrete governing equation residuals are not distributed uniformly over the discrete nodes, owing to the wide value range of the discrete governing equation coefficients and their "built-in" but
irregular weighting effects on these residuals. This irregularity of the residuals can be efficiently reduced by using the above-mentioned scaling matrix.
Second, based on the generalized Gauss–Newton method [11,12,21], every iteration minimizing the penalty-method objective function is formulated as a linear least-squares system whose unknowns are the corrections of both the unknown parameters and the groundwater heads on all the discrete nodes. The number of linear equations in this system must be greater than or equal to the number of unknowns. Hence the number of observation wells must be greater than or equal to the number of unknown parameters when the objective function contains no regularization term with prior information on the unknown parameters.
Finally, a suitable algorithm must be chosen to solve this linear least-squares system, which is usually ill-conditioned. There are several sources of this ill-conditioning: the ill-posedness of the inverse aquifer problems themselves; the wide value range of the discrete governing equation coefficients, caused mainly by the non-uniformity of the triangulation and the wide range of the transmissivity values; and the deterioration of the condition number of the least-squares systems due to the large weighting factor λg of the penalty term used to enforce the discrete governing equations. Moreover, this linear least-squares system is a large sparse system incorporating the discrete governing equations. To overcome these difficulties, an iterative least-squares QR factorization (LSQR) algorithm for sparse linear least-squares problems [17,18] is chosen to solve it. At the same time, a simple but effective preconditioner is implemented to reduce the ill-conditioning.
Numerical results demonstrate that if either the discrete governing equation residuals are not scaled or no preconditioning technique is adopted, then the penalty-method algorithm performs poorly because of the ill-conditioning: the computational cost is higher than that of the traditional method and, worse, the calculations fail to converge for some initial values of the unknown parameters. The situation improves significantly, however, with some naturally chosen diagonal forms of the scaling matrix and the preconditioner.
Owing to the sparse iterative data structure of the LSQR algorithm, the discrete governing equations of our algorithm can be as large as those of the traditional methods.
In every Gauss–Newton iteration of our algorithm, one needs to solve the linear least-squares system only once. In comparison, every Gauss–Newton iteration of the traditional methods has to solve the discrete governing equations as many times as one plus the number of unknown parameters if one uses the parameter-perturbation and numerical difference quotient method, or as many times as one plus the number of head observation wells if one uses the adjoint state sensitivity equation method (variational method) [21]. The total number of iterations required to minimize the penalty-method objective function is less than or equal to the number required by the two traditional methods tested.
All these facts demonstrate the potential of this method to solve inverse problems of more complicated non-linear aquifer models naturally and quickly, on the basis of finding suitable forms of the scaling matrix and the preconditioner.
The paper is organized as follows. In the next section, the penalty-method objective function and its minimizing algorithm based on the Gauss–Newton method are presented in detail, both for a linear aquifer model and for a non-linear model described by the elliptic Boussinesq equation. In Section 3, two numerical examples are given. One of the examples is taken from [2,6]; the other is the elliptic Boussinesq equation describing an unconfined aquifer [1,19]. Through these numerical examples, the effects and utility of both the scaling matrix and the preconditioner are illustrated. Finally, conclusions are drawn on the basis of the preceding sections, and remaining open problems are stated.
2. Algorithm
2.1. Objective function of penalty method
The inverse problems of typical groundwater flow equations in confined and unconfined aquifers will be used as examples to illustrate the algorithm of the least-squares penalty method. Consider two-dimensional steady flow described by

−∇ · (sT∇h) − Q + B(h − hr) = 0,  (x, y) ∈ Ω,  (1)

in inhomogeneous, isotropic aquifers, subject to the boundary conditions

h = hd(x, y),  (x, y) ∈ Γd,  (2)

sT ∂h/∂n = q(x, y),  (x, y) ∈ Γn,  (3)
where
Ω: groundwater flow region, a given bounded horizontal domain in the (x, y) plane.
Γd and Γn: the Dirichlet and Neumann boundaries of Ω, respectively; Γd ∪ Γn = ∂Ω.
(x, y) and ∇: spatial variables and the gradient operator.
h(x, y): groundwater head of a confined aquifer or watertable of an unconfined aquifer. Unknown; only finitely many measured values at given observation wells are available.
s(x, y, h): specified aquifer-choice function. For a confined aquifer s ≡ 1, in which case (1)–(3) is a linear problem. For an unconfined aquifer s = (h − hb(x, y))/M [1,19], where hb denotes the given elevation of the aquifer bottom (with the same datum plane as h) and M is a positive constant describing the average thickness of the unconfined aquifer. In this case (1) is the elliptic Boussinesq equation and (1)–(3) becomes a non-linear problem.
T(x, y): transmissivity of the aquifer. Unknown. In this paper it is assumed that T is piecewise constant in Ω, i.e., there are np given subdomains (zones) Ωi of Ω (1 ≤ i ≤ np) such that

Ω = ∪_{i=1}^{np} Ω̄i,  Ωi ∩ Ωj = ∅,  i ≠ j (1 ≤ i, j ≤ np),  (4a)

T(x, y) = Ti,  (x, y) ∈ Ωi, 1 ≤ i ≤ np,  (4b)

where Ti (1 ≤ i ≤ np) are unknown positive constants.
Q(x, y): recharge term determined by factors such as sources, sinks, precipitation and evapotranspiration.
B(x, y) and hr: specified leakage parameters. We assume B ≥ 0 for the sake of the well-posedness of (1)–(3).
∂/∂n: normal derivative with respect to Γn.
hd(x, y): specified head or watertable on Γd.
q(x, y): specified flux on Γn.
The discrete governing equations of (1)–(3) are the starting point of our method. One can discretize the governing equations (1)–(3) by various methods, such as the finite-difference, finite-volume or finite-element method, according to one's preference. In this paper we use the linear triangle finite element method to discretize (1)–(3). Let {(xi, yi), 1 ≤ i ≤ n} be the positions of the nodes of the triangulation, and φi(x, y) (1 ≤ i ≤ n) be the linear basis function corresponding to the ith discrete node (xi, yi). Let

H := (h1, . . . , hn)^T  (5)

be the head or watertable vector, where hi denotes the finite element approximation to h(xi, yi), and the superscript "T" indicates the transpose of a vector or matrix. Then h̄(x, y) and s̄(x, y, h̄), the respective finite element approximations of h(x, y) and s(x, y, h), can be written as

h̄ = Σ_{i=1}^{n} hi φi(x, y),  (6)

s̄ = Σ_{i=1}^{n} s(xi, yi, hi) φi(x, y).  (7)
The discretized log-transmissivities Yi := log Ti (1 ≤ i ≤ np), the recharge term Q, and the leakage parameters B and hr are constants on each triangle element e, i.e.,

Yi|e = Y^e,  Q|e = Q^e,  B|e = B^e,  hr|e = hr^e.  (8)

By applying the standard Galerkin finite element procedure, the discrete governing equations of (1)–(3) can be written in matrix form as follows (see e.g., [19]):

A(H, P)H = F,  (9)

where A is an n × n matrix with elements

aij = sij + bij,  1 ≤ i, j ≤ n,  (10a)

in which

sij = { wd δij,  i ∈ D;
        Σ_{e ∈ Ti ∩ Tj} ∫_e 10^{Y^e} s̄ ∇φi · ∇φj dx dy,  i ∉ D,  (10b)

and

bij = { 0,  i ∈ D;
        Σ_{e ∈ Ti ∩ Tj} B^e ∫_e φi φj dx dy,  i ∉ D,  (10c)

where δij equals 1 when i = j and 0 when i ≠ j; wd is a positive weighting constant chosen large enough, compared with all the other diagonal entries of A, for the discrete Dirichlet boundary conditions to be enforced at least as accurately as the other discrete governing equations; e is a triangle element; Ti is the set of all the triangle elements in the neighborhood of the ith node (xi, yi) (1 ≤ i ≤ n), i.e., Ti = {e | (xi, yi) is a vertex of e}; and D := {i : (xi, yi) ∈ Γd} is the index set of all the discrete nodes on Γd. As usual, P is the "log-transmissivity" vector, i.e.,

P := (Y1, . . . , Ynp)^T = (log T1, . . . , log Tnp)^T.  (11)

F is an n-dimensional vector determined by forcing terms such as Q, B, hr and the boundary conditions of (1)–(3), whose components can be written as

fi = { wd hd(xi, yi),  i ∈ D;
       Σ_{e ∈ Ti} (Q^e + B^e hr^e) ∫_e φi dx dy + ∫_{Γn} q φi dΓn,  i ∉ D.  (12)
If the aquifer is confined (s ≡ 1), then either the conjugate gradient method or the Cholesky decomposition method is used to solve the discrete aquifer model (9); otherwise (s = (h − hb)/M), (9) is a non-linear problem and is solved iteratively.
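One common way to carry out this iterative solution is a fixed-point (Picard) iteration: freeze s at the current head, solve the resulting linear system, and repeat until the heads stabilize. The sketch below illustrates the idea only; `assemble_A` is a hypothetical 1-D tridiagonal stand-in for the finite-element assembly (10a)–(10c), not the authors' code, and the paper does not prescribe this particular scheme.

```python
import numpy as np

def assemble_A(H, hb, M):
    """Toy stand-in for the FEM assembly (10a)-(10c): a 1-D tridiagonal
    matrix whose entries scale with s_i = (h_i - hb_i)/M, the
    unconfined-aquifer factor."""
    s = (H - hb) / M
    n = len(H)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2.0 * s[i] + 1.0      # +1 keeps A well conditioned
        if i > 0:
            A[i, i - 1] = -s[i]
        if i < n - 1:
            A[i, i + 1] = -s[i]
    return A

def solve_quasilinear(F, hb, M, tol=1e-10, max_iter=200):
    """Picard iteration for A(H) H = F: freeze s at the current head,
    solve the linearized system, repeat until H stops changing."""
    H = np.full_like(F, hb.mean() + M)  # initial guess above the bottom
    for _ in range(max_iter):
        H_new = np.linalg.solve(assemble_A(H, hb, M), F)
        if np.max(np.abs(H_new - H)) < tol:
            return H_new
        H = H_new
    return H
```

At the fixed point, H satisfies the quasilinear system A(H)H = F to the chosen tolerance.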
As in [2], the observation head residual vector Rm and the parameter prior information residual vector Rp can be defined by

Rm(H) = Hs − Hm,  (13)

Rp(P) = P − P*,  (14)

where Hs and Hm are nm-dimensional vectors whose respective components denote the simulated and measured head values at the observation wells; nm is the number of observation wells; and P* denotes an a priori estimate of P.
In a departure from [2], we define a generalized residual vector of the discrete governing equations as

Rg(H, P) = U(A(H, P)H − F),  (15)

where U is a given non-singular n × n "scaling" matrix used to scale the elements of Rg; its choice is discussed below. The traditional least-squares method for aquifer parameter identification can then be written as [4–6,8]

min_P Jt(H, P) := Jm + Jp
subject to U^{−1} Rg(H, P) = 0,  (16)

where

Jm = Rm^T Wm Rm,  (17)

Jp = λp Rp^T Wp Rp,  (18)
and Wm is a given nm × nm symmetric positive definite matrix determined by the measurement errors of the head; Wp is a given np × np symmetric positive semi-definite matrix determined by the errors of the prior information on P; and λp is a non-negative weighting factor that measures the exactness and reliability of the prior information on the unknown parameters relative to the information on the models and head measurements.
The least-squares penalty method can be written as

min_{H,P} Jgmp(H, P) := Jm + Jp + Jg = Jt + Jg,  (19)

where

Jg = λg² Rg^T Wg Rg  (20)

is the penalty term enforcing the discrete governing equations, Wg is a symmetric positive definite weighting matrix, and λg is a weighting factor controlling the enforcement of the discrete governing equations. In this paper we set Wg = I_n, the identity matrix of order n.
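Evaluating (17)–(20) amounts to a few weighted inner products. The sketch below shows this with a hypothetical helper name and illustrative arrays; it is not the paper's code.

```python
import numpy as np

def penalty_objective(Rg, Rm, Rp, Wg, Wm, Wp, lam_g, lam_p):
    """Evaluate J_gmp = J_m + J_p + J_g of (17)-(20):
    J_m = Rm^T Wm Rm, J_p = lam_p Rp^T Wp Rp, J_g = lam_g^2 Rg^T Wg Rg."""
    Jm = Rm @ Wm @ Rm
    Jp = lam_p * (Rp @ Wp @ Rp)
    Jg = lam_g**2 * (Rg @ Wg @ Rg)
    return Jm + Jp + Jg
```

With λp = 0, as in the numerical experiments of Section 3, the prior-information term drops out and only Jm + Jg remains.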
The least-squares penalty-method objective function in [2] is the special case of (19) with U = I_n. Such a choice of U causes non-uniformity of the components of Rg, because the matrix A in (15) has irregular weighting effects due to the great range of values among its elements.
To analyse the weighting effects of A when U and Wg equal I_n, let 0 < σ1 ≤ σ2 ≤ · · · ≤ σn be the eigenvalues of the matrix A^T A and Vi (1 ≤ i ≤ n) the corresponding orthonormal eigenvectors. Then

V := (V1, . . . , Vn)  (21)

is an orthogonal matrix. Define Rh := H − A^{−1}F as the residual vector between the heads of the least-squares method and those of the direct method. Then from (15), (20) and (21) we have

Jg|_{U=I_n} = λg² (V^T Rh)^T Diag(σ1, . . . , σn) (V^T Rh).  (22)

Eq. (22) implies that the weighting factor of the ith component of V^T Rh is λg² σi, 1 ≤ i ≤ n. Owing to the non-uniformity of the spatial discretization and the large range of the transmissivity values T, the range of values among the elements of A is great. This results in a large condition number of A^T A, namely the ratio σn/σ1 [20]. For instance, for the simple synthetic example in [2], our calculations show that the largest and smallest diagonal elements of A^T A are 4.5 × 10^5 and 37.5, respectively, with the latter corresponding to the node at the "northeast corner" of the domain in the example (Fig. 1 in [2] or in this paper). It is well known that σ1 is less than or equal to the smallest diagonal element of A^T A, and σn is greater than or equal to its greatest diagonal element. Hence the ratio σn/σ1 ≥ 4.5 × 10^5/37.5 = 1.2 × 10^4. This explains the interesting phenomenon in the numerical examples in [2] that the node with the largest residual is always at the northeast corner of the square domain: its weighting factor corresponds to the smallest eigenvalue of A^T A.
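The diagonal bound used above (σ1 ≤ min_i (A^T A)_ii and σn ≥ max_i (A^T A)_ii, from Rayleigh quotients at the unit coordinate vectors) is easy to check numerically. The sketch below uses a toy matrix with a deliberately wide entry range, not the paper's A; the helper name is hypothetical.

```python
import numpy as np

def condition_lower_bound(A):
    """Lower bound on cond(A^T A) from its diagonal: for symmetric
    G = A^T A, sigma_1 <= min_i G_ii and sigma_n >= max_i G_ii, so the
    diagonal spread d_max/d_min bounds sigma_n/sigma_1 from below."""
    d = np.diag(A.T @ A)
    return d.max() / d.min()
```

For the example of [2], the diagonal extremes 4.5e5 and 37.5 give exactly the bound 1.2e4 quoted in the text.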
Hence we reach the following conclusion: only by minimizing (19) with a scaling matrix U that appropriately rectifies the non-uniform, irregular weighting effects of A^T A on the components of the residual vector Rg can one obtain a scaled, "stable" residual vector Rg independent of the wide value range of the entries of A, and thereby exploit the deeper information contained in Rg. Moreover, the scaling matrix improves the ill-conditioning caused by the wide value range of the entries of A.
Therefore, a suitable scaling matrix U is necessary, especially for realistic problems with a wide range of entries in A. In this paper, three simple forms of the scaling matrix U are given; their effects will be shown and compared in the next section. These forms are:
(a) the identity matrix, i.e.,

U = I_n,  (23)

with which there is no "scaling" or improvement, as in [2];
(b) the infinite-row-norm form

U = U∞ := Diag(a11^{−1}, . . . , ann^{−1}),  (24)

where Diag(. . .) indicates a diagonal matrix and aii is the ith diagonal element of A;
(c) the Euclidean-row-norm (2-norm) form

U = UE := Diag(s1^{−1}, . . . , sn^{−1}),  (25)

where si is the Euclidean norm of the ith row of A, i.e.,

si = ( Σ_{j=1}^{n} aij² )^{1/2}.  (26)
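Both diagonal forms (24)–(26) can be computed directly from A. A minimal sketch (hypothetical helper names, dense matrices for clarity):

```python
import numpy as np

def scaling_inf(A):
    """U_inf of (24): reciprocals of the diagonal entries of A."""
    return np.diag(1.0 / np.diag(A))

def scaling_euclid(A):
    """U_E of (25)-(26): reciprocals of the Euclidean row norms of A."""
    s = np.sqrt((A**2).sum(axis=1))
    return np.diag(1.0 / s)
```

With U = scaling_euclid(A), every row of UA has unit 2-norm, which is precisely how the built-in weights on the residual components are equalized; scaling_inf instead normalizes the diagonal of UA to 1.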
Although the forms of U given by (24) and (25) depend on H and P, they are regarded as constant during each iteration of the Gauss–Newton method when minimizing (19) in the next section. The reason is that we are interested in improving, or scaling, the irregular weighting effects of A with the simplest possible calculation, without changing the original purpose of the least-squares penalty method. This means that the penalty term Jg is corrected adaptively at every iteration according to the change of the entries of A, caused mainly by changes in the parameter vector P.
2.2. Algorithm
Let

v = (v1, . . . , vn, . . . , v_{n+np})^T := (H^T, P^T)^T.  (27)

Then the objective function Jgmp defined by (19) is a non-linear function of the variable vector v, and the minimization must be carried out iteratively. Let iit be the iteration index and v^0 be a given initial estimate of v. At every iteration we wish to determine a correction Δv of the current iterate v^{iit} such that

v^{iit+1} = v^{iit} + Δv,  iit = 0, 1, 2, . . .  (28)

reduces the objective function Jgmp. The Gauss–Newton method for obtaining Δv leads to the following linear system of equations [11,12,21]:

J^T W J Δv = −J^T W R,  (29)
where

W = Diag(λg² Wg, Wm, λp Wp)  (30)

and R is the entire residual vector composed of Rg, Rm and Rp, i.e.,

R = (r1, . . . , r_{n+nm+np})^T := (Rg^T, Rm^T, Rp^T)^T,  (31)

and J is the Jacobian (sensitivity) matrix at v = v^{iit}, with elements

Jij = ∂ri/∂vj |_{v=v^{iit}},  1 ≤ i ≤ n + nm + np, 1 ≤ j ≤ n + np.  (32)
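Assembling W and R from their blocks is mechanical. The following sketch (illustrative helper name, not from the paper) stacks R as in (31) and builds the block-diagonal W of (30) with SciPy:

```python
import numpy as np
from scipy.sparse import block_diag

def assemble_W_R(Rg, Rm, Rp, Wg, Wm, Wp, lam_g, lam_p):
    """Stack the full residual vector R of (31) and build the
    block-diagonal weight matrix W of (30) as a sparse matrix."""
    R = np.concatenate([Rg, Rm, Rp])
    W = block_diag([lam_g**2 * Wg, Wm, lam_p * Wp], format="csr")
    return W, R
```

Keeping W sparse matters here: with Wg = I_n and n large, storing W densely would waste O((n + nm + np)²) memory for what is essentially a diagonal matrix.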
This (n + nm + np) × (n + np) matrix J is very large, because the node number n is usually very large. Hence solving (29) directly would be time-consuming and would require substantial computer storage. Note that (29) is equivalent to the linear least-squares system

min_{Δv} ||W̄ J Δv + W̄ R||₂,  (33)

where || · ||₂ denotes the L₂-norm of a vector and W̄ is the square root, or Cholesky factor, of the positive definite matrix W, i.e.,

W̄^T W̄ = W.  (34)
Instead of solving (29) directly, we consider the linear least-squares system (33), which can be solved by the iterative LSQR method [17,18] or the direct QR factorization method [10,12]. In our numerical tests both methods were tried. For relatively small n (e.g., ≤ 100) both work well. When n is larger, the direct QR factorization method needs more storage while its speed is merely comparable to that of the iterative method. Hence the iterative LSQR method is preferred, because it can use a sparse-matrix data structure to store the non-zero elements of J. Consequently, the node number n can be as large as in the traditional method.
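As a concrete illustration of solving (33), the sketch below uses SciPy's `lsqr` in place of the authors' own LSQR implementation; for a diagonal W the Cholesky factor W̄ is simply the elementwise square root. The helper name is hypothetical.

```python
import numpy as np
from scipy.sparse import diags, csr_matrix
from scipy.sparse.linalg import lsqr

def gauss_newton_step(J, R, w):
    """One correction Dv from (33): minimize ||Wbar J Dv + Wbar R||_2,
    where w holds the diagonal of W, so Wbar = sqrt(W) entrywise."""
    Wbar = diags(np.sqrt(w))
    # lsqr solves min ||A x - b||_2; here b = -Wbar R, matching (33)
    sol = lsqr(Wbar @ csr_matrix(J), -(Wbar @ R))
    return sol[0]
```

Converting J to CSR before the solve is what lets the node count grow as large as in the traditional methods: LSQR only ever touches J through matrix–vector products.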
To solve (33) efficiently and accurately, a suitable preconditioning technique is needed. Let C be a suitable (n + np) × (n + np) non-singular matrix that improves the condition number of the matrix W̄JC. Then, instead of solving (33) directly, we first solve the following system for an (n + np)-dimensional unknown vector ξ:

min_ξ ||W̄ J C ξ + W̄ R||₂,  (35)

and then calculate

Δv = Cξ.  (36)

The simplest form of C is a diagonal matrix. We wish to improve the condition number of W̄JC by choosing a diagonal matrix C := Dc2 such that all the diagonal elements of Dc2 J^T W J Dc2 equal 1. This leads to the following form of C:

C := Dc2 = Diag(q1, . . . , q_{n+np}),  (37)

where

qj = ( Σ_{i,k=1}^{n+nm+np} Jij wik Jkj )^{−1/2},  1 ≤ j ≤ n + np,  (38)
with wik denoting the (i, k) entry of W. In our numerical tests this simple procedure accelerates the convergence of the iterative LSQR algorithm significantly, and it also widens the range of initial values of the unknown parameter vector P within which our algorithm converges (see Section 4, Table 1).
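When W is diagonal, (38) reduces to a column scaling of J that gives J^T W J a unit diagonal. A minimal sketch of this computation (hypothetical helper name; dense J for clarity):

```python
import numpy as np

def diag_preconditioner(J, w):
    """D_c2 of (37)-(38) for diagonal W (w_ik = w_i * delta_ik):
    q_j = (sum_i w_i J_ij^2)^(-1/2), the column scaling that gives
    J^T W J a unit diagonal."""
    q = 1.0 / np.sqrt(np.einsum("ij,i,ij->j", J, w, J))
    return np.diag(q)
```

This is the classical Jacobi (diagonal) scaling of the normal-equations matrix, applied here without ever forming J^T W J explicitly.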
The main work in implementing the above algorithm is the calculation of J. By direct calculation using (4a), (4b), (5)–(9), (10a)–(10c), (11)–(15), (27), (31) and (32), setting the matrix U according to the entries of A at the current iteration and regarding U as a constant matrix independent of v, J can be written as

J = ( U(A + Z)   U𝒫
         Mw        O
         O       I_np ),  (39)

in which Z is an n × n matrix with elements

zij = ∂(AH − F)i/∂hj − aij = Σ_{k∈Ni} (∂sik/∂hj) hk
    = { 0,  if i ∈ D or j ∉ Ni;
        Σ_{k∈Ni} hk Σ_{e∈Ti∩Tk} ∫_e 10^{Y^e} s′j φj ∇φi · ∇φk dx dy,  otherwise,  (40)
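The block structure of (39) can be sketched directly. In the illustrative helper below (not the paper's code), `Pmat` stands for the n × np sensitivity block 𝒫 of (41)–(42) and `Mw` for the observation selection matrix; dense arrays are used for clarity, whereas an implementation at scale would keep the blocks sparse.

```python
import numpy as np

def assemble_J(U, A, Z, Pmat, Mw, n_p):
    """Block Jacobian of (39):
        J = [ U(A+Z)  U Pmat ]
            [   Mw       O   ]
            [   O       I_np ]"""
    n = A.shape[0]
    n_m = Mw.shape[0]
    top = np.hstack([U @ (A + Z), U @ Pmat])
    mid = np.hstack([Mw, np.zeros((n_m, n_p))])
    bot = np.hstack([np.zeros((n_p, n)), np.eye(n_p)])
    return np.vstack([top, mid, bot])
```

Note how the scaling matrix U multiplies only the first block row: it rescales the governing-equation residual sensitivities while leaving the observation and prior-information rows untouched.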
Table 1
Comparison of the effects of different residual-scaling matrices U and preconditioners C on the computational cost^a

                                  Uniform initial transmissivity T (m²/day)
Method                            1.0                  10.0                 100.0                200.0

(1) U = I_n, C = I_{n+np}
    CPU time (s)                  ∞ (failed)           ∞ (failed)           134.4                177.0
    GN (Lsqr) Slsqr               -                    -                    9 (174-3342) 16360   11 (151-3079) 21557

(2) U = I_n, C = Dc2
    CPU time (s)                  ∞ (failed)           71.1                 51.5                 55.0
    GN (Lsqr) Slsqr               -                    14 (475-710) 8659    11 (374-719) 6264    12 (180-714) 6695

(3) U = U∞, C = I_{n+np}
    CPU time (s)                  ∞ (failed)           51.4                 33.0                 47.0
    GN (Lsqr) Slsqr               -                    14 (385-609) 6256    10 (377-434) 4034    14 (365-441) 5732

(4) U = UE, C = I_{n+np}
    CPU time (s)                  ∞ (failed)           51.1                 32.8                 47.0
    GN (Lsqr) Slsqr               -                    14 (415-587) 6221    10 (370-429) 3995    14 (374-438) 5728

(5) U = U∞, C = Dc2
    CPU time (s)                  32.6                 25.8                 16.3                 23.5
    GN (Lsqr) Slsqr               18 (199-279) 3974    15 (205-231) 3142    10 (187-206) 1980    14 (197-207) 2846

(6) U = UE, C = Dc2
    CPU time (s)                  31.5                 25.5                 17.9                 23.0
    GN (Lsqr) Slsqr               18 (193-265) 3840    15 (202-228) 3104    11 (183-203) 2175    14 (195-203) 2798

(7) B & Y
    CPU time (s)                  18.6                 16.7                 19.5                 23.0
    BY                            12                   11                   13                   15

(8) C & N
    CN                            71                   54                   57                   90

^a GN: total number of Gauss–Newton iterations. Lsqr: LSQR iterations within a Gauss–Newton iteration; this differs between Gauss–Newton steps, so only its range is listed in parentheses. Every iteration of LSQR includes 3(n + np + nm) + 5(n + np) multiplications and two matrix–vector products, (W̄J)ξ and (W̄J)^T η, where ξ and η are (n + np)- and (n + np + nm)-dimensional vectors, respectively (see [17], p. 62). Slsqr: sum of Lsqr over all the Gauss–Newton iterations. Failed: GN > 100 with no indication that the minimizing iteration converges. CN: the number of iterations minimizing the traditional objective function defined by (16) using the "conjugate gradient algorithm coupled with Newton's method" described by Carrera and Neuman in [6]; the CN data are quoted from [6]. BY: the number of Gauss–Newton iterations minimizing the traditional objective function Jt defined by (16), with the sensitivity coefficients in every Gauss–Newton iteration approximated by the parameter-perturbation and numerical difference quotient method described by Becker and Yeh [3,21].
where s′j = ∂s/∂h (xj, yj, hj), and Ni is the index set of all the nodes in the neighborhood of the ith node (xi, yi). For linear aquifer models Z = 0, because s ≡ 1. Mw is an nm × n matrix whose ith row is an n-dimensional vector whose ni-th component is 1 and whose other components are 0; here ni (1 ≤ i ≤ nm) is the index of the node located at the ith head observation well. I_np is the identity matrix of order np. 𝒫 is an n × np matrix with elements

pij = Σ_{k∈Ni} (∂aik/∂Yj) hk
    = { 0,  if i ∈ D;
        ln 10 Σ_{k∈Ni} hk Σ_{e∈Ti∩Tk} ε^e_j ∫_e 10^{Y^e} s̄ ∇φi · ∇φk dx dy,  otherwise,  (41)

where

ε^e_j := ∂Y^e/∂Yj = { 1 when e ⊂ Ω̄j; 0 when e ⊄ Ω̄j },  1 ≤ j ≤ np,  (42)

and Ωj is defined by (4a) and (4b). Here it is assumed that each triangle element e is contained in one of the subdomains Ω̄1, . . . , Ω̄np.
Based on the above discussion, the penalty-method algorithm can be described as follows:
1. Initialization. Choose a suitable P^0 as the initial value of P. The initial value H^0 of H is obtained by solving (9) with P = P^0. Set v^0 = ((H^0)^T, (P^0)^T)^T and J^0 = 0. Choose two suitably small numbers δv > 0 and δJ > 0 as the convergence criteria of the Gauss–Newton iteration. Choose two suitable np-dimensional vectors P^− and P^+ as the lower and upper bounds of the parameter vector P.
2. For iit = 0, 1, 2, . . . repeat steps 3–5.
3. Implement the Gauss–Newton method.
(a) Calculate the Jacobian matrix J. Calculate Jij|_{v=v^{iit}} (1 ≤ i ≤ n + nm + np, 1 ≤ j ≤ n + np) using (39)–(42), with the matrix U in (39) determined by the current entries of A. (In the numerical examples the three forms of U defined by (23)–(26) are used and their utilities compared.)
(b) Calculate Δv by solving the linear least-squares system (35) and (36) with the iterative LSQR algorithm [17,18].
(c) Update v. Calculate v^{iit+1} using (28).
4. Verify convergence. Calculate J^{iit+1} := Jgmp using (19). If

|J^{iit+1} − J^{iit}| < δJ  or  |Δv|_∞ < δv  (43)

holds, stop, where |Δv|_∞ denotes the infinity norm of Δv; otherwise set iit := iit + 1.
5. Implement the upper and lower bound constraints on P. Let P^{iit+1} be the last np components of v^{iit+1}. Check whether P^− ≤ P^{iit+1} ≤ P^+ holds, where the inequalities refer to the corresponding components of the three vectors. If any components of P^{iit+1} are less than their lower bounds or greater than their upper bounds, set them equal to their lower or upper bound values, respectively. Then set the last np components of v^{iit+1} equal to those of P^{iit+1} and go to step 3.
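The steps above can be sketched as the following Python skeleton. It is an illustration only: `residual` and `jacobian` are caller-supplied stand-ins for (31)–(32) with the weighting and scaling already applied, and a dense least-squares solve replaces the paper's preconditioned LSQR.

```python
import numpy as np

def penalty_gauss_newton(v0, residual, jacobian, P_lo, P_hi, n_p,
                         dJ=1e-7, dv=1e-4, max_iter=100):
    """Skeleton of algorithm steps 1-5. residual(v) and jacobian(v)
    return the weighted, scaled R and J; the last n_p components of v
    are the parameters, clamped to [P_lo, P_hi] after each step."""
    v = np.asarray(v0, dtype=float).copy()
    J_old = np.inf
    for _ in range(max_iter):
        R, J = residual(v), jacobian(v)
        # Gauss-Newton correction: minimize ||J dvec + R||_2, cf. (33);
        # the paper uses preconditioned LSQR here instead of lstsq.
        dvec, *_ = np.linalg.lstsq(J, -R, rcond=None)
        v = v + dvec
        # step 5: enforce the parameter bounds componentwise
        v[-n_p:] = np.clip(v[-n_p:], P_lo, P_hi)
        # step 4: convergence test (43)
        J_new = residual(v) @ residual(v)
        if abs(J_new - J_old) < dJ or np.max(np.abs(dvec)) < dv:
            break
        J_old = J_new
    return v
```

For a linear model the loop terminates after the second pass, since the first correction already solves the least-squares problem exactly.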
3. Numerical examples
3.1. Example of a linear aquifer model
To demonstrate the utility of our algorithm, the steady-state test problem used by Carrera and Neuman in [6] and by Bentley in [2] is solved. It can be described by specifying the factors in the aquifer model (1)–(3) as follows (see [2,6]):
Ω. A square domain of 6000 m × 6000 m (see Fig. 1).
Γd and hd. Γd is the southern side of the square domain; hd = 100 m.
Γn and q. The other three sides of the domain form Γn. On the western side the flux q = 0.25 m²/day; on the northern and eastern sides there is no flow, i.e., q = 0.
Transmissivity T(x, y). The domain Ω is divided into np = 9 constant-transmissivity zones of 2000 m × 2000 m, with the corresponding transmissivity values Ti (1 ≤ i ≤ 9) ranging from 5 to 150 m²/day (see Fig. 1).
s(x, y, h), B(x, y) and hr. The aquifer is assumed to be confined, i.e., s(x, y, h) ≡ 1. The strata overlying and underlying the aquifer are supposed to be impervious, i.e., B(x, y) = 0; hence Bhr = 0.
Q(x, y). In the upper half of the three northern zones Q = 0.274 × 10⁻³ m/day; in the lower half of them Q = 0.137 × 10⁻³ m/day; in the other six zones Q = 0.
Head measurements. These are available at the 18 observation wells shown as points in Fig. 1.
In the computation, the flow region is discretized into 12 × 12 squares, each of which is divided into two triangle elements. All the head observation wells coincide with discrete nodes. In order to obtain head measurements at the observation wells, as in [2,6], the "true" head values at the discrete nodes, which range from 100.0 m on the southern Dirichlet boundary to 205.88 m at the north-east corner, were numerically calculated using the true transmissivity values shown in Fig. 1. Then the head values at the 18 observation wells were corrupted by uncorrelated Gaussian noise of variance 1. Consequently, the head measurement weighting matrix Wm in (17) is set to be the nm-th order identity matrix, as in [2,6].

Fig. 1. The domain Ω, the 18 observation wells, the 9 piecewise-constant transmissivity values T and the zones.

To test the utility of our algorithm, the regularization term Jp containing prior information on P is not used in any of our calculations, i.e., kp = 0. But P is restricted from below and above by choosing P⁻ with uniform components log 0.01 = −2 and P⁺ with uniform components log 10³ = 3. The Gauss–Newton iteration stopping criteria are chosen to be dJ = 10⁻⁷ and dv = 10⁻⁴.
The numerical tests are composed of two parts. The first part is designed to investigate the effects of different forms of the residual scaling matrix U and the preconditioner C on the calculation speed and the numerical results. For this purpose, three different forms of U, i.e., I_n, U1 and UE defined by (23)–(26); two different forms of C, i.e., I_{n+np} and Dc² defined by (37) and (38); and four different uniform initial values of Ti (1 ≤ i ≤ np = 9) as in [6], namely 1, 10, 100 and 200 m²/day, are used in combination. The weighting factor kg is fixed at 1.0 when U = I_n, considering the great equivalent weighting effects of AᵀA, and at 10.0 for the other two forms of U defined by (24)–(26), because they counteract the weighting effects of AᵀA. Hence there are altogether 4 × 2 × 3 = 24 different cases.
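The combinatorial design of this first part can be enumerated directly. The sketch below is ours; the string labels are shorthand for the matrices named in the text (note that, as described above, the three U choices and two C choices are crossed with the four initial values, with kg tied to the choice of U, giving the stated 24 cases):

```python
from itertools import product

scalings = ["I_n", "U_1", "U_E"]          # three forms of the scaling matrix U
preconds = ["I_{n+np}", "D_c^2"]          # two forms of the preconditioner C
inits    = [1, 10, 100, 200]              # uniform initial T, m^2/day

cases = list(product(scalings, preconds, inits))
print(len(cases))   # → 24
```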
H. Li, Q. Yang / Advances in Water Resources 23 (2000) 867±880
In Table 1 the relevant data on the computational costs corresponding to the 24 cases are summarized. The first eight rows, corresponding to methods 1–4, show that if either the discrete governing equation residuals are not scaled (i.e., U = I_n) or no preconditioning technique is adopted (i.e., C = I_{n+np}), our algorithm performs poorly; e.g., the computational cost is higher than that of the traditional method and, what is worse, the Gauss–Newton iterations fail to converge for some initial values of the unknown parameters. But the situation steadily improves with the gradual improvement of U and C.

The next four rows, corresponding to methods 5 and 6, demonstrate the improvements due to setting U = U1 or UE (Eqs. (24)–(26)) and C = Dc² (Eqs. (37) and (38)). The range of initial values of T for which the Gauss–Newton iteration converges has increased; there are no longer "Failed" calculations for any of the four different initial values of T; and the computational effort is also significantly reduced. For example, when the initial values of P are a uniform log 100 (T = 100 m²/day), Slsqr, the total number of LSQR iterations, is reduced from 16,360 for U = I_n and C = I_{n+np} to 1980 for U = U1 and C = Dc², and the CPU time is reduced proportionally.
In the next two rows, which begin with "B & Y", we show the computational effort of the method described by Becker and Yeh [3,21], which uses a Gauss–Newton method to minimize the objective function Jt defined by (16) by means of the parameter perturbation and numerical difference-quotient method. Comparison of the CPU times and the numbers of iterations required by the "B & Y" method (i.e., BY) with the results from the improved scaling matrices demonstrates that the computational cost of our algorithm is reduced to a level similar to that of the traditional method. This trend shows the potential of our algorithm to solve inverse problems more quickly than the traditional method, on the basis of finding more suitable forms of U and C.

In the last row, CN, the number of iterations required by the "conjugate gradient algorithm coupled with Newton's method" used by Carrera and Neuman in [6] to minimize the traditional objective function Jt defined by (16), is quoted. One can see that GN, the number of Gauss–Newton iterations required by our algorithm, is less than CN. Unfortunately, the per-iteration computational costs of our algorithm and of the algorithm used in [6] are difficult to compare due to the lack of relevant data.
The five Failed cases in Table 1, which mean that the iterations fail to converge, can be made to converge if Marquardt's method [7,12] is incorporated into our algorithm. But this takes more iterations and more CPU time than the unfailed cases listed in Table 1, where Marquardt's method is not used. The details are omitted here due to space limitations.
In Table 2 the numerical results corresponding to the 24 cases mentioned above and to the traditional method of Becker and Yeh [3,21] are summarized. Theoretically (i.e., if there were no round-off errors), different forms of the non-singular matrix C would not influence the calculated results. The numerical results support this fact: the optimization results depend neither on the initial values of T nor on the choice of C, up to five significant figures. Hence, only the results corresponding to the three different forms of U are listed in Table 2; they are valid for all the combinations of initial values of T and forms of C in Table 1 that did not fail. The last column gives the results calculated using the traditional method of Becker and Yeh.

The first four rows in Table 2 give the values of Jt, the traditional objective function that appears in (19); of Jg, the weighted square sum of the discrete governing equation residuals defined by (20); of Jgmp, the penalty method objective function defined in (19); and of the ratio Jg/Jgmp. It is well known that the greater the weighting factors of the residuals in a least-squares system, the smaller those residuals will be. Hence the ratio Jg/Jgmp can be regarded as a measure of the weighting effects of kgU on the discrete governing equation residuals. According to this measure, the weighting effects of kgU = 10.0 UE are the weakest, and those of kgU = 1.0 I_n the strongest.

The data listed under "Identified T" and "The statistic data of log T" demonstrate that the numerical results for T differ very slightly, whether they are calculated using our algorithm with the three different choices of kgU or obtained by the traditional method of Becker and Yeh. For example, σ²(ΔP), the variance of the residuals between the computed and true log-transmissivity, ranges only from 2.78 × 10⁻³ to 3.33 × 10⁻³, and SSE(ΔP), the standard square error between the computed and true log-transmissivity, only from 2.81 × 10⁻³ to 3.34 × 10⁻³.
One may conjecture that the discrete governing equations will be less accurately satisfied at the nodes coincident with the head observation wells than at other nodes, owing to the measurement error at the head observation wells. This conjecture can be seen and understood clearly if "The statistic data of Rgm and Rgr" are compared row by row, where Rgm is an nm-dimensional vector composed of the components of Rg whose indices equal those of the nodes coincident with the head observation wells, and Rgr is an (n − nm)-dimensional vector composed of the rest of the components of Rg. For example, SSE(Rgm), the standard square error of Rgm, is much greater than SSE(Rgr) for all three choices of kgU. The ratio SSE(Rgm)/SSE(Rgr) ranges from 2.16 to 9.69. The other pair of statistics, |Rgm|∞ and |Rgr|∞, the respective infinity norms of Rgm and Rgr, shows a similar phenomenon, with the ratio |Rgm|∞/|Rgr|∞ ranging from 1.96 to 3.06. That
Table 2
Comparison of the numerical results of our algorithm and of the traditional method^a

                       U = I_n      U = U1       U = UE       Traditional method
kg                     1.0          10.0         10.0
Jt                     16.0         14.8         14.5         16.2
Jg                     0.115        0.674        0.839        0.0
Jgmp                   16.1         15.5         15.3         16.2
Jg/Jgmp                0.715%       4.35%        5.47%        0.0%

True T                 Identified T (m²/day)
T1 = 150               158.71       159.10       158.51       160.66
T2 = 150               137.20       134.04       133.32       136.71
T3 = 50                51.608       51.667       51.652       51.679
T4 = 150               123.02       120.77       120.18       123.25
T5 = 50                49.756       49.973       50.016       49.758
T6 = 15                15.516       15.482       15.474       15.511
T7 = 50                66.537       67.600       67.651       67.256
T8 = 15                15.028       14.998       14.991       15.025
T9 = 5                 5.0254       5.0329       5.0329       5.0310

The statistic data of log T
E(ΔP)                  5.89×10⁻³    4.90×10⁻³    4.24×10⁻³    7.05×10⁻³
σ²(ΔP)                 2.78×10⁻³    3.25×10⁻³    3.33×10⁻³    2.96×10⁻³
SSE(ΔP)                2.81×10⁻³    3.27×10⁻³    3.34×10⁻³    3.01×10⁻³

The statistic data of Rgm and Rgr
kg²SSE(Rgm)            1.31×10⁻³    2.01×10⁻²    2.50×10⁻²    0.0
kg²SSE(Rgr)            6.05×10⁻⁴    2.08×10⁻³    2.58×10⁻³    0.0
SSE(Rgm)/SSE(Rgr)      2.16         9.66         9.69
|Rgm|∞                 0.140        0.0396       0.0373       0.0
|Rgr|∞                 0.0458       0.0170       0.0190       0.0
|Rgm|∞/|Rgr|∞          3.06         2.33         1.96

^a C and the initial values of T can be any one of the combinations in Table 1. ΔP is the difference vector between the computed and true values of the parameter vector P defined by (11). Rgm is an nm-dimensional vector composed of the components of the discrete governing equation residual vector Rg (see (15)) whose indices equal those of the discrete nodes coincident with the head observation wells. Rgr is an (n − nm)-dimensional vector composed of the rest of the components of Rg. For an arbitrary m-dimensional vector ε = (e1, …, em)ᵀ, which can be set to ΔP, Rgm or Rgr, respectively, the functions E(ε), σ²(ε), SSE(ε) and |ε|∞ are defined as follows: E(ε) := (1/m) Σᵢ₌₁ᵐ eᵢ is the mean; σ²(ε) := (1/m) Σᵢ₌₁ᵐ (eᵢ − E(ε))² is the variance (σ(ε) the standard deviation); SSE(ε) := (1/m) Σᵢ₌₁ᵐ eᵢ² is the standard square error; |ε|∞ := max{|e1|, …, |em|} is the infinity norm.
is to say, statistically, the residuals of the discrete governing equations discretized at the head-observation-well nodes are much greater than those of the discrete governing equations at the other nodes. Therefore, the above-mentioned conjecture is true in the statistical sense.

If "The statistic data of Rgm and Rgr" are compared column by column, the following interesting facts can be found: among the three groups of values listed in the three columns, the values of kg²SSE(Rgm) and kg²SSE(Rgr) in the first column, corresponding to kgU = 1.0 I_n, are always the smallest, but the values of |Rgm|∞ and |Rgr|∞ in the first column are always the biggest. These results imply that the components of the residual vector Rg are irregularly distributed when U = I_n, and that they are improved when U is set to U1 or UE.

The second part of our numerical tests is designed to investigate the effects of the weighting factor kg on the numerical results. To this end the initial values of T are fixed at a uniform 100 m²/day, U = UE and C = Dc²; our algorithm is then used to minimize the penalty method objective function Jgmp for different kg², which is set respectively to 10.0, 10², 10⁴, 10⁶ and 10⁸. The results are listed in Table 3.
First let us look at the data listed above "The computation cost" in Table 3. If they are compared column by column, the following facts can be seen:
1. As kg² increases from 10.0 to 10⁶, the values of Jt and Jgmp tend increasingly to the same constant 16.2, and the values of Jg decrease at an approximate rate of kg⁻². So the ratio Jg/Jgmp also decreases at the same approximate rate kg⁻². In fact, it has been theoretically proven that Jt, Jgmp and Jg have the asymptotic properties

lim_{kg→∞} Jt = lim_{kg→∞} Jgmp = C1,   lim_{kg→∞} kg² Jg = C2,

where C1 is the value of Jt corresponding to the traditional-method solution of problem (16), and C2 is a non-negative constant that vanishes if and only if C1 vanishes [13]. The numerical results in Table 3 show these properties clearly for kg² ≤ 10⁶, with the constants C1 and C2 approximated by 16.2 and 94.1, respectively. Unfortunately, by kg² = 10⁸ they begin to
Table 3
Comparison of the effects of different weighting factors kg on the numerical results^a

              kg² = 10.0    kg² = 10²     kg² = 10⁴     kg² = 10⁶     kg² = 10⁸
Jt            6.95          14.5          16.2          16.2          16.2
Jg            4.06          8.38×10⁻¹     9.39×10⁻³     9.41×10⁻⁵     6.64×10⁻¹
Jgmp          11.01         15.3          16.2          16.2          16.9
Jg/Jgmp       36.9%         5.47%         0.058%        0.000581%     3.94%

True T        Identified T (m²/day)
T1 = 150      130.14        158.51        160.68        160.70        160.76
T2 = 150      104.24        133.32        136.69        136.72        136.71
T3 = 50       50.335        51.652        51.675        51.675        51.677
T4 = 150      94.447        120.18        123.12        123.15        122.96
T5 = 50       50.690        50.016        49.754        49.752        49.756
T6 = 15       15.254        15.474        15.511        15.511        15.511
T7 = 50       70.870        67.651        67.326        67.325        67.495
T8 = 15       14.779        14.991        15.024        15.024        15.021
T9 = 5        5.0383        5.0329        5.0311        5.0311        5.0311

The statistic data of log T
E(ΔP)         −2.85×10⁻²    4.24×10⁻³     7.01×10⁻³     7.03×10⁻³     7.09×10⁻³
σ²(ΔP)        9.44×10⁻³     3.33×10⁻³     2.95×10⁻³     2.95×10⁻³     2.99×10⁻³
SSE(ΔP)       1.03×10⁻²     3.34×10⁻³     3.00×10⁻³     3.00×10⁻³     3.04×10⁻³

The computation cost
CPU time (s)  14.7          17.9          11.1          25.4          25.5
GN (Lsqr)     12 (138–156)  11 (183–203)  7 (175–205)   14 (104–325)  20 (71–506)
Slsqr         1782          2175          1354          3089          3097

^a These results are calculated with U = UE, C = Dc², and initial values of Ti (1 ≤ i ≤ 9) equal to a uniform 100 m²/day. The symbols of this table have the same meanings as those in Tables 1 and 2.
lose part of their fidelity, owing to the serious deterioration of the condition number of the weighting matrix W defined by (30).
2. When kg² = 10.0, Jg takes the comparatively large value 4.06 and the ratio Jg/Jgmp reaches 36.9%, implying that much of the head-measurement noise is "absorbed" by the discrete governing equation residuals. This leads to large errors in the estimated values of Ti (1 ≤ i ≤ 9) (see the second column, listed under kg² = 10.0). When kg² ≥ 10², accurate and stable estimates of Ti (1 ≤ i ≤ 9) are obtained thanks to the sufficient enforcement of the discrete governing equations, even in the case kg² = 10⁸; they also differ very slightly from each other. Moreover, if one compares our results for Ti (1 ≤ i ≤ 9) when 10⁴ ≤ kg² ≤ 10⁸ with the traditional-method results for Ti listed in the last column of Table 2, one can easily see that each Ti (1 ≤ i ≤ 9) always has the same first three significant figures, with the single exception of T7 = 67.5 when kg² = 10⁸.
3. The facts stated in the above paragraph are also supported by the data listed under "The statistic data of log T".
In the last two rows, the CPU time and Slsqr, the sum of the LSQR iteration counts over all the Gauss–Newton iterations, are listed for the different values of kg². One may see that although they differ from each other, the average CPU time is only 18.9 s, a little less than 19.5 s, the CPU time of the traditional method corresponding to method 7 (B & Y) with initial T = 100 m²/day in Table 1. In particular, when kg² = 10⁴, the CPU time of our algorithm is only 11.1 s, i.e., 56.9% of that of the traditional method.

In short, there is a sufficiently wide workable range of kg values within which the penalty method algorithm performs well enough to give accurate estimates of the unknown parameter T with low computational cost, just like the traditional method.
3.2. Example of non-linear aquifer model described by the Boussinesq equation

In order to demonstrate the advantage of our algorithm, namely that it can solve inverse problems of non-linear aquifer models naturally and quickly, an unconfined quasilinear aquifer model described by the Boussinesq equation [1,19] is artificially designed by s
* Corresponding author. Present address: Department of Earth Sciences, The University of Hong Kong, Hong Kong SAR, People's Republic of China. Tel.: +852-2857-8278; fax: +852-2517-6912. E-mail addresses: [email protected], [email protected] (H. Li).
0309-1708/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0309-1708(00)00018-X

Notation
A: n × n coefficient matrix of the discrete aquifer model (see (9), (10a)–(10c)).
C: (n + np) × (n + np) precondition matrix.
Dc²: a diagonal form of C defined by (37) and (38).
H: the piezometric head or watertable vector (see (5)).
h(x, y): the piezometric head or watertable, m.
J: (n + nm + np) × (n + np) Jacobian matrix defined by (32) and (39).
Jg: weighted square sum of the discrete governing equation residuals (see (15) and (20)).
Jgmp: the objective function of the penalty method defined in (19).
Jm: weighted square sum of the head observation residuals (see (13) and (17)).
Jp: weighted square sum of the parameter prior information residuals (see (14) and (18)).
Jt: traditional objective function defined in (16).
n: the number of discrete nodes.
nm: the number of observation wells.
np: the number of unknown parameters (see (4a), (4b) and (11)).
P: n × np Jacobian matrix defined in (39), (41) and (42).
P: np-dimensional unknown parameter (log-transmissivity) vector defined by (11).
Rg: n-dimensional discrete governing equation residual vector defined by (15).
Rgm: an nm-dimensional vector composed of the components of Rg whose indices equal those of the nodes coincident with the head observation wells.
Rgr: an (n − nm)-dimensional vector composed of the rest of the components of Rg.
Rm: nm-dimensional head observation residual vector defined by (13).
T(x, y) and Ti: unknown transmissivity, m²/day (see (1), (3), (4a), (4b) and (11)).
U: n × n scaling matrix (see (15)).
UE: a diagonal form of U defined by (25).
U1: a diagonal form of U defined by (24).
W: block diagonal matrix Diag(kg²Wg, Wm, kpWp).
Wg: n × n weighting matrix of the residual vector Rg.
Wm: nm × nm weighting matrix of the residual vector Rm.
Wp: np × np weighting matrix of the parameter prior information residual vector Rp.
Yi (1 ≤ i ≤ np): log-transmissivity (Yi = log Ti).
Γd and Γn: the Dirichlet and Neumann boundaries.
ΔP: the difference vector between the computed and true values of the parameter vector P.
kg²: weighting factor of the penalty term Jg (see (19) and (20)).
kp: weighting factor of the regularization term Jp (see (16) and (18)).
v: (n + np)-dimensional vector defined by v = (Hᵀ, Pᵀ)ᵀ.

1. Introduction

The traditional least-squares methods for distributed parameter identification of groundwater models minimize a quadratic objective function with respect to the parameters subject to the discrete governing equations of an elliptic or parabolic aquifer model (see e.g., [4–6,8,9,14–16,21]). This means that the discrete equations of the aquifer models are regarded as an exact description of the real aquifer. But in practice there exist not only measurement errors of heads but also errors of the conceptual models, of the discretization of the models, of the determination of boundary conditions and source terms, etc. Hence the concept of measurement error must be generalized to include the other errors mentioned above. Such a generalization of the measurement-error concept is particularly important in least-squares and maximum-likelihood methods for groundwater inverse problems. For this reason, exact enforcement of the discrete governing equations may not necessarily be appropriate, and may actually degrade the head and parameter estimation [2,15]. This is the first motivation of this paper.

Bentley [2] formed a penalty-method least-squares objective function by adding a general positive definite quadratic form of the discrete governing equation residuals to the traditional objective function, and minimized that objective function without any constraints using a "damped direction-search and line-search algorithm". He then suggested at the end of his paper, on p. 147, that

Convergence, as well as selecting an optimum solution technique, remain topics for future investigation.
Hence, it is necessary to find an effective algorithm to minimize the penalty method objective function. This is the second motivation of this paper.

The algorithm proposed in this paper can be briefly introduced as follows.

First, differently from [2], the residuals of the discrete governing equations are scaled by multiplying them by a "scaling matrix" determined adaptively from the current coefficient matrix of the discrete governing equations. With the scaled residuals, a penalty method objective function similar to that in [2] is formed. Detailed numerical analysis shows that the discrete governing equation residuals are not distributed uniformly over the discrete nodes, owing to the wide value range of the discrete governing equation coefficients and their "built-in" but irregular weighting effects on these residuals. This irregularity of the residuals can be efficiently reduced by using the above-mentioned scaling matrix.

Second, based on the generalized Gauss–Newton method [11,12,21], every iteration minimizing the penalty method objective function is formulated as a linear least-squares system whose unknowns are the corrections of both the unknown parameters and the groundwater heads at all the discrete nodes. The number of linear equations contained in this system has to be greater than or at least equal to the number of unknowns in it. So the number of observation wells has to be greater than or at least equal to the number of unknown parameters when there is no regularization term containing prior information on the unknown parameters in the objective function.

Finally, a suitable algorithm is chosen to solve this linear least-squares system, which is usually ill-conditioned. There are several sources of this ill-conditioning, e.g., the ill-posedness of the inverse aquifer problems themselves; the wide value range of the discrete governing equation coefficients, mainly caused by the non-uniformity of the triangulation and the wide range of the transmissivity values; and the deterioration of the condition number of the least-squares systems due to the large weighting factor kg of the penalty term used to enforce the discrete governing equations. Moreover, this linear least-squares system is a large sparse system incorporating the discrete governing equations. In order to overcome these difficulties, an iterative least-squares QR factorization (LSQR) algorithm for sparse linear least-squares problems [17,18] is chosen to solve it. At the same time, a simple but effective preconditioner is implemented to improve the ill-conditioned state.
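The benefit of a simple diagonal preconditioner of this kind can be seen on a toy least-squares problem. The sketch below, with an invented matrix and our own function names (the paper's own preconditioner is the matrix C of (37) and (38), not reproduced here), scales each column of the rectangular matrix to unit norm and compares the condition numbers of the resulting normal matrices:

```python
import math

def cond2(a, b, c):
    """Condition number of the symmetric positive definite 2x2 matrix [[a, b], [b, c]]."""
    tr, det = a + c, a * c - b * b
    disc = (tr * tr / 4.0 - det) ** 0.5
    return (tr / 2.0 + disc) / (tr / 2.0 - disc)

# a least-squares matrix with columns of very different norms (invented example)
J = [[1.0, 1e4],
     [0.0, 1e4]]

# entries of the normal matrix J^T J
n11 = sum(r[0] * r[0] for r in J)
n12 = sum(r[0] * r[1] for r in J)
n22 = sum(r[1] * r[1] for r in J)
unscaled = cond2(n11, n12, n22)               # huge, on the order of 4e8

# diagonal preconditioner: scale each column of J to unit 2-norm
c1, c2 = math.sqrt(n11), math.sqrt(n22)
scaled = cond2(n11 / (c1 * c1), n12 / (c1 * c2), n22 / (c2 * c2))
print(round(scaled, 2))                        # → 5.83, far better conditioned
```

Column scaling of this kind is equivalent to right-multiplying the least-squares matrix by a diagonal matrix, which changes the conditioning of the iteration without changing the exact solution, the same principle exploited by the preconditioner C in the text.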
Numerical results demonstrate that if either the discrete governing equation residuals are not scaled or no preconditioning technique is adopted, then the penalty method algorithm performs poorly because of the ill-conditioning; e.g., the computational cost is higher than that of the traditional method and, what is worse, the calculations fail to converge for some initial values of the unknown parameters. But the situation can be improved significantly by some naturally chosen diagonal forms of the scaling matrix and the preconditioner.

Owing to the sparse iterative data structure of the LSQR algorithm, the size of the discrete governing equations in our algorithm can be as large as that in the traditional methods.

In every Gauss–Newton iteration of our algorithm, one needs to solve the linear least-squares system only once. In comparison, every Gauss–Newton iteration of the traditional methods has to solve the discrete governing equations as many times as one plus the number of unknown parameters if one uses the parameter perturbation and numerical difference-quotient method, or as many times as one plus the number of head observation wells if one uses the adjoint state sensitivity equation method (variational method) [21]. The total number of iterations required to minimize the penalty method objective function is less than or equal to the number required by the two traditional methods tested.

All these facts demonstrate the potential of this method to solve inverse problems of more complicated non-linear aquifer models naturally and quickly, on the basis of finding suitable forms of the scaling matrix and the preconditioner.

The paper is organized as follows. In the next part, the penalty method objective function and its minimizing algorithm based on the Gauss–Newton method are given in detail, both for a linear aquifer model and for a non-linear model described by the elliptic Boussinesq equation. In Section 3, two numerical examples are given. One of the examples is taken from [2,6]; the other is the elliptic Boussinesq equation describing an unconfined aquifer [1,19]. Through these numerical examples, the effects and utility of both the scaling matrix and the preconditioner are illustrated. Finally, conclusions are drawn on the basis of the former sections, and the problems that remain to be investigated are stated.
2. Algorithm

2.1. Objective function of penalty method

The inverse problems of typical groundwater flow equations in confined and unconfined aquifers will be used as examples to illustrate the algorithm of the least-squares penalty method. Consider two-dimensional steady flow described by

−∇·(sT∇h) − Q + B(h − hr) = 0,  (x, y) ∈ Ω,   (1)

in inhomogeneous, isotropic aquifers, subject to the boundary conditions

h = hd(x, y),  (x, y) ∈ Γd,   (2)

sT ∂h/∂n = q(x, y),  (x, y) ∈ Γn,   (3)

where
Ω: Groundwater flow region. It is a given bounded horizontal domain in the (x, y) plane.
Γd and Γn: The Dirichlet and Neumann boundaries of Ω, respectively. Γd ∪ Γn = ∂Ω.
(x, y) and ∇: Spatial variables and the gradient operator.
h(x, y): Groundwater head of a confined aquifer or watertable of an unconfined aquifer. Unknown. Only finitely many measured values at given observation wells are available.
s(x, y, h): Specified aquifer-choice function. For a confined aquifer s ≡ 1; in this case (1)–(3) is a linear problem. For an unconfined aquifer s = (h − hb(x, y))/M [1,19], where hb denotes the given elevation of the aquifer bottom (with the same datum plane as h), and M is a positive constant describing the average thickness of the unconfined aquifer. In this case (1) represents the elliptic Boussinesq equation and (1)–(3) becomes a non-linear problem.
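In the unconfined case the coefficient s depends on the unknown h itself, which is what makes the problem quasilinear; a standard way to handle this is to lag (freeze) the coefficient at the current head and re-solve. The scalar sketch below illustrates the idea on a one-unknown analogue with a damped update; all numbers and names are ours, not from the paper.

```python
M, hb = 10.0, 0.0    # average thickness and bottom elevation (invented)
k, f = 2.0, 80.0     # stand-ins for the transmissivity factor and forcing

h = 15.0             # initial guess for the watertable
for _ in range(60):
    s = (h - hb) / M                    # freeze the coefficient at current h
    h_lin = f / (s * k)                 # solve the linearized equation s*k*h = f
    h_new = 0.5 * (h + h_lin)           # damping to keep the iteration stable
    if abs(h_new - h) < 1e-12:
        break
    h = h_new
print(round(h, 6))   # → 20.0 (exact solution of (h/M)*k*h = f)
```

Without the damping factor the plain fixed-point map h → f·M/(k·h) oscillates in this example, which is why practical iterative solvers for (9) in the quasilinear case typically include some form of relaxation.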
T(x, y): Transmissivity of the aquifer. Unknown. In this paper it is assumed that T is piecewise constant in Ω, i.e., there are np given subdomains (zones) Ωi of Ω (1 ≤ i ≤ np) such that

Ω = ∪_{i=1}^{np} Ωi,  Ωi ∩ Ωj = ∅,  i ≠ j (1 ≤ i, j ≤ np),   (4a)

T(x, y) = Ti,  (x, y) ∈ Ωi, 1 ≤ i ≤ np,   (4b)

where Ti (1 ≤ i ≤ np) are unknown positive constants.
Q(x, y): Recharge term determined by factors such as sources, sinks, precipitation, evapotranspiration, etc.
B(x, y) and hr: Specified leakage parameters. We assume B ≤ 0 for the sake of the well-posedness of (1)–(3).
∂/∂n: Normal derivative with respect to Γn.
hd(x, y): Specified head or watertable on Γd.
q(x, y): Specified flux on Γn.
The discrete governing equations of (1)±(3) are the
starting point of our method. One can discretize governing equations (1)±(3) using various methods such as
®nite-dierence, ®nite-volume or ®nite-element method
according to one's preference. In this paper we use linear
triangle ®nite element method to discretize (1)±(3). Let
f xi ; yi ; 1 6 i 6 ng be the position of the nodes of the
triangulation, and /i x; y 1 6 i 6 n be the linear basis
function corresponding to the ith discrete node xi ; yi .
Let
H : h1 ; . . . ; hn
T
5
be the head or watertable vector, where hi denotes the
®nite element approximation to h xi ; yi , superscript ``T''
indicates the transpose of a vector or matrix. Then
y and s x; y; h, the respective ®nite element aph x;
proximations of h x; y and s x; y; h, can be written as
n
X
hi /i x; y;
6
i1
s
n
X
s xi ; yi ; hi /i x; y:
7
i1
Discretized log-transmissivities Yi : log Ti ; 1 6 i 6 np ,
the recharge term Q, the leakage parameters B and hr ,
are constants on each triangle element e, i.e.,
Yi je Y e ;
9
where A is an n n matrix with elements
aij sij bij ;
10a
1 6 i; j 6 n
in which
(
wdP
dij ;
e R
sij
10Y e sr/i r/j dx dy;
i 2 D;
i 62 D;
10b
and
bij
T x; y Ti
y
h x;
A H ; P H F ;
e2Ti \Tj
i1
and
By applying the standard Galerkin ®nite element procedure, the discrete governing equations of (1)±(3) can
be written in matrix form as follows (see e.g., [19]):
Qje Qe ;
Bje Be ;
hr je her :
8
(
0; P
R
ÿ
Be e /i /j dx dy;
i 2 D;
i 62 D;
10c
e2Ti \Tj
where dij equals 1 when i j and 0 when i 6 j, wd is a
positive weighting constant that is chosen large enough
compared with all the other diagonal entries of A for the
discrete Dirichlet boundary conditions to be enforced at
least as accurately as other discrete governing equations.
e is the triangle element, Ti is the set of all the triangle
elements in the neighborhood of the ith node
xi ; yi 1 6 i 6 n), i.e., Ti fej xi ; yi is a vertex of eg.
D : fi : xi ; yi 2 Cd g is the serial number set of all the
discrete nodes on Cd . And as usual, P is the ``logtransmissivity'' vector, i.e.,
T
T
P : Y1 ; . . . ; Ynp log T1 ; . . . ; log Tnp :
11
F is an n-dimensional vector determined by forcing terms such as Q, B, h_r and the boundary conditions of (1)–(3), whose components can be written as

f_i = { w_d h_d(x_i, y_i),   i ∈ D,
      { Σ_{e ∈ T_i} (Q^e + B^e h_r^e) ∫_e φ_i dx dy + ∫_{Γ_n} q φ_i dΓ_n,   i ∉ D.    (12)

If the aquifer is confined (ŝ = 1), then either the conjugate gradient method or the Cholesky decomposition method is used to solve the discrete aquifer model (9); otherwise (ŝ = (h − h_b)/M), (9) is a non-linear problem and is solved by an iterative method.
As in [2], the observation head residual vector R_m and the parameter prior information residual vector R_p can be defined by

R_m(H) = H_s − H_m,    (13)

R_p(P) = P − P̄,    (14)

where H_s and H_m are n_m-dimensional vectors whose respective components denote the simulated and measured head values at the observation wells; n_m is the number of observation wells; and P̄ denotes an a priori estimate of P. In a departure from [2], we define a generalized residual vector of the discrete governing equations as

R_g(H, P) = U (A(H, P) H − F),    (15)
where U is a given n × n non-singular "scaling" matrix used to scale the elements of R_g. Later we will discuss how to choose it. The traditional least-squares method for aquifer parameter identification can then be written as [4–6,8]

min_P J_t(H, P) := J_m + J_p
subject to U^{-1} R_g(H, P) = 0,    (16)

where

J_m = R_m^T W_m R_m,    (17)

J_p = λ_p R_p^T W_p R_p,    (18)

and W_m is a given n_m × n_m symmetric positive definite matrix determined by the measurement errors of the head; W_p is a given n_p × n_p symmetric positive semi-definite matrix determined by the errors of the prior information on P; and λ_p is a non-negative weighting factor that measures the exactness and reliability of the prior information on the unknown parameters compared to those of the information on the models and head measurements.

The least-squares penalty method can be written as

min_{H,P} J_gmp(H, P) := J_m + J_p + J_g = J_t + J_g,    (19)

where

J_g = λ_g² R_g^T W_g R_g    (20)

is the penalty term enforcing the discrete governing equations, W_g is a symmetric positive definite weighting matrix, and λ_g is a weighting factor that controls the enforcement of the discrete governing equations. In this paper we set W_g = I_n, the nth-order identity matrix.
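The three terms of (19) can be assembled directly from the residuals (13)–(15). A minimal sketch, assuming dense NumPy arrays and our own argument names:

```python
import numpy as np

def penalty_objective(U, A, H, F, Hs, Hm, P, P_prior, Wm, Wp, lam_g, lam_p):
    """J_gmp = J_m + J_p + J_g of (17)-(20), with W_g = I_n as in the text.

    Dense arrays are used here for clarity; a real code would keep A and U sparse.
    """
    Rg = U @ (A @ H - F)            # generalized governing-equation residual (15)
    Rm = Hs - Hm                    # observation head residual (13)
    Rp = P - P_prior                # prior-information residual (14)
    Jm = Rm @ Wm @ Rm               # (17)
    Jp = lam_p * (Rp @ Wp @ Rp)     # (18)
    Jg = lam_g**2 * (Rg @ Rg)       # (20) with W_g = I_n
    return Jm + Jp + Jg
```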
The least-squares penalty method objective function in [2] is the special case of (19) with U = I_n. Such a choice of U causes non-uniformity among the components of R_g, because the matrix A in (15) has irregular weighting effects due to the great range of values of its entries. In order to analyse the weighting effects of A when U and W_g equal I_n, we define 0 < r_1 ≤ r_2 ≤ ... ≤ r_n as the eigenvalues of the matrix A^T A and V_i (1 ≤ i ≤ n) as the corresponding orthonormal eigenvectors. Then

V := (V_1, ..., V_n)    (21)

is an orthogonal matrix. Define R_h := H − A^{-1} F as the computational residual vector between the heads of the least-squares method and those of the direct method. Then from (15), (20) and (21) we have

J_g|_{U=I_n} = λ_g² (V^T R_h)^T Diag(r_1, ..., r_n) (V^T R_h).    (22)

Eq. (22) implies that the weighting factor of the ith component of V^T R_h is λ_g² r_i, 1 ≤ i ≤ n. Due to the non-uniformity of the spatial discretization and the large range of the transmissivity T values, the range of values of the entries of A is great. This results in a large condition number of A^T A, namely the ratio r_n / r_1 [20]. For instance, in the simple synthetic examples in [2], our calculations show that the largest and smallest diagonal elements of A^T A are respectively 4.5 × 10^5 and 37.5, with the latter corresponding to the node at the "northeast corner" of the domain in the example (Fig. 1 in [2] or this paper). It is well known that r_1 is less than or equal to the smallest diagonal element of A^T A and that r_n is greater than or equal to its greatest diagonal element. Hence the ratio r_n / r_1 ≥ 4.5 × 10^5 / 37.5 = 1.2 × 10^4. This explains the interesting phenomenon in the numerical examples in [2] that the node with the largest residual is always at the northeast corner of the square domain: its weighting factor corresponds to the smallest eigenvalue of A^T A.
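The two bounds invoked here follow from the Rayleigh quotient of A^T A evaluated at the unit coordinate vectors (e_i^T M e_i = M_ii must lie between the extreme eigenvalues), so they are easy to check numerically. The matrix below is a random stand-in with strongly scaled columns, not the example's A:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6)) * np.logspace(0, 3, 6)  # columns spanning three orders of magnitude
M = A.T @ A                                         # symmetric positive semi-definite
eig = np.linalg.eigvalsh(M)                         # sorted: r_1 <= ... <= r_n
d = np.diag(M)
# Rayleigh-quotient bounds: r_1 <= min(d) and max(d) <= r_n.
print(eig[-1] / eig[0])                             # the condition number r_n / r_1
```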
Hence we reach the following conclusion: only by minimizing (19) with a scaling matrix U that appropriately rectifies the non-uniform, irregular weighting effects of A^T A on the components of the residual vector R_g can one obtain a scaled, "stable" residual vector R_g independent of the wide range of values of the entries of A, and make it possible to exploit the deeper potential information contained in R_g. Moreover, the scaling matrix will improve the ill-conditioning caused by the wide range of values of the entries of A.
Therefore, a suitable scaling matrix U is necessary, especially for realistic problems with a wide range of entries in A. In this paper, three simple forms of the scaling matrix U are given; their effects will be shown and compared in the next section. These forms are:
(a) the identity matrix, i.e.,

U = I_n,    (23)

for which there is no "scaling" or improvement, as in [2];
(b) the infinite row norm form

U = U_∞ := Diag(a_11^{-1}, ..., a_nn^{-1}),    (24)

where Diag(...) indicates a diagonal matrix and a_ii is the ith diagonal element of A;
(c) the Euclidean row norm (2-norm) form

U = U_E := Diag(s_1^{-1}, ..., s_n^{-1}),    (25)

where s_i is the Euclidean norm of the ith row of A, i.e.,

s_i = (Σ_{j=1}^n a_ij²)^{1/2}.    (26)
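Both non-trivial choices (24)–(26) are diagonal and cheap to form from a sparse A. A sketch; the function name and the "inf"/"euclid" labels are our own conventions:

```python
import numpy as np
import scipy.sparse as sp

def scaling_matrix(A, kind):
    """Diagonal scaling matrices of (24)-(26) for a sparse matrix A.

    kind = "inf":    U_inf = Diag(1/a_11, ..., 1/a_nn)
    kind = "euclid": U_E   = Diag(1/s_1, ..., 1/s_n), s_i = 2-norm of row i
    """
    A = sp.csr_matrix(A)
    if kind == "inf":
        d = A.diagonal()
    elif kind == "euclid":
        d = np.sqrt(np.asarray(A.multiply(A).sum(axis=1)).ravel())
    else:
        raise ValueError(kind)
    return sp.diags(1.0 / d)
```

Applied as U @ (A @ H − F) in (15), either choice damps the rows of A with large norms so that the scaled residual components carry comparable weight.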
Although the above forms of U given by (24) and (25) depend on H and P, they are regarded as constant during each iteration of the Gauss–Newton method when minimizing (19) in the next section. The reason for doing so is that we are interested in improving or scaling the irregular weighting effects of A with the simplest calculation, without changing the original purpose of the least-squares penalty method. This means that the penalty term J_g is corrected adaptively at every iteration according to the change of the entries of A, which is mainly caused by changes in the parameter vector P.
2.2. Algorithm

Let

v = (v_1, ..., v_n, ..., v_{n+n_p})^T := (H^T, P^T)^T.    (27)

Then the objective function J_gmp defined by (19) is a non-linear function of the variable vector v, and the minimization must be carried out iteratively. Let i_it be the iteration index and v^0 be a given initial estimate of v. At every iteration we wish to determine a correction Δv of the current iterate v^{i_it} such that

v^{i_it+1} = v^{i_it} + Δv,   i_it = 0, 1, 2, ...    (28)

results in a reduction of the objective function J_gmp. The Gauss–Newton method for obtaining Δv leads to the following linear system of equations [11,12,21]:

J^T W J Δv = −J^T W R,    (29)

where

W = Diag(λ_g² W_g, W_m, λ_p W_p)    (30)

and R is the entire residual vector composed of R_g, R_m and R_p, i.e.,

R = (r_1, ..., r_{n+n_m+n_p})^T := (R_g^T, R_m^T, R_p^T)^T;    (31)

J is the Jacobian or sensitivity matrix at v = v^{i_it}, with elements

J_ij = ∂r_i/∂v_j |_{v=v^{i_it}},   1 ≤ i ≤ n+n_m+n_p,   1 ≤ j ≤ n+n_p.    (32)
This (n+n_m+n_p) × (n+n_p) matrix J is very large because the node number n is usually very large. Hence it would be time-consuming and require substantial computer storage to solve (29) directly. Note that (29) is equivalent to the following linear least-squares system:

min_{Δv} ‖W̃ J Δv + W̃ R‖_2,    (33)

where ‖·‖_2 indicates the L_2-norm of a vector and W̃ is the square root or Cholesky factor of the positive definite matrix W, i.e.,

W = W̃^T W̃.    (34)

Instead of solving (29) directly, we consider the linear least-squares system (33), which can be solved using the iterative LSQR method [17,18] or the direct QR factorization method [10,12]. In our numerical tests both methods are tried. For relatively small n (e.g., ≤ 100) both work well. When n is larger, the direct QR factorization method needs more storage while its speed is merely comparable to that of the iterative method. Hence the iterative LSQR method is preferred, because it can use a sparse matrix data structure to store the non-zero elements of J. Consequently, the node number n can be as large as that of the traditional method.
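A sketch of solving a system of the form (33) with SciPy's LSQR, using a random sparse matrix in place of W̃J and checking the result against a dense least-squares solve; all names here are ours:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(1)
# Random full-column-rank stand-in for W~J (40 x 10) and for the residual W~R.
B = sp.vstack([sp.random(30, 10, density=0.3, random_state=1),
               sp.identity(10)]).tocsr()
r = rng.normal(size=40)

# (33) asks for min ||B dv + r||_2, i.e., a least-squares solve against -r.
dv = lsqr(B, -r, atol=1e-12, btol=1e-12)[0]

# Dense reference solution of the same problem.
dv_ref = np.linalg.lstsq(B.toarray(), -r, rcond=None)[0]
```

LSQR only needs products of the form (W̃J)ξ and (W̃J)^T η, which is why a sparse data structure for J suffices, as the text points out.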
In order to solve (33) efficiently and accurately, a suitable preconditioning technique is needed. Let C be a suitable (n+n_p) × (n+n_p) non-singular matrix which can improve the condition number of the matrix W̃JC. Then instead of solving (33) directly, we first solve the following system for an (n+n_p)-dimensional unknown vector ξ:

min_ξ ‖W̃ J C ξ + W̃ R‖_2    (35)

and then calculate

Δv = C ξ.    (36)

The simplest form of C is a diagonal matrix. We wish to improve the condition number of W̃JC by choosing a diagonal matrix C := D_c2 such that all the diagonal elements of D_c2 J^T W J D_c2 are equal to 1. This leads to the following form of C:

C := D_c2 = Diag(q_1, ..., q_{n+n_p}),    (37)

where

q_j = (Σ_{i,k=1}^{n+n_m+n_p} J_ij w_ik J_kj)^{-1/2},   1 ≤ j ≤ n+n_p,    (38)

with w_ik denoting the (i, k) entry of W. In our numerical tests this simple procedure accelerates the convergence of the iterative LSQR algorithm significantly, and it also widens the range of initial values of the unknown parameter P within which our algorithm can converge (see Section 4, Table 1).
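When W is diagonal, as in the numerical examples below (W_g = I_n, W_m an identity, λ_p = 0), the sum in (38) collapses to the squared 2-norm of the jth column of W̃J, so D_c2 is just an inverse-column-norm scaling. A sketch under that assumption:

```python
import numpy as np
import scipy.sparse as sp

def column_scaling_preconditioner(WJ):
    """C = D_c2 of (37)-(38) for diagonal W: q_j = (sum_i (W~J)_ij^2)^(-1/2),
    so that the scaled normal matrix C J^T W J C has unit diagonal."""
    WJ = sp.csr_matrix(WJ)
    col_norms = np.sqrt(np.asarray(WJ.multiply(WJ).sum(axis=0)).ravel())
    return sp.diags(1.0 / col_norms)
```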
The main work in implementing the above algorithm is to calculate J. Calculating directly from (4a), (4b), (5)–(9), (10a)–(10c), (11)–(15), (27), (31) and (32), setting the matrix U according to the entries of A at the current iteration and regarding U as a constant matrix independent of v, J can be written as

      ( U(A + Z)   U P     )
J =   (   M_w       O      )    (39)
      (    O       I_{n_p} )

(here P denotes the n × n_p sensitivity matrix defined in (41), not the parameter vector), in which Z is an n × n matrix with elements

z_ij = Σ_{k ∈ N_i} (∂s_ik/∂h_j) h_k = ∂(AH − F)_i/∂h_j − a_ij

     = { 0,   if i ∈ D or j ∉ N_i,
       { Σ_{k ∈ N_i} h_k Σ_{e ∈ T_i ∩ T_k} ∫_e 10^{Y^e} s'_j φ_j ∇φ_i · ∇φ_k dx dy,   otherwise,    (40)
Table 1
Comparison of the effects of different residual-scaling matrices U and preconditioners C on the computational cost (a)

                                Uniform initial transmissivity T (m²/day)
Method                          1.0                  10.0                 100.0                200.0

(1) U = I_n, C = I_{n+n_p}
    CPU time (s)                ∞                    ∞                    134.4                177.0
    GN (Lsqr) Slsqr             Failed               Failed               9 (174–3342) 16360   11 (151–3079) 21557

(2) U = I_n, C = D_c2
    CPU time (s)                ∞                    71.1                 51.5                 55.0
    GN (Lsqr) Slsqr             Failed               14 (475–710) 8659    11 (374–719) 6264    12 (180–714) 6695

(3) U = U_∞, C = I_{n+n_p}
    CPU time (s)                ∞                    51.4                 33.0                 47.0
    GN (Lsqr) Slsqr             Failed               14 (385–609) 6256    10 (377–434) 4034    14 (365–441) 5732

(4) U = U_E, C = I_{n+n_p}
    CPU time (s)                ∞                    51.1                 32.8                 47.0
    GN (Lsqr) Slsqr             Failed               14 (415–587) 6221    10 (370–429) 3995    14 (374–438) 5728

(5) U = U_∞, C = D_c2
    CPU time (s)                32.6                 25.8                 16.3                 23.5
    GN (Lsqr) Slsqr             18 (199–279) 3974    15 (205–231) 3142    10 (187–206) 1980    14 (197–207) 2846

(6) U = U_E, C = D_c2
    CPU time (s)                31.5                 25.5                 17.9                 23.0
    GN (Lsqr) Slsqr             18 (193–265) 3840    15 (202–228) 3104    11 (183–203) 2175    14 (195–203) 2798

(7) B & Y
    CPU time (s)                18.6                 16.7                 19.5                 23.0
    BY                          12                   11                   13                   15

(8) C & N
    CN                          71                   54                   57                   90

(a) GN: total Gauss–Newton iterations. Lsqr: LSQR iterations within one Gauss–Newton iteration; it differs between Gauss–Newton steps, hence only its range is listed in parentheses. Every iteration of LSQR includes 3(n+n_p+n_m) + 5(n+n_p) multiplications and two matrix–vector products of the form (W̃J)ξ and (W̃J)^T η, where ξ and η are (n+n_p)-dimensional and (n+n_p+n_m)-dimensional vectors, respectively (see [17], p. 62). Slsqr: sum of Lsqr over all the Gauss–Newton iterations. Failed: GN > 100 with no indication that the minimizing iteration converges. CN: the number of iterations minimizing the traditional objective function defined by (16) using the "conjugate gradient algorithm coupled with Newton's method" described by Carrera and Neuman in [6]; the CN data are quoted from [6]. BY: the number of Gauss–Newton iterations minimizing the traditional objective function J_t defined by (16), with the sensitivity coefficients in every Gauss–Newton iteration approximated by the parameter perturbation and numerical difference quotient method described by Becker and Yeh [3,21].
where s'_j = (∂s/∂h)(x_j, y_j, h_j), and N_i is the index set of all the nodes in the neighborhood of the ith node (x_i, y_i). For linear aquifer models Z = 0 because ŝ = 1. M_w is an n_m × n matrix whose ith row is an n-dimensional vector with its n_i th component equal to 1 and the others 0; here n_i (1 ≤ i ≤ n_m) indicates the index of the node located at the ith head observation well. I_{n_p} is the n_p th-order identity matrix. P is an n × n_p matrix with elements

p_ij = Σ_{k ∈ N_i} (∂a_ik/∂Y_j) h_k

     = { 0,   if i ∈ D,
       { ln 10 Σ_{k ∈ N_i} h_k Σ_{e ∈ T_i ∩ T_k} ∫_e 10^{Y^e} ε_j^e ŝ ∇φ_i · ∇φ_k dx dy,   otherwise,    (41)

where

ε_j^e := ∂Y^e/∂Y_j = { 1 when e ⊂ X_j,
                      { 0 when e ⊄ X_j,     1 ≤ j ≤ n_p,    (42)

and X_j is defined by (4a) and (4b). Here it is assumed that each triangle element e is contained in one of the subdomains X_1, ..., X_{n_p}.
Based on the above discussion, the penalty method algorithm can be described as follows:
1. Initialization. Choose a suitable P^0 as the initial value of P. The initial value H^0 of H is obtained by solving (9) with P = P^0. Set v^0 = ((H^0)^T, (P^0)^T)^T and J^0 = 0. Choose two suitably small numbers δ_v > 0 and δ_J > 0 as the convergence criteria of the Gauss–Newton iteration. Choose two suitable n_p-dimensional vectors P^- and P^+ as the lower and upper bounds of the parameter vector P.
2. For i_it = 0, 1, 2, ... repeat steps 3–5.
3. Implement the Gauss–Newton method.
   (a) Calculate the Jacobian matrix J. Calculate J_ij|_{v=v^{i_it}} (1 ≤ i ≤ n+n_m+n_p, 1 ≤ j ≤ n+n_p) using (39)–(42), with the matrix U in (39) determined by the current entries of A. (In the numerical examples the three forms of U defined by (23)–(26) are used and their utilities are compared.)
   (b) Calculate Δv by solving the linear least-squares system (35) and (36) using the iterative LSQR algorithm [17,18].
   (c) Update v. Calculate v^{i_it+1} using (28).
4. Verify convergence. Calculate the value of J^{i_it+1} := J_gmp using (19). If

|J^{i_it+1} − J^{i_it}| < δ_J   or   |Δv|_∞ < δ_v    (43)

holds, stop, where |Δv|_∞ indicates the infinity norm of Δv; else set i_it := i_it + 1.
5. Implement the upper and lower bound constraints on P. Let P^{i_it+1} be the last n_p components of v^{i_it+1}. Check whether P^- ≤ P^{i_it+1} ≤ P^+ holds, where the inequality refers to the corresponding components of the three vectors. If any components of P^{i_it+1} are less than their lower bounds or greater than their upper bounds, set them equal to their lower or upper bound values, respectively. Then set the last n_p components of v^{i_it+1} equal to those of P^{i_it+1} and go to step 3.
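Steps 1–5 above can be sketched as the following loop; the four callables are hypothetical wrappers for (19), (31)–(32) and (35)–(36), not an API from the paper:

```python
import numpy as np

def gauss_newton_penalty(v0, objective, residual_jacobian, solve_lsq,
                         P_lo, P_hi, n_p, dJ=1e-7, dv=1e-4, max_it=100):
    """Schematic of the penalty-method Gauss-Newton loop (steps 2-5).

    objective(v)         -> J_gmp(v), Eq. (19)
    residual_jacobian(v) -> (R, J) of (31)-(32), with U refreshed from the current A
    solve_lsq(J, R)      -> the correction dv from (35)-(36)
    All four callables are assumptions of this sketch.
    """
    v, J_old = v0.copy(), 0.0
    for _ in range(max_it):
        R, J = residual_jacobian(v)                # step 3(a)
        step = solve_lsq(J, R)                     # step 3(b)
        v = v + step                               # step 3(c), Eq. (28)
        J_new = objective(v)                       # step 4, Eq. (43)
        if abs(J_new - J_old) < dJ or np.max(np.abs(step)) < dv:
            break
        J_old = J_new
        v[-n_p:] = np.clip(v[-n_p:], P_lo, P_hi)   # step 5: bounds on P
    return v
```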
3. Numerical examples

3.1. Example of linear aquifer model

In order to demonstrate the utility of our algorithm, the steady-state test problem used by Carrera and Neuman in [6] and Bentley in [2] is solved. It can be described by specifying the factors in the aquifer model (1)–(3) as follows (see [2,6]):
X: a square domain of 6000 m × 6000 m (see Fig. 1).
Γ_d and h_d: Γ_d is the southern side of the square domain; h_d = 100 m.
Γ_n and q: the other three sides of the domain are Γ_n. On the western side the flux q = 0.25 m²/day; on the northern and eastern sides there is no flow, i.e., q = 0.
Transmissivity T(x, y): the domain X is divided into n_p = 9 constant-transmissivity zones of 2000 m × 2000 m, with the corresponding transmissivity values T_i (1 ≤ i ≤ 9) ranging from 5 to 150 m²/day (see Fig. 1).
s(x, y, h), B(x, y) and h_r: the aquifer is assumed to be confined, i.e., s(x, y, h) = 1. The strata overlying and underlying the aquifer are supposed to be impervious, i.e., B(x, y) = 0. Hence B h_r = 0.
Q(x, y): in the upper half of the three northern zones Q = 0.274 × 10^{-3} m/day; in the lower half of them Q = 0.137 × 10^{-3} m/day; in the other six zones Q = 0.
Head measurements: available at the 18 observation wells shown as points in Fig. 1.

Fig. 1. The domain X, the 18 observation wells, the 9 piecewise-constant transmissivity values T and the zones.

In the computation, the flow region is discretized into 12 × 12 squares, each of which is divided into two triangle elements. All the head observation wells coincide with discrete nodes. In order to obtain head measurements at the observation wells, as in [2,6], the "true" head values at the discrete nodes, which range from 100.0 m on the southern Dirichlet boundary to 205.88 m at the northeast corner, were numerically calculated using the true transmissivity values shown in Fig. 1. Then the head values at the 18 observation wells were corrupted by uncorrelated Gaussian noise of variance 1. Consequently the head measurement weighting matrix W_m in (17) is set to be the n_m th-order identity matrix, as in [2,6].

To test the utility of our algorithm, the regularization term J_p containing prior information on P is not used in any of our calculations, i.e., λ_p = 0. But P is restricted from below and above by choosing P^- with uniform components log 0.01 = −2 and P^+ with uniform components log 10³ = 3. The Gauss–Newton iteration stopping criteria are chosen to be δ_J = 10^{-7} and δ_v = 10^{-4}.
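The synthetic-data step described here, corrupting true heads with uncorrelated unit-variance Gaussian noise and taking W_m as the identity, can be sketched as follows; the linspace stand-in for the true heads is our assumption, not the paper's computed field:

```python
import numpy as np

rng = np.random.default_rng(42)
h_true = np.linspace(100.0, 205.88, 18)   # stand-in for true heads at the 18 wells
noise = rng.normal(0.0, 1.0, size=18)     # uncorrelated Gaussian noise, variance 1
h_meas = h_true + noise                   # corrupted "measurements"
Wm = np.eye(18)                           # identity weighting matrix, as in [2,6]
```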
The numerical tests are composed of two parts. The first part is designed to investigate the effects of different forms of the residual scaling matrix U and of the preconditioner C on the calculation speed and the numerical results. For this purpose, three different forms of U, i.e., I_n, U_∞ and U_E defined by (23)–(26), two different forms of C, i.e., I_{n+n_p} and D_c2 defined by (37) and (38), and four different uniform initial values of T_i (1 ≤ i ≤ n_p = 9) as in [6], namely 1, 10, 100 and 200 m²/day, are used combinatorially. The weighting factor λ_g is fixed at 1.0 when U = I_n, in view of the great equivalent weighting effects of A^T A, and at 10.0 for the other two forms of U defined by (24)–(26), because they counteract the weighting effects of A^T A. Hence there are altogether 4 × 2 × 3 = 24 different cases.
In Table 1 the relevant data on the computational costs for the 24 cases are summarized. The first eight rows, corresponding to methods 1–4, show that if either the discrete governing equation residuals are not scaled (i.e., U = I_n) or no preconditioning technique is adopted (i.e., C = I_{n+n_p}), our algorithm performs poorly: the computational cost is higher than that of the traditional method, and, what is worse, the Gauss–Newton iterations fail to converge for some initial values of the unknown parameters. But the situation improves steadily with the gradual improvement of U and C.

The next four rows, corresponding to methods 5 and 6, demonstrate the improvements due to setting U = U_∞ or U_E (Eqs. (24)–(26)) and C = D_c2 (Eqs. (37) and (38)). The range of initial values of T for which the Gauss–Newton iteration converges has increased: there are no longer "Failed" calculations for any of the four different initial values of T, and the computational effort is also significantly reduced. For example, when the initial values of P are a uniform log 100 m²/day, Slsqr, the total number of LSQR iterations, is reduced from 16 360 for U = I_n and C = I_{n+n_p} to 1980 for U = U_∞ and C = D_c2, and the CPU time is reduced proportionally.

In the next two rows, which begin with "B & Y", we show the computational effort of the method described by Becker and Yeh [3,21], which uses a Gauss–Newton method to minimize the objective function J_t defined by (16) by means of the parameter perturbation and numerical difference quotient method. Comparison of the CPU times and numbers of iterations required by the "B & Y" method (i.e., BY) with the results for the improved scaling matrices demonstrates that the computational cost of our algorithm is reduced to a level similar to that of the traditional method. This trend shows the potential of our algorithm to solve inverse problems more quickly than the traditional method, provided more suitable forms of U and C can be found.

In the last row, CN, the number of iterations required by the "conjugate gradient algorithm coupled with Newton's method" used by Carrera and Neuman in [6] to minimize the traditional objective function J_t defined by (16), is quoted. One can see that GN, the number of Gauss–Newton iterations required by our algorithm, is less than CN. Unfortunately, the per-iteration computational costs of our algorithm and of the algorithm used in [6] are difficult to compare due to the lack of relevant data.

The five Failed cases in Table 1, in which the iterations fail to converge, can be made to converge if Marquardt's method [7,12] is incorporated into our algorithm, but this takes more iterations and CPU time than the non-failed cases listed in Table 1, where Marquardt's method is not used. The details are omitted here due to space limitations.
In Table 2 the numerical results corresponding to the 24 cases mentioned above and to the traditional method of Becker and Yeh [3,21] are summarized. Theoretically (i.e., if there were no round-off errors), different forms of the non-singular matrix C would not influence the calculated results. The numerical results support this: the optimization results depend neither on the initial values of T nor on the choice of C, up to five significant figures. Hence only results corresponding to the three different forms of U are listed in Table 2, and they are valid for all the combinations of initial values of T and forms of C in Table 1 that are not Failed. The last column gives the results calculated using the traditional method of Becker and Yeh.

The first four rows in Table 2 give, successively, the values of J_t, the traditional objective function appearing in (19); of J_g, the weighted square sum of the discrete governing equation residuals defined by (20); of J_gmp, the penalty method objective function defined in (19); and of the ratio J_g/J_gmp. It is well known that the greater the weighting factors of the residuals in a least-squares system, the smaller those residuals will be. Hence the ratio J_g/J_gmp can be regarded as a measure of the weighting effects of λ_g U on the discrete governing equation residuals. According to this measure, the weighting effects of λ_g U = 10.0 U_E are the weakest, and those of λ_g U = 1.0 I_n the strongest.

The data listed under "Identified T" and "The statistic data of log T" demonstrate that the numerical results for T differ very slightly, whether they are calculated using our algorithm with the three different choices of λ_g U or obtained by the traditional method of Becker and Yeh. For example, σ(ΔP), the standard deviation of the residuals between the computed and true log-transmissivities, ranges only from 2.78 × 10^{-3} to 3.33 × 10^{-3}, and SSE(ΔP), the standard square error between the computed and true log-transmissivities, only from 2.81 × 10^{-3} to 3.34 × 10^{-3}.

One may conjecture that the discrete governing equations are less accurately satisfied at the nodes coincident with the head observation wells than at other nodes, due to the measurement errors at the head observation wells. This conjecture can be seen and understood clearly if "The statistic data of R_gm and R_gr" are compared row by row, where R_gm is the n_m-dimensional vector composed of the components of R_g whose indices equal those of the nodes coincident with the head observation wells, and R_gr is the (n − n_m)-dimensional vector composed of the remaining components of R_g. For example, SSE(R_gm), the standard square error of R_gm, is much greater than SSE(R_gr) for all three choices of λ_g U: the ratio SSE(R_gm)/SSE(R_gr) ranges from 2.16 to 9.69. The other pair of statistics, |R_gm|_∞ and |R_gr|_∞, the respective infinity norms of R_gm and R_gr, show a similar phenomenon, with the ratio |R_gm|_∞/|R_gr|_∞ ranging from 1.96 to 3.06. That
Table 2
Comparison of the numerical results of our algorithm and of the traditional method

                        U = I_n (a)    U = U_∞        U = U_E        Traditional method
λ_g                     1.0            10.0           10.0

J_t                     16.0           14.8           14.5           16.2
J_g                     0.115          0.674          0.839          0.0
J_gmp                   16.1           15.5           15.3           –
J_g/J_gmp               0.715%         4.35%          5.47%          0.0%

Identified T (m²/day):
True T
T_1 = 150               158.71         159.10         158.51         160.66
T_2 = 150               137.20         134.04         133.32         136.71
T_3 = 50                51.608         51.667         51.652         51.679
T_4 = 150               123.02         120.77         120.18         123.25
T_5 = 50                49.756         49.973         50.016         49.758
T_6 = 15                15.516         15.482         15.474         15.511
T_7 = 50                66.537         67.600         67.651         67.256
T_8 = 15                15.028         14.998         14.991         15.025
T_9 = 5                 5.0254         5.0329         5.0329         5.0310

The statistic data of log T:
E(ΔP)                   5.89 × 10^-3   4.90 × 10^-3   4.24 × 10^-3   7.05 × 10^-3
σ(ΔP)                   2.78 × 10^-3   3.25 × 10^-3   3.33 × 10^-3   2.96 × 10^-3
SSE(ΔP)                 2.81 × 10^-3   3.27 × 10^-3   3.34 × 10^-3   3.01 × 10^-3

The statistic data of R_gm and R_gr:
λ_g² SSE(R_gm)          1.31 × 10^-3   2.01 × 10^-2   2.50 × 10^-2   0.0
λ_g² SSE(R_gr)          6.05 × 10^-4   2.08 × 10^-3   2.58 × 10^-3   0.0
SSE(R_gm)/SSE(R_gr)     2.16           9.66           9.69           –
|R_gm|_∞                0.140          0.0396         0.0373         0.0
|R_gr|_∞                0.0458         0.0170         0.0190         0.0
|R_gm|_∞/|R_gr|_∞       3.06           2.33           1.96           –

(a) C and the initial values of T can be any of the (non-Failed) combinations in Table 1. ΔP is the difference vector between the computed and true values of the parameter vector P defined by (11). R_gm is the n_m-dimensional vector composed of the components of the discrete governing equation residual vector R_g (see (15)) whose indices equal those of the discrete nodes coincident with the head observation wells; R_gr is the (n − n_m)-dimensional vector composed of the remaining components of R_g. For an arbitrary m-dimensional vector ε = (e_1, ..., e_m)^T, which may be ΔP, R_gm or R_gr, the functions E, σ, SSE and |·|_∞ are defined as follows: E(ε) := (1/m) Σ_{i=1}^m e_i is the mean; σ(ε) := [(1/m) Σ_{i=1}^m (e_i − E(ε))²]^{1/2} is the standard deviation; SSE(ε) := [(1/m) Σ_{i=1}^m e_i²]^{1/2} is the standard square error; |ε|_∞ := max{|e_1|, ..., |e_m|} is the infinity norm.
is to say, statistically, the residuals of the discrete governing equations at the head-observation-well nodes are much greater than those at the other nodes. Therefore, the above-mentioned conjecture is true in the statistical sense.

If "The statistic data of R_gm and R_gr" are compared column by column, the following interesting facts emerge: among the three groups of values listed in the three columns, the values of λ_g² SSE(R_gm) and λ_g² SSE(R_gr) in the first column, corresponding to λ_g U = 1.0 I_n, are always the smallest, but the values of |R_gm|_∞ and |R_gr|_∞ in the first column are always the biggest. These results imply that the components of the residual vector R_g are irregularly distributed when U = I_n, and that this is improved when U is set to U_∞ or U_E.

The second part of our numerical tests is designed to investigate the effects of the weighting factor λ_g on the numerical results. To this end the initial values of T are fixed at a uniform 100 m²/day, U = U_E and C = D_c2; then our algorithm is used to minimize the penalty method objective function J_gmp for different values of λ_g², namely 10.0, 10², 10⁴, 10⁶ and 10⁸. The results are listed in Table 3.

First let us look at the data listed above "The computation cost" in Table 3. If they are compared column by column, the following facts can be seen:
1. As λ_g² increases from 10.0 to 10⁶, the values of J_t and J_gmp tend monotonically to the same constant 16.2, and the values of J_g decrease at an approximate rate of λ_g^{-2}, so the ratio J_g/J_gmp also decreases at the same approximate rate λ_g^{-2}. In fact, it has been theoretically proven that J_t, J_gmp and J_g have the asymptotic properties

lim_{λ_g→∞} J_t = lim_{λ_g→∞} J_gmp = C_1,   lim_{λ_g→∞} λ_g² J_g = C_2,

where C_1 is the value of J_t corresponding to the traditional-method solution of problem (16), and C_2 is a non-negative constant that vanishes if and only if C_1 vanishes [13]. The numerical results in Table 3 show these properties clearly for λ_g² ≤ 10⁶, with the constants C_1 and C_2 approximately 16.2 and 94.1, respectively. Unfortunately, by λ_g² = 10⁸ they begin to
Table 3
Comparison of the effects of different weighting factors λ_g on the numerical results (a)

                    λ_g²
                    10.0                10²                 10⁴                 10⁶                 10⁸

J_t                 6.95                14.5                16.2                16.2                16.2
J_g                 4.06                8.38 × 10^-1        9.39 × 10^-3        9.41 × 10^-5        6.64 × 10^-1
J_gmp               11.01               15.3                16.2                16.2                16.9
J_g/J_gmp           36.9%               5.47%               0.058%              0.000581%           3.94%

Identified T (m²/day):
True T
T_1 = 150           130.14              158.51              160.68              160.70              160.76
T_2 = 150           104.24              133.32              136.69              136.72              136.71
T_3 = 50            50.335              51.652              51.675              51.675              51.677
T_4 = 150           94.447              120.18              123.12              123.15              122.96
T_5 = 50            50.690              50.016              49.754              49.752              49.756
T_6 = 15            15.254              15.474              15.511              15.511              15.511
T_7 = 50            70.870              67.651              67.326              67.325              67.495
T_8 = 15            14.779              14.991              15.024              15.024              15.021
T_9 = 5             5.0383              5.0329              5.0311              5.0311              5.0311

The statistic data of log T:
E(ΔP)               −2.85 × 10^-2       4.24 × 10^-3        7.01 × 10^-3        7.03 × 10^-3        7.09 × 10^-3
σ(ΔP)               9.44 × 10^-3        3.33 × 10^-3        2.95 × 10^-3        2.95 × 10^-3        2.99 × 10^-3
SSE(ΔP)             1.03 × 10^-2        3.34 × 10^-3        3.00 × 10^-3        3.00 × 10^-3        3.04 × 10^-3

The computation cost:
CPU time (s)        14.7                17.9                11.1                25.4                25.5
GN (Lsqr) Slsqr     12 (138–156) 1782   11 (183–203) 2175   7 (175–205) 1354    14 (104–325) 3089   20 (71–506) 3097

(a) These results are calculated with U = U_E, C = D_c2, and initial values of T_i (1 ≤ i ≤ 9) equal to a uniform 100 m²/day. The symbols in this table have the same meanings as those in Tables 1 and 2.
lose part of their fidelity due to the serious deterioration of the condition number of the weighting matrix W defined by (30).
2. When λ_g² = 10.0, J_g takes the comparatively large value 4.06 and the ratio J_g/J_gmp reaches 36.9%, implying that much of the head measurement noise is "absorbed" by the discrete governing equation residuals. This leads to the large errors in the estimated values of T_i (1 ≤ i ≤ 9) (see the column of Table 3 under λ_g² = 10.0). When λ_g² ≥ 10², accurate and stable estimates of T_i (1 ≤ i ≤ 9) are obtained, owing to the sufficient enforcement of the discrete governing equations, even in the case λ_g² = 10⁸; these estimates also differ very slightly from each other. Moreover, if one compares our results for T_i (1 ≤ i ≤ 9) when 10⁴ ≤ λ_g² ≤ 10⁸ with the traditional-method results for T_i listed in the last column of Table 2, one can easily see that each T_i (1 ≤ i ≤ 9) always has the same first three significant figures, with the single exception of T_7 = 67.5 when λ_g² = 10⁸.
3. The facts stated in the above paragraph are also supported by the data listed under "The statistic data of log T".

In the last two rows of Table 3, the CPU time and Slsqr, the sum of the LSQR iteration counts over all the Gauss–Newton iterations, are listed for the different values of λ_g². Although they differ from each other, the average CPU time is only 18.9 s, a little less than the 19.5 s CPU time of the traditional method (method 7, B & Y, with initial T = 100 m²/day in Table 1). In particular, when λ_g² = 10⁴, the CPU time of our algorithm is only 11.1 s, 56.9% of that of the traditional method.

In short, there is a sufficiently wide workable range of λ_g values within which the penalty method algorithm performs well enough to give accurate estimates of the unknown parameters T with low computational cost, just like the traditional method.
3.2. Example of a non-linear aquifer model described by the Boussinesq equation

In order to demonstrate the advantage of our algorithm that it can solve inverse problems of non-linear aquifer models naturally and quickly, an unconfined quasilinear aquifer model described by the Boussinesq equation [1,19] is artificially designed by s