SYSTEMS OF DIFFERENTIAL EQUATIONS

7.1 Introduction

Although the study of differential equations began in the 17th century, it was not until the 19th century that mathematicians realized that relatively few differential equations could be solved by elementary means. The work of Liouville and others showed the importance of establishing general theorems to guarantee the existence of solutions to certain specific classes of differential equations. Chapter 6 illustrated the use of an existence-uniqueness theorem in the study of linear differential equations. This chapter is concerned with a proof of this theorem and related topics.

Existence theory for differential equations of higher order can be reduced to the first-order case by the introduction of systems of equations. For example, the second-order linear equation

(7.1)   $y'' + a_1(t)\,y' + a_2(t)\,y = R(t)$

can be transformed to a system of two first-order equations by introducing two unknown functions $y_1$ and $y_2$, where

$y_1 = y, \qquad y_2 = y_1'.$

Then we have $y_2' = y_1'' = y''$, so (7.1) can be written as a system of two first-order equations:

(7.2)   $y_1' = y_2, \qquad y_2' = -a_2(t)\,y_1 - a_1(t)\,y_2 + R(t).$

We cannot solve the equations separately by the methods of Chapter 6 because each of them involves two unknown functions.

In this chapter we consider systems consisting of $n$ linear differential equations of first order involving $n$ unknown functions $y_1, \ldots, y_n$. These systems have the form

(7.3)   $y_i' = \sum_{k=1}^{n} p_{ik}(t)\,y_k + q_i(t), \qquad i = 1, 2, \ldots, n.$

The functions $p_{ik}$ and $q_i$ which appear in (7.3) are considered as given functions defined on a given interval $J$. The functions $y_1, \ldots, y_n$ are unknown functions to be determined. Systems of this type are called linear systems. In general, each equation in the system involves more than one unknown function, so the equations cannot be solved separately.

A linear differential equation of order $n$ can always be transformed to a linear system. Suppose the given $n$th-order equation is

(7.4)   $y^{(n)} + a_1(t)\,y^{(n-1)} + \cdots + a_n(t)\,y = R(t),$

where the coefficients $a_1, \ldots, a_n$ and $R$ are given functions. To transform this to a system we write $y_1 = y$ and introduce a new unknown function for each of the successive derivatives of $y$. That is, we put

$y_1 = y, \quad y_2 = y_1', \quad y_3 = y_2', \quad \ldots, \quad y_n = y_{n-1}',$

and rewrite (7.4) as the system

(7.5)   $y_1' = y_2, \quad y_2' = y_3, \quad \ldots, \quad y_{n-1}' = y_n,$
        $y_n' = -a_n(t)\,y_1 - a_{n-1}(t)\,y_2 - \cdots - a_1(t)\,y_n + R(t).$

The discussion of systems may be simplified considerably by the use of vector and matrix notation. Consider the general system (7.3) and introduce vector-valued functions $Y = (y_1, \ldots, y_n)$, $Q = (q_1, \ldots, q_n)$, and a matrix-valued function $P = [p_{ik}]$, defined by the equations

$Y(t) = (y_1(t), \ldots, y_n(t)), \qquad Q(t) = (q_1(t), \ldots, q_n(t)), \qquad P(t) = [p_{ik}(t)]$

for each $t$ in $J$. We regard the vectors as $n \times 1$ column matrices and write the system (7.3) in the simpler form

(7.6)   $Y' = P(t)\,Y + Q(t).$

For example, in system (7.2) we have

$Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \qquad P(t) = \begin{pmatrix} 0 & 1 \\ -a_2(t) & -a_1(t) \end{pmatrix}, \qquad Q(t) = \begin{pmatrix} 0 \\ R(t) \end{pmatrix}.$

In system (7.5) we have

$P(t) = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_n(t) & -a_{n-1}(t) & -a_{n-2}(t) & \cdots & -a_1(t) \end{pmatrix}, \qquad Q(t) = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ R(t) \end{pmatrix}.$

An initial-value problem for system (7.6) is to find a vector-valued function $Y$ which satisfies (7.6) and which also satisfies an initial condition of the form $Y(a) = B$, where $a \in J$ and $B = (b_1, \ldots, b_n)$ is a given $n$-dimensional vector.

In the case $n = 1$ (the scalar case) we know from Theorem 6.1 that, if $P$ and $Q$ are continuous on $J$, all solutions of (7.6) are given by the explicit formula

(7.7)   $Y(x) = e^{A(x)}\,Y(a) + e^{A(x)} \int_a^x e^{-A(t)}\,Q(t)\,dt,$

where $A(x) = \int_a^x P(t)\,dt$, and $a$ is any point in $J$. We will show that this formula can be suitably generalized for systems, that is, when $P(t)$ is an $n \times n$ matrix function and $Q(t)$ is an $n$-dimensional vector function. To do this we must assign a meaning to integrals of matrices and to exponentials of matrices. Therefore we digress briefly to discuss the calculus of matrix functions.

7.2 Calculus of matrix functions

The generalization of the concepts of integral and derivative for matrix functions is straightforward. If $P(t) = [p_{ij}(t)]$, we define the integral $\int_a^b P(t)\,dt$ by the equation

$\int_a^b P(t)\,dt = \left[ \int_a^b p_{ij}(t)\,dt \right].$

That is, the integral of the matrix $P(t)$ is the matrix obtained by integrating each entry of $P(t)$, assuming of course that each entry is integrable on $[a, b]$. The reader can verify that the linearity property for integrals generalizes to matrix functions.

Continuity and differentiability of matrix functions are also defined in terms of the entries. We say that a matrix function $P = [p_{ij}]$ is continuous at $t$ if each entry $p_{ij}$ is continuous at $t$. The derivative $P'$ is defined by differentiating each entry,

$P'(t) = [p_{ij}'(t)],$

whenever all the derivatives $p_{ij}'(t)$ exist. It is easy to verify the basic differentiation rules for sums and products. For example, if $P$ and $Q$ are differentiable matrix functions, we have

$(P + Q)' = P' + Q'$

if $P$ and $Q$ are of the same size, and we also have

$(PQ)' = PQ' + P'Q$

if the product $PQ$ is defined. The chain rule also holds. That is, if $F(t) = P[g(t)]$, where $P$ is a differentiable matrix function and $g$ is a differentiable scalar function, then $F'(t) = g'(t)\,P'[g(t)]$. The zero-derivative theorem and the first and second fundamental theorems of calculus are also valid for matrix functions. Proofs of these properties are requested in the next set of exercises.
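Since derivatives of matrix functions are defined entrywise, the product rule above is easy to test numerically. The following sketch (an illustration assuming NumPy is available; the two matrix functions and the evaluation point are arbitrary choices, not taken from the text) checks $(PQ)' = PQ' + P'Q$ against a central-difference approximation:

```python
import numpy as np

def num_deriv(F, t, h=1e-6):
    """Central-difference approximation to F'(t), applied entrywise."""
    return (F(t + h) - F(t - h)) / (2 * h)

# Two sample differentiable matrix functions and their exact derivatives.
P  = lambda t: np.array([[np.cos(t), t**2], [1.0, np.exp(t)]])
dP = lambda t: np.array([[-np.sin(t), 2*t], [0.0, np.exp(t)]])
Q  = lambda t: np.array([[t, 1.0], [np.sin(t), t**3]])
dQ = lambda t: np.array([[1.0, 0.0], [np.cos(t), 3*t**2]])

t = 0.7
lhs = num_deriv(lambda s: P(s) @ Q(s), t)   # (PQ)'(t), numerically
rhs = P(t) @ dQ(t) + dP(t) @ Q(t)           # P Q' + P' Q
print(np.max(np.abs(lhs - rhs)))            # ~1e-9: the rule checks out
```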

The definition of the exponential of a matrix is not so simple and requires further preparation. This is discussed in the next section.

7.3 Infinite series of matrices. Norms of matrices

Let $A = [a_{ij}]$ be an $n \times n$ matrix of real or complex entries. We wish to define the exponential $e^A$ in such a way that it possesses some of the fundamental properties of the ordinary real- or complex-valued exponential. In particular, we shall require the law of exponents in the form

(7.8)   $e^{sA}\,e^{tA} = e^{(s+t)A}$ for all real $s$ and $t$,

and the relation

(7.9)   $e^{O} = I,$

where $O$ and $I$ are the $n \times n$ zero and identity matrices, respectively. It might seem natural to define $e^A$ to be the matrix $[e^{a_{ij}}]$. However, this is unacceptable since it satisfies neither property (7.8) nor (7.9). Instead, we shall define $e^A$ by means of a power series expansion,

$e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}.$

We know that this formula holds if $A$ is a real or complex number, and we will prove that it implies properties (7.8) and (7.9) if $A$ is a matrix. Before we can do this we need to explain what is meant by a convergent series of matrices.

DEFINITION OF A CONVERGENT SERIES OF MATRICES. Given an infinite sequence $\{C_k\}$ of $m \times n$ matrices whose entries are real or complex numbers, denote the $ij$-entry of $C_k$ by $c_{ij}^{(k)}$. If all $mn$ series

(7.10)   $\sum_{k=1}^{\infty} c_{ij}^{(k)} \qquad (i = 1, \ldots, m; \; j = 1, \ldots, n)$

are convergent, then we say the series of matrices $\sum_{k=1}^{\infty} C_k$ is convergent, and its sum is defined to be the $m \times n$ matrix whose $ij$-entry is the sum of the series in (7.10).


A simple and useful test for convergence of a series of matrices can be given in terms of the norm of a matrix, a generalization of the absolute value of a number.

DEFINITION OF THE NORM OF A MATRIX. If $A = [a_{ij}]$ is an $m \times n$ matrix of real or complex entries, the norm of $A$, denoted by $\|A\|$, is defined to be the nonnegative number given by the formula

(7.11)   $\|A\| = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|.$

In other words, the norm of $A$ is the sum of the absolute values of all its entries. There are other definitions of norms that are sometimes used, but we have chosen this one because of the ease with which we can prove the following properties.

THEOREM 7.1. FUNDAMENTAL PROPERTIES OF NORMS. For rectangular matrices $A$ and $B$ and all real or complex scalars $c$ we have

$\|A + B\| \le \|A\| + \|B\|, \qquad \|AB\| \le \|A\|\,\|B\|, \qquad \|cA\| = |c|\,\|A\|.$

Proof. We prove only the result for $\|AB\|$, assuming that $A$ is $m \times n$ and $B$ is $n \times p$. The proofs of the others are simpler and are left as exercises. Writing $A = [a_{ik}]$, $B = [b_{kj}]$, we have $AB = \left[\sum_{k=1}^{n} a_{ik} b_{kj}\right]$, so from (7.11) we obtain

$\|AB\| = \sum_{i=1}^{m} \sum_{j=1}^{p} \left| \sum_{k=1}^{n} a_{ik} b_{kj} \right| \le \sum_{i=1}^{m} \sum_{j=1}^{p} \sum_{k=1}^{n} |a_{ik}|\,|b_{kj}| \le \left( \sum_{i=1}^{m} \sum_{k=1}^{n} |a_{ik}| \right)\left( \sum_{k=1}^{n} \sum_{j=1}^{p} |b_{kj}| \right) = \|A\|\,\|B\|.$

Note that in the special case $B = A$ the inequality for $\|AB\|$ becomes $\|A^2\| \le \|A\|^2$. By induction we also have

$\|A^k\| \le \|A\|^k \qquad \text{for } k = 1, 2, 3, \ldots.$

These inequalities will be useful in the discussion of the exponential matrix.
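These properties are also easy to observe numerically. A minimal sketch, assuming NumPy is available (the random matrices are arbitrary test data):

```python
import numpy as np

def matnorm(A):
    """The entrywise norm of (7.11): the sum of |a_ij| over all entries."""
    return np.abs(A).sum()

rng = np.random.default_rng(0)
A  = rng.standard_normal((3, 4))
A2 = rng.standard_normal((3, 4))
B  = rng.standard_normal((4, 2))

print(matnorm(A + A2) <= matnorm(A) + matnorm(A2))   # True: triangle inequality
print(matnorm(A @ B) <= matnorm(A) * matnorm(B))     # True: ||AB|| <= ||A|| ||B||

C = rng.standard_normal((3, 3))
for k in range(1, 6):                                 # ||C^k|| <= ||C||^k
    print(matnorm(np.linalg.matrix_power(C, k)) <= matnorm(C) ** k)
```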

The next theorem gives a useful sufficient condition for convergence of a series of matrices.

THEOREM 7.2. TEST FOR CONVERGENCE OF A MATRIX SERIES. If $\{C_k\}$ is a sequence of $m \times n$ matrices such that the series of norms $\sum_{k=1}^{\infty} \|C_k\|$ converges, then the matrix series $\sum_{k=1}^{\infty} C_k$ also converges.

Proof. Let the $ij$-entry of $C_k$ be denoted by $c_{ij}^{(k)}$. Since $|c_{ij}^{(k)}| \le \|C_k\|$, convergence of $\sum_k \|C_k\|$ implies absolute convergence of each series $\sum_k c_{ij}^{(k)}$. Hence each series $\sum_k c_{ij}^{(k)}$ is convergent, so the matrix series $\sum_k C_k$ is convergent.

7.4 Exercises

1. Verify that the linearity property of integrals also holds for integrals of matrix functions.

2. Verify each of the following differentiation rules for matrix functions, assuming $P$ and $Q$ are differentiable. In (a), $P$ and $Q$ must be of the same size so that $P + Q$ is meaningful. In (b), they need not be of the same size provided the products are meaningful. In (c) and (d), $Q$ is assumed to be nonsingular.
(a) $(P + Q)' = P' + Q'$.
(b) $(PQ)' = PQ' + P'Q$.
(c) $(Q^{-1})' = -Q^{-1} Q' Q^{-1}$.
(d) $(PQ^{-1})' = P'Q^{-1} - PQ^{-1}Q'Q^{-1}$.
3. (a) Let $P$ be a differentiable matrix function. Prove that the derivatives of $P^2$ and $P^3$ are given by the formulas

$(P^2)' = PP' + P'P, \qquad (P^3)' = P^2 P' + P P' P + P' P^2.$

(b) Guess a general formula for the derivative of $P^n$ and prove it by induction.

4. Let $P$ be a differentiable matrix function and let $g$ be a differentiable scalar function whose range is a subset of the domain of $P$. Define the composite function $F(t) = P[g(t)]$ and prove the chain rule, $F'(t) = g'(t)\,P'[g(t)]$.
5. Prove the zero-derivative theorem for matrix functions: If $P'(t) = O$ for every $t$ in an open interval $(a, b)$, then the matrix function $P$ is constant on $(a, b)$.
6. State and prove generalizations of the first and second fundamental theorems of calculus for matrix functions.
7. State and prove a formula for integration by parts in which the integrands are matrix functions.

8. Prove the following properties of matrix norms:

$\|A + B\| \le \|A\| + \|B\|, \qquad \|cA\| = |c|\,\|A\|.$

9. If a matrix function $P$ is integrable on an interval $[a, b]$, prove that

$\left\| \int_a^b P(t)\,dt \right\| \le \int_a^b \|P(t)\|\,dt.$

10. Let $D$ be an $n \times n$ diagonal matrix, say $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Prove that the matrix series $\sum_{k=0}^{\infty} D^k/k!$ converges and is also a diagonal matrix,

$\sum_{k=0}^{\infty} \frac{D^k}{k!} = \mathrm{diag}(e^{\lambda_1}, \ldots, e^{\lambda_n}).$

(The term corresponding to $k = 0$ is understood to be the identity matrix $I$.)
11. Let $D$ be an $n \times n$ diagonal matrix, $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. If the matrix series $\sum_k c_k D^k$ converges, prove that its sum is the diagonal matrix

$\mathrm{diag}\left( \sum_k c_k \lambda_1^k, \ldots, \sum_k c_k \lambda_n^k \right).$

12. Assume that the matrix series $\sum_k C_k$ converges, where each $C_k$ is an $n \times n$ matrix. Prove that the matrix series $\sum_k (A C_k B)$ also converges and that its sum is the matrix $A \left( \sum_k C_k \right) B$. Here $A$ and $B$ are matrices such that the products are meaningful.


7.5 The exponential matrix

Using Theorem 7.2 it is easy to prove that the matrix series

(7.12)   $\sum_{k=0}^{\infty} \frac{A^k}{k!}$

converges for every square matrix $A$ with real or complex entries. (The term corresponding to $k = 0$ is understood to be the identity matrix $I$.) The norm of each term satisfies the inequality

$\left\| \frac{A^k}{k!} \right\| \le \frac{\|A\|^k}{k!}.$

Since the series $\sum_k a^k/k!$ converges for every real $a$, Theorem 7.2 implies that the series in (7.12) converges for every square matrix $A$.

DEFINITION OF THE EXPONENTIAL MATRIX. For any $n \times n$ matrix $A$ with real or complex entries we define the exponential $e^A$ to be the $n \times n$ matrix given by the convergent series in (7.12). That is,

$e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}.$

Note that this definition implies $e^{O} = I$, where $O$ is the zero matrix. Further properties of the exponential will be developed with the help of differential equations.
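As a numerical illustration, the partial sums of (7.12) can be compared with a library implementation of the matrix exponential. A sketch assuming NumPy and SciPy are available; the test matrix is an arbitrary choice:

```python
import numpy as np
from scipy.linalg import expm

def exp_series(A, terms=30):
    """Partial sum of (7.12): sum_{k=0}^{terms-1} A^k / k!."""
    S = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k          # A^k/k! built from A^{k-1}/(k-1)!
        S = S + term
    return S

A = np.array([[0.0, 1.0], [-2.0, -3.0]])         # arbitrary test matrix
print(np.max(np.abs(exp_series(A) - expm(A))))   # agrees to machine precision
```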

7.6 The differential equation satisfied by $e^{tA}$

Let $t$ be a real number, let $A$ be an $n \times n$ matrix, and let $E(t)$ be the $n \times n$ matrix given by

$E(t) = e^{tA}.$

We shall keep $A$ fixed and study this matrix as a function of $t$. First we obtain a differential equation satisfied by $E$.

THEOREM 7.3. For every real $t$ the matrix function $E$ defined by $E(t) = e^{tA}$ satisfies the matrix differential equation

$E'(t) = E(t)A = AE(t).$

Proof. From the definition of the exponential matrix we have

$E(t) = \sum_{k=0}^{\infty} \frac{t^k A^k}{k!}.$

Let $a_{ij}^{(k)}$ denote the $ij$-entry of $A^k$. Then the $ij$-entry of $t^k A^k / k!$ is $t^k a_{ij}^{(k)}/k!$. Hence, from the definition of a matrix series, we have

(7.13)   $E(t) = \left[ \sum_{k=0}^{\infty} \frac{t^k a_{ij}^{(k)}}{k!} \right].$

Each entry on the right of (7.13) is a power series in $t$, convergent for all $t$. Therefore its derivative exists for all $t$ and is given by the differentiated series

$\sum_{k=1}^{\infty} \frac{t^{k-1} a_{ij}^{(k)}}{(k-1)!}.$

This shows that the derivative $E'(t)$ exists and is given by the matrix series

$E'(t) = \sum_{k=1}^{\infty} \frac{t^{k-1} A^k}{(k-1)!} = \left( \sum_{k=0}^{\infty} \frac{t^k A^k}{k!} \right) A = E(t)A.$

In the last equation we used the property $A^{k+1} = A^k A$. Since $A$ commutes with $A^k$, we could also have written $A^{k+1} = A A^k$ to obtain the relation $E'(t) = AE(t)$. This completes the proof.

Note: The foregoing proof also shows that $A$ commutes with $e^{tA}$.

7.7 Uniqueness theorem for the matrix differential equation $F'(t) = AF(t)$

In this section we prove a uniqueness theorem which characterizes all solutions of the matrix differential equation $F'(t) = AF(t)$. The proof makes use of the following theorem.

THEOREM 7.4. NONSINGULARITY OF $e^{tA}$. For any $n \times n$ matrix $A$ and any scalar $t$ we have

(7.14)   $e^{tA}\,e^{-tA} = I.$

Hence $e^{tA}$ is nonsingular, and its inverse is $e^{-tA}$.

Proof. Let $F$ be the matrix function defined for all real $t$ by the equation

$F(t) = e^{tA}\,e^{-tA}.$

We shall prove that $F(t)$ is the identity matrix by showing that the derivative $F'(t)$ is the zero matrix. Differentiating $F$ as a product, using the result of Theorem 7.3, we find

$F'(t) = e^{tA}(-A)e^{-tA} + A e^{tA}\,e^{-tA} = O,$

since $A$ commutes with $e^{tA}$. Therefore, by the zero-derivative theorem, $F$ is a constant matrix. But $F(0) = e^{O} e^{O} = I$, so $F(t) = I$ for all $t$. This proves (7.14).

THEOREM 7.5. UNIQUENESS THEOREM. Let $A$ and $B$ be given $n \times n$ constant matrices. Then the only $n \times n$ matrix function $F$ satisfying the initial-value problem

$F'(t) = AF(t), \qquad F(0) = B,$

for $-\infty < t < +\infty$ is

(7.15)   $F(t) = e^{tA} B.$

Proof. First we note that $e^{tA}B$ is a solution. Now let $F$ be any solution and consider the matrix function

$G(t) = e^{-tA} F(t).$

Differentiating this product we obtain

$G'(t) = e^{-tA} F'(t) - A e^{-tA} F(t) = e^{-tA} A F(t) - A e^{-tA} F(t) = O,$

since $A$ commutes with $e^{-tA}$. Therefore $G(t)$ is a constant matrix,

$G(t) = G(0) = F(0) = B.$

In other words, $e^{-tA} F(t) = B$. Multiplying by $e^{tA}$ and using (7.14) we obtain (7.15).

Note: The same type of proof shows that $F(t) = B e^{tA}$ is the only solution of the initial-value problem

$F'(t) = F(t)A, \qquad F(0) = B.$

7.8 The law of exponents for exponential matrices

The law of exponents $e^{A+B} = e^A e^B$ is not always true for matrix exponentials. A counterexample is given in Exercise 13 of Section 7.12. However, it is not difficult to prove that the formula is true for matrices $A$ and $B$ which commute.

THEOREM 7.6. Let $A$ and $B$ be two $n \times n$ matrices which commute, $AB = BA$. Then we have

(7.16)   $e^{A+B} = e^A e^B.$

Proof. From the equation $AB = BA$ we find that

$A^2 B = A(AB) = A(BA) = (AB)A = (BA)A = BA^2,$

so $B$ commutes with $A^2$. By induction, $B$ commutes with every power of $A$. By writing $e^{tA}$ as a power series we find that $B$ also commutes with $e^{tA}$ for every real $t$. Now let $F$ be the matrix function defined by the equation

$F(t) = e^{t(A+B)} - e^{tA} e^{tB}.$

Differentiating $F(t)$ and using the fact that $B$ commutes with $e^{tA}$ we find

$F'(t) = (A + B)e^{t(A+B)} - e^{tA} A e^{tB} - e^{tA} e^{tB} B = (A + B)e^{t(A+B)} - (A + B)e^{tA} e^{tB} = (A + B)F(t).$

By the uniqueness theorem we have

$F(t) = e^{t(A+B)} F(0).$

But $F(0) = O$, so $F(t) = O$ for all $t$. Hence $e^{t(A+B)} = e^{tA} e^{tB}$. When $t = 1$ we obtain (7.16).

EXAMPLE. The matrices $sA$ and $tA$ commute for all scalars $s$ and $t$. Hence we have

$e^{sA} e^{tA} = e^{(s+t)A}.$
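Both Theorem 7.6 and its failure for noncommuting matrices are easy to observe numerically. A sketch assuming SciPy's expm is available; the matrices are illustrative choices (the first $B$ is a polynomial in $A$ and hence commutes with it):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0], [0.0, 3.0]])
B = 2 * A @ A - A                      # a polynomial in A, so AB = BA
print(np.allclose(expm(A + B), expm(A) @ expm(B)))   # True: commuting case

C = np.array([[0.0, 1.0], [0.0, 0.0]])
D = np.array([[0.0, 0.0], [1.0, 0.0]])               # CD != DC
print(np.allclose(expm(C + D), expm(C) @ expm(D)))   # False: the law fails
```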

7.9 Existence and uniqueness theorems for homogeneous linear systems with constant coefficients

The vector differential equation $Y'(t) = A\,Y(t)$, where $A$ is an $n \times n$ constant matrix and $Y$ is an $n$-dimensional vector function (regarded as an $n \times 1$ column matrix), is called a homogeneous linear system with constant coefficients. We shall use the exponential matrix to give an explicit formula for the solution of such a system.

THEOREM 7.7. Let $A$ be a given $n \times n$ constant matrix and let $B$ be a given $n$-dimensional vector. Then the initial-value problem

(7.17)   $Y'(t) = A\,Y(t), \qquad Y(0) = B,$

has a unique solution on the interval $-\infty < t < +\infty$. This solution is given by the formula

(7.18)   $Y(t) = e^{tA} B.$

More generally, the unique solution of the initial-value problem

$Y'(t) = A\,Y(t), \qquad Y(a) = B,$

is $Y(t) = e^{(t-a)A} B$.

Proof. Differentiation of (7.18) gives us $Y'(t) = A e^{tA} B = A\,Y(t)$. Since $Y(0) = B$, this is a solution of the initial-value problem (7.17).

To prove that it is the only solution we argue as in the proof of Theorem 7.5. Let $Z(t)$ be another vector function satisfying $Z'(t) = AZ(t)$ with $Z(0) = B$, and let $G(t) = e^{-tA} Z(t)$. Then we easily verify that $G'(t) = O$, so

$G(t) = G(0) = Z(0) = B.$

In other words, $e^{-tA} Z(t) = B$, so $Z(t) = e^{tA} B = Y(t)$. The more general case with initial value $Y(a) = B$ is treated in exactly the same way.
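A quick numerical check of Theorem 7.7: the closed form $e^{tA}B$ should agree with a general-purpose ODE integrator. A sketch assuming SciPy is available; the matrix and initial vector are hypothetical data, not taken from the text:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-4.0, 0.0]])   # hypothetical system matrix
B = np.array([1.0, 0.0])                  # initial vector Y(0) = B

t = 1.5
Y_exact = expm(t * A) @ B                 # Theorem 7.7: Y(t) = e^{tA} B

sol = solve_ivp(lambda s, y: A @ y, (0.0, t), B, rtol=1e-10, atol=1e-12)
print(np.max(np.abs(sol.y[:, -1] - Y_exact)))   # ~1e-9
```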

7.10 The problem of calculating $e^{tA}$

Although Theorem 7.7 gives an explicit formula for the solution of a homogeneous system with constant coefficients, there still remains the problem of actually computing the exponential matrix $e^{tA}$. If we were to calculate $e^{tA}$ directly from the series definition we would have to compute all the powers $A^k$ for $k = 0, 1, 2, \ldots$, and then compute the sum of each series $\sum_{k=0}^{\infty} t^k a_{ij}^{(k)}/k!$, where $a_{ij}^{(k)}$ is the $ij$-entry of $A^k$. In general this is a hopeless task unless $A$ is a matrix whose powers may be readily calculated. For example, if $A$ is a diagonal matrix, say

$A = \mathrm{diag}(\lambda_1, \ldots, \lambda_n),$

then every power of $A$ is also a diagonal matrix; in fact,

$A^k = \mathrm{diag}(\lambda_1^k, \ldots, \lambda_n^k).$

Therefore in this case $e^{tA}$ is a diagonal matrix given by

$e^{tA} = \sum_{k=0}^{\infty} \frac{t^k}{k!}\,\mathrm{diag}(\lambda_1^k, \ldots, \lambda_n^k) = \mathrm{diag}(e^{t\lambda_1}, \ldots, e^{t\lambda_n}).$

Another easy case to handle is when $A$ is a matrix which can be diagonalized. For example, if there is a nonsingular matrix $C$ such that $C^{-1} A C$ is a diagonal matrix, say $C^{-1} A C = D$, then we have $A = C D C^{-1}$, from which we find

$A^2 = (C D C^{-1})(C D C^{-1}) = C D^2 C^{-1}$

and, more generally,

$A^k = C D^k C^{-1}.$

Therefore in this case we have

$e^{tA} = \sum_{k=0}^{\infty} \frac{t^k A^k}{k!} = C \left( \sum_{k=0}^{\infty} \frac{t^k D^k}{k!} \right) C^{-1} = C e^{tD} C^{-1}.$

Here the difficulty lies in determining $C$ and its inverse. Once these are known, $e^{tA}$ is easily calculated. Of course, not every matrix can be diagonalized, so the usefulness of the foregoing remarks is limited.
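When $A$ is diagonalizable, the recipe $e^{tA} = Ce^{tD}C^{-1}$ is mechanical. A sketch assuming NumPy and SciPy are available; the sample matrix (which has the distinct eigenvalues 6 and 1, as in Example 1 below) is an illustrative choice:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[5.0, 4.0], [1.0, 2.0]])    # sample matrix, eigenvalues 6 and 1
lam, C = np.linalg.eig(A)                 # columns of C are eigenvectors
t = 0.3
etA = C @ np.diag(np.exp(t * lam)) @ np.linalg.inv(C)   # C e^{tD} C^{-1}
print(np.max(np.abs(etA - expm(t * A))))                # ~1e-15
```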

EXAMPLE 1. Calculate $e^{tA}$ for a $2 \times 2$ matrix $A$ whose eigenvalues are $\lambda_1 = 6$ and $\lambda_2 = 1$.

Solution. Since the eigenvalues are distinct, there is a nonsingular matrix $C = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ such that $C^{-1}AC = D$, where $D = \mathrm{diag}(6, 1)$. To determine $C$ we can write $AC = CD$; multiplying the matrices, we find that this equation is satisfied precisely when the first column of $C$ is an eigenvector of $A$ belonging to the eigenvalue 6 and the second column is an eigenvector belonging to the eigenvalue 1. Therefore

$e^{tA} = C e^{tD} C^{-1} = C \begin{pmatrix} e^{6t} & 0 \\ 0 & e^{t} \end{pmatrix} C^{-1}.$

EXAMPLE 2. Solve the linear system $Y'(t) = A\,Y(t)$ subject to a prescribed initial condition $Y(0)$, where $A$ is the matrix of Example 1.

Solution. By Theorem 7.7 the solution is

$Y(t) = e^{tA}\,Y(0).$

Using the matrix $e^{tA}$ calculated in Example 1, we obtain $Y(t)$ by carrying out the multiplication $C e^{tD} C^{-1}\,Y(0)$; each component of the solution is then a linear combination of $e^{6t}$ and $e^{t}$.

There are many methods known for calculating $e^{tA}$ when $A$ cannot be diagonalized. Most of these methods are rather complicated and require preliminary matrix transformations, the nature of which depends on the multiplicities of the eigenvalues of $A$. In a later section we shall discuss a practical and straightforward method for calculating $e^{tA}$ which can be used whether or not $A$ can be diagonalized. It is valid for all matrices $A$ and requires no preliminary transformations of any kind. This method was developed by E. J. Putzer in a paper in the American Mathematical Monthly, Vol. 73 (1966), pp. 2-7. It is based on a famous theorem attributed to Arthur Cayley (1821-1895) and William Rowan Hamilton (1805-1865) which states that every square matrix satisfies its characteristic equation. First we shall prove the Cayley-Hamilton theorem, and then we shall use it to obtain Putzer's formulas for calculating $e^{tA}$.

7.11 The Cayley-Hamilton theorem

THEOREM 7.8. CAYLEY-HAMILTON THEOREM. Let $A$ be an $n \times n$ matrix and let

(7.19)   $f(\lambda) = \det(\lambda I - A) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1 \lambda + c_0$

be its characteristic polynomial. Then $f(A) = O$. In other words, $A$ satisfies the equation

(7.20)   $A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I = O.$

Proof. The proof is based on Theorem 3.12, which states that for any square matrix $A$ we have

(7.21)   $A\,(\mathrm{cof}\,A)^t = (\det A)\,I.$

We apply this formula with $A$ replaced by $\lambda I - A$. Since $\det(\lambda I - A) = f(\lambda)$, Equation (7.21) becomes

(7.22)   $(\lambda I - A)\{\mathrm{cof}\,(\lambda I - A)\}^t = f(\lambda)\,I.$

This equation is valid for all real $\lambda$. The idea of the proof is to show that it is also valid when $\lambda$ is replaced by $A$.

The entries of the matrix $\mathrm{cof}\,(\lambda I - A)$ are the cofactors of $\lambda I - A$. Except for a factor $\pm 1$, each such cofactor is the determinant of a minor of $\lambda I - A$ of order $n - 1$. Therefore each entry of $\mathrm{cof}\,(\lambda I - A)$, and hence of $\{\mathrm{cof}\,(\lambda I - A)\}^t$, is a polynomial in $\lambda$ of degree $\le n - 1$. Therefore

(7.23)   $\{\mathrm{cof}\,(\lambda I - A)\}^t = \sum_{k=0}^{n-1} \lambda^k B_k,$

where each coefficient $B_k$ is an $n \times n$ matrix with scalar entries. Using this in (7.22) we obtain the relation

$(\lambda I - A) \sum_{k=0}^{n-1} \lambda^k B_k = f(\lambda)\,I,$

which can be rewritten in the form

(7.24)   $\lambda^n B_{n-1} + \sum_{k=1}^{n-1} \lambda^k (B_{k-1} - A B_k) - A B_0 = \lambda^n I + \sum_{k=1}^{n-1} c_k \lambda^k I + c_0 I.$

At this stage we equate coefficients of like powers of $\lambda$ in (7.24) to obtain the equations

(7.25)   $B_{n-1} = I, \qquad B_{k-1} - A B_k = c_k I \quad (k = 1, \ldots, n-1), \qquad -A B_0 = c_0 I.$

Equating coefficients is permissible because (7.24) is equivalent to $n^2$ scalar equations, in each of which we may equate coefficients of like powers of $\lambda$. Now we multiply the equations in (7.25) in succession by $A^n, A^{n-1}, \ldots, A, I$ and add the results. The terms on the left cancel and we obtain

$O = A^n + c_{n-1} A^{n-1} + \cdots + c_1 A + c_0 I.$

This proves the Cayley-Hamilton theorem.

Note: Hamilton proved the theorem in 1853 for a special class of matrices. A few years later, Cayley announced that the theorem is true for all matrices, but gave no proof.

EXAMPLE. Suppose a $3 \times 3$ matrix $A$ has characteristic polynomial

$f(\lambda) = \lambda^3 - 9\lambda^2 + 20\lambda - 12.$

The Cayley-Hamilton theorem states that $A$ satisfies the equation

(7.26)   $A^3 - 9A^2 + 20A - 12I = O.$

This equation can be used to express $A^3$ and all higher powers of $A$ in terms of $I$, $A$, and $A^2$. For example, we have

$A^3 = 9A^2 - 20A + 12I, \qquad A^4 = 9A^3 - 20A^2 + 12A = 61A^2 - 168A + 108I.$

It can also be used to express $A^{-1}$ as a polynomial in $A$. From (7.26) we write $A(A^2 - 9A + 20I) = 12I$, and we obtain

$A^{-1} = \tfrac{1}{12}(A^2 - 9A + 20I).$
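The Cayley-Hamilton theorem itself can be verified numerically for any particular matrix: compute the coefficients of the characteristic polynomial and evaluate the polynomial at $A$. A sketch assuming NumPy is available (np.poly returns the coefficients of $\det(\lambda I - A)$); the random matrix is arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
c = np.poly(A)        # coefficients of det(lambda*I - A), leading coefficient 1

# Evaluate f(A) = A^3 + c[1] A^2 + c[2] A + c[3] I; it should be the zero matrix.
n = A.shape[0]
f_of_A = sum(c[k] * np.linalg.matrix_power(A, n - k) for k in range(n + 1))
print(np.max(np.abs(f_of_A)))        # ~1e-14
```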


7.12 Exercises

In each of Exercises 1 through 4, (a) express $A^{-1}$, $A^2$, and all higher powers of $A$ as a linear combination of $I$ and $A$. (The Cayley-Hamilton theorem can be of help.) (b) Calculate $e^{tA}$.

5. (a) If $A = \begin{pmatrix} a & b \\ 0 & a \end{pmatrix}$, prove that

$e^{tA} = e^{at} \begin{pmatrix} 1 & bt \\ 0 & 1 \end{pmatrix}.$

(b) Find a corresponding formula for $e^{tA}$ when

$A = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}, \qquad a, b \text{ real}.$

6. If $F(t) = \begin{pmatrix} t & t-1 \\ \dots & \dots \end{pmatrix}$, prove that $\dots$

7. If $A(t)$ is a scalar function of $t$, the derivative of $e^{A(t)}$ is $A'(t)\,e^{A(t)}$. Compute the derivative of $e^{A(t)}$ when

$A(t) = \begin{pmatrix} 1 & t \\ 0 & 0 \end{pmatrix},$

and show that the result is not equal to either of the two products $A'(t)\,e^{A(t)}$ or $e^{A(t)}\,A'(t)$.

In each of Exercises 8 through 10, (a) calculate $A^2$ and $A^3$, and express $A^n$ as a linear combination of $I$, $A$, and $A^2$; (b) calculate $e^{tA}$.

13. This example shows that the equation $e^{A+B} = e^A e^B$ is not always true for matrix exponentials. Compute each of the matrices $e^A e^B$, $e^B e^A$, and $e^{A+B}$ when $A = \dots$ and $B = \dots$, and note that the three results are distinct.

7.13 Putzer's method for calculating $e^{tA}$

The Cayley-Hamilton theorem shows that the $n$th power of any $n \times n$ matrix $A$ can be expressed as a linear combination of the lower powers $I, A, A^2, \ldots, A^{n-1}$. It follows that every one of the higher powers $A^{n+1}, A^{n+2}, \ldots$ can also be expressed as a linear combination of $I, A, \ldots, A^{n-1}$. Therefore, in the infinite series defining $e^{tA}$, each term $t^k A^k/k!$ with $k \ge n$ is a linear combination of $I, A, \ldots, A^{n-1}$. Hence we can expect that $e^{tA}$ should be expressible as a polynomial in $A$ of the form

(7.27)   $e^{tA} = \sum_{k=0}^{n-1} q_k(t)\,A^k,$

where the scalar coefficients $q_k(t)$ depend on $t$. Putzer developed two useful methods for expressing $e^{tA}$ as a polynomial in $A$. The next theorem describes the simpler of the two methods.

THEOREM 7.9. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of an $n \times n$ matrix $A$, and define a sequence of polynomials in $A$ as follows:

(7.28)   $P_0(A) = I, \qquad P_k(A) = \prod_{m=1}^{k} (A - \lambda_m I) \quad \text{for } k = 1, 2, \ldots, n.$

Then we have

(7.29)   $e^{tA} = \sum_{k=0}^{n-1} r_{k+1}(t)\,P_k(A),$

where the scalar coefficients $r_1(t), \ldots, r_n(t)$ are determined recursively from the system of linear differential equations

(7.30)   $r_1'(t) = \lambda_1 r_1(t), \qquad r_1(0) = 1,$
         $r_{k+1}'(t) = \lambda_{k+1}\,r_{k+1}(t) + r_k(t), \qquad r_{k+1}(0) = 0 \quad (k = 1, 2, \ldots, n-1).$

Note: Equation (7.29) does not express $e^{tA}$ directly in powers of $A$ as indicated in (7.27), but as a linear combination of the polynomials $P_0(A), P_1(A), \ldots, P_{n-1}(A)$. These polynomials are easily calculated once the eigenvalues of $A$ are determined. Also the multipliers $r_1(t), \ldots, r_n(t)$ in (7.30) are easily calculated. Although this requires solving a system of linear differential equations, this particular system has a triangular matrix and the solutions can be determined in succession.

Proof. Let $r_1(t), \ldots, r_n(t)$ be the scalar functions determined by (7.30), and define a matrix function $F$ by the equation

(7.31)   $F(t) = \sum_{k=0}^{n-1} r_{k+1}(t)\,P_k(A).$

Note that $F(0) = r_1(0)\,P_0(A) = I$. We will prove that $F(t) = e^{tA}$ by showing that $F$ satisfies the same differential equation as $e^{tA}$, namely $F'(t) = AF(t)$. Differentiating (7.31) and using the recursion formulas (7.30) we obtain

$F'(t) = \sum_{k=0}^{n-1} \{\lambda_{k+1}\,r_{k+1}(t) + r_k(t)\}\,P_k(A),$

where $r_0(t)$ is defined to be 0. We rewrite this in the form

$F'(t) = \sum_{k=0}^{n-2} r_{k+1}(t)\,P_{k+1}(A) + \sum_{k=0}^{n-1} \lambda_{k+1}\,r_{k+1}(t)\,P_k(A),$

then subtract

$\lambda_n F(t) = \sum_{k=0}^{n-1} \lambda_n\,r_{k+1}(t)\,P_k(A)$

to obtain the relation

(7.32)   $F'(t) - \lambda_n F(t) = \sum_{k=0}^{n-2} r_{k+1}(t)\{P_{k+1}(A) + (\lambda_{k+1} - \lambda_n)\,P_k(A)\}.$

But from (7.28) we see that $P_{k+1}(A) = (A - \lambda_{k+1} I)\,P_k(A)$, so

$P_{k+1}(A) + (\lambda_{k+1} - \lambda_n)\,P_k(A) = (A - \lambda_n I)\,P_k(A).$

Therefore Equation (7.32) becomes

$F'(t) - \lambda_n F(t) = (A - \lambda_n I) \sum_{k=0}^{n-2} r_{k+1}(t)\,P_k(A) = (A - \lambda_n I)\{F(t) - r_n(t)\,P_{n-1}(A)\} = (A - \lambda_n I)\,F(t) - r_n(t)\,P_n(A).$

The Cayley-Hamilton theorem implies that $P_n(A) = O$, so the last equation becomes

$F'(t) - \lambda_n F(t) = (A - \lambda_n I)\,F(t),$

from which we find $F'(t) = AF(t)$. Since $F(0) = I$, the uniqueness theorem (Theorem 7.7) shows that $F(t) = e^{tA}$.

EXAMPLE 1. Express $e^{tA}$ as a linear combination of $I$ and $A$ if $A$ is a $2 \times 2$ matrix with both its eigenvalues equal to $\lambda$.

Solution. Writing $\lambda_1 = \lambda_2 = \lambda$, we are to solve the system of differential equations

$r_1'(t) = \lambda r_1(t), \qquad r_1(0) = 1; \qquad r_2'(t) = \lambda r_2(t) + r_1(t), \qquad r_2(0) = 0.$

Solving these first-order equations in succession we find

$r_1(t) = e^{\lambda t}, \qquad r_2(t) = t e^{\lambda t}.$

Since $P_0(A) = I$ and $P_1(A) = A - \lambda I$, the required formula for $e^{tA}$ is

(7.33)   $e^{tA} = e^{\lambda t} I + t e^{\lambda t} (A - \lambda I).$

EXAMPLE 2. Solve Example 1 if the eigenvalues of $A$ are $\lambda$ and $\mu$, where $\lambda \ne \mu$.

Solution. In this case the system of differential equations is

$r_1'(t) = \lambda r_1(t), \qquad r_1(0) = 1; \qquad r_2'(t) = \mu r_2(t) + r_1(t), \qquad r_2(0) = 0.$

Its solutions are given by

$r_1(t) = e^{\lambda t}, \qquad r_2(t) = \frac{e^{\lambda t} - e^{\mu t}}{\lambda - \mu}.$

Since $P_0(A) = I$ and $P_1(A) = A - \lambda I$, the required formula for $e^{tA}$ is

(7.34)   $e^{tA} = e^{\lambda t} I + \frac{e^{\lambda t} - e^{\mu t}}{\lambda - \mu}\,(A - \lambda I).$

If the eigenvalues $\lambda$ and $\mu$ are complex numbers, the exponentials $e^{\lambda t}$ and $e^{\mu t}$ will also be complex numbers. But if $\lambda$ and $\mu$ are complex conjugates, the scalars multiplying $I$ and $A$ in (7.34) will be real. For example, suppose

$\lambda = \alpha + i\beta, \qquad \mu = \alpha - i\beta, \qquad \beta \ne 0.$

Then $\lambda - \mu = 2i\beta$, so Equation (7.34) becomes

$e^{tA} = e^{(\alpha + i\beta)t} I + e^{\alpha t}\,\frac{e^{i\beta t} - e^{-i\beta t}}{2i\beta}\,\{A - (\alpha + i\beta)I\} = e^{\alpha t}\left\{ (\cos \beta t + i \sin \beta t)\,I + \frac{\sin \beta t}{\beta}\,\{A - (\alpha + i\beta)I\} \right\}.$

The terms involving $i$ cancel and we get

$e^{tA} = e^{\alpha t}\left\{ \left( \cos \beta t - \frac{\alpha}{\beta} \sin \beta t \right) I + \frac{\sin \beta t}{\beta}\,A \right\}.$
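Putzer's recipe translates directly into a small program: find the eigenvalues, integrate the triangular system (7.30) for the multipliers $r_k(t)$, and accumulate the polynomials $P_k(A)$. The following sketch assumes NumPy and SciPy are available and solves (7.30) numerically rather than in closed form; the test matrix is a hypothetical non-diagonalizable example:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import expm

def putzer_expm(A, t):
    """e^{tA} via Theorem 7.9: solve the triangular system (7.30) for the
    scalar multipliers r_k(t), then combine the polynomials P_k(A)."""
    lam = np.linalg.eigvals(A)            # eigenvalues, in any fixed order
    n = A.shape[0]

    def rhs(s, r):                        # r_1' = lam_1 r_1; r_{k+1}' = lam_{k+1} r_{k+1} + r_k
        dr = lam * r
        dr[1:] += r[:-1]
        return dr

    r0 = np.zeros(n, dtype=complex)
    r0[0] = 1.0
    r = solve_ivp(rhs, (0.0, t), r0, rtol=1e-11, atol=1e-13).y[:, -1]

    P = np.eye(n, dtype=complex)          # P_0(A) = I
    E = r[0] * P
    for k in range(1, n):
        P = (A - lam[k - 1] * np.eye(n)) @ P    # P_k(A) = (A - lam_k I) P_{k-1}(A)
        E += r[k] * P
    return E

A = np.array([[2.0, 1.0, 0.0], [0.0, 2.0, 1.0], [0.0, 0.0, 3.0]])  # not diagonalizable
print(np.max(np.abs(putzer_expm(A, 0.7) - expm(0.7 * A))))          # ~1e-10
```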

7.14 Alternate methods for calculating $e^{tA}$ in special cases

Putzer's method for expressing $e^{tA}$ as a polynomial in $A$ is completely general because it is valid for all square matrices $A$. But a general method is not always the simplest method to use in special cases. In this section we give simpler methods for computing $e^{tA}$ in three special cases: (a) when all the eigenvalues of $A$ are equal, (b) when all the eigenvalues of $A$ are distinct, and (c) when $A$ has two distinct eigenvalues, exactly one of which has multiplicity 1.

THEOREM 7.10. If $A$ is an $n \times n$ matrix with all its eigenvalues equal to $\lambda$, then we have

(7.35)   $e^{tA} = e^{\lambda t} \sum_{k=0}^{n-1} \frac{t^k}{k!}\,(A - \lambda I)^k.$

Proof. Since the matrices $\lambda t I$ and $t(A - \lambda I)$ commute, we have

$e^{tA} = e^{\lambda t I}\,e^{t(A - \lambda I)} = e^{\lambda t} \sum_{k=0}^{\infty} \frac{t^k}{k!}\,(A - \lambda I)^k.$

The Cayley-Hamilton theorem implies that $(A - \lambda I)^k = O$ for every $k \ge n$, so the theorem is proved.

THEOREM 7.11. If $A$ is an $n \times n$ matrix with $n$ distinct eigenvalues $\lambda_1, \ldots, \lambda_n$, then we have

(7.36)   $e^{tA} = \sum_{k=1}^{n} e^{\lambda_k t}\,L_k(A),$

where $L_k(A)$ is a polynomial in $A$ of degree $n - 1$ given by the formula

$L_k(A) = \prod_{\substack{j=1 \\ j \ne k}}^{n} \frac{A - \lambda_j I}{\lambda_k - \lambda_j}, \qquad k = 1, 2, \ldots, n.$

Note: The polynomials $L_k(A)$ are called Lagrange interpolation coefficients.

Proof. We define a matrix function $F$ by the equation

(7.37)   $F(t) = \sum_{k=1}^{n} e^{\lambda_k t}\,L_k(A)$

and verify that $F$ satisfies the differential equation $F'(t) = AF(t)$ and the initial condition $F(0) = I$. From (7.37) we see that

$F'(t) - AF(t) = \sum_{k=1}^{n} e^{\lambda_k t}\,(\lambda_k I - A)\,L_k(A).$

By the Cayley-Hamilton theorem we have $(A - \lambda_k I)\,L_k(A) = O$ for each $k$, so $F$ satisfies the differential equation $F'(t) = AF(t)$. To complete the proof we need to show that $F$ satisfies the initial condition $F(0) = I$, which becomes

(7.38)   $\sum_{k=1}^{n} L_k(A) = I.$

A proof of (7.38) is outlined in Exercise 16 of Section 7.15. The next theorem treats the case when $A$ has two distinct eigenvalues, exactly one of which has multiplicity 1.

THEOREM 7.12. Let $A$ be an $n \times n$ matrix ($n \ge 3$) with two distinct eigenvalues $\lambda$ and $\mu$, where $\lambda$ has multiplicity $n - 1$ and $\mu$ has multiplicity 1. Then we have

$e^{tA} = e^{\lambda t} \sum_{k=0}^{n-2} \frac{t^k}{k!}\,(A - \lambda I)^k + \frac{e^{\mu t} - e^{\lambda t} \displaystyle\sum_{k=0}^{n-2} \frac{t^k (\mu - \lambda)^k}{k!}}{(\mu - \lambda)^{n-1}}\,(A - \lambda I)^{n-1}.$

Proof. As in the proof of Theorem 7.10 we begin by writing

$e^{tA} = e^{\lambda t}\,e^{t(A - \lambda I)} = e^{\lambda t} \sum_{k=0}^{n-2} \frac{t^k}{k!}\,(A - \lambda I)^k + e^{\lambda t} \sum_{r=0}^{\infty} \frac{t^{n-1+r}}{(n-1+r)!}\,(A - \lambda I)^{n-1+r}.$

Now we evaluate the series over $r$ in closed form by using the Cayley-Hamilton theorem. Since the characteristic polynomial of $A$ is $f(\zeta) = (\zeta - \lambda)^{n-1}(\zeta - \mu)$, the theorem gives us

$(A - \lambda I)^{n-1}(A - \mu I) = O.$

Writing $A - \mu I = (A - \lambda I) - (\mu - \lambda)I$, we find

$(A - \lambda I)^n = (\mu - \lambda)(A - \lambda I)^{n-1}.$

Using this relation repeatedly we find

$(A - \lambda I)^{n-1+r} = (\mu - \lambda)^r\,(A - \lambda I)^{n-1} \qquad \text{for } r = 0, 1, 2, \ldots.$

Therefore the series over $r$ becomes

$\sum_{r=0}^{\infty} \frac{t^{n-1+r}(\mu - \lambda)^r}{(n-1+r)!}\,(A - \lambda I)^{n-1} = \frac{(A - \lambda I)^{n-1}}{(\mu - \lambda)^{n-1}} \sum_{r=0}^{\infty} \frac{[t(\mu - \lambda)]^{n-1+r}}{(n-1+r)!} = \frac{(A - \lambda I)^{n-1}}{(\mu - \lambda)^{n-1}} \left\{ e^{t(\mu - \lambda)} - \sum_{k=0}^{n-2} \frac{[t(\mu - \lambda)]^k}{k!} \right\}.$

Multiplying by $e^{\lambda t}$ and using $e^{\lambda t}\,e^{t(\mu - \lambda)} = e^{\mu t}$, we obtain the stated formula. This completes the proof.

The explicit formula in Theorem 7.12 can also be deduced by applying Putzer's method, but the details are more complicated.

The explicit formulas in Theorems 7.10, 7.11 and 7.12 cover all matrices of order $n \le 3$. Since the $3 \times 3$ case often arises in practice, the formulas in this case are listed below for easy reference.

CASE 1. If a $3 \times 3$ matrix $A$ has eigenvalues $\lambda, \lambda, \lambda$, then

$e^{tA} = e^{\lambda t}\left\{ I + t(A - \lambda I) + \frac{t^2}{2}(A - \lambda I)^2 \right\}.$

CASE 2. If a $3 \times 3$ matrix $A$ has distinct eigenvalues $\lambda, \mu, \nu$, then

$e^{tA} = e^{\lambda t}\,\frac{(A - \mu I)(A - \nu I)}{(\lambda - \mu)(\lambda - \nu)} + e^{\mu t}\,\frac{(A - \lambda I)(A - \nu I)}{(\mu - \lambda)(\mu - \nu)} + e^{\nu t}\,\frac{(A - \lambda I)(A - \mu I)}{(\nu - \lambda)(\nu - \mu)}.$

CASE 3. If a $3 \times 3$ matrix $A$ has eigenvalues $\lambda, \lambda, \mu$ with $\lambda \ne \mu$, then

$e^{tA} = e^{\lambda t}\{ I + t(A - \lambda I) \} + \frac{e^{\mu t} - e^{\lambda t}[1 + t(\mu - \lambda)]}{(\mu - \lambda)^2}\,(A - \lambda I)^2.$

EXAMPLE. Compute $e^{tA}$ for a $3 \times 3$ matrix $A$ whose eigenvalues are 1, 1, 2.

Solution. The eigenvalues of $A$ are 1, 1, 2, so the formula of Case 3, with $\lambda = 1$ and $\mu = 2$, gives us

(7.39)   $e^{tA} = e^{t}\{ I + t(A - I) \} + \{ e^{2t} - e^{t}(1 + t) \}\,(A - I)^2.$

By collecting powers of $A$ we can also write this as follows:

(7.40)   $e^{tA} = (e^{2t} - 2t e^{t})\,I + \{ (3t + 2)e^{t} - 2e^{2t} \}\,A + \{ e^{2t} - (1 + t)e^{t} \}\,A^2.$

At this stage we can calculate $(A - I)^2$, or $A^2$, and perform the indicated operations in (7.39) or (7.40) to write the result as a $3 \times 3$ matrix.
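The Case 3 formula is easy to implement and test against a library matrix exponential. A sketch assuming NumPy and SciPy are available; the $3 \times 3$ test matrix (with eigenvalues 1, 1, 2, like the example above, though not necessarily the book's matrix) is an illustrative choice:

```python
import numpy as np
from scipy.linalg import expm

def exp_case3(A, lam, mu, t):
    """Case 3 of Section 7.14 for a 3x3 matrix with eigenvalues lam, lam, mu."""
    I = np.eye(3)
    N = A - lam * I
    return (np.exp(lam * t) * (I + t * N)
            + (np.exp(mu * t) - np.exp(lam * t) * (1 + t * (mu - lam)))
              / (mu - lam) ** 2 * (N @ N))

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]])   # eigenvalues 1, 1, 2
print(np.max(np.abs(exp_case3(A, 1.0, 2.0, 0.5) - expm(0.5 * A))))  # ~1e-16
```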

7.15 Exercises

For each of the matrices in Exercises 1 through 6, express $e^{tA}$ as a polynomial in $A$.

7. (a) A $3 \times 3$ matrix $A$ is known to have all its eigenvalues equal to 1. Prove that

$e^{tA} = \frac{e^{t}}{2}\{ (t^2 - 2t + 2)\,I + (2t - 2t^2)\,A + t^2 A^2 \}.$

(b) Find a corresponding formula if $A$ is a $4 \times 4$ matrix with all its eigenvalues equal to 1.

In each of Exercises 8 through 15, solve the system $Y' = AY$ subject to the given initial condition.

16. This exercise outlines a proof of Equation (7.38), used in the proof of Theorem 7.11. Let $\lambda_1, \ldots, \lambda_n$ be $n$ distinct scalars, and let $L_k(\lambda)$ be the polynomial in $\lambda$ of degree $n - 1$ defined by the equation

$L_k(\lambda) = \prod_{\substack{j=1 \\ j \ne k}}^{n} \frac{\lambda - \lambda_j}{\lambda_k - \lambda_j}.$

(a) Prove that $L_k(\lambda_j) = 1$ if $j = k$, and $L_k(\lambda_j) = 0$ if $j \ne k$.
(b) Let $y_1, \ldots, y_n$ be arbitrary scalars, and let

$p(\lambda) = \sum_{k=1}^{n} y_k\,L_k(\lambda).$

Prove that $p$ is the only polynomial of degree $\le n - 1$ which satisfies the equations $p(\lambda_k) = y_k$ for $k = 1, 2, \ldots, n$.
(c) Prove that $\sum_{k=1}^{n} L_k(\lambda) = 1$ for every $\lambda$, and deduce that for every square matrix $A$ we have

$\sum_{k=1}^{n} L_k(A) = I,$

where $I$ is the identity matrix.


7.16 Nonhomogeneous linear systems with constant coefficients

We consider next the nonhomogeneous initial-value problem

(7.41)   $Y'(t) = A\,Y(t) + Q(t), \qquad Y(a) = B,$

on an interval $J$. Here $A$ is an $n \times n$ constant matrix, $Q$ is an $n$-dimensional vector function (regarded as an $n \times 1$ column matrix) continuous on $J$, and $a$ is a given point in $J$. We can obtain an explicit formula for the solution of this problem by the same process used to treat the scalar case.

First we multiply both members of (7.41) by the exponential matrix $e^{-tA}$ and rewrite the differential equation in the form

(7.42)   $e^{-tA}\{ Y'(t) - A\,Y(t) \} = e^{-tA}\,Q(t).$

The left member of (7.42) is the derivative of the product $e^{-tA}\,Y(t)$. Therefore, if we integrate both members of (7.42) from $a$ to $x$, where $x \in J$, we obtain

$e^{-xA}\,Y(x) - e^{-aA}\,B = \int_a^x e^{-tA}\,Q(t)\,dt.$

Multiplying by $e^{xA}$, we obtain the explicit formula (7.43) which appears in the following theorem.

THEOREM 7.13. Let $A$ be an $n \times n$ constant matrix and let $Q$ be an $n$-dimensional vector function continuous on an interval $J$. Then the initial-value problem

$Y'(t) = A\,Y(t) + Q(t), \qquad Y(a) = B,$

has a unique solution on $J$ given by the explicit formula

(7.43)   $Y(x) = e^{(x-a)A}\,B + e^{xA} \int_a^x e^{-tA}\,Q(t)\,dt.$

As in the homogeneous case, the difficulty in applying this formula in practice lies in the calculation of the exponential matrices. Note that the first term, $e^{(x-a)A}B$, is the solution of the homogeneous problem $Y'(t) = A\,Y(t)$, $Y(a) = B$. The second term is the solution of the nonhomogeneous problem

$Y'(t) = A\,Y(t) + Q(t), \qquad Y(a) = O.$

We illustrate Theorem 7.13 with an example.

EXAMPLE. Solve the initial-value problem

$Y'(t) = A\,Y(t) + Q(t), \qquad Y(0) = B,$

on the interval $(-\infty, +\infty)$, where $A$ is a $3 \times 3$ constant matrix with eigenvalues 2, 2, 4, and $Q$ and $B$ are given.

Solution. According to Theorem 7.13, the solution is given by

(7.44)   $Y(x) = e^{xA}\,B + e^{xA} \int_0^x e^{-tA}\,Q(t)\,dt.$

The eigenvalues of $A$ are 2, 2, and 4. To calculate $e^{xA}$ we use the formula of Case 3, Section 7.14, with $\lambda = 2$ and $\mu = 4$, to obtain

$e^{xA} = e^{2x}\{ I + x(A - 2I) \} + \frac{e^{4x} - e^{2x}(1 + 2x)}{4}\,(A - 2I)^2.$

We can replace $x$ by $-t$ in this formula to obtain $e^{-tA}$. Therefore the integrand in (7.44) is

$e^{-tA}\,Q(t) = e^{-2t}\{ I - t(A - 2I) \}\,Q(t) + \frac{e^{-4t} - e^{-2t}(1 - 2t)}{4}\,(A - 2I)^2\,Q(t).$

Integrating this from 0 to $x$ and multiplying by $e^{xA}$, we obtain the solution $Y(x)$ for the particular $Q$ and $B$ that are prescribed.
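Formula (7.43) can also be evaluated numerically when the integral is inconvenient by hand: approximate the integral by quadrature and compare with a direct ODE solve. A sketch assuming NumPy and SciPy are available; $A$, $B$, and $Q$ below are hypothetical data, not the matrices of the example above:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp, trapezoid

A = np.array([[-1.0, 2.0], [0.0, -3.0]])
B = np.array([1.0, 1.0])
Q = lambda t: np.array([np.sin(t), 1.0])

def Y(x, nodes=2001):
    """Evaluate formula (7.43) (with a = 0) using the trapezoidal rule."""
    ts = np.linspace(0.0, x, nodes)
    integrand = np.array([expm(-t * A) @ Q(t) for t in ts])
    integral = trapezoid(integrand, ts, axis=0)
    return expm(x * A) @ (B + integral)

x = 2.0
ref = solve_ivp(lambda t, y: A @ y + Q(t), (0.0, x), B,
                rtol=1e-10, atol=1e-12).y[:, -1]
print(np.max(np.abs(Y(x) - ref)))        # small: the two solutions agree
```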

7.17 Exercises

1. Let $Z$ be a solution of the nonhomogeneous system

$Z'(t) = AZ(t) + Q(t)$

on an interval $J$ with initial value $Z(a)$. Prove that there is only one solution of the nonhomogeneous system

$Y'(t) = AY(t) + Q(t)$

on $J$ with initial value $Y(a)$, and that it is given by the formula

$Y(t) = Z(t) + e^{(t-a)A}\{ Y(a) - Z(a) \}.$

Special methods are often available for determining a particular solution $Z(t)$ which resembles the given function $Q(t)$. Exercises 2, 3, 5, and 7 indicate such methods for $Q(t) = C$, $Q(t) = e^{\alpha t} C$, $Q(t) = t^m C$, and $Q(t) = (\cos \alpha t)\,C + (\sin \alpha t)\,D$, where $C$ and $D$ are constant vectors. If the particular solution $Z(t)$ so obtained does not have the required initial value, we modify $Z(t)$ as indicated in Exercise 1 to obtain another solution with the required initial value.

2. (a) Let $A$ be a constant $n \times n$ matrix, and $B$, $C$ constant $n$-dimensional vectors. Prove that the solution of the system

$Y'(t) = AY(t) + C, \qquad Y(a) = B,$

on $(-\infty, +\infty)$ is given by the formula

$Y(x) = e^{(x-a)A}\,B + \left( \int_a^x e^{(x-t)A}\,dt \right) C.$

(b) If $A$ is nonsingular, show that the integral in part (a) has the value

$A^{-1}\{ e^{(x-a)A} - I \}.$

(c) Compute $Y(x)$ explicitly when $A = \dots$, $B = \dots$, $C = \dots$.

3. Let $A$ be an $n \times n$ constant matrix, let $B$ and $C$ be $n$-dimensional constant vectors, and let $\alpha$ be a given scalar.
(a) Prove that the nonhomogeneous system $Z'(t) = AZ(t) + e^{\alpha t} C$ has a solution of the form $Z(t) = e^{\alpha t} B$ if, and only if,

$(\alpha I - A)B = C.$

(b) If $\alpha$ is not an eigenvalue of $A$, prove that the vector $B$ can always be chosen so that the system in (a) has a solution of the form $Z(t) = e^{\alpha t} B$.
(c) If $\alpha$ is not an eigenvalue of $A$, prove that every solution of the system $Y'(t) = AY(t) + e^{\alpha t} C$ has the form

$Y(t) = e^{tA}\{ Y(0) - B \} + e^{\alpha t} B,$

where $B = (\alpha I - A)^{-1} C$.

4. Use the method suggested by Exercise 3 to find a solution of the nonhomogeneous system $Y'(t) = AY(t) + e^{\alpha t} C$ with $A = \dots$, $C = \dots$, and $Y(0) = \dots$.

5. Let $A$ be an $n \times n$ constant matrix, let $B$ and $C$ be $n$-dimensional constant vectors, and let $m$ be a positive integer.
(a) Prove that the nonhomogeneous system $Y'(t) = AY(t) + t^m C$, $Y(0) = B$, has a particular solution of the form

$Y(t) = B_0 + tB_1 + t^2 B_2 + \cdots + t^{m+1} B_{m+1},$

where the $B_k$ are constant vectors, if and only if the $B_k$ can be chosen to satisfy

$B_0 = B, \qquad (k+1)B_{k+1} = AB_k \; (k = 0, 1, \ldots, m-1), \qquad (m+1)B_{m+1} = AB_m + C, \qquad AB_{m+1} = O.$

Determine the coefficients $B_k$ for such a solution.
(b) If $A$ is nonsingular, prove that the initial vector $B$ can always be chosen so that the system in (a) has a solution of the specified form.

6. Consider the nonhomogeneous system $Y'(t) = AY(t) + Q(t)$ of the type treated in Exercise 5.
(a) Find a particular solution of the form $Y(t) = \dots$.
(b) Find a solution of the system with $Y(0) = \dots$.

7. Let $A$ be an $n \times n$ constant matrix, let $B$, $C$, $D$ be $n$-dimensional constant vectors, and let $\alpha$ be a given real number. Prove that the nonhomogeneous system

$Y'(t) = AY(t) + (\cos \alpha t)\,C + (\sin \alpha t)\,D, \qquad Y(0) = B,$

has a particular solution of the form

$Y(t) = (\cos \alpha t)\,E + (\sin \alpha t)\,F,$

where $E$ and $F$ are constant vectors, if and only if

$(\alpha^2 I + A^2)\,B = -(AC + \alpha D).$

Determine $E$ and $F$ in terms of $\alpha$, $A$, $B$, $C$, $D$ for such a solution. Note that if $\alpha^2 I + A^2$ is nonsingular, the initial vector $B$ can always be chosen so that the system has a solution of the specified form.

8. (a) Find a particular solution of the nonhomogeneous system $Y'(t) = \dots$
(b) Find a solution of the system with $Y(0) = \dots$.

In each of Exercises 9 through 12, solve the nonhomogeneous system $Y'(t) = AY(t) + Q(t)$ subject to the given initial condition.

7.18 The general linear system $Y'(t) = P(t)Y(t) + Q(t)$

Theorem 7.13 gives an explicit formula for the solution of the linear system

$Y'(t) = A\,Y(t) + Q(t), \qquad Y(a) = B,$

where $A$ is a constant $n \times n$ matrix and $Q(t)$, $Y(t)$ are $n \times 1$ column matrices. We turn now to the more general case

(7.45)   $Y'(t) = P(t)\,Y(t) + Q(t), \qquad Y(a) = B,$

where the $n \times n$ matrix $P(t)$ is not necessarily constant.

If $P$ and $Q$ are continuous on an open interval $J$, a general existence-uniqueness theorem, which we shall prove in a later section, tells us that for each $a$ in $J$ and each initial vector $B$ there is exactly one solution to the initial-value problem (7.45). In this section we use this result to obtain a formula for the solution, generalizing Theorem 7.13.

In the scalar case ($n = 1$) the differential equation (7.45) can be solved as follows. We let $A(x) = \int_a^x P(t)\,dt$, then multiply both members of (7.45) by $e^{-A(t)}$ to rewrite the differential equation in the form

(7.46)   $e^{-A(t)}\{ Y'(t) - P(t)\,Y(t) \} = e^{-A(t)}\,Q(t).$

Now the left member is the derivative of the product $e^{-A(t)}\,Y(t)$. Therefore, we can integrate both members from $a$ to $x$, where $a$ and $x$ are points in $J$, to obtain

$e^{-A(x)}\,Y(x) - Y(a) = \int_a^x e^{-A(t)}\,Q(t)\,dt.$

Multiplying by $e^{A(x)}$, we obtain the explicit formula

(7.47)   $Y(x) = e^{A(x)}\,Y(a) + e^{A(x)} \int_a^x e^{-A(t)}\,Q(t)\,dt.$

The only part of this argument that does not apply immediately to matrix functions is the statement that the left-hand member of (7.46) is the derivative of the product $e^{-A(t)}\,Y(t)$. At this stage we used the fact that the derivative of $e^{-A(t)}$ is $-P(t)\,e^{-A(t)}$. In the scalar case this is a consequence of the following formula for differentiating exponential functions: If $E(t) = e^{A(t)}$, then

$E'(t) = A'(t)\,e^{A(t)}.$

Unfortunately, this differentiation formula is not always true when $A$ is a matrix function. For example, it is false for the $2 \times 2$ matrix function

$A(t) = \begin{pmatrix} 1 & t \\ 0 & 0 \end{pmatrix}.$

(See Exercise 7 of Section 7.12.) Therefore a modified argument is needed to extend Equation (7.47) to the matrix case.

Suppose we multiply each member of (7.45) by an unspecified $n \times n$ matrix function $F(t)$. This gives us the relation

$F(t)\,Y'(t) = F(t)\,P(t)\,Y(t) + F(t)\,Q(t).$

Now we add $F'(t)\,Y(t)$ to both members in order to transform the left member to the derivative of the product $F(t)\,Y(t)$. If we do this, the last equation gives us

(7.48)   $\{F(t)\,Y(t)\}' = \{F'(t) + F(t)\,P(t)\}\,Y(t) + F(t)\,Q(t).$

If we can choose the matrix $F(t)$ so that the sum $F'(t) + F(t)\,P(t)$ on the right is the zero matrix, the last equation simplifies to

$\{F(t)\,Y(t)\}' = F(t)\,Q(t).$

Integrating this from $a$ to $x$ we obtain

$F(x)\,Y(x) - F(a)\,Y(a) = \int_a^x F(t)\,Q(t)\,dt.$

If, in addition, the matrix $F(x)$ is nonsingular, we obtain the explicit formula

$Y(x) = F(x)^{-1} F(a)\,Y(a) + F(x)^{-1} \int_a^x F(t)\,Q(t)\,dt.$

This is a generalization of the scalar formula (7.47). The process will work if we can find an $n \times n$ matrix function $F$ which satisfies the matrix differential equation

(7.49)   $F'(t) = -F(t)\,P(t)$

and which is nonsingular.

Note that this differential equation is very much like the original differential equation (7.45) with $Q(t) = O$, except that the unknown function $F$ is a square matrix instead of a column matrix. Also, the unknown function is multiplied on the right by $-P(t)$ instead of on the left by $P(t)$.

We shall prove next that the differential equation for $F$ always has a nonsingular solution. The proof will depend on the following existence theorem for homogeneous linear systems.

THEOREM 7.14. Assume $A(t)$ is an $n \times n$ matrix function continuous on an open interval $J$. If $a \in J$ and if $B$ is a given $n$-dimensional vector, the homogeneous linear system

$Y'(t) = A(t)\,Y(t), \qquad Y(a) = B,$

has an $n$-dimensional vector solution $Y$ on $J$.

A proof of Theorem 7.14 appears in Section 7.21. With the help of this theorem we can prove the following.

THEOREM 7.15. Given an $n \times n$ matrix function $P$, continuous on an open interval $J$, and given any point $a$ in $J$, there exists an $n \times n$ matrix function $F$ which satisfies the matrix differential equation

$F'(x) = -F(x)\,P(x)$

on $J$ with initial value $F(a) = I$. Moreover, $F(x)$ is nonsingular for each $x$ in $J$.

Proof. Let $Y_k(x)$ be a vector solution of the differential equation

$Y_k'(x) = -P(x)^t\,Y_k(x)$

on $J$ with initial vector $Y_k(a) = I_k$, where $I_k$ is the $k$th column of the $n \times n$ identity matrix $I$. Here $P(x)^t$ denotes the transpose of $P(x)$. Let $G(x)$ be the $n \times n$ matrix whose $k$th column is $Y_k(x)$. Then $G$ satisfies the matrix differential equation

(7.50)   $G'(x) = -P(x)^t\,G(x)$

on $J$ with initial value $G(a) = I$. Now take the transpose of each member of (7.50). Since the transpose of a product is the product of transposes in reverse order, we obtain

$\{G'(x)\}^t = -\{G(x)\}^t\,P(x).$

Also, the transpose of the derivative $G'$ is the derivative of the transpose $G^t$. Therefore the matrix $F(x) = \{G(x)\}^t$ satisfies the differential equation (7.49) with initial value $F(a) = I$.

Now we prove that $F(x)$ is nonsingular by exhibiting its inverse. Let $H$ be the $n \times n$ matrix function whose $k$th column is the solution of the differential equation

$Y'(x) = P(x)\,Y(x)$

with initial vector $Y(a) = I_k$, the $k$th column of $I$. Then $H$ satisfies the initial-value problem

$H'(x) = P(x)\,H(x), \qquad H(a) = I,$

on $J$. The product $F(x)\,H(x)$ has derivative

$F(x)\,H'(x) + F'(x)\,H(x) = F(x)\,P(x)\,H(x) - F(x)\,P(x)\,H(x) = O$

for each $x$ in $J$. Therefore the product $F(x)\,H(x)$ is constant,

$F(x)\,H(x) = F(a)\,H(a) = I,$

so $H(x)$ is the inverse of $F(x)$. This completes the proof.

The results of this section are summarized in the following theorem.

THEOREM 7.16. Given an $n \times n$ matrix function $P$ and an $n$-dimensional vector function $Q$, both continuous on an open interval $J$, the solution of the initial-value problem

(7.51)   $Y'(x) = P(x)\,Y(x) + Q(x), \qquad Y(a) = B,$

on $J$ is given by the formula

(7.52)   $Y(x) = F(x)^{-1}\,Y(a) + F(x)^{-1} \int_a^x F(t)\,Q(t)\,dt.$

The $n \times n$ matrix $F(x)$ is the transpose of the matrix whose $k$th column is the solution of the initial-value problem

(7.53)   $Y'(x) = -P(x)^t\,Y(x), \qquad Y(a) = I_k,$

where $I_k$ is the $k$th column of the identity matrix $I$.

Although Theorem 7.16 provides an explicit formula for the solution of the general linear system (7.51), the formula is not always a useful one for calculating the solution because of the difficulty involved in determining the matrix function $F$. The determination of $F$ requires the solution of $n$ homogeneous linear systems (7.53). The next section describes a power-series method that is sometimes used to solve homogeneous linear systems.

We remind the reader once more that the proof of Theorem 7.16 was based on Theorem 7.14, the existence theorem for homogeneous linear systems, which we have not yet proved.

7.19 A power-series method for solving homogeneous linear systems

Consider a homogeneous linear system

(7.54)   $Y'(x) = A(x)\,Y(x), \qquad Y(0) = B,$

in which the given $n \times n$ matrix $A(x)$ has a power-series expansion in $x$ convergent in some open interval containing the origin, say

$A(x) = A_0 + x A_1 + x^2 A_2 + \cdots + x^k A_k + \cdots \qquad \text{for } |x| < r_1,$

where the coefficients $A_0, A_1, A_2, \ldots$ are given $n \times n$ matrices. Let us try to find a power-series solution of the form

$Y(x) = B_0 + x B_1 + x^2 B_2 + \cdots + x^k B_k + \cdots,$

with vector coefficients $B_0, B_1, B_2, \ldots$. Since $Y(0) = B_0$, the initial condition will be satisfied by taking $B_0 = B$, the prescribed initial vector. To determine the remaining coefficients we substitute the power series for $Y(x)$ in the differential equation and equate coefficients of like powers of $x$ to obtain the following system of equations:

(7.55)   $B_1 = A_0 B_0, \qquad (k+1)\,B_{k+1} = \sum_{r=0}^{k} A_r B_{k-r} \quad \text{for } k = 1, 2, 3, \ldots.$

These equations can be solved in succession for the vectors $B_1, B_2, B_3, \ldots$. If the resulting power series for $Y(x)$ converges in some interval $|x| < r_2$, then $Y(x)$ will be a solution of the initial-value problem (7.54) in the interval $|x| < r$, where $r = \min\{r_1, r_2\}$.

For example, if $A(x)$ is a constant matrix $A$, then $A_0 = A$ and $A_k = O$ for $k \ge 1$, so the system of equations in (7.55) becomes

$B_1 = A B_0, \qquad (k+1)\,B_{k+1} = A B_k \quad \text{for } k \ge 1.$

Solving these equations in succession we find

$B_k = \frac{A^k B}{k!} \qquad \text{for } k \ge 1.$

Therefore the series solution in this case becomes

$Y(x) = B + \sum_{k=1}^{\infty} \frac{x^k A^k}{k!}\,B = e^{xA}\,B.$

This agrees with the result obtained earlier for homogeneous linear systems with constant coefficients.
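The recursion (7.55) is well suited to machine computation. A sketch assuming NumPy and SciPy are available; it handles a polynomial $A(x)$ given by its list of matrix coefficients and is checked here on the constant-coefficient case, where the answer is $e^{xA}B$:

```python
import numpy as np
from scipy.linalg import expm

def series_solution(A_coeffs, B, x, terms=40):
    """Truncated power-series solution of Y' = A(x) Y per (7.55), where
    A(x) = sum_k x^k A_coeffs[k].  Returns sum_k x^k B_k."""
    Bs = [np.asarray(B, dtype=float)]
    for k in range(terms - 1):
        # (k+1) B_{k+1} = sum_{r=0}^{k} A_r B_{k-r}   (A_r = 0 beyond the list)
        s = sum(A_coeffs[r] @ Bs[k - r]
                for r in range(min(k + 1, len(A_coeffs))))
        Bs.append(s / (k + 1))
    return sum(x**k * Bk for k, Bk in enumerate(Bs))

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([1.0, 0.0])
print(np.max(np.abs(series_solution([A], B, 0.8) - expm(0.8 * A) @ B)))  # ~1e-16
```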

7.20 Exercises

1. Let $p$ be a real-valued function and $Q$ an $n \times 1$ matrix function, both continuous on an interval $J$. Let $A$ be an $n \times n$ constant matrix. Prove that the initial-value problem

$Y'(x) = p(x)\,A\,Y(x) + Q(x), \qquad Y(a) = B,$

has the solution

$Y(x) = e^{q(x)A}\,B + e^{q(x)A} \int_a^x e^{-q(t)A}\,Q(t)\,dt$

on $J$, where $q(x) = \int_a^x p(t)\,dt$.

2. Consider the special case of Exercise 1 in which $A$ is nonsingular, $a = 0$, $p(x) = 2x$, and $Q(x) = xC$, where $C$ is a constant vector. Show that the solution becomes

$Y(x) = e^{x^2 A}\,B + \tfrac{1}{2} A^{-1} (e^{x^2 A} - I)\,C.$

3. Let $A(t)$ be an $n \times n$ matrix function and let $E(t) = e^{A(t)}$. Let $Q(t)$, $Y(t)$, and $B$ be $n \times 1$ column matrices. Assume that

$E'(t) = A'(t)\,E(t)$

on an open interval $J$. If $a \in J$ and if $A'$ and $Q$ are continuous on $J$, prove that the initial-value problem

$Y'(t) = A'(t)\,Y(t) + Q(t), \qquad Y(a) = B,$

has the following solution on $J$:

$Y(t) = E(t)\,E(a)^{-1}\,B + E(t) \int_a^t E(u)^{-1}\,Q(u)\,du.$

4. Let $E(t) = e^{A(t)}$. This exercise describes examples of matrix functions $A(t)$ for which $E'(t) = A'(t)E(t)$.
(a) Let $A(t) = t^r A$, where $A$ is an $n \times n$ constant matrix and $r$ is a positive integer. Prove that $E'(t) = A'(t)E(t)$ on $(-\infty, +\infty)$.
(b) Let $A(t)$ be a polynomial in $t$ with matrix coefficients, say

$A(t) = \sum_{r=0}^{m} t^r A_r,$

where the coefficients commute, $A_r A_s = A_s A_r$ for all $r$ and $s$. Prove that $E'(t) = A'(t)E(t)$ on $(-\infty, +\infty)$.
(c) Solve the homogeneous linear system

$Y'(t) = (I + tA)\,Y(t), \qquad Y(0) = B,$

on the interval $(-\infty, +\infty)$, where $A$ is an $n \times n$ constant matrix.

5. Assume that the $n \times n$ matrix function $A(x)$ has a power-series expansion convergent for $|x| < r$. Develop a power-series procedure for solving the homogeneous linear system of second order

$Y''(x) = A(x)\,Y(x), \qquad \text{with } Y(0) = B, \quad Y'(0) = C.$

6. Consider the second-order system $Y''(x) + A\,Y(x) = O$, with $Y(0) = B$, $Y'(0) = C$, where $A$ is a constant $n \times n$ matrix. Prove that the system has the power-series solution

$Y(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k}}{(2k)!}\,A^k B + \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!}\,A^k C,$

convergent for $-\infty < x < +\infty$.

7.21 Proof of the existence theorem by the method of successive approximations

In this section we prove the existence and uniqueness of a solution for any homogeneous linear system

(7.56)   $Y'(t) = A(t)\,Y(t),$

where $A(t)$ is an $n \times n$ matrix function, continuous on an open interval $J$. We shall prove that for any point $a$ in $J$ and any given initial vector $B$ there exists exactly one solution $Y(t)$ on $J$ satisfying the initial condition $Y(a) = B$.


We shall use the method of successive approximations, an iterative method which also has applications in many other problems. The method was first published by Liouville in 1838 in connection with the study of linear differential equations of second order. It was later extended by J. Caqué in 1864, L. Fuchs in 1870, and G. Peano in 1888 to the study of linear equations of order $n$. In 1890 Émile Picard (1856-1941) extended the method to encompass nonlinear differential equations as well. In recognition of his fundamental contributions, some writers refer to the method as Picard's method. The method is not only of theoretical interest but can also be used to obtain numerical approximations to solutions in some cases.

The method begins with an initial guess at a solution of the equation (7.56). We take as initial guess the given initial vector $B$, although this is not essential. We then substitute this guess in the right-hand member of the equation and obtain a new differential equation,

$Y'(t) = A(t)\,B.$

In this equation the right-hand member no longer contains the unknown function, so the equation can be solved immediately by integrating both members from $a$ to $x$, where $x$ is an arbitrary point in $J$. This equation has exactly one solution $Y_1$ on $J$ satisfying the initial condition $Y_1(a) = B$, namely

$Y_1(x) = B + \int_a^x A(t)\,B\,dt.$

Now we replace $Y(t)$ by $Y_1(t)$ in the right-hand member of the original differential equation (7.56) to obtain a new differential equation

$Y'(t) = A(t)\,Y_1(t).$

This equation has a unique solution $Y_2$ on $J$ with $Y_2(a) = B$, namely

(7.57)   $Y_2(x) = B + \int_a^x A(t)\,Y_1(t)\,dt.$

We then substitute $Y_2$ in the right-hand member of (7.56) and solve the resulting equation to determine $Y_3$, with $Y_3(a) = B$, and so on. This process generates a sequence of functions $Y_0, Y_1, Y_2, \ldots$, where $Y_0(x) = B$ and where $Y_{k+1}$ is determined from $Y_k$ by the recursion formula

(7.58)   $Y_{k+1}(x) = B + \int_a^x A(t)\,Y_k(t)\,dt \qquad \text{for } k = 0, 1, 2, \ldots.$

Our goal is to prove that the sequence of functions so defined converges to a limit function $Y$ which is a solution of the differential equation (7.56) on $J$ and which also satisfies the initial condition $Y(a) = B$. The functions $Y_0, Y_1, Y_2, \ldots$ are called successive approximations to $Y$. Before we investigate the convergence of the process we illustrate the method with an example.

EXAMPLE. Consider the initial-value problem $Y'(t) = A\,Y(t)$, $Y(0) = B$, where $A$ is a constant $n \times n$ matrix. We know that the solution is given by the formula $Y(x) = e^{xA}B$ for all real $x$. We will show how this solution can be obtained by the method of successive approximations.

The initial guess is $Y_0(x) = B$. The recursion formula (7.58) gives us

$Y_1(x) = B + \int_0^x A\,B\,dt = B + xA\,B = (I + xA)\,B.$

By induction we find

(7.59)   $Y_k(x) = \left( I + xA + \frac{x^2 A^2}{2!} + \cdots + \frac{x^k A^k}{k!} \right) B.$

The sum on the right is a partial sum of the series for $e^{xA}$. Therefore when $k \to \infty$ we find

$\lim_{k \to \infty} Y_k(x) = e^{xA}\,B$

for all $x$. Thus, in this example we can show directly that the successive approximations converge to a solution of the initial-value problem on $(-\infty, +\infty)$.
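The successive approximations can also be carried out numerically by evaluating the integrals in (7.58) on a grid. A sketch assuming NumPy and SciPy are available (the matrix and interval are arbitrary choices); after a modest number of iterations the iterate at $x = 1$ agrees with $e^{A}B$ up to the quadrature error:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import cumulative_trapezoid

# Picard iteration (7.58) for Y' = AY, Y(0) = B, on a fixed grid.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([1.0, 0.0])
xs = np.linspace(0.0, 1.0, 501)

Yk = np.tile(B, (len(xs), 1))             # Y_0(x) = B at every grid point
for _ in range(25):
    integrand = Yk @ A.T                  # row i holds A Y_k(x_i)
    cumint = cumulative_trapezoid(integrand, xs, axis=0, initial=0)
    Yk = B + cumint                       # Y_{k+1}(x) = B + integral_0^x A Y_k

print(np.max(np.abs(Yk[-1] - expm(xs[-1] * A) @ B)))   # ~1e-6 (grid error)
```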

Proof of convergence of the sequence of successive approximations. We return now to the general sequence defined by the recursion formula (7.58). To prove that the sequence converges we write each term $Y_k(x)$ as a telescoping sum,

(7.60)   $Y_k(x) = Y_0(x) + \sum_{m=0}^{k-1} \{ Y_{m+1}(x) - Y_m(x) \}.$

To prove that $Y_k(x)$ tends to a limit as $k \to \infty$ we shall prove that the infinite series

(7.61)   $\sum_{m=0}^{\infty} \{ Y_{m+1}(x) - Y_m(x) \}$

converges for each $x$ in $J$. For this purpose it suffices to prove that the series

$\sum_{m=0}^{\infty} \| Y_{m+1}(x) - Y_m(x) \|$

converges. In this series we use the matrix norm introduced in Section 7.3; the norm of a matrix is the sum of the absolute values of all its entries.

Consider a closed and bounded subinterval $J_1$ of $J$ containing $a$. We shall prove that for every $x$ in $J_1$ the series in (7.61) is dominated by a convergent series of constants independent of $x$. This implies that the series converges uniformly on $J_1$.

To estimate the size of the terms in (7.61) we use the recursion formula repeatedly. Initially, we have

(7.62)   $Y_1(x) - Y_0(x) = \int_a^x A(t)\,B\,dt.$

For simplicity, we assume that $a < x$. Then we can write

$\| Y_1(x) - Y_0(x) \| \le \int_a^x \| A(t) \|\,\| B \|\,dt.$

Since each entry of $A(t)$ is continuous on $J$, each entry is bounded on the closed bounded interval $J_1$. Therefore $\|A(t)\| \le M$, where $M$ is the sum of the bounds of all the entries of $A(t)$ on the interval $J_1$. (The number $M$ depends on $J_1$.) Therefore the integrand in (7.62) is bounded by $M\|B\|$, so we have

$\| Y_1(x) - Y_0(x) \| \le M \|B\| (x - a)$

for all $x > a$ in $J_1$. Now we use the recursion formula once more to express the difference $Y_2 - Y_1$ in terms of $Y_1 - Y_0$,

$Y_2(x) - Y_1(x) = \int_a^x A(t)\{ Y_1(t) - Y_0(t) \}\,dt,$

and then use the estimate just obtained for $\|Y_1 - Y_0\|$ to obtain

$\| Y_2(x) - Y_1(x) \| \le \int_a^x \|A(t)\|\,M \|B\| (t - a)\,dt \le M^2 \|B\| \frac{(x - a)^2}{2!}$

for all $x > a$ in $J_1$. By induction we find

$\| Y_{m+1}(x) - Y_m(x) \| \le M^{m+1} \|B\| \frac{(x - a)^{m+1}}{(m+1)!}$

for $m = 0, 1, 2, \ldots$, and for all $x > a$ in $J_1$. If $x < a$, a similar argument gives the same inequality with $|x - a|$ appearing instead of $(x - a)$. If we denote by $L$ the length of the interval $J_1$, then we have $|x - a| \le L$ for all $x$ in $J_1$, so we obtain the estimate

$\| Y_{m+1}(x) - Y_m(x) \| \le M^{m+1} \|B\| \frac{L^{m+1}}{(m+1)!}$

for $m = 0, 1, 2, \ldots$, and for all $x$ in $J_1$. Therefore the series in (7.61) is dominated by the convergent series

$\|B\| \sum_{m=1}^{\infty} \frac{(ML)^m}{m!}.$

This proves that the series in (7.61) converges uniformly on $J_1$.

The foregoing argument shows that the sequence of successive approximations always converges, and the convergence is uniform on $J_1$. Let $Y$ denote the limit function. That is, define $Y(x)$ for each $x$ in $J_1$ by the equation

$Y(x) = \lim_{k \to \infty} Y_k(x).$

We shall prove that $Y$ has the following properties:
(a) $Y$ is continuous on $J_1$.
(b) $Y(x) = B + \int_a^x A(t)\,Y(t)\,dt$ for all $x$ in $J_1$.
(c) $Y(a) = B$ and $Y'(x) = A(x)\,Y(x)$ for all $x$ in $J_1$.

Part (c) shows that $Y$ is a solution of the initial-value problem on $J_1$.

Proof of (a). Each function $Y_k$ is a column matrix whose entries are scalar functions, continuous on $J_1$. Each entry of the limit function $Y$ is the limit of a uniformly convergent sequence of continuous functions so, by Theorem 11.1 of Volume I, each entry of $Y$ is also continuous on $J_1$. Therefore $Y$ itself is continuous on $J_1$.

Proof of (b). The recursion formula (7.58) states that

$Y_{k+1}(x) = B + \int_a^x A(t)\,Y_k(t)\,dt.$

Therefore

$Y(x) = \lim_{k \to \infty} Y_{k+1}(x) = B + \lim_{k \to \infty} \int_a^x A(t)\,Y_k(t)\,dt = B + \int_a^x A(t)\,Y(t)\,dt.$

The interchange of the limit symbol with the integral sign is valid because of the uniform convergence of the sequence $\{Y_k\}$ on $J_1$.

Proof of (c). The equation $Y(a) = B$ follows at once from (b). Because of (a), the integrand in (b) is continuous on $J_1$ so, by the first fundamental theorem of calculus, $Y'(x)$ exists and equals $A(x)\,Y(x)$ on $J_1$.

The interval $J_1$ was any closed and bounded subinterval of $J$ containing $a$. If $J_1$ is enlarged, the process for obtaining $Y(x)$ doesn't change because it only involves integration from $a$ to $x$. Since for every $x$ in $J$ there is a closed bounded subinterval of $J$ containing $a$ and $x$, a solution exists over the full interval $J$.

THEOREM 7.17. UNIQUENESS THEOREM FOR HOMOGENEOUS LINEAR SYSTEMS. If $A(t)$ is continuous on an open interval $J$, the differential equation

$Y'(t) = A(t)\,Y(t)$

has at most one solution on $J$ satisfying a given initial condition $Y(a) = B$.

Proof. Let $Y$ and $Z$ be two solutions on $J$. Let $J_1$ be any closed and bounded subinterval of $J$ containing $a$. We will prove that $Z(x) = Y(x)$ for every $x$ in $J_1$. This implies that $Z = Y$ on the full interval $J$. Since both $Y$ and $Z$ are solutions we have

$Z'(t) - Y'(t) = A(t)\{ Z(t) - Y(t) \}.$

Choose $x$ in $J_1$ and integrate this equation from $a$ to $x$ to obtain

$Z(x) - Y(x) = \int_a^x A(t)\{ Z(t) - Y(t) \}\,dt.$

This implies the inequality

(7.63)   $\| Z(x) - Y(x) \| \le M \int_a^x \| Z(t) - Y(t) \|\,dt,$

where $M$ is an upper bound for $\|A(t)\|$ on $J_1$. Let $M_1$ be an upper bound for the continuous function $\| Z(t) - Y(t) \|$ on $J_1$. Then the inequality (7.63) gives us

(7.64)   $\| Z(x) - Y(x) \| \le M M_1 (x - a).$

Using (7.64) in the right-hand member of (7.63) we obtain

$\| Z(x) - Y(x) \| \le M \int_a^x M M_1 (t - a)\,dt = \frac{M^2 M_1 (x - a)^2}{2!}.$

By induction we find

(7.65)   $\| Z(x) - Y(x) \| \le \frac{M^m M_1 (x - a)^m}{m!}.$

When $m \to \infty$ the right-hand member approaches 0, so $Z(x) = Y(x)$. This completes the proof.

The results of this section can be summarized in the following existence-uniqueness theorem.

THEOREM 7.18. Let $A$ be an $n \times n$ matrix function continuous on an open interval $J$. If $a \in J$ and if $B$ is any $n$-dimensional vector, the homogeneous linear system

$Y'(t) = A(t)\,Y(t), \qquad Y(a) = B,$

has one and only one $n$-dimensional vector solution on $J$.

7.22 The method of successive approximations applied to first-order nonlinear systems

The method of successive approximations can also be applied to some nonlinear systems. Consider a first-order system of the form

$Y' = F(t, Y),$

where $F$ is a given $n$-dimensional vector-valued function, and $Y$ is an unknown $n$-dimensional vector-valued function to be determined. We seek a solution $Y$ which satisfies the equation

$Y'(t) = F[t, Y(t)]$

for each $t$ in some interval $J$ and which also satisfies a given initial condition, say $Y(a) = B$, where $a \in J$ and $B$ is a given $n$-dimensional vector.

In a manner parallel to the linear case, we construct a sequence of successive approximations $Y_0, Y_1, Y_2, \ldots$, by taking $Y_0 = B$ and defining $Y_{k+1}$ in terms of $Y_k$ by the recursion formula

(7.66)   $Y_{k+1}(x) = B + \int_a^x F[t, Y_k(t)]\,dt \qquad \text{for } k = 0, 1, 2, \ldots.$

Under certain conditions on $F$, this sequence will converge to a limit function $Y$ which will satisfy the given differential equation and the given initial condition.

Before we investigate the convergence of the process we discuss some one-dimensional examples chosen to illustrate some of the difficulties that can arise in practice.

EXAMPLE 1. Consider the nonlinear initial-value problem

$y' = x^2 + y^2, \qquad \text{with } y = 0 \text{ when } x = 0.$

We shall compute a few approximations to the solution. We choose $y_0(x) = 0$ and determine the next three approximations as follows:

$y_1(x) = \int_0^x t^2\,dt = \frac{x^3}{3},$

$y_2(x) = \int_0^x \left[ t^2 + y_1^2(t) \right] dt = \int_0^x \left( t^2 + \frac{t^6}{9} \right) dt = \frac{x^3}{3} + \frac{x^7}{63},$

$y_3(x) = \int_0^x \left[ t^2 + \left( \frac{t^3}{3} + \frac{t^7}{63} \right)^2 \right] dt = \frac{x^3}{3} + \frac{x^7}{63} + \frac{2x^{11}}{2079} + \frac{x^{15}}{59535}.$

It is now apparent that a great deal of labor will be needed to compute further approximations. For example, the next two approximations $y_4$ and $y_5$ will be polynomials of degrees 31 and 63, respectively.
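This labor can be delegated to a computer algebra system. A minimal sketch assuming SymPy is available; it reproduces the three approximations computed above:

```python
import sympy as sp

x, t = sp.symbols('x t')
y = sp.Integer(0)                       # initial guess y_0 = 0
for _ in range(3):                      # compute y_1, y_2, y_3 in turn
    y = sp.integrate(t**2 + y.subs(x, t)**2, (t, 0, x))
    print(sp.expand(y))
# x**3/3
# x**7/63 + x**3/3
# x**15/59535 + 2*x**11/2079 + x**7/63 + x**3/3
```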

The next example exhibits a further difficulty that can arise in the computation of the successive approximations.

EXAMPLE 2. Consider the nonlinear initial-value problem

$y' = 2x + e^y, \qquad \text{with } y = 0 \text{ when } x = 0.$

We begin with the initial guess $y_0(x) = 0$ and we find

$y_1(x) = \int_0^x (2t + 1)\,dt = x^2 + x,$

$y_2(x) = \int_0^x \left( 2t + e^{t^2 + t} \right) dt = x^2 + \int_0^x e^{t^2 + t}\,dt.$

Here further progress is impeded by the fact that the last integral cannot be evaluated in terms of elementary functions. However, for a given $x$ it is possible to calculate a numerical approximation to the integral and thereby obtain an approximation to $y_2(x)$.

Because of the difficulties displayed in the last two examples, the method of successive approximations is sometimes not very useful for the explicit determination of solutions in practice. The real value of the method is its use in establishing existence theorems.


7.23 Proof of an existence-uniqueness theorem for first-order nonlinear systems

We turn now to an existence-uniqueness theorem for first-order nonlinear systems. By placing suitable restrictions on the function $F$ which appears in the right-hand member of the differential equation

$Y'(t) = F[t, Y(t)],$

we can extend the method of proof used for the linear case in Section 7.21.

Let $J$ denote the open interval over which we seek a solution. Assume $a \in J$ and let $B$ be a given $n$-dimensional vector. Let $S$ denote a set in $(n + 1)$-space given by

(7.67)   $S = \{ (x, Y) : |x - a| \le h, \; \| Y - B \| \le k \},$

where $h > 0$ and $k > 0$. [If $n = 1$ this is a rectangle with center at $(a, B)$, with base $2h$ and altitude $2k$.] We assume that the domain of $F$ includes a set $S$ of this type and that $F$ is bounded on $S$, say

(7.68)   $\| F(x, Y) \| \le M \qquad \text{for all } (x, Y) \text{ in } S,$

where $M$ is a positive constant.

Next, we assume that the composite function $G(x) = F[x, Y(x)]$ is continuous on the interval $(a - h, a + h)$ for every function $Y$ which is continuous on $(a - h, a + h)$ and which has the property that $(x, Y(x)) \in S$ for all $x$ in $(a - h, a + h)$. This assumption guarantees the existence of the integrals that occur in the method of successive approximations, and it also implies continuity of the functions so constructed.

Finally, we assume that $F$ satisfies a Lipschitz condition of the form

$\| F(x, Y) - F(x, Z) \| \le A\,\| Y - Z \|$

for every pair of points $(x, Y)$ and $(x, Z)$ in $S$, where $A$ is a positive constant.