
VARIOUS APPLICATIONS OF LINEAR ALGEBRA

H.A. Parhusip
Satya Wacana Christian University
Jl. Diponegoro 55-60, Salatiga 50711
hannaariniparhusip@yahoo.co.id

ABSTRACT

Various applications of linear algebra are presented here. Undergraduate students in mathematics often lack sufficient background to integrate linear algebra across different subjects and have difficulty working with data when applying linear algebra. On the other hand, students have learnt many properties in linear algebra yet rarely work with them in a practical sense. This paper therefore gives some guidelines for handling this problem through examples taken from research on linear algebra using data from the surrounding region. Multivariate regression is introduced and used to fit closing stock prices as an example. QR decomposition is recalled in order to solve a linear system that has no solution in the sense of classical linear algebra lectures. Modelling stevioside with a two-dimensional quadratic function, a logistic model of the crown diameter of Kailan, discriminant analysis of foods, the protein content of beans, and the Belousov-Zhabotinsky (BZ) reaction as a model in differential equations are further examples shown in this paper. These applications mainly deal with parameter determination, which leads to linear and nonlinear systems. Least squares and Newton's method are the basic methods used in this paper, solved with fmincon and lsqnonlin as provided by MATLAB. One needs a particular software language such as MATLAB or R to reproduce the results in this paper.

Keywords: least squares, positive definite, convex-nonconvex functions, discriminant analysis, covariance matrix, eigenvalues-eigenvectors.

I. INTRODUCTION

Linear algebra is one of the major branches of mathematics and must be taken by every undergraduate student in mathematics. It may also be compulsory in computer science and other sciences where numerical computation is necessary. In multivariate analysis one meets a great deal of linear algebra, as this paper will also show. The data used are taken from Central Java and its surroundings, so that undergraduate students may apply the methods to practical problems assigned by their lecturers. Other problems can be obtained by analysing models published by researchers in the applied sciences. This paper focuses on the linear and nonlinear systems that arise from parameter determination. Compared with existing journals on applied linear algebra, the scope of this paper is narrow. Axelsson and Chen (2011) collect contributions on matrix computations and nonlinear equations, which require many numerical methods and therefore involve much linear algebra. However, implementations of linear algebra that use data from Salatiga and its surroundings are still scarce, and this paper provides some examples in that sense.

II. DETERMINATION OF PARAMETERS IN LINEAR REGRESSION

2.1 Introduction

This part is mostly taken from Peressini et al. (1988), Chapter 4. Assume we have a set of data $(t_1,s_1), (t_2,s_2), \dots, (t_n,s_n)$ and assume that there exists a continuous function $f$ such that $s = f(t)$, taken in the form of a polynomial with $k$ coefficients,

$$s = p(t) = x_1 + x_2 t + \dots + x_k t^{k-1}. \qquad (1)$$

In this case the independent variable is $t$. The variables $x_1,\dots,x_k$ are the polynomial coefficients to be determined from the given data. We need to find $x_1,\dots,x_k$ such that the deviations $s_i - p(t_i)$ are as small as possible. The polynomial must fit all points $i = 1,\dots,n$, and hence the problem becomes a minimization problem in the least-squares sense. This means that we have to minimize

$$R(x_1,\dots,x_k) = \sum_{i=1}^{n}\bigl(s_i - p(t_i)\bigr)^2 = \sum_{i=1}^{n}\Bigl(s_i - \sum_{j=1}^{k} x_j t_i^{\,j-1}\Bigr)^2. \qquad (2a)$$

To proceed further, one writes the residual function (2a) in matrix-vector notation. We may express Eq. (2a) with the vector norm by introducing

$$A = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{k-1}\\ 1 & t_2 & t_2^2 & \cdots & t_2^{k-1}\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & t_n & t_n^2 & \cdots & t_n^{k-1} \end{pmatrix}, \qquad b = \begin{pmatrix} s_1\\ s_2\\ \vdots\\ s_n \end{pmatrix}, \qquad x = \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_k \end{pmatrix}.$$

We then have

$$R(x_1,\dots,x_k) = \|b - Ax\|^2 = (b - Ax)^T(b - Ax) = b^T b - 2\,b^T A x + x^T A^T A x.$$

Minimizing $R$ means that we need to solve $\nabla R = 0$, where $\nabla R$ denotes the vector of derivatives of $R$ with respect to each variable; the unknowns here are $x_1,\dots,x_k$. Thus

$$\nabla R = \Bigl(\frac{\partial R}{\partial x_1},\dots,\frac{\partial R}{\partial x_k}\Bigr)^T = -2A^T b + 2A^T A x. \qquad (2b)$$

As in calculus, the minimizer of $R$ is the value $x = x^*$ that satisfies $\nabla R = 0$. From Eq. (2b) with $x = x^*$, we get $-2A^T b + 2A^T A x^* = 0$, or

$$A^T A x^* = A^T b. \qquad (2c)$$

Thus the determination of the parameters, in this case the polynomial coefficients in Eq. (1), reduces to solving the linear system (2c). The solution $x^*$ is a minimizer if and only if $A^T A$ is positive definite; moreover, since $A^T A$ is always positive semidefinite, it is invertible if and only if it is positive definite. Therefore we get

$$x^* = (A^T A)^{-1} A^T b. \qquad (2d)$$
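As an illustration, Eq. (2c) can be solved directly in MATLAB. The following is a minimal sketch on hypothetical data; the vectors t and s and the synthetic model are placeholders, not the data of this paper.

    % Sketch: polynomial least-squares fit via the normal equations (2c).
    % t and s are synthetic placeholder data; k is the number of coefficients.
    t = (0:0.5:5)';                              % abscissae t_1,...,t_n
    s = 1 + 2*t - 0.3*t.^2 + 0.1*randn(size(t)); % noisy observations s_i
    k = 3;                                       % coefficients x_1,...,x_k
    A = ones(length(t), k);                      % first column: all ones
    for j = 2:k
        A(:, j) = t.^(j-1);                      % columns t, t^2, ..., t^(k-1)
    end
    x = (A' * A) \ (A' * s);                     % solve A'A x = A'b, Eq. (2c)

In practice one would write A \ s rather than forming $A^T A$ explicitly, since MATLAB's backslash then works through an orthogonal factorization, which is better conditioned; this anticipates the QR approach discussed below.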
2.2 More on Regression

Assume we have a dependent variable $y = [y_1,\dots,y_n]^T$ and $p$ independent variables. We arrange the data as a matrix of random variables $X$ with variables $X_1,\dots,X_p$, where each column contains $n$ observations, and write

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon. \qquad (1)$$

Each observation satisfies

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi} + \varepsilon_i, \qquad i = 1,\dots,n. \qquad (2)$$

In matrix-vector notation, Eq. (2) becomes

$$Y_{n\times 1} = Z_{n\times(p+1)}\,\beta_{(p+1)\times 1} + \varepsilon_{n\times 1}. \qquad (3)$$

Note that $Z_{n\times(p+1)}$ contains ones in its first column, while its second up to $(p+1)$-th columns are the vectors $X_1,\dots,X_p$. The regression vector $\beta_{(p+1)\times 1} = (\beta_0,\beta_1,\dots,\beta_p)^T$ is obtained by least squares, i.e. we have to minimize

$$R(\beta) = \sum_{i=1}^{n}\Bigl(y_i - \sum_{j=0}^{p}\beta_j z_{ij}\Bigr)^2, \qquad (4a)$$

where $z_{i0} = 1$ and $z_{ij} = x_{ji}$ for $j \ge 1$. With the norm notation from the linear algebra lecture, Eq. (4a) can be written as

$$R(\beta) = \|y - Z\beta\|^2 = (y - Z\beta)^T(y - Z\beta) = y^T y - 2\,y^T Z\beta + \beta^T Z^T Z\beta. \qquad (4b)$$

Minimizing (4b) means taking the first derivatives of (4b) with respect to each variable, in this case $\beta_0,\dots,\beta_p$. We have

$$\nabla R = \Bigl(\frac{\partial R}{\partial \beta_0},\dots,\frac{\partial R}{\partial \beta_p}\Bigr)^T = -2Z^T y + 2Z^T Z\beta.$$

Thus the estimate $\hat\beta$ is obtained by solving $Z^T Z\hat\beta = Z^T y$, or

$$\hat\beta = (Z^T Z)^{-1} Z^T y. \qquad (5)$$

Note that the inverse matrix does not always exist, as shown in the linear algebra lecture. Therefore we must add the condition that $\hat\beta$ can be obtained if and only if $(Z^T Z)^{-1}$ exists, which holds as long as $Z^T Z$ is positive definite. Indeed $2Z^T Z = \nabla\nabla R$ is the Hessian matrix of $R$, and $Z$ must have full rank $p+1 \le n$ in order for $(Z^T Z)^{-1}$ to exist (see Johnson and Wichern, 2007, page 290, for this proof).

Example 1. Suppose we have the closing stock prices of 45 companies in the LQ45 Index in the period February-July 2011 (Pratama et al., 2011), partly shown in Table 1. In this study we have p = 49 variables, and each variable contains n = 246 samples. Assume that the LQ45 index is the response variable Y and that it depends linearly on the closing stock prices. MATLAB allows us to solve Eq. (5) directly, which gives the linear regression shown in Figure 1; the least-squares approximation fits the data well. Which companies should be allowed to join the LQ45 Index in the next period? For this practical purpose one needs to know which companies contribute significantly according to the linear regression. If the number of variables is too large, one may employ principal component analysis to select the dominant variables. This method works with the largest eigenvalue of the covariance matrix and forms a linear combination of the original variables whose weights are the components of the eigenvector belonging to that largest eigenvalue. One example of this method is an analysis of stock prices from 8 sectors (Parhusip et al., 2010).

Table 1. Some of the closing stock prices of 49 companies in the LQ45 Index (left) and the related LQ45 index (right, 7th column). Source of data: Fika and Parhusip (2011).

Figure 1. Linear approximation of the LQ45 Index in the period February 2011-July 2011 (·: data; o: approximation).
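A minimal sketch of Eq. (5) in MATLAB follows, assuming the prices are held in a matrix X of size n x p and the index in a vector y; these names and the random placeholder values are illustrative, not the data of Example 1.

    % Sketch: regression coefficients from the normal equations, Eq. (5).
    n = 246; p = 2;
    X = rand(n, p);              % placeholder closing prices (n x p)
    y = rand(n, 1);              % placeholder LQ45 index (n x 1)
    Z = [ones(n, 1), X];         % design matrix with intercept column
    beta = (Z' * Z) \ (Z' * y);  % solve Z'Z beta = Z'y
    yhat = Z * beta;             % fitted values, as plotted in Figure 1
    err  = norm(y - yhat);       % residual norm used to compare fits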
How do we solve Eq. (5) if $Z^T Z$ is not positive definite? In that case the normal equations have no solution, but we still have to find an approximation. One way to deal with this problem is the QR decomposition (Peressini et al., 1988). This method decomposes $Z = A = QR$ and denotes $x = \beta$ and $b = y$, so that $Z^T Z\beta = Z^T y$ becomes $A^T A x = A^T b$; we use this to have a familiar notation. (Remember: $\|x\|^2 = x^T x = \sum_{i=1}^n x_i^2$.) The method relies on Gram-Schmidt orthogonalization to construct the factorization of the matrix $A$ even when $A^T A$ is not positive definite. Let $Q$ be the matrix whose column vectors $u_1,\dots,u_m$ are obtained by Gram-Schmidt orthogonalization. Since $Q$ is an orthonormal $n\times m$ matrix, $Q^T Q = I$. Additionally, $Q$ can be considered as a reduction of $A$ through the relation $Q = AL$, where $L$ is an $m\times m$ triangular matrix. The matrix $L$ has an inverse since the columns of $Q$ are linearly independent; denote $R = L^{-1}$, an upper triangular $m\times m$ matrix, so that $A = QR$. We use these facts to solve Eq. (5), i.e. $x^* = (A^T A)^{-1}A^T b$, with $A = Z$ and $b = y$. Employing $Z = QR$, we obtain

$$x^* = (Z^T Z)^{-1}Z^T b = \bigl((QR)^T QR\bigr)^{-1}(QR)^T b = (R^T Q^T Q R)^{-1}R^T Q^T b = R^{-1}(R^T)^{-1}R^T Q^T b = R^{-1}Q^T b \qquad (6)$$

since $Q^T Q = I$. Note that Eq. (6) can be solved by back substitution, because $R$ is an upper triangular matrix.

The last problem with a linear system is the case in which it has infinitely many solutions (an underdetermined system). The classical linear algebra lecture ends with this statement. The minimum-norm solution then provides a possible answer that is considered good enough, but this topic is not treated in this paper (see Peressini et al., 1988, pages 145-149, for details). Note that the minimum-norm solution relies on the inverse of the Gram matrix $G = AA^T$, which may be large: if $A$ has dimension $n\times p$, then $A^T$ has dimension $p\times n$, so with $n = 246$ and $p = 2$ the dimension of $G$ is $n\times n = 246\times 246$.

Example 2. Let us assume we have the closing stock prices of 2 companies taken from Example 1, with the LQ45 Index again as the dependent variable. Thus $n = 246$ with 2 independent variables, so $Z$ has dimension $n\times(1+2)$ and the right-hand side vector has dimension $n\times 1$. Applying least squares and the QR decomposition to this data set gives errors of 3.0339 and 3.5426 respectively. Both show nearly the same result, as seen in Figure 2.

Figure 2a. The linear regression of the closing stock prices of the 2 companies from Example 1, with the LQ45 Index as the dependent variable, using least squares.

Figure 2b. The same regression computed using the QR decomposition.
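The QR route of Eq. (6) can be sketched in MATLAB as below; qr is the built-in factorization, the data are random placeholders, and the back substitution is carried out by the triangular solve inside the backslash operator.

    % Sketch: least-squares solution via QR, Eq. (6), on placeholder data.
    n = 246; p = 2;
    Z = [ones(n, 1), rand(n, p)];   % design matrix as in Example 2
    y = rand(n, 1);                 % placeholder LQ45 index
    [Q, R] = qr(Z, 0);              % economy-size QR: Q is n x (p+1)
    beta_qr = R \ (Q' * y);         % back substitution, R upper triangular
    % beta_qr agrees with (Z'*Z)\(Z'*y) whenever Z has full rank, but the
    % QR route avoids forming Z'Z, whose condition number is squared.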

III. DETERMINATION OF PARAMETERS IN NONLINEAR LEAST SQUARES