
Proceedings of the IConSSE FSM SWCU (2015), pp. MA.8–12


ISBN: 978-602-1047-21-7

Parameter estimation of kernel logistic regression
Riska Yanu Fa’rifah*, Suhartono, Santi Puteri Rahayu
Department of Statistics, Institut Teknologi Sepuluh Nopember, Sukolilo, Surabaya 60111, Indonesia

Abstract
Logistic regression (LR) is a classification method often used for binary data sets. Overfitting of the training data may arise in LR, especially when the data sets used are high-dimensional. One approach to reducing overfitting is the regularized LR method, in which a regularization (penalty) term is added to the log-likelihood function of LR. A regularized optimization problem remains, because the loss function (deviance) of regularized LR is nonlinear in its parameters. To overcome this problem, a linear combination of the regularized LR parameters is used, which leads to kernel logistic regression (KLR). KLR is a nonlinear classifier, and it provides higher classification accuracy than LR on small to medium sample size data sets. With the truncated Newton method, the parameter estimation of KLR using maximum likelihood estimation (MLE) can be made optimal.

Keywords  kernel logistic regression, logistic regression, MLE, regularized logistic regression, truncated Newton

1. Introduction
Regression is a statistical method that describes the causal relationship between a response and predictors (Draper & Smith, 1998). If the response is categorical (nonmetric), the analysis that can be used is a classification method, such as logistic regression (LR). Overfitting of the training data may arise, especially if the data sets are high-dimensional (Hosmer & Lemeshow, 2000). One approach to reducing overfitting is quadratic regularization, known as regularized LR (Maalouf, 2009). Regularized LR is formed by adding a regularization parameter to the log-likelihood function of LR. When the analysis uses a small to medium sample size, the resulting loss function (deviance) is not at a minimum, because the deviance is nonlinear in its parameters; as a consequence, the parameter estimation by MLE is not optimal. This can be solved by taking a linear combination of the regularized LR parameters, which leads to kernel logistic regression (Maalouf, 2009).
KLR is a nonlinear classifier method that combines regularized LR and a kernel function. The parameter estimation of KLR using MLE has no closed form, so a numerical method is used to optimize it; this method is Newton–Raphson (Minka, 2003). However, Newton–Raphson does not always provide an optimal estimate, because the Hessian matrix is high-dimensional. Therefore, Maalouf et al. (2010) added a conjugate gradient algorithm to the Newton method, giving the truncated Newton method. This method was first used by Komarek and Moore (2005) to obtain the parameter estimates of regularized LR.

* Corresponding author. Tel.: +62 812 4944 4853; E-mail address: riska.yanu@gmail.com


2. Materials and methods

2.1 Logistic regression
LR is a linear classifier method. LR is used to determine the relationship between a categorical dependent variable and one or more predictors, which may be either categorical or continuous (Agresti, 2002). The LR model is given by

$$\pi(\mathbf{x}_i) = \frac{\exp(\mathbf{x}_i^T \boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i^T \boldsymbol{\beta})}. \qquad (1)$$

From Eq. (1), the logit function can be defined as

$$\mathrm{logit}\big(\pi(\mathbf{x}_i)\big) = \mathbf{x}_i^T \boldsymbol{\beta} = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip},$$

for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, p$, where $n$ and $p$ are respectively the number of observations and the number of predictors.
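As a quick numerical illustration of Eq. (1), the sketch below evaluates $\pi(\mathbf{x})$ for one observation, with hypothetical values of $\mathbf{x}$ and $\boldsymbol{\beta}$ that are not from the paper; the leading 1 in $\mathbf{x}$ carries the intercept $\beta_0$, a convention also assumed in the later snippets.

    import numpy as np

    # Hypothetical observation and coefficients; the leading 1 in x is the intercept term.
    x = np.array([1.0, 2.0, -1.0])
    beta = np.array([0.5, 0.3, 0.8])

    eta = x @ beta                              # linear predictor x'beta = 0.3
    pi = np.exp(eta) / (1.0 + np.exp(eta))      # Eq. (1)
    print(pi)                                   # ~0.574 = P(y = 1 | x)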
The parameter estimates are obtained from the MLE method. The likelihood function is

$$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \pi(\mathbf{x}_i)^{y_i} \big(1 - \pi(\mathbf{x}_i)\big)^{1 - y_i},$$

and the objective function (log-likelihood) is

$$\ln L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \Big[ y_i\, \mathbf{x}_i^T \boldsymbol{\beta} - \ln\big(1 + \exp(\mathbf{x}_i^T \boldsymbol{\beta})\big) \Big].$$

The parameter estimates are obtained from the derivatives of the log-likelihood function. The result of the first derivative is

$$\mathbf{g}(\boldsymbol{\beta}) = \frac{\partial \ln L(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \frac{\partial}{\partial \boldsymbol{\beta}} \sum_{i=1}^{n} \Big[ y_i\, \mathbf{x}_i^T \boldsymbol{\beta} - \ln\big(1 + \exp(\mathbf{x}_i^T \boldsymbol{\beta})\big) \Big] = \sum_{i=1}^{n} \mathbf{x}_i \big( y_i - \pi(\mathbf{x}_i) \big) = \mathbf{X}^T (\mathbf{y} - \hat{\boldsymbol{\pi}}). \qquad (2)$$
Eq. (2) is also known as the gradient vector. It still contains the unknown parameters, so its solution requires a numerical method. The method used is Newton–Raphson, given by the iteration

$$\boldsymbol{\beta}^{(c+1)} = \boldsymbol{\beta}^{(c)} + \mathbf{s}^{(c)}, \quad \text{with } \mathbf{s}^{(c)} = -\mathbf{H}\big(\boldsymbol{\beta}^{(c)}\big)^{-1} \mathbf{g}\big(\boldsymbol{\beta}^{(c)}\big). \qquad (3)$$

Based on Eq. (3), the Hessian matrix is required to obtain the parameter estimates of LR with Newton–Raphson iterations. The Hessian matrix can be defined as

$$\mathbf{H}(\boldsymbol{\beta}) = \frac{\partial^2 \ln L(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^T} = -\sum_{i=1}^{n} \mathbf{x}_i\, \hat{\pi}(\mathbf{x}_i)\big(1 - \hat{\pi}(\mathbf{x}_i)\big)\, \mathbf{x}_i^T = -\mathbf{X}^T \mathbf{W} \mathbf{X}. \qquad (4)$$

From Eqs. (3) and (4) we obtain

$$\boldsymbol{\beta}^{(c+1)} = \boldsymbol{\beta}^{(c)} + \mathbf{s}^{(c)} = \boldsymbol{\beta}^{(c)} - \mathbf{H}\big(\boldsymbol{\beta}^{(c)}\big)^{-1} \mathbf{g}\big(\boldsymbol{\beta}^{(c)}\big) = \big(\mathbf{X}^T \mathbf{W} \mathbf{X}\big)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{z}^{(c)},$$

with $\mathbf{z}^{(c)} = \mathbf{X}\boldsymbol{\beta}^{(c)} + \mathbf{W}^{-1}(\mathbf{y} - \hat{\boldsymbol{\pi}})$, where $\mathbf{W}$ is the diagonal matrix with entries $\hat{\pi}(\mathbf{x}_i)\big(1 - \hat{\pi}(\mathbf{x}_i)\big)$. Overfitting of the training data may occur when high-dimensional data sets are used. Overfitting can be reduced by using regularized LR (Maalouf, 2009; Maalouf et al., 2010).
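The last expression is the iteratively re-weighted least squares (IRLS) form of the Newton–Raphson update. Below is a minimal NumPy sketch of this update for plain LR; it is an illustration of Eqs. (1)–(4), not the authors' code, and it assumes X already contains an intercept column.

    import numpy as np

    def irls_logistic(X, y, n_iter=25, tol=1e-8):
        """Fit LR by IRLS / Newton-Raphson: beta <- (X'WX)^(-1) X'Wz."""
        n, p = X.shape
        beta = np.zeros(p)
        for _ in range(n_iter):
            eta = X @ beta
            pi = 1.0 / (1.0 + np.exp(-eta))             # Eq. (1)
            w = np.clip(pi * (1.0 - pi), 1e-10, None)   # diagonal of W, clipped for stability
            z = eta + (y - pi) / w                      # working response z
            beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
            if np.max(np.abs(beta_new - beta)) < tol:
                return beta_new
            beta = beta_new
        return beta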



2.2 Regularized logistic regression
Regularized LR is a classification method derived by adding a regularization parameter to the log-likelihood function of LR. The log-likelihood function of regularized LR can be defined as

$$\ln L(\boldsymbol{\beta}^*) = \sum_{i=1}^{n} \Big[ y_i\, \mathbf{x}_i^T \boldsymbol{\beta}^* - \ln\big(1 + \exp(\mathbf{x}_i^T \boldsymbol{\beta}^*)\big) \Big] - \frac{\lambda}{2}\, \|\boldsymbol{\beta}^*\|^2,$$

where $\lambda$ is the regularization parameter and $\|\boldsymbol{\beta}^*\|^2$ is the regularization (penalty) term. By using the MLE method and iteratively re-weighted least squares (IRLS), i.e., the Newton–Raphson method, the parameter estimate of regularized LR can be obtained as

$$\hat{\boldsymbol{\beta}}^{*(c+1)} = \big(\mathbf{X}^T \mathbf{W} \mathbf{X} + \lambda \mathbf{I}\big)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{z}^{*(c)},$$

with $\mathbf{z}^{*(c)} = \mathbf{X}\hat{\boldsymbol{\beta}}^{*(c)} + \mathbf{W}^{-1}(\mathbf{y} - \hat{\boldsymbol{\pi}})$.
The resulting deviance function is nonlinear in the parameters, so parameter estimation using MLE does not reach the maximum directly. To overcome this problem, Maalouf (2009) and Maalouf, Trafalis, and Adrianto (2010) added a conjugate gradient method to the Newton–Raphson iteration, named truncated iteratively re-weighted least squares (TR-IRLS), or truncated Newton.
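As a sketch, the regularized update differs from the plain IRLS snippet above only in the ridge term $\lambda\mathbf{I}$ added to $\mathbf{X}^T\mathbf{W}\mathbf{X}$ (again NumPy, with the same conventions; the default value of lam is a hypothetical choice):

    import numpy as np

    def irls_regularized(X, y, lam=1.0, n_iter=25):
        """Regularized LR by IRLS: beta <- (X'WX + lam*I)^(-1) X'Wz."""
        n, p = X.shape
        beta = np.zeros(p)
        for _ in range(n_iter):
            eta = X @ beta
            pi = 1.0 / (1.0 + np.exp(-eta))
            w = np.clip(pi * (1.0 - pi), 1e-10, None)
            z = eta + (y - pi) / w
            A = X.T @ (w[:, None] * X) + lam * np.eye(p)   # X'WX + lam*I
            beta = np.linalg.solve(A, X.T @ (w * z))
        return beta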
3. Results and discussion
Kernel logistic regression is a combination of regularized LR and a kernel function. Besides adding a conjugate gradient method to iteratively re-weighted least squares, Maalouf (2009) and Maalouf, Trafalis, and Adrianto (2010) explain that the nonlinear deviance can be handled by using a linear combination of the parameters. The linear combination takes the form

$$\boldsymbol{\beta}^{\otimes} = \mathbf{X}^T \boldsymbol{\alpha} = \sum_{i=1}^{n} \alpha_i \mathbf{x}_i.$$

So the logit form of KLR is

$$\mathrm{logit}\big(\pi(\mathbf{k}_i)\big) = \sum_{j=1}^{n} \alpha_j \big\langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \big\rangle = \sum_{j=1}^{n} \alpha_j K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{k}_i^T \boldsymbol{\alpha},$$

where $\mathbf{k}_i$ is the $i$-th row of the kernel matrix $\mathbf{K} = \big[K(\mathbf{x}_i, \mathbf{x}_j)\big]$.
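The paper does not fix a particular kernel function, so as an illustration the sketch below builds $\mathbf{K}$ with the common Gaussian (RBF) kernel $K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2)$; the value of gamma is a hypothetical tuning choice.

    import numpy as np

    def rbf_kernel_matrix(X, gamma=0.5):
        """K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
        sq = np.sum(X**2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # pairwise squared distances
        return np.exp(-gamma * np.maximum(d2, 0.0))

The logit of KLR for observation $i$ is then the inner product of the $i$-th row of this matrix with $\boldsymbol{\alpha}$.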
The log-likelihood function of KLR is

$$\ln L(\boldsymbol{\alpha}) = \sum_{i=1}^{n} y_i \ln \pi(\mathbf{k}_i) + \sum_{i=1}^{n} (1 - y_i) \ln\big(1 - \pi(\mathbf{k}_i)\big) - \frac{\lambda}{2}\, \boldsymbol{\alpha}^T \mathbf{K} \boldsymbol{\alpha}. \qquad (5)$$

To get the parameter estimates, Eq. (5) is differentiated up to two times; the first derivative gives the gradient vector. The result of the first derivative is

$$\mathbf{g}(\boldsymbol{\alpha}) = \frac{\partial \ln L(\boldsymbol{\alpha})}{\partial \boldsymbol{\alpha}} = \frac{\partial}{\partial \boldsymbol{\alpha}} \Big[ \sum_{i=1}^{n} y_i\, \mathbf{k}_i^T \boldsymbol{\alpha} - \sum_{i=1}^{n} \ln\big(1 + \exp(\mathbf{k}_i^T \boldsymbol{\alpha})\big) - \frac{\lambda}{2}\, \boldsymbol{\alpha}^T \mathbf{K} \boldsymbol{\alpha} \Big]$$

$$= \sum_{i=1}^{n} \mathbf{k}_i \bigg( y_i - \frac{\exp(\mathbf{k}_i^T \boldsymbol{\alpha})}{1 + \exp(\mathbf{k}_i^T \boldsymbol{\alpha})} \bigg) - \lambda \mathbf{K} \boldsymbol{\alpha} = \sum_{i=1}^{n} \mathbf{k}_i \big( y_i - \pi(\mathbf{k}_i) \big) - \lambda \mathbf{K} \boldsymbol{\alpha} = \mathbf{K}^T (\mathbf{y} - \hat{\boldsymbol{\pi}}) - \lambda \mathbf{K} \boldsymbol{\alpha}.$$

The first derivative has no closed-form solution, because it still contains the unknown parameters, so a numerical method is required. The method used is called truncated Newton. This iterative method consists of two algorithms: the first algorithm is the outer loop and the second is the inner loop. The outer loop obtains the parameter estimates with the IRLS method and can be defined as

$$\hat{\boldsymbol{\alpha}}^{(c+1)} = \hat{\boldsymbol{\alpha}}^{(c)} + \mathbf{s}^{(c)} = \hat{\boldsymbol{\alpha}}^{(c)} - \mathbf{H}\big(\boldsymbol{\alpha}^{(c)}\big)^{-1} \mathbf{g}\big(\boldsymbol{\alpha}^{(c)}\big). \qquad (6)$$
To get the Hessian matrix, the log-likelihood in Eq. (5) is differentiated a second time with respect to $\boldsymbol{\alpha}$. The second derivative is

$$\mathbf{H}(\boldsymbol{\alpha}) = \frac{\partial^2 \ln L(\boldsymbol{\alpha})}{\partial \boldsymbol{\alpha}\, \partial \boldsymbol{\alpha}^T} = \frac{\partial}{\partial \boldsymbol{\alpha}^T} \bigg[ \sum_{i=1}^{n} \mathbf{k}_i \bigg( y_i - \frac{\exp(\mathbf{k}_i^T \boldsymbol{\alpha})}{1 + \exp(\mathbf{k}_i^T \boldsymbol{\alpha})} \bigg) - \lambda \mathbf{K} \boldsymbol{\alpha} \bigg]$$

$$= -\sum_{i=1}^{n} \mathbf{k}_i\, \hat{\pi}(\mathbf{k}_i)\big(1 - \hat{\pi}(\mathbf{k}_i)\big)\, \mathbf{k}_i^T - \lambda \mathbf{K} = -\mathbf{K}^T \mathbf{W} \mathbf{K} - \lambda \mathbf{K},$$
where $\mathbf{W}$ is the diagonal matrix with entries $\hat{\pi}(\mathbf{k}_i)\big(1 - \hat{\pi}(\mathbf{k}_i)\big)$, for $i = 1, 2, \ldots, n$. The estimation of KLR is then

$$\hat{\boldsymbol{\alpha}}^{(c+1)} = \hat{\boldsymbol{\alpha}}^{(c)} + \mathbf{s}^{(c)} = \hat{\boldsymbol{\alpha}}^{(c)} - \mathbf{H}\big(\hat{\boldsymbol{\alpha}}^{(c)}\big)^{-1} \mathbf{g}\big(\hat{\boldsymbol{\alpha}}^{(c)}\big) = \hat{\boldsymbol{\alpha}}^{(c)} + \big( \mathbf{K}^T \mathbf{W} \mathbf{K} + \lambda \mathbf{K} \big)^{-1} \big( \mathbf{K}^T (\mathbf{y} - \hat{\boldsymbol{\pi}}) - \lambda \mathbf{K} \hat{\boldsymbol{\alpha}}^{(c)} \big).$$

With $\hat{\boldsymbol{\alpha}}^{(c)} = \big( \mathbf{K}^T \mathbf{W} \mathbf{K} + \lambda \mathbf{K} \big)^{-1} \big( \mathbf{K}^T \mathbf{W} \mathbf{K} + \lambda \mathbf{K} \big)\, \hat{\boldsymbol{\alpha}}^{(c)}$, then

$$\hat{\boldsymbol{\alpha}}^{(c+1)} = \big( \mathbf{K}^T \mathbf{W} \mathbf{K} + \lambda \mathbf{K} \big)^{-1} \big( \mathbf{K}^T \mathbf{W} \mathbf{K} + \lambda \mathbf{K} \big) \hat{\boldsymbol{\alpha}}^{(c)} + \big( \mathbf{K}^T \mathbf{W} \mathbf{K} + \lambda \mathbf{K} \big)^{-1} \big( \mathbf{K}^T (\mathbf{y} - \hat{\boldsymbol{\pi}}) - \lambda \mathbf{K} \hat{\boldsymbol{\alpha}}^{(c)} \big)$$

$$= \big( \mathbf{K}^T \mathbf{W} \mathbf{K} + \lambda \mathbf{K} \big)^{-1} \Big[ \mathbf{K}^T \mathbf{W} \mathbf{K} \hat{\boldsymbol{\alpha}}^{(c)} + \lambda \mathbf{K} \hat{\boldsymbol{\alpha}}^{(c)} + \mathbf{K}^T (\mathbf{y} - \hat{\boldsymbol{\pi}}) - \lambda \mathbf{K} \hat{\boldsymbol{\alpha}}^{(c)} \Big]$$

$$= \big( \mathbf{K}^T \mathbf{W} \mathbf{K} + \lambda \mathbf{K} \big)^{-1} \Big[ \mathbf{K}^T \mathbf{W} \mathbf{K} \hat{\boldsymbol{\alpha}}^{(c)} + \mathbf{K}^T (\mathbf{y} - \hat{\boldsymbol{\pi}}) \Big] = \big( \mathbf{K}^T \mathbf{W} \mathbf{K} + \lambda \mathbf{K} \big)^{-1} \mathbf{K}^T \mathbf{W} \Big[ \mathbf{K} \hat{\boldsymbol{\alpha}}^{(c)} + \mathbf{W}^{-1} (\mathbf{y} - \hat{\boldsymbol{\pi}}) \Big]. \qquad (7)$$

From Eq. (7), the parameter estimate of KLR using MLE can be written as

$$\hat{\boldsymbol{\alpha}}^{(c+1)} = \big( \mathbf{K}^T \mathbf{W} \mathbf{K} + \lambda \mathbf{K} \big)^{-1} \mathbf{K}^T \mathbf{W} \mathbf{z}^{(c)}, \quad \text{with } \mathbf{z}^{(c)} = \mathbf{K} \hat{\boldsymbol{\alpha}}^{(c)} + \mathbf{W}^{-1} (\mathbf{y} - \hat{\boldsymbol{\pi}}).$$
Based on the outer-loop algorithm above, the parameter estimate of KLR is not yet optimal: the Hessian matrix has a high dimension, which makes its inverse difficult to obtain. To solve this problem numerically, the outer loop is followed by the inner loop, which uses the linear conjugate gradient (CG) method to obtain the optimal parameter estimate. The inner loop minimizes the quadratic function

$$Q\big(\hat{\boldsymbol{\alpha}}^{(c+1)}\big) = \frac{1}{2}\, \big(\hat{\boldsymbol{\alpha}}^{(c+1)}\big)^T \big( \mathbf{K}^T \mathbf{W} \mathbf{K} + \lambda \mathbf{K} \big)\, \hat{\boldsymbol{\alpha}}^{(c+1)} - \big(\hat{\boldsymbol{\alpha}}^{(c+1)}\big)^T \mathbf{K}^T \mathbf{W} \mathbf{z}^{(c)}.$$
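Putting the two loops together, a minimal sketch of the truncated Newton (TR-IRLS-style) estimation of KLR follows; it is an illustration of Eqs. (5)–(7) under the assumptions above, not the authors' implementation. The inner loop uses scipy.sparse.linalg.cg as the linear CG solver for the system $(\mathbf{K}^T\mathbf{W}\mathbf{K} + \lambda\mathbf{K})\,\boldsymbol{\alpha} = \mathbf{K}^T\mathbf{W}\mathbf{z}^{(c)}$, whose solution minimizes $Q$.

    import numpy as np
    from scipy.sparse.linalg import cg

    def klr_truncated_newton(K, y, lam=1.0, n_outer=20):
        """KLR by truncated Newton: outer IRLS loop, inner linear CG loop."""
        n = K.shape[0]
        alpha = np.zeros(n)
        for _ in range(n_outer):
            eta = K @ alpha                            # k_i' alpha, the KLR logit
            pi = 1.0 / (1.0 + np.exp(-eta))
            w = np.clip(pi * (1.0 - pi), 1e-10, None)  # diagonal of W
            z = eta + (y - pi) / w                     # working response z
            A = K.T @ (w[:, None] * K) + lam * K       # K'WK + lam*K
            b = K.T @ (w * z)                          # K'Wz
            alpha, _ = cg(A, b, x0=alpha)              # inner loop minimizes Q
        return alpha

Fitted probabilities are recovered as $\pi(\mathbf{k}_i) = 1 / \big(1 + \exp(-\mathbf{k}_i^T \boldsymbol{\alpha})\big)$, and a binary classification can be obtained by thresholding at 0.5.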

4. Conclusion and remarks

The parameter estimate of KLR using MLE and the truncated Newton method is $\hat{\boldsymbol{\alpha}}^{(c+1)} = \big(\mathbf{K}^T \mathbf{W} \mathbf{K} + \lambda \mathbf{K}\big)^{-1} \mathbf{K}^T \mathbf{W} \mathbf{z}^{(c)}$. To get the parameter estimates of KLR, this research uses two algorithms: the outer loop and the inner loop. The outer loop is the solution of the parameter estimation using MLE, which has no closed form; this algorithm is named IRLS. The second algorithm is the conjugate gradient (CG). The results of this research can be applied to the analysis of binary and multinomial classification.

Acknowledgment
The first author thanks the Directorate of Higher Education, Ministry of Education and Culture of the Republic of Indonesia, which provided financial support through the BPPDN 2013–2015 scholarship.

References
Agresti, A. (2002). Categorical data analysis (2nd ed.). John Wiley & Sons, Inc., New York.


Draper, N.R., & Smith, H. (1998). Applied regression analysis (3rd ed.). John Wiley & Sons, Inc., New
York.
Hosmer, D.W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). John Wiley & Sons, Inc.,
New York.
Komarek, P., & Moore, A. (2005). Making logistic regression a core data mining tool with TR-IRLS.
Proceedings of the Fifth IEEE International Conference on Data Mining, 685–688.
Maalouf, M. (2009). Robust weighted kernel logistic regression in imbalanced and rare events data. PhD dissertation, University of Oklahoma, Norman, OK.
Maalouf, M., Trafalis, T.B., & Adrianto, I. (2010). Kernel logistic regression using truncated Newton
method. Springer, Berlin.
Minka, T.P. (2003). A comparison of numerical optimizers for logistic regression. Tech. Rep.,
Department of Statistics, Carnegie Mellon University.
