CREDIT RISK MODELING
USING LOGISTIC RIDGE REGRESSION

RAKHMAWATI

DEPARTMENT OF STATISTICS
FACULTY OF MATHEMATICS AND NATURAL SCIENCES
BOGOR AGRICULTURAL UNIVERSITY
2011

ABSTRACT
RAKHMAWATI. Credit Risk Modeling Using Logistic Ridge Regression. Supervised by AAM
ALAMUDI and DIAN KUSUMANINGRUM.
The growth of national banking credit may expose banks to greater risk. One issue that must be highlighted is how to determine whether a new applicant will be good at loan repayment. A well-known and widely used method for classifying new credit applicants is logistic regression. Multicollinearity is a problem that is frequently encountered in model building. Usually, variable selection methods are used to handle this problem, but this sometimes creates a new problem when an important variable does not enter the model. Logistic ridge regression can be an alternative to logistic regression when multicollinearity exists; its advantage is that it handles multicollinearity without deleting any predictor variables. This research compared the performance of logistic ridge regression and logistic regression with variable selection in predicting the collectability status of new credit applicants. The German Credit data set of 1000 observations was used: 740 observations for modeling and 260 for validation. Backward elimination was the best of the variable selection methods, having the highest c statistic, and its model fit well according to the Hosmer and Lemeshow goodness-of-fit test. Backward logistic regression showed that, among the 17 variables, eight were significant in the Wald test. There were many significant correlations among the predictors, but the highest correlation coefficient was 0.628, which existed between duration of credit (V1) and credit amount (V2). The ridge parameter λ was 0.001. The optimal cut point of backward logistic regression was 0.680, while that of logistic ridge regression was 0.677. Comparing the c statistic and the total of correctly predicted cases, logistic ridge regression was better than backward logistic regression on the training data; however, on the testing data (validation), backward logistic regression was better. To better understand the models under higher correlation between V1 and V2, V2* was generated to replace V2, and logistic regression with variable selection and logistic ridge regression were built again. The results indicated that logistic ridge regression had a slightly higher capability than logistic regression with variable selection to predict a new applicant's collectability status.
Key words: credit risk modeling, logistic ridge regression, multicollinearity


CREDIT RISK MODELING
USING LOGISTIC RIDGE REGRESSION

RAKHMAWATI
G14061721

A thesis submitted as a requirement for the Bachelor's Degree in Statistics

DEPARTMENT OF STATISTICS
FACULTY OF MATHEMATICS AND NATURAL SCIENCES
BOGOR AGRICULTURAL UNIVERSITY
2011

Title  : Credit Risk Modeling Using Logistic Ridge Regression
Author : Rakhmawati
NIM    : G14061721


Approved by:

Advisor I

Ir. Aam Alamudi, M.Si
NIP. 19650112 199103 1 001

Advisor II

Dian Kusumaningrum, S.Si, M.Si

Acknowledged by:
Head of the Department of Statistics

Dr. Ir. Hari Wijayanto, M.Si
NIP. 19650421 199002 1 001

Graduation date:

BIOGRAPHY
Rakhmawati was born in Salatiga on June 16, 1988, the daughter of Paijo Nurhadi Santoso and Siti Naimah. She has an older brother, Nur Rahman Istianto, and a younger brother, Aulia Kharis Kurniawan. After graduating from SMA Negeri 1 Salatiga in 2006, she continued her studies at Bogor Agricultural University through USMI, majoring in Statistics. She chose Information Systems as her minor subject, with some supporting courses from the Department of Mathematics.
She was a staff member of the Database and Computational Department of Gamma Sigma Beta, a statistics student organization at Bogor Agricultural University. In her eighth semester, she had the chance to take an internship at PT Ganesha Cipta Informatika, where she and her partner developed a SAS program for risk management that calculated the Value at Risk of market risk.

ACKNOWLEDGEMENTS

Alhamdulillah, thanks to Allah SWT, Who gives me love, opportunity, health, and capability in finalizing my research, entitled Credit Risk Modeling Using Logistic Ridge Regression. I recognize that this research could not have been completed without help from other people. I want to thank Mr. Aam Alamudi and Mrs. Dian Kusumaningrum, my advisors, for their critiques, ideas, and patience. Thanks to Miss Indah Permatasari, my internship advisor, whose advice led me to the topic of this research. To Defri Ramadhan Ismana and Yulia Triwijiwati, thank you for the discussions. Special thanks to my beloved family for their love and support.

Finally, I hope this thesis will be beneficial.
Bogor, February 2011

Rakhmawati

TABLE OF CONTENTS
Page
LIST OF TABLES ··············································· viii
LIST OF FIGURES ·············································· viii
LIST OF APPENDICES ··········································· viii

INTRODUCTION
Background ··················································· 1
Objective ···················································· 1

LITERATURE REVIEW
Credit Risk ·················································· 1
The Cramer Statistic ········································· 1
Logistic Regression ·········································· 2
Logistic Ridge Regression ···································· 2
Optimal Cut Point ············································ 3
Model Evaluation ············································· 3

METHODOLOGY
Data Source ·················································· 4
Method ······················································· 4

RESULT AND DISCUSSION
Data Exploration ············································· 4
Logistic Regression with Variable Selection ·················· 6
Logistic Ridge Regression ···································· 7
Comparison of Backward and Logistic Ridge Regression ········· 7
Comparison of Logistic Regression with Variable Selection and Ridge with Generated V2* ··· 9

CONCLUSION ··················································· 10
RECOMMENDATION ··············································· 10
REFERENCE ···················································· 10
APPENDIX ····················································· 11

LIST OF TABLES
Page
Table 1  Pearson correlation coefficient of numeric variables ··· 5
Table 2  Spearman correlation coefficient of ordinal variables ··· 5
Table 3  Cramer coefficient of nominal variables ··· 5
Table 4  Comparison of backward, forward, and stepwise logistic regression ··· 6
Table 5  Parameter estimate by using backward logistic regression ··· 6
Table 6  Classification table of backward logistic regression by using a cut point of 0.680 ··· 7
Table 7  Classification table of logistic ridge regression by using a cut point of 0.677 ··· 7
Table 8  Parameter estimate by using logistic ridge regression ··· 7
Table 9  Comparison of c statistic between backward and logistic ridge regression ··· 8
Table 10 Odds ratio estimate of V8 (credit history) ··· 9
Table 11 Parameter existence on the logistic regression model with variable selection ··· 10

LIST OF FIGURES
Page
Figure 1  Plot of sensitivity and specificity versus all possible cut points ··· 3
Figure 2  Classification table ··· 3
Figure 3  ROC curve ··· 3
Figure 4  Plot of percentage of good debtors in each group of credit amount (V2) ··· 4
Figure 5  Proportion of good debtors in credit history (V8) ··· 5
Figure 6  Classification rate of backward and logistic ridge regression on each optimal cut point ··· 8
Figure 7  Validation's classification rate of backward and logistic ridge regression on each optimal cut point ··· 8
Figure 8  Observed collectability status on P(Y=1) by using backward logistic regression ··· 8
Figure 9  Observed collectability status on P(Y=1) by using logistic ridge regression ··· 8
Figure 10 Comparison of c statistic between logistic regression with variable selection and logistic ridge regression of data set with generated V2* ··· 9
Figure 11 Comparison of total correctly predicted cases between logistic regression with variable selection and logistic ridge regression with generated V2* ··· 9

LIST OF APPENDICES
Page
Appendix 1 Description of variables used in analysis ································································· 12
Appendix 2 Proportion of good debtor on each variable······························································ 14
Appendix 3 Odds ratio of backward logistic regression and logistic ridge regression ·················· 16
Appendix 4 Comparison of C statistic and the correctly predicted cases between logistic regression
with variable selection and logistic ridge regression of data set with generated V2* 17

INTRODUCTION

Background
Credit risk is one of the eight risks that banks must consider. It is important to build a credit risk system that is measurable, documented, and open to further development. Logistic regression, discriminant analysis, and artificial neural networks are some of the methods used in credit risk modeling. They are useful for predicting whether a new applicant will become a good or a bad debtor if he or she receives a loan.
Multicollinearity is a common problem in credit risk modeling. Usually, the solution to this problem is a variable selection method (forward, backward, or stepwise), but this solution may lose information about the response variable if a deleted predictor variable is an important one. Ridge regression is another statistical procedure for dealing with the problem of multicollinearity (Ravinshanker & Dey 2001). With logistic ridge regression, multicollinearity is expected to be handled without deleting any variables, so that no information is lost from the data that has been collected.
Bank of Indonesia noted that the growth of credit of national banks in January 2010 was 10%. By the end of August 2010, credit in the banking industry had grown to 20.3% (Purnomo 2010a, 2010b). This may expose banks to greater risk than they have faced before. Hence, it is important to build a more accurate credit scoring model to decide whether a new applicant is creditworthy enough to get a loan.

Objectives
The objectives of this research are:
1. To build a credit risk model using logistic regression with variable selection and logistic ridge regression.
2. To determine the optimal probability cutpoint.
3. To compare the classification rate and the c statistic of logistic regression with variable selection and logistic ridge regression.

LITERATURE REVIEW

Credit Risk Model
Banks lend to individuals by first asking them to fill out a loan application. The customer is asked to submit several documents that the bank needs in order to evaluate the loan request. Six aspects of the loan application, the Six Basic Cs of Lending, determine whether a new applicant is creditworthy: character, capacity, cash, collateral, condition, and control. Character is data about the applicant's personality. Capacity is the capacity to borrow money. Cash relates to the borrower's income and savings account balance. Collateral is the adequacy of the borrower's assets to support the loan; the age and degree of specialization of the borrower's assets are examples of collateral considerations. Condition is the prospect of the business in relation to economic conditions. A correctly prepared loan document is an example of control.
The basic theory of credit scoring is that, by observing a large group of people who have borrowed in the past, the bank can identify the financial, economic, and motivational factors that separate the good debtors from the bad ones. Credit scoring systems are usually based on discriminant models or related techniques such as logit or probit models or neural networks. If an applicant's score exceeds a critical cutpoint level, he or she is more likely to be approved for credit. Among the most important variables used in evaluating consumer loans are age, marital status, number of dependents, home ownership, telephone ownership, type of occupation, and length of employment in the current job.

The Cramer Statistic
The chi-square test of independence is used to conclude whether there is an association between two categorical variables. When the numbers of rows and columns of the contingency table are unequal, the Cramer coefficient measures the strength of this association. Its value is between 0 and 1. The Cramer coefficient is defined as

$$C = \sqrt{\frac{X^2}{n(t-1)}}$$

where $X^2$ is the chi-square statistic, $n$ is the total sample size, and $t$ is either the number of rows or the number of columns in the contingency table, whichever is smaller.
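As an illustration, a small Python sketch (this thesis used SAS; the data frame and column names below are assumptions, not part of the thesis) that computes the Cramer coefficient from two categorical columns:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramer_coefficient(a: pd.Series, b: pd.Series) -> float:
    """Cramer coefficient between two categorical variables."""
    table = pd.crosstab(a, b)                 # contingency table
    chi2, _, _, _ = chi2_contingency(table)   # chi-square statistic X^2
    n = table.to_numpy().sum()                # total sample size
    t = min(table.shape)                      # smaller of rows and columns
    return float(np.sqrt(chi2 / (n * (t - 1))))

# Illustrative usage with hypothetical nominal columns V16 and V17:
# df = pd.read_csv("german_credit.csv")
# print(cramer_coefficient(df["V16"], df["V17"]))
```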
Logistic Regression
Let the conditional probability that the outcome is present be denoted by P(Y=1|x) = π(x). The logit of the multiple logistic regression model is given by the equation

$$g(x) = \log\left(\frac{\pi(x)}{1-\pi(x)}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$

in which case the logistic regression model is

$$\pi(x) = \frac{e^{g(x)}}{1 + e^{g(x)}}$$

When an independent variable is categorical, dummy variables are needed. In general, if a categorical variable (nominal or ordinal scale) has k values, then k-1 design variables will be needed. Thus, the logit for a model with p variables, the jth of which is categorical with $k_j$ levels, would be

$$g(x) = \beta_0 + \beta_1 x_1 + \cdots + \sum_{u=1}^{k_j-1}\beta_{ju}D_{ju} + \cdots + \beta_p x_p$$

where the $D_{ju}$ are the design variables for the jth variable. Maximum likelihood estimates for the logit model are obtained by maximizing over β the log-likelihood function

$$L(\beta) = \sum_{i=1}^{n}\left\{ y_i \log \pi(x_i) + (1-y_i)\log\left(1-\pi(x_i)\right)\right\}$$

After obtaining the model, we begin the process of model assessment. The significance of the covariates can be assessed by the G test statistic and the Wald test. The G statistic is a likelihood ratio test and measures the significance of the parameters in the overall model. The hypotheses of the G test are:
H0: β1 = β2 = ... = βp = 0
H1: at least one βi ≠ 0, i = 1, 2, ..., p
The G statistic can be formulated as

$$G = -2\log\left(\frac{L_0}{L_p}\right)$$

where L0 is the likelihood without covariates and Lp is the likelihood with p covariates. Under the null hypothesis, G follows a chi-square (χ²) distribution with p degrees of freedom.
If the null hypothesis is rejected, and we conclude that at least one and perhaps all p coefficients differ from zero, the Wald test can be used to assess the significance of each covariate:
H0: βi = 0
H1: βi ≠ 0, where i = 1, 2, ..., p

$$W = \frac{\hat{\beta}_i}{\widehat{SE}(\hat{\beta}_i)}$$

Under the null hypothesis, the W statistic follows a standard normal distribution (Hosmer & Lemeshow 2000).
Coefficients in logistic regression are interpreted through the odds ratio, which indicates how much more likely, with respect to odds, a certain event is to occur in one group relative to its occurrence in another group. The odds ratio is defined as $\psi = \exp(\beta_i)$. For a numeric variable, the odds ratio indicates that for every increase of one unit in the predictor, the odds of the outcome are multiplied by $\exp(\beta_i)$.
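To make these quantities concrete, the following sketch (illustrative only; the simulated data and variable names are assumptions) fits a logit model with statsmodels and reads off the G statistic, the Wald statistics, and the odds ratios:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
V1 = rng.uniform(6, 60, n)                      # duration of credit (months)
V2 = 100 * V1 + rng.normal(0, 800, n)           # credit amount, correlated with V1
eta = 2.0 - 0.05 * V1 - 0.0002 * V2             # true logit
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))     # 1 = good collectability status

X = sm.add_constant(np.column_stack([V1, V2]))
fit = sm.Logit(y, X).fit(disp=0)

print("G statistic:", fit.llr, "p-value:", fit.llr_pvalue)   # overall test
print("Wald W = beta / SE(beta):", fit.params / fit.bse)     # per-covariate tests
print("odds ratios exp(beta):", np.exp(fit.params))
```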
Multicollinearity can cause unstable estimates and inaccurate variances, which affect hypothesis tests (Hoerl & Kennard 1980, in Shen & Gao 2002). In regression, there are several approaches to handling multicollinearity: variable selection methods (forward, backward, and stepwise) and ridge regression. Forward selection adds terms sequentially until further additions do not improve the model. Backward elimination begins with a complex model and sequentially removes terms. The stepwise procedure starts with the equation containing the most important variable and then attempts to build up the model by adding variables one at a time, as long as these additions are worthwhile. A simplified sketch of backward elimination follows.
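The sketch below uses a p-value-based removal rule (a simplification; the exact entry and removal criteria of the SAS selection routines may differ):

```python
import statsmodels.api as sm

def backward_eliminate(y, X, alpha=0.05):
    """Drop the least significant predictor until all Wald p-values < alpha.

    X is a pandas DataFrame of predictors (no constant column); y is 0/1.
    """
    cols = list(X.columns)
    while cols:
        fit = sm.Logit(y, sm.add_constant(X[cols])).fit(disp=0)
        pvals = fit.pvalues.drop("const")     # ignore the intercept
        worst = pvals.idxmax()                # least significant term
        if pvals[worst] <= alpha:
            return fit, cols                  # all remaining terms significant
        cols.remove(worst)                    # eliminate it and refit
    return None, []

# Usage: final_fit, kept = backward_eliminate(df["y"], df[predictor_columns])
```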
Logistic Ridge Regression
Unstable parameter estimates occur when the number of covariates is relatively large or when the covariates are correlated. An alternative procedure for obtaining more stable estimates is to impose a restriction on the parameters. Consider maximizing the log-likelihood function with a penalty on the norm of β:

$$l^{\lambda}(\beta) = l(\beta) - \lambda \lVert \beta \rVert^{2}$$

where $\lVert \beta \rVert = \sqrt{\sum_j \beta_j^2}$ is the norm of the parameter vector β. The ridge parameter λ controls the amount of shrinkage of the norm of β. When λ = 0 the solution is the ordinary MLE. For a good choice of λ, the estimate $\hat{\beta}^{\lambda}$ is expected to be on average closer to the real value of β than the ordinary MLE, i.e. MSE($\hat{\beta}^{\lambda}$) < MSE($\hat{\beta}$) (Cessie & Houwelingen 1990).
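Before the explicit estimation steps below, note that the penalized fit can also be obtained from any off-the-shelf L2-regularized logistic regression. A sketch with scikit-learn, under the assumption that its penalty 1/(2C)·‖β‖² is matched to the λ above via C = 1/(2λ):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

lam = 0.001                       # ridge parameter; the value found in this thesis
ridge_logit = make_pipeline(
    StandardScaler(),             # center and scale predictors before shrinking
    LogisticRegression(penalty="l2", C=1.0 / (2.0 * lam),
                       solver="lbfgs", max_iter=1000),
)

# Illustrative usage with a design matrix X (n x p) and 0/1 response y:
# ridge_logit.fit(X, y)
# proba_good = ridge_logit.predict_proba(X)[:, 1]   # P(Y=1)
```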
The parameter estimates of logistic ridge regression are calculated in the following way:
1. Fit the logistic regression model using maximum likelihood, leading to the estimate $\hat{\beta}$. Construct standardized coefficients by defining $\hat{\beta}_j^{s} = s_j\hat{\beta}_j$, j = 1, 2, ..., p, where $s_j$ is the standard deviation in the training data of the jth predictor.
2. Construct the Pearson statistic

$$X^2 = \sum_{k=1}^{g}\frac{(y_k - m_k\hat{\pi}_k)^2}{m_k\hat{\pi}_k(1-\hat{\pi}_k)}$$

where
g = the number of covariate patterns
m_k = the number of subjects with x = x_k
y_k = the number of positive responses (y = 1) among the m_k subjects
$\hat{\pi}_k$ = the probability that the outcome is present at x = x_k
This is a measure of the difference between the observed and the fitted values.
3. Define the ridge parameter (λ).
4. Let Z be the matrix of centered and scaled predictors, and let $\hat{\beta}^{*}$ equal $\hat{\beta}$ with the intercept omitted. The ridge regression estimate is then

$$\hat{\beta}_R = (Z'Z + \lambda I)^{-1} Z'Z\,\hat{\beta}^{*}$$

Optimal Cutpoint
The optimal cutpoint for the purpose of classification can be obtained from the plot of sensitivity and specificity versus all possible cutpoints (Hosmer & Lemeshow 2000); the plot can be seen in Figure 1. The optimal cutpoint is not the only criterion for deciding whether a new applicant is acceptable for a loan. Although the correct classification rate is high at the optimal cutpoint, the number of false positives should also be considered, because the loss caused by this error is extremely large relative to that of a false negative. Each bank has its own criteria for making a decision. An explanation of these errors is given in the next section. In this research, the cutpoint is simply obtained from the plot of sensitivity and specificity versus all possible cutpoints.

Figure 1 Plot of sensitivity and specificity versus all possible cutpoints

Model Evaluation
In model assessment, a classification table is most appropriate when classification is a stated goal of the analysis. Figure 2 shows the classification table: a two-way frequency table of the actual data against the prediction. The correct classification rate (CCR) consists of the percentages of true positives and true negatives, while the misclassification rate (MCR) consists of the percentages of false positives and false negatives.

                    Predicted
                    0                       1
Actual    0         True Negative (TN)      False Positive (FP)
          1         False Negative (FN)     True Positive (TP)

Figure 2 Classification table

Sensitivity, or true positive (TP), is the number of observations that have category 1 and were correctly predicted. Specificity, or true negative (TN), is the number of observations that have category 0 and were correctly predicted. A false positive is an observation that has category 0 but is predicted as category 1. A false negative is an observation that has category 1 but is predicted as category 0.
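The intersection rule for the cutpoint is straightforward to compute. A minimal sketch, assuming arrays of observed 0/1 labels and predicted probabilities:

```python
import numpy as np
from sklearn.metrics import roc_curve

def optimal_cutpoint(y_true, proba):
    """Cutpoint where the sensitivity and specificity curves intersect."""
    fpr, tpr, thresholds = roc_curve(y_true, proba)
    sensitivity = tpr
    specificity = 1.0 - fpr
    # choose the threshold where the two curves are closest
    idx = np.argmin(np.abs(sensitivity - specificity))
    return thresholds[idx]

# Usage: cut = optimal_cutpoint(y, proba)
#        predictions = (proba >= cut).astype(int)
```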

Figure 3 ROC curve

Figure 3 shows the ROC curve, which plots the probability of a false positive (1 - specificity) against that of a true positive (sensitivity). The area under the ROC curve (AUC), which ranges from 0 to 1, provides a measure of the model's ability to discriminate between subjects who experience the outcome of interest and those who do not. The AUC is measured by the c statistic,

$$c = \frac{n_c + 0.5\,(t - n_c - n_d)}{t}$$

where
n_c : the number of concordant pairs
n_d : the number of discordant pairs
t : the total number of pairs
As a general rule:
c = 0.5 : no discrimination
0.7 ≤ c < 0.8 : acceptable discrimination
0.8 ≤ c < 0.9 : excellent discrimination
c ≥ 0.9 : outstanding discrimination
(Hosmer & Lemeshow 2000).
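The c statistic can be computed directly from its pair-counting definition; the sketch below mirrors the formula above (and agrees with the area under the ROC curve, e.g. sklearn.metrics.roc_auc_score):

```python
import numpy as np

def c_statistic(y, proba):
    """c = (nc + 0.5 * ties) / t over all (category 1, category 0) pairs."""
    y = np.asarray(y)
    p = np.asarray(proba, dtype=float)
    diff = p[y == 1][:, None] - p[y == 0][None, :]   # all pairwise comparisons
    nc = np.sum(diff > 0)            # concordant pairs
    ties = np.sum(diff == 0)         # tied pairs = t - nc - nd
    t = diff.size                    # total pairs
    return (nc + 0.5 * ties) / t
```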
METHODOLOGY

Data Source
The data used in this research was the German Credit data set, available at http://ftp.ics.uci.edu/pub/machine-learning-databases/statlog/. It contains observations on 1000 past credit applicants, each rated as "good" (700 cases) or "bad" (300 cases). After considering the Six Basic Cs of Lending, 17 variables were used in this research: 3 numeric variables, 6 ordinal variables, 7 nominal variables, and 1 binary variable. Descriptions of the variables can be seen in Appendix 1.
Method
The procedures used in this research were:
1. Divide the data into training data (740 observations) for modeling and testing data (260 observations) for validation. Each data set has the same pattern of good/bad debtors as the full data set, comprising 70% good debtors and 30% bad (see the stratified-split sketch after this list).
2. Data exploration.
3. Model the data using stepwise, forward, and backward logistic regression. The probability modeled was Y=1 (the debtor had a good collectability status). Then choose one of the three models by considering the fit of the model and the highest c statistic.
4. Model the data using logistic ridge regression.
5. Determine the optimal cutpoint from the intersection of sensitivity and specificity.
6. Validate the models with the testing data.
7. Compare the classification rate and the c statistic of logistic ridge regression and logistic regression with variable selection.
8. Generate V2* with a specified correlation with V1, then repeat steps 3 to 7 on the new data (with V2 replaced by V2*) to see how logistic regression with variable selection and logistic ridge regression perform as the correlation between V1 and V2* increases.
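A minimal sketch of the stratified split in step 1 (the frame below is a hypothetical stand-in for the German Credit data; column names are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in: 1000 rows, 70% good (y=1) and 30% bad (y=0)
rng = np.random.default_rng(0)
df = pd.DataFrame({"y": rng.permutation([1] * 700 + [0] * 300)})

train, test = train_test_split(
    df, train_size=740, test_size=260,
    stratify=df["y"], random_state=42,        # keep the 70/30 proportion in both sets
)
print(train["y"].mean(), test["y"].mean())    # both approximately 0.70
```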
RESULT AND DISCUSSION
Data Exploration
There were no outliers or missing values in the full data set, so all 1000 observations were included in the analysis. Allocation of the data into modeling and validation sets was based on the proportion of bad and good cases in the overall data set; each set had 70% good and 30% bad cases, matching the full data set.
The variables V1 (duration of credit) and V2 (credit amount) had a decreasing trend with respect to the response variable. Figure 4 shows that as the amount of the credit increased, the proportion of debtors with good collectability status decreased. Debtors with a high installment rate (V4) tended to be bad debtors. The difference in the proportion of good debtors across occupation categories was not significant. The group of debtors who were unemployed/unskilled-nonresident had the highest proportion of good debtors, compared to the unskilled-resident, official, and officer groups.

Figure 4 Plot of percentage of good debtors in
each group of credit amount (V2)
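Summaries like Figure 4 can be reproduced with a few lines of pandas; a sketch, again with assumed column names and simulated data:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the credit frame (V2 = credit amount, y = 1 if good)
rng = np.random.default_rng(0)
df = pd.DataFrame({"V2": rng.uniform(250, 20000, 1000)})
df["y"] = rng.binomial(1, 1 / (1 + np.exp(df["V2"] / 5000 - 2.2)))

# Bin credit amount and compute the proportion of good debtors per bin
bins = pd.cut(df["V2"], bins=range(0, 20001, 2500))
prop_good = df.groupby(bins, observed=True)["y"].mean()

ax = prop_good.plot(kind="bar")
ax.set_xlabel("Credit amount (V2)")
ax.set_ylabel("Proportion of good debtors")
plt.tight_layout()
plt.show()
```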
It can be seen in Appendix 2 that, based on age (V3), the proportion of good debtors had a positive trend for debtors aged 20 to 50 years old: as age increased, the proportion of good debtors increased up to the age of 50. The group of debtors aged 66 to 75 had the lowest proportion of good debtors. Debtors with two dependents had a higher proportion of good debtors than those with one dependent (V6). As the status of the checking account (V7) increased, debtors tended to be good debtors. Home ownership status (V12) also had a positive trend: the proportion of good debtors increased as home ownership status changed from free to rent to own.
There was no clear pattern in the proportion of good debtors as working experience in the current job (V10) or time living in the present residence (V11) increased; the figures can be seen in Appendix 2. Debtors who had been working four to seven years in their current job had the highest proportion of good debtors. Debtors who had been working more than seven years in their current job tended to be better debtors than those with less than four years of working experience. Unemployed debtors had a higher proportion of good debtors than those with less than a year of experience in their current job. For time living in the present residence, debtors with at most one year and debtors with two to three years tended to be better debtors than the others.

Figure 5 shows that credit history (V8) has a positive trend: debtors who had taken no credit before had the lowest proportion of good debtors. Those with a high average balance in their savings account (V9) tended to be good debtors. The difference between marital statuses (V14) was not significant, although single males and married males had a higher proportion of good debtors than females and divorced males. Debtors who had a guarantor (V15) tended to be better debtors than those who had a co-applicant. Those who owned property (V16) tended to be better debtors than those with none. Those whose credit purpose (V17) was used cars, furniture, or radio/television had a higher proportion of good debtors than the other purposes; the lowest proportion of good debtors was for those with education as the purpose of taking credit. Debtors who had a telephone number under their own name (V19) tended to be better debtors than those without. The figures of the percentage of good debtors for each variable can be seen in Appendix 2.
The correlations between the predictor variables can be seen in Table 1 and Table 2. There were many significant correlations, but only one high correlation coefficient, which existed between V1 and V2. Table 3 shows the Cramer statistic as the measure of association between the nominal variables.
Table 1 Pearson correlation coefficient of numeric variables

        V2       V3
V1      0.628    0.015
V2               -0.052

Table 2 Spearman correlation coefficient of ordinal variables

        V6       V10      V11      V12      V13
V4      -0.054   0.135    0.046    -0.029   0.098
V6               0.097    0.057    0.035    -0.092
V10                       0.237    0.017    0.078
V11                                0.327    -0.005
V12                                         0.082

Figure 5 Proportion of good debtors in credit history (V8)

Table 3 Cramer coefficient of nominal variables

        V8      V9      V14     V15     V16     V17     V19
V7      0.152   0.276   0.146   0.162   0.106   0.267   0.108
V8              0.089   0.110   0.085   0.102   0.170   0.077
V9                      0.057   0.098   0.070   0.111   0.124
V14                             0.061   0.134   0.130   0.101
V15                                     0.158   0.137   0.097
V16                                             0.202   0.181
V17                                                     0.175

Among the numeric predictors, the only significant correlation occurred between V1 (duration of credit) and V2 (credit amount), with a correlation coefficient of 0.628, as shown in Table 1. For the Spearman correlation coefficients shown in Table 2, the largest correlation was 0.327, which occurred between V11 (time in present residence) and V12 (housing). Variable V10 (time in current job) had a significant correlation with all other ordinal variables except V12 (home ownership).

The strength of association between the nominal variables was measured by the Cramer coefficient and can be seen in Table 3. Variable V17 (purpose of credit) had a significant correlation with all the other nominal variables. The highest correlation among the nominal predictors existed between V16 (property owned) and V17 (purpose of credit), which was 0.218.
Logistic Regression with Variable Selection
Logistic regression models were built using the forward, backward, and stepwise variable selection methods. Forward logistic regression gave the same result as stepwise logistic regression. Among the three selection methods, backward elimination had the highest c statistic. Using the Hosmer and Lemeshow goodness-of-fit test, as proposed in Hosmer & Lemeshow (2000), the backward logistic regression model was considered fit, with a p-value of 0.724.
Table 4 Comparison of backward, forward, and stepwise logistic regression

Method     Hosmer and Lemeshow Goodness-of-Fit Test    C statistic
           Chi-square    P-value
Backward   5.311         0.724                         0.817
Forward    15.621        0.048                         0.813
Stepwise   15.621        0.048                         0.813
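For reference, a rough sketch of the Hosmer and Lemeshow goodness-of-fit statistic used above (the usual decile-of-risk version; the grouping may differ in detail from the SAS implementation):

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

def hosmer_lemeshow(y, proba, g=10):
    """Hosmer-Lemeshow chi-square test over g groups of fitted risk."""
    df = pd.DataFrame({"y": y, "p": proba})
    df["group"] = pd.qcut(df["p"], q=g, duplicates="drop")
    obs = df.groupby("group", observed=True)["y"].sum()    # observed positives
    exp = df.groupby("group", observed=True)["p"].sum()    # expected positives
    n = df.groupby("group", observed=True)["y"].count()    # group sizes
    hl = np.sum((obs - exp) ** 2 / (exp * (1 - exp / n)))
    k = len(n)
    pval = chi2.sf(hl, df=k - 2)     # chi-square with (groups - 2) df
    return hl, pval
```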

Using backward logistic regression, eight significant predictors were selected from the 17 predictor variables: credit duration (V1), credit amount (V2), installment rate (V4), checking account status (V7), credit history (V8), balance in savings account (V9), marital status (V14), and purpose of credit (V17). The parameter estimates of the backward logistic regression are shown in Table 5. Variables V1 and V2, whose correlation coefficient was 0.628, both entered the model; this indicates that a correlation of 0.628 was not high enough. All the parameter estimates were consistent with the data exploration. For example, for V8 it was explained above that debtors with no credit history (the first dummy variable, V81) had the lowest proportion of good debtors, and Table 5 shows that the parameter estimate of V81 was the lowest compared with V82, V83, and V84. The value of the third dummy (representing debtors who paid back their existing credit duly) was higher than that of the fourth dummy (representing debtors who delayed paying off in the past), which was also consistent with the data exploration.
Table 5 Parameter estimate by using backward logistic regression

Parameter    Estimate    SE        Wald       P-value
Intercept    3.7947      0.5776    43.1685