Elsevier Editorial System(tm) for Expert Systems With Applications
Manuscript Draft

Manuscript Number: ESWA-D-13-00547
Title: Ontology Based Approach to Bayesian Student Model Design
Article Type: Full Length Article
Keywords: Intelligent tutoring systems, e-learning, Knowledge modeling, Probabilistic algorithms,
Bayesian network, conditional probabilities
Corresponding Author: Dr. Ani Grubišić, Ph.D.

Corresponding Author's Institution: Faculty of Science, University of Split
First Author: Ani Grubišić, Ph.D.

Order of Authors: Ani Grubišić, Ph.D.; Slavomir Stankov, Full Professor; Ivan Peraić


Ontology Based Approach to Bayesian
Student Model Design
Abstract

A probabilistic student model based on a Bayesian network enables conclusions to be drawn about the state of a student's knowledge, and the further learning and teaching process depends on these conclusions. To implement a Bayesian network in a student model, it is necessary to determine the "a priori" probabilities of the root nodes, as well as the conditional probabilities of all other nodes. In our approach, we enable non-empirical mathematical determination of the conditional probabilities, while the "a priori" probabilities are determined empirically, based on knowledge test results. The concepts that are believed to have been learned or not learned represent the evidence. Based on the evidence, it is concluded which concepts need to be re-learned and which do not. The study described in this paper has examined 15 ontologically based Bayesian student models. In each model, special attention has been devoted to defining the "a priori" probabilities, the conditional probabilities and the way the evidence is set, in order to test the success of student knowledge prediction. Finally, the obtained results are analyzed and guidelines for ontology based Bayesian student model design are presented.

Keywords
Intelligent tutoring systems, e-learning, Knowledge modeling, Probabilistic algorithms,
Bayesian network, conditional probabilities

1. Introduction

Today, there is a ubiquitous need and desire to improve the quality and availability of various educational systems. Education has become a lifelong process and need, and its quality has become a driving force. It became clear that the latter cannot be achieved without the appropriate and effective use of information and communication technology (ICT) in the learning and teaching process. The use of ICT in learning and teaching enabled a concept called e-learning. High-quality implementation of e-learning, in the form of e-learning systems, brings many advantages to the learning and teaching process and enables the desired new, modern and quality education. The introduction of these technologies and innovations in the field of education not only makes the application of pedagogical theory more cost-effective, but also opens opportunities to explore models from different fields (Millán & Pérez-de-la-Cruz, 2002).
One special class of e-learning systems is Intelligent Tutoring Systems (ITSs), which, in contrast to the traditional systems that support the learning and teaching process, have the ability to adapt to each student. It is this ability to adapt to each student that allows the improvement of the learning and teaching process, because it has been shown that the best approach is one-on-one tutoring (Bloom, 1984).
The intelligent tutoring systems are a generation of computer systems intended for the support and enhancement of the learning and teaching process in a selected domain knowledge, thereby respecting the individuality of those who teach and those who are taught ((Wenger, 1987), (Ohlsson, 1986), (Sleeman & Brown, 1982)). An intelligent tutoring system becomes a student's personal "computer teacher". A computer teacher, on the one side, is always cheerful and shows no negative emotions, while a student, on the other side, has no need to hide ignorance and can communicate freely.
The intelligent tutoring systems can adapt the content and the manner of presentation of certain topics to different student abilities. In this sense, knowledge is the key to intelligent behavior, and therefore intelligent tutoring systems have the following basic knowledge: (i) knowledge that the system has about the domain knowledge (expert module), (ii) teaching principles and methods for applying these principles (teacher module), and (iii) methods and techniques for modeling the student's acquisition of knowledge and skills (student module).
Nowadays, ontology is commonly used to formalize knowledge in ITSs (Berners-Lee, Hendler & Lassila, 2001). An ontology describes a conceptual model of a domain, that is, it represents objects, concepts and other entities that are believed to exist, and the relations among them (Genesereth and Nilsson, 1987, according to (Gruber, 1993)). The main structural elements
of the conceptual model are the concepts and relations. Consequently, every area of human
endeavor can be presented with a set of properly related concepts that correspond to
appropriate domain knowledge. Ontological description of the domain knowledge provides a
simple formalization of declarative knowledge using various tools that support working with
concepts and relations.
The component of an ITS that represents the student's current state of knowledge and skills is called the student model. The student model is a data structure, and diagnosis is a process that manipulates it. The student model as such represents a key component of an ITS. The design of these two components is called the student modeling problem (VanLehn, 1988). If a student model is "bad" to the extent that it does not even closely describe the student's characteristics, then all the decisions of other ITS components that are based on this model are of poor quality. Therefore, considerable research is carried out in the field of student modeling.
The design and implementation of intelligent tutoring systems has systematically contributed, and still contributes, to the development of methods and techniques of artificial intelligence (AI). Artificial intelligence, as the area that connects computers and intelligent behavior, emerged in the late 1950s and early 1960s with pioneers Alan Turing, Marvin Minsky, John McCarthy and Allen Newell (Urban-Lurain, 1996). AI is essentially oriented toward knowledge representation, natural language understanding and problem solving, all of which are equally important for the development of the intelligent tutoring concept (Beck, Stern & Haugsjaa, 1996).
One of the techniques widely used in different areas of artificial intelligence is Bayesian networks. The idea of Bayesian networks is not new: they were first applied in the 1980s in the field of expert systems. The true expansion of this area began in the 1990s, probably due to the increase in computer speed and renewed interest in distributed systems. Large computational complexity is one of the biggest barriers to a wider use of Bayesian networks.

Unlike traditional expert systems, whose main purpose is modeling the experts' knowledge and replacing them in the process of planning, analyzing, learning and decision making, the purpose of a Bayesian network is modeling a particular problem domain. Thus Bayesian networks become a help for experts while studying the causes and consequences of the problems they model (Charniak, 1991).
It is extremely important to put emphasis on domain modeling, as the most important feature of Bayesian networks. Domain modeling refers to collecting and determining all the values necessary for Bayesian network initialization. In particular, it refers to modeling the dependencies between variables. Dependencies are modeled using a network structure and a set of conditional probabilities (Charniak, 1991).
Integration of student models with Bayesian networks in the ITSs is one way to facilitate
student learning. Specifically, this model allows making conclusions about the actual student
knowledge. Also, it enables a computer tutor to guide the learning and teaching process
towards the learning of only those concepts that the student has not already learned.
The aim of this paper is to design a student model based on Bayesian networks and ontologies, and to compare the results of its predictions with actual student knowledge. In the majority of Bayesian networks, all probabilities are determined empirically, and that presents the biggest problem in their design. Therefore, novel methods for parameter estimation in Bayesian networks are an important research endeavor, given the utility of the Bayesian approach for student modeling. In our approach, we enable non-empirical mathematical determination of the conditional probabilities, while the "a priori" probabilities are determined empirically, based on knowledge test results.
The second chapter pays attention to the theoretical background underlying Bayesian networks. The third chapter describes fifteen probabilistic student models that differ in the way the conditional probabilities are defined and in the way the evidence is set. Finally, the obtained results are analyzed and guidelines for ontology based Bayesian student model design are presented.

2. Application of Bayesian theory in student modeling
One major difficulty that arises in student modeling is uncertainty. An ITS needs to build a student model based on small amounts of very uncertain information, because certain information can be obtained only from the students' activities in the system. If these activities do not occur, the diagnosis must be carried out on the basis of uncertain information. Moreover, because an ITS bases its decisions on the student model, the uncertainty from the student model contributes to a poorly adaptive learning and teaching process. The student model is built based on the observations that the ITS makes about the student. The student model can be viewed as a compression of these observations: raw data are combined, some of them are ignored, and the result is a summary of beliefs about the student.
Powerful general theories of decision making have been developed specifically for managing uncertainty. One of them is Bayesian probability theory ((Bayes, 1763), (Cheng & Greiner, 2001), (Mayo, 2001)), which deals with reasoning under uncertainty. Bayesian networks are one of the current approaches to modeling uncertainty ((Mayo, 2001), (Conati et al., 1997), (VanLehn et al., 1998), (Conati, Gertner & Vanlehn, 2002), (Gamboa & Fred, 2002)). This technique combines the strict formalism of probability with a graphical representation and efficient inference mechanisms.


The Bayesian network is a probabilistic graphical model that displays dependencies between
nodes (Pearl, 1988). It is a directed acyclic graph in which nodes represent variables and
edges represent their interdependence. A node is a parent of a child if there is an edge from the former to the latter. In a Bayesian network, nodes that have no parents are called roots, and these variables are placed in the Bayesian network first. The roots are not influenced by any node, while they affect their children. Once the remaining nodes, ending with those that have no children, are placed in the Bayesian network, the structure of the Bayesian network is defined (Korb & Nicholson, 2011).
After the structure of Bayesian network is defined, it is necessary to define the possible

values that each node can take and the values of conditional probabilities of nodes (Korb &
Nicholson, 2011). For nodes without parents only “a priori” probabilities have to be defined.
“A priori” probabilities for all other nodes can be defined using the corresponding conditional
probabilities tables designed based on the values of the “a priori” probabilities of their
parents. Therefore, it is superfluous to explicitly specify “a priori” probabilities for the nodes
that have parents (Korb & Nicholson, 2011).
The dimension of the conditional probability table is determined by the number of the node's parents. In the case of discrete binary variables, the conditional probability table of a non-root node with n parents has 2^n rows. Each row contains one of the 2^n combinations of values T and F; let t denote the number of values T in a row. We will use this number t from the conditional probability table rows to enable non-empirical mathematical determination of probabilities.
The conditional probability function has two well-marked values: (i) the value for a student who knows the concept itself although he/she does not know any of the node's parents - a lucky guess, and (ii) the value for a student who does not know the concept itself although he/she knows all of the node's parents - an unlucky slip. In the literature, both of these well-marked values are set to 0.1 (Mayo, 2001). Consequently, the probability of truthful knowing is 1-0.1=0.9; that is, if all the parents are known, we predict that the concept itself is known with probability 0.9.
The Bayesian network can be used for probabilistic inference on the probability of any node in the network if the conditional probability tables are known. Based on the Bayesian network, the ITS can calculate the expectation (probability) of all unknown variables based on the known variables (the evidence) (Charniak, 1991).
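As a tiny worked example (hypothetical numbers), a single piece of evidence updates the belief in a related concept via Bayes' rule:

    # A is a concept, B a concept depending on it; we observe evidence B = T.
    p_a = 0.3                                  # "a priori" P(A=T)
    p_b_given_a = {True: 0.9, False: 0.1}      # P(B=T | A): slip/guess values

    joint_t = p_a * p_b_given_a[True]          # P(A=T, B=T) = 0.27
    joint_f = (1 - p_a) * p_b_given_a[False]   # P(A=F, B=T) = 0.07
    posterior = joint_t / (joint_t + joint_f)  # P(A=T | B=T)
    print(round(posterior, 3))                 # 0.794: the evidence raised P(A=T)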

3. Ontological Approach to the Bayesian Student Model Design
In this section we describe an approach to Bayesian student model design in an intelligent tutoring system whose expert knowledge is presented in the form of an ontology. We observe only the model, not the diagnosis process itself.
For a student who has never learned the domain knowledge, we believe that he/she knows the concepts from the domain knowledge graph with a very small probability; that is, we draw conclusions about his/her knowledge without testing it. In this case, we consider that the student knows a concept from the domain knowledge with probability 0. Likewise, if we have tested a student's knowledge of some domain knowledge and determined with certainty that the student knows all the concepts from that domain knowledge, then we can argue that the student knows all the concepts from that domain knowledge with probability 1.
The problem is how to determine the probabilities between 0 (not knowing) and 1 (knowing). That is why we define an expert Bayesian student model (Mayo, 2001) over the domain knowledge concepts, combined with the overlay model (Carr & Goldstein, 1977). For each student and each concept from the domain knowledge, we define the probability of the student knowing that concept.
When a student model is created, all probabilities are 0. After each question from the test that examines the knowledge of one or more concepts and relations, the probabilities of knowing the concepts involved in those relations change. Correct answers increase, while incorrect answers reduce, the probability of knowing the concepts involved.
The teacher uses the probabilities from the student model to determine which concepts the
student knows with high probability (e.g., more than 0.8 – (Bloom, 1976)) so that the system
does not bother the students with learning and teaching concepts the student already knows.
In this way, the Bayesian student model serves as a "sieve" that passes to the learning and
teaching process only those concepts that the student does not know with high probability.
We have developed a methodology for determining the most suitable way of calculating the conditional probabilities, as well as for determining the most suitable way of setting evidence in such an environment. We explain the structure of the domain knowledge ontology and the design of the Bayesian network, and present the results of the applied methodology for selecting the most suitable Bayesian student model.
For the purpose of this research, we have used the adaptive e-learning system Adaptive Courseware Tutor (AC-ware Tutor) (Grubišić, 2012). This system has ontological domain knowledge, as well as knowledge tests that enabled us to get instances of the actual student's knowledge before and after a knowledge test.


3.1. Domain Knowledge Ontology
Domain knowledge is presented with concepts and relations between them. As we have to
indicate the direction of the relation between concepts, we use the terms child and parent. In
order to clearly indicate for each relation in the ontology which concepts it connects and what
the nature of that relation is, we introduce the following definition (Definition1):
Definition1: Let ECON={K1,…,Kn}, n≥0, be a set of concepts, EREL={r1,…,rm} ∪ {has_subtype, has_instance, has_part, slot, filler}, m≥0, a set of relations, and ∅E an empty element. Domain knowledge DK is a set of triplets (K1, r, K2) that define that the concepts K1 and K2 are associated with relation r. In this way we define that the concept K1 is the parent of concept K2 and that concept K2 is the child of concept K1.
Since the basic elements of the domain knowledge triplets are concepts and the relations between them, we use graph theory as a mathematical foundation for managing subsets and elements of the domain knowledge, as well as for domain knowledge visualization (Gross & Yellen, 1998). Therefore, we define a directed domain knowledge graph to which all the rules from graph theory apply (Definition2).
Definition2: For domain knowledge DK we define a directed domain knowledge graph DKG=(V,A), where the set of vertices is V=ECON and the set of edges A = {(K1,K2) | (K1,r,K2) ∈ DK, r≠∅E, K1≠K2} is equal to the set of ordered pairs of those concepts from the domain knowledge that are related.

The set of concept Kx's parents is the set ParentsKx = {K ∈ ECON | (K,r,Kx) ∈ DK, K≠Kx, r ∉ {slot, filler, ∅E}} = {K ∈ V | (K,Kx) ∈ A, K≠Kx}. The number pKx is equal to the number of elements in the set ParentsKx and denotes the number of concept Kx's parents.

The set of concept Kx's children is the set ChildrenKx = {K ∈ ECON | (Kx,r,K) ∈ DK, K≠Kx, r ∉ {slot, filler, ∅E}} = {K ∈ V | (Kx,K) ∈ A, K≠Kx}. The number cKx is equal to the number of elements in the set ChildrenKx and denotes the number of concept Kx's children.

A vertex of DKG is called a root if it has no parents and has children.
A vertex of DKG is called a leaf if it has parents and has no children.
The different types of relations among the concepts of the ontology describe the semantics of the related nodes, but they are completely equal when it comes to domain knowledge graph design. The only thing that matters is whether a relation between two nodes exists or not, and what the direction of that relation is, as the sketch below illustrates.
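As an illustration, the following sketch (our code, using the "Mass memory" triplets that appear later in Section 3.2.2) derives the edges of DKG and the Parents/Children sets of Definition2 from the triplets:

    from collections import defaultdict

    triples = [
        ("Memory", "has_subtype", "Mass memory"),
        ("Mass memory", "has_subtype", "Floppy Disk"),
        ("Mass memory", "has_instance", "Hard Disk"),
        ("Mass memory", "has_instance", "Compact Disc"),
    ]

    # A = {(K1, K2) | (K1, r, K2) in DK, r is not empty, K1 != K2}
    edges = {(k1, k2) for (k1, r, k2) in triples if r is not None and k1 != k2}

    parents = defaultdict(set)   # ParentsKx  = {K | (K, Kx) in A}
    children = defaultdict(set)  # ChildrenKx = {K | (Kx, K) in A}
    for k1, k2 in edges:
        parents[k2].add(k1)
        children[k1].add(k2)

    print(len(parents["Mass memory"]), len(children["Mass memory"]))  # pKx=1, cKx=3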
We define a weight function XV: VDKG→[0,1] on the domain knowledge graph, where XV(Kx) corresponds to the probability of a student knowing concept Kx. The values of the function XV are determined after each knowledge test, and the calculation of its values depends on the question scores and on the concept's parents and children.
In our approach, the values of the function XV depend on another weight function, defined in Definition3:
Definition3: The function XA: ADKG→{-1,0,1,…,max}, defined by XA(KxKy) = the score obtained by answering a question that applies to edge KxKy, for every KxKy ∈ ADKG, is a weight function on the set of edges of the domain knowledge graph.

When the student model initializes, all edges in the domain knowledge graph have the weight -1, that is, XA(KxKy)=-1 for every KxKy ∈ ADKG, which means that the knowledge about the relationship between those two concepts has not been tested yet. The function XA allows assigning weights to those edges that connect the concepts mentioned in a certain question. Each such tested edge (the subset A' of A) has a weight between 0 and max, where max is an integer that corresponds to the maximum score that can be assigned to a question. Thus, the domain knowledge graph with the weight function XA becomes an edge-weighted graph whose weighting function values change after each knowledge test.
Now, a mathematical definition of the function XV is given (Definition4):

Definition4: The function XV: VDKG→[0,1] defined by

XV(Kx) = ( Σ_{Ky ∈ ParentsKx, XA(KyKx) ≠ -1} XA(KyKx) + Σ_{Ky ∈ ChildrenKx, XA(KxKy) ≠ -1} XA(KxKy) ) / ( max · nKx ),

where nKx is the number of edges incident to Kx with XA ≠ -1, is a weight function on the set of vertices of the domain knowledge graph.

The value XV(Kx) represents the weighted sum of the values of the function XA on the edges towards the parents of the concept Kx and towards the children of the concept Kx. We believe that the student knows the concept Kx completely if and only if XV(Kx)=1, which is true only if all the values of the function XA on the edges towards the parents and children are max. Then the probability of knowing the concept is the highest, that is, equal to 1.
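A sketch of Definition4 as given above (our code; the value max=4 is an assumption for illustration only):

    MAX_SCORE = 4  # 'max' from Definition3; the actual value is an assumption

    def xv(concept, parents, children, edge_score, max_score=MAX_SCORE):
        """Weighted sum of the tested incident edge scores, scaled into [0, 1]."""
        tested = [edge_score[(p, concept)] for p in parents
                  if edge_score.get((p, concept), -1) != -1]
        tested += [edge_score[(concept, ch)] for ch in children
                   if edge_score.get((concept, ch), -1) != -1]
        if not tested:
            return 0.0  # completely untested concept
        return sum(tested) / (max_score * len(tested))

    edge_score = {("Memory", "Mass memory"): 3, ("Mass memory", "Floppy Disk"): 4}
    print(xv("Mass memory", ["Memory"], ["Floppy Disk", "Hard Disk"], edge_score))
    # (3 + 4) / (4 * 2) = 0.875; the untested edge to 'Hard Disk' is skipped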

3.2. Bayesian Network Design
We define a Bayesian network BN over the domain knowledge concepts as a directed acyclic graph whose vertices are variables KX (they correspond to the nodes of DKG) that can take the values T (true, learned) and F (false, not learned), and whose directed edges between the random variables show how they are related (they correspond to the edges of DKG).
To implement and test a new approach to probabilistic student model design, we defined the
Bayesian network with 73 nodes (see Figure 1). These 73 nodes represent 73 concepts from
domain knowledge “Computer as a system” (Grubišić, 2012). It is important to emphasize
that there are four root nodes: Computer system, Computer system model, Programming
language and Logical gate. In this paper we used the Bayesian networks software package
GeNIe (Graphical Network Interface) which provides a graphical user interface for simple
construction of Bayesian networks (http://genie.sis.pitt.edu).

Figure 1. Bayesian network structure
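The networks in this study were built in GeNIe. Purely as an illustration, and assuming the open-source pgmpy library rather than the authors' toolchain, one edge of such a network with the slip/guess conditional probability table could be reproduced as follows:

    # Hypothetical fragment: 'Memory' -> 'MassMemory'; pgmpy is our assumption.
    from pgmpy.models import BayesianNetwork
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    model = BayesianNetwork([("Memory", "MassMemory")])
    # State order: 0 = F (not learned), 1 = T (learned).
    cpd_memory = TabularCPD("Memory", 2, [[0.9], [0.1]])  # "a priori" P(T) = 0.1
    cpd_mass = TabularCPD(
        "MassMemory", 2,
        [[0.9, 0.1],   # P(F | Memory=F), P(F | Memory=T)
         [0.1, 0.9]],  # P(T | Memory=F), P(T | Memory=T)
        evidence=["Memory"], evidence_card=[2])
    model.add_cpds(cpd_memory, cpd_mass)

    infer = VariableElimination(model)
    # Setting the evidence Memory = T raises P(MassMemory = T) to 0.9.
    print(infer.query(["MassMemory"], evidence={"Memory": 1}))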


The adaptive e-learning system AC-ware Tutor has ontological domain knowledge, as well as knowledge tests that enable the realization of the functions XA and XV over the domain knowledge graph, as described in the previous section. The usage of AC-ware Tutor has enabled us to get instances of the actual student's knowledge before and after a knowledge test.
In the previous section we indicated that the function XV is of paramount importance for determining the "a priori" probabilities of the root nodes, but also in evidence setting.
The Bayesian student model contains all the concepts from the domain knowledge, as well as the values of the function XV for each concept. When the student model initializes, the values of the function XV are 0 for each concept. These values can change only after a knowledge test is conducted. Since the learning and teaching process consists of multiple learning-testing cycles, the student model has to be changed after each cycle, that is, after each knowledge test.
We observe two instances of a particular student model. Student_model_1 is an instance taken at the end of one learning-testing cycle. Student_model_2 is an instance taken at the end of the following learning-testing cycle. These two instances have the same structure, but the values of the function XV differ for the concepts that were involved in the knowledge test (after a knowledge test, the values of the function XV change). These two instances are the basis for the analysis presented below.
Based on the domain knowledge graph and the values stored in Student_model_1, three different Bayesian networks will be designed (BN1, BN2, BN3) that have identical nodes and edges, but different calculations of the conditional probabilities. These three networks will be tested with evidence set in five different ways (Test1,…, Test5). Finally, there will be a total of fifteen different Bayesian student models (Model1,…, Model15). After applying the methodology for selecting the most suitable Bayesian student model, according to its prediction effectiveness, it will be clear which one of these models most accurately predicts the student's knowledge, on the basis of a comparison with the actual values stored in Student_model_2.
3.2.1. Calculating the "a priori" probabilities
Every Bayesian network is defined when the "a priori" probabilities of its root nodes and the conditional probability tables of its non-root nodes are defined. In our approach, the "a priori" probabilities of the root concepts are defined based on the values of the weight function XV. Namely, the "a priori" probability P(KX) of a root node KX (corresponding to a root of DKG) is defined as follows: P(KX=T)=XV(KX) is the probability of knowing the concept KX, and P(KX=F)=1-XV(KX) is the probability of not knowing the concept KX. If XV(KX)=0, then P(KX=T)=0.1, because of the possibility of a lucky guess.
Table 1 shows a part of the values stored in Student_model_1. It follows from the above formulas that the root nodes have the following "a priori" probabilities: Computer system (T=0.33, F=0.67), Computer system model (T=0.083, F=0.917), Programming language (T=0.1, F=0.9 - lucky guess), Logical gate (T=0.0416, F=0.9584).
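A minimal sketch of this assignment (our code; the function name is hypothetical) reproduces the root priors listed above:

    def prior(xv_value: float) -> tuple[float, float]:
        """Return (P(T), P(F)) for a root node with the given XV value."""
        p_true = xv_value if xv_value > 0 else 0.1  # lucky-guess floor at XV = 0
        return p_true, 1.0 - p_true

    for name, v in [("Computer system", 0.33), ("Computer system model", 0.083),
                    ("Programming language", 0.0), ("Logical gate", 0.0416)]:
        print(name, prior(v))
    # -> (0.33, 0.67), (0.083, 0.917), (0.1, 0.9), (0.0416, 0.9584)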


Table 1. Part of the student model instance

KX                         XV(KX)
1.44MB                     0.125
Application software       0.375
Arithmetic operation       0
Arithmetic-logic unit      0.375
Assembler                  0
Basic                      0
C                          0.25
Central unit               0.5
Central processing unit    0.5
Disjunction                0.083
Diskette                   0.25
DOS                        0
Fortran                    0.125
AND gate                   0
OR gate                    0.5
Information                0.125
Instruction                0.4375
Interpreter                0
Output unit                0.2917
Language translators       0
Capacity                   0.125
Compact disc               0.25
Compiler                   0
Conjunction                0
Logical operation          0.25

3.2.2 Conditional probabilities calculation methods
The most important feature of our approach is the non-empirical mathematical determination of the conditional probabilities. This is very important, as this determination is the bottleneck for a wider use of this complex technique for predicting a student's knowledge. Automating this segment simplifies the Bayesian student model design. The conditional probabilities, in our approach, depend only on the domain knowledge ontology, that is, only on the structure of the domain knowledge graph DKG.
To specify the Bayesian student model that provides better and more accurate results, the
conditional probabilities will be calculated in three ways using the structure of the domain
knowledge graph DKG. In the first calculation method, the conditional probabilities depend
only on the number of parents, in the second method they depend on the number of parents
and children, while in the third method they depend only on the number of children.
Common to all three calculation methods are the equal "a priori" probabilities of the root nodes. What differs in the three approaches is the determination of the "weight" of truth (knowing). The probability of truthful knowing, 0.9, is in each approach divided by a different quantity (the number of parents, the number of parents and children, the number of children), and this value is the "weight" of truth in the conditional probability tables.
The first method for calculating the conditional probabilities is a variation of the leaky AND (Conati, Gertner & Vanlehn, 2002) and relies on the fact that the fewer parent concepts are known, the lower the probability of the target node (and so the belief that the student knows the corresponding concept).
The Bayesian network we use for student modeling is derived from the domain knowledge ontology. The ontology includes semantically defined relationships among concepts that can be bidirectional. Since classical Bayesian networks consider only parent nodes in conditional probability calculations, in order to accommodate those bidirectional relations we have to fragment the original Bayesian network into a forest of local fragments, ignoring the non-directed dependencies encoded in the original Bayesian network. In this way we transform serial connections (P→X→C), from a node's parents (P) to the node (X) and on to the node's children (C), into converging connections (P→X←C), from the node's parents (P) to the node (X) and from the node's children (C) to the node (X) (Korb & Nicholson, 2011).
Therefore, for the second and the third method, the Bayesian network first has to be fragmented, in order to enable conditional probability calculations that use child nodes as well. For example, if the domain knowledge ontology includes the triplets (Memory, has_subtype, Mass memory), (Mass memory, has_subtype, Floppy Disk), (Mass memory, has_instance, Hard Disk), (Mass memory, has_instance, Compact Disc), we would like to see how the fact that the student knows the concepts Floppy Disk, Hard Disk and Compact Disc (child nodes) influences the prediction of knowing the concept Mass memory. Furthermore, we would like to see how the fact that the student knows the previously mentioned concepts combined with the concept Memory (child and parent nodes together) influences the prediction of knowing the concept Mass memory.
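A short sketch (our illustration) of this fragmentation step for the "Mass memory" example:

    def conditioning_nodes(node, parents, children):
        """Neighbors that condition `node` once serial connections P -> X -> C
        are transformed into converging connections P -> X <- C."""
        return sorted(parents.get(node, set()) | children.get(node, set()))

    parents = {"Mass memory": {"Central Unit", "Memory"}}
    children = {"Mass memory": {"Floppy Disk", "Hard Disk", "Compact Disc"}}
    print(conditioning_nodes("Mass memory", parents, children))
    # ['Central Unit', 'Compact Disc', 'Floppy Disk', 'Hard Disk', 'Memory']
    # -> pKx + cKx = 5 conditioning nodes for the BN2 table of Section 3.2.2.2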
3.2.2.1 Conditional probabilities based on the number of node’s parents
In the first approach (BN1), the conditional probability table of a non-root node KX is defined based on the number of its parents, pKx. The number and percentage of nodes with a certain number of parents can be seen in Table 2.
Table 2. The structure of Bayesian network 1

Number of parents   Total number of nodes   Percentage of nodes
roots               4                       5.48%
1                   53                      72.60%
2                   12                      16.44%
3                   3                       4.11%
4                   1                       1.37%

In this approach, each value T in the conditional probability table has the "weight" 0.9/pKx, so the row "weight" is t·0.9/pKx. This row "weight" defines the conditional probability of the non-root concept Kx: P(KX=T | the values of all Ky ∈ ParentsKX) = t·0.9/pKx, where t is the number of parents with the value T; in the same way, P(KX=F | the values of all Ky ∈ ParentsKX) = 1 - t·0.9/pKx (with the lucky-guess value 0.1 for P(KX=T) when t=0).
For example, let us analyze the determination of the conditional probabilities of the node "Mass Memory", which has two parents ("Central Unit" and "Memory"). In this case, each T value in the conditional probability table has the "weight" 0.9/2 = 0.45. The conditional probability table of the concept "Mass Memory" is given in Figure 2.
Central Unit                                T     T     F     F
Memory                                      T     F     T     F
P(Mass memory=T | Central Unit, Memory)     0.9   0.45  0.45  0.1
P(Mass memory=F | Central Unit, Memory)     0.1   0.55  0.55  0.9

Figure 2. Conditional probabilities based on the number of node's parents
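The same row-weight rule generates the tables of all three methods; only the denominator changes. The following sketch (our code, not the authors') reproduces Figure 2 for BN1 and, with denom=pKx+cKx or denom=cKx, yields the BN2 and BN3 tables of the next subsections (the childless-node exception of BN3 is handled separately there):

    from itertools import product

    def cpt(conditioning, denom):
        """Yield (assignment, P(T), P(F)) for every T/F combination; each row
        gets probability t * 0.9 / denom, floored at 0.1 for the lucky guess."""
        for states in product([True, False], repeat=len(conditioning)):
            t = sum(states)
            p_true = max(t * 0.9 / denom, 0.1)
            yield dict(zip(conditioning, states)), p_true, 1 - p_true

    # BN1 for 'Mass memory' (2 parents): 0.9, 0.45, 0.45, 0.1 as in Figure 2.
    for row, pt, pf in cpt(["Central Unit", "Memory"], denom=2):
        print(row, round(pt, 2), round(pf, 2))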

3.2.2.2 Conditional probabilities based on the number of node’s parents and children
In the second approach (BN2), the conditional probability table of a non-root node KX is defined based on the number of its parents and children, pKx+cKx. The number and percentage of nodes with a certain number of parents and children can be seen in Table 3.
Table 3. The structure of Bayesian network 2

Number of parents and children   Total number of nodes   Percentage of nodes
roots                            4                       5.48%
1                                26                      35.62%
2                                17                      23.29%
3                                10                      13.69%
4                                11                      15.07%
5                                3                       4.11%
6                                2                       2.74%

In this approach, each value T in the conditional probability table has the "weight" 0.9/(pKx+cKx), so the row "weight" is t·0.9/(pKx+cKx). This row "weight" defines the conditional probability of the non-root concept Kx: P(KX=T | the values of all Ky ∈ ParentsKX ∪ ChildrenKX) = t·0.9/(pKx+cKx), where t is the number of parents and children with the value T; in the same way, P(KX=F | the values of all Ky ∈ ParentsKX ∪ ChildrenKX) = 1 - t·0.9/(pKx+cKx).
For example, let us analyze the determination of the conditional probabilities of the node "Mass Memory", which has two parents ("Central Unit" and "Memory") and three children ("Floppy Disk", "Hard Disk" and "Compact Disc"). In this case, each T value in the conditional probability table has the "weight" 0.9/(2+3)=0.18. The conditional probability table of the concept "Mass Memory" is given in Figure 3.
Since each row probability depends only on the number t of conditioning nodes (Central Unit, Memory, Floppy Disk, Hard Disk, Compact Disk) in state T, the 32 rows of the table reduce to the following pattern:

t (conditioning nodes in state T)    5     4     3     2     1     0
P(Mass memory=T | Central Unit,
  Memory, Floppy Disk, Hard Disk,
  Compact Disk)                      0.9   0.72  0.54  0.36  0.18  0.1
P(Mass memory=F | Central Unit,
  Memory, Floppy Disk, Hard Disk,
  Compact Disk)                      0.1   0.28  0.46  0.64  0.82  0.9

Figure 3. Conditional probabilities based on the number of node's parents and children

3.2.2.3 Conditional probabilities based on the number of node’s children
In the third approach (BN3), the conditional probability table of a non-root node KX is defined based on the number of its children, cKx. The number and percentage of nodes with a certain number of children can be seen in Table 4.

Table 4. The structure of Bayesian network 3

Number of children   Total number of nodes   Percentage of nodes
roots                4                       5.48%
leaves               31                      42.46%
1                    12                      16.44%
2                    14                      19.18%
3                    9                       12.33%
4                    3                       4.11%

In this approach, each value T in the conditional probability table has the "weight" 0.9/cKx, so the row "weight" is t·0.9/cKx. This row "weight" defines the conditional probability of the non-root concept Kx: P(KX=T | the values of all Ky ∈ ChildrenKX) = t·0.9/cKx, where t is the number of children with the value T; in the same way, P(KX=F | the values of all Ky ∈ ChildrenKX) = 1 - t·0.9/cKx.
The only problem in this approach is the nodes that have no children. For those nodes cKx is 0, and we cannot calculate the "weight" of truth according to the above formula. Therefore, we determine that each value T in their conditional probability table has the "weight" 0.5.
For example, let us analyze the determination of conditional probability of node "Mass
Memory" that has three children ("Floppy Disk", "Hard Disk" and "Compact Disc"). In this
case, each T value in the conditional probabilities table has "weight" 0.9/3=0.3. The
conditional probability of concept "Mass Memory" is given in Figure 4.
Floppy Disk                                          T    T    T    T    F    F    F    F
Hard Disk                                            T    T    F    F    T    T    F    F
Compact Disk                                         T    F    T    F    T    F    T    F
P(Mass memory=T | Floppy Disk, Hard Disk,
  Compact Disk)                                      0.9  0.6  0.6  0.3  0.6  0.3  0.3  0.1
P(Mass memory=F | Floppy Disk, Hard Disk,
  Compact Disk)                                      0.1  0.4  0.4  0.7  0.4  0.7  0.7  0.9

Figure 4. Conditional probabilities based on the number of node's children

3.2.3 Setting the pieces of evidence
The importance of the function XV lies not only in determining the "a priori" probabilities of the root nodes; it is also used for setting the pieces of evidence. We observe five different ways of setting the pieces of evidence (five different values of the function XV used as a threshold) in order to examine their efficiency and reliability. These threshold values are defined completely heuristically, and the following analysis is done in order to determine which of these heuristic values is the best for setting the pieces of evidence. It is important to observe the obtained predictions and compare them with the actual values from the instance of the real student model Student_model_2, the gold standard.
3.2.3.1 Test 1
Let Kx be any node. If XV(Kx) ≥ 0.9, then we set the evidence on the node Kx to true. Similarly, if 1-XV(Kx) ≥ 0.9, then we set the evidence on the node Kx to false.
Example 1: The instance Student_model_1 contains the value XV(Computer system model)=0.083. Clearly, 1-XV(Computer system model)=0.917, which is greater than 0.9. Therefore, we set the evidence on the node Computer system model to false.
In this way, we set false evidence on four nodes (5% of all nodes are pieces of evidence).

3.2.3.2 Test 2
Let Kx be any node. If XV(Kx) ≥ 0.8, then we set the evidence on the node Kx to true. Similarly, if 1-XV(Kx) ≥ 0.8, then we set the evidence on the node Kx to false.
Example 2: The instance Student_model_1 contains the value XV(Fortran)=0.125. Clearly, 1-XV(Fortran)=0.875, which is greater than 0.8. Therefore, we set the evidence on the node Fortran to false.
In this way, we set false evidence on twelve nodes (16% of all nodes are pieces of evidence).
3.2.3.3 Test 3
Let Kx be any node. If XV(Kx) ≥ 0.75, then we set the evidence on the node Kx to true. Similarly, if 1-XV(Kx) ≥ 0.75, then we set the evidence on the node Kx to false.
Example 3: The instance Student_model_1 contains the value XV(Central Unit)=0.25. Clearly, 1-XV(Central Unit)=0.75, which is equal to 0.75. Therefore, we set the evidence on the node Central Unit to false.
In this way, we set true evidence on four nodes and false evidence on seventeen nodes (29% of all nodes are pieces of evidence).
3.2.3.4 Test 4
Let Kx be any node. If XV(Kx) ≥ 0.65, then we set the evidence on the node Kx to true. Similarly, if 1-XV(Kx) ≥ 0.65, then we set the evidence on the node Kx to false.
Example 4: The instance Student_model_1 contains the value XV(Input Unit)=0.312. Clearly, 1-XV(Input Unit)=0.688, which is greater than 0.65. Therefore, we set the evidence on the node Input Unit to false.
In this way, we set true evidence on four nodes and false evidence on nineteen nodes. The results would be the same if we had observed a threshold of 0.7 (32% of all nodes are pieces of evidence).
3.2.3.5 Test 5
Let Kx be any node. If XV(Kx) ≥ 0.6, then we set the evidence on the node Kx to true. Similarly, if 1-XV(Kx) ≥ 0.6, then we set the evidence on the node Kx to false.
Example 5: The instance Student_model_1 contains the value XV(Application Software)=0.375. Clearly, 1-XV(Application Software)=0.625, which is greater than 0.6. Therefore, we set the evidence on the node Application Software to false.
In this way, we set true evidence on four nodes and false evidence on twenty-six nodes (41% of all nodes are pieces of evidence).
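All five tests share the same thresholding rule; only the threshold value changes. A minimal sketch (our code, with hypothetical names) of that shared rule:

    def set_evidence(xv_values: dict[str, float], threshold: float) -> dict[str, bool]:
        """Map node -> True/False evidence; other nodes remain unobserved."""
        evidence = {}
        for node, v in xv_values.items():
            if v >= threshold:
                evidence[node] = True     # believed learned
            elif 1 - v >= threshold:
                evidence[node] = False    # believed not learned
        return evidence

    # Test 2 (threshold 0.8) on part of Table 1: 1 - 0.125 = 0.875 >= 0.8,
    # so Fortran becomes false evidence; Instruction stays unobserved.
    print(set_evidence({"Fortran": 0.125, "Instruction": 0.4375}, 0.8))
    # -> {'Fortran': False}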


Comparing the ways the pieces of evidence are set, the differences are obvious if we observe only the total number of set pieces of evidence. In Test1, only four pieces of evidence are set. The same four pieces of evidence occur in all the other ways of evidence setting. It is logical to assume that there are differences in making predictions between setting only four pieces of evidence in the Bayesian network (Test1) and setting thirty pieces of evidence (Test5). It is also logical to assume that the more evidence is set, the more accurate the prediction model. These assumptions will be refuted below.
It will be shown that it is essential to set evidence in a quality manner and that the quantity of evidence is not the most important factor for the accuracy of prediction. The five ways of setting evidence will be applied to all three Bayesian networks. In this way we test the prediction effectiveness of, in total, 3×5=15 Bayesian student models (Model1,…, Model15).
3.2.4 Testing the Bayesian student model prediction effectiveness
The student's knowledge after the knowledge test is contained in an instance of the student model, Student_model_2. That instance contains the actual student's knowledge. So, based on the actual knowledge, it is known which concepts the student has mastered, and the mentioned 15 models will be analyzed to show which one of them best predicts this actual student's knowledge.
The comparative analysis included only those nodes whose values of the function XV differ between the student model instances Student_model_1 and Student_model_2. The nodes that are evidence were excluded from the comparative analysis.
For each model, the percentage of overlap with the student model instance Student_model_2 is given. If the predicted value for a given node and its value of the function XV differ by at most 0.1, we have a prediction match. If they differ by more than 0.1 and at most 0.2, we have a prediction indication. If they differ by more than 0.2, we have a prediction miss. These boundary values were determined heuristically and have no support in the literature; therefore, they have to be verified in future experiments. The results of the analysis are presented in Tables 5, 6 and 7.

Table 5. Results of Bayesian student model prediction testing

Model     Bayesian network   Evidence setting   Number of compared nodes   Match (≤0.1)   Indication (>0.1, ≤0.2)   Miss (>0.2)
Model1    BN1                Test1              41                         36%            32%                        32%
Model2    BN2                Test1              41                         32%            17%                        51%
Model3    BN3                Test1              41                         15%            12%                        73%
Model4    BN1                Test2              33                         12%            27%                        61%
Model5    BN2                Test2              33                         18%            18%                        64%
Model6    BN3                Test2              33                         9%             12%                        71%
Model7    BN1                Test3              23                         22%            17%                        61%
Model8    BN2                Test3              23                         22%            17%                        61%
Model9    BN3                Test3              23                         26%            17%                        57%
Model10   BN1                Test4              21                         9%             19%                        72%
Model11   BN2                Test4              21                         28%            24%                        48%
Model12   BN3                Test4              21                         14%            33%                        52%
Model13   BN1                Test5              14                         14%            28%                        58%
Model14   BN2                Test5              14                         28%            14%                        58%
Model15   BN3                Test5              14                         7%             21%                        72%

Table 6. Average results regardless of evidence setting

Bayesian network   Match (≤0.1)   Indication (>0.1, ≤0.2)   Miss (>0.2)
BN1                19%            25%                        56%
BN2                26%            18%                        56%
BN3                14%            17%                        64%

Table 7. Average results regardless of Bayesian network

Evidence setting   Number of pieces of evidence   Match (≤0.1)   Indication (>0.1, ≤0.2)   Miss (>0.2)
Test1              4                              28%            20%                        52%
Test2              12                             13%            19%                        68%
Test3              21                             23%            17%                        60%
Test4              23                             17%            25%                        58%
Test5              30                             16%            21%                        63%

Observing the results in these tables, it is not difficult to conclude that the network BN3 has the "worst" results (the highest percentage in the last column of Table 6: 64%). This result can be attributed to setting the conditional probability value to 0.5 for all nodes without children. When we compare the networks BN1 and BN2, we can conclude that BN1 has better results in Test1 and Test5, while in Test3 they have identical results. BN2 has shown better results in Test2 and Test4. Overall, BN2 has the most matches (the highest percentage in the second column of Table 6: 26%) and can therefore be considered the best.
Looking at Table 7 and trying to answer which evidence setting is the best for knowledge prediction, it is not hard to see that it is Test1 (the highest percentage in the third column of Table 7: 28%). We conclude that it is essential to set evidence in a quality manner and that the quantity of evidence is not the most important factor for the accuracy of prediction.
If we observe the individual results in Table 5, we conclude that the model with the most overlap with the actual student's knowledge is Model1, where the conditional probabilities were determined based on the number of parents (BN1) and evidence was set for nodes whose values of the function XV were greater than or equal to 0.9 (Test1). This model has the fewest prediction misses and, in relation to the other models, a very high number of prediction matches. Therefore, this model stands as an appropriate Bayesian student model for predicting student knowledge in ontology based environments.

4. Conclusion
The intelligent tutoring systems need to build a model based on uncertain information received from the students. This information can be interpreted in various ways; therefore, the role of probabilistic models is especially important. Building a model that, given a small amount of high-quality information, makes conclusions about the student's knowledge and adapts to it requires a lot of effort. Bayesian network theory provides the above, but it is particularly important to find the best way to implement Bayesian networks in the student model design process.


The desire to provide a new, modern and quality education requires a lot of research. This paper describes a Bayesian student model as a new way of modeling students in ontology based intelligent tutoring systems. The development of this model is illustrated through empirical research that included a comparative analysis of fifteen potential models, among which we looked for the one that best predicts the student's knowledge.
The most important feature of this model is its non-empirical mathematical determination of the conditional probabilities, while the "a priori" probabilities are determined empirically, based on knowledge test results. This is very important, as this determination is the bottleneck of using Bayesian networks. Automating this segment will eventually lead to a wider use of this complex technique for predicting a student's knowledge, as the conditional probabilities depend only on the structure of the domain knowledge ontology.
The basis of this study was to find the best way to design a Bayesian student model. Numerous variants were examined, with special emphasis placed on determining the conditional probabilities and on evidence setting. While we believed that the most important aspect is the determination of the conditional probabilities, it turned out that the setting of evidence is no less important. The model that, among all the tested models, best represents the actual student's knowledge is the one where the conditional probabilities were determined based on the number of parents (BN1) and evidence was set for nodes whose values of the function XV were greater than or equal to 0.9 (Test1). In the future, we should find out why this model was wrong in 32% of the cases and eliminate these prediction misses.
It turned out that a small but well selected number of pieces of evidence enables better prediction of the student's knowledge than many unfounded pieces of evidence. In further studies related to Bayesian student model design, we will conduct broader research on a larger sample of instances of actual student models and determine the percentage of cases in which the selected Bayesian student model accurately predicts the student's knowledge. Furthermore, we will test this model on different domain knowledge in order to draw conclusions about the model's generality and independence from the domain knowledge. Several aspects should be involved in the extension of the presented work: an in-depth sensitivity analysis, and real-time usage of the network, updated as a result of student actions, in order to find out about its accuracy.

Acknowledgements
This paper describes the results of research carried out within the project 177-0361994-1996 "Design and Evaluation of Intelligent E-learning Systems", within the program 036-1994 "Intelligent Support to Omnipresence of e-Learning Systems", funded by the Ministry of Science, Education and Sports of the Republic of Croatia.

References
[1] Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53, pp. 370-418.
[2] Beck, J., Stern, M. & Haugsjaa, E. (1996). Applications of AI in Education. Crossroads,
3(1), pp. 11-15.
[3] Bloom, B. S. (1984). The 2 Sigma Problem: The Search for Methods of Group Instruction

as Effective as One-to-One Tutoring. Educational Researcher, 13(6), pp. 4-16.


[4] Bloom, B.S. (1976). Human Characteristics and School Learning. New York: McGraw-Hill

Book Company
[5] Carr, B., Goldstein, I.P. (1977). Overlays. A theory of modeling for computer-aided
instruction, AI Lab Memo 406, Massachusetts Institute of Technology, Cambridge,
Massachusetts
[6] Charniak, E. (1991). Bayesian Networks without tears, AI magazine, 12(4), pp. 50–63.
[7] Cheng, J. & Greiner, R. (2001). Learning bayesian belief network classifiers: Algorithms
and system. Advances in Artificial Intelligence, pp. 141-151.
[8] Conati, C., Gertner, A. & Vanlehn, K. (2002). Using Bayesian networks to manage
uncertainty in student modeling. User Modeling and User-Adapted Interaction, 12(4), pp.
371-417.
[9] Conati, C., Gertner, A. S., Vanlehn, K. & Druzdzel, M. J. (1997). On-line student modeling
for coached problem solving using Bayesian networks. User Modeling: Proceedings of
the Sixth International Conference, UM97, pp. 231-242.
[10] Gamboa, H. & Fred, A. (2002). Designing intelligent tutoring systems: a bayesian
approach. Enterprise information systems III, 1, pp. 452-458.
[11] Gross, J. L. & Yellen, J. (1998). Graph Theory and Its Applications (1st ed.). CRC Press.
[12] Gruber, T. R. (1993). A translation approach to portable ontology specifications.

Knowledge acquisition, 5(2), pp. 199-220.
[13] Grubišić, A. (2012). Adaptive student's knowledge acquisition model in e-learning
systems. PhD Thesis, Faculty of Electrical Engineering and Computing, University of
Zagreb, Croatia (in Croatian).
[14] Korb, K.B., Nicholson, A.E. (2011). Bayesian Artificial Intelligence. Chapman & Hall/CRC
Press, 2nd edition
[15] Berners-Lee, T., Hendler, J. & Lassila, O. (2001). The Semantic Web. Scientific American, 284(5), pp. 34-43.
[16] Mayo, M. J. (2001). Bayesian Student Modelling and Decision-theoretic Selection of
Tutorial Actions in Intelligent Tutoring Systems, PhD Thesis, University of Canterbury,
Christchurch, New Zealand.
[17] Millán, E., Pérez-De-La-Cruz, J.L. (2002). A Bayesian Diagnostic Algorithm for Student
Modeling and its Evaluation, User Modeling and User-Adapted Interaction 12(2-3), pp.
281-330.
[18] Ohlsson, S. (1986). Some principles of intelligent tutoring, Instructional Science, 14, pp.
293–326.
[19] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems, San Mateo: Morgan
Kaufmann
[20] Sleeman, D. & Brown, J. S. (1982). Introduction: Intelligent Tutoring Systems: An
Overview. Intelligent Tutoring Systems (Sleeman, D.H., Brown, J.S.), pp. 1-11. Academic
Press, Burlington, MA.
[21] Urban-Lurain, M. (1996). Intelligent tutoring systems: An historic review in the context of the development of artificial intelligence and educational psychology. Technical Report, Department of Computer Science and Engineering, Michigan State University.
[22] VanLehn, K. (1988). Student Modeling. In M. C. Polson & J. J. Richardson (Eds.), Foundations of Intelligent Tutoring Systems. Lawrence Erlbaum Associates Publishers.