Datalog Programs and Their Safety

26.5.6 Datalog Programs and Their Safety

There are two main methods of defining the truth values of predicates in actual Datalog programs. Fact-defined predicates (or relations) are defined by listing all the combinations of values (the tuples) that make the predicate true. These correspond to base relations whose contents are stored in a database system. Figure 26.14 shows the fact-defined predicates EMPLOYEE, MALE, FEMALE, SALARY, DEPARTMENT, SUPERVISE, PROJECT, and WORKS_ON, which correspond to part of the relational database shown in Figure 3.6. Rule-defined predicates (or views) are defined by being the head (LHS) of one or more Datalog rules; they correspond to virtual relations whose contents can be inferred by the inference engine. Figure 26.15 shows a number of rule-defined predicates.

Figure 26.14

Fact predicates for part of the database from Figure 3.6.

EMPLOYEE(john).
EMPLOYEE(franklin).
EMPLOYEE(alicia).
EMPLOYEE(jennifer).
EMPLOYEE(ramesh).
EMPLOYEE(joyce).
EMPLOYEE(ahmad).
EMPLOYEE(james).

MALE(john).
MALE(franklin).
MALE(ramesh).
MALE(ahmad).
MALE(james).

FEMALE(alicia).
FEMALE(jennifer).
FEMALE(joyce).

SALARY(john, 30000).
SALARY(franklin, 40000).
SALARY(alicia, 25000).
SALARY(jennifer, 43000).
SALARY(ramesh, 38000).
SALARY(joyce, 25000).
SALARY(ahmad, 25000).
SALARY(james, 55000).

PROJECT(productx).
PROJECT(producty).
PROJECT(productz).
PROJECT(computerization).
PROJECT(reorganization).
PROJECT(newbenefits).

DEPARTMENT(john, research).
DEPARTMENT(franklin, research).
DEPARTMENT(alicia, administration).
DEPARTMENT(jennifer, administration).
DEPARTMENT(ramesh, research).
DEPARTMENT(joyce, research).
DEPARTMENT(ahmad, administration).
DEPARTMENT(james, headquarters).

WORKS_ON(john, productx, 32).
WORKS_ON(john, producty, 8).
WORKS_ON(ramesh, productz, 40).
WORKS_ON(joyce, productx, 20).
WORKS_ON(joyce, producty, 20).
WORKS_ON(franklin, producty, 10).
WORKS_ON(franklin, productz, 10).
WORKS_ON(franklin, computerization, 10).
WORKS_ON(franklin, reorganization, 10).
WORKS_ON(alicia, newbenefits, 30).
WORKS_ON(alicia, computerization, 10).
WORKS_ON(ahmad, computerization, 35).
WORKS_ON(ahmad, newbenefits, 5).
WORKS_ON(jennifer, newbenefits, 20).
WORKS_ON(jennifer, reorganization, 15).
WORKS_ON(james, reorganization, 10).

SUPERVISE(franklin, john).
SUPERVISE(franklin, ramesh).
SUPERVISE(franklin, joyce).
SUPERVISE(jennifer, alicia).
SUPERVISE(jennifer, ahmad).
SUPERVISE(james, franklin).

SUPERIOR(X, Y) :– SUPERVISE(X, Y).
SUPERIOR(X, Y) :– SUPERVISE(X, Z), SUPERIOR(Z, Y).
SUBORDINATE(X, Y) :– SUPERIOR(Y, X).
SUPERVISOR(X) :– EMPLOYEE(X), SUPERVISE(X, Y).
OVER_40K_EMP(X) :– EMPLOYEE(X), SALARY(X, Y), Y>=40000.
UNDER_40K_SUPERVISOR(X) :– SUPERVISOR(X), NOT(OVER_40K_EMP(X)).
MAIN_PRODUCTX_EMP(X) :– EMPLOYEE(X), WORKS_ON(X, productx, Y), Y>=20.
PRESIDENT(X) :– EMPLOYEE(X), NOT(SUPERVISE(Y, X)).

Figure 26.15

Rule-defined predicates.
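To make the idea of a virtual relation concrete, the following minimal Python sketch infers the SUPERIOR and SUBORDINATE predicates of Figure 26.15 from the SUPERVISE facts of Figure 26.14. It assumes a naive bottom-up strategy (apply the rules repeatedly until no new fact appears); this is only one possible way an inference engine might work, and the function name superior is hypothetical.

```python
# SUPERVISE facts from Figure 26.14, as (supervisor, subordinate) pairs.
SUPERVISE = {
    ("franklin", "john"),
    ("franklin", "ramesh"),
    ("franklin", "joyce"),
    ("jennifer", "alicia"),
    ("jennifer", "ahmad"),
    ("james", "franklin"),
}


def superior(supervise):
    """Naive bottom-up evaluation of the two SUPERIOR rules (hypothetical helper).

    SUPERIOR(X, Y) :- SUPERVISE(X, Y).
    SUPERIOR(X, Y) :- SUPERVISE(X, Z), SUPERIOR(Z, Y).
    """
    # First rule: every SUPERVISE fact is also a SUPERIOR fact.
    derived = set(supervise)
    while True:
        # Second rule: join SUPERVISE(X, Z) with the SUPERIOR(Z, Y) facts found so far.
        new_facts = {
            (x, y)
            for (x, z) in supervise
            for (z2, y) in derived
            if z == z2
        }
        if new_facts <= derived:
            # Fixpoint reached: the rules produce nothing new.
            return derived
        derived |= new_facts


SUPERIOR = superior(SUPERVISE)

# SUBORDINATE(X, Y) :- SUPERIOR(Y, X).
SUBORDINATE = {(y, x) for (x, y) in SUPERIOR}
```

Because SUPERVISE is finite and the rules only combine existing values, the loop must terminate. On these facts the sketch adds the indirect pairs in which james is a superior of john, ramesh, and joyce.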

A program or a rule is said to be safe if it generates a finite set of facts. The general theoretical problem of determining whether a set of rules is safe is undecidable. However, one can determine the safety of restricted forms of rules. For example, the rules shown in Figure 26.16 are safe. One situation where we get unsafe rules that can generate an infinite number of facts arises when one of the variables in the rule can range over an infinite domain of values, and that variable is not limited to ranging over a finite relation. For example, consider the following rule:

BIG_SALARY(Y) :– Y>60000

Here, we can get an infinite result if Y ranges over all possible integers. But suppose that we change the rule as follows:

BIG_SALARY(Y) :– EMPLOYEE(X), SALARY(X, Y), Y>60000

In the second rule, the result is not infinite, since the values that Y can be bound to are now restricted to values that are the salary of some employee in the database, presumably a finite set of values. We can also rewrite the rule as follows:

BIG_SALARY(Y) :– Y>60000, EMPLOYEE(X), SALARY(X, Y)

In this case, the rule is still theoretically safe. However, in Prolog or any other system that uses a top-down, depth-first inference mechanism, the rule creates an infinite loop, since we first search for a value for Y and then check whether it is a salary of an employee. The result is generation of an infinite number of Y values, even though these, after a certain point, cannot lead to a set of true RHS predicates. One definition of Datalog considers both rules to be safe, since it does not depend on a particular inference mechanism. Nonetheless, it is generally advisable to write such a rule in the safest form, with the predicates that restrict possible bindings of variables placed first. As another example of an unsafe rule, consider the following rule:


REL_ONE(A, B, C).
REL_TWO(D, E, F).
REL_THREE(G, H, I, J).

SELECT_ONE_A_EQ_C(X, Y, Z) :– REL_ONE(C, Y, Z).
SELECT_ONE_B_LESS_5(X, Y, Z) :– REL_ONE(X, Y, Z), Y<5.
SELECT_ONE_A_EQ_C_AND_B_LESS_5(X, Y, Z) :– REL_ONE(C, Y, Z), Y<5.
SELECT_ONE_A_EQ_C_OR_B_LESS_5(X, Y, Z) :– REL_ONE(C, Y, Z).
SELECT_ONE_A_EQ_C_OR_B_LESS_5(X, Y, Z) :– REL_ONE(X, Y, Z), Y<5.
PROJECT_THREE_ON_G_H(W, X) :– REL_THREE(W, X, Y, Z).
UNION_ONE_TWO(X, Y, Z) :– REL_ONE(X, Y, Z).
UNION_ONE_TWO(X, Y, Z) :– REL_TWO(X, Y, Z).
INTERSECT_ONE_TWO(X, Y, Z) :– REL_ONE(X, Y, Z), REL_TWO(X, Y, Z).
DIFFERENCE_TWO_ONE(X, Y, Z) :– REL_TWO(X, Y, Z), NOT(REL_ONE(X, Y, Z)).
CART_PROD_ONE_THREE(T, U, V, W, X, Y, Z) :– REL_ONE(T, U, V), REL_THREE(W, X, Y, Z).
NATURAL_JOIN_ONE_THREE_C_EQ_G(U, V, W, X, Y, Z) :– REL_ONE(U, V, W), REL_THREE(W, X, Y, Z).

Figure 26.16

Predicates for illustrating relational operations.
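As an informal reading aid for Figure 26.16, the Python sketch below mirrors several of its rules with set operations. The sample tuples are hypothetical, since the figure gives only the relation schemas, and the selections on attribute A are left out because they depend on how the symbol C is interpreted.

```python
# Hypothetical sample tuples; Figure 26.16 names only the relation schemas.
REL_ONE = {(1, 2, 3), (4, 7, 3), (5, 1, 9)}
REL_TWO = {(1, 2, 3), (8, 8, 8)}
REL_THREE = {(3, "a", "b", "c"), (9, "d", "e", "f")}

# SELECT_ONE_B_LESS_5(X, Y, Z) :- REL_ONE(X, Y, Z), Y < 5.
select_b_less_5 = {(x, y, z) for (x, y, z) in REL_ONE if y < 5}

# PROJECT_THREE_ON_G_H(W, X) :- REL_THREE(W, X, Y, Z).
project_g_h = {(w, x) for (w, x, _y, _z) in REL_THREE}

# UNION_ONE_TWO: one rule per operand, so the result is the set union.
union_one_two = REL_ONE | REL_TWO

# INTERSECT_ONE_TWO: both body predicates must hold for the same tuple.
intersect_one_two = REL_ONE & REL_TWO

# DIFFERENCE_TWO_ONE: REL_TWO tuples for which REL_ONE does not hold.
difference_two_one = REL_TWO - REL_ONE

# CART_PROD_ONE_THREE: no shared variables, so every pairing qualifies.
cart_prod = {r1 + r3 for r1 in REL_ONE for r3 in REL_THREE}

# NATURAL_JOIN_ONE_THREE_C_EQ_G: the shared variable W forces C = G.
natural_join = {
    (u, v, w, x, y, z)
    for (u, v, w) in REL_ONE
    for (w2, x, y, z) in REL_THREE
    if w == w2
}
```

The only difference between the Cartesian product and the natural join in this sketch is the equality test on the shared variable, which matches the way W is repeated in the body of the join rule.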

Here, an infinite number of Y values can again be generated, since the variable Y appears only in the head of the rule and hence is not limited to a finite set of values. To define safe rules more formally, we use the concept of a limited variable. A variable X is limited in a rule if (1) it appears in a regular (not built-in) predicate in the body of the rule; (2) it appears in a predicate of the form X=c or c=X or (c1<=X and X<=c2) in the rule body, where c, c1, and c2 are constant values; or (3) it appears in a predicate of the form X=Y or Y=X in the rule body, where Y is a limited variable. A rule is said to be safe if all its variables are limited.
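The three conditions for a limited variable can be checked mechanically. The following Python sketch is one possible encoding, not a standard algorithm: the tuple representation of literals and the names is_var, limited_variables, and is_safe are assumptions made for illustration, negated literals are not handled, and HEAD_ONLY is a hypothetical rule whose variable Y appears only in the head.

```python
def is_var(term):
    """Datalog convention: variables are names that start with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def limited_variables(body):
    """Collect the variables limited by conditions (1) to (3).

    Each body literal is a tuple (hypothetical encoding):
      ("pred", name, args)       regular predicate, e.g. EMPLOYEE(X)
      ("cmp", op, left, right)   built-in comparison, e.g. Y > 60000
    """
    limited = set()
    lower, upper = set(), set()  # variables bounded below / above by a constant
    for lit in body:
        if lit[0] == "pred":
            # Condition (1): the variable occurs in a regular predicate.
            limited |= {t for t in lit[2] if is_var(t)}
            continue
        _, op, left, right = lit
        if op == "=" and is_var(left) != is_var(right):
            # Condition (2): X = c or c = X.
            limited.add(left if is_var(left) else right)
        elif op == "<=" and not is_var(left) and is_var(right):
            lower.add(right)  # c1 <= X
        elif op == "<=" and is_var(left) and not is_var(right):
            upper.add(left)   # X <= c2
    # Condition (2), range form: both c1 <= X and X <= c2 are present.
    limited |= lower & upper

    # Condition (3): X = Y where Y is limited; repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        for lit in body:
            if lit[0] == "cmp" and lit[1] == "=":
                a, b = lit[2], lit[3]
                if is_var(a) and is_var(b) and (a in limited) != (b in limited):
                    limited |= {a, b}
                    changed = True
    return limited


def is_safe(head_args, body):
    """A rule is safe if every variable in its head and body is limited."""
    all_vars = {t for t in head_args if is_var(t)}
    for lit in body:
        terms = lit[2] if lit[0] == "pred" else lit[2:]
        all_vars |= {t for t in terms if is_var(t)}
    return all_vars <= limited_variables(body)


# BIG_SALARY(Y) :- Y > 60000.
unsafe_big_salary = is_safe(["Y"], [("cmp", ">", "Y", 60000)])

# BIG_SALARY(Y) :- EMPLOYEE(X), SALARY(X, Y), Y > 60000.
safe_big_salary = is_safe(
    ["Y"],
    [("pred", "EMPLOYEE", ["X"]), ("pred", "SALARY", ["X", "Y"]), ("cmp", ">", "Y", 60000)],
)

# BIG_SALARY(Y) :- Y > 60000, EMPLOYEE(X), SALARY(X, Y).
reordered_big_salary = is_safe(
    ["Y"],
    [("cmp", ">", "Y", 60000), ("pred", "EMPLOYEE", ["X"]), ("pred", "SALARY", ["X", "Y"])],
)

# Hypothetical rule HEAD_ONLY(X, Y) :- EMPLOYEE(X); Y appears only in the head.
head_only = is_safe(["X", "Y"], [("pred", "EMPLOYEE", ["X"])])
```

Under this definition the order of the body literals does not matter, so both orderings of the restricted BIG_SALARY rule are reported as safe, which matches the observation that the definition does not depend on a particular inference mechanism.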