Data Mining: Practical Machine Learning Tools and Techniques
Slides for Chapter 3 of Data Mining by I. H. Witten and E. Frank

Output: Knowledge representation
- Decision tables
- Decision trees
- Decision rules
- Association rules
- Rules with exceptions
- Rules involving relations
- Linear regression
- Trees for numeric prediction
- Instance-based representation
- Clusters
Output: representing structural patterns
- Many different ways of representing patterns
  - Decision trees, rules, instance-based, …
- Also called “knowledge” representation
- Representation determines inference method
- Understanding the output is the key to understanding the underlying learning methods
- Different types of output for different learning problems (e.g. classification, regression, …)

Decision tables
- Simplest way of representing output: use the same format as input!
- Decision table for the weather problem:

    Outlook   Humidity  Play
    Sunny     High      No
    Sunny     Normal    Yes
    Overcast  High      Yes
    Overcast  Normal    Yes
    Rainy     High      No
    Rainy     Normal    No

- Main problem: selecting the right attributes
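A decision table translates directly into a lookup structure. A minimal sketch in Python, assuming the weather table above (the dictionary and function names are illustrative, not from the slides):

    # Decision table for the weather problem: (outlook, humidity) -> play
    WEATHER_TABLE = {
        ("sunny", "high"): "no",
        ("sunny", "normal"): "yes",
        ("overcast", "high"): "yes",
        ("overcast", "normal"): "yes",
        ("rainy", "high"): "no",
        ("rainy", "normal"): "no",
    }

    def classify(outlook, humidity):
        # Unseen attribute combinations fall through to None
        return WEATHER_TABLE.get((outlook, humidity))

    print(classify("sunny", "normal"))  # -> yes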
Decision trees
- “Divide-and-conquer” approach produces tree
- Nodes involve testing a particular attribute
- Usually, attribute value is compared to constant
- Other possibilities:
  - Comparing values of two attributes
  - Using a function of one or more attributes
- Leaves assign classification, set of classifications, or probability distribution to instances
- Unknown instance is routed down the tree

Nominal and numeric attributes
- Nominal: number of children usually equal to number of values
  ⇒ attribute won’t get tested more than once
  - Other possibility: division into two subsets
- Numeric: test whether value is greater or less than constant
  ⇒ attribute may get tested several times
  - Other possibility: three-way split (or multi-way split)
    - Integer: less than, equal to, greater than
    - Real: below, within, above
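Routing an unknown instance down a tree (previous slide) is a short recursive walk. A minimal sketch, assuming a hypothetical node type with a numeric threshold test at each internal node (not the slides’ notation):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        attribute: Optional[str] = None    # attribute tested at this node
        threshold: Optional[float] = None  # numeric split point
        left: Optional["Node"] = None      # branch taken when value < threshold
        right: Optional["Node"] = None     # branch taken when value >= threshold
        label: Optional[str] = None        # class label if this is a leaf

    def classify(node, instance):
        # Descend until a leaf assigns the class
        if node.label is not None:
            return node.label
        branch = node.left if instance[node.attribute] < node.threshold else node.right
        return classify(branch, instance)

    # Tiny tree: if height < 7.0 then lying else standing
    tree = Node(attribute="height", threshold=7.0,
                left=Node(label="lying"), right=Node(label="standing"))
    print(classify(tree, {"height": 8.5}))  # -> standing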
Missing values
- Does absence of value have some significance?
- Yes ⇒ “missing” is a separate value
- No ⇒ “missing” must be treated in a special way
  - Solution A: assign instance to most popular branch
  - Solution B: split instance into pieces
    - Pieces receive weight according to fraction of training instances that go down each branch
    - Classifications from leaf nodes are combined using the weights that have percolated to them

Classification rules
- Popular alternative to decision trees
- Antecedent (pre-condition): a series of tests (just like the tests at the nodes of a decision tree)
- Tests are usually logically ANDed together (but may also be general logical expressions)
- Consequent (conclusion): classes, set of classes, or probability distribution assigned by rule
- Individual rules are often logically ORed together
  - Conflicts arise if different conclusions apply
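Solution B for missing values above amounts to sending a fractional instance down every branch and summing the weighted class counts at the leaves. A minimal sketch under an assumed node type that records the fraction of training instances that went left (left_fraction and the other names are illustrative):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        attribute: Optional[str] = None
        threshold: Optional[float] = None
        left: Optional["Node"] = None
        right: Optional["Node"] = None
        left_fraction: float = 0.5  # fraction of training instances that went left
        label: Optional[str] = None

    def classify_weighted(node, instance, weight=1.0):
        # Returns {class: weight}; weights percolate down and are combined
        if node.label is not None:
            return {node.label: weight}
        value = instance.get(node.attribute)
        if value is None:  # missing: split the instance into weighted pieces
            dist = {}
            for branch, frac in ((node.left, node.left_fraction),
                                 (node.right, 1.0 - node.left_fraction)):
                for cls, w in classify_weighted(branch, instance, weight * frac).items():
                    dist[cls] = dist.get(cls, 0.0) + w
            return dist
        branch = node.left if value < node.threshold else node.right
        return classify_weighted(branch, instance, weight)

    tree = Node(attribute="humidity", threshold=80.0, left_fraction=0.6,
                left=Node(label="yes"), right=Node(label="no"))
    print(classify_weighted(tree, {}))  # humidity missing -> {'yes': 0.6, 'no': 0.4}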
From trees to rules
- Easy: converting a tree into a set of rules
  - One rule for each leaf:
    - Antecedent contains a condition for every node on the path from the root to the leaf
    - Consequent is class assigned by the leaf
- Produces rules that are unambiguous
  - Doesn’t matter in which order they are executed
- But: resulting rules are unnecessarily complex
  - Pruning to remove redundant tests/rules

From rules to trees
- More difficult: transforming a rule set into a tree
  - Tree cannot easily express disjunction between rules
- Example: rules which test different attributes

    If a and b then x
    If c and d then x

- Symmetry needs to be broken
- Corresponding tree contains identical subtrees (⇒ “replicated subtree problem”)

A tree for a simple disjunction
[Figure: tree for the two rules above, with the subtree testing c and d replicated under both branches]

The exclusive-or problem

    If x = 1 and y = 0 then class = a
    If x = 0 and y = 1 then class = a
    If x = 0 and y = 0 then class = b
    If x = 1 and y = 1 then class = b

A tree with a replicated subtree

    If x = 1 and y = 1 then class = a
    If z = 1 and w = 1 then class = a
    Otherwise class = b
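The “one rule for each leaf” conversion above is a depth-first walk that accumulates the condition tested on each branch. A minimal sketch, reusing a hypothetical threshold-test node type (names illustrative):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        attribute: Optional[str] = None
        threshold: Optional[float] = None
        left: Optional["Node"] = None
        right: Optional["Node"] = None
        label: Optional[str] = None

    def tree_to_rules(node, path=()):
        # At a leaf, the accumulated path conditions form the antecedent
        if node.label is not None:
            return [f"If {' and '.join(path) or 'true'} then {node.label}"]
        rules = tree_to_rules(node.left, path + (f"{node.attribute} < {node.threshold}",))
        rules += tree_to_rules(node.right, path + (f"{node.attribute} >= {node.threshold}",))
        return rules

    tree = Node(attribute="width", threshold=3.5,
                left=Node(label="standing"), right=Node(label="lying"))
    print("\n".join(tree_to_rules(tree)))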
“Nuggets” of knowledge
- Are rules independent pieces of knowledge? (It seems easy to add a rule to an existing rule base.)
- Problem: ignores how rules are executed
- Two ways of executing a rule set:
  - Ordered set of rules (“decision list”)
    - Order is important for interpretation
  - Unordered set of rules
    - Rules may overlap and lead to different conclusions for the same instance

Interpreting rules
- What if two or more rules conflict?
  - Give no conclusion at all?
  - Go with rule that is most popular on training data?
  - …
- What if no rule applies to a test instance?
  - Give no conclusion at all?
  - Go with class that is most frequent in training data?
  - …

Special case: boolean class
- Assumption: if instance does not belong to class “yes”, it belongs to class “no”
- Trick: only learn rules for class “yes” and use a default rule for “no”

    If x = 1 and y = 1 then class = a
    If z = 1 and w = 1 then class = a
    Otherwise class = b

- Order of rules is not important. No conflicts!
- Rule can be written in disjunctive normal form

Association rules
- Association rules…
  - … can predict any attribute and combinations of attributes
  - … are not intended to be used together as a set
- Problem: immense number of possible associations
  - Output needs to be restricted to show only the most predictive associations
    ⇒ only those with high support and high confidence

Support and confidence of a rule
- Support: number of instances predicted correctly
- Confidence: number of correct predictions, as proportion of all instances that rule applies to
- Example: 4 cool days with normal humidity

    If temperature = cool then humidity = normal
    ⇒ Support = 4, confidence = 100%

- Normally: minimum support and confidence pre-specified (e.g. 58 rules with support ≥ 2 and confidence ≥ 95% for weather data)
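Support and confidence can be computed in one pass over the data. A minimal sketch in Python, assuming instances are dictionaries and a rule is a pair of attribute→value test sets (all names illustrative):

    def support_and_confidence(instances, antecedent, consequent):
        """antecedent/consequent: dicts mapping attribute -> required value."""
        applies = correct = 0
        for inst in instances:
            if all(inst.get(a) == v for a, v in antecedent.items()):
                applies += 1
                if all(inst.get(a) == v for a, v in consequent.items()):
                    correct += 1
        confidence = correct / applies if applies else 0.0
        return correct, confidence  # support = number of correct predictions

    data = [{"temperature": "cool", "humidity": "normal"},
            {"temperature": "cool", "humidity": "normal"},
            {"temperature": "hot", "humidity": "high"}]
    print(support_and_confidence(data, {"temperature": "cool"},
                                 {"humidity": "normal"}))  # -> (2, 1.0)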
Interpreting association rules
- Interpretation is not obvious:

    If windy = false and play = no then outlook = sunny and humidity = high

  is not the same as

    If windy = false and play = no then outlook = sunny
    If windy = false and play = no then humidity = high

- It means that the following also holds:

    If humidity = high and windy = false and play = no then outlook = sunny

Rules with exceptions
- Idea: allow rules to have exceptions
- Example: rule for iris data

    If petal-length ≥ 2.45 and petal-length < 4.45 then Iris-versicolor

- New instance:

    Sepal length  Sepal width  Petal length  Petal width  Type
    5.1           3.5          2.6           0.2          Iris-setosa

- Modified rule:

    If petal-length ≥ 2.45 and petal-length < 4.45 then Iris-versicolor
    EXCEPT if petal-width < 1.0 then Iris-setosa
A more complex example
- Exceptions to exceptions to exceptions …

    default: Iris-setosa
    except if petal-length ≥ 2.45 and petal-length < 5.355
              and petal-width < 1.75
           then Iris-versicolor
           except if petal-length ≥ 4.95 and petal-width < 1.55
                  then Iris-virginica
                  else if sepal-length < 4.95 and sepal-width ≥ 2.45
                  then Iris-virginica
    else if petal-length ≥ 3.35
         then Iris-virginica
         except if petal-length < 4.85 and sepal-length < 5.95
                then Iris-versicolor

Advantages of using exceptions
- Rules can be updated incrementally
  - Easy to incorporate new data
  - Easy to incorporate domain knowledge
- People often think in terms of exceptions
- Each conclusion can be considered just in the context of rules and exceptions that lead to it
  - Locality property is important for understanding large rule sets
  - “Normal” rule sets don’t offer this advantage
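Because a default…except structure is logically equivalent to nested if…then…else (as the next slide notes), the nested rule set above can be rendered directly as code. A sketch in Python under one plausible reading of the nesting (function and parameter names are illustrative):

    def classify_iris(petal_length, petal_width, sepal_length, sepal_width):
        # default: Iris-setosa, with exceptions evaluated as nested tests
        if 2.45 <= petal_length < 5.355 and petal_width < 1.75:
            # exception branch: Iris-versicolor, unless its own exceptions fire
            if petal_length >= 4.95 and petal_width < 1.55:
                return "Iris-virginica"
            elif sepal_length < 4.95 and sepal_width >= 2.45:
                return "Iris-virginica"
            return "Iris-versicolor"
        elif petal_length >= 3.35:
            if petal_length < 4.85 and sepal_length < 5.95:
                return "Iris-versicolor"
            return "Iris-virginica"
        return "Iris-setosa"

    print(classify_iris(1.4, 0.2, 5.1, 3.5))  # -> Iris-setosa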
More on exceptions
- default … except if … then … is logically equivalent to if … then … else (where the else specifies what the default did)
- But: exceptions offer a psychological advantage
  - Assumption: defaults and tests early on apply more widely than exceptions further down
  - Exceptions reflect special cases

Rules involving relations
- So far: all rules involved comparing an attribute value to a constant (e.g. temperature < 45)
- These rules are called “propositional” because they have the same expressive power as propositional logic
- What if problem involves relationships between examples (e.g. family tree problem from above)?
  - Can’t be expressed with propositional rules
  - More expressive representation required
The shapes problem
- Target concept: standing up
- Shaded: standing; Unshaded: lying
[Figure: shaded and unshaded blocks of varying width, height, and number of sides]

    Width  Height  Sides  Class
    2      4       4      Standing
    3      6       4      Standing
    4      3       4      Lying
    7      8       3      Standing
    7      6       3      Lying
    2      9       4      Standing
    9      1       4      Lying
    10     2       3      Lying

A propositional solution

    If width ≥ 3.5 and height < 7.0 then lying
    If height ≥ 3.5 then standing

A relational solution
- Comparing attributes with each other:

    If width > height then lying
    If height > width then standing

- Generalizes better to new data
- Standard relations: =, <, >
- But: learning relational rules is costly
- Simple solution: add extra attributes (e.g. a binary attribute is width < height?)

Rules with variables
- Using variables and multiple relations:

    If height_and_width_of(x,h,w) and h > w
    then standing(x)

- The top of a tower of blocks is standing:

    If height_and_width_of(x,h,w) and h > w and is_top_of(x,y)
    then standing(x)

- The whole tower is standing:

    If is_top_of(x,z) and height_and_width_of(z,h,w) and h > w
       and is_rest_of(x,y) and standing(y)
    then standing(x)
    If empty(x) then standing(x)

- Recursive definition!
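The recursive “whole tower is standing” definition translates naturally into code. A minimal sketch in Python, modelling a tower as a list of (height, width) blocks with the top block first (the representation is illustrative, not from the slides):

    def standing(tower):
        # A tower is standing if it is empty, or if its top block is
        # taller than it is wide and the rest of the tower is standing
        if not tower:                       # If empty(x) then standing(x)
            return True
        (h, w), rest = tower[0], tower[1:]  # top block, rest of tower
        return h > w and standing(rest)

    print(standing([(4, 2), (5, 3)]))  # -> True
    print(standing([(2, 4)]))          # -> False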
Inductive logic programming
- Recursive definition can be seen as logic program
- Techniques for learning logic programs stem from the area of “inductive logic programming” (ILP)
- But: recursive definitions are hard to learn
  - Also: few practical problems require recursion
  - Thus: many ILP techniques are restricted to non-recursive definitions to make learning easier

Trees for numeric prediction
- Regression: the process of computing an expression that predicts a numeric quantity
- Regression tree: “decision tree” where each leaf predicts a numeric quantity
  - Predicted value is average value of training instances that reach the leaf
- Model tree: “regression tree” with linear regression models at the leaf nodes
  - Linear patches approximate continuous function
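The difference between a regression tree and a model tree shows up only at the leaves: a constant (the mean of the training instances reaching that leaf) versus a linear model. A minimal sketch with invented numbers, purely for illustration:

    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class NumericNode:
        attribute: Optional[str] = None
        threshold: Optional[float] = None
        left: Optional["NumericNode"] = None
        right: Optional["NumericNode"] = None
        predict: Optional[Callable[[dict], float]] = None  # leaf model

    def predict(node, instance):
        # Internal nodes route; leaves apply their model
        if node.predict is not None:
            return node.predict(instance)
        branch = node.left if instance[node.attribute] < node.threshold else node.right
        return predict(branch, instance)

    # Regression-tree leaf: constant (mean of training instances at the leaf)
    leaf_const = NumericNode(predict=lambda inst: 19.3)
    # Model-tree leaf: linear model of the attributes
    leaf_linear = NumericNode(predict=lambda inst: 1.2 + 0.5 * inst["CACH"])

    tree = NumericNode(attribute="CACH", threshold=8.5,
                       left=leaf_const, right=leaf_linear)
    print(predict(tree, {"CACH": 32}))  # -> 17.2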
Linear regression for the CPU data

    PRP = -56.1
          + 0.049 MYCT
          + 0.015 MMIN
          + 0.006 MMAX
          + 0.630 CACH
          - 0.270 CHMIN
          + 1.46  CHMAX

Regression tree for the CPU data
[Figure: regression tree predicting PRP from the CPU attributes, with a constant at each leaf]
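Applying the regression equation is a weighted sum plus an intercept. A minimal sketch using the coefficients as read from the slide above (the attribute values in the example call are made up):

    # CPU performance (PRP) as a linear function of the hardware attributes
    COEFFS = {"MYCT": 0.049, "MMIN": 0.015, "MMAX": 0.006,
              "CACH": 0.630, "CHMIN": -0.270, "CHMAX": 1.46}
    INTERCEPT = -56.1

    def predict_prp(instance):
        # Weighted sum of attribute values plus the intercept
        return INTERCEPT + sum(COEFFS[a] * instance[a] for a in COEFFS)

    print(predict_prp({"MYCT": 125, "MMIN": 256, "MMAX": 6000,
                       "CACH": 256, "CHMIN": 16, "CHMAX": 128}))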
Model tree for the CPU data
[Figure: tree with linear regression models at the leaf nodes]

Instance-based representation
- Simplest form of learning: rote learning
  - Training instances are searched for instance that most closely resembles new instance
  - The instances themselves represent the knowledge
  - Also called instance-based learning
- Similarity function defines what’s “learned”
- Instance-based learning is lazy learning
- Methods: nearest-neighbor, k-nearest-neighbor, …

The distance function
- Simplest case: one numeric attribute
  - Distance is the difference between the two attribute values involved (or a function thereof)
- Several numeric attributes: normally, Euclidean distance is used and attributes are normalized
- Nominal attributes: distance is set to 1 if values are different, 0 if they are equal
- Are all attributes equally important?
  - Weighting the attributes might be necessary

Learning prototypes
- Only those instances involved in a decision need to be stored
- Noisy instances should be filtered out
- Idea: only use prototypical examples

Rectangular generalizations
- Nearest-neighbor rule is used outside rectangles
- Rectangles are rules! (But they can be more conservative than “normal” rules.)
- Nested rectangles are rules with exceptions
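The mixed-attribute distance function described above combines normalized numeric differences with a 0/1 penalty for nominal mismatches. A minimal 1-nearest-neighbor sketch under those conventions, assuming the numeric ranges needed for normalization are known (all names illustrative):

    import math

    def distance(a, b, ranges):
        """Euclidean distance over mixed attributes; ranges[attr] is
        (lo, hi) for numeric attributes and None for nominal ones."""
        total = 0.0
        for attr, lo_hi in ranges.items():
            if lo_hi is None:  # nominal: 1 if different, 0 if equal
                d = 0.0 if a[attr] == b[attr] else 1.0
            else:              # numeric: normalized difference
                lo, hi = lo_hi
                d = (a[attr] - b[attr]) / (hi - lo)
            total += d * d
        return math.sqrt(total)

    def nearest_neighbor(train, query, ranges):
        # Classify by the class of the closest stored instance
        best = min(train, key=lambda inst: distance(inst, query, ranges))
        return best["class"]

    ranges = {"height": (1, 9), "sides": None}
    train = [{"height": 8, "sides": 4, "class": "standing"},
             {"height": 2, "sides": 3, "class": "lying"}]
    print(nearest_neighbor(train, {"height": 7, "sides": 4}, ranges))  # -> standing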
Representing clusters I
[Figures: simple 2-D representation of disjoint clusters; Venn diagram of overlapping clusters]

Representing clusters II
- Probabilistic assignment:

         1    2    3
    a    0.4  0.1  0.5
    b    0.1  0.8  0.1
    c    0.3  0.3  0.4
    d    0.1  0.1  0.8
    e    0.4  0.2  0.4
    f    0.1  0.4  0.5
    g    0.7  0.2  0.1
    h    0.5  0.4  0.1
    …

- Dendrogram
  - NB: dendron is the Greek word for tree