Decision Tree for PlayTennis

Decision Tree Learning [read Chapter 3]

[recommended exercises 3.1, 3.4]

Decision tree representation
ID3 learning algorithm
Entropy, Information gain
Overfitting

Decision Tree for PlayTennis

Outlook = Sunny:
| Humidity = High: No
| Humidity = Normal: Yes
Outlook = Overcast: Yes
Outlook = Rain:
| Wind = Strong: No
| Wind = Weak: Yes

A Tree to Predict C-Section Risk

Learned from medical records of 1000 women
Negative examples are C-sections

[833+,167-] .83+ .17-
Fetal_Presentation = 1: [822+,116-] .88+ .12-
| Previous_Csection = 0: [767+,81-] .90+ .10-
| | Primiparous = 0: [399+,13-] .97+ .03-
| | Primiparous = 1: [368+,68-] .84+ .16-
| | | Fetal_Distress = 0: [334+,47-] .88+ .12-
| | | | Birth_Weight < 3349: [201+,10.6-] .95+ .05-
| | | | Birth_Weight >= 3349: [133+,36.4-] .78+ .22-
| | | Fetal_Distress = 1: [34+,21-] .62+ .38-
| Previous_Csection = 1: [55+,35-] .61+ .39-
Fetal_Presentation = 2: [3+,29-] .11+ .89-
Fetal_Presentation = 3: [8+,22-] .27+ .73-

Decision Trees

Decision tree representation:
  Each internal node tests an attribute
  Each branch corresponds to attribute value
  Each leaf node assigns a classification

How would we represent:
  $\wedge$, $\vee$, XOR
  $(A \wedge B) \vee (C \wedge \neg D \wedge E)$
  M of N
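One possible answer for the second formula can be sketched in Python. The nested-tuple encoding and the names `TREE`, `C_BRANCH`, `classify` and `formula` are assumptions made for this illustration, not notation from the slides. The sketch checks the tree against the formula on all 32 assignments.

```python
from itertools import product

# Hypothetical encoding: (attribute, subtree_if_False, subtree_if_True);
# a bare bool is a leaf.
C_BRANCH = ("C", False, ("D", ("E", False, True), False))   # C ^ (not D) ^ E
TREE = ("A", C_BRANCH, ("B", C_BRANCH, True))

def classify(tree, instance):
    """Walk from the root to a leaf, testing one attribute per internal node."""
    while not isinstance(tree, bool):
        attribute, if_false, if_true = tree
        tree = if_true if instance[attribute] else if_false
    return tree

def formula(x):
    return (x["A"] and x["B"]) or (x["C"] and not x["D"] and x["E"])

def agrees_everywhere():
    """Compare tree and formula on every assignment of A..E."""
    for values in product([False, True], repeat=5):
        instance = dict(zip("ABCDE", values))
        if classify(TREE, instance) != formula(instance):
            return False
    return True

print(agrees_everywhere())
```

Note that the subtree for the second conjunct appears twice, once under A = false and once under A = true, B = false. Disjunctions tend to force this kind of replication in a tree.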

When to Consider Decision Trees

Instances describable by attribute-value pairs
Target function is discrete valued
Disjunctive hypothesis may be required
Possibly noisy training data

Examples:
  Equipment or medical diagnosis
  Credit risk analysis
  Modeling calendar scheduling preferences

Top-Down Induction of Decision Trees

Main loop:

1. $A \leftarrow$ the "best" decision attribute for next node

2. Assign $A$ as decision attribute for node

3. For each value of $A$, create new descendant of node

4. Sort training examples to leaf nodes

5. If training examples perfectly classified, Then STOP, Else iterate over new leaf nodes
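The main loop can be sketched recursively in Python. This is a minimal illustration, not the exact ID3 procedure from the text. It assumes information gain (defined later in these notes) as the meaning of "best", falls back to the majority label when no attributes remain, and uses hypothetical names (`grow_tree`, the `toy` data) and a hypothetical tree encoding of `(attribute, {value: subtree})`.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def gain(examples, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy([e[target] for e in examples]) - remainder

def grow_tree(examples, attributes, target):
    labels = [e[target] for e in examples]
    # Step 5: stop when the examples are perfectly classified
    # (assumption: with no attribute left, use the majority label).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: choose the "best" attribute for this node.
    best = max(attributes, key=lambda a: gain(examples, a, target))
    remaining = [a for a in attributes if a != best]
    # Steps 3-4: one descendant per observed value, examples sorted to it.
    branches = {}
    for value in sorted({e[best] for e in examples}):
        subset = [e for e in examples if e[best] == value]
        branches[value] = grow_tree(subset, remaining, target)
    return (best, branches)

# Made-up data in which only "shape" predicts the label.
toy = [
    {"shape": "round", "color": "red", "label": "yes"},
    {"shape": "round", "color": "blue", "label": "yes"},
    {"shape": "square", "color": "red", "label": "no"},
    {"shape": "square", "color": "blue", "label": "no"},
]
tree = grow_tree(toy, ["color", "shape"], "label")
print(tree)
```

On the toy data the sketch tests `shape` at the root and stops, since both descendants are already pure.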

Which attribute is best?

S = [29+,35-]
A1 = ?   t: [21+,5-]    f: [8+,30-]
A2 = ?   t: [18+,33-]   f: [11+,2-]

Entropy

[Figure: Entropy(S) plotted against $p_{\oplus}$, both axes running from 0.0 to 1.0]

$S$ is a sample of training examples
$p_{\oplus}$ is the proportion of positive examples in $S$
$p_{\ominus}$ is the proportion of negative examples in $S$
Entropy measures the impurity of $S$

$\mathit{Entropy}(S) \equiv -p_{\oplus} \log_2 p_{\oplus} - p_{\ominus} \log_2 p_{\ominus}$

$\mathit{Entropy}(S)$ = expected number of bits needed to encode class ($\oplus$ or $\ominus$) of randomly drawn member of $S$ (under the optimal, shortest-length code)

Why? Information theory: optimal length code assigns $-\log_2 p$ bits to message having probability $p$.

So, expected number of bits to encode $\oplus$ or $\ominus$ of random member of $S$:

$p_{\oplus}(-\log_2 p_{\oplus}) + p_{\ominus}(-\log_2 p_{\ominus})$

$\mathit{Entropy}(S) \equiv -p_{\oplus} \log_2 p_{\oplus} - p_{\ominus} \log_2 p_{\ominus}$

Information Gain

$\mathit{Gain}(S, A)$ = expected reduction in entropy due to sorting on $A$

$\mathit{Gain}(S, A) \equiv \mathit{Entropy}(S) - \sum_{v \in \mathit{Values}(A)} \frac{|S_v|}{|S|} \mathit{Entropy}(S_v)$

S = [29+,35-]
A1 = ?   t: [21+,5-]    f: [8+,30-]
A2 = ?   t: [18+,33-]   f: [11+,2-]

Training Examples

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
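As a numerical check of the entropy and gain definitions, the following Python sketch works from class counts alone. It is applied to the [29+,35-] split example and to the table above, which contains 9 positive and 5 negative examples. The function names and the `(pos, neg)` tuple convention are assumptions made for this illustration.

```python
import math

def entropy(pos, neg):
    """Entropy of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # 0 * log2(0) is taken to be 0
            p = count / total
            result -= p * math.log2(p)
    return result

def gain(parent, children):
    """Gain for splitting `parent` = (pos, neg) into `children`, a list of (pos, neg)."""
    total = sum(parent)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in children)
    return entropy(*parent) - remainder

gain_a1 = gain((29, 35), [(21, 5), (8, 30)])
gain_a2 = gain((29, 35), [(18, 33), (11, 2)])
print(f"Entropy([9+,5-]) = {entropy(9, 5):.3f}")
print(f"Gain(S, A1) = {gain_a1:.2f}, Gain(S, A2) = {gain_a2:.2f}")
```

Under this measure A1 comes out ahead of A2 (roughly 0.27 against 0.12), because its two branches are closer to pure.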

Selecting the Next A ttribute

Which attribute is the best classifier?

S: [9+,5-], E = 0.940

Humidity
  High:   [3+,4-], E = 0.985
  Normal: [6+,1-], E = 0.592

Gain(S, Humidity) = .940 - (7/14).985 - (7/14).592 = .151

Wind
  Weak:   [6+,2-], E = 0.811
  Strong: [3+,3-], E = 1.00

Gain(S, Wind) = .940 - (8/14).811 - (6/14)1.0 = .048

{D1, D2, ..., D14}  [9+,5-]

Outlook
  Sunny:    {D1,D2,D8,D9,D11}   [2+,3-]   ?
  Overcast: {D3,D7,D12,D13}     [4+,0-]   Yes
  Rain:     {D4,D5,D6,D10,D14}  [3+,2-]   ?

Which attribute should be tested here?

Ssunny = {D1,D2,D8,D9,D11}

Gain (Ssunny, Humidity) = .970 - (3/5) 0.0 - (2/5) 0.0 = .970
Gain (Ssunny, Temperature) = .970 - (2/5) 0.0 - (2/5) 1.0 - (1/5) 0.0 = .570
Gain (Ssunny, Wind) = .970 - (2/5) 1.0 - (3/5) .918 = .019
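The gain figures above can be reproduced, up to rounding, by encoding the fourteen training examples and ranking the attributes at the root and within the Sunny branch. This is a sketch in Python; `EXAMPLES`, `rank` and the whitespace-separated row encoding are choices made for this illustration.

```python
import math
from collections import Counter

COLUMNS = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
ROWS = """
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
"""
EXAMPLES = [dict(zip(COLUMNS, line.split())) for line in ROWS.strip().splitlines()]

def entropy(examples):
    counts = Counter(e["PlayTennis"] for e in examples)
    return -sum(n / len(examples) * math.log2(n / len(examples)) for n in counts.values())

def gain(examples, attribute):
    """Entropy of the sample minus the weighted entropy of its partition by `attribute`."""
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(examples) - remainder

def rank(examples, attributes):
    """Attributes ordered from highest to lowest gain."""
    return sorted(attributes, key=lambda a: gain(examples, a), reverse=True)

sunny = [e for e in EXAMPLES if e["Outlook"] == "Sunny"]
print(rank(EXAMPLES, COLUMNS[:4]))
print(rank(sunny, ["Temperature", "Humidity", "Wind"]))
```

Outlook ranks first at the root and Humidity ranks first within the Sunny branch, which matches the tree shown at the start of these notes.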

Hypothesis Space Search by ID3

[Figure: partial decision trees of increasing size, testing attributes A1, A2, A3, A4 over + and - examples]

Hypothesis space is complete!
  Target function surely in there...

Outputs a single hypothesis (which one?)
  Can't play 20 questions...

No back tracking
  Local minima...

Statistically-based search choices
  Robust to noisy data...

Inductive bias: approx "prefer shortest tree"

Inductive Bias in ID3

Note $H$ is the power set of instances $X$
$\rightarrow$ Unbiased?

Not really...
  Preference for short trees, and for those with high information gain attributes near the root
  Bias is a preference for some hypotheses, rather than a restriction of hypothesis space $H$
  Occam's razor: prefer the shortest hypothesis that fits the data

Occam's Razor

Why prefer short hypotheses?

Argument in favor:
  Fewer short hyps. than long hyps.
  $\rightarrow$ a short hyp that fits data unlikely to be coincidence
  $\rightarrow$ a long hyp that fits data might be coincidence

Argument opposed:
  There are many ways to define small sets of hyps
  e.g., all trees with a prime number of nodes that use attributes beginning with "Z"
  What's so special about small sets based on size of hypothesis??

Overfitting in Decision Trees

Consider adding noisy training example #15:

Sunny, Hot, Normal, Strong, PlayTennis = No

What effect on earlier tree?

Outlook = Sunny:
| Humidity = High: No
| Humidity = Normal: Yes
Outlook = Overcast: Yes
Outlook = Rain:
| Wind = Strong: No
| Wind = Weak: Yes

Overfitting

Consider error of hypothesis $h$ over
  training data: $\mathit{error}_{train}(h)$
  entire distribution $D$ of data: $\mathit{error}_{D}(h)$

Hypothesis $h \in H$ overfits training data if there is an alternative hypothesis $h' \in H$ such that

$\mathit{error}_{train}(h) < \mathit{error}_{train}(h')$

and

$\mathit{error}_{D}(h) > \mathit{error}_{D}(h')$

Overfitting in Decision Tree Learning

[Figure: accuracy (0.5 to 0.9) versus size of tree (number of nodes, 10 to 100), with one curve on training data and one on test data]

Avoiding Overfitting

How can we avoid overfitting?
  stop growing when data split not statistically significant
  grow full tree, then post-prune

How to select "best" tree:
  Measure performance over training data
  Measure performance over separate validation data set
  MDL: minimize size(tree) + size(misclassifications(tree))

Reduced-Error Pruning

Split data into training and validation set

Do until further pruning is harmful:

1. Evaluate impact on validation set of pruning each possible node (plus those below it)

2. Greedily remove the one that most improves validation set accuracy

produces smallest version of most accurate subtree
What if data is limited?
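The pruning loop can be sketched in Python as follows. The tree encoding is an assumption made for this illustration: each internal node is a dict that stores the attribute it tests, its branches, and the majority training label it would predict if turned into a leaf. The overfitted tree and the five validation examples are made up. The sketch treats a prune that leaves validation accuracy unchanged as acceptable, which is one reading of "until further pruning is harmful".

```python
import copy

def classify(tree, example):
    """Follow branches to a leaf label; unseen values fall back to the majority label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(example[tree["attr"]], tree["majority"])
    return tree

def accuracy(tree, examples, target):
    return sum(classify(tree, e) == e[target] for e in examples) / len(examples)

def internal_paths(tree, path=()):
    """Yield the branch-value path leading to every internal node."""
    if isinstance(tree, dict):
        yield path
        for value, child in tree["branches"].items():
            yield from internal_paths(child, path + (value,))

def pruned_at(tree, path):
    """Copy of `tree` with the node at `path` (and everything below it) made a leaf."""
    if not path:
        return tree["majority"]
    new = copy.deepcopy(tree)
    node = new
    for value in path[:-1]:
        node = node["branches"][value]
    node["branches"][path[-1]] = node["branches"][path[-1]]["majority"]
    return new

def reduced_error_prune(tree, validation, target):
    best_acc = accuracy(tree, validation, target)
    while isinstance(tree, dict):
        # Step 1: evaluate the effect of pruning each possible node.
        candidates = [pruned_at(tree, p) for p in internal_paths(tree)]
        # Step 2: greedily take the prune with the best validation accuracy.
        top = max(candidates, key=lambda t: accuracy(t, validation, target))
        top_acc = accuracy(top, validation, target)
        if top_acc < best_acc:  # further pruning is harmful
            break
        tree, best_acc = top, top_acc
    return tree

FIELDS = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
VALIDATION = [dict(zip(FIELDS, row.split())) for row in [
    "Sunny Hot Normal Weak Yes",
    "Sunny Mild High Weak No",
    "Rain Mild High Strong No",
    "Rain Cool Normal Weak Yes",
    "Overcast Hot High Weak Yes",
]]
# Hypothetical tree with an extra Temperature test of the kind a noisy example could induce.
OVERFIT = {"attr": "Outlook", "majority": "Yes", "branches": {
    "Sunny": {"attr": "Humidity", "majority": "No", "branches": {
        "High": "No",
        "Normal": {"attr": "Temperature", "majority": "Yes",
                   "branches": {"Hot": "No", "Mild": "Yes", "Cool": "Yes"}},
    }},
    "Overcast": "Yes",
    "Rain": {"attr": "Wind", "majority": "Yes",
             "branches": {"Strong": "No", "Weak": "Yes"}},
}}
pruned = reduced_error_prune(OVERFIT, VALIDATION, "PlayTennis")
print(pruned["branches"]["Sunny"]["branches"])
```

On this made-up data the only prune that survives is the removal of the Temperature test, which restores the original PlayTennis tree.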

Effect of Reduced-Error Pruning

[Figure: accuracy (0.5 to 0.9) versus size of tree (number of nodes, 10 to 100), with curves on training data, on test data, and on test data (during pruning)]

Rule Post-Pruning

1. Convert tree to equivalent set of rules

2. Prune each rule independently of others

3. Sort final rules into desired sequence for use

Perhaps most frequently used method (e.g., C4.5)

Converting A Tree to Rules

Outlook = Sunny:
| Humidity = High: No
| Humidity = Normal: Yes
Outlook = Overcast: Yes
Outlook = Rain:
| Wind = Strong: No
| Wind = Weak: Yes

IF (Outlook = Sunny) $\wedge$ (Humidity = High)
THEN PlayTennis = No

IF (Outlook = Sunny) $\wedge$ (Humidity = Normal)
THEN PlayTennis = Yes

...
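The conversion amounts to emitting one rule per root-to-leaf path. A minimal Python sketch follows, assuming a hypothetical encoding in which an internal node is `(attribute, {value: subtree})` and a leaf is a label; `tree_to_rules` and `format_rule` are names chosen for this illustration.

```python
# Hypothetical encoding of the PlayTennis tree.
TREE = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def tree_to_rules(tree, conditions=()):
    """Yield (conditions, label) for every root-to-leaf path."""
    if not isinstance(tree, tuple):
        yield conditions, tree
        return
    attribute, branches = tree
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + ((attribute, value),))

def format_rule(conditions, label, target="PlayTennis"):
    tests = " ^ ".join(f"({a} = {v})" for a, v in conditions)
    return f"IF {tests} THEN {target} = {label}"

rules = [format_rule(c, label) for c, label in tree_to_rules(TREE)]
for rule in rules:
    print(rule)
```

Keeping each rule as a separate list of conditions is what makes the later pruning step possible: a condition can be dropped from one rule without affecting the others.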

Continuous Valued Attributes

Create a discrete attribute to test continuous
  Temperature = 82.5
  (Temperature > 72.3) = t, f

Temperature:  40  48  60  72  80  90
PlayTennis:   No  No  Yes Yes Yes No

Attributes with Many Values

Problem:
  If attribute has many values, Gain will select it
  Imagine using Date = Jun 3 1996 as attribute

One approach: use GainRatio instead

$\mathit{GainRatio}(S, A) \equiv \frac{\mathit{Gain}(S, A)}{\mathit{SplitInformation}(S, A)}$

$\mathit{SplitInformation}(S, A) \equiv -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$

where $S_i$ is subset of $S$ for which $A$ has value $v_i$
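Both ideas can be sketched together in Python. For the continuous attribute, the sketch assumes a common way of choosing thresholds that the slide does not spell out: take midpoints between adjacent sorted values whose labels differ, then score each resulting boolean test with Gain. For many-valued attributes it computes SplitInformation as the entropy of $S$ with respect to the values of $A$. The `date` column, which gives every example its own value, is hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    total = len(items)
    return -sum(n / total * math.log2(n / total) for n in Counter(items).values())

def gain(values, labels):
    """Information gain of splitting `labels` by the parallel list `values`."""
    remainder = 0.0
    for v in set(values):
        subset = [label for x, label in zip(values, labels) if x == v]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def candidate_thresholds(values, labels):
    """Midpoints between adjacent sorted values whose labels differ (an assumption)."""
    pairs = sorted(zip(values, labels))
    return [(a + b) / 2 for (a, la), (b, lb) in zip(pairs, pairs[1:]) if la != lb]

def gain_ratio(values, labels):
    split_information = entropy(values)  # -sum |S_i|/|S| log2 |S_i|/|S|
    return gain(values, labels) / split_information if split_information else 0.0

# Continuous attribute: the Temperature row from the slide.
temperature = [40, 48, 60, 72, 80, 90]
play = ["No", "No", "Yes", "Yes", "Yes", "No"]
thresholds = candidate_thresholds(temperature, play)
best = max(thresholds, key=lambda t: gain([x > t for x in temperature], play))
print(thresholds, best)

# Many-valued attribute: Outlook versus a Date-like attribute on the 14 examples.
outlook = ("Sunny Sunny Overcast Rain Rain Rain Overcast "
           "Sunny Sunny Rain Sunny Overcast Overcast Rain").split()
tennis = "No No Yes Yes Yes No Yes No Yes Yes Yes Yes Yes No".split()
date = list(range(14))  # hypothetical: a distinct value for every example
for name, column in [("Outlook", outlook), ("Date", date)]:
    print(name, round(gain(column, tennis), 3), round(gain_ratio(column, tennis), 3))
```

The Date-like attribute reaches the maximum possible Gain because every subset is pure, and SplitInformation of about 3.8 bits cuts its score sharply. On a sample this small its GainRatio still exceeds Outlook's, so the ratio reduces the bias rather than removing it.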

Attributes with Costs

Consider
  medical diagnosis, BloodTest has cost $150
  robotics, Width_from_1ft has cost 23 sec.

How to learn a consistent tree with low expected cost?

One approach: replace gain by

Tan and Schlimmer (1990)

$\frac{\mathit{Gain}^2(S, A)}{\mathit{Cost}(A)}$

Nunez (1988)

$\frac{2^{\mathit{Gain}(S, A)} - 1}{(\mathit{Cost}(A) + 1)^w}$

where $w \in [0, 1]$ determines importance of cost
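The two measures are simple enough to compare directly. In the Python sketch below, the gain and cost figures for the two candidate attributes are made up for illustration, and `pick` is a hypothetical helper that returns the attribute with the highest score.

```python
def tan_schlimmer(gain, cost):
    """Gain^2(S, A) / Cost(A)."""
    return gain ** 2 / cost

def nunez(gain, cost, w):
    """(2^Gain(S, A) - 1) / (Cost(A) + 1)^w, with w in [0, 1]."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (2 ** gain - 1) / (cost + 1) ** w

# Hypothetical attributes as (gain, cost); the numbers are invented.
candidates = {"BloodTest": (0.40, 150.0), "Temperature": (0.25, 1.0)}

def pick(score):
    """Name of the candidate attribute that maximizes `score(gain, cost)`."""
    return max(candidates, key=lambda name: score(*candidates[name]))

print(pick(tan_schlimmer))
print(pick(lambda g, c: nunez(g, c, 0.0)))  # w = 0: cost is ignored
print(pick(lambda g, c: nunez(g, c, 1.0)))  # w = 1: cost counts fully
```

With $w = 0$ the Nunez measure ranks attributes by gain alone, so the expensive test wins. With $w = 1$, and under the Tan and Schlimmer measure, the cheap attribute is preferred despite its lower gain.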

Unknown Attribute Values

What if some examples missing values of $A$?

Use training example anyway, sort through tree
  If node $n$ tests $A$, assign most common value of $A$ among other examples sorted to node $n$
  assign most common value of $A$ among other examples with same target value
  assign probability $p_i$ to each possible value $v_i$ of $A$
    assign fraction $p_i$ of example to each descendant in tree

Classify new examples in same fashion
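The three strategies can be sketched in Python on a handful of made-up examples sorted to a node that tests Humidity. `None` marks a missing value; the function names and the data are assumptions made for this illustration.

```python
from collections import Counter

def most_common_value(examples, attribute, target=None, label=None):
    """Most common known value of `attribute`, optionally only among examples with `label`."""
    pool = [e[attribute] for e in examples
            if e[attribute] is not None and (target is None or e[target] == label)]
    return Counter(pool).most_common(1)[0][0]

def value_probabilities(examples, attribute):
    """Estimate p_i for each value v_i from the examples where the value is known."""
    known = [e[attribute] for e in examples if e[attribute] is not None]
    return {v: n / len(known) for v, n in Counter(known).items()}

# Hypothetical examples at a node that tests Humidity.
node_examples = [
    {"Humidity": "High", "PlayTennis": "No"},
    {"Humidity": "High", "PlayTennis": "No"},
    {"Humidity": "High", "PlayTennis": "Yes"},
    {"Humidity": "Normal", "PlayTennis": "Yes"},
    {"Humidity": "Normal", "PlayTennis": "Yes"},
    {"Humidity": None, "PlayTennis": "Yes"},  # the example with a missing value
]

by_node = most_common_value(node_examples, "Humidity")
by_class = most_common_value(node_examples, "Humidity", "PlayTennis", "Yes")
fractions = value_probabilities(node_examples, "Humidity")
print(by_node, by_class, fractions)
```

The strategies can disagree: the most common value at the node is High, the most common value among examples with the same target value (Yes) is Normal, and the fractional approach sends 0.6 of the example down the High branch and 0.4 down the Normal branch.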
