
Uncertainty
Yeni Herdiyeni, Departemen Ilmu Komputer IPB

"The world is a very uncertain place."

Uncertainty:

• Indonesia's president in 2014 will be a woman
• Next year I will graduate
• Indonesia's population is about 200 million
• Next year I will be promoted
• If the symptom is a high fever, then the disease is typhoid
• Etc.

Uncertainty makes problem solving difficult.

  Naïve Bayes Classifier

We will start off with a visual intuition, before looking at the math…

[Portrait: Thomas Bayes, 1702–1761]

[Figure: scatterplot of Antenna Length vs. Abdomen Length for Grasshoppers and Katydids]

Remember this example? Let's get lots more data…

With a lot of data, we can build a histogram. Let us just build one for "Antenna Length" for now…

We can leave the histograms as they are, or we can summarize them with two normal distributions.

Let us use two normal distributions for ease of visualization in the following slides…

[Figure: Antenna Length histograms for Grasshoppers and Katydids, overlaid with two fitted normal distributions]

• We want to classify an insect we have found. Its antennae are 3 units long. How can we classify it?
• We can just ask ourselves: given the distributions of antennae lengths we have seen, is it more probable that our insect is a Grasshopper or a Katydid?
• There is a formal way to discuss the most probable classification…

p(c_j | d) = probability of class c_j, given that we have observed d

Antennae length is 3:
P(Grasshopper | 3) = 10 / (10 + 2) = 0.833
P(Katydid | 3) = 2 / (10 + 2) = 0.166

Antennae length is 7:
P(Grasshopper | 7) = 3 / (3 + 9) = 0.250
P(Katydid | 7) = 9 / (3 + 9) = 0.750

Antennae length is 5:
P(Grasshopper | 5) = 6 / (6 + 6) = 0.500
P(Katydid | 5) = 6 / (6 + 6) = 0.500
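A minimal Python sketch of this counting argument; the bin counts below transcribe only the three worked cases (antennae lengths 3, 5, and 7), not the full histograms, and all names are illustrative:

```python
# Bin counts transcribed from the three worked examples above.
grasshopper_counts = {3: 10, 5: 6, 7: 3}
katydid_counts = {3: 2, 5: 6, 7: 9}

def posterior(antenna_length):
    """P(class | antenna length), straight from the counts in that bin."""
    g = grasshopper_counts[antenna_length]
    k = katydid_counts[antenna_length]
    return {"Grasshopper": g / (g + k), "Katydid": k / (g + k)}

print(posterior(3))  # {'Grasshopper': 0.833..., 'Katydid': 0.166...}
print(posterior(7))  # {'Grasshopper': 0.25, 'Katydid': 0.75}
```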

Bayes Classifiers

That was a visual intuition for a simple case of the Bayes classifier, also called:

• Idiot Bayes
• Naïve Bayes
• Simple Bayes

We are about to see some of the mathematical formalisms, and more examples, but keep in mind the basic idea: find out the probability of the previously unseen instance belonging to each class, then simply pick the most probable class.

Bayesian classifiers use Bayes' theorem, which says:

p(c_j | d) = p(d | c_j) p(c_j) / p(d)

• p(c_j | d) = probability of instance d being in class c_j. This is what we are trying to compute.
• p(d | c_j) = probability of generating instance d given class c_j. We can imagine that being in class c_j causes you to have feature d with some probability.
• p(c_j) = probability of occurrence of class c_j. This is just how frequent class c_j is in our database.
• p(d) = probability of instance d occurring. This can actually be ignored, since it is the same for all classes.

Assume that we have two classes: c_1 = male and c_2 = female.

We have a person whose sex we do not know, say "drew" or d. Classifying drew as male or female is equivalent to asking which is greater: p(male | drew) or p(female | drew)?

(Note: "Drew" can be a male or a female name — think Drew Carey vs. Drew Barrymore.)

p(male | drew) = p(drew | male) p(male) / p(drew)

• p(drew | male) = the probability of being called "drew" given that you are a male
• p(male) = the probability of being a male
• p(drew) = the probability of being named "drew" (actually irrelevant, since it is the same for all classes)

This is Officer Drew (who arrested me in 1997). Is Officer Drew a Male or a Female? Luckily, we have a small database with names and sex. We can use it to apply Bayes' rule:

Name     Sex
Drew     Male
Claudia  Female
Drew     Female
Drew     Female
Alberto  Male
Karin    Female
Nina     Female
Sergio   Male

Since p(drew) = 3/8 is the same for both classes, we only need to compare the numerators:

p(male | drew) ∝ p(drew | male) p(male) = 1/3 × 3/8 = 0.125
p(female | drew) ∝ p(drew | female) p(female) = 2/5 × 5/8 = 0.250

Officer Drew is more likely to be a Female.

Officer Drew IS a female!
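A minimal sketch of the same computation, using the eight-row table above; `score` returns the unnormalized numerator p(d | c_j)·p(c_j), which is all we need to compare the two classes (function and variable names are mine):

```python
# The eight-row name/sex table from the slide.
data = [("Drew", "Male"), ("Claudia", "Female"), ("Drew", "Female"),
        ("Drew", "Female"), ("Alberto", "Male"), ("Karin", "Female"),
        ("Nina", "Female"), ("Sergio", "Male")]

def score(name, sex):
    in_class = [n for n, s in data if s == sex]
    p_name_given_sex = sum(n == name for n in in_class) / len(in_class)
    p_sex = len(in_class) / len(data)
    return p_name_given_sex * p_sex   # numerator of Bayes' rule

print(score("Drew", "Male"))    # 1/3 * 3/8 = 0.125
print(score("Drew", "Female"))  # 2/5 * 5/8 = 0.25
```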

How do we use all the features?

So far we have only considered Bayes classification when we have one attribute (the "antennae length", or the "name"). But we may have many features:

p(c_j | d) = p(d | c_j) p(c_j) / p(d)

Name     Over 170 cm  Eye    Hair length  Sex
Drew     No           Blue   Short        Male
Claudia  Yes          Brown  Long         Female
Drew     No           Blue   Long         Female
Drew     No           Blue   Long         Female
Alberto  Yes          Brown  Short        Male
Karin    No           Blue   Long         Female
Nina     Yes          Brown  Short        Female
Sergio   Yes          Blue   Long         Male

To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate:

p(d | c_j) = p(d_1 | c_j) × p(d_2 | c_j) × … × p(d_n | c_j)

The probability of class c_j generating instance d equals the probability of class c_j generating the observed value for feature 1, multiplied by the probability of generating the observed value for feature 2, multiplied by…

p(officer drew | c_j) = p(over_170cm = yes | c_j) × p(eye = blue | c_j) × …

Officer Drew is blue-eyed, over 170 cm tall, and has long hair:

p(officer drew | Female) = 2/5 × 3/5 × …
p(officer drew | Male) = 2/3 × 2/3 × …
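A minimal sketch of the naïve product for Officer Drew, again from the eight-row table; the elided "…" factors (hair length) are filled in here, and all identifiers are illustrative:

```python
# (name, over_170cm, eye, hair, sex) rows from the table above.
rows = [
    ("Drew",    "No",  "Blue",  "Short", "Male"),
    ("Claudia", "Yes", "Brown", "Long",  "Female"),
    ("Drew",    "No",  "Blue",  "Long",  "Female"),
    ("Drew",    "No",  "Blue",  "Long",  "Female"),
    ("Alberto", "Yes", "Brown", "Short", "Male"),
    ("Karin",   "No",  "Blue",  "Long",  "Female"),
    ("Nina",    "Yes", "Brown", "Short", "Female"),
    ("Sergio",  "Yes", "Blue",  "Long",  "Male"),
]
FEATURES = {"over_170cm": 1, "eye": 2, "hair": 3}  # column indices

def likelihood(observation, sex):
    """p(d | c_j) under the naive independence assumption."""
    in_class = [r for r in rows if r[4] == sex]
    p = 1.0
    for feat, value in observation.items():
        col = FEATURES[feat]
        p *= sum(r[col] == value for r in in_class) / len(in_class)
    return p

officer_drew = {"over_170cm": "Yes", "eye": "Blue", "hair": "Long"}
print(likelihood(officer_drew, "Female"))  # 2/5 * 3/5 * 4/5 = 0.192
print(likelihood(officer_drew, "Male"))    # 2/3 * 2/3 * 1/3 ≈ 0.148
```

Multiplying by the class priors (5/8 and 3/8) would complete the comparison, exactly as in the one-feature case.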

The naïve Bayes classifier is often represented as this type of graph… Note the direction of the arrows, which state that each class causes certain features, with a certain probability:

[Diagram: class node c_j with arrows to feature nodes d_1, d_2, …, d_n, labeled p(d_1 | c_j), p(d_2 | c_j), …, p(d_n | c_j)]

Naïve Bayes is fast and space-efficient. We can look up all the probabilities with a single scan of the database and store them in a (small) table…

Sex     Over 190 cm   p
Male    Yes           0.15
Male    No            0.85
Female  Yes           0.01
Female  No            0.99

Sex     Long hair   p
Male    Yes         0.05
Male    No          0.95
Female  Yes         0.70
Female  No          0.30
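The "single scan" claim is easy to make concrete: one pass over the data fills per-class counters, after which any p(feature = value | class) is an O(1) lookup. A minimal sketch reusing the eight-row table from the previous slides (all names here are illustrative):

```python
from collections import Counter, defaultdict

# Same eight rows as before, now as (over_170cm, eye, hair, sex).
rows = [("No", "Blue", "Short", "Male"), ("Yes", "Brown", "Long", "Female"),
        ("No", "Blue", "Long", "Female"), ("No", "Blue", "Long", "Female"),
        ("Yes", "Brown", "Short", "Male"), ("No", "Blue", "Long", "Female"),
        ("Yes", "Brown", "Short", "Female"), ("Yes", "Blue", "Long", "Male")]
features = ["over_170cm", "eye", "hair"]

class_counts = Counter()
value_counts = defaultdict(Counter)      # (class, feature) -> value counts

for *values, sex in rows:                # the single scan of the database
    class_counts[sex] += 1
    for feat, val in zip(features, values):
        value_counts[(sex, feat)][val] += 1

def p(feat, val, cls):                   # O(1) table lookup afterwards
    return value_counts[(cls, feat)][val] / class_counts[cls]

print(p("hair", "Long", "Female"))       # 4/5 = 0.8
```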

Naïve Bayes is NOT sensitive to irrelevant features…

Suppose we are trying to classify a person's sex based on several features, including eye color. (Of course, eye color is completely irrelevant to a person's gender.)

p(Jessica | c_j) = p(eye = brown | c_j) × p(wears_dress = yes | c_j) × …

p(Jessica | Female) = 9,000/10,000 × 9,975/10,000 × …
p(Jessica | Male) = 9,001/10,000 × 2/10,000 × …

Almost the same! However, this assumes that we have good enough estimates of the probabilities, so the more data the better.

An obvious point: I have used a simple two-class problem, with two possible values for each feature, in my previous examples. However, we can have an arbitrary number of classes, or feature values:

Animal  Mass > 10 kg   p
Cat     Yes            0.15
Cat     No             0.85
Dog     Yes            0.91
Dog     No             0.09
Pig     Yes            0.99
Pig     No             0.01

Animal  Black  White  Brown
Cat     0.33   0.23   0.44
Dog     0.97   0.03   0.90
Pig     0.04   0.01   0.95

Problem! Naïve Bayes assumes independence of features…

Sex     Over 6 foot   p
Male    Yes           0.15
Male    No            0.85
Female  Yes           0.01
Female  No            0.99

Sex     Over 200 pounds   p
Male    Yes               0.11
Male    No                0.80
Female  Yes               0.05
Female  No                0.95

Solution: consider the relationships between attributes…

Sex     Over 6 foot   p
Male    Yes           0.15
Male    No            0.85
Female  Yes           0.01
Female  No            0.99

Sex     Over 200 pounds            p
Male    Yes and over 6 foot        0.11
Male    No and over 6 foot         0.59
Male    Yes and NOT over 6 foot    0.05
Male    No and NOT over 6 foot     0.35
Female  Yes and over 6 foot        0.01
…

But how do we find the set of connecting arcs?
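One way to read the second table above is as the joint p(weight, height | sex) for the two dependent attributes, so that the pair contributes a single joint factor to the likelihood instead of two independent ones. A minimal sketch under that assumption (the dictionary layout and function names are mine, and the female rows are truncated on the slide):

```python
# Joint p(over_200_pounds, over_6_foot | sex); male entries copied
# from the slide, remaining female entries omitted there as well.
joint_weight_height = {
    ("Male",   True,  True):  0.11,   # over 200 lb AND over 6 ft
    ("Male",   False, True):  0.59,
    ("Male",   True,  False): 0.05,
    ("Male",   False, False): 0.35,
    ("Female", True,  True):  0.01,
    # ... remaining female entries truncated on the slide
}

def likelihood(sex, over200, over6ft, other_factors=1.0):
    # The dependent pair contributes one joint factor; any remaining
    # (independent) attributes multiply in exactly as before.
    return joint_weight_height[(sex, over200, over6ft)] * other_factors

print(likelihood("Male", True, True))   # 0.11
```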

The Naïve Bayesian Classifier has a quadratic decision boundary

[Figure: two-class scatterplot with a quadratic decision boundary between the classes]
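Why quadratic? A short derivation sketch, assuming the Gaussian class-conditional densities used earlier to summarize the histograms: the decision boundary is the set of points where the log-odds vanish, and the two squared terms carry different coefficients whenever the class variances differ.

```latex
% Boundary = points where the log-odds are zero, with Gaussian
% class-conditionals N(mu_j, sigma_j^2):
\[
\log\frac{p(c_1 \mid x)}{p(c_2 \mid x)}
  = -\frac{(x-\mu_1)^2}{2\sigma_1^2}
    +\frac{(x-\mu_2)^2}{2\sigma_2^2}
    +\log\frac{\sigma_2\, p(c_1)}{\sigma_1\, p(c_2)}
  = 0 .
\]
```

This is quadratic in x; if σ₁ = σ₂, the x² terms cancel and the boundary degenerates to a line.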

Advantages/Disadvantages of Naïve Bayes

• Advantages:
  – Fast to train (single scan); fast to classify
  – Not sensitive to irrelevant features
  – Handles real and discrete data
  – Handles streaming data well

• Disadvantages:
  – Assumes independence of features

Naïve Bayesian Classifier (II)

Given a training set, we can compute the probabilities.

Play-tennis example: estimating P(x_i | C)

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

P(p) = 9/14, P(n) = 5/14

Outlook    P    N
sunny      2/9  3/5
overcast   4/9  0
rain       3/9  2/5

Temperature  P    N
hot          2/9  2/5
mild         4/9  2/5
cool         3/9  1/5

Humidity   P    N
high       3/9  4/5
normal     6/9  1/5

Windy   P    N
true    3/9  3/5
false   6/9  2/5

Exercise
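As a sketch for the exercise, here is how the tables above classify a previously unseen day; the test instance (sunny, cool, high, windy = true) is the usual textbook choice, not given on the slide:

```python
# The 14-row play-tennis training set from the slide.
data = [
    ("sunny", "hot", "high", "false", "N"), ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"), ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"), ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"), ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"), ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"), ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"), ("rain", "mild", "high", "true", "N"),
]

def score(x, cls):
    """Unnormalized P(cls) * prod_i P(x_i | cls) from raw counts."""
    in_class = [r for r in data if r[4] == cls]
    s = len(in_class) / len(data)             # prior P(cls)
    for i, val in enumerate(x):
        s *= sum(r[i] == val for r in in_class) / len(in_class)
    return s

x = ("sunny", "cool", "high", "true")
print(score(x, "P"))  # 9/14 * 2/9 * 3/9 * 3/9 * 3/9 ≈ 0.0053
print(score(x, "N"))  # 5/14 * 3/5 * 1/5 * 4/5 * 3/5 ≈ 0.0206
```

Since 0.0206 > 0.0053, the classifier predicts class N: don't play tennis on that day.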