3 Case Studies in Machine Learning and Data Mining

MACHINE LEARNING AND DATA MINING

Supeno Mardi

Class Logistics and Schedule

36 meetings
Software used:

– Python 3
– TensorFlow (TF) + Keras

Final project + presentation

Contents

Terminology of AI, machine learning, and data mining
Learning from data to build models
Types of learning tasks
Defining a learning task
Example machine learning cases
Data mining

Terminology

Synonyms and related fields

– Artificial Intelligence
– Machine Learning
– Data mining
– Pattern recognition
– Probability and Statistics
– Information theory
– Numerical optimization
– Computational complexity theory
– Control theory (adaptive)

Machine Learning, Statistics, and Data Mining

Differences in terminology:

– Ridge regression = weight-decay
– Fitting = learning
– Held-out data = test data

The emphasis is very different:

– A good piece of statistics: Clever proof that a relatively simple estimation procedure is asymptotically unbiased.

– A good piece of machine learning: Demonstration that a complicated algorithm produces impressive results on a specific task.

Data-mining: Using machine learning techniques on very large databases.

“Learning” from Data

Learning general models from data of particular examples.
Data is abundant and cheap (data warehouses, data marts); knowledge is expensive and scarce.
Example in retail: from customer transactions to consumer behavior:

People who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com)

Build a model that is a good and useful approximation to the data.

Types of Learning Tasks

Association
Supervised learning

– Learn to predict output when given an input vector

Reinforcement learning

– Learn action to maximize payoff
  • Payoff is often delayed
  • Exploration vs. exploitation
  • Online setting

Unsupervised learning

– Create an internal representation of the input, e.g. form clusters; extract features
  • How do we know if a representation is good?
– Big datasets do not come with labels.

Learning Associations

Basket analysis:

P(Y | X): probability that somebody who buys X also buys Y, where X and Y are products/services.

Example: P(chips | beer) = 0.7
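As a minimal sketch of basket analysis, P(Y | X) can be estimated as the fraction of baskets containing X that also contain Y. The function name and the transactions below are hypothetical and invented for illustration; they are not data from the course.

```python
def conditional_probability(baskets, x, y):
    """Estimate P(y | x): the fraction of baskets containing x that also contain y."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        raise ValueError(f"no basket contains {x!r}")
    return sum(1 for b in with_x if y in b) / len(with_x)

# Hypothetical transactions, invented for illustration only.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "diapers"},
    {"chips", "soda"},
    {"beer", "chips", "soda"},
    {"milk", "bread"},
]

p = conditional_probability(baskets, "beer", "chips")
print(f"P(chips | beer) = {p:.2f}")  # 3 of the 4 beer baskets contain chips
```

With real transaction logs the same count-and-divide estimate applies, although rare products give unreliable estimates because few baskets contain X.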

Classification

Example: Credit scoring
Differentiating between low-risk and high-risk customers from their income and savings

Discriminant: IF income > θ₁ AND savings > θ₂ THEN low-risk ELSE high-risk

Applications of Classification

Aka pattern recognition

Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style
Character recognition: Different handwriting styles.
Speech recognition: Temporal dependency.

– Use of a dictionary or the syntax of the language.
– Sensor fusion: Combine multiple modalities, e.g., visual (lip image) and acoustic for speech

Medical diagnosis: From symptoms to illnesses
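Returning to the credit-scoring example, the discriminant (IF income > θ₁ AND savings > θ₂ THEN low-risk ELSE high-risk) can be sketched as a small function. The threshold values and customers here are assumptions chosen only to exercise both branches; in a learning setting θ₁ and θ₂ would be fitted from labeled past customers.

```python
def credit_risk(income, savings, theta1, theta2):
    """Apply the discriminant: low-risk only if both thresholds are exceeded."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

# Hypothetical thresholds and customers, invented for illustration only.
THETA1 = 40_000  # income threshold (assumed value)
THETA2 = 10_000  # savings threshold (assumed value)

customers = [
    ("A", 55_000, 20_000),  # exceeds both thresholds
    ("B", 55_000, 5_000),   # savings too low
    ("C", 30_000, 25_000),  # income too low
]
labels = {name: credit_risk(inc, sav, THETA1, THETA2) for name, inc, sav in customers}
print(labels)
```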

Face Recognition

[Figure: training examples of a person and test images]

The Role of Learning

Uses of Supervised Learning

Prediction of future cases: Use the rule to predict the output for future inputs
Knowledge extraction: The rule is easy to understand
Compression: The rule is simpler than the data it explains
Outlier detection: Exceptions that are not covered by the rule, e.g., fraud

Learning “what normally happens”
Clustering: Grouping similar instances
Example applications

– Customer segmentation in CRM (customer relationship management)
– Image compression: Color quantization
– Bioinformatics: Learning motifs
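
Color quantization from the list above can be sketched as clustering pixel values and replacing each pixel by its cluster center. The sketch assumes k-means (Lloyd's algorithm) as the clustering method, which the slides do not specify, and works on one-dimensional gray levels to stay short. The pixel values and starting centroids are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on scalars: assign to nearest centroid, then recompute means."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[k]
                     for k, c in enumerate(clusters)]
    return centroids

def quantize(values, centroids):
    """Replace each value by its nearest centroid."""
    return [min(centroids, key=lambda c: abs(v - c)) for v in values]

# Hypothetical 8-bit gray levels, invented for illustration only.
pixels = [12, 15, 20, 18, 120, 125, 130, 240, 250, 245]
palette = kmeans_1d(pixels, centroids=[0, 128, 255])
print(palette)                    # three representative gray levels
print(quantize(pixels, palette))  # every pixel mapped to one of them
```

Storing only the palette plus one small index per pixel is what makes this a compression scheme.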

Displaying the structure of a set of documents

Example: Cancer Diagnosis

Application: automatic disease detection
Importance: this is modern/future medical diagnosis.
Prediction goal: Based on past patients, predict whether you have the disease
Data: Past patients with and without the disease
Target: Cancer or no-cancer
Features: Concentrations of various proteins in your blood

Example: Zipcodes

Application: automatic zipcode recognition
Importance: this is modern/future delivery of small goods.
Goal: Based on your handwritten digits, predict what they are and use them to route mail
Data: Black-and-white pixel values
Target: Which digit
Features: ?

What makes a 2?

Example: Google

Application: automatic ad selection
Importance: this is modern/future advertising.
Prediction goal: Based on your search query, predict which ads you might be interested in
Data: Past queries
Target: Whether the ad was clicked
Features: ?

Example: Call Centers

Application: automatic call routing
Importance: this is modern/future customer service.
Prediction goal: Based on your speech recording, predict which words you said
Data: Past recordings of various people
Target: Which word was intended
Features: ?

Example: Stock Market

Application: automatic program trading
Importance: this is modern/future finance.
Prediction goal: Based on past patterns, predict whether the stock will go up
Data: Past stock prices
Target: Up or down
Features: ?

Example: Web-based

The web contains a lot of data. Tasks with very big datasets often use machine learning, especially if the data is noisy or non-stationary.
Spam filtering, fraud detection:

– The enemy adapts so we must adapt too.

Recommendation systems:

– Lots of noisy data. Million dollar prize!

Information retrieval:

– Find documents or images with similar content.

What is a Learning Problem?

Learning involves improving performance

– at some task T
– with experience E
– evaluated in terms of performance measure P

Example: learn to play checkers

– Task T: playing checkers
– Experience E: playing against itself
– Performance P: percent of games won

What exactly should be learned?

– How might this be represented?
– What specific algorithm should be used?

Develop methods, techniques and tools for building intelligent learning machines that can solve the problem in combination with an available data set of training examples.

When a learning machine improves its performance at a given task over time, without reprogramming, it can be said to have learned something.

Defining a Learning Task
Improve on task, T, with respect to performance metric, P, based on experience, E.

T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself

T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver.

T: Categorize email messages as spam or legitimate.
P: Percentage of email messages correctly classified.
E: Database of emails, some with human-given labels

Designing a Learning System
Choose the training experience.
Choose what is to be learned, i.e. the target function.
Choose how to represent the target function.
Choose a learning algorithm to infer the target function from the experience.

[Figure: learner, environment/experience, knowledge, performance element]

Components of a Learning Problem
Task: the behavior or task that’s being improved, e.g. classification, object recognition, acting in an environment.
Data: the experiences that are being used to improve performance in the task.
Measure of improvement: How can the improvement be measured? Examples:
– Provide more accurate solutions (e.g. increasing the accuracy in prediction)

– Cover a wider range of problems

– Obtain answers more economically (e.g. improved speed)

– Simplify codified knowledge

– New skills that were not presented initially
What Experience E to Use?
Direct or indirect?
– Direct: feedback on individual moves
– Indirect: feedback on a sequence of moves
  • e.g., whether win or not

Teacher or not?
– Teacher selects board states
  • Tailored learning
  • Can be more efficient
– Learner selects board states
  • No teacher

Questions
– Is training experience representative of performance goal?
– Does training experience represent distribution of outcomes in world?

What Exactly Should be Learned?
Playing checkers:
– Alternating moves with well-defined rules
– Choose moves using some function
– Call this function the Target Function

Target function (TF): function to be learned during a learning process
– ChooseMove: Board → Move
– ChooseMove is difficult to learn, e.g., with indirect training examples

A key to successful learning is to choose an appropriate target function:
– Strategy: reduce learning to search for TF

Alternative TF for checkers:
– V: Board → ℝ
– Measure “quality” of the board state
– Generate all moves
  • choose move with largest value

A Possible Target Function V for Checkers
In checkers, know all legal moves
– From these, choose best move in any situation

Possible V function for checkers:
– if b is a final board state that is a win, then V(b) = 100

– if b is a final board state that is a loss, then V(b) = -100

– if b is a final board state that is a draw, then V(b) = 0

– if b is not a final state in the game, then V(b) = V(b′), where b′ is the best final board state that can be achieved starting from b and playing optimally until the end of the game

This gives correct values, but is not operational
– So may have to find a good approximation to V
– Call this approximation $\hat{V}$

How Might Target Function be Represented?
Many possibilities (subject of course)
– As collection of rules?

– As neural network?

– As polynomial function of board features?
– ...

Example of linear function of board features:

$\hat{V}(b) = w_0 + w_1\,bp(b) + w_2\,rp(b) + w_3\,bk(b) + w_4\,rk(b) + w_5\,bt(b) + w_6\,rt(b)$

• bp(b): number of black pieces on board b
• rp(b): number of red pieces on b
• bk(b): number of black kings on b
• rk(b): number of red kings on b
• bt(b): number of red pieces threatened by black (i.e., which can be taken on black's next turn)
• rt(b): number of black pieces threatened by red

Generally, the more expressive the representation, the more difficult it is to estimate
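
The linear evaluation function and the "generate all moves, choose the move with largest value" strategy can be sketched as follows. Representing a board as a dictionary of the six feature counts is an assumption made to keep the sketch short. The weights, move names, and boards are hypothetical; a real program would compute the features from an actual board and learn the weights from experience.

```python
FEATURES = ("bp", "rp", "bk", "rk", "bt", "rt")

def v_hat(weights, board):
    """Linear evaluation: w0 + w1*bp + w2*rp + w3*bk + w4*rk + w5*bt + w6*rt."""
    w0, rest = weights[0], weights[1:]
    return w0 + sum(w * board[name] for w, name in zip(rest, FEATURES))

def choose_move(weights, successors):
    """Pick the move whose resulting board has the largest estimated value."""
    return max(successors, key=lambda move: v_hat(weights, successors[move]))

# Hypothetical weights (from black's point of view), invented for illustration only.
weights = [0.0, 1.0, -1.0, 3.0, -3.0, 0.5, -0.5]

# Hypothetical boards reached by three candidate moves, given as feature counts.
successors = {
    "advance": {"bp": 12, "rp": 12, "bk": 0, "rk": 0, "bt": 0, "rt": 1},
    "capture": {"bp": 12, "rp": 11, "bk": 0, "rk": 0, "bt": 0, "rt": 0},
    "retreat": {"bp": 12, "rp": 12, "bk": 0, "rk": 0, "bt": 0, "rt": 0},
}
best = choose_move(weights, successors)
print(best, v_hat(weights, successors[best]))
```

Only seven numbers have to be estimated here, which illustrates the trade-off noted above: a less expressive representation is easier to fit.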

Inductive and Deductive Learning
Inductive Learning: Reasoning from a set of examples to produce general rules. The rules should be applicable to new examples, but there is no guarantee that the result will be correct.
Deductive Learning: Reasoning from a set of known facts and rules to produce additional rules that are guaranteed to be true.

Assessment of Learning Algorithms
The most common criteria for assessing learning algorithms are:
– Accuracy (e.g. percentages of correctly classified +’s and –’s)

– Efficiency (e.g. examples needed, computational tractability)

– Robustness (e.g. against noise, against incompleteness)

– Special requirements (e.g. incrementality, concept drift)

– Concept complexity (e.g. representational issues – examples & bookkeeping)

– Transparency (e.g. comprehensibility for the human user)
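
The accuracy criterion (percentages of correctly classified +'s and –'s) can be sketched as below. The labels and predictions are hypothetical, and the function assumes both classes occur in the held-out data.

```python
def accuracy_report(y_true, y_pred):
    """Return overall accuracy plus accuracy on the '+' and '-' classes separately."""
    pairs = list(zip(y_true, y_pred))
    report = {"overall": sum(t == p for t, p in pairs) / len(pairs)}
    for label in ("+", "-"):
        # Assumes each class appears at least once in y_true.
        in_class = [(t, p) for t, p in pairs if t == label]
        report[label] = sum(t == p for t, p in in_class) / len(in_class)
    return report

# Hypothetical held-out labels and predictions, invented for illustration only.
y_true = ["+", "+", "+", "+", "-", "-", "-", "-", "-", "-"]
y_pred = ["+", "+", "+", "-", "-", "-", "-", "-", "+", "+"]
report = accuracy_report(y_true, y_pred)
print(report)
```

Reporting the two classes separately matters when they are imbalanced, since a high overall percentage can hide poor performance on the rarer class.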

Data Mining and Algorithms
Data Mining
– The desired outcome from data mining is to create a model from a given dataset that can have its insights generalized to similar datasets. A real-world example of a successful data mining application can be seen in automatic fraud detection from banks and credit institutions.

– Data mining is the process of discovering predictive information from the analysis of large databases. For a data scientist, data mining can be a vague and daunting task: it requires a diverse set of skills and knowledge of many data mining techniques to take raw data and successfully get insights from it. You’ll want to understand and … that can help you with data mining at scale.
Data Mining Techniques
– Classification: … email as spam or legitimate, or looking at a person’s credit score and approving or denying a loan request.
– Clustering: Finding natural groupings of data objects based upon the known characteristics of that data. An example could be seen in marketing, where analysis can reveal customer groupings with unique behavior, which could be applied in business strategy decisions.
– Association: … buy beer, so stores placed them close to each other to increase sales.
– Outlier analysis: Examining outliers to find potential causes and reasons for said outliers. An example is the use of outlier analysis in fraud detection, trying to determine if a pattern of behavior outside the norm is fraud or not.
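One simple way to surface candidates for outlier analysis is a z-score rule: flag values that lie more than a chosen number of standard deviations from the mean. This is only one possible technique and is an assumption here; the text does not name a method. The amounts and the threshold are hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts, invented for illustration only.
amounts = [25, 30, 28, 22, 27, 31, 26, 29, 24, 950]
print(flag_outliers(amounts))
```

A flagged value is only a candidate: deciding whether behavior outside the norm is actually fraud still requires examining its causes.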
Example: Using pandas for a Regression Model in Python
Estimating the linear relationship between the available variables, using data from Kaggle. https://www.springboard.com/blog/data-mining-python-tutorial/
Python script
# Libraries for data handling, statistics, and plotting
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import seaborn as sns
from matplotlib import rcParams

# Load the housing data and preview the first rows
df = pd.read_csv('/Users/python/kc_house_data.csv')
df.head()
Output
   id          date             price     bedrooms  bathrooms  sqft_living  sqft_lot
0  7129300520  20141013T000000  221900.0  3         1.00       1180         5650
1  6414100192  20141209T000000  538000.0  3         2.25       2570         7242
2  5631500400  20150225T000000  180000.0  2         1.00       770          10000
3  2487200875  20141209T000000  604000.0  4         3.00       1960         5000
4  1954400510  20150218T000000  510000.0  3         2.00       1680         8080
df.describe()
       price       bedrooms  bathrooms  sqft_living
count  21613       21613     21613      21613
mean   540088.10   3.37      2.11       2079.90
std    367127.20   0.93      0.77       918.44
min    75000.00    0.00      0.00       290.00
25%    321950.00   3.00      1.75       1427.00
50%    450000.00   3.00      2.25       1910.00
75%    645000.00   4.00      2.50       2550.00
max    7700000.00  33.00     8.00       13540.00
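
As a sketch of the linear relationship this example aims at, the block below fits price against sqft_living by ordinary least squares. To stay self-contained it uses plain Python and only the five rows shown in the df.head() output, so the fitted numbers are illustrative rather than a result for the full dataset. The helper name fit_line and the choice of sqft_living as the predictor are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# The five rows shown in the df.head() output (sqft_living, price).
sqft_living = [1180, 2570, 770, 1960, 1680]
price = [221900.0, 538000.0, 180000.0, 604000.0, 510000.0]

slope, intercept = fit_line(sqft_living, price)
print(f"price ~ {intercept:.0f} + {slope:.1f} * sqft_living")
```

With the full DataFrame loaded, the already imported scipy.stats offers stats.linregress(df['sqft_living'], df['price']), which should produce the same kind of slope and intercept estimate.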
