CONCEPT/FRAMEWORK BIG DATA ANALYTICS ACTIVITIES
Study Program: Telecommunication & Informatics Business Management
Course: Big Data and Data Analytics
By: Lecturer Team
Creating the great business leaders
Outline
o Review Data Analytics & Big Data (last week's topics)
o Understanding Data
o Activity / Storytelling Based on Data Type (Model Based)
o Asking the Questions to Data
Definitions
Analytics:
- "the systematic computational analysis of data or statistics" (Google definition)
- "the method of logical analysis" (Merriam-Webster)
Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.
Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics (from Wikipedia, citing many references), and many more.
1. Dhar, V. (2013).
2. Leek, J. (2013-12-12). Simply Statistics.
Data Science
Based on the aforementioned definitions, we can conclude that Data Analytics includes:
- Data engineering
- Scientific method
- Math
- Statistics
Data engineering may include:
- Data gathering
- Data mining
- Data transformation
- Data cleansing
- etc.
Ref: many sources
Our Approach to Big Data
Recall
Big Data Approach Framework
Some people use 3Vs, 6Vs, 7Vs, or even 12Vs to explain big data, but the original measures of "bigness" are volume, velocity, and variety.
For example 7Vs:
1. Volume
2. Velocity
3. Variety
4. Variability
5. Veracity
6. Visualization
7. Value
Data Analytics Workflow
Data Set -> Data Analytics Methods -> Knowledge
Big Data Analytics Constructors
UNDERSTANDING DATA : High Dimensional Data
High Dimensional Data Analysis: Curse and Blessing of Dimensionality (Donoho, 2000)
Dimensions / Attributes / Properties (the table columns):
Name     | Address           | Occupation | Age | Blood Type | Marital Status | ... | Sex
Agus     | Jl. Mawar 1       | Artist     | 30  | A          | Married        | ... | Male
Andry    | Jl. Kucing 50     | Lawyer     | 32  | O          | Married        | ... | Male
Beatrice | Jl. Raya 27       | Student    | 21  | O          | Single         | ... | Female
Ben      | Jl. Diponegoro 12 | Driver     | 37  | AB         | Married        | ... | Male
...      | ...               | ...        | ... | ...        | ...            | ... | ...
Zorro    | Jl. Dago 34       | Student    | 18  | B          | Single         | ... | Male
- Curse: a very large search space, which calls for summarization and reduction (PCA)
- Blessing: comprehensive knowledge of the data
High dimensional data adds a complexity problem to Big Data Analytics.
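The reduction step mentioned for the curse of dimensionality (PCA) can be sketched roughly as follows. This is only an illustration under stated assumptions: the attributes are purely numeric (text columns such as Name or Address would need encoding first), the sample rows are made up, and only the first principal component is found, using power iteration. The function names are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (column means, unit vector of the direction with the most variance)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: multiply by the covariance matrix and renormalize.
    # Assumes the data is not constant, so the norm never becomes zero.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, component):
    """Reduce each row to one number: its coordinate along the component."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(means)))
            for r in rows]

# Hypothetical numeric attributes: (age, monthly income, monthly spending).
people = [(30, 9.0, 6.1), (32, 10.5, 7.0), (21, 3.0, 2.2), (37, 12.0, 8.4), (18, 2.5, 2.0)]
means, pc1 = first_principal_component(people)
print(project(people, means, pc1))
```

Each person is now described by one value instead of three, at the cost of losing whatever variation lies outside the first component.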
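Non Network Data vs Network Data
- Non Network Data: a table with one row per person (Agus, Cecep, Dita, Rina) and the columns Name, Sex, Age, and Number of Friend
- Network Data: a matrix with Agus, Cecep, Dita, and Rina as both rows and columns, where 1 marks a link between two people and - marks the diagonal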
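The relation between the two representations can be sketched in a few lines: the Number of Friend column of the non-network table can be derived from the network matrix, but not the other way round. The matrix values below are hypothetical (they are not taken from the slide), and the function names are assumptions for illustration.

```python
names = ["Agus", "Cecep", "Dita", "Rina"]

# Hypothetical friendship matrix: 1 = linked, 0 = not linked (diagonal unused).
adjacency = [
    [0, 1, 1, 0],  # Agus
    [1, 0, 1, 1],  # Cecep
    [1, 1, 0, 0],  # Dita
    [0, 1, 0, 0],  # Rina
]

def friend_counts(names, matrix):
    """Non-network view: one number per person (the row sum)."""
    return {name: sum(row) for name, row in zip(names, matrix)}

def friends_of(person, names, matrix):
    """Network view: who exactly is linked to this person."""
    row = matrix[names.index(person)]
    return [other for other, linked in zip(names, row) if linked]

print(friend_counts(names, adjacency))
print(friends_of("Cecep", names, adjacency))
```

The count alone says how many friends a person has; only the matrix says who they are.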
UNDERSTANDING DATA : Structured vs Unstructured Data
Characteristics:
Structured Data                   | Unstructured Data
Well defined content              | Structure not obvious
Easily understood                 | Process data to understand
Stored in RDBMS                   | RDBMS not a good fit
Easy to enter, store, and analyze | Difficult and costly to analyze
Example: data in database tables (customer data, sales data, sensor data) | Example: email, video files, audio files, web pages, presentations, social media feeds
UNDERSTANDING DATA : SQL vs NoSQL
- NoSQL: Not only SQL
[Figure: SQL vs NoSQL]
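As a rough illustration of the SQL vs NoSQL contrast, the sketch below stores one record in a relational table (in-memory SQLite from the Python standard library) and the same record as a JSON document, the shape many NoSQL document stores use. The table name, field names, and values are hypothetical, and no particular NoSQL product is implied.

```python
import json
import sqlite3

# SQL: a fixed schema declared up front and enforced by the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
             (1, "Agus", "Bandung"))
row = conn.execute("SELECT name, city FROM customers WHERE id = ?", (1,)).fetchone()
print(row)

# Document style: each record carries its own structure, including nested
# and optional fields that would need extra tables in a relational design.
document = {
    "id": 1,
    "name": "Agus",
    "city": "Bandung",
    "posts": [{"text": "Hello", "tags": ["intro"]}],
}
stored = json.dumps(document)   # what would be written to a document store
loaded = json.loads(stored)
print(loaded["posts"][0]["tags"])
```

The relational row is easy to query and validate; the document is easier to extend when records do not all share the same fields.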
MODELLING FRAMEWORK
Case Studies : Data Analytics Common Roles
1. Estimation
2. Predictions
3. Classification
4. Clustering
5. Association
1. Estimation
Estimate Pizza Delivery Time
Customer | Number of Orders (O) | Number of Traffic Lights (TL) | Distance (D) | Delivery Time (T) (Label)
1        | 3                    | 3                             | 3            | 16
2        | 1                    | 7                             | 4            | 20
3        | 2                    | 4                             | 6            | 18
4        | 4                    | 6                             | 8            | 36
...
1000     | 2                    | 4                             | 2            | 12
Learning with Estimation Methods (Linear Regression)
Knowledge: Delivery Time (T) = 0.48O + 0.23TL + 0.5D
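To make "learning with linear regression" concrete, here is a minimal sketch that fits a model of the same shape as the slide's formula (no intercept) by ordinary least squares. It assumes the five visible rows of the pizza table as training data, so the fitted weights will not match the coefficients on the slide, which presumably come from the full data set. The function names are hypothetical.

```python
def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y for a model without intercept."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]

def predict(weights, features):
    """Weighted sum of the features, e.g. T = w1*O + w2*TL + w3*D."""
    return sum(w * x for w, x in zip(weights, features))

# Visible rows only: (orders O, traffic lights TL, distance D) -> delivery time T.
X = [(3, 3, 3), (1, 7, 4), (2, 4, 6), (4, 6, 8), (2, 4, 2)]
y = [16, 20, 18, 36, 12]

weights = fit_least_squares(X, y)
print("weights fitted on five rows:", weights)
print("slide formula for O=3, TL=3, D=3:", predict((0.48, 0.23, 0.5), (3, 3, 3)))
```

The learned weights are the "knowledge": once they are known, estimating the delivery time for a new order is a single weighted sum.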
Output / Pattern / Model / Knowledge
1. Formula / Function (regression formula or function)
- DELIVERY TIME = 0.48 + 0.6 DISTANCE + 0.34 TRAFFIC LIGHT + 0.2 ORDER
2. Decision Tree
3. Correlation and Association
4. Rule
- IF GPA > 3.5 THEN graduate cum laude
5. Cluster
2. Prediction
Predict Stock Price
- Input: stock price data set in the form of a time series, with a label column
- Learning with Prediction Methods (Neural Network)
- Knowledge in the form of a Neural Network model
- Output: prediction plot
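The slides do not show how the time series is prepared for the neural network. One common approach, assumed here purely for illustration, is a sliding window: the previous few prices become the inputs and the next price becomes the label. The prices and the function name below are hypothetical.

```python
def make_windows(series, window):
    """Turn a series into (inputs, label) pairs: `window` past values -> next value."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

# Hypothetical daily closing prices, for illustration only.
prices = [100, 102, 101, 105, 107, 106]

for inputs, label in make_windows(prices, window=3):
    print(inputs, "->", label)
```

Each pair is one training example; a prediction model would then learn a mapping from the inputs to the label.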
3. Classification
Classify Student Graduation Time
Student Number | Sex | National Final Score | School Origin | IPS1 | IPS2 | IPS3 | IPS4 | ... | Graduation Status (Label)
10001          | M   | 28                   | SMAN 2        | 3.3  | 3.6  | 2.89 | 2.9  | ... | On Time
10002          | F   | 27                   | SMA DK        | 4.0  | 3.2  | 3.8  | 3.7  | ... | Late
10003          | F   | 24                   | SMAN 1        | 2.7  | 3.4  | 4.0  | 3.5  | ... | Late
10004          | M   | 26.4                 | SMAN 3        | 3.2  | 2.7  | 3.6  | 3.4  | ... | On Time
...
11000          | M   | 23.4                 | SMAN 5        | 3.3  | 2.8  | 3.1  | 3.2  | ... | On Time
Sex: M = male (L in the original table), F = female (P). IPS1 to IPS4: grade point average for semesters 1 to 4.
Learning with Classification Methods (C4.5)
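C4.5 chooses the attribute to split on using the gain ratio, which is the information gain divided by the split information. The sketch below computes it for the Sex attribute on the five visible rows of the table only, so the result says nothing about the full data set. Continuous attributes such as IPS1, which C4.5 handles with thresholds, are left out, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    information_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return information_gain / split_info if split_info > 0 else 0.0

# Visible rows only (students 10001, 10002, 10003, 10004, 11000).
sex = ["M", "F", "F", "M", "M"]
status = ["On Time", "Late", "Late", "On Time", "On Time"]

print("entropy of the label:", entropy(status))
print("gain ratio of Sex:", gain_ratio(sex, status))
```

On these five rows Sex separates the two classes completely, which is why it scores so highly; with all rows the ranking of attributes could be very different.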
Classify Student Graduation Time
Knowledge in the form of a Decision Tree model
Golf Playing Time Recommendation
Input: data set
Output (Rules):
- If outlook = sunny and humidity = high then play = no
- If outlook = rainy and windy = true then play = no
- If outlook = overcast then play = yes
- If humidity = normal then play = yes
- If none of the above then play = yes
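The rules above can be read as an ordered list in which the first matching rule decides. A minimal sketch under that assumption follows; the function name and the use of a boolean for windy are choices made for this illustration.

```python
def recommend_play(outlook, humidity, windy):
    """Apply the rules in order; the first rule that matches decides."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # none of the above

print(recommend_play("sunny", "high", windy=False))
print(recommend_play("overcast", "high", windy=True))
```

Because the order matters, a rainy and windy day is rejected by the second rule even when humidity is normal.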
Golf Playing Time Recommendation
Output (Decision Tree)
Contact Lens Recommendation
- Input
- Output
4. Clustering
Finding Iris Flower Cluster
- Input: data set without label
- Learning with Clustering Methods (K-Means)
- Output: distance plot
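A plain K-Means loop can be sketched as follows: assign every point to its nearest centroid, recompute each centroid as the mean of its points, and repeat. The measurements are hypothetical (they are not the real Iris data), the starting centroids are fixed by hand to keep the run deterministic, and the function name is an assumption.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain K-Means with Euclidean distance and caller-supplied starting centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    labels = [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
              for p in points]
    return centroids, labels

# Hypothetical (petal length, petal width) measurements without labels.
flowers = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
centroids, labels = k_means(flowers, centroids=[flowers[0], flowers[3]])
print(labels)
print(centroids)
```

No label column is used anywhere: the groups come only from the distances between the points.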
5. Association
Association of Products Sold
- Learning with Association Methods (FP-Growth)
- Output: association rules
- The objective of an association rule algorithm is to find attributes that show up together.
- Example: on a Thursday night, 1000 customers made purchases. 200 of them bought Soap, and 50 of those 200 also bought Fanta.
- The association rule is "If buy Soap, then buy Fanta", with support = 200/1000 = 20% (the share of customers who bought Soap) and confidence = 50/200 = 25%.
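The arithmetic of the Soap and Fanta example can be written out as a short worked calculation. The variable names are hypothetical. Note that the slide measures support on the antecedent (Soap buyers); many textbooks instead define the support of a rule as the share of transactions containing both items, which is shown here as `joint_support` for comparison.

```python
customers = 1000    # customers on Thursday night
bought_soap = 200   # customers who bought Soap
bought_both = 50    # Soap buyers who also bought Fanta

# Support as used on the slide: share of customers who bought Soap.
support = bought_soap / customers
# Confidence: of those who bought Soap, the share who also bought Fanta.
confidence = bought_both / bought_soap
# Alternative definition of rule support: share who bought both items.
joint_support = bought_both / customers

print(f"support = {support:.0%}, confidence = {confidence:.0%}, "
      f"joint support = {joint_support:.0%}")
```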
Some association rule algorithms are:
- Apriori algorithm
- FP-Growth algorithm
- GRI algorithm
Assignment (in the Class)
o Find a Case Study of Big Data Implementation / Application for Business or others
o State the objective, problems, solution idea
o State the methodology used (explain)
o State the model, measurement, accuracy
Assignment (at Home)
o Find a Case Study of Big Data Implementation / Application for Business or others
o State the objective, problems, solution idea
o State the methodology used (explain)
o State the model, measurement, accuracy, evaluation
o Learn Big Data through the free online course (www.bigdatauniversity.com)