DATA SCIENCE
Syahrimi binti Hasbullah
Senior Training Consultant
Data Services Management Unit
ICT Expertise Development Sub-Cluster
i-IMATEC Cluster, INTAN
Contents
Introduction to Big Data
Introduction to Big Data Analytics
Data Science
Public Sector Data-Driven Initiative
Big Data: Definition
exponential growth
availability of data
structured and unstructured
Characteristics: 4V? 5V? 7V?
And big data may be as important to business – and society – as
the Internet has become. Why? More data may lead to more
accurate analyses.
10V’s of Big Data Characteristics
Evolution of Big Data
Challenge?
“high-volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making” (Gartner)
Big Data Ecosystem
• Data Sources
• Advanced Data Management
• Advanced Data Analytics
• Data Presentation/Business Intelligence
© INTAN 2017
Technology and Tools
Who Generates Data?
Organization
Human
Machine
Data Lake
Data Types
• Structured data
• Semi-structured data
• Unstructured data
What Does Big Data Look Like?
• Partial tweet in JSON format
• Web log
• Spatial data
• Machine-generated data
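To make the "partial tweet in JSON format" example concrete, the sketch below parses a made-up tweet-like record in Python. The field names (`id`, `text`, `user`, `entities`, `place`) and all values are assumptions for illustration only, not the record shown on the slide.

```python
import json

# Hypothetical, partial tweet-like record (field names are illustrative only).
raw = '''
{
  "id": 1,
  "text": "Big data training at INTAN #bigdata",
  "user": {"screen_name": "example_user", "followers_count": 120},
  "entities": {"hashtags": [{"text": "bigdata"}]}
}
'''

tweet = json.loads(raw)

# Semi-structured: fields are self-describing and nested, and some may be absent.
hashtags = [h["text"] for h in tweet.get("entities", {}).get("hashtags", [])]
location = tweet.get("place")  # optional field, missing in this record

print(tweet["user"]["screen_name"], hashtags, location)
```

The record carries its own field names but no fixed schema, which is why such data is usually classed as semi-structured rather than structured.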
Data Warehouse
A large store of data accumulated from a wide range of
sources within a company and used to guide management
decisions
Data Lake
• Keeps data in its original raw, unmodelled format
• Holds a limited number of data “species”
• Constrained by its size
• A smaller set of data is analyzed in more detail to help answer the question
Data Ocean
• Collection of unmodelled data from every possible area of the entire business
• Kept in a single repository
• The size of these oceans is vast
• Improvements in analytics technology make it easier to “fish” for whatever data you need
Big Data Usage
CASE STUDY: FACEBOOK
AGE: 18-29, 30-49, 50-64, 65+
GENDER: Male, Female
LOCATION: Urban, Suburban, Rural
https://www.simplilearn.com/howfacebookisusingbigdataarticle
• Analyzing the ‘Likes’
• Tracking cookies
• Facial recognition
• Tag suggestion
Introduction to Big Data Analytics and Data Science
Contents
What is Big Data Analytics (BDA)?
Overview of BDA Process
Traditional Approach vs Big Data Analytics
Types of Analytics
Data Science
Data Scientist
Methodology
What is Big Data Analytics (BDA)?
Definition 1: The science of examining raw data with the purpose of drawing conclusions about that information
Definition 2: The process of examining large data sets containing a variety of data types
Purpose: uncover hidden patterns and correlations, and verify or disprove existing models or theories, to support better business decision making
Overview of BDA Process
Inputs: structured data, semi-structured data, unstructured data
Outputs, in order: information, knowledge/insight, value/wisdom
Comparison: Traditional & Big Data Analytics
Traditional Analytics | Big Data Analytics (BDA)
Structured data | Structured, semi-structured and unstructured data
Relational data model | Various data models with no relation
Statistical methods | Advanced analytics
Limited value | High value
Types of Analytics
Descriptive
• Uses past data
• Tells you what has happened
• Simplest form of analytics
Diagnostic
• Answers why it happened
• Tells you what happened and why
• Helps to understand the causes of events and behaviors
Predictive
• Answers what, why and when it will happen
• Forecasts what might happen in the future
Prescriptive
• Answers what, when and how to make it happen
• Recommends the best course of action
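As a rough illustration of the first two types, the Python sketch below summarizes past data (descriptive) and then checks how strongly an outcome moves with a candidate factor (a first diagnostic step). The data set, the variable names and the choice of Pearson correlation are assumptions made for this example.

```python
from statistics import mean

# Hypothetical monthly records: (rainy_days, visitors); values are made up.
records = [(2, 980), (5, 900), (8, 820), (12, 700), (15, 640), (20, 500)]

rain = [r for r, _ in records]
visitors = [v for _, v in records]

# Descriptive: summarize what has happened.
avg_visitors = mean(visitors)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Diagnostic: look for a factor associated with the outcome.
r = pearson(rain, visitors)
print(f"average visitors: {avg_visitors:.1f}, correlation with rain: {r:.2f}")
```

A strong correlation only suggests where to look; establishing why something happened still needs domain knowledge.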
Predictive Analytics
• Prediction of future probabilities and trends.
• Predictor: a variable that can be measured for an individual or other entity to predict future behavior.
Predictive analytics uses statistical models and forecasting techniques to understand the future and answer “What could happen?”
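One of the simplest statistical models for forecasting is a straight-line trend fitted by least squares. The sketch below is a minimal example under stated assumptions: the yearly values are made up, the year acts as the predictor, and a linear trend is only one of many possible model choices.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical yearly values of the quantity to forecast (made-up numbers).
years = [2012, 2013, 2014, 2015]
values = [100.0, 110.0, 120.0, 130.0]

intercept, slope = fit_line(years, values)
forecast_2016 = intercept + slope * 2016  # extrapolate one year ahead
print(round(forecast_2016, 1))
```

The forecast is what "could" happen if the past trend continues; real projects would also measure how uncertain that estimate is.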
Prescriptive Analytics
Prescriptive analytics extends beyond predictive analytics by specifying both the actions necessary to achieve predicted outcomes and the interrelated effects of each decision.
Prescriptive analytics uses optimization and simulation algorithms to advise on possible outcomes and answer “What should we do?”
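To illustrate the optimization idea in its smallest form, the sketch below searches every combination of candidate actions and recommends the one with the highest predicted benefit that fits a budget. The actions, costs, benefits and budget are hypothetical, and exhaustive search is used only because the example is tiny; it is not the method of any particular product.

```python
from itertools import combinations

# Hypothetical candidate actions: name -> (cost, predicted benefit); made-up numbers.
actions = {
    "extend_opening_hours": (40, 55),
    "targeted_promotion": (30, 50),
    "extra_staff": (50, 60),
    "new_signage": (20, 15),
}
BUDGET = 90

def best_plan(actions, budget):
    """Exhaustively search action subsets; return the highest-benefit plan within budget."""
    best, best_benefit = (), 0
    for k in range(1, len(actions) + 1):
        for plan in combinations(sorted(actions), k):
            cost = sum(actions[a][0] for a in plan)
            benefit = sum(actions[a][1] for a in plan)
            if cost <= budget and benefit > best_benefit:
                best, best_benefit = plan, benefit
    return best, best_benefit

plan, benefit = best_plan(actions, BUDGET)
print(plan, benefit)
```

The "predicted benefit" figures stand in for the output of a predictive model, which is how prescriptive analytics builds on the predictive step.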
3 Phases of Prescriptive Analytics
BDA: Malaysia’s Case Study
Objective
• Enhance Malaysia Airports’ retailer management system within KLIA and provide value-added services for travelers
Benefit
• Understand traveler habits and shopper behavior
• Marketing effort
Challenge
• 400,000 square feet of space containing retail outlets at various locations
• Accuracy of information gathered
• A precise method to track spending trends
Solution
• Install sensors (IoT devices for data collection)
• Mobile apps track customers’ basic demographics
• Develop a BI platform to show dashboard reporting to clients
DATA SCIENCE
DATA PROFESSIONALS
The roles of data professionals can be split into:
Data Scientists: people who provide valuable insights from data to the business units and management, and who are able to translate data into a business story
Data Modellers: people who model the available data
Data Analysts: people who analyse the huge amount of data available
Data Miners: people who work on mining and processing raw data for analysis
The demand for data scientists is expected to grow the fastest, at a 66.7% compound annual growth rate (CAGR) (IDC, 2015)
CAGR = Compound Annual Growth Rate
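For readers unfamiliar with CAGR, the sketch below shows the standard formula, CAGR = (end / start)^(1 / years) - 1, and what compounding at 66.7% per year implies. The starting figure of 1,000 and the 3-year horizon are assumptions chosen purely for illustration; they are not from the IDC figure.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1

def project(start_value, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years

# Illustrative only: 1,000 roles growing at a 66.7% CAGR for 3 years.
projected = project(1000, 0.667, 3)
print(round(projected))

# Round trip: recovering the rate from the start and end values.
print(round(cagr(1000, projected, 3), 3))
```

Because growth compounds, a 66.7% CAGR more than quadruples the starting figure in three years.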
Data Science
“Data science is the study of
where information comes from,
what it represents and how it
can be turned into a valuable
resource in the creation of
business and IT strategies.”
Source: Wikipedia
Source: IBM
Skillset of a Data Scientist (Saintis Data)
Soft skills
• Communication
• Creativity and innovation
• Collaboration
Data processing
• Programming
• Data warehouse
• Data integration
• Data quality
• Data cleansing
Statistics
• Mathematical statistics
• Statistical analysis and modelling
• Statistical testing
Technical
• Core subject skills
Domain
• Knowledge of a specific service or domain
Analysis and modelling
• Natural Language Processing
• Machine learning
• Prediction models
• Data visualisation
Data Science Process
Methodology
• Identifying stakeholders, understanding the business operations and needs, and identifying opportunities from existing and new data that can benefit the business
• Defining and documenting the scope of work, business requirements, user requirements and system requirements of the project
• Acquiring and exploring available data, identifying:
  - data cleansing needs
  - opportunities for data enrichment
  - analysis that can be done with the available data
• Development of data models and analysis algorithms to process data and produce the results needed by the business
• Development of the data product, e.g. dashboard visualization, reporting software or a more complex data-driven application
• The product is evaluated against the business requirements, and then rolled out into the production environment with access to more data
• The project is monitored for its effectiveness, stability and capacity with regard to business requirements
Key Roles for a Successful Analytics Project
• Understands the domain area
• Provides requirements
• Creates the DB environment
• Technical skill
• Ensures objectives are met
• Business domain expertise
• Analytic techniques and modelling
Key Outputs for Each Main Stakeholder
Business User: determines the benefits and implications of the findings to the business
Project Sponsor: asks questions related to the business impact of the project, the risks and return on investment (ROI), and how the project can be implemented within the organization (and beyond)
Project Manager: determines whether the project was completed within the planned time and budget and how well the goals were met
BI Analyst: needs to know if the reports and dashboards will be impacted and need to change
Data Engineer (DE) and Database Administrator (DBA): typically need to share their code from the analytics project and create a technical document on how to implement it
Data Scientist: needs to share the code and explain the model to peers, managers and other stakeholders
PUBLIC SECTOR BIG DATA IMPLEMENTATION
Contents
Authority/Mandate
Public Sector Big Data Analytics (aDRSA) Framework
Public Sector Big Data Analytics (aDRSA) Implementation
aDRSA Business Cases
aDRSA Benefits
Critical Success Factors (CSF)
Way Forward
Authority for DRSA Implementation
MSC Malaysia Implementation Council Meeting (ICM) No. 25 (14 November 2013)
“....the Communications and Multimedia Ministry with the support of the Malaysian Administrative Modernisation and Management Planning Unit and MDeC will jointly implement four government initiated Big Data Analytics (BDA) pilot projects by 2015 to drive ICT services.”
Prime Minister of Malaysia
MSC Malaysia Implementation Council Meeting (ICM) No. 26 (22 October 2014)
Agreed that BDA implementation should focus on 3 imperatives: Skills, Centre of Excellence (CoE) and Open Data.
MAMPU, MDEC and MIMOS were asked to run the BDA Digital Government Lab (BDA DGLab) to carry out the decisions of this meeting.
The result:
1. The Ministry of Multimedia and Communication Malaysia will develop the Big Data skeleton (framework)
2. MAMPU and MDeC will collaborate to implement the strategies
3. MDeC will start initiatives
Government IT and Internet Committee (JITIK) Meeting No. 2 of 2014, 7 November 2014
Agreed on the DRSA implementation strategy, namely:
1. Governance
2. Implementation Strategy
3. Implementation Methodology
4. Guidelines
Public Sector Big Data Framework
PUBLIC SECTOR DATA-DRIVEN PROGRAMME
1. Framework
2. Governance
3. Methodology
4. Platform
5. Guidelines
6. Innovation Programme
7. Competency Development
8. Consulting Services
9. Open Data, Data Sharing, Data Classification
PUBLIC SECTOR BIG DATA INITIATIVES ALREADY IMPLEMENTED
Big Data Analytics Exploration
1. Government Data Optimisation Transformation Services (GDOTS)
* PoC: 3 months (1 Oct 2015 to 31 December 2015)
* Project: 12 months (proposed for April 2017 to March 2018)
2. BDA-Digital Government Open Innovation Network (BDA-DGOIN)
* 29 Jan 2015 to 28 Jan 2016
3. Public Sector Big Data Analytics (DRSA) Pilot Project
* 10 March 2015 to 9 March 2016
4. DRSA Analytics Expansion Project
* 23 Nov 2016 to 22 Nov 2017
GOVERNMENT DATA OPTIMISATION TRANSFORMATION SERVICES (GDOTS)
Proof of Concept (POC): 3 months (1 Oct 2015 to 31 December 2015)
• The GDOTS Proof of Concept (POC) was carried out in 2015 using third-party data analytics services
• Strategic collaboration between MAMPU and KPDNKK, MOA, LKIM, FAMA, MOF and DOSM for the Price of Goods business case
• Displays trends in the price of goods against weather (rain), GST implementation, festive seasons, petrol price increases and toll increases
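Price-trend analysis of this kind rests on percentage increase or decrease between two prices. The sketch below shows that calculation across supply-chain stages (landing, wholesale, retail). The prices are made up for illustration and are not GDOTS data; the function name is also hypothetical.

```python
def pct_change(old, new):
    """Percentage increase (+) or decrease (-) from old to new."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0

# Hypothetical prices (RM per kg) at each supply-chain stage; made-up numbers.
prices = {"landing": 4.00, "wholesale": 5.00, "retail": 7.00}

landing_to_wholesale = pct_change(prices["landing"], prices["wholesale"])
wholesale_to_retail = pct_change(prices["wholesale"], prices["retail"])
landing_to_retail = pct_change(prices["landing"], prices["retail"])
print(landing_to_wholesale, wholesale_to_retail, landing_to_retail)
```

The same function applies to year-on-year comparisons: pass last year's price as `old` and this year's as `new`.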
[Figure: four charts, "% Increase/Decrease (Supply Chain)", for Selangor, Kedah, Pahang and Johor, 2012 to 2015. Series: Landing (Pendaratan), Wholesale (Borong), Retail (Runcit).]
Project: 12 months (proposed for May 2017 to March 2018)
• The GDOTS project is proposed to run from May 2017 to March 2018 to develop four (4) business cases focused on the urban poor
• Produce analyses or reports that identify the causes of price changes
• Strategic collaboration between MAMPU and KPDNKK, MOA, LKIM, FAMA, MOF and DOSM
Project schedule (Month 1 to Month 12, each divided into M1 to M4)
No. Activity
1 Commercial Related
- Letter of Award (LOA)
- Contract Management
2 Project Team Mobilization
- Project Inception & Governance
- Team Mobilization
- Technical and Steering Committee Meeting: Payment 1
3 Project Implementation
- Kick Off Meeting
- Project Management
- Development of Use Cases
- Step 1 - 3
- Technical and Steering Committee Meeting: Payment 2
- Step 4 - 5
- Technical and Steering Committee Meeting: Payment 3
- Step 6
- Technical and Steering Committee Meeting: Payment 4
- Step 7
- Insights Reporting
- Technical and Steering Committee Meeting: Payment 5
4 Project Closure and Sign-Off
Payment milestones shown on the schedule: 10%, 10%, 40%, 20%, 20%
BDA-OPEN INNOVATION NETWORK (BDA-DGOIN) PROJECT
• Implemented as a Proof of Concept (POC)
• Strategic collaboration between MAMPU, MDEC and MIMOS with MOF, JAKIM, JPS and NAHRIM (5 business cases)
Roles shown: Management, Facilitator, Technology & Platform
Business cases:
• Monitoring Islamic extremists among Malaysians
• Data analytics to analyse and build a Fiscal Economic Model
• Sentiment analysis of the cost of living, drawn from social media
• Obtaining a 90-year projection of rainfall distribution, in line with the effect of overflow at river banks, on the map of Malaysia
• Developing a flood knowledge base from combined sensor and social media data
PUBLIC SECTOR BIG DATA ANALYTICS (DRSA) PILOT PROJECT
• Platform at PDSA within 1Gov*Net
• Framework
• Governance
• Methodology
• Guidelines
• Development of four analytics:
  - Crime prevention
  - Price monitoring
  - Infectious disease forecasting
  - Sentiment analysis
PUBLIC SECTOR BIG DATA ANALYTICS (DRSA) EXPANSION PROJECT
• Data products are developed through coaching by the Company and MAMPU with selected agencies.
• Follows the DRSA methodology and the Data Analytic Project Lifecycle, including hands-on training for self-development in building data products/BDA
• Data products are developed through data collection, cleansing and exploration, and by building analytical, predictive and machine learning models using the analytics tool R Studio.
• Implementation period: 12 months (23 Nov 2016 to 22 Nov 2017)
No. | Ministry/Agency | Business Case
1 | Ministry of Finance Malaysia (MOF) | Social Media Monitoring Related to the Ministry of Finance
2 | Ministry of Human Resources (KSM) | Improving Employability for Job Seekers
3 | Public Services Commission (SPA) | Seamless Job Recruitment
4 | Ministry of Transport Malaysia (MOT) | Making Port Klang More Competitive and Efficient
5 | Ministry of Education Malaysia (MOE) | Resolving the Issue of Student Dropout from the Malaysian Education System
6 | Department of Fisheries Malaysia (DOF) | Aquaculture Site Selection
7 | Malaysian Agricultural Research and Development Institute (MARDI) | Improving Rice Productivity and Quality
8 | Ministry of Energy, Green Technology and Water (KeTTHA) | High Level of Domestic Water Consumption in Malaysia
9 | Ministry of International Trade and Industry (MITI) | Managing Problems in the Halal Manufacturing Industry
10 | National Audit Department | Audit Findings (Financial)
11 | MAMPU | Sentiment Analysis – Patriotism “Negaraku”
12 | Research Division, JPM | Confidential
Big Data Analytics Business Cases
• Disease Outbreak Prediction
• Crime Prediction and Prevention
• Smart Traffic Congestion Information
• Tax Fraud Detection
• Disaster or Weather Forecasting
• Cyber Security
• National Defence
• Pharmacy and Medicine
• Economy and Finance
Way Forward
Focus area A: Enhancing service delivery by putting citizens first
Benefits of Big Data Analytics
• Better decision making
• Better strategic planning
• Better relationships with customers
• More effective risk detection
• Better financial performance
Critical Success Factors
• High commitment from the Subject Matter Experts (SME) of each domain/cluster
• Knowledge and skills in Data Science
• Strong support from the agency's top management
• Data availability
• Change management programme
• Sound governance
PUBLIC SECTOR OPEN DATA
(Data Terbuka Sektor Awam, DTSA)
Contents
What is Open Data
Public Sector Open Data (DTSA)
– Mandate
– Governance
– DTSA Ecosystem
– DTSA Framework
– Issues and Challenges
Definition
Publicly available data that can be universally
and readily accessed, used, and redistributed
free of charge
It is structured for usability and computability
Source: The Global Impact of Open Data, Andrew Young and Stefan Verhulst, O’Reilly Media Inc., 2016
Definition
Open data refers to government data that can be used freely, shared and reused by citizens, public sector agencies or the private sector for any purpose
Government data sharing: G2G, G2B, G2C
Example: lists of schools, mosques and village clinics
data.gov.my
Source: Circular 1/2015: Implementation of Public Sector Open Data, MAMPU
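To show what "structured for usability and computability" can mean in practice, the sketch below reads a small open-data-style table in CSV form and answers two simple questions from it. The column names and rows are invented for this example; it does not use the actual data.gov.my catalogue or any of its interfaces.

```python
import csv
import io
from collections import Counter

# Hypothetical open data set in CSV form (columns and rows are made up).
csv_text = """name,type,state
SK Contoh 1,school,Selangor
Masjid Contoh,mosque,Kedah
Klinik Desa Contoh,village clinic,Pahang
SK Contoh 2,school,Kedah
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Because the data is structured, it can be counted and filtered directly.
by_type = Counter(row["type"] for row in rows)
schools_in_kedah = [
    r["name"] for r in rows if r["type"] == "school" and r["state"] == "Kedah"
]
print(by_type, schools_in_kedah)
```

Publishing lists in a machine-readable format such as CSV is what lets citizens, agencies and businesses reuse the data without retyping it.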
Mandate
GOVERNMENT IT AND INTERNET COMMITTEE (JITIK) MEETING NO. 1 OF 2014, 28 MARCH 2014
AGREED:
All agencies are advised to be ready and to take action to identify big data analytics initiatives and data sets for implementing open data in each of the agency's core services.
Governance
Public Sector Open Data Coordination Committee
(i) Determines the direction and strategy of public sector open data
(ii) Monitors the implementation status of public sector open data
(iii) Monitors the level of use of public sector open data
(iv) Acts as an adviser in discussing policies and current issues relating to public sector open data
Governance
Public Sector Open Data Working Team
(i) Prepares and carries out the public sector open data implementation plan.
(ii) Provides a secure platform for publishing open data sets.
(iii) Provides the mechanisms and procedures for agencies to publish open data on the Public Sector Open Data Portal.
(iv) Reviews and identifies potential data sets.
(v) Provides advisory services to agencies on implementing open data.
Governance
Ministry/SUK/Agency Open Data Coordination Committee
(i) Formulates the open data strategy and implementation plan at the level of the Ministry/State Secretary's Office (SUK)/Agency.
(ii) Establishes a working team to carry out open data tasks/activities.
(iii) Approves data sets for open data.
(iv) Monitors the level of use of open data.
(v) Ensures that the identified policy requirements and targets are complied with and achieved.
Governance
Ministry/SUK/Agency Open Data Working Team
(i) Reviews and identifies data sets.
(ii) Obtains approval of data sets for open data.
(iii) Prepares and publishes metadata.
(iv) Ensures that data sets approved for open data are uploaded to the agency website and the DTSA Portal.
(v) Reviews the level of use of open data.
Issues and Challenges
• Implementation requires cooperation among many parties from the public sector, the private sector, communities and citizens
• Implementation cuts across many fields and clusters, including legal, policy, social, economic and organisational
• Data security issues
• Malaysia's ranking in the Open Data Barometer assessment
Thank you
Syahrimi binti Hasbullah
Perunding Latihan Kanan
Unit Pengurusan Perkhidmatan Data
Sub Kluster Pembangunan Kepakaran ICT
Kluster iIMATEC, INTAN
Contents
Introduction to Big Data
Introduction to Big Data Analytics
Data Science
Public Sector Data-Driven Initiative
Slaid # 2
Big Data: Definisi
exponential growth
availability of data
structured and unstructured
Characteristics: 4V? 5V? 7V?
And big data may be as important to business – and society – as
the Internet has become. Why? More data may lead to more
accurate analyses.
Slaid # 5
10V’s of Big Data Characteristics
Slaid # 6
Evolution of
Big Data
7
Challenge?
“highvolume, velocity and
variety information assets
that demand costeffective,
innovative forms of
information processing for
enhanced insight and
decision making” Gartner
Big Data Ecosystem
Data Sources/
Advanced Data
Management
Advanced Data
Analytics
Data Presentation/
Business Intelligence
9
© INTAN 2017
Technology and Tools
© INTAN 2017
10
Who Generate Data?
Organization
Human
Machine
Data Lake
© INTAN 2017
Data Types
Semi
Structured
Data
Unstructured
Data
Structured Data
Data Types
© INTAN 2017
How does Big Data Look Like?
Partial Tweet in JSON format
Web log
Spatial data
Machinegenerated data
© INTAN 2017
Data Warehouse
A large store of data accumulated from a wide range of
sources within a company and used to guide management
decisions
© INTAN 2017
Data Lake
• Keep data in original raw and unmodelled format
• limited amount of “species”
• constrained by its size
• smaller set of data is analyzed in more detail to
help answer the question
© INTAN 2017
Data Ocean
• collection of unmodelled data from the entire business, from
every possible area
• kept in a single repository
• The size of these oceans is vast
• improvements in analytics technology
• easier to “fish” for whatever data you need
© INTAN 2017
Big Data Usage
© INTAN 2017
18
KAJIAN KES:
© INTAN 2017
19
UMUR
JANTINA
1829
3049
LELAKI
PEREMPUAN
5064
65+
LOKASI
URBAN
SUBURBAN
RURAL
© INTAN 2017
https://www.simplilearn.com/howfacebookisusingbigdataarticle
© INTAN 2017
Analyzing the
‘Likes’
Tracking cookies
Facial recognition
Tag suggestion
© INTAN 2017
Introduction to Big Data Analytics
and Data Science
© INTAN 2017
23
Contents
What is Big Data Analytics (BDA)?
Overview of BDA Process
Traditional Approach vs Big Data Analytics
Types of Analytics
Data Science
Data Scientist
Methodology
© INTAN 2017
24
What is Big Data Analytics (BDA)?
Definition 1: Science of examining raw data with the purpose of
drawing conclusions about that information
Definition 2: Process of examining large data sets containing a variety
of data types
Uncover hidden patterns, correlations, verify or
disprove existing models or theories for better
business decisions making
© INTAN 2017
25
Overview of BDA Process
Semi
structured
data
Structured
data
Unstructured
data
Information
Knowledge/insight
Value/Wisdom
© INTAN 2017
Comparison: Traditional & Big Data Analytics
Traditional Analytics
Big Data Analytics (BDA)
Structured data
Structured,
semi/unstructured data
Relational data model
Various data model with
no relation
Statistical methods
Advanced analytics
Limited value
High value
© INTAN 2017
27
Types of Analytics
Descriptive
• Past data
• Tell you what
has
happened?
• Simplest
analytics
Diagnostic
• Answer why it
happen
• Tell you what and
why it happened
• understand the
causes of events
and behaviors
Predictive
• Answer what,
why and when
it will happen
• Forecast what
might/could
happen in
future
Prescriptive
• Answer what,
when and
how to make it
happen
• Recommend
best course of
actions
© INTAN 2017
28
Predictive Analytics
• prediction of future probabilities and trends.
• predictor, a variable that can be measured for
an individual or other entity to predict future
behavior.
Predictive Analytics use statistical
models and forecasts techniques to
understand the future and answer
“What could happen?”
29
© INTAN 2017
Prescriptive Analytics
Prescriptive Analytics extends beyond predictive
analytics by specifying both the actions necessary
to achieve predicted outcomes, and the interrelated
effects of each decision
Prescriptive Analytics use optimization
and simulation algorithms to advice on
possible outcomes and answer “What
should we do?”
30
© INTAN 2017
3 Phases of Prescriptive Analytics
31
© INTAN 2017
© INTAN 2017
32
BDA: Malaysia’s Case Study
Objective
Benefit
• Understand
traveler habit and
shopper behavior
• Marketing effort
to enhance Malaysia Airports’ retailer
management system within KLIA and provide
value-added services for travelers
Challenge
• 400,000 square foot containing retail outlets
at various locations
• Accuracy of information gathered
• A precise method to track spending trends
Solution
• Install sensors (IoT devices for data collection)
• Mobile apps track customers basic
demographic
• Develop BI platform to show dashboard
reporting to clients
© INTAN 2017
DATA SCIENCE
© INTAN 2017
DATA PROFESSIONALS
The roles of data professionals can be split into:
Data Scientists: People who provide valuable insights from data to the business units and
management. Able to translate data into business story
Data Modellers: People who models the available data
Data Analysts: People who analyses huge amount of data available
Data Miners: People who work with mining and processing of raw data for analysis
The demand for data scientists is expected to grow the fastest at 66.7% (CAGR)
IDC 2015
CAGR = Compound Annual Growth Rate
© INTAN 2017
Data Science
“Data science is the study of
where information comes from,
what it represents and how it
can be turned into a valuable
resource in the creation of
business and IT strategies.”
Source: Wikipedia
Source: IBM
© INTAN 2017
Skillset
• Komunikasi
• Kreativiti dan
inovasi
• Kolaborasi
• Pengaturcaraan
• Gudang data (data
warehouse)
Pemprosesan
data
Insaniah
• Integrasi data
• Kualiti data
• Pembersihan data
Statistik
Saintis
Data
(Data
Scientist)
Technical
• Kemahiran Pelajaran Teras
Domain
• Pengetahuan perkhidmatan atau domain
tertentu
Analisis
dan
Model
• Matematik statistik
• Analisis dan model statistik
• Pengujian statistik
• Pemprosesan Bahasa
Semula Jadi/Natural
Language Processing
• Pembelajaran mesin
(Machine Language)
• Model ramalan (prediction
model)
• Visualisasi data
© INTAN 2017
37
Data Science Process
© INTAN 2017
38
Methodology
Project is
monitored for its
effectiveness,
stability and
capacity with
regards to business
requirements
identifying stakeholders, understanding the
business operations and needs, and
identifying opportunities from existing and
new data that can benefit the business
defining and documenting the
scope of work, business
requirements , user requirements
and system requirements of the
project
Product is evaluated against the
business requirements, and then
rolled out into the production
environment with access to more
data
development of Data Product,
i.e dashboard visualization
reporting software or a more
complex data driven
application
-
-
development of data model
and analysis algorithms to
process data to produce results
needed by the business
acquiring and
exploring available
data
Identifying:
- data cleansing
needs
- opportunities for
data enrichment
- analysis that can
be done with the
available data
© INTAN 2017
© INTAN 2017
40
© INTAN 2017
41
© INTAN 2017
42
© INTAN 2017
43
© INTAN 2017
44
© INTAN 2017
45
© INTAN 2017
46
Key Roles for a Successful Analytics Project
understands
the domain
area
provides
requirements
creates DB
environment
Technical skill
Ensures
meeting
objectives
Business
domain
expertise
Analytic
technique and
modelling
© INTAN 2017
© INTAN 2017
Key output from each main shareholders
determine the benefits and implications of the findings to
Business User
the business
Project Sponsor
Project Manager
BI Analyst
DE and DBA
Data Scientist
questions related to the business impact of the project, the
risks and return on investment (ROI), how the project can be
implemented within the organization (and beyond)
determine if the project completion within planned time and
budget and how well the goals were met
needs to know if the reports and dashboards will be impacted
and need to change
typically need to share their code from the analytics project
and create a technical document on how to implement it
needs to share the code and explain the model to her peers,
managers, and other stakeholders
© INTAN 2017
PELAKSANAAN DATA RAYA
SEKTOR AWAM
© INTAN 2017
Kandungan
Punca Kuasa/Mandat
Rangka Kerja analitis Data Raya Sektor Awam (aDRSA)
Pelaksanaan analitis Data Raya Sektor Awam (aDRSA)
Kes Bisnes aDRSA
Faedah aDRSA
CSF
Hala Tuju
© INTAN 2017
Punca Kuasa Pelaksanaan DRSA
Mesyuarat Majlis Pelaksanaan MSC Malaysia (ICM)
Bilangan 25 (14 November 2013)
“....the Communications and Multimedia Ministry with the
support of the Malaysian Administrative Modernisation
and Management Planning Unit and MDeC will jointly
implement four government initiated Big Data
Analytics (BDA) pilot projects by 2015 to drive ICT
services.”
Prime Minister of Malaysia
Mesyuarat Majlis Pelaksanaan MSC Malaysia (ICM)
Bilangan 26 (22 Oktober 2014)
Bersetuju supaya pelaksanaan BDA memberi tumpuan
kepada 3 imperatif iaitu Kemahiran, Centre of
Excellence (CoE) dan Data Terbuka.
MAMPU, MDEC dan MIMOS diminta melaksanakan
BDA Digital Government Lab (BDA DGLab) bagi
melaksana keputusan mesyuarat ini.
The Result :
1. Ministry of Multimedia and Communication Malaysia will
develop the skeleton BIG DATA
2. MAMPU and MDec will collaborate to implement the
strategies
3. MDec will start initiatives
Mesyuarat Jawatankuasa IT dan Internet Kerajaan
(JITIK) Bil. 2 Tahun 2014, 7 November 2014
bersetuju bagi strategi pelaksanaan DRSA iaitu:
2
1
1.
2.
3.
4.
Tadbir Urus
Strategi Pelaksanaan
Metodologi Pelaksanaan
Garis Panduan
© INTAN 2017
52
Rangka Kerja Data Raya Sektor Awam
53
© INTAN 2017
PROGRAM BERPACUKAN DATA SEKTOR AWAM
2
1
9
Tadbir Urus
Rangka Kerja
Data Terbuka, Perkongsian Data,
Klasifikasi Data
3
8
Metodologi
Khidmat Perundingan
7
4
Pembangunan Kompetensi
Platfom
5
6
Garis Panduan
Program
Inovasi
14
© INTAN 2017
INISIATIF DATA RAYA SEKTOR AWAM YANG TELAH
DILAKSANAKAN
Eksplorasi Analitis Data Raya
1. Transfomasi Perkhidmatan Optimasi Data Kerajaan (Goverment
Data Optimisation Transformation Services (GDOTS)
* PoC: 3 bulan (1 Okt 2015 hingga 31 Disember 2015 )
* Projek: 12 bulan (dicadangkan pada April 2017 hingga Mac
2018)
2. BDADigital Government Open Innovation Network
(BDAGDOIN)
* 29 Jan 2015 hingga 28 Jan 2016
3. Projek Rintis Analitis Data Raya Sektor Awam (DRSA)
* 10 Mac 2015 hingga 9 Mac 2016
4. Projek Peluasan Analitis DRSA
* 23 Nov 2016 hingga 22 Nov 2017
6
© INTAN 2017
TRANSFOMASI PERKHIDMATAN OPTIMASI DATA KERAJAAN
3 bulan (1 Okt 2015 hingga 31 Disember 2015 )
1
• GDOTS Proof Of Concept (POC) dilaksanakan pada
tahun 2015 menggunakan perkhidmatan analitis data
pihak ketiga
• Kolaboratif Strategik MAMPU bersama KPDNKK, MOA,
LKIM, FAMA, MOF, DOSM bagi kes bisnes Price of
Goods
• Memaparkan trend harga barangan mengikut cuaca
(hujan), pelaksanaan GST, musim perayaan, kenaikan
harga petrol dan kenaikan harga tol
%"Peningkatan/Penurunan"(Rantaian"Bekalan)"3"Selangor"
%"Peningkatan/Penurunan"(Rantaian"Bekalan)"3"Kedah"
1000.00%$
250.00%$
900.00%$
800.00%$
200.00%$
700.00%$
600.00%$
Pendaratan$
500.00%$
400.00%$
• Projek GDOTS dicadangkan pada Mei 2017 hingga
Mac 2018 bagi membangunkan empat (4) kes bisnes
dengan memberi fokus kepada golongan miskin bandar
(urban poor)
• Menghasilkan analisis atau laporan dalam mengenal pasti
punca perubahan harga
• Kolaboratif Strategik MAMPU bersama KPDNKK, MOA,
LKIM, FAMA, MOF, DOSM
Pendaratan$
Borong$
100.00%$
Runcit$
300.00%$
200.00%$
Runcit$
50.00%$
100.00%$
0.00%$
0.00%$
2012$
2013$
2014$
2015$
2012$
%"Peningkatan/Penurunan"(Rantaian"Bekalan)"3"Pahang"
160.00%$
160.00%$
140.00%$
Pendaratan$
100.00%$
80.00%$
Borong$
60.00%$
Runcit$
20.00%$
0.00%$
2014$
2015$
120.00%$
120.00%$
40.00%$
2013$
%"Peningkatan/Penurunan"(Rantaian"Bekalan)"3"Johor"
180.00%$
140.00%$
12 bulan (dicadangkan pada Mei 2017 hingga
Mac 2018)
150.00%$
Borong$
NO. ACTIVITY
1 Commercial Related
- Letter of Award
- Contract Management
2 Project Team Mobilization
- Project Inception & Governance
- Team Mobilization
- Technical and Steering Committee Meeting: Payment 1
3 Project Implementation
- Kick Off Meeting
- Project Management
- Development of Use Cases
- Step 1 - 3
- Technical and Steering Committee Meeting: Payment 2
- Step 4 - 5
- Technical and Steering Committee Meeting: Payment 3
- Step 6
- Technical and Steering Committee Meeting: Payment 4
- Step 7
- Insights Reporting
- Technical and Steering Committee Meeting: Payment 5
4 Project Closure and Sign-Off
Schedule: Month 1 to Month 12, each divided into M1 to M4
Timeline markers: LOA; Kick-Off; Payment Milestones of 10%, 10%, 40%, 20% and 20%; Project Closure
BDA-DIGITAL GOVERNMENT OPEN INNOVATION NETWORK (BDA-DGOIN) PROJECT
• Carried out as a Proof of Concept (PoC)
• Strategic collaboration between MAMPU, MDEC and MIMOS with MOF, JAKIM, JPS and NAHRIM (5 business cases)
• Roles shown: Management; Facilitator; Technology & Platform
Business cases:
• Monitoring Islamic extremism among Malaysians
• Data analytics to analyse and build a Fiscal Economic Model
• Sentiment analysis of the cost of living, drawn from social media
• Obtaining a 90-year rainfall distribution projection, in line with overflow effects on river banks, on the map of Malaysia
• Building a flood knowledge base from combined sensor and social media data
PUBLIC SECTOR BIG DATA ANALYTICS (DRSA) PILOT PROJECT
• Platform at PDSA within 1Gov*Net
• Governance framework
• Methodology
• Guidelines
• Development of four analytics: Crime Prevention, Price Monitoring, Infectious Disease Forecasting, Sentiment Analysis
PUBLIC SECTOR BIG DATA ANALYTICS (DRSA) EXPANSION PROJECT
• Data products are developed through coaching by the Company and MAMPU with selected agencies.
• Follows the DRSA methodology and the Data Analytics Project Lifecycle, including hands-on training for self-development in building data products/BDA
• Data products are built through data collection, cleaning and exploration, and by developing analytical, predictive and machine learning models using the analytics tool RStudio.
• Implementation period: 12 months (23 Nov 2016 to 22 Nov 2017)
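The workflow described here (collect, clean, explore, then build a predictive model) can be illustrated with a very small sketch. The project itself used RStudio; the example below uses plain Python instead, purely for illustration. The records, function names and the choice of a straight-line least-squares model are all assumptions and do not come from the DRSA project.

```python
from statistics import mean

# Hypothetical sketch of a collect -> clean -> explore -> model workflow.
# All data and names here are invented for illustration.


def clean(rows):
    """Drop records with missing values and convert the remaining strings to floats."""
    cleaned = []
    for x, y in rows:
        if x in (None, "") or y in (None, ""):
            continue  # skip incomplete records
        cleaned.append((float(x), float(y)))
    return cleaned


def explore(points):
    """Return simple summary statistics for the cleaned data."""
    return {
        "n": len(points),
        "mean_x": mean(p[0] for p in points),
        "mean_y": mean(p[1] for p in points),
    }


def fit_line(points):
    """Fit y = a + b * x by ordinary least squares and return (a, b)."""
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def predict(model, x):
    """Predict y for a new x using the fitted (a, b)."""
    a, b = model
    return a + b * x


if __name__ == "__main__":
    raw = [("1", "2.0"), ("2", "4.0"), ("", "5.0"), ("3", "6.0"), ("4", None)]
    points = clean(raw)
    print(explore(points))
    print(predict(fit_line(points), 5))
```

A real data product would replace the toy line fit with whichever analytical or machine learning model suits the business case.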
No. | Ministry/Agency | Business Case
1 | Ministry of Finance Malaysia (MOF) | Monitoring of Social Media Relating to the Ministry of Finance
2 | Ministry of Human Resources (KSM) | Improving Job Marketability for Job Seekers
3 | Public Services Commission (SPA) | Seamless Job Recruitment
4 | Ministry of Transport Malaysia (MOT) | Making Port Klang More Competitive and Efficient
5 | Ministry of Education Malaysia (MOE) | Resolving the Issue of Student Dropout from the Malaysian Education System
6 | Department of Fisheries Malaysia (DOF) | Selection of Aquaculture Areas
7 | Malaysian Agricultural Research and Development Institute (MARDI) | Improving Rice Productivity and Quality
8 | Ministry of Energy, Green Technology and Water (KeTTHA) | High Level of Domestic Water Consumption in Malaysia
9 | Ministry of International Trade and Industry (MITI) | Managing Problems in the Halal Manufacturing Industry
10 | National Audit Department | Audit Findings (Financial)
11 | MAMPU | Sentiment Analysis: Patriotism “Negaraku”
12 | Research Division, JPM | Confidential
Big Data Analytics Business Cases
• Disease Outbreak Forecasting
• Crime Forecasting and Prevention
• Smart Traffic Congestion Information
• Tax Fraud Detection
• Disaster or Weather Forecasting
• Cyber Security
• National Defence
• Pharmacy and Medicine
• Economy and Finance
Way Forward
Focus area A: Enhancing service delivery by putting citizens first
Benefits of Big Data Analytics
• Better decision making
• Better strategic planning
• Better relationships with customers
• More effective risk detection
• Better financial performance
Critical Success Factors
• High commitment from Subject Matter Experts (SMEs) in each domain/cluster
• Knowledge and skills in data science
• Strong support from agency top management
• Data availability
• Change management programme
• Sound governance
PUBLIC SECTOR OPEN DATA
(Data Terbuka Sektor Awam, DTSA)
Contents
What is Open Data
Public Sector Open Data (DTSA)
– Mandate
– Governance
– DTSA Ecosystem
– DTSA Framework
– Issues and Challenges
Definition
Publicly available data that can be universally
and readily accessed, used, and redistributed
free of charge
It is structured for usability and computability
Source: The Global Impact of Open Data; Andrew Young and Stefan Verhulst; O’Reilly Media Inc; 2016
Definition
Open data refers to government data that can be freely used, shared and reused by citizens, public sector agencies or the private sector for any purpose
Government data sharing: G2G, G2B, G2C
Example: lists of schools, mosques and village clinics
data.gov.my
Source: Pekeliling 1/2015 (Circular 1/2015): Pelaksanaan Data Terbuka Sektor Awam (Implementation of Public Sector Open Data), MAMPU
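Because open data is published in a structured form, it can be read and summarised by a program rather than by hand. The sketch below assumes a small CSV listing of schools, mosques and village clinics. The column names and every row are invented placeholders, not an actual data.gov.my dataset, and the helper names are hypothetical.

```python
import csv
import io

# Hypothetical sample in CSV form; the columns and rows are placeholders only.
SAMPLE_CSV = """name,type,state
Sekolah A,school,Selangor
Masjid B,mosque,Kedah
Klinik Desa C,village clinic,Pahang
Sekolah D,school,Selangor
"""


def load_records(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by(records, field):
    """Count how many records share each value of the given field."""
    counts = {}
    for record in records:
        counts[record[field]] = counts.get(record[field], 0) + 1
    return counts


if __name__ == "__main__":
    records = load_records(SAMPLE_CSV)
    print(count_by(records, "type"))
    print(count_by(records, "state"))
```

The same two helpers would work on any published listing that shares a header row, which is the practical meaning of data being structured for reuse.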
Mandate
MEETING OF THE GOVERNMENT IT AND INTERNET COMMITTEE (JITIK)
NO. 1 OF 2014, HELD ON 28 MARCH 2014
AGREED:
All agencies are advised to be prepared and to take action to identify big data analytics initiatives and data sets for implementing open data in each of the agency's core services.
Governance
Public Sector Open Data Coordination Committee
(i) Sets the direction and strategy for public sector open data
(ii) Monitors the implementation status of public sector open data
(iii) Monitors the level of use of public sector open data
(iv) Acts as an adviser in discussions of policy and current issues relating to public sector open data
Governance
Public Sector Open Data Working Team
(i) Prepares and carries out the public sector open data implementation plan.
(ii) Provides a secure platform for publishing open data sets.
(iii) Provides the mechanism and procedures for agencies to publish open data on the Public Sector Open Data Portal.
(iv) Studies and identifies potential data sets.
(v) Advises agencies on implementing open data.
Governance
Ministry/SUK/Agency Open Data Coordination Committee
(i) Drafts the open data strategy and implementation plan at the level of the Ministry/State Secretary's Office/Agency.
(ii) Sets up a working team to carry out open data tasks/activities.
(iii) Approves data sets for open data.
(iv) Monitors the level of open data use.
(v) Ensures that policy requirements and identified targets are complied with and achieved.
Governance
Ministry/SUK/Agency Open Data Working Team
(i) Studies and identifies data sets.
(ii) Obtains approval of data sets for open data.
(iii) Prepares and publishes metadata.
(iv) Ensures that data sets approved for open data are uploaded to the agency website and the DTSA Portal.
(v) Studies the level of open data use.
Issues and Challenges
• Implementation requires cooperation from many parties across the public sector, the private sector, communities and citizens
• Implementation cuts across many fields and clusters, including legal, policy, social, economic and organisational
• Data security issues
• Malaysia's ranking in the Open Data Barometer assessment
Thank you