A Fruitful Data Mining Using ORANGE

Manu Madhavan
Assistant Professor, Dept of CSE
SIMAT, Palakkad.

This article is a brief introduction to the Orange data mining tool and its use with Python scripts; it does not go into detail. Have fun.

Data mining is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. It has important applications in medical science, business intelligence, and similar real-life domains. Many open source and proprietary software tools are available for data mining applications and research. Orange (http://orange.biolab.si) is a general-purpose, open source machine learning and data mining tool. It includes a set of components for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration techniques. This article discusses Python scripting with the Orange package.

Install Orange:
To build and install Orange, you can use setup.py in the root Orange directory (this requires GCC, Python, and the numpy development headers). Details are available on the Orange documentation page (http://orange.biolab.si/download/).

Test the installation
After installing Orange, type the following in the Python interactive shell.

>>> import Orange
>>> Orange.version.version
'2.6a2.dev-a55510d'

If this produces no errors or warnings, Orange and Python are properly installed and you are ready to continue.
CLEAR December 2013
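The installation test can also be wrapped in a small script, which is handy when Orange may or may not be present on a machine. This is only a sketch: the function name orange_version is my own, and the fallback to __version__ is an assumption for releases that do not expose Orange.version.version.

```python
from __future__ import print_function
import importlib


def orange_version():
    """Return Orange's version string, or None if Orange cannot be imported."""
    try:
        orange = importlib.import_module("Orange")
    except ImportError:
        return None
    # Orange.version.version is the attribute used in this article.
    version_module = getattr(orange, "version", None)
    version = getattr(version_module, "version", None)
    # Assumption: other releases may only provide Orange.__version__.
    return version or getattr(orange, "__version__", "unknown")


found = orange_version()
if found is None:
    print("Orange is not installed")
else:
    print("Orange {0}".format(found))
```

If the script reports that Orange is not installed, go back to the installation step before trying the examples that follow.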
Data Input
Orange can read files in its native format and in other data formats. The native format starts with a line of feature (attribute) names, followed by a line giving their types (continuous, discrete, string). The third line contains meta information that identifies dependent features (class), irrelevant features (ignore), or meta features (meta).
You may download lenses.tab to a target directory and open a Python shell there.

>>> import Orange
>>> data = Orange.data.Table("lenses")

Data mining using Orange-python
Orange's Python interface can be used for data mining applications such as classification, prediction, clustering, and learning. Here I illustrate how Orange can be used for classification.
For classification, you need a training data set and a test data set. Data readable by Orange methods is stored in a .tab file (and loaded as a Table). The table contains the features of the data and the class value. For training data, the class value is the actual class to which the feature vector belongs. For test data, the class value is absent and has to be predicted by a learning function.
Orange has a wide variety of learning functions, such as KNN, Least Mean Square Error, Naive Bayes, and Logistic/Linear regression. The following sample code uses the KNN method to predict the class value of the test data set.
Let the training data set be stored in trainset.tab and the test data set in testset.tab. The following script prints the class of each test instance.

>>> # load the training and test tables from trainset.tab and testset.tab
>>> train = Orange.data.Table("trainset")
>>> test = Orange.data.Table("testset")
>>> # build a kNN classifier from the training data
>>> learner = Orange.classification.knn.kNNLearner()
>>> classifier = learner(train)
>>> # print the index and predicted class of every test instance
>>> for i in range(len(test)):
...     print i, classifier(test[i])

This prints the class of each vector in the test set. You can verify the result by comparing it with the results from other learning tools.
Hope you got it… enjoy the experiment…!!!!
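To make the three-header-line layout described under Data Input concrete, here is a minimal sketch in plain Python that does not need Orange at all. The sample data, the column names, and the parse_tab helper are all hypothetical (this is not the real lenses.tab and not Orange's own reader); the sketch only shows how names, types, and class/ignore/meta flags line up column by column.

```python
from __future__ import print_function

# Hypothetical sample in the three-header-line layout; not the real lenses.tab.
SAMPLE_TAB = (
    "age\ttear_rate\tpatient_id\tlenses\n"      # line 1: feature names
    "discrete\tdiscrete\tstring\tdiscrete\n"    # line 2: feature types
    "\t\tmeta\tclass\n"                         # line 3: class / ignore / meta flags
    "young\tnormal\tp01\tsoft\n"
    "young\treduced\tp02\tnone\n"
)


def parse_tab(text):
    """Split tab-delimited text into column descriptions and data rows."""
    rows = [line.split("\t") for line in text.splitlines() if line]
    names, types, flags = rows[0], rows[1], rows[2]
    columns = [
        {"name": name, "type": kind, "flag": flag}
        for name, kind, flag in zip(names, types, flags)
    ]
    return columns, rows[3:]


columns, data_rows = parse_tab(SAMPLE_TAB)
class_names = [col["name"] for col in columns if col["flag"] == "class"]
print("class feature:", class_names)
print("first instance:", data_rows[0])
```

An empty entry on the third line means the column is an ordinary feature; in this sample only patient_id is marked as meta and lenses as the class.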
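If you are curious what a KNN learner does with the training data, the idea can be sketched in a few lines of plain Python. This is not Orange's kNNLearner: the function knn_predict, the use of Euclidean distance with a simple majority vote, and the tiny numeric data set are all my own assumptions for illustration.

```python
from __future__ import print_function
from collections import Counter
import math


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training rows."""
    distances = []
    for row, label in zip(train_rows, train_labels):
        # Euclidean distance between the training row and the query
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, query)))
        distances.append((dist, label))
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up training data: feature vectors with known classes.
train_rows = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["small", "small", "small", "large", "large", "large"]
# Made-up test data: feature vectors whose class is to be predicted.
test_rows = [(1.1, 0.9), (5.1, 5.0)]

for i, row in enumerate(test_rows):
    print(i, knn_predict(train_rows, train_labels, row))
```

The loop mirrors the Orange script in this article: each test vector is handed to the classifier and the predicted class is printed next to its index.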