A Fruitful Data Mining Using ORANGE

Manu Madhavan
Assistant Professor, Dept of CSE
SIMAT, Palakkad.

This article is a brief introduction to the Orange data mining tool and its use with Python scripts; it does not go into detail. Have fun.

Data mining is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. It has important applications in medical science, business intelligence, and similar real-life areas. Many open source and proprietary software tools are available for data mining applications and research. Orange (http://orange.biolab.si) is a general-purpose, open source machine learning and data mining tool. It includes a set of components for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration techniques. This article discusses Python scripting with the Orange package.

Install Orange:
To build and install Orange, you can use the setup.py in the root orange directory (requires GCC, Python and numpy development headers). Details are available on the Orange download page (http://orange.biolab.si/download/).

Test the installation
After installing Orange, type the following in the Python interactive shell.

>>> import Orange
>>> Orange.version.version
'2.6a2.dev-a55510d'

If this produces no errors or warnings, Orange and Python are properly installed and you are ready to continue.
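The same installation check can also be run as a small script instead of being typed into the shell. The following is only a sketch: the function name orange_version is hypothetical, and the fallback to a __version__ attribute is an assumption about releases other than the one shown in this article.

```python
import importlib


def orange_version():
    """Return Orange's version string, or None if Orange cannot be imported."""
    try:
        orange = importlib.import_module("Orange")
    except ImportError:
        return None
    # Orange.version.version is the attribute used in the interactive session;
    # the __version__ fallback is an assumption about other releases.
    version_module = getattr(orange, "version", None)
    found = getattr(version_module, "version", None)
    return found or getattr(orange, "__version__", "unknown")


found = orange_version()
if found is None:
    print("Orange is not importable from this interpreter")
else:
    print("Orange version: %s" % found)
```

If the first message appears, Orange is probably installed for a different Python interpreter than the one running the script.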
CLEAR December 2013

Page 25

Data Input
Orange can read files in its native format and in other data formats. A native-format file starts with a line of feature (attribute) names, followed by a line giving their types (continuous, discrete, string). The third line contains meta information that identifies dependent features (class), irrelevant features (ignore) or meta features (meta).

You may download lenses.tab to a target directory and open a Python shell there.

>>> import Orange
>>> data = Orange.data.Table("lenses")
>>>

Data mining using Orange-python
Orange's Python interface can be used for data mining applications such as classification, prediction, clustering and learning. Here I illustrate how Orange can be used for classification.

For classification, you need a training data set and a test data set. Data readable by Orange methods are kept in a .tab file (stored as Tables). The table contains the features of the data and the class value. For training data, the class value is the actual class to which the feature vector belongs. In the test data set, the class value is absent and has to be predicted by a learning function.

Orange has a wide variety of learning functions, such as KNN, Least Mean Square Error, Naive Bayes, Logistic/Linear regression, etc. The following sample code uses the KNN method to predict the class value of the test data set.

Let the training dataset be stored in trainset.tab and the test dataset in testset.tab. The following script will print the class of the test data.

>>> train = Orange.data.Table("trainset")
>>> test = Orange.data.Table("testset")
>>> learner = Orange.classification.knn.kNNLearner()
>>> classifier = learner(train)
>>> for i in range(len(test)):
...     print i, classifier(test[i])

This will print the class of each vector in the test set. You can verify the result by comparing it with the results obtained from other learning tools.

Hope you got it… enjoy the experiment…!!!!
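One way to cross-check the predictions without another tool is a plain-Python nearest-neighbour vote. The sketch below is not Orange's implementation. It assumes a made-up trainset in the three-header-line layout described under Data Input, with all features continuous and no ignore or meta columns. The feature names, values and helper functions are hypothetical, and it is written for Python 3.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical contents of a trainset.tab file in the native layout:
# line 1 = feature names, line 2 = types, line 3 = flags ("class" marks the label).
TRAINSET_TAB = (
    "height\tweight\tlabel\n"
    "continuous\tcontinuous\tdiscrete\n"
    "\t\tclass\n"
    "1.0\t1.0\tsmall\n"
    "1.2\t0.9\tsmall\n"
    "4.0\t4.2\tlarge\n"
    "4.1\t3.9\tlarge\n"
)


def read_tab(text):
    """Split tab-delimited text into (names, types, flags, data rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return rows[0], rows[1], rows[2], rows[3:]


def split_features_and_class(flags, rows):
    """Return (feature vector, class value) pairs using the 'class' flag column."""
    class_col = flags.index("class")
    pairs = []
    for row in rows:
        features = [float(v) for i, v in enumerate(row) if i != class_col]
        pairs.append((features, row[class_col]))
    return pairs


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_pairs, query, k=3):
    """Majority vote among the k training vectors closest to query."""
    nearest = sorted(train_pairs, key=lambda p: euclidean(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


names, types, flags, rows = read_tab(TRAINSET_TAB)
train_pairs = split_features_and_class(flags, rows)
for i, query in enumerate([[1.1, 1.0], [3.9, 4.0]]):
    print(i, knn_predict(train_pairs, query))
```

With this toy data the two query vectors are labelled small and large respectively. Predictions on real data may differ from Orange's if its learner uses a different k, distance measure or weighting.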