Interactive Command Line Use of Weka
    Relative absolute error
    Root relative squared error
    Total Number of Instances              9
This output shows results for testing on the original training data, so the classification is perfect. In practice, you will test on separate data sets.
    === Confusion Matrix ===
     a b c   <-- classified as
     3 0 0 | a = buy
     0 3 0 | b = sell
     0 0 3 | c = hold
The confusion matrix shows the predicted classes (columns) for each actual class of data sample (rows). Here we see the original data (three buy, three sell, and three hold samples), all classified correctly. The following output shows random sampling testing:
    === Stratified cross-validation ===
    Correctly Classified Instances         4       44.4444 %
    Incorrectly Classified Instances       5       55.5556 %
    Kappa statistic                        0.1667
    Mean absolute error                    0.3457
    Root mean squared error                0.4513
    Relative absolute error               75.5299 %
    Root relative squared error           92.2222 %
    Total Number of Instances              9
With random sampling, we see in the confusion matrix that the three buy recommendations are still perfect, but that all three of the sell recommendations are wrong (one is classified as buy and two as hold) and that two of what should have been hold recommendations are buy recommendations.
    === Confusion Matrix ===
     a b c   <-- classified as
     3 0 0 | a = buy
     1 0 2 | b = sell
     2 0 1 | c = hold
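The summary figures reported with this confusion matrix can be recomputed by hand, which is a useful sanity check when reading Weka output. The following is a minimal sketch in plain Java (no Weka classes); the class and method names are illustrative, and the matrix is hard-coded from the cross-validation output shown above. Accuracy is the diagonal sum divided by the total, and the kappa statistic compares that observed agreement with the agreement expected by chance from the row and column totals.

```java
// Illustrative helper, not part of Weka: recompute summary statistics
// from a confusion matrix (rows = actual class, columns = predicted class).
class ConfusionMatrixStats {

  // Fraction of instances on the diagonal (correctly classified).
  static double accuracy(int[][] m) {
    int correct = 0;
    int total = 0;
    for (int r = 0; r < m.length; r++) {
      for (int c = 0; c < m[r].length; c++) {
        total += m[r][c];
        if (r == c) {
          correct += m[r][c];
        }
      }
    }
    return (double) correct / total;
  }

  // Cohen's kappa: (observed - expected) / (1 - expected), where the
  // expected agreement is derived from the row and column totals.
  static double kappa(int[][] m) {
    int n = m.length;
    double total = 0;
    double[] rowSum = new double[n];
    double[] colSum = new double[n];
    for (int r = 0; r < n; r++) {
      for (int c = 0; c < n; c++) {
        rowSum[r] += m[r][c];
        colSum[c] += m[r][c];
        total += m[r][c];
      }
    }
    double expected = 0;
    for (int i = 0; i < n; i++) {
      expected += (rowSum[i] / total) * (colSum[i] / total);
    }
    double observed = accuracy(m);
    return (observed - expected) / (1.0 - expected);
  }

  public static void main(String[] args) {
    // Cross-validation confusion matrix from the text: buy, sell, hold.
    int[][] m = { {3, 0, 0}, {1, 0, 2}, {2, 0, 1} };
    System.out.printf("accuracy = %.4f %%%n", 100.0 * accuracy(m));
    System.out.printf("kappa    = %.4f%n", kappa(m));
  }
}
```

For this matrix the accuracy is 4/9 and the kappa statistic is 1/6, which agree with the 44.4444 and 0.1667 values in the cross-validation summary.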
The example in this section is partially derived from documentation at the web site http://weka.sourceforge.net/wiki. This example loads the training ARFF data file seen at the beginning of this chapter and loads a similar ARFF file for testing. The test file is equivalent to the original training file except that small random changes have been made to the numeric attribute values in all samples. A decision tree model is trained on the training data and tested on the new test ARFF data.
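The text does not say how the test file was produced. As one plausible way to generate such data, the sketch below adds a small random perturbation to each numeric value and leaves everything else alone. Everything here is an assumption for illustration: the class name, the jitter helper, the noise bound of 0.05, the seed, and the sample values are hypothetical and are not taken from the book's ARFF files.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper: perturb numeric attribute values slightly.
class JitterSamples {

  // Returns a copy of rows with uniform noise in [-maxDelta, +maxDelta)
  // added to every value; the input array is not modified.
  static double[][] jitter(double[][] rows, double maxDelta, long seed) {
    Random random = new Random(seed);
    double[][] out = new double[rows.length][];
    for (int r = 0; r < rows.length; r++) {
      out[r] = new double[rows[r].length];
      for (int c = 0; c < rows[r].length; c++) {
        double noise = (random.nextDouble() * 2.0 - 1.0) * maxDelta;
        out[r][c] = rows[r][c] + noise;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Made-up numeric attribute values for two samples.
    double[][] training = { {0.2, -0.1, 0.3}, {-0.4, 0.0, 0.1} };
    double[][] testing = jitter(training, 0.05, 42L);
    for (double[] row : testing) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

Using a fixed seed keeps the generated test data reproducible between runs; the class labels would be copied over unchanged so that each perturbed sample keeps its original recommendation.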
    import weka.classifiers.meta.FilteredClassifier;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.filters.unsupervised.attribute.Remove;

    import java.io.BufferedReader;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.IOException;

    public class WekaStocks {

      public static void main(String[] args) throws Exception {
We start by creating a new set of training instances by supplying a reader for the stock training ARFF file and setting the index of the class attribute (here, the last attribute):
        Instances training_data = new Instances(
            new BufferedReader(
                new FileReader("test_data/stock_training_data.arff")));
        training_data.setClassIndex(training_data.numAttributes() - 1);
We want to test with separate data so we open a separate examples ARFF file to test against:
        Instances testing_data = new Instances(
            new BufferedReader(
                new FileReader("test_data/stock_testing_data.arff")));
        testing_data.setClassIndex(training_data.numAttributes() - 1);
The method toSummaryString returns a summary of a set of training or testing instances as a string, which we print along with the number of samples and attributes:
        String summary = training_data.toSummaryString();
        int number_samples = training_data.numInstances();
        int number_attributes_per_sample = training_data.numAttributes();
        System.out.println("Number of attributes in model = " +
                           number_attributes_per_sample);
        System.out.println("Number of samples = " + number_samples);
        System.out.println("Summary: " + summary);
        System.out.println();
Now we create a new classifier (a J48 classifier in this case) and we see how to optionally filter (remove) attributes. We build a classifier using the training data and then test it using the separate test data set:
        // a classifier for decision trees:
        J48 j48 = new J48();

        // filter for removing attributes:
        Remove rm = new Remove();
        // remove first attribute
        rm.setAttributeIndices("1");

        // filtered classifier
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(rm);
        fc.setClassifier(j48);

        // train using stock_training_data.arff:
        fc.buildClassifier(training_data);

        // test using stock_testing_data.arff:
        for (int i = 0; i < testing_data.numInstances(); i++) {
          double pred = fc.classifyInstance(testing_data.instance(i));
          System.out.print("given value: " +
              testing_data.classAttribute().value(
                  (int) testing_data.instance(i).classValue()));
          System.out.println(". predicted value: " +
              testing_data.classAttribute().value((int) pred));
        }
      }
    }
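Two details in this listing are easy to miss: classifyInstance returns the predicted class as a double that is really an index into the class attribute's list of nominal labels, and the attribute index string passed to setAttributeIndices counts from 1 rather than 0. The sketch below mimics both conventions in plain Java so that it runs without Weka on the classpath. The class name, the helper names, and the sample row are illustrative assumptions, not Weka code; the label order buy, sell, hold follows the confusion matrices shown earlier.

```java
import java.util.Arrays;

// Illustrative stand-ins for two Weka conventions; not Weka code.
class WekaConventions {

  // A prediction such as 2.0 is cast to int and used as an index into
  // the nominal labels, like classAttribute().value((int) pred).
  static String labelFor(double prediction, String[] labels) {
    return labels[(int) prediction];
  }

  // Drops one column using a 1-based index, so index 1 removes the
  // first attribute, mirroring the "1" passed to the Remove filter.
  static double[] removeAttribute(double[] row, int oneBasedIndex) {
    double[] out = new double[row.length - 1];
    int k = 0;
    for (int i = 0; i < row.length; i++) {
      if (i != oneBasedIndex - 1) {
        out[k++] = row[i];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    String[] labels = { "buy", "sell", "hold" };
    System.out.println(labelFor(2.0, labels));

    double[] row = { 0.2, -0.1, 0.3 }; // made-up numeric values
    System.out.println(Arrays.toString(removeAttribute(row, 1)));
  }
}
```

Wrapping the filter and the classifier in a FilteredClassifier means the same attribute removal is applied to the training data and to every instance passed in for classification, so the two steps cannot drift apart.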
This example program produces the following output (some output is not shown due to page width limits):
    Number of attributes in model = 4
    Number of samples = 9
    Summary: Relation Name:  stock
    Num Instances:  9
    Num Attributes: 4

         Name                       Type  Nom  Int Real ...
       1 percent_change_since_open  Num        11%  89% ...
       2 percent_change_from_day_l  Num        22%  78% ...
       3 percent_change_from_day_h  Num         0% 100% ...
       4 action                     Nom  100%           ...

    given value: hold. predicted value: hold
    given value: sell. predicted value: sell
    given value: buy. predicted value: buy
    given value: hold. predicted value: buy
    given value: sell. predicted value: sell
    given value: buy. predicted value: buy
    given value: hold. predicted value: hold
    given value: sell. predicted value: buy
    given value: buy. predicted value: buy
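To turn the per-sample results above into a single figure, count the samples where the given and predicted labels agree. The following is a minimal sketch with the nine pairs copied from the output above; the class and method names are illustrative.

```java
// Illustrative tally of the given/predicted pairs printed above.
class PredictionTally {

  // Counts positions where the given and predicted labels match.
  static int countCorrect(String[] given, String[] predicted) {
    int correct = 0;
    for (int i = 0; i < given.length; i++) {
      if (given[i].equals(predicted[i])) {
        correct++;
      }
    }
    return correct;
  }

  public static void main(String[] args) {
    String[] given = {
      "hold", "sell", "buy", "hold", "sell", "buy", "hold", "sell", "buy"
    };
    String[] predicted = {
      "hold", "sell", "buy", "buy", "sell", "buy", "hold", "buy", "buy"
    };
    int correct = countCorrect(given, predicted);
    System.out.println(correct + " of " + given.length + " correct");
  }
}
```

Seven of the nine test samples are classified correctly; the two errors are one hold and one sell that were both predicted as buy.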