Figure 8.1: Running the Weka Data Explorer
Figure 8.2: Running the Weka Data Explorer
The decision tree is displayed in the “Classifier output” window pane. We will run this same problem from the command line in the next section and then discuss the
generated decision tree, which appears in the lower right panel of the GUI display in Figure 8.2.
8.2 Interactive Command Line Use of Weka
We will run the same problem as in the previous section and discuss the sections of the output report:
java -cp ../lib/weka.jar \
     weka.classifiers.trees.J48 -t \
     stock_training_data.arff -x 2

J48 pruned tree
------------------

percent_change_from_day_low <= 0.12
|   percent_change_since_open <= -2: sell (3.0)
|   percent_change_since_open > -2: hold (3.0)
percent_change_from_day_low > 0.12: buy (3.0)

Number of Leaves  : 3

Size of the tree : 5

The generated decision tree can be described in English as “If the percent change of a stock from the day low is less than or equal to 0.12, then if the percent change since
the open is less than or equal to -2, sell the stock; otherwise hold it. If the percent change from the day low is greater than 0.12, then purchase more shares.”
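
The English description above can also be written as nested conditionals. The following is a minimal sketch, not code generated by Weka: the class name StockTreeSketch and the method classify are hypothetical, and the comparisons assume the usual J48 split form (less than or equal on one branch, greater than on the other).

```java
// Hand-written equivalent of the J48 tree described above (illustrative only).
// StockTreeSketch and classify are hypothetical names, not part of Weka.
class StockTreeSketch {

    // Returns "buy", "sell", or "hold" for one stock sample.
    static String classify(double percentChangeFromDayLow,
                           double percentChangeSinceOpen) {
        if (percentChangeFromDayLow <= 0.12) {
            // Left subtree: split again on the change since the open.
            if (percentChangeSinceOpen <= -2.0) {
                return "sell";
            }
            return "hold";
        }
        // Right leaf: change from the day low is greater than 0.12.
        return "buy";
    }

    public static void main(String[] args) {
        System.out.println(classify(0.05, -3.0)); // prints "sell"
        System.out.println(classify(0.05, 1.0));  // prints "hold"
        System.out.println(classify(0.50, -3.0)); // prints "buy"
    }
}
```

The tree has three leaves (the three return statements) and five nodes in total (two tests plus three leaves), which matches the leaf count and tree size reported by J48.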

Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances      9      100 %
Incorrectly Classified Instances    0        0 %
Kappa statistic                     1
Mean absolute error                 0
Root mean squared error             0
Relative absolute error             0 %
Root relative squared error         0 %
Total Number of Instances           9

This output shows results for testing on the original training data, so the classification is perfect. In practice, you should test on a separate data set that was not used for training.

=== Confusion Matrix ===

 a b c   <-- classified as
 3 0 0 | a = buy
 0 3 0 | b = sell
 0 0 3 | c = hold

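The confusion matrix shows the predicted classes (columns) for each actual class of data sample (rows). Here we see that the original data contains three buy, three sell, and three hold samples, all classified correctly. The
following output shows random sampling testing (two-fold stratified cross-validation, requested with the -x 2 option):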

=== Stratified cross-validation ===

Correctly Classified Instances      4       44.4444 %
Incorrectly Classified Instances    5       55.5556 %
Kappa statistic                     0.1667
Mean absolute error                 0.3457
Root mean squared error             0.4513
Relative absolute error            75.5299 %
Root relative squared error        92.2222 %
Total Number of Instances           9

With random sampling, we see in the confusion matrix that the three buy recommendations are still perfect, but that all three of the sell recommendations are wrong (one is classified as buy and two as hold) and that two of what should have been hold recommendations are buy recommendations.

=== Confusion Matrix ===

 a b c   <-- classified as
 3 0 0 | a = buy
 1 0 2 | b = sell
 2 0 1 | c = hold
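
The summary figures in the cross-validation report can be checked by hand from the confusion matrix. The sketch below is an illustration under stated assumptions rather than Weka's own code: the class ConfusionMatrixSketch and its methods are hypothetical names, the matrix is assumed to be square with rows as actual classes and columns as predicted classes, and kappa is computed with the standard Cohen's kappa formula.

```java
import java.util.Locale;

// Recomputes accuracy and the kappa statistic from a confusion matrix.
// ConfusionMatrixSketch is a hypothetical helper, not part of Weka.
class ConfusionMatrixSketch {

    // Fraction of samples on the diagonal (actual class == predicted class).
    static double accuracy(int[][] m) {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) {
                    correct += m[i][j];
                }
            }
        }
        return (double) correct / total;
    }

    // Cohen's kappa: (observed - expected) / (1 - expected), where the
    // expected agreement comes from the row and column totals.
    static double kappa(int[][] m) {
        int n = m.length;
        double total = 0.0;
        double[] rowSum = new double[n];
        double[] colSum = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowSum[i] += m[i][j];
                colSum[j] += m[i][j];
                total += m[i][j];
            }
        }
        double expected = 0.0;
        for (int k = 0; k < n; k++) {
            expected += rowSum[k] * colSum[k];
        }
        expected /= total * total;
        double observed = accuracy(m);
        return (observed - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        // Rows: actual buy, sell, hold. Columns: predicted buy, sell, hold.
        int[][] crossValidation = {
            {3, 0, 0},
            {1, 0, 2},
            {2, 0, 1}
        };
        System.out.printf(Locale.ROOT, "accuracy = %.4f %%%n",
                100.0 * accuracy(crossValidation)); // prints "accuracy = 44.4444 %"
        System.out.printf(Locale.ROOT, "kappa    = %.4f%n",
                kappa(crossValidation));            // prints "kappa    = 0.1667"
    }
}
```

Four of the nine samples lie on the diagonal, which gives the 44.4444 percent accuracy in the report, and the same matrix yields the reported kappa statistic of 0.1667.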