Using Weka’s Interactive GUI Application

Figure 8.1: Running the Weka Data Explorer

Figure 8.2: Running the Weka Data Explorer

The decision tree is displayed in the “Classifier output” window pane. We will run this same problem from the command line in the next section and then discuss the generated decision tree shown in the lower right panel of the GUI display in Figure 8.2.

8.2 Interactive Command Line Use of Weka

We will run the same problem as in the previous section and discuss the sections of the output report:

    java -cp ..libweka.jar \
         weka.classifiers.trees.J48 -t \
         stock_training_data.arff -x 2

    J48 pruned tree
    ------------------

    percent_change_from_day_low <= 0.12
    |   percent_change_since_open <= -2: sell (3.0)
    |   percent_change_since_open > -2: hold (3.0)
    percent_change_from_day_low > 0.12: buy (3.0)

    Number of Leaves  : 3

    Size of the tree : 5

The generated decision tree can be described in English as “If the percent change of a stock from the day low is less than or equal to 0.12, then if the percent change since the open is less than or equal to -2, sell the stock; otherwise keep it. If the percent change from the day low is greater than 0.12, then purchase more shares.”

    Time taken to build model: 0.01 seconds
    Time taken to test model on training data: 0 seconds

    === Error on training data ===

    Correctly Classified Instances       9    100 %
    Incorrectly Classified Instances     0      0 %
    Kappa statistic                      1
    Mean absolute error                  0
    Root mean squared error              0
    Relative absolute error              0 %
    Root relative squared error          0 %
    Total Number of Instances            9

This output shows results for testing on the original training data, so the classification is perfect. In practice, you will test on separate data sets.

    === Confusion Matrix ===

     a b c   <-- classified as
     3 0 0 | a = buy
     0 3 0 | b = sell
     0 0 3 | c = hold

Each row of the confusion matrix corresponds to the actual class of the data samples, and each column corresponds to the predicted class. Here we see that the original data contains three buy, three sell, and three hold samples, all of which fall on the diagonal because they are classified correctly.
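To make the English reading of the tree concrete, the pruned J48 tree from this section can be written by hand as nested conditionals. This is only an illustrative sketch: the class name StockTreeRule and the method name classify are hypothetical and are not part of Weka. The two thresholds (0.12 and -2) and the three class labels are taken from the tree output.

```java
// Hand-written equivalent of the pruned J48 tree printed in this section.
// StockTreeRule and classify are hypothetical names used for illustration;
// Weka does not generate or require this class.
final class StockTreeRule {

    // Returns "buy", "sell", or "hold" by following the tree from the root.
    static String classify(double percentChangeFromDayLow,
                           double percentChangeSinceOpen) {
        if (percentChangeFromDayLow <= 0.12) {
            // Left subtree: split again on the change since the open.
            if (percentChangeSinceOpen <= -2.0) {
                return "sell";
            }
            return "hold";
        }
        // Right leaf: the stock is well above its day low.
        return "buy";
    }

    public static void main(String[] args) {
        // The sample values below are made up to exercise each leaf.
        System.out.println(classify(0.05, -3.0)); // sell
        System.out.println(classify(0.05, 1.0));  // hold
        System.out.println(classify(0.50, -3.0)); // buy
    }
}
```

The tree has three leaves and two internal test nodes, which is where the reported tree size of 5 comes from.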
The following output shows random sampling testing, using two-fold stratified cross-validation (the -x 2 option on the command line):

    === Stratified cross-validation ===

    Correctly Classified Instances       4     44.4444 %
    Incorrectly Classified Instances     5     55.5556 %
    Kappa statistic                      0.1667
    Mean absolute error                  0.3457
    Root mean squared error              0.4513
    Relative absolute error             75.5299 %
    Root relative squared error         92.2222 %
    Total Number of Instances            9

With random sampling, we see in the confusion matrix that the three buy recommendations are still perfect, but that all three of the sell recommendations are wrong (one is classified as buy and two as hold) and that two of what should have been hold recommendations are classified as buy recommendations.

    === Confusion Matrix ===

     a b c   <-- classified as
     3 0 0 | a = buy
     1 0 2 | b = sell
     2 0 1 | c = hold
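The accuracy and Kappa statistic in the cross-validation report can be recomputed from the confusion matrix alone. The following is a minimal sketch, assuming the standard definition of Cohen's kappa (observed agreement corrected for the agreement expected by chance). The class ConfusionStats and its methods are hypothetical helpers and do not use the Weka API.

```java
import java.util.Locale;

// Recomputes accuracy and Cohen's kappa from a confusion matrix whose rows
// are actual classes and whose columns are predicted classes.
// ConfusionStats is a hypothetical helper class, not part of Weka.
final class ConfusionStats {

    // Sum of every cell in the matrix (the number of test instances).
    static int total(int[][] m) {
        int n = 0;
        for (int[] row : m) {
            for (int cell : row) {
                n += cell;
            }
        }
        return n;
    }

    // Fraction of instances on the diagonal (correctly classified).
    static double accuracy(int[][] m) {
        int correct = 0;
        for (int i = 0; i < m.length; i++) {
            correct += m[i][i];
        }
        return (double) correct / total(m);
    }

    // Cohen's kappa: (observed - expected) / (1 - expected), where the
    // expected agreement is derived from the row and column totals.
    static double kappa(int[][] m) {
        double n = total(m);
        double expected = 0.0;
        for (int i = 0; i < m.length; i++) {
            int rowSum = 0;
            int colSum = 0;
            for (int j = 0; j < m.length; j++) {
                rowSum += m[i][j];
                colSum += m[j][i];
            }
            expected += (rowSum / n) * (colSum / n);
        }
        return (accuracy(m) - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        // Cross-validation confusion matrix from the report (buy, sell, hold).
        int[][] crossValidation = {
            {3, 0, 0},
            {1, 0, 2},
            {2, 0, 1}
        };
        System.out.println(String.format(Locale.ROOT, "accuracy = %.4f %%",
                100.0 * accuracy(crossValidation)));
        System.out.println(String.format(Locale.ROOT, "kappa    = %.4f",
                kappa(crossValidation)));
    }
}
```

For this matrix the observed agreement is 4/9 and the chance agreement is 1/3, which gives a kappa of 1/6, in line with the 44.4444 % and 0.1667 values in the report. Applying the same methods to the training-data matrix, where all nine samples lie on the diagonal, gives an accuracy of 1 and a kappa of 1.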