Adding Momentum to Speed Up Back-Prop Training

The class constructor now takes another parameter alpha that determines how strong the momentum correction is when we modify weight values:

    // momentum scaling term that is applied
    // to last delta weight:
    private float alpha = 0f;

While this alpha term is used three times in the training code, it suffices to look at just one of these uses in detail. When we allocate the three weight arrays W1, W2, and W3, we now also allocate three additional arrays of the same sizes: W1_last_delta, W2_last_delta, and W3_last_delta. These three new arrays store the weight changes for use in the next training cycle. Here is the original code to update W3 from the last section:

    W3[h][o] += TRAINING_RATE * output_errors[o] * hidden2[h];

The following code snippet shows the additions required to use momentum:

    W3[h][o] += TRAINING_RATE * output_errors[o] * hidden2[h] +
                // apply the momentum term:
                alpha * W3_last_delta[h][o];
    W3_last_delta[h][o] = TRAINING_RATE * output_errors[o] * hidden2[h];

I mentioned in the last section that there are two techniques for training back-prop networks: updating the weights after processing each training example, or waiting to update the weights until all training examples are processed. I always use the first method when I don’t use momentum. In many cases it is best to use the second method when using momentum.

8 Machine Learning with Weka

Weka is a standard Java tool for performing both machine learning experiments and for embedding trained models in Java applications. I have used Weka since 1999 and it is usually my tool of choice on machine learning projects that are compatible with Weka’s use of the GPL license. In addition to the material in this chapter you should visit the primary Weka web site www.cs.waikato.ac.nz/ml/weka for more examples and tutorials. Good online documentation can also be found at weka.sourceforge.net/wekadoc. Weka can be run both as a GUI application and from a command line interface for running experiments.
While the techniques of machine learning have many practical applications, the example used in this chapter is simple and is mostly intended to show you the techniques for running Weka and for embedding Weka in your Java applications. Full documentation of the many machine learning algorithms is outside the scope of this chapter.

In addition to data cleansing and preprocessing utilities (filters for data normalization, resampling, transformations, etc.), Weka supports most machine-learning techniques for automatically calculating classification systems. I have used the following Weka learning modules in my own work:

• Naive Bayes – uses Bayes’s rule for probability of a hypothesis given evidence.
• Instance-based learner – stores all training examples and uses them to classify new data items.
• C4.5 – a learning scheme by J. Ross Quinlan that calculates decision trees from training data. We will use the J48 algorithm, Weka’s implementation of C4.5, in this chapter.

Weka can be used for both unsupervised and supervised learning. An example of unsupervised learning is processing a set of unlabeled data and automatically clustering the data into smaller sets containing similar items. We will use supervised learning as the example in this chapter: data on daily stock prices is labeled as buy, sell, or hold. We will use the J48 algorithm to automatically build a decision tree for deciding how to process a stock, given its cost data. This example is simplistic and should not be used to actually trade stocks.

It is also possible to induce rules from training data that are equivalent to decision trees for the same training data. The learned model uses linear combinations of attribute values for classification.

We are going to use a simple example to learn how to use Weka interactively and embedded in applications in the next two sections. Weka uses a data file format called ARFF.
The following listing shows the sample ARFF input file that we will use in the next two sections:

    @relation stock

    @attribute percent_change_since_open real
    @attribute percent_change_from_day_low real
    @attribute percent_change_from_day_high real
    @attribute action {buy, sell, hold}

    @data
    -0.2,0.1,-0.22,hold
    -2.2,0.0,-2.5,sell
    0.2,0.21,-0.01,buy
    -0.22,0.12,-0.25,hold
    -2.0,0.0,-2.1,sell
    0.28,0.26,-0.04,buy
    -0.12,0.08,-0.14,hold
    -2.6,0.1,-2.6,sell
    0.24,0.25,-0.03,buy

Here the concept of a relation is similar to a relation in PowerLoom as we saw in Chapter 3: a relation has a name and a list of attributes, each with an allowed data type. Here the relation name is “stock” and we have three attributes that have floating point numerical values and a fourth attribute that has an enumeration of discrete allowed values. The data section defines data for initializing nine stock relations.
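To make the three parts of the file (relation name, attribute declarations, data rows) concrete, here is a minimal sketch of reading the sample data above with plain Java. It is not Weka's own ARFF reader: the class name ArffSketch and its fields are hypothetical names chosen for this illustration, and the parser assumes the simple layout used here (no quoted values, no sparse rows).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; Weka ships its own ARFF loader.
class ArffSketch {
    // The sample data from this section, written with ARFF's @ markers.
    static final String SAMPLE = String.join("\n",
        "@relation stock",
        "@attribute percent_change_since_open real",
        "@attribute percent_change_from_day_low real",
        "@attribute percent_change_from_day_high real",
        "@attribute action {buy, sell, hold}",
        "@data",
        "-0.2,0.1,-0.22,hold",
        "-2.2,0.0,-2.5,sell",
        "0.2,0.21,-0.01,buy",
        "-0.22,0.12,-0.25,hold",
        "-2.0,0.0,-2.1,sell",
        "0.28,0.26,-0.04,buy",
        "-0.12,0.08,-0.14,hold",
        "-2.6,0.1,-2.6,sell",
        "0.24,0.25,-0.03,buy");

    String relation;
    List<String> attributeNames = new ArrayList<>();
    List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            // Skip blank lines and ARFF comments, which start with '%'.
            if (line.isEmpty() || line.startsWith("%")) continue;
            String lower = line.toLowerCase();
            if (inData) {
                // Every line after @data is one comma-separated example.
                String[] fields = line.split(",");
                for (int i = 0; i < fields.length; i++) fields[i] = fields[i].trim();
                result.rows.add(fields);
            } else if (lower.startsWith("@relation")) {
                result.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // Keep only the attribute name; the type follows it.
                String rest = line.substring("@attribute".length()).trim();
                result.attributeNames.add(rest.split("\\s+")[0]);
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ArffSketch arff = parse(SAMPLE);
        System.out.println(arff.relation + ": " + arff.attributeNames.size()
            + " attributes, " + arff.rows.size() + " rows");
    }
}
```

Running main prints "stock: 4 attributes, 9 rows". The last field of each row is the action label that supervised learning will try to predict from the first three.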

8.1 Using Weka’s Interactive GUI Application

The Weka JAR file is included with the ZIP file for this book. To run the Weka GUI application, change directory to test_data and type:

    java -cp ../lib -jar ../lib/weka.jar

Once you have loaded and possibly browsed the data as seen in Figure 8.1, you can select the classifier tab and, using the “Choose” Classifier option, find J48 under the trees submenu and click the “Start” button. The results can be seen in Figure 8.2.

Figure 8.1: Running the Weka Data Explorer

Figure 8.2: Running the Weka Data Explorer
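For intuition about the kind of model a decision-tree learner produces, the following sketch writes a small tree by hand as nested tests on attribute values and checks it against the nine training rows. The class name HandWrittenStockTree and both thresholds are assumptions picked by eye for this illustration; this is not the tree that J48 reports, and J48 may well split on different attributes and values.

```java
// Hypothetical, hand-written tree for illustration; NOT output from J48.
class HandWrittenStockTree {
    // Each 'if' plays the role of one internal node of a decision tree.
    // fromDayLow is accepted for completeness but not needed by this tree.
    static String classify(double sinceOpen, double fromDayLow, double fromDayHigh) {
        if (sinceOpen > 0.0) return "buy";      // hand-picked threshold
        if (fromDayHigh < -1.0) return "sell";  // hand-picked threshold
        return "hold";
    }

    public static void main(String[] args) {
        // The nine training examples from the sample ARFF file.
        double[][] samples = {
            {-0.2, 0.1, -0.22}, {-2.2, 0.0, -2.5}, {0.2, 0.21, -0.01},
            {-0.22, 0.12, -0.25}, {-2.0, 0.0, -2.1}, {0.28, 0.26, -0.04},
            {-0.12, 0.08, -0.14}, {-2.6, 0.1, -2.6}, {0.24, 0.25, -0.03}
        };
        String[] labels = {
            "hold", "sell", "buy", "hold", "sell", "buy", "hold", "sell", "buy"
        };
        int correct = 0;
        for (int i = 0; i < samples.length; i++) {
            String predicted = classify(samples[i][0], samples[i][1], samples[i][2]);
            if (predicted.equals(labels[i])) correct++;
        }
        System.out.println(correct + " of " + samples.length + " training rows matched");
    }
}
```

The point of a learner such as J48 is that it finds tests like these automatically from the labeled data, so nobody has to choose the attributes and thresholds by hand.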