
The last group of methods consists mostly of protected methods that provide hooks, which allow the serialization mechanism itself, rather than the data associated with a particular class, to be customized. These methods are:

    protected boolean enableResolveObject(boolean enable)
    protected Class resolveClass(ObjectStreamClass v)
    protected Object resolveObject(Object obj)
    protected Class resolveProxyClass(String[] interfaces)
    protected ObjectStreamClass readClassDescriptor()
    protected Object readObjectOverride()
    protected void readStreamHeader()
    public void registerValidation(ObjectInputValidation obj, int priority)
    public GetFields readFields()

These methods are more important to people who tailor the serialization algorithm to a particular use or develop their own implementation of serialization. As before, they also require a deeper understanding of the serialization algorithm, so I'll hold off on discussing them right now.

10.3 How to Make a Class Serializable

So far, we've focused on the mechanics of serializing an object. We've assumed we have a serializable object and discussed, from the point of view of client code, how to serialize it. The next step is discussing how to make a class serializable. There are four basic things you must do when you are making a class serializable. They are:

1. Implement the Serializable interface.
2. Make sure that instance-level, locally defined state is serialized properly.
3. Make sure that superclass state is serialized properly.
4. Override equals() and hashCode().

Let's look at each of these steps in more detail.

10.3.1 Implement the Serializable Interface

This is by far the easiest of the steps. The Serializable interface is an empty interface; it declares no methods at all. So implementing it amounts to adding "implements Serializable" to your class declaration.

Reasonable people may wonder about the utility of an empty interface. Rather than define an empty interface, and require class definitions to implement it, why not simply make every object serializable? The main reason not to do this is that there are some classes that don't have an obvious serialization. Consider, for example, an instance of File. An instance of File represents a file. Suppose, for example, it was created using the following line of code:

    File file = new File("c:\\temp\\foo");

It's not at all clear what should be written out when this is serialized. The problem is that the file itself has a different lifecycle than the serialized data. The file might be edited, or deleted entirely, while the serialized information remains unchanged. Or the serialized information might be used to restart the application on another machine, where C:\\temp\\foo is the name of an entirely different file.

Another example is provided by the Thread[4] class. A thread represents a flow of execution within a particular JVM. You would not only have to store the stack, and all the local variables, but also all the related locks and threads, and restart all the threads properly when the instance is deserialized.

[4] If you don't know much about threads, just wait a few chapters and then revisit this example. It will make more sense then.

Things get worse when you consider platform dependencies. In general, any class that involves native code is not really a good candidate for serialization.
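To make the opt-in concrete, here is a minimal sketch of how the marker interface behaves at runtime. The class names MarkerInterfaceDemo and Point and the helper canSerialize() are hypothetical and exist only for this illustration. The sketch assumes nothing beyond the standard java.io classes: a class that implements Serializable is accepted by ObjectOutputStream, while an instance of Thread, which does not implement it, is rejected with a NotSerializableException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; not part of the chapter's example code.
class MarkerInterfaceDemo {

    // Implementing the empty Serializable interface is the whole opt-in.
    static class Point implements Serializable {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Returns true if the serialization mechanism accepts the object,
    // false if it rejects it with a NotSerializableException.
    static boolean canSerialize(Object candidate) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Point(1, 2))); // true
        System.out.println(canSerialize(new Thread()));    // false
    }
}
```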

10.3.2 Make Sure That Instance-Level, Locally Defined State Is Serialized Properly

Class definitions contain variable declarations. The instance-level, locally defined variables (i.e., the nonstatic variables) are the ones that contain the state of a particular instance. For example, in our Money class, we declared one such field:

    public class Money extends ValueObject {
        private int _cents;
        ...
    }

The serialization mechanism has a nice default behavior: if all the instance-level, locally defined variables have values that are either serializable objects or primitive datatypes, then the serialization mechanism will work without any further effort on our part. For example, our implementations of Account, such as Account_Impl, would present no problems for the default serialization mechanism:

    public class Account_Impl extends UnicastRemoteObject implements Account {
        private Money _balance;
        ...
    }

While _balance doesn't have a primitive type, it does refer to an instance of Money, which is a serializable class.

If, however, some of the fields don't have primitive types, and don't refer to serializable classes, more work may be necessary. Consider, for example, the implementation of ArrayList from the java.util package. An ArrayList really has only two pieces of state:

    public class ArrayList extends AbstractList
            implements List, Cloneable, java.io.Serializable {
        private Object elementData[];
        private int size;
        ...
    }

But hidden in here is a huge problem: ArrayList is a generic container class whose state is stored as an array of objects. Arrays are first-class objects in Java, but an array of Object is only as serializable as the objects it happens to hold, and the array is usually longer than the number of elements actually stored in it. This means that ArrayList can't just implement the Serializable interface and rely on the default behavior. It has to provide extra information to help the serialization mechanism handle elementData. There are three basic solutions to this problem:

• Fields can be declared to be transient.
• The writeObject()/readObject() methods can be implemented.
• serialPersistentFields can be declared.
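Before turning to those three solutions, the default behavior described above can be sketched as a round trip through an in-memory stream. This is an illustration under stated assumptions, not the chapter's bank example: SimpleMoney and AccountRecord are hypothetical stand-ins (SimpleMoney has no ValueObject superclass, and AccountRecord is not a remote object), and roundTrip() is a helper invented for the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; the names below are not from the chapter's code.
class DefaultSerializationDemo {

    // Stand-in for Money: a single primitive field.
    static class SimpleMoney implements Serializable {
        private final int cents;

        SimpleMoney(int cents) { this.cents = cents; }

        int getCents() { return cents; }
    }

    // One primitive field plus one reference to a serializable object,
    // so default serialization needs no help.
    static class AccountRecord implements Serializable {
        private final int accountNumber;
        private final SimpleMoney balance;

        AccountRecord(int accountNumber, SimpleMoney balance) {
            this.accountNumber = accountNumber;
            this.balance = balance;
        }

        int getAccountNumber() { return accountNumber; }

        SimpleMoney getBalance() { return balance; }
    }

    // Serializes the object to a byte array and reads it back.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        AccountRecord original = new AccountRecord(42, new SimpleMoney(1250));
        AccountRecord copy = (AccountRecord) roundTrip(original);
        System.out.println(copy.getAccountNumber());      // 42
        System.out.println(copy.getBalance().getCents()); // 1250
        System.out.println(copy == original);             // false
    }
}
```

The copy is a distinct object whose nested SimpleMoney was written and restored along with it, which is all the default mechanism promises.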

10.3.2.1 Declaring transient fields

The first, and easiest, thing you can do is simply mark some fields using the transient keyword. In ArrayList, for example, elementData is actually declared to be a transient field:

    public class ArrayList extends AbstractList
            implements List, Cloneable, java.io.Serializable {
        private transient Object elementData[];
        private int size;
        ...
    }

This tells the default serialization mechanism to ignore the variable. In other words, the serialization mechanism simply skips over the transient variables. In the case of ArrayList, the default serialization mechanism would attempt to write out size, but ignore elementData entirely. This can be useful in two, usually distinct, situations:

The variable isn't serializable
    If the variable isn't serializable, then the serialization mechanism will throw an exception when it tries to serialize the variable. To avoid this, you can declare the variable to be transient.

The variable is redundant
    Suppose that the instance caches the result of a computation. Locally, we might want to store the result of the computation, in order to save some processor time. But when we send the object over the wire, we might worry more about consuming bandwidth and thus discard the cached computation, since we can always regenerate it later on.
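The second situation, a redundant cached value, can be sketched as follows. The Circle class and its methods are hypothetical and exist only for this illustration. The sketch relies on the fact that a transient reference field comes back as null after deserialization, so the cache is simply rebuilt on the next request.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; not part of the chapter's example code.
class TransientDemo {

    static class Circle implements Serializable {
        private final double radius;
        // Derived from radius, so it is skipped when the object is written.
        private transient Double cachedArea;

        Circle(double radius) { this.radius = radius; }

        // Computes the area on first use and caches it locally.
        double area() {
            if (cachedArea == null) {
                cachedArea = Math.PI * radius * radius;
            }
            return cachedArea;
        }

        boolean isAreaCached() { return cachedArea != null; }
    }

    // Serializes the object to a byte array and reads it back.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Circle original = new Circle(2.0);
        original.area();                                 // fills the cache
        Circle copy = (Circle) roundTrip(original);
        System.out.println(original.isAreaCached());     // true
        System.out.println(copy.isAreaCached());         // false
        System.out.println(copy.area() == original.area()); // true
    }
}
```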

10.3.2.2 Implementing writeObject and readObject

Suppose that the first case applies. A field takes values that aren't serializable. If the field is still an important part of the state of our instance, such as elementData in the case of an ArrayList, simply declaring the variable to be transient isn't good enough. We need to save and restore the state stored in the variable. This is done by implementing a pair of methods with the following signatures:

    private void writeObject(java.io.ObjectOutputStream out)
        throws IOException;
    private void readObject(java.io.ObjectInputStream in)
        throws IOException, ClassNotFoundException;

When the serialization mechanism starts to write out an object, it will check to see whether the class implements writeObject(). If so, the serialization mechanism will not use the default mechanism and will not write out any of the instance variables. Instead, it will call writeObject() and depend on the method to store all the important state. Here is ArrayList's implementation of writeObject():

    private synchronized void writeObject(java.io.ObjectOutputStream stream)
            throws java.io.IOException {
        stream.defaultWriteObject();
        stream.writeInt(elementData.length);
        for (int i = 0; i < size; i++)
            stream.writeObject(elementData[i]);
    }

The first thing this does is call defaultWriteObject(). defaultWriteObject() invokes the default serialization mechanism, which serializes all the nontransient, nonstatic instance variables. Next, the method writes out elementData.length and then calls the stream's writeObject() for each of the first size elements of elementData.

There's an important point here that is sometimes missed: readObject() and writeObject() are a pair of methods that need to be implemented together. If you do any customization of serialization inside one of these methods, you need to implement the other method. If you don't, the serialization algorithm will fail.

Unit Tests and Serialization

Unit tests are used to test a specific piece of functionality in a class. They are explicitly not end-to-end or application-level tests. It's often a good idea to adopt a unit-testing harness such as JUnit when developing an application. JUnit gives you an automated way to run unit tests on individual classes and is available from http://www.junit.org.

If you adopt a unit-testing methodology, then any serializable class should pass the following three tests:

• If it implements readObject(), it should implement writeObject(), and vice-versa.
• It is equal (using the equals() method) to a serialized copy of itself.
• It has the same hashcode as a serialized copy of itself.

Similar constraints hold for classes that implement the Externalizable interface.
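The pairing of writeObject() and readObject(), together with the round-trip checks from the sidebar, can be sketched with a small container. SimpleList is a hypothetical class modeled loosely on the ArrayList excerpt; it is not the JDK's implementation, and its readObject() is written here as the mirror image of the writeObject() shown above rather than copied from any source. Note that readObject() must read values back in exactly the order writeObject() stored them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical demo class; not part of the chapter's example code.
class CustomSerializationDemo {

    static class SimpleList implements Serializable {
        private transient Object[] elementData = new Object[10];
        private int size;

        void add(Object element) {
            if (size == elementData.length) {
                elementData = Arrays.copyOf(elementData, size * 2);
            }
            elementData[size++] = element;
        }

        int size() { return size; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();         // writes size
            out.writeInt(elementData.length); // writes the capacity
            for (int i = 0; i < size; i++) {
                out.writeObject(elementData[i]);
            }
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();           // restores size
            elementData = new Object[in.readInt()];
            for (int i = 0; i < size; i++) {
                elementData[i] = in.readObject();
            }
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof SimpleList)) {
                return false;
            }
            SimpleList that = (SimpleList) other;
            return Arrays.equals(Arrays.copyOf(elementData, size),
                                 Arrays.copyOf(that.elementData, that.size));
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(Arrays.copyOf(elementData, size));
        }
    }

    // Serializes the object to a byte array and reads it back.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SimpleList original = new SimpleList();
        original.add("first");
        original.add("second");
        SimpleList copy = (SimpleList) roundTrip(original);
        System.out.println(copy.equals(original));                  // true
        System.out.println(copy.hashCode() == original.hashCode()); // true
    }
}
```

The two checks in main() are the second and third tests from the sidebar; in a real project they would more naturally live in a test harness such as JUnit.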

10.3.2.3 Declaring serialPersistentFields

The final option that can be used is to explicitly declare which fields should be stored by the serialization mechanism. This is done using a special static final variable called serialPersistentFields, as shown in the following code snippet:

    private static final ObjectStreamField[] serialPersistentFields =
        { new ObjectStreamField("size", Integer.TYPE), .... };

This line of code declares that the field named size, which is of type int, is a serial persistent field and will be written to the output stream by the serialization mechanism.

Declaring serialPersistentFields is almost the opposite of declaring some fields transient. The meaning of transient is, "This field shouldn't be stored by serialization," and the meaning of serialPersistentFields is, "These fields should be stored by serialization."

But there is one important difference between declaring some variables to be transient and others to be serialPersistentFields. In order to declare variables to be transient, they must be locally declared. In other words, you must have access to the code that declares the variable. There is no such requirement for serialPersistentFields. You simply provide the name of the field and the type.

What if you try to do both? That is, suppose you declare some variables to be transient, and then also provide a definition for serialPersistentFields? The answer is that the transient keyword is ignored; the definition of serialPersistentFields is definitive.

So far, we've talked only about instance-level state. What about class-level state? Suppose you have important information stored in a static variable? Static variables won't get saved by serialization unless you add special code to do so. In our context, shipping objects over the wire between clients and servers, statics are usually a bad idea anyway.
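The interaction between transient and serialPersistentFields can be sketched with a small class. Counter and its fields are hypothetical names chosen for this illustration. The sketch assumes the behavior described above: a field listed in serialPersistentFields is stored even though it is marked transient, and a field that is not listed is skipped even though it is not transient.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

// Hypothetical demo class; not part of the chapter's example code.
class PersistentFieldsDemo {

    static class Counter implements Serializable {
        // Only the fields named here are handled by default serialization.
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("size", Integer.TYPE)
        };

        private transient int size; // transient, but listed: stored
        private String note;        // not transient, but not listed: skipped

        Counter(int size, String note) {
            this.size = size;
            this.note = note;
        }

        int getSize() { return size; }

        String getNote() { return note; }
    }

    // Serializes the object to a byte array and reads it back.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Counter copy = (Counter) roundTrip(new Counter(7, "scratch"));
        System.out.println(copy.getSize()); // 7
        System.out.println(copy.getNote()); // null
    }
}
```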

10.3.3 Make Sure That Superclass State Is Handled Correctly

After you've handled the locally declared state, you may still need to worry about variables declared in a superclass. If the superclass implements the Serializable interface, then you don't need to do anything. The serialization mechanism will handle everything for you, either by using default serialization or by invoking writeObject()/readObject() if they are declared in the superclass.

If the superclass doesn't implement Serializable, you will need to store its state. There are two different ways to approach this. You can use serialPersistentFields to tell the serialization mechanism about some of the superclass instance variables, or you can use writeObject()/readObject() to handle the superclass state explicitly. Both of these, unfortunately, require you to know a fair amount about the superclass. If you're getting the .class files from another source, you should be aware that versioning issues can cause some really nasty problems. If you subclass a class, and that class's internal representation of instance-level state changes, you may not be able to load in your serialized data. While you can sometimes work around this by using a sufficiently convoluted readObject() method, this may not be a solvable problem. We'll return to this later. However, be aware that the ultimate solution may be to just implement the Externalizable interface instead, which we'll talk about later.

Another aspect of handling the state of a nonserializable superclass is that nonserializable superclasses must have a zero-argument constructor. This isn't important for serializing out an object, but it's incredibly important when deserializing an object. Deserialization works by creating an instance of a class and filling out its fields correctly. During this process, the deserialization algorithm doesn't actually call any of the serialized class's constructors, but does call the zero-argument constructor of the first nonserializable superclass.
If there isn't a zero-argument constructor, then the deserialization algorithm can't create instances of the class, and the whole process fails. If you can't create a zero-argument constructor in the first nonserializable superclass, you'll have to implement the Externalizable interface instead.

Simply adding a zero-argument constructor might seem a little problematic. Suppose the object already has several constructors, all of which take arguments. If you simply add a zero-argument constructor, then the serialization mechanism might leave the object in a half-initialized, and therefore unusable, state. However, since serialization will supply the instance variables with correct values from an active instance immediately after instantiating the object, the only way this problem could arise is if the constructors actually do something with their arguments besides setting variable values.

If all the constructors take arguments and actually execute initialization code as part of the constructor, then you may need to refactor a bit. The usual solution is to move the local initialization code into a new method (usually named something like initialize()), which is then called from the original constructor. That is, you change code that looks like:

    public MyObject(arglist) {
        // set local variables from arglist
        // perform local initialization
    }

to something that looks like:

    private MyObject() {
        // zero-argument constructor, invoked by serialization and never
        // by any other piece of code. Note that it doesn't call initialize()
    }

    public MyObject(arglist) {
        // set local variables from arglist
        initialize();
    }

    private void initialize() {
        // perform local initialization
    }

After this is done, writeObject()/readObject() should be implemented, and readObject() should end with a call to initialize().
Sometimes this will result in code that simply invokes the default serialization mechanism, as in the following snippet:

    private void writeObject(java.io.ObjectOutputStream stream)
            throws java.io.IOException {
        stream.defaultWriteObject();
    }

    private void readObject(java.io.ObjectInputStream stream)
            throws java.io.IOException, ClassNotFoundException {
        stream.defaultReadObject();
        initialize();
    }

If creating a zero-argument constructor is difficult (for example, you don't have the source code for the superclass), your class will need to implement the Externalizable interface instead of Serializable.
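Putting the pieces of this section together, here is a minimal sketch of a serializable class with a nonserializable superclass. Shape and Square are hypothetical classes invented for the illustration. The sketch assumes the superclass exposes its state through a protected field and has a zero-argument constructor that the subclass can access (protected here); the subclass stores the superclass state by hand in writeObject() and ends readObject() with a call to initialize().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; not part of the chapter's example code.
class SuperclassStateDemo {

    // Nonserializable superclass.
    static class Shape {
        protected String label;

        // Zero-argument constructor: run by deserialization.
        protected Shape() { this.label = "unlabeled"; }

        protected Shape(String label) { this.label = label; }
    }

    static class Square extends Shape implements Serializable {
        private int side;
        private transient int area; // derived; rebuilt by initialize()

        Square(String label, int side) {
            super(label);
            this.side = side;
            initialize();
        }

        private void initialize() { area = side * side; }

        String getLabel() { return label; }

        int getSide() { return side; }

        int getArea() { return area; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes side
            out.writeObject(label);   // superclass state, saved by hand
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();           // restores side
            label = (String) in.readObject(); // replaces "unlabeled" from Shape()
            initialize();                     // Square's constructor was not run
        }
    }

    // Serializes the object to a byte array and reads it back.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Square copy = (Square) roundTrip(new Square("tile", 3));
        System.out.println(copy.getLabel()); // tile
        System.out.println(copy.getSide());  // 3
        System.out.println(copy.getArea());  // 9
    }
}
```

If the two lines that handle label were removed, the copy would come back with the label "unlabeled", because the only superclass code that runs during deserialization is the zero-argument constructor.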

10.3.4 Override equals and hashCode if Necessary