
The second type of version problem arises from local changes to a serializable class. Suppose, for example, that in our bank example, we want to add the possibility of handling different currencies. To do so, we define a new class, Currency, and change the definition of Money:

    public class Money extends ValueObject {
        public float amount;
        public Currency typeOfMoney;
    }

This completely changes the definition of Money but doesn't change the object hierarchy at all.

The important distinction between the two types of versioning problems is that the first type can't really be repaired. If you have old data lying around that was serialized using an older class hierarchy, and you need to use that data, your best option is probably something along the lines of the following:

1. Using the old class definitions, write an application that deserializes the data into instances and writes the instance data out in a neutral format, say as tab-delimited columns of text.

2. Using the new class definitions, write a program that reads in the neutral-format data, creates instances of the new classes, and serializes these new instances.

The second type of versioning problem, on the other hand, can be handled locally, within the class definition.
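The two-step migration through a neutral format can be sketched roughly as follows. This is only an illustration under stated assumptions: OldMoney and NewMoney are hypothetical stand-ins for the old and new class definitions (in practice the two steps would be separate programs compiled against different versions of the same class), and the "Money<TAB>cents" column layout and the default currency code are invented for the example.

```java
import java.io.*;

public class NeutralFormatMigration {
    // Hypothetical stand-in for the class as the old definitions described it.
    static class OldMoney implements Serializable {
        int cents;
        OldMoney(int cents) { this.cents = cents; }
    }

    // Hypothetical stand-in for the redesigned class.
    static class NewMoney implements Serializable {
        float amount;
        String currencyCode;
        NewMoney(float amount, String currencyCode) {
            this.amount = amount;
            this.currencyCode = currencyCode;
        }
    }

    // Step 1: deserialize with the old definitions and emit one
    // tab-delimited line of text (assumed layout: type name, cents).
    static String exportToNeutral(byte[] oldSerialized)
            throws IOException, ClassNotFoundException {
        OldMoney old = (OldMoney) deserialize(oldSerialized);
        return "Money\t" + old.cents;
    }

    // Step 2: parse the neutral line, build an instance of the new class,
    // and serialize that new instance.
    static byte[] importFromNeutral(String line, String defaultCurrency)
            throws IOException {
        String[] columns = line.split("\t");
        int cents = Integer.parseInt(columns[1]);
        return serialize(new NewMoney(cents / 100.0f, defaultCurrency));
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] oldData = serialize(new OldMoney(1250));
        String neutral = exportToNeutral(oldData);
        NewMoney migrated = (NewMoney) deserialize(importFromNeutral(neutral, "USD"));
        System.out.println(neutral + " -> " + migrated.amount + " " + migrated.currencyCode);
    }
}
```

The text file in the middle is the point of the exercise: it depends on neither class hierarchy, so the exporter and the importer never need to load each other's class definitions.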

10.5.2 How Serialization Detects When a Class Has Changed

In order for serialization to gracefully detect when a versioning problem has occurred, it needs to be able to detect when a class has changed. As with all the other aspects of serialization, there is a default way that serialization does this, and there is a way for you to override the default.

The default involves a hashcode. Serialization creates a single hashcode, of type long, from the following information:

• The class name and modifiers
• The names of any interfaces the class implements
• Descriptions of all methods and constructors except private methods and constructors
• Descriptions of all fields except private static and private transient fields

This single long, called the class's stream unique identifier (often abbreviated suid), is used to detect when a class changes. It is an extraordinarily sensitive index. For example, suppose we add the following method to Money:

    public boolean isBigBucks() {
        return _cents > 5000;
    }

We haven't changed, added, or removed any fields; we've simply added a method with no side effects at all. But adding this method changes the suid. Prior to adding it, the suid was 6625436957363978372L; afterwards, it was -3144267589449789474L. Moreover, if we had made isBigBucks a protected method, the suid would have been 4747443272709729176L.

These numbers can be computed using the serialver program that ships with the JDK. For example, these were all computed by typing:

    serialver com.ora.rmibook.chapter10.Money

at the command line for slightly different versions of the Money class.

The default behavior for the serialization mechanism is a classic "better safe than sorry" strategy. The serialization mechanism uses the suid, which defaults to an extremely sensitive index, to tell when a class has changed. If it has, the serialization mechanism refuses to create instances of the new class using data that was serialized with the old classes; deserialization fails with a java.io.InvalidClassException.
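The same value that serialver reports can also be read from inside a program through java.io.ObjectStreamClass. The sketch below assumes a minimal, hypothetical Money class nested inside a demo class; because its name and members differ from the chapter's Money, the number it prints will not match the values quoted above.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SuidDemo {
    // Hypothetical minimal stand-in for Money. It declares no
    // serialVersionUID, so serialization computes the default suid.
    static class Money implements Serializable {
        protected int _cents;
    }

    // Returns the suid that serialization associates with the class.
    static long suidOf(Class<?> c) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(c);
        if (descriptor == null) {
            // lookup returns null for classes that are not serializable
            throw new IllegalArgumentException(c.getName() + " is not serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("suid of Money: " + suidOf(Money.class));
    }
}
```

To watch the sensitivity described above, one could add a non-private method to the nested Money class, recompile, and run the program again; the printed value should change even though no field was touched.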

10.5.3 Implementing Your Own Versioning Scheme

While this is reasonable as a default strategy, it would be painful if serialization didn't provide a way to override the default behavior. Fortunately, it does. Serialization uses the default suid only if a class definition doesn't provide one. That is, if a class definition includes a static final long named serialVersionUID, then serialization will use that static final long value as the suid. In the case of our Money example, if we included the line:

    private static final long serialVersionUID = 1;

in our source code, then the suid would be 1, no matter how many changes we made to the rest of the class. Explicitly declaring serialVersionUID allows us to change the class, and add convenience methods such as isBigBucks, without losing backwards compatibility.

serialVersionUID doesn't have to be private. However, it must be static, final, and long.

The downside to using serialVersionUID is that, if a significant change is made (for example, if a field is added to the class definition), the suid will not reflect this difference. This means that the deserialization code might not detect an incompatible version of a class. Again, using Money as an example, suppose we had:

    public class Money extends ValueObject {
        private static final long serialVersionUID = 1;
        protected int _cents;
    }

and we migrated to:

    public class Money extends ValueObject {
        private static final long serialVersionUID = 1;
        public float amount;
        public Currency typeOfMoney;
    }

The serialization mechanism won't detect that these are completely incompatible classes. Instead, when it tries to create the new instance, it will throw away all the data it reads in. Recall that, as part of the metadata, the serialization algorithm records the name and type of each field. Since it can't find the old fields in the new class during deserialization, it simply discards the information, and the new fields are left at their default values.

The solution to this problem is to implement your own versioning inside of readObject and writeObject. Your writeObject method should begin by writing a version number:

    private void writeObject(java.io.ObjectOutputStream out) throws IOException {
        out.writeInt(VERSION_NUMBER);
        // ...
    }

In addition, your readObject code should start with a switch statement based on the version number:

    private void readObject(java.io.ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        int version = in.readInt();
        switch (version) {
            // version-specific demarshalling code
            // ...
        }
    }

Doing this will enable you to explicitly control the versioning of your class. In addition to the added control you gain over the serialization process, there is an important consequence you ought to consider before doing this. As soon as you start to explicitly version your classes, defaultWriteObject and defaultReadObject lose a lot of their usefulness. Trying to control versioning puts you in the position of explicitly writing all the marshalling and demarshalling code. This is a trade-off you might not want to make.
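A fuller sketch of this version-number scheme might look like the following. Several details are assumptions made for illustration only: the class name VersionedMoney, the two stream layouts (version 1 holding whole cents, version 2 holding an amount and a currency code), the "USD" default for old data, and the versionToWrite switch, which exists solely so that one program can emulate an older writer. The fields are marked transient here because all marshalling is done by hand.

```java
import java.io.*;

public class VersionedMoney implements Serializable {
    private static final long serialVersionUID = 1L; // fixed, so class edits don't change the suid
    private static final int VERSION_NUMBER = 2;     // current stream layout

    // Demo-only switch (an assumption of this sketch): lets a single JVM
    // emulate a writer that still produced the version 1 layout.
    static int versionToWrite = VERSION_NUMBER;

    // transient: nothing is left to defaultWriteObject/defaultReadObject
    private transient float amount;
    private transient String currencyCode;

    public VersionedMoney(float amount, String currencyCode) {
        this.amount = amount;
        this.currencyCode = currencyCode;
    }

    public float getAmount() { return amount; }
    public String getCurrencyCode() { return currencyCode; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.writeInt(versionToWrite); // the version number always comes first
        if (versionToWrite == 1) {
            out.writeInt(Math.round(amount * 100)); // hypothetical v1 layout: cents only
        } else {
            out.writeFloat(amount);                 // hypothetical v2 layout
            out.writeUTF(currencyCode);
        }
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        int version = in.readInt();
        switch (version) {
            case 1: // old data: convert cents, assume a default currency
                amount = in.readInt() / 100.0f;
                currencyCode = "USD";
                break;
            case 2:
                amount = in.readFloat();
                currencyCode = in.readUTF();
                break;
            default: // fail loudly instead of silently discarding data
                throw new InvalidObjectException("Unknown version: " + version);
        }
    }

    // Serializes and immediately deserializes an instance.
    static VersionedMoney roundTrip(VersionedMoney m)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(m);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return (VersionedMoney) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        VersionedMoney copy = roundTrip(new VersionedMoney(12.5f, "EUR"));
        System.out.println(copy.getAmount() + " " + copy.getCurrencyCode());
    }
}
```

Note how every field is written and read explicitly in each branch; this is the cost mentioned above, since each new layout adds another case that has to be maintained by hand.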

10.6 Performance Issues