12.2.9 Pay Careful Attention to What You Serialize

Recall how serialization, the default mechanism for marshalling and demarshalling objects, works. It starts with a single instance and records all the information associated with that instance. Some of the attribute values of the instance may themselves be instances of other classes. Those instances also get serialized, and so on. Serialization traverses all the instances reachable from the first instance and records all the information associated with each.

I previously mentioned that serialization uses the reflection API quite extensively and is rather slow. In the world of multithreaded servers, however, serialization has another, much more serious flaw: it essentially assumes that the object graph is static.

To see what I mean, assume for a moment that our instances of Account also keep records. That is, in addition to recording the new balance after every operation, they store a list of transactions to track the individual operations that occur:

    import java.io.Serializable;
    import java.util.Date;

    public class Transaction implements Serializable {
        public Money amount;            // Money is the chapter's own Serializable class
        public int typeOfTransaction;
        public Date whenMade;
    }

In addition, assume we have a persistence layer that serializes the instance of Account every now and then using a background thread. At first glance, this seems quite nice. Money needs to implement Serializable anyway to pass over the wire. By adding the words implements Serializable to our implementation of Account and implementing a simple background thread, we can restore accounts when the system crashes and restart accounts on different servers quite easily.

However, care needs to be taken when we do this. Suppose that we serialize a server in a background thread, and suppose that a client makes a request on the server while serialization is occurring. Unless we are careful, scenarios such as the following can happen:

1. Serialization starts. The balance is recorded by the serialization mechanism, and we begin serializing each of the transactions.
2. A request comes in, the balance changes, and a transaction is added to the end of the list of transactions.
3. Serialization finishes recording all the transactions, including the one that was just registered.

This is a problem. The balance we stored is inconsistent with the list of transactions we wrote: it's the balance from before the final request came in. The serialized copy of our implementation of Account, which is intended to be a correct copy of our server, is flawed because another thread changed the data while serialization was occurring.

What makes this problem especially insidious is that it doesn't crash the server. It simply corrupts the backup copy of your data. So when the server crashes due to some other problem, you won't be able to recover.

In practice, there's really only one solution to this problem: while serialization is going on, we need to block all operations that alter the state of the objects being serialized. Since serialization is slow and can traverse a large number of instances, this can be a problem. In the case of the bank example, this solution doesn't really cause trouble. If we serialize only instances of Account that haven't been active for a while, the risk of locking out a client who wants to access her money is minimal. In other applications, however, using serialization for persistence can lead to serious problems.

Serialization is fine for client applications. It's quite easy to design data objects so they are fast to serialize and involve passing small amounts of information over the wire. Furthermore, most clients are single-threaded anyway; since the serialization algorithm's data-corruption problems occur only when multiple threads are running, they rarely occur on the client side. Yet if you use serialization for persistence, logging, or to pass state between servers, you need to be careful.
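A minimal sketch of the blocking approach follows, assuming a simplified, hypothetical LockedAccount that stores its balance as a long and each transaction as a Long amount (the chapter's Account uses Money and Transaction instead). Every mutator is synchronized on the account, and so is a private writeObject method, the hook that Java serialization invokes reflectively. Because defaultWriteObject() writes the balance and then walks the transaction list while the serializing thread holds the account's monitor, no deposit can land between the two. The names roundTrip, isConsistent, and LockedAccountDemo are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the chapter's Account.
class LockedAccount implements Serializable {
    private static final long serialVersionUID = 1L;

    private long balance;
    private final List<Long> transactions = new ArrayList<>();

    // Every operation that alters state synchronizes on the account...
    public synchronized void deposit(long amount) {
        balance += amount;
        transactions.add(amount);
    }

    public synchronized long getBalance() {
        return balance;
    }

    public synchronized int getTransactionCount() {
        return transactions.size();
    }

    // True if the recorded balance matches the sum of the recorded transactions.
    public synchronized boolean isConsistent() {
        long sum = 0;
        for (long amount : transactions) {
            sum += amount;
        }
        return sum == balance;
    }

    // ...and so does the serialization hook. While defaultWriteObject() writes
    // balance and then the transaction list, this thread holds the monitor,
    // so deposit() calls from other threads wait until serialization finishes.
    private synchronized void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
    }

    // Serialize the account to bytes and read the copy back.
    static LockedAccount roundTrip(LockedAccount account)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(account);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (LockedAccount) in.readObject();
        }
    }
}

class LockedAccountDemo {
    public static void main(String[] args) throws Exception {
        LockedAccount account = new LockedAccount();
        // A "client" thread keeps changing the account...
        Thread client = new Thread(() -> {
            for (int i = 0; i < 20_000; i++) {
                account.deposit(1);
            }
        });
        client.start();
        // ...while this thread plays the background persistence layer.
        for (int i = 0; i < 20; i++) {
            LockedAccount copy = LockedAccount.roundTrip(account);
            System.out.println("snapshot " + i + ": balance=" + copy.getBalance()
                    + " consistent=" + copy.isConsistent());
        }
        client.join();
    }
}
```

This locks only the account itself. Objects that are reachable from it and also referenced elsewhere are protected only if their mutators use the same lock. The lock is also held for as long as serialization takes, which is one reason to keep the reachable graph small.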
The rules are simple:

Make serialization fast
    Limit the number of instances that can be reached by the serialization algorithm from any serializable instance.

Make serialization safe
    Make sure that the objects being serialized are locked during the serialization algorithm.
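One common way to apply the first rule is to mark fields that need not be saved as transient, which removes them, and everything reachable only through them, from the graph that serialization walks. The sketch below assumes a hypothetical History object that is large but can be rebuilt or reloaded elsewhere; FatAccount, LeanAccount, and SerializedSize are illustrative names, not classes from this chapter. It compares the number of bytes written with and without the transient modifier.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical: a large, rebuildable record of past activity.
class History implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> entries = new ArrayList<>();
}

class FatAccount implements Serializable {
    private static final long serialVersionUID = 1L;
    long balance;
    History history = new History();           // reachable, so serialized too
}

class LeanAccount implements Serializable {
    private static final long serialVersionUID = 1L;
    long balance;
    transient History history = new History(); // skipped by serialization
}

final class SerializedSize {
    private SerializedSize() {}

    // Number of bytes ObjectOutputStream writes for the object's whole graph.
    static int of(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.size();
    }
}

class TransientDemo {
    public static void main(String[] args) throws IOException {
        FatAccount fat = new FatAccount();
        LeanAccount lean = new LeanAccount();
        for (int i = 0; i < 1000; i++) {
            fat.history.entries.add("entry " + i);
            lean.history.entries.add("entry " + i);
        }
        System.out.println("history serialized: " + SerializedSize.of(fat) + " bytes");
        System.out.println("history transient:  " + SerializedSize.of(lean) + " bytes");
    }
}
```

The trade-off is that a transient field comes back with its default value (here, null) when the object is deserialized, so the class must rebuild or reload that state, for example in a private readObject method, if it needs it after a restore.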

12.2.10 Use Threading to Reduce Response-Time Variance