
Handle what is usually called business logic. That is, they respond to client requests, manipulate data, and occasionally store that data in a database. In particular, servers respond to requests from a client and make requests of the database.

Database system
Responsible for long-term persistence and integrity of important data. This already exists; our main task with respect to it will be figuring out how to manage the communication between our servers and it.

Given this, the main architectural questions that need to be resolved are: how many servers are there, and what are they? There are two obvious choices: a single instance of Bank or many instances of Account. In the first case, there is a single server whose interface contains methods such as the following:

    public Money getBalance(Account account) throws RemoteException;
    public void makeDeposit(Account account, Money amount)
        throws RemoteException, NegativeAmountException;
    public void makeWithdrawal(Account account, Money amount)
        throws RemoteException, OverdraftException, NegativeAmountException;

Note that each method is passed an account description parameter (presumably, though not necessarily, as a value object). This immediately suggests the second alternative: make each account a separate server. The corresponding methods look similar; they simply have one fewer argument:

    public Money getBalance() throws RemoteException;
    public void makeDeposit(Money amount)
        throws RemoteException, NegativeAmountException;
    public void makeWithdrawal(Money amount)
        throws RemoteException, OverdraftException, NegativeAmountException;

In this scenario, there are many instances of a class that implements Account. These instances, however, are not running in distinct JVMs. Instead, many small server objects all reside inside a few JVMs. Hence, they are implicitly either sharing, or contending for, resources. In later chapters, I refer to these two options as the bank option and the accounts option, respectively.
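To make the difference between the two options concrete, here is a minimal sketch of both interface shapes living in one JVM. It is an illustration under simplifying assumptions, not the application's actual code: amounts are plain `long` cents instead of `Money`, accounts are named by a hypothetical `String` ID instead of an account value object, `IllegalArgumentException` stands in for `NegativeAmountException`, and the objects are never exported over RMI. The class names `BankVersusAccounts`, `AccountImpl`, and `BankImpl` are invented for this example.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.HashMap;
import java.util.Map;

public class BankVersusAccounts {

    /** Accounts option: one server object per account, so no account argument. */
    interface Account extends Remote {
        long getBalance() throws RemoteException;
        void makeDeposit(long cents) throws RemoteException;
    }

    /** Bank option: a single server object; every call names the account. */
    interface Bank extends Remote {
        long getBalance(String accountId) throws RemoteException;
        void makeDeposit(String accountId, long cents) throws RemoteException;
    }

    /** One small server object; many of these can share a single JVM. */
    static class AccountImpl implements Account {
        private long balance;

        public synchronized long getBalance() {
            return balance;
        }

        public synchronized void makeDeposit(long cents) {
            // Stand-in for the NegativeAmountException in the text.
            if (cents < 0) {
                throw new IllegalArgumentException("negative amount");
            }
            balance += cents;
        }
    }

    /** The bank looks up the account itself, then delegates to it. */
    static class BankImpl implements Bank {
        private final Map<String, AccountImpl> accounts = new HashMap<>();

        // Assumption for brevity: unknown IDs create an empty account.
        private synchronized AccountImpl find(String accountId) {
            return accounts.computeIfAbsent(accountId, k -> new AccountImpl());
        }

        public long getBalance(String accountId) {
            return find(accountId).getBalance();
        }

        public void makeDeposit(String accountId, long cents) {
            find(accountId).makeDeposit(cents);
        }
    }

    public static void main(String[] args) throws RemoteException {
        Bank bank = new BankImpl();
        bank.makeDeposit("checking", 500);
        System.out.println(bank.getBalance("checking")); // prints 500

        Account savings = new AccountImpl();
        savings.makeDeposit(250);
        System.out.println(savings.getBalance()); // prints 250
    }
}
```

In the bank option the lookup from account description to account state happens inside the server on every call; in the accounts option the client has already selected the account by obtaining a reference to it.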

5.6 Problems That Arise in Distributed Applications

Now that we've got a preliminary description of how the application will be structured, a question arises: what is the role of the network in all of this? Or, stated more precisely: what problems does making the application distributed cause? The answer is that there are two main new problems associated with building a distributed application: the possibility of partial failures and the latency of the network. Let's look at both in more detail.

5.6.1 Partial Failures

A partial failure occurs when one of the programs becomes inaccessible to the other programs that are running. This can happen because the program has crashed or because the network is experiencing problems. In either case, the possibility of partial failure can cause problems for the application designer. Consider, for example, our typical use case. Step 5 stated:

After choosing an action, the user is given a list of valid accounts from which to choose (e.g., Checking or Savings). The user chooses an account, and then the transaction proceeds.

This translates, in the accounts option, into:

The client program gets a stub for the appropriate account object from a server somewhere. It then proceeds to make method calls on the account server until the transaction is completed.

The stub, as in RMI applications generally, plays a role very similar to that of an object reference. That is, it exposes methods that the client application calls. And this is where partial failure is particularly insidious and unexpected. Suppose the server crashes, or becomes otherwise unavailable, in the middle of a transaction. How does an application gracefully recover? The client application cannot know what the server did before becoming inaccessible, and the server, when it becomes accessible again, doesn't know whether it received all the messages that the client sent.

In a single-process program, the analogous scenario is this: an object gets a reference to another object and calls a method on it. But, even though the reference is valid, the object referred to isn't there, and an exception is thrown. This is a very strange thing. It can happen in languages such as C++, where programmers are explicitly responsible for memory management. But this should never happen in a garbage-collected language.

I said this should never happen. In point of fact, you can run into situations where you have a reference to an object that doesn't exist because of the way threads are defined in the Java Language Specification. We'll discuss this in more detail in Chapter 11.
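The ambiguity described in this section can be simulated without a real network. The following is a minimal sketch, assuming a simplified account interface with `long` cents instead of `Money`; `PartialFailureDemo`, `FlakyAccount`, `Failure`, and `tryDeposit` are hypothetical names invented for this illustration. The fake server throws `RemoteException` either before or after applying a deposit, and the client sees exactly the same thing in both cases.

```java
import java.rmi.RemoteException;

public class PartialFailureDemo {

    /** Hypothetical stand-in for a remote account; amounts are in cents. */
    interface Account {
        long getBalance() throws RemoteException;
        void makeDeposit(long cents) throws RemoteException;
    }

    /** When the simulated failure occurs relative to the server's work. */
    enum Failure { NONE, BEFORE_APPLYING, AFTER_APPLYING }

    /** Simulates a server-side account reached over an unreliable link. */
    static class FlakyAccount implements Account {
        private long balance;
        private Failure nextFailure = Failure.NONE;

        /** Arranges for the next deposit call to fail in the given way. */
        void failNext(Failure failure) {
            nextFailure = failure;
        }

        public long getBalance() {
            return balance;
        }

        public void makeDeposit(long cents) throws RemoteException {
            Failure failure = nextFailure;
            nextFailure = Failure.NONE;
            if (failure == Failure.BEFORE_APPLYING) {
                // The request never reached the server logic.
                throw new RemoteException("request lost");
            }
            balance += cents;
            if (failure == Failure.AFTER_APPLYING) {
                // The work was done, but the reply never reached the client.
                throw new RemoteException("reply lost");
            }
        }
    }

    /** Returns true only if the client saw the call complete normally. */
    static boolean tryDeposit(Account account, long cents) {
        try {
            account.makeDeposit(cents);
            return true;
        } catch (RemoteException e) {
            // The client cannot tell from here whether the deposit was applied.
            return false;
        }
    }

    public static void main(String[] args) {
        FlakyAccount first = new FlakyAccount();
        first.failNext(Failure.BEFORE_APPLYING);
        boolean firstOk = tryDeposit(first, 100);

        FlakyAccount second = new FlakyAccount();
        second.failNext(Failure.AFTER_APPLYING);
        boolean secondOk = tryDeposit(second, 100);

        // Same client-side outcome, different server-side state.
        System.out.println(firstOk + " " + first.getBalance());   // prints "false 0"
        System.out.println(secondOk + " " + second.getBalance()); // prints "false 100"
    }
}
```

Because the client-side result is identical in both runs, blindly retrying the deposit after an exception could apply it twice, while giving up could lose it entirely.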

5.6.2 Network Latency