Network Latency Problems That Arise in Distributed Applications

Consider, for example, our typical use case. Step 5 stated: "After choosing an action, the user is given a list of valid accounts from which to choose (e.g., Checking or Savings). The user chooses an account and then the transaction proceeds." This translates, in the account option, into: the client program gets a stub for the appropriate account object from a server somewhere. It then proceeds to make method calls on the account server until the transaction is completed. The stub, as with RMI applications, plays a role very similar to that of an object reference. That is, it exposes methods that the client application calls.

This is where partial failure is particularly insidious and unexpected. Suppose the server crashes, or becomes otherwise unavailable, in the middle of a transaction. How does an application gracefully recover? The client application cannot know what the server did before becoming inaccessible, and the server, when it becomes accessible again, doesn't know if it received all the messages that the client sent.

In a single-process program, the analogous scenario is this: an object gets a reference to another object and calls a method on it. But even though the reference is valid, the object referred to isn't there, and an exception is thrown. This is a very strange thing. It can happen in languages such as C++, where programmers are explicitly responsible for memory management. But this should never happen in a garbage-collected language.

I said this should never happen. In point of fact, you can run into situations where you have a reference to an object that doesn't exist because of the way threads are defined in the Java Language Specification. We'll discuss this in more detail in Chapter 11.
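To make the partial-failure problem concrete, here is a minimal sketch in Java. Everything in it is hypothetical: the `Account` interface, the `FlakyAccountStub` class, and the retry helper are illustrative names, not the bank example's actual code, and the "stub" is a local stand-in that simulates a server which applies a withdrawal and then becomes unreachable before its reply arrives. The point is only that a `RemoteException` tells the client the call failed somewhere, not whether the server did the work.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface; name and method are illustrative only.
interface Account extends Remote {
    void withdraw(int cents) throws RemoteException;
}

// Local stand-in for a real RMI stub. It simulates a server that applies
// the withdrawal and then becomes unreachable before the reply gets back.
class FlakyAccountStub implements Account {
    private int balanceCents;          // state that lives "on the server"
    private boolean failAfterApplying; // simulate one crash after the work is done

    FlakyAccountStub(int balanceCents, boolean failAfterApplying) {
        this.balanceCents = balanceCents;
        this.failAfterApplying = failAfterApplying;
    }

    @Override
    public void withdraw(int cents) throws RemoteException {
        balanceCents -= cents; // the server-side effect happens first
        if (failAfterApplying) {
            failAfterApplying = false; // fail only once
            throw new RemoteException("connection lost before reply");
        }
    }

    int serverSideBalance() {
        return balanceCents;
    }
}

class PartialFailureDemo {
    // Naive recovery: retry whenever the call fails.
    // Returns the attempt that succeeded, or -1 if every attempt failed.
    static int withdrawWithNaiveRetry(Account account, int cents, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                account.withdraw(cents);
                return attempt;
            } catch (RemoteException e) {
                // The client cannot tell from here whether the server
                // applied the withdrawal before it became unreachable.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        FlakyAccountStub stub = new FlakyAccountStub(10_000, true);
        int attempts = withdrawWithNaiveRetry(stub, 2_500, 3);
        System.out.println("attempts: " + attempts);
        System.out.println("server-side balance: " + stub.serverSideBalance());
    }
}
```

In this simulation the user asked for a single 2,500-cent withdrawal, but the naive retry applies it twice, because the first attempt succeeded on the server before the failure was reported. A real design needs some way to detect or tolerate such duplicates; which mechanism is appropriate depends on the application.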

5.6.2 Network Latency

The other major problem I mentioned is network latency. Put succinctly, invoking a method or transferring data over a network is slow. People designing distributed systems usually estimate that the overhead of a distributed method call on a fast local area network is on the order of a few milliseconds. However, on a congested LAN, or when calls have to go across the Internet, method calls can be much slower. And if data needs to be sent or returned, the remote call takes even longer.

Remote method calls have two main effects: sending data over the wire slows an application down, and doing so slows down other distributed applications due to increased network congestion. This last point is very important. Designers of distributed applications can't assume that their program is the only one on the network. During peak business hours, lots of data will be flowing across the network. This means that (1) there is less bandwidth available to any given application, and (2) using lots of bandwidth impacts the performance of all the distributed programs on the network, not just the one using lots of bandwidth.

To demonstrate this, try the following experiment:

1. Clear your web browser's cache and go to a static web site with a lot of images.
2. Click your browser's Reload button.

The difference in speed between the first and second viewings of the web page is mostly due to the difference between having the images cached on your local hard drive versus downloading them across the network. You still may have to download the text in the page. But most of the images should be cached by your web browser. In other words, the difference is mostly due to network latency.

Clearly, if communicating between two programs across a network is expensive, then a well-designed application needs to somehow account for this, minimizing the number of calls made across the network, the amount of data sent across the network, and the time the user has to wait because of network latency. These are actually three different, and sometimes conflicting, goals. For example, using compression may reduce the amount of data sent over the network but may result in the user waiting longer because of the time it takes to uncompress the data.
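As a rough illustration of why the number of calls matters separately from the amount of data, the following Java sketch uses a deliberately simple cost model: a fixed overhead per remote call plus a transfer time proportional to the bytes sent. The class name, the 2 ms per-call overhead (standing in for "a few milliseconds"), and the bandwidth figure are all assumptions chosen for the arithmetic, not measurements of any real network.

```java
class LatencyEstimate {
    // Illustrative assumptions only, not measured values.
    static final double PER_CALL_OVERHEAD_MS = 2.0; // "a few milliseconds" on a fast LAN
    static final double BYTES_PER_MS = 1_000.0;     // assumed available bandwidth

    // Simple model: fixed cost per remote call plus time to move the data.
    static double estimateMillis(int remoteCalls, long totalBytes) {
        return remoteCalls * PER_CALL_OVERHEAD_MS + totalBytes / BYTES_PER_MS;
    }

    public static void main(String[] args) {
        // Ten fine-grained calls that together return 1,000 bytes...
        double fineGrained = estimateMillis(10, 1_000);
        // ...versus one coarse-grained call returning the same 1,000 bytes.
        double coarseGrained = estimateMillis(1, 1_000);

        System.out.println("ten small calls: " + fineGrained + " ms");
        System.out.println("one large call:  " + coarseGrained + " ms");
    }
}
```

Under these assumptions the ten small calls cost 21 ms and the single large call costs 3 ms, even though both move the same data. The model ignores congestion, serialization, and server processing time, so it shows only the direction of the effect, not its size in a real deployment.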

Chapter 6. Deciding on the Remote Server

In Chapter 5, we briefly discussed the architecture of the bank example. In addition, we discussed the fundamental problems that arise when building distributed applications. In this chapter, I build on that discussion by introducing a set of basic evaluation criteria that will help you refine designs and choose between various design options.

6.1 A Little Bit of Bias

"Good code invariably has small methods and small objects... no one thing I do to systems provides as much help as breaking it into more pieces."
-- Kent Beck, Smalltalk Best Practice Patterns

The experienced distributed systems programmer will notice a certain bias in this chapter [1] towards what I call small-scale, semi-independent servers. The small-scale part of this is easy to explain. By and large, I build servers with very limited functionality, as little as is reasonable given the restrictions imposed by the fact that we're building a distributed system. Then, I tend to give them large interfaces, exposing the same functionality in multiple ways.

[1] To be honest, the bias permeates the rest of the book, too. If I didn't have opinions, I wouldn't be an author.

As far as I know, there's no knockdown argument in favor of this style of designing and building programs. Many programmers who have built object-oriented systems tend to agree with Kent Beck. [2] In my experience, his quote almost holds for distributed systems as well: building small servers leads to flexible designs that evolve gracefully over time. However, there is a slight