10.4 Scalable network applications

Server-side applications are often required to operate at full efficiency under extreme load. Efficiency, in this sense, relates to both the throughput of the server and the number of clients it can handle. In some cases, new clients are denied connections in order to conserve resources for existing clients.

The key to providing scalable network applications is to keep threading as efficient as possible. In many examples in this book, a new thread is created for each new client that connects to the server. This approach, although simple, is not ideal: the underlying management of a single thread consumes far more memory and processor time than a socket does.
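The thread-per-client model described above can be sketched in a few lines. This is a minimal Python illustration, not the book's own code; the names `serve` and `handle_client` are invented for the example:

```python
import socket
import threading

def handle_client(conn: socket.socket) -> None:
    """Echo bytes back until the client disconnects."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data)

def serve(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Accept loop for the naive model: one new OS thread per client."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()

    def accept_loop() -> None:
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:      # listening socket was closed
                return
            # A fresh thread for every connection -- simple, but each
            # thread costs stack memory and scheduler time.
            threading.Thread(target=handle_client, args=(conn,),
                             daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv
```

Each connected client ties up a whole thread for its lifetime, which is exactly the cost the benchmarks below expose.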

In benchmarking tests, a simple echo server, running on a Pentium IV 1.7 GHz with 768 MB of memory, was connected to three clients: a Pentium II 233 MHz with 128 MB of memory, a Pentium II 350 MHz with 128 MB of memory, and an Itanium 733 MHz with 1 GB of memory. This semitypical arrangement demonstrated that, using the approach outlined above, the server could serve only 1,008 connections before it reached an internal thread-creation limit. The maximum throughput was 2 Mbps. When a further 12,000 connections were attempted and rejected, the throughput fell to a mere 404 Kbps.

The server, although it had adequate memory and CPU resources to handle the additional clients, was unable to do so because it could not create any further threads: thread creation and destruction were consuming all of the available CPU time. To better manage thread creation, a technique known as thread pooling (demonstrated later in this chapter) can be employed.

When thread pooling was applied to the echo server example, the server performed considerably better. With 12,000 client connections, the server handled each one without fail. The throughput was 1.8 Mbps, vastly outperforming the software in the previous example, which obtained only 0.4 Mbps at the same CPU load. As a further 49,000 clients connected, however, the server began to drop 0.6% of the connections. At the same time, the CPU usage reached 95% of its peak capacity. At this load, the combined throughput was 3.8 Mbps.
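A pooled variant of the same echo server can be sketched with Python's standard `ThreadPoolExecutor` (again an illustrative stand-in, not the book's implementation; the pool size of 32 is an arbitrary choice):

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

def handle_client(conn: socket.socket) -> None:
    """Echo bytes back until the client disconnects."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data)

def serve_pooled(host: str = "127.0.0.1", port: int = 0,
                 workers: int = 32) -> socket.socket:
    """Echo server that reuses a fixed pool of worker threads
    instead of creating and destroying one thread per client."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()
    pool = ThreadPoolExecutor(max_workers=workers)

    def accept_loop() -> None:
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:      # listening socket was closed
                return
            pool.submit(handle_client, conn)  # queued; no thread churn

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv
```

Note the remaining limitation: a long-lived connection occupies a worker thread for its whole lifetime, so a pool of N workers can only actively serve N clients at once. This is one reason the completion-port model below goes further.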

Thread pooling unarguably provides a scalability bonus, but it is not acceptable to consume 95% of a server's resources just doing socket I/O, especially when other applications must also use the computer. To improve the server further, the threading model should be abandoned completely in favor of I/O completion ports (see Chapter 3). This methodology uses asynchronous callbacks that are managed at the operating-system level.
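The shape of this callback-driven model can be sketched with Python's `asyncio` streams. This is an analogy rather than the book's code: on Windows, Python's default (proactor) event loop is itself built on I/O completion ports, so each connection is resumed by OS-level completion notifications instead of occupying a thread:

```python
import asyncio

async def echo(reader: asyncio.StreamReader,
               writer: asyncio.StreamWriter) -> None:
    # Each completed read resumes this coroutine via the event
    # loop's callbacks -- no dedicated thread per connection.
    while data := await reader.read(1024):
        writer.write(data)
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def demo() -> bytes:
    """Start the server, round-trip one message, and shut down."""
    server = await asyncio.start_server(echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"ping")
    await writer.drain()
    reply = await reader.read(1024)
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply
```

Because idle connections cost only a small per-socket bookkeeping object rather than a thread stack, this model scales to far more clients before CPU or memory becomes the bottleneck.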

By modifying the above example to use I/O completion ports rather than thread pools, the server once again handled 12,000 clients without fail; however, this time the throughput was an impressive 5 Mbps. When the load was pushed to 50,000 clients, the server handled these connections virtually flawlessly and maintained a healthy throughput of 4.3 Mbps. The CPU usage at this load was 65%, which could have permitted other applications to run on the same server without conflicts.

In both the thread-pool and completion-port models, the memory usage at 50,000 connections was more than 240 MB, including non-paged pool usage of more than 145 MB. If the server had had less than this amount of physical memory available, the results would have been substantially worse.