Focus on shared resources Predict the effects of parallelism Assess the costs of data conversions Determine whether batch processing is faster
13.4.5.4 Consider the total work done and the design overhead
Try stripping your design to the bare essentials or going back to the specification. Consider how to create a special-purpose implementation that handles the specification for a specific set of inputs. This can give you an estimate of the actual work your application will do. Now consider your design and look at the overheads added by the design for each piece of functionality. This provides a good way to focus on the overheads and determine if they are excessive.13.4.5.5 Focus on shared resources
Shared resources almost always cause performance problems if they have not been designed to optimize performance. Ensure that any simulation correctly simulates the sharing of resources, and use prediction analyses such as those in Section 13.4.1.3 earlier in this chapter to predict the behavior of multiple objects using shared resources.13.4.5.6 Predict the effects of parallelism
Consider what happens when your design is spread over multiple threads, processes, CPUs, machines, etc. This analysis can be quite difficult without a simulation and test bed, but it can help to identify whether the design limits the use of parallelism.13.4.5.7 Assess the costs of data conversions
Many applications convert data between different types e.g., between strings and numbers. From your design, you should be able to determine the frequency and types of data conversions, and it is fairly simple to create small tests that determine the costs of the particular conversions you are using. Dont forget to include any concurrency or use of shared resources in the tests. Remember that external transfer of objects or data normally includes some data conversions. The cost of data conversion may be significant enough to direct you to alter your design.13.4.5.8 Determine whether batch processing is faster
Some repeated tasks can be processed as a batch instead of one at a time. Batch processing can take advantage of a number of efficiencies, such as accessing and creating some objects just once, eliminating some tests for shared resources, processing tasks in optimal order, avoiding repeated searches, etc. If any particular set of tasks could be processed in batch mode, consider the effect this would have on your application and how much faster the processing could be. The simplest conceptual example is that of adding characters one by one to a StringBuffer , as opposed to using a char array to add all the characters together. Adding the characters using a char array is much faster for any significant number of characters.13.5 Tuning After Deployment
Tuning does not necessarily end at the development stage. For many applications such as agent applications, services, servlets and servers, multiuser applications, enterprise systems, etc., there needs to be constant monitoring of the application performance after deployment to ensure that no degradation takes place. In this section, I discuss tuning the deployed application. This is mainlyParts
» OReilly.Java.performance tuning
» The Tuning Game System Limitations and What to Tune
» A Tuning Strategy Introduction
» Threading to Appear Quicker Streaming to Appear Quicker
» User Agreements Starting to Tune
» Setting Benchmarks Starting to Tune
» The Benchmark Harness Starting to Tune
» Taking Measurements Starting to Tune
» What to Measure Introduction
» Dont Tune What You Dont Need to Tune
» Measurements and Timings Profiling Tools
» Garbage Collection Profiling Tools
» Profiling Methodology Method Calls
» Java 2 cpu=samples Profile Output
» HotSpot and 1.3 -Xprof Profile Output
» JDK 1.1.x -prof and Java 2 cpu=old Profile Output
» Object-Creation Profiling Profiling Tools
» Monitoring Gross Memory Usage
» Replacing Sockets ClientServer Communications
» Performance Checklist Profiling Tools
» Garbage Collection Underlying JDK Improvements
» Replacing JDK Classes Underlying JDK Improvements
» VM Speed Variations VMs with JIT Compilers
» Other VM Optimizations Faster VMs
» Inline calls Remove dynamic type checks Unroll loops Code motion
» Literal constants are folded String concatenation is sometimes folded Constant fields are inlined
» Optimizations Performed When Using the -O Option
» Performance Effects From Runtime Options
» Compile to Native Machine Code
» Native Method Calls Underlying JDK Improvements
» Uncompressed ZIPJAR Files Underlying JDK Improvements
» Performance Checklist Underlying JDK Improvements
» Object-Creation Statistics Object Creation
» Pool Management Object Reuse
» Reusable Parameters Object Reuse
» String canonicalization Changeable objects
» Weak references Canonicalizing Objects
» Avoiding Garbage Collection Object Creation
» Preallocating Objects Lazy Initialization
» Performance Checklist Object Creation
» The Performance Effects of Strings
» Compile-Time Versus Runtime Resolution of Strings
» Converting bytes, shorts, chars, and booleans to Strings Converting floats to Strings
» Converting doubles to Strings
» Converting Objects to Strings
» Word-Counting Example Strings Versus char Arrays
» Line Filter Example HotSpot 1.0
» String Comparisons and Searches
» Sorting Internationalized Strings Strings
» The Cost of try-catch Blocks Without an Exception
» The Cost of try-catch Blocks with an Exception
» Using Exceptions Without the Stack Trace Overhead Conditional Error Checking
» no JIT 1.3 Variables Strings
» Method Parameters Performance Checklist
» Exception-Terminated Loops Loops and Switches
» no JIT 1.3 Loops and Switches
» Recursion Loops and Switches
» no HotSpot 1.0 2nd Loops and Switches
» Recursion and Stacks Loops and Switches
» Performance Checklist Loops and Switches
» Replacing System.out IO, Logging, and Console Output
» Logging From Raw IO to Smokin IO
» no JIT HotSpot 1.0 no JIT HotSpot 1.0 Serialization
» no IO, Logging, and Console Output
» Clustering Objects and Counting IO Operations
» Compression IO, Logging, and Console Output
» Performance Checklist IO, Logging, and Console Output
» Avoiding Unnecessary Sorting Overhead
» An Efficient Sorting Framework
» no HotSpot Better Than Onlogn Sorting
» User-Interface Thread and Other Threads
» Desynchronization and Synchronized Wrappers
» Avoiding Serialized Execution HotSpot 1.0
» no JIT no JIT HotSpot 1.0 Timing Multithreaded Tests
» Atomic Access and Assignment
» Free Load Balancing from TCPIP
» Load-Balancing Classes Load Balancing
» A Load-Balancing Example Load Balancing
» Threaded Problem-Solving Strategies Threading
» Collections Appropriate Data Structures and Algorithms
» Java 2 Collections Appropriate Data Structures and Algorithms
» Hashtables and HashMaps Appropriate Data Structures and Algorithms
» Cached Access Appropriate Data Structures and Algorithms
» Caching Example I Appropriate Data Structures and Algorithms
» Caching Example II Appropriate Data Structures and Algorithms
» Finding the Index for Partially Matched Strings
» Search Trees Appropriate Data Structures and Algorithms
» Comparing Communication Layers Distributed Computing
» Batching I Application Partitioning
» Compression Caching Low-Level Communication Optimizations
» Transfer Batching Low-Level Communication Optimizations
» Batching II Distributed Garbage Collection
» Performance Checklist Distributed Computing
» When Not to Optimize Tuning Class Libraries and Beans
» Scaling Design and Architecture
» Distributed Applications Design and Architecture
» Object Design Design and Architecture
» Use simulations and benchmarks Consider the total work done and the design overhead
» Tuning After Deployment When to Optimize
» User Interface Usability Training Server Downtime
» Performance Checklist When to Optimize
» Clustering Files Cached Filesystems RAM Disks, tmpfs, cachefs
» Disk Fragmentation Disk Sweet Spots
» RAM Underlying Operating System and Network Improvements
» Network Bottlenecks Network IO
» Performance Checklist Underlying Operating System and Network Improvements
Show more