The Performance Effects of Strings

  o Use methods that alter objects directly without making copies.
  o Create or use specific classes that handle primitive data types rather than wrapping the primitive data types.
• Consider using a ThreadLocal to provide threaded access to singletons with state.
• Use the final modifier on instance-variable definitions to create immutable internally accessible objects.
• Use WeakReferences to hold elements in large canonical lookup tables. Use SoftReferences for cache elements.
• Reduce object-creation bottlenecks by targeting the object-creation process.
  o Keep constructors simple and inheritance hierarchies shallow.
  o Avoid initializing instance variables more than once.
  o Use the clone method to avoid calling any constructors.
  o Clone arrays if that makes their creation faster.
  o Create copies of simple arrays faster by initializing them; create copies of complex arrays faster by cloning them.
• Eliminate object-creation bottlenecks by moving object creation to an alternative time.
  o Create objects early, when there is spare time in the application, and hold those objects until required.
  o Use lazy initialization when there are objects or variables that may never be used, or when you need to distribute the load of creating objects.
  o Use lazy initialization only when there is a defined merit in the design, or when identifying a bottleneck which is alleviated using lazy initialization.

Chapter 5. Strings

Everyone has a logger and most of them are string pigs.
—Kirk Pepperdine

Strings have a special status in Java. They are the only objects with:

• Their own operators (+ and +=)
• A literal form (characters surrounded by double quotes, e.g., "hello")
• Their own externally accessible collection in the VM and class files (i.e., string pools, which provide uniqueness of String objects if the string sequence can be determined at compile time)

Strings are immutable and have a special relationship with StringBuffer objects. A String cannot be altered once created. Applying a method that looks like it changes the String (such as String.trim) doesn't actually do so; instead, the method returns an altered copy of the String. The String class is also final, and so cannot be subclassed. These points have advantages and disadvantages so far as performance is concerned. For fast string manipulation, the inability to subclass String or access the internal char array can be a serious problem.
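As a minimal sketch of two of the properties just described, the following shows that trim leaves the original String untouched and that a concatenation of literals resolves to the same pooled object as the equivalent literal. The class and method names are illustrative only and do not come from the text.

```java
final class StringBasicsSketch {

    /** Returns the trimmed copy; the argument itself is never modified. */
    static String trimmedCopy(String original) {
        return original.trim();
    }

    /** True when "hi" + 7 was folded into the same pooled object as "hi7". */
    static boolean literalsShareOneObject() {
        String literal = "hi7";
        String folded = "hi" + 7;  // constant expression, resolved at compile time
        return literal == folded;  // identity comparison, deliberately not equals()
    }

    public static void main(String[] args) {
        String padded = "  hello  ";
        String trimmed = trimmedCopy(padded);
        System.out.println("[" + padded + "]");   // the original keeps its spaces
        System.out.println("[" + trimmed + "]");  // the copy has them removed
        System.out.println(literalsShareOneObject());
    }
}
```

The == comparison is used here only to expose object identity; ordinary code should still compare string contents with equals.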

5.1 The Performance Effects of Strings

Let's first look at the advantages of the String implementation:

• Compilation creates unique strings. At compile time, strings are resolved as far as possible. This includes applying the concatenation operator and converting other literals to strings. So "hi7" and ("hi" + 7) both get resolved at compile time to the same string, and are identical objects in the class string pool (see the discussion in Section 3.5.1.2). Compilers differ in their ability to achieve this resolution. You can always check your compiler (e.g., by decompiling some statements involving concatenation) and change it if needed.
• Because String objects are immutable, a substring operation doesn't need to copy the entire underlying sequence of characters. Instead, a substring can use the same char array as the original string and simply refer to a different start point and endpoint in the char array. This means that substring operations are efficient, being both fast and conserving of memory; the extra object is just a wrapper on the same underlying char array with different pointers into that array. [1]

  [1] Strings are implemented in the JDK as an internal char array with index offsets (actually a start offset and a character count). This basic structure is extremely unlikely to be changed in any version of Java.

• Strings have strong support for internationalization. It would take a large effort to reproduce the internationalization support for an alternative class.
• The close relationship with StringBuffers allows Strings to reference the same char array used by the StringBuffer. This is a double-edged sword. For typical practice, when you use a StringBuffer to manipulate and append characters and data types, and then convert the final result to a String, this works just fine. The StringBuffer provides efficient mechanisms for growing, inserting, appending, altering, and other types of String manipulation.
The resulting String then efficiently references the same char array with no extra character copying. This is very fast and reduces the number of objects being used to a minimum by avoiding intermediate objects.

However, if the StringBuffer object is subsequently altered, the char array in that StringBuffer is copied into a new char array that is now referenced by the StringBuffer. The String object retains the reference to the previously shared char array. This means that copying overhead can occur at unexpected points in the application. Instead of the copying occurring at the toString method call, as might be expected, any subsequent alteration of the StringBuffer causes a new char array to be created and an array copy to be performed. To make the copying overhead occur at predictable times, you could explicitly execute some method that makes the copying occur, such as StringBuffer.setLength. This allows StringBuffers to be reused with more predictable performance.

The disadvantages of the String implementation are:

• Not being able to subclass String means that it is not possible to add behavior to String for your own needs.
• The previous point means that all access must be through the restricted set of currently available String methods, imposing extra overhead.
• The only way to increase the number of methods allowing efficient manipulation of String characters is to copy the characters into your own array and manipulate them directly, in which case String is imposing an extra step and extra objects you may not need.
• char arrays are faster to process directly.
• The tight coupling with StringBuffer can lead to unexpectedly high memory usage. When StringBuffer.toString creates a String, the current underlying array holds the string, regardless of the size of the array (i.e., the capacity of the StringBuffer). For example, a StringBuffer with a capacity of 10,000 characters can build a string of 10 characters.
However, that 10-character String continues to use a 10,000-char array to store the 10 characters. If the StringBuffer is now reused to create another 10-character string, the StringBuffer first creates a new internal 10,000-char array to build the string with; then the new String also uses that 10,000-char array to store the 10 characters. Obviously, this process can continue indefinitely, using vast amounts of memory where not expected.

The advantages of Strings can be summed up as ease of use, internationalization support, and compatibility with existing interfaces. Most methods expect a String object rather than a char array, and String objects are returned by many methods. The disadvantage of Strings boils down to inflexibility. With extra work, most things you can do with String objects can be done faster and with less intermediate object-creation overhead by using your own set of char array manipulation methods.

For most performance tuning, you pinpoint a bottleneck and make localized changes to objects and methods that speed up that bottleneck. But String tuning often involves converting to char arrays, whereas you rarely come across public methods or interfaces that deal in char arrays. This makes it difficult to switch between Strings and char arrays in any localized way. The consequences are that you either have to switch back and forth between Strings and char arrays, or you have to make extensive modifications that can reach across many application boundaries. I have no easy solution for this problem. String tuning can get messy.

It is difficult to handle String internationalization capabilities using raw char arrays. But in many cases, internationalized Strings form a specific subset of String usage in an application, mainly in the user interface, and that subset of Strings rarely causes bottlenecks.
You should differentiate between Strings that need internationalization and those that are simply processing characters, independent of language. These latter Strings can be replaced for tuning with char arrays. [2] Internationalization-dependent Strings are more difficult to tune, and I provide some examples of tuning these later in the chapter. Note also that internationalized Strings can be treated as char arrays for some types of processing without any problems; see Section 5.4.2 later in this chapter.

[2] My editor summarized this succinctly with the statement, "Avoid using String objects if you don't intend to represent text."
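To make the trade-off concrete, here is a small sketch of replacing chained String calls with a single pass over a char array. It assumes the data is language-independent ASCII (for example, internal identifiers), so the char version converts only the letters A to Z, whereas toLowerCase also handles non-ASCII letters. The class and method names are hypothetical, and no timings are claimed; whether the char array version is actually faster should be measured in your own application.

```java
import java.util.Locale;

final class CharArrayTuningSketch {

    /** Chained String calls: each step that changes anything creates a new String. */
    static String normalizeWithStrings(String s) {
        return s.replace('-', '_').replace(' ', '_').toLowerCase(Locale.ROOT);
    }

    /** The same result for ASCII input, built in one pass over one char array. */
    static String normalizeWithChars(String s) {
        char[] chars = s.toCharArray();           // one copy out of the String
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c == '-' || c == ' ') {
                chars[i] = '_';
            } else if (c >= 'A' && c <= 'Z') {
                chars[i] = (char) (c + ('a' - 'A')); // ASCII-only lowercase
            }
        }
        return new String(chars);                 // one copy back into a String
    }

    public static void main(String[] args) {
        String input = "Log-File NAME";
        System.out.println(normalizeWithStrings(input));
        System.out.println(normalizeWithChars(input));
    }
}
```

Note that the char array version still pays for converting from and back to a String at the method boundary, which is the "switching back and forth" cost described above.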

5.2 Compile-Time Versus Runtime Resolution of Strings