    o Use methods that alter objects directly without making copies.
    o Create or use specific classes that handle primitive data types rather than wrapping the primitive data types.
• Consider using a ThreadLocal to provide threaded access to singletons with state.
• Use the final modifier on instance-variable definitions to create immutable internally accessible objects.
• Use WeakReferences to hold elements in large canonical lookup tables. Use SoftReferences for cache elements.
• Reduce object-creation bottlenecks by targeting the object-creation process.
    o Keep constructors simple and inheritance hierarchies shallow.
    o Avoid initializing instance variables more than once.
    o Use the clone method to avoid calling any constructors.
    o Clone arrays if that makes their creation faster.
    o Create copies of simple arrays faster by initializing them; create copies of complex arrays faster by cloning them.
• Eliminate object-creation bottlenecks by moving object creation to an alternative time.
    o Create objects early, when there is spare time in the application, and hold those objects until required.
    o Use lazy initialization when there are objects or variables that may never be used, or when you need to distribute the load of creating objects.
    o Use lazy initialization only when there is a defined merit in the design, or when identifying a bottleneck which is alleviated using lazy initialization.
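The lazy-initialization items that close this checklist can be illustrated with a minimal sketch. The class name Report, its details field, and the builds counter are hypothetical names chosen for illustration only, and the sketch is single-threaded; a shared object would need synchronization.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical class illustrating lazy initialization of an expensive field. */
class Report {
    // Counts how often the expensive list is built (for demonstration only).
    static int builds = 0;

    // Not created in the constructor; stays null until first requested.
    private List<String> details;

    /** Creates the list on first use only; later calls return the same list. */
    List<String> getDetails() {
        if (details == null) {
            details = new ArrayList<>();
            details.add("expensive detail");
            builds++;
        }
        return details;
    }

    public static void main(String[] args) {
        Report unused = new Report();   // never asks for details, so no list is built
        Report used = new Report();
        used.getDetails();
        used.getDetails();              // second call reuses the existing list
        System.out.println("lists built: " + builds);   // lists built: 1
    }
}
```

The cost of building the list is paid only by objects that actually need it, at the price of a null check on every access.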
Chapter 5. Strings
Everyone has a logger and most of them are string pigs. —Kirk Pepperdine
Strings have a special status in Java. They are the only objects with:
• Their own operators (+ and +=)
• A literal form (characters surrounded by double quotes, e.g., "hello")
• Their own externally accessible collection in the VM and class files (i.e., string pools, which provide uniqueness of String objects if the string sequence can be determined at compile time)
Strings are immutable and have a special relationship with StringBuffer objects. A String cannot be altered once created. Applying a method that looks like it changes the String (such as String.trim) doesn't actually do so; instead, the method returns an altered copy of the String. Strings are also final, and so cannot be subclassed. These points have advantages and disadvantages so far as performance is concerned. For fast string manipulation, the inability to subclass String or access the internal char array can be a serious problem.
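As a minimal sketch of the immutability described above, the following contrasts String.trim, which hands back a new String, with a StringBuffer method that alters the buffer in place. The class and method names (ImmutableStringDemo, describe) are hypothetical.

```java
class ImmutableStringDemo {
    /** Shows a string before and after trim(); the argument itself is never modified. */
    static String describe(String s) {
        String trimmed = s.trim();   // a separate String when there is whitespace to strip
        return "[" + s + "] -> [" + trimmed + "]";
    }

    public static void main(String[] args) {
        System.out.println(describe("  hello  "));   // [  hello  ] -> [hello]

        // By contrast, a StringBuffer is altered in place.
        StringBuffer sb = new StringBuffer("hello");
        sb.reverse();
        System.out.println(sb);                       // olleh
    }
}
```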
5.1 The Performance Effects of Strings
Let's first look at the advantages of the String implementation:
• Compilation creates unique strings. At compile time, strings are resolved as far as possible. This includes applying the concatenation operator and converting other literals to strings. So "hi7" and ("hi"+7) both get resolved at compile time to the same string, and are identical objects in the class string pool (see the discussion in Section 3.5.1.2). Compilers differ in their ability to achieve this resolution. You can always check your compiler (e.g., by decompiling some statements involving concatenation) and change it if needed.
• Because String objects are immutable, a substring operation doesn't need to copy the entire underlying sequence of characters. Instead, a substring can use the same char array as the original string and simply refer to a different start point and endpoint in the char array. This means that substring operations are efficient, being both fast and conserving of memory; the extra object is just a wrapper on the same underlying char array with different pointers into that array. [1]
[1] Strings are implemented in the JDK as an internal char array with index offsets (actually a start offset and a character count). This basic structure is extremely unlikely to be changed in any version of Java.
• Strings have strong support for internationalization. It would take a large effort to reproduce the internationalization support for an alternative class.
• The close relationship with StringBuffers allows Strings to reference the same char array used by the StringBuffer. This is a double-edged sword. For typical practice, when you use a StringBuffer to manipulate and append characters and data types, and then convert the final result to a String, this works just fine. The StringBuffer provides efficient mechanisms for growing, inserting, appending, altering, and other types of String manipulation. The resulting String then efficiently references the same char array with no extra character copying. This is very fast and reduces the number of objects being used to a minimum by avoiding intermediate objects. However, if the StringBuffer object is subsequently altered, the char array in that StringBuffer is copied into a new char array that is now referenced by the StringBuffer. The String object retains the reference to the previously shared char array. This means that copying overhead can occur at unexpected points in the application. Instead of the copying occurring at the toString method call, as might be expected, any subsequent alteration of the StringBuffer causes a new char array to be created and an array copy to be performed. To make the copying overhead occur at predictable times, you could explicitly execute some method that makes the copying occur, such as StringBuffer.setLength. This allows StringBuffers to be reused with more predictable performance.
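The first advantage in the list above, compile-time resolution, can be observed with identity comparisons. This is a minimal sketch that relies on the Java Language Specification's rules for constant expressions and pooled string literals; the class and method names (CompileTimeStrings, folded, atRuntime) are hypothetical.

```java
class CompileTimeStrings {
    /** A constant expression: the compiler resolves it to the single pooled string "hi7". */
    static String folded() {
        return "hi" + 7;
    }

    /** Not a constant expression: n is only known at runtime, so a new String is built. */
    static String atRuntime(int n) {
        return "hi" + n;
    }

    public static void main(String[] args) {
        String literal = "hi7";
        System.out.println(literal == folded());                // true: same pooled object
        System.out.println(literal == atRuntime(7));            // false: distinct object
        System.out.println(literal.equals(atRuntime(7)));       // true: same characters
        System.out.println(literal == atRuntime(7).intern());   // true: intern() returns the pooled object
    }
}
```

Identity comparison is used here only to expose which strings share an object; ordinary code should still compare string contents with equals.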
The disadvantages of the String implementation are:
• Not being able to subclass String means that it is not possible to add behavior to String for your own needs.
• The previous point means that all access must be through the restricted set of currently available String methods, imposing extra overhead.
• The only way to increase the number of methods allowing efficient manipulation of String characters is to copy the characters into your own array and manipulate them directly, in which case String is imposing an extra step and extra objects you may not need.
• char arrays are faster to process directly.
• The tight coupling with StringBuffer can lead to unexpectedly high memory usage. When StringBuffer.toString creates a String, the current underlying array holds the string, regardless of the size of the array (i.e., the capacity of the StringBuffer). For example, a StringBuffer with a capacity of 10,000 characters can build a string of 10 characters. However, that 10-character String continues to use a 10,000-char array to store the 10 characters. If the StringBuffer is now reused to create another 10-character string, the StringBuffer first creates a new internal 10,000-char array to build the string with; then the new String also uses that 10,000-char array to store the 10 characters. Obviously, this process can continue indefinitely, using vast amounts of memory where not expected.
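The gap between capacity and content in the last item above can be made visible with the public StringBuffer.length and StringBuffer.capacity methods. This is a minimal sketch with hypothetical class and method names; whether a String produced by toString actually shares the oversized array is an implementation detail of the JDK in use, so the sketch only reports the sizes involved.

```java
class BufferCapacityDemo {
    /** Builds a 10-character sequence in a deliberately oversized buffer. */
    static StringBuffer oversizedBuffer() {
        StringBuffer sb = new StringBuffer(10000);   // initial capacity of 10,000 chars
        sb.append("0123456789");                     // only 10 chars are used
        return sb;
    }

    /** Number of allocated but unused chars in the buffer's backing storage. */
    static int unusedChars(StringBuffer sb) {
        return sb.capacity() - sb.length();
    }

    public static void main(String[] args) {
        StringBuffer sb = oversizedBuffer();
        System.out.println(sb.length());       // 10
        System.out.println(sb.capacity());     // 10000
        System.out.println(unusedChars(sb));   // 9990
    }
}
```

Sizing a StringBuffer close to its expected content keeps this unused space small when the buffer's array may end up being retained.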
The advantages of Strings can be summed up as ease of use, internationalization support, and compatibility with existing interfaces. Most methods expect a String object rather than a char array, and String objects are returned by many methods. The disadvantage of Strings boils down to inflexibility. With extra work, most things you can do with String objects can be done faster and with less intermediate object-creation overhead by using your own set of char array manipulation methods.
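As a minimal sketch of such a char array manipulation method, the following lowercases ASCII letters in place, creating no intermediate objects. The names CharArrayOps and toLowerAsciiInPlace are hypothetical, and the method deliberately ignores locale rules, which is exactly the internationalization support that String provides and a raw char array does not.

```java
class CharArrayOps {
    /**
     * Lowercases the ASCII letters in chars[start..end) in place.
     * Non-ASCII characters are left untouched; no new objects are created.
     */
    static void toLowerAsciiInPlace(char[] chars, int start, int end) {
        for (int i = start; i < end; i++) {
            char c = chars[i];
            if (c >= 'A' && c <= 'Z') {
                chars[i] = (char) (c + ('a' - 'A'));
            }
        }
    }

    public static void main(String[] args) {
        char[] buf = "Hello, World".toCharArray();   // one copy when leaving the String world
        toLowerAsciiInPlace(buf, 0, buf.length);
        System.out.println(new String(buf));          // hello, world
    }
}
```

The conversions at either end (toCharArray and new String) each copy the characters, so the approach pays off mainly when several operations are applied to the same array between those boundaries.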
For most performance tuning, you pinpoint a bottleneck and make localized changes to objects and methods that speed up that bottleneck. But String tuning often involves converting to char arrays, whereas you rarely come across public methods or interfaces that deal in char arrays. This makes it difficult to switch between Strings and char arrays in any localized way. The consequence is that you either have to switch back and forth between Strings and char arrays, or you have to make extensive modifications that can reach across many application boundaries. I have no easy solution for this problem.
String tuning can get messy. It is difficult to handle String internationalization capabilities using raw char arrays. But in many cases, internationalized Strings form a specific subset of String usage in an application, mainly in the user interface, and that subset of Strings rarely causes bottlenecks. You should differentiate between Strings that need internationalization and those that are simply processing characters, independent of language. These latter Strings can be replaced for tuning with char arrays. [2] Internationalization-dependent Strings are more difficult to tune, and I provide some examples of tuning these later in the chapter. Note also that internationalized Strings can be treated as char arrays for some types of processing without any problems; see Section 5.4.2 later in this chapter.
[2] My editor summarized this succinctly with the statement, "Avoid using String objects if you don't intend to represent text."
5.2 Compile-Time Versus Runtime Resolution of Strings