There are various other frequently used objects throughout an application that should be canonicalized. A few that spring to mind are the empty string, empty arrays of various types, and some dates.
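As a minimal sketch of the idea, the empty string and a few empty arrays can be held as shared constants so that callers reuse one instance instead of allocating a new one each time. The class name Canonical, its field names, and the words helper are hypothetical and chosen only for illustration. Dates are left out here because java.util.Date objects can be changed after creation.

```java
// Hypothetical holder for canonical instances of frequently used objects.
final class Canonical {
    // Zero-length arrays have no elements to modify, so sharing them is safe.
    static final String EMPTY_STRING = "";
    static final int[] EMPTY_INT_ARRAY = new int[0];
    static final String[] EMPTY_STRING_ARRAY = new String[0];

    private Canonical() {
        // no instances; this class only holds constants and helpers
    }

    // Illustrative helper: returns the shared empty array when there is
    // nothing to return, rather than allocating a fresh zero-length array.
    static String[] words(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return EMPTY_STRING_ARRAY;
        }
        return trimmed.split("\\s+");
    }
}
```

Every call that has no words to return hands back the same array object, so repeated calls allocate nothing.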

4.2.4.1 String canonicalization

There can be some confusion about whether Strings are already canonicalized. There is no guarantee that they are, although the compiler can canonicalize Strings that are equal and are compiled in the same pass. The String.intern() method canonicalizes strings in an internal table. This is supposed to be, and usually is, the same table used by strings canonicalized at compile time, but in some earlier JDK versions (e.g., 1.0), it was not the same table. In any case, there is no particular reason to use the internal string table to canonicalize your strings unless you want to compare Strings by identity (see Section 5.5). Using your own table gives you more control and allows you to inspect the table when necessary.

To see the difference between identity and equality comparisons for Strings, including the difference that String.intern() makes, you can run the following class:

    public class Test
    {
      public static void main(String[] args)
      {
        System.out.println(args[0]);  //see that we have the empty string
        //should be true
        System.out.println(args[0].equals(""));
        //should be false since they are not identical objects
        System.out.println(args[0] == "");
        //should be true unless there are two internal string tables
        System.out.println(args[0].intern() == "");
      }
    }

This Test class, when run with the command line:

    java Test ""

gives the output (preceded by a blank line, which is the empty string being printed):

    true
    false
    true
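The suggestion of using your own table instead of String.intern() can be sketched roughly as follows. This is only one possible shape, assuming a HashMap-backed table and Java 8 or later for Map.putIfAbsent. The class name StringCanonicalizer and its methods are hypothetical and do not come from the text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application-level canonicalization table for Strings.
final class StringCanonicalizer {
    // Maps each distinct string value to the one instance kept as canonical.
    private final Map<String, String> table = new HashMap<>();

    // Returns the canonical instance equal to s, registering s itself
    // as the canonical instance if no equal string has been seen yet.
    synchronized String canonicalize(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Unlike the internal string table, this table can be inspected.
    synchronized int size() {
        return table.size();
    }
}
```

Two distinct but equal String objects passed through canonicalize come back as the same object, so they can then be compared with == rather than equals. Because the table holds strong references, entries are never released unless you remove them yourself.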

4.2.4.2 Changeable objects

Canonicalizing objects is best for read-only objects and can be troublesome for objects that change. If you canonicalize a changeable object and then change its state, all objects that hold a reference to the canonicalized object still point to that object, but now see the object's new state. For example, suppose you canonicalize a special Date value. If that object has its date value changed, all objects pointing to that Date object now see a different date value. This result may be desired, but more often it is a bug.

If you want to canonicalize changeable objects, one technique to make it slightly safer is to wrap the object with another one, or use your own subclass. [5] Then all accesses and updates are controlled by you. If the object is not supposed to be changed, you can throw an exception on any update method. Alternatively, if you want some objects to be canonicalized but with copy-on-write behavior, you can allow the updater to return a noncanonicalized copy of the canonical object.

[5] Beware that using a subclass may break the superclass semantics.

Note that it makes no sense to build a table of millions or even thousands of strings or other objects if the time taken to test for, access, and update objects in the table is longer than the time you are saving by canonicalizing them.
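The wrapping technique can be illustrated with a small sketch around java.util.Date. It shows both options mentioned above: an update method that throws an exception on a canonical instance, and a copy-on-write updater that returns a noncanonicalized copy. The class name CanonicalDate, the EPOCH constant, and the method names are hypothetical, chosen only for this example.

```java
import java.util.Date;

// Hypothetical wrapper that controls all access to a changeable Date.
final class CanonicalDate {
    // A shared canonical instance; many objects may hold this reference.
    static final CanonicalDate EPOCH = new CanonicalDate(0L, true);

    private final Date date;          // never exposed directly
    private final boolean canonical;  // true for shared instances

    private CanonicalDate(long millis, boolean canonical) {
        this.date = new Date(millis);
        this.canonical = canonical;
    }

    // Creates an ordinary, noncanonical instance.
    static CanonicalDate of(long millis) {
        return new CanonicalDate(millis, false);
    }

    long getTime() {
        return date.getTime();
    }

    // Strict option: refuse any update to a canonical instance.
    void setTime(long millis) {
        if (canonical) {
            throw new UnsupportedOperationException("canonical instance is read-only");
        }
        date.setTime(millis);
    }

    // Copy-on-write option: a canonical instance is left untouched and a
    // noncanonical copy carrying the new value is returned instead.
    CanonicalDate withTime(long millis) {
        if (canonical) {
            return new CanonicalDate(millis, false);
        }
        date.setTime(millis);
        return this;
    }
}
```

Because the wrapped Date is never handed out, no caller can change the canonical value behind the wrapper's back. Callers that use withTime must keep the returned reference, since it may differ from the one they started with.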

4.2.4.3 Weak references