String concatenation in Java 9 (part 2): Conversion confusion

By: Tim Van Den Broecke - Software engineer

 
 This text was updated to reflect feedback by Aleksey Shipilëv, principal software engineer in OpenJDK development.
 

In most cases, the Java 9 String concatenation change (more details in our previous blog) is a clean upgrade. Under some circumstances, however, the same Java program behaves differently depending on whether it is compiled with JDK 8 or with JDK 9. In this blog, we go down the rabbit hole once more to talk about the curious behavior of String conversion order in the new concatenation methods.

The puzzle

 
In the following example, the class Alice.java tries to quote a short line from Alice’s Adventures in Wonderland. The terminal output below shows the result of compiling and running Alice.java, first with Java 8 and then with Java 9.

~ $ jdk-8/bin/javac Alice.java 
~ $ jdk-8/bin/java Alice 
Do cats eat bats? 
~ $ jdk-9/bin/javac Alice.java 
~ $ jdk-9/bin/java Alice 
Do bats eat cats?

As you can see, the order of the words has changed. The concatenation arguments are no longer processed and appended from left to right in Java 9. Instead, they’re all consumed from the stack at once and processed in an order determined by the dynamically generated implementation.

Side effects

 
The implementation of Alice.java is shown below.

public class Alice 
{
    String[] wonders     = new String[]{" cats", " eat", " bats"};
    int      wonderIndex = 0;

    @Override public String toString() 
    {
        return wonders[wonderIndex++];
    }

    public static void main(String[] args)
    {
        Alice alice = new Alice();
        System.out.println("Do" + alice + alice + alice + '?');
    }
}

Instances of Alice contain a simple String array called wonders and an integer wonderIndex that is used to fetch its elements. The overridden Object.toString() method fetches elements from this array with a side effect. It increments the wonderIndex each time it is called. The main method of this class simply creates an instance of itself and then prints that instance three times in a concatenated String. Before an Object can be concatenated, however, the JVM needs to convert it into a String. To do that, it uses Object.toString().
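For reference, it helps to see what the concatenation looks like when written out by hand. The sketch below spells out the StringBuilder chain that javac 8 generates for a concatenation expression of this shape. The names Java8ConcatSketch, WonderSource and concatLikeJava8 are ours and purely illustrative; WonderSource is a stand-in for Alice with the same side effect in toString().

```java
// Hand-written equivalent of what javac 8 emits for:
//     "Do" + alice + alice + alice + '?'
// All class and method names here are hypothetical, chosen for illustration.
class Java8ConcatSketch {

    /** Same side-effecting toString() as Alice: each call returns the next word. */
    static final class WonderSource {
        private final String[] wonders = {" cats", " eat", " bats"};
        private int wonderIndex = 0;

        @Override
        public String toString() {
            return wonders[wonderIndex++];
        }
    }

    /** Explicit StringBuilder chain; every append runs in source order. */
    static String concatLikeJava8(Object first, Object second, Object third) {
        return new StringBuilder()
                .append("Do")
                .append(first)   // append(Object) converts via String.valueOf(first)
                .append(second)
                .append(third)
                .append('?')
                .toString();
    }

    public static void main(String[] args) {
        WonderSource alice = new WonderSource();
        System.out.println(concatLikeJava8(alice, alice, alice)); // Do cats eat bats?
    }
}
```

Because each append is an ordinary method call, this chain converts its arguments strictly from left to right on any JDK.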

The contract of Object.toString() states that the returned String should give some informative description of the Object instance. Since toString() is available on every Java class, it is the go-to method to acquire any readable representation of Object state in general and can therefore be used by many different agents without your prior knowledge.

Although it is not specified in the JLS, any seasoned developer assumes String conversion to be an idempotent operation and writes toString() accordingly. If you call it multiple times on the same Object without changing its state, you assume the result will be the same. This criterion is trivially met for primitive values, but for Objects it relies on the implementation of toString(). Violating it can result in problems that are hard to debug and, ironically, the act of debugging can further complicate the matter, since debuggers often use toString() themselves to display Object instances. For example, Alice.java violates idempotence by adding the side effect on wonderIndex. If we place a breakpoint in our Alice.toString() method and debug the class, we run into an ArrayIndexOutOfBoundsException before the program finishes.

Up until Java 8, intuition and actual behavior agree: in the expression "Do" + alice + alice + alice + '?', toString() is called on the operands from left to right. Java 9 forgoes this intuition and takes advantage of the lack of a language specification entry on the order of String conversion within a String concatenation expression. The default dynamically generated concatenation method performs the String conversions in reverse order, resulting in the switched order of the words.

To find out why it behaves like that, we need to take a look at the OpenJDK source code.

Concatenation strategies

 
The first thing we learn from looking at StringConcatFactory, the class that provides the bootstrap methods for String concatenation, is that there are six different concatenation strategies to choose from (using the system property -Djava.lang.invoke.stringConcat=<strategy_name> on the command line) that define how the generated method will perform the concatenation. They are divided into two groups: one generates bytecode similar to the way DexGuard and ProGuard perform backporting, the other relies on the MethodHandle class and its utility class MethodHandles to construct a new method. The default strategy MH_INLINE_SIZED_EXACT is of the second kind. It is the only strategy that doesn’t use StringBuilder anywhere.

When we try out these strategies, we learn a second lesson: the two strategy groups behave differently. The terminal output below shows that the MethodHandle strategies (prefixed MH_) concatenate the Strings in reverse order, while the bytecode strategies (prefixed BC_) concatenate them in the original order. The culprit appears to be the MethodHandle. This class was added in JDK 7 as a lightweight and more powerful alternative to reflection. It supports many interesting features borrowed from functional programming and is used extensively in StringConcatFactory.

~ $ jdk-9/bin/java Alice
Do bats eat cats?
~ $ jdk-9/bin/java -Djava.lang.invoke.stringConcat=MH_SB_SIZED Alice
Do bats eat cats?
~ $ jdk-9/bin/java -Djava.lang.invoke.stringConcat=BC_SB_SIZED Alice
Do cats eat bats?
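The bootstrap method can also be called by hand, which makes it easier to experiment without recompiling with different javac versions. The sketch below is a minimal illustration, not what javac literally emits: ManualConcatBootstrap, WonderSource and concatViaFactory are hypothetical names, and the recipe string is our own reconstruction for the expression in Alice.java. It requires JDK 9 or later, and the word order it prints depends on the JDK it runs on.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Calls the Java 9 concatenation bootstrap method directly.
// All class and method names here are hypothetical, chosen for illustration.
class ManualConcatBootstrap {

    /** Same side-effecting toString() as Alice: each call returns the next word. */
    static final class WonderSource {
        private final String[] wonders = {" cats", " eat", " bats"};
        private int wonderIndex = 0;

        @Override
        public String toString() {
            return wonders[wonderIndex++];
        }
    }

    static String concatViaFactory(Object first, Object second, Object third) {
        try {
            // Three Object arguments in, one String out.
            MethodType concatType = MethodType.methodType(
                    String.class, Object.class, Object.class, Object.class);
            // In the recipe, the character U+0001 marks an argument slot;
            // everything else is copied into the result as a constant.
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "makeConcatWithConstants",
                    concatType,
                    "Do\u0001\u0001\u0001?");
            MethodHandle target = site.getTarget();
            return (String) target.invoke(first, second, third);
        } catch (Throwable t) {
            throw new IllegalStateException("String concatenation bootstrap failed", t);
        }
    }

    public static void main(String[] args) {
        WonderSource alice = new WonderSource();
        // Which word comes first depends on the order in which the generated
        // method converts its arguments, and therefore on the JDK version.
        System.out.println(concatViaFactory(alice, alice, alice));
    }
}
```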

Looking at the way StringConcatFactory uses MethodHandles, we learn a third and final lesson: MethodHandles are constructed bottom-up. Using the default strategy, the source code warns us that the implementation is “arguably hard to read”. However, we need only look at the comment eight lines below it to confirm our suspicion. It states: “We are assembling the String backwards”. The code underneath it cleverly strings methods together using the MethodHandles class to create the desired functionality. They perform element-wise String conversion followed by conversion to a byte array, then combine the byte arrays and finally form a new String from the collected byte array. The other two MethodHandle strategies employ StringBuilder instead of a byte array, but the order of application is backwards too.

Edit 26/03/2018: The developers at OpenJDK pointed out that the rabbit hole goes deeper still. It is actually the implementation of MethodHandles.filterArguments that applies filters from right to left. In the case of StringConcatFactory, those filters are the String converters. In other words, the backwards construction of the MethodHandle described in the paragraph above is not the cause of the reversed conversion order.
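The order in which filterArguments runs its filters can be observed in isolation. The probe below is a minimal sketch under our own assumptions: FilterOrderProbe, tag, join and observedFilterOrder are hypothetical names, and the filters merely record a label before converting their argument. Going by the edit above, we would expect a reversed order on the affected JDK versions and a left-to-right order on versions that contain the fix, so the code deliberately does not state an expected output.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

// Records the order in which MethodHandles.filterArguments invokes its filters.
// All class and method names here are hypothetical, chosen for illustration.
class FilterOrderProbe {

    static final List<String> CALLS = new ArrayList<>();

    /** Filter: notes that it ran, then converts its argument to a String. */
    static String tag(String label, Object value) {
        CALLS.add(label);
        return String.valueOf(value);
    }

    /** Target: receives the three already-converted arguments. */
    static String join(String a, String b, String c) {
        return a + b + c;
    }

    static List<String> observedFilterOrder() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle join = lookup.findStatic(FilterOrderProbe.class, "join",
                    MethodType.methodType(String.class,
                            String.class, String.class, String.class));
            MethodHandle tag = lookup.findStatic(FilterOrderProbe.class, "tag",
                    MethodType.methodType(String.class, String.class, Object.class));

            // One filter per argument position, each bound to its own label.
            MethodHandle filtered = MethodHandles.filterArguments(join, 0,
                    MethodHandles.insertArguments(tag, 0, "first"),
                    MethodHandles.insertArguments(tag, 0, "second"),
                    MethodHandles.insertArguments(tag, 0, "third"));

            CALLS.clear();
            String joined = (String) filtered.invoke("a", "b", "c");
            if (!"abc".equals(joined)) {
                throw new IllegalStateException("Unexpected result: " + joined);
            }
            return new ArrayList<>(CALLS);
        } catch (Throwable t) {
            throw new IllegalStateException("Probe failed", t);
        }
    }

    public static void main(String[] args) {
        // Prints the labels in the order the filters actually ran on this JDK.
        System.out.println(observedFilterOrder());
    }
}
```

Note that the joined result is the same either way; only filters with side effects, such as Alice's toString(), make the order visible.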

Bug or quirk

 
The new behavior can be seen in two ways. On the one hand, it could be labeled a bug. It breaks backward compatibility by producing different results depending on your Java version. Additionally, the API doc of StringConcatFactory states that the recipe is processed left to right. Although that is technically correct for the default strategy, the other two MethodHandle strategies explicitly process it from right to left. The JLS is also clear in ‘guaranteeing’ order of expression evaluation from left to right, although it follows up immediately with a note recommending that code not rely crucially on it.

On the other hand, it could be labeled a simple quirk. Firstly, the new behavior doesn’t violate the JLS section on String concatenation, and there is an argument to be made for the wording in the StringConcatFactory documentation. Secondly, even with a poor implementation of toString(), the behavior discussed in this blog only arises under this specific mix of conditions. For example, the alternative printing statements in the snippets below print the same result with Java 8 and 9. The first works because only a single String conversion is applied per concatenation. The second works because the String conversion is done manually, so the concatenation method receives three Strings where it received three Objects before.

System.out.print("Do" + alice);
System.out.print(       alice);
System.out.println(     alice + '?');
System.out.println(
    "Do" + alice.toString() + alice.toString() + alice.toString() + '?');

And finally, you should simply never override Object.toString() with side effects. There are many more reasons for that than we discuss here, each worthy of its own blog.
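One way to follow that advice is to move the state change into a method of its own, so that callers decide explicitly when it happens. The variant below is only a sketch; SteadyAlice, nextWonder and quote are hypothetical names of ours, and wrapping the index around is our own addition to avoid the ArrayIndexOutOfBoundsException mentioned earlier.

```java
// A variant of Alice whose toString() has no side effects.
// All class and method names here are hypothetical, chosen for illustration.
class SteadyAlice {

    private final String[] wonders = {" cats", " eat", " bats"};
    private int wonderIndex = 0;

    /** Returns the current word and advances; the only method that changes state. */
    String nextWonder() {
        String wonder = wonders[wonderIndex];
        wonderIndex = (wonderIndex + 1) % wonders.length; // wrap around (our addition)
        return wonder;
    }

    /** Idempotent: describes the current state without modifying it. */
    @Override
    public String toString() {
        return "SteadyAlice{next=" + wonders[wonderIndex].trim() + "}";
    }

    static String quote() {
        SteadyAlice alice = new SteadyAlice();
        // The method calls are operands of the expression and are evaluated
        // left to right before any concatenation takes place.
        return "Do" + alice.nextWonder() + alice.nextWonder() + alice.nextWonder() + '?';
    }

    public static void main(String[] args) {
        System.out.println(quote()); // Do cats eat bats?
    }
}
```

Since the concatenation now only receives Strings, the result no longer depends on the order in which the generated method would convert Objects.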

Edit 26/03/2018: The behavior is classified as a MethodHandles.filterArguments bug in the OpenJDK bug tracker and is fixed in JDK 11 (possibly JDK 10 soon as well). A related entry calls for tests on the behavior discussed in this blog.

Epilogue

 
We can use the discussed bug/quirk to our literary advantage. Change the println statement in Alice.java to print and then run the bash script below to have the full Wonderland quote printed on your terminal in a nonsensical and convoluted manner.

~ $ ./alice.sh 
And here Alice began to get rather sleepy, and went on saying to herself, 
in a dreamy sort of way, 'Do cats eat bats? Do cats eat bats?' and sometimes,
'Do bats eat cats?' for, you see, as she couldn't answer either question, 
it didn't much matter which way she put it.

 

#!/bin/bash
# Clean up.
clear
# Print first part of the quote.
echo -n "And here Alice began to get rather sleepy, and went on saying to herself, in a dreamy sort of way, '"
# Use Java 8 to print Alice's first wonder.
jdk-8/bin/javac Alice.java
jdk-8/bin/java Alice
# Leave some space for Alice's wonder to repeat itself.
echo -n " "
# Once more.
jdk-8/bin/javac Alice.java
jdk-8/bin/java Alice
# Chain to the rest of the quote.
echo -n "' and sometimes, '"
# Use Java 9 to print Alice's rephrased wonder.
jdk-9/bin/javac Alice.java
jdk-9/bin/java Alice
# For, you see, as the code is written in poor habit, it didn't much matter which way you execute it.
echo "' for, you see, as she couldn't answer either question, it didn't much matter which way she put it."

Chances are that you haven’t run into this behavior and that you never will. If you do encounter it, consider it a good time to adopt some new good programming practices.