January 28, 2018

    String concatenation in Java 9 (part 1): Untangling invokeDynamic

    The arrival of Java 9 in September 2017 introduced some interesting changes to the javac compiler and the Java internals. Some noteworthy elements are the completion of Project Jigsaw, which allows the construction of minimal JREs by splitting the old JRE monolith into well-defined modules, and the compact internal String representation based on the most popular encodings.

    In this blog post, we focus on a lower-level change to the handling of Strings, namely the “indified String concatenation”. We first show how the compiler output has changed, then summarize the mechanism for dynamic method invocation in Java, and finally explain how DexGuard (version 8.1 or higher) and ProGuard (version 6.0 or higher) backport Java 9 class files.

    Working example:
                String hello = "Hello ";
                String world = "world!";
                String message = hello + world;

    Compiler changes

     The change is in the Java bytecode that the Java 9 javac compiler outputs. In Java 8 and earlier, the last line of our source code snippet above would compile to the following bytecode (some details and fully qualified class names are omitted for readability):

    6:  new #4                // class StringBuilder
    9:  dup
    10: invokespecial #5      // Method StringBuilder."<init>"
    13: aload_1               // String Hello
    14: invokevirtual #6      // Method StringBuilder.append:(LString;)LStringBuilder;
    17: aload_2               // String world!
    18: invokevirtual #6      // Method StringBuilder.append:(LString;)LStringBuilder;
    21: invokevirtual #7      // Method StringBuilder.toString:()LString;
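
    In source form, the Java 8 bytecode above corresponds roughly to an explicit StringBuilder chain. The following is a minimal sketch; the class and method names (Java8StyleConcat, concat) are made up for illustration, and the comments map each call to the instructions in the listing.

```java
// Hypothetical illustration: what "hello + world" amounts to when compiled by Java 8 javac.
final class Java8StyleConcat {

    static String concat(String hello, String world) {
        return new StringBuilder()   // new, dup, invokespecial "<init>"
                .append(hello)       // aload_1, invokevirtual append
                .append(world)       // aload_2, invokevirtual append
                .toString();         // invokevirtual toString
    }

    public static void main(String[] args) {
        System.out.println(concat("Hello ", "world!"));
    }
}
```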

    The bytecode that is generated now is much shorter, but also introduces two new elements: an invokedynamic instruction (abbreviated to indy in the rest of the text) and a bootstrap method.

    6: aload_1                // String Hello
    7: aload_2                // String world!
    8: invokedynamic #4, 0    // InvokeDynamic #0:makeConcatWithConstants:
                              // (LString;LString;)LString;
    
    BootstrapMethods:
       0: #19 invokeStatic StringConcatFactory.makeConcatWithConstants:
         (… , … , LMethodType; , LString; , … ) LCallSite;
         Method arguments:
            #20 \u0001\u0001
    

    The reason for changing the compiler in this way is, according to the project description (JEP 280), to “enable future optimizations of String concatenation without requiring further changes to the bytecode emitted by javac.” Dynamic method invocation is an ideal solution for that challenge, as it defers the choice of implementation to run time. The developers of the Java runtime can then improve the implementation of the factory class, without all other developers needing to recompile their projects.

    Dynamic method invocation in Java

    Recall that dynamic method invocation in Java works as follows: first, the compiler places an invokedynamic bytecode instruction in your method body to indicate that we’re trying to use a dynamic method there. That indy instruction refers to a bootstrap method, which is a regular Java method that is referenced from a special attribute (BootstrapMethods) in the class file. At run time, when the instruction is first executed, this bootstrap method is called to dynamically create the method we’re trying to invoke and wrap it in a container object called a CallSite. Finally, the JVM extracts a MethodHandle for the newly generated method from the CallSite and executes the method, manipulating the stack as if it were a regular method invocation.

    The challenging part of this process is the creation of the CallSite that contains the newly generated method. Bootstrap methods are user-defined by design, but the JRE ships with classes that define some for us. In the case of JRE 9, we have the class StringConcatFactory. It defines the two String concatenation bootstrap methods that are used by javac.
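
    To make these steps concrete, the bootstrap method can also be called by hand from ordinary Java code (Java 9 or later). The sketch below only imitates what the JVM does when it links an indy instruction; the class name ManualBootstrap and the helper concat are hypothetical, and wrapping failures in an IllegalStateException is a choice made for this example.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Hypothetical illustration: performing the indy linkage steps manually.
final class ManualBootstrap {

    static String concat(String first, String second) {
        try {
            // The descriptor (String, String) -> String that javac records for "hello + world".
            MethodType type = MethodType.methodType(String.class, String.class, String.class);

            // Call the bootstrap method ourselves. The JVM normally supplies the first
            // three arguments (lookup, name, type); the recipe is the static argument.
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(), "makeConcatWithConstants", type, "\u0001\u0001");

            // Extract the MethodHandle from the CallSite and invoke it like a regular method.
            MethodHandle target = site.getTarget();
            return (String) target.invokeExact(first, second);
        } catch (Throwable t) {
            throw new IllegalStateException("linking or invoking the call site failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("Hello ", "world!"));
    }
}
```

    Unlike this sketch, the JVM links each indy instruction once and reuses the resulting CallSite on later executions.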

    In the Java 9 bytecode snippet above, two elements offer some insight into these methods. The first is the MethodType argument. The compiler deduces the descriptor for the specific concatenation that we’re trying to perform and the JVM supplies it as a MethodType object to the bootstrap method at run time. It is made human-readable in the javap comment of the indy instruction. The StringConcatFactory uses this information to generate a CallSite containing a String concatenation method with a matching descriptor.
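
    The relation between the descriptor in the javap output and the MethodType object can be checked with the standard java.lang.invoke API. This is a small sketch with hypothetical names (DescriptorDemo, concatDescriptor, parse); note that javap prints the fully qualified form, which the listings in this post abbreviate.

```java
import java.lang.invoke.MethodType;

// Hypothetical illustration: converting between a MethodType and its descriptor string.
final class DescriptorDemo {

    // The type of "String + String": two String parameters, String return value.
    static final MethodType CONCAT_TYPE =
            MethodType.methodType(String.class, String.class, String.class);

    static String concatDescriptor() {
        return CONCAT_TYPE.toMethodDescriptorString();
    }

    // Parses a descriptor back into a MethodType, resolving class names
    // with this class's class loader.
    static MethodType parse(String descriptor) {
        return MethodType.fromMethodDescriptorString(
                descriptor, DescriptorDemo.class.getClassLoader());
    }

    public static void main(String[] args) {
        // prints (Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
        System.out.println(concatDescriptor());
    }
}
```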

    The second element is the first “Method arguments:” entry. Note that it is actually the fourth argument of the bootstrap method, but the JVM fills in the first three automatically. This argument is a String that represents the “recipe” for the concatenation. The recipe uses two marker characters, \u0001 and \u0002, to indicate whether the method should consume an argument from the stack or load an argument from the constants passed to the bootstrap method (the varargs Object array). The recipe of our snippet above, for example, indicates that the concatenation method should consume two values from the stack. This recipe-based approach especially shines when we change our last line of source code to, for example, String message = hello + "," + world;. The compiler then generates the following output:

    6: aload_1                // String Hello
    7: aload_2                // String world!
    8: invokedynamic #4, 0    // InvokeDynamic #0:makeConcatWithConstants:
                              // (LString;LString;)LString;
    
    BootstrapMethods:
       0: #19 invokeStatic StringConcatFactory.makeConcatWithConstants:
         (… , … , LMethodType; , LString; , … ) LCallSite;
         Method arguments:
            #20 \u0001,\u0001
    

    There are still only two load instructions: one for hello and one for world. Can you see where our comma is loaded? How about when we change the source code to this: String message = hello + " there, " + world; ?

    6: aload_1                // String Hello
    7: aload_2                // String world!
    8: invokedynamic #4, 0    // InvokeDynamic #0:makeConcatWithConstants:
                              // (LString;LString;)LString;
    
    BootstrapMethods:
       0: #19 invokeStatic StringConcatFactory.makeConcatWithConstants:
         (… , … , LMethodType; , LString; , … ) LCallSite;
         Method arguments:
            #20 \u0001 there, \u0001
    

    Since the concatenation always happens with the " there, " String in the middle, the Java compiler simply defines it in our recipe. This way, we skip a load instruction and the generated method can optimize over the concatenation of the specific constant String.
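
    The recipes shown above can be tried out directly against StringConcatFactory (Java 9 or later). The following sketch uses a hypothetical helper, RecipeDemo.concat, that links a two-argument concatenation for a given recipe; the third call also passes a constant through the varargs array, which is what the \u0002 marker refers to.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Hypothetical illustration: feeding different recipes to the bootstrap method.
final class RecipeDemo {

    // Links a (String, String) -> String concatenation for the given recipe and invokes it.
    static String concat(String recipe, String first, String second, Object... constants) {
        try {
            MethodType type = MethodType.methodType(String.class, String.class, String.class);
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(), "concat", type, recipe, constants);
            return (String) site.getTarget().invokeExact(first, second);
        } catch (Throwable t) {
            throw new IllegalStateException("linking or invoking the call site failed", t);
        }
    }

    public static void main(String[] args) {
        // Literal text embedded in the recipe itself.
        System.out.println(concat("\u0001,\u0001", "Hello", "world!"));
        System.out.println(concat("\u0001 there, \u0001", "Hello", "world!"));
        // The same kind of separator passed as a bootstrap constant instead.
        System.out.println(concat("\u0001\u0002\u0001", "Hello", "world!", ", "));
    }
}
```

    In all three calls only two values are passed at invocation time; everything else comes from the recipe or from the bootstrap constants.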

    The final piece of the puzzle is the implementation of these dynamically generated methods. A bootstrap method is entirely free to choose the implementation details of the resulting method. The only constraint is that it needs to match the descriptor that was passed to it via the MethodType argument. In Java 9, the StringConcatFactory class has several different strategies that can be selected with a JVM system property, including one that simply constructs a StringBuilder chain.

    Backporting

    The Android runtimes shipped with Android versions up to Nougat (Dalvik and its successor ART) do not support dynamic method invocation. This means that a class file with invokedynamic instructions and bootstrap methods cannot be converted to a valid DEX file entry if it needs to be deployed on devices that don’t run the latest Android Oreo yet (which is about 99.5% of devices at the time of writing). DexGuard, ProGuard, Retrolambda and Gradle’s desugaring task all offer similar ways to backport the class files that use invokedynamic instructions for Java 8 features, meaning that all unsupported features in the bytecode are replaced with older mechanisms that result in the same behavior. The backported code may be less efficient than the original, as it doesn’t benefit from the latest VM infrastructure, but at least it is compatible with older versions.

    DexGuard and ProGuard now additionally fully support Java 9 class files and backport the code where it is required. In the case of String concatenations, we decided to replace the indy instruction with the following series of instructions:

    1. Pop the relevant stack values into local variables
    2. Create a StringBuilder with an estimated initial size
    3. Push the variables or load new constants one by one and append them
    4. Call toString() on the StringBuilder to end with the concatenated String on the stack
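
    The effect of these steps can be modeled at source level. The sketch below is not the DexGuard or ProGuard implementation, which rewrites bytecode; it is a hypothetical helper (BackportSketch.expand) that walks a recipe and builds the result with a StringBuilder, with an arbitrary size estimate chosen for the example.

```java
// Hypothetical illustration: a source-level model of the backported StringBuilder chain.
final class BackportSketch {

    static String expand(String recipe, Object[] stackValues, Object[] constants) {
        // Step 1 is modeled by stackValues: the values that were popped from the stack.

        // Step 2: create a StringBuilder with an estimated initial size
        // (assumption: recipe length plus 16 characters per dynamic argument).
        StringBuilder builder = new StringBuilder(recipe.length() + 16 * stackValues.length);

        // Step 3: append stack values, bootstrap constants and literal text in recipe order.
        int nextValue = 0;
        int nextConstant = 0;
        for (int i = 0; i < recipe.length(); i++) {
            char c = recipe.charAt(i);
            if (c == '\u0001') {
                builder.append(stackValues[nextValue++]);   // argument from the stack
            } else if (c == '\u0002') {
                builder.append(constants[nextConstant++]);  // constant from the bootstrap arguments
            } else {
                builder.append(c);                          // literal text stored in the recipe
            }
        }

        // Step 4: end with the concatenated String.
        return builder.toString();
    }

    public static void main(String[] args) {
        Object[] values = {"Hello", "world!"};
        System.out.println(expand("\u0001 there, \u0001", values, new Object[0]));
    }
}
```

    A bytecode-level backport would typically append each literal run as a single String constant rather than character by character; the loop above only keeps the model short.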

    There are three main challenges to overcome during the refactoring. The first challenge is that we are replacing one bytecode instruction with many more (the entire StringBuilder chain). DexGuard and ProGuard use their analysis of the code structure to automatically ensure that all class elements are updated to be consistent with the changed method body (e.g. branch instruction targets and line number tables).

    The second challenge is that multiple indy instructions can refer to the same bootstrap method entry: the compiler reuses an entry when the bootstrap method and its static arguments (such as the recipe) are identical. This is a neat little optimization, since the entry only needs to be stored once in the class file. It also means that the backporting first needs to replace all indy instructions, leaving their referenced bootstrap methods lingering around, and then needs to clean the bootstrap methods attribute in a separate step. DexGuard and ProGuard perform this two-step backporting thoroughly and clean the constant pool and inner classes attribute as well.

    The third challenge, as we already mentioned, is that the concatenation recipe can contain Strings that are not mentioned or stored anywhere else in the class file. This means we may need to add completely new StringConstants to the constant pool and inject instructions to load them at the right moment. DexGuard and ProGuard automatically create the chain of required constant pool entries and load the correct entry to be appended.

    Android integration

     So can we all start using Java 9 in our Android projects now? Almost. Gradle currently allows you to use the Java 9 javac compiler, but only with a target version lower than or equal to 8. There is no way to produce Java 9 class files during the build process yet, because Gradle offers no backporting solution and thus the output may be invalid. You can, however, include libraries with compiled Java 9 classes and leave it to DexGuard or ProGuard to backport them. And when Gradle decides to enable Java 9 support, developers will have this backporting solution ready for their own code as well.

