June 9, 2020

    5 Ways to Take Advantage of the Newly Released ProGuardCORE Project

    Our new open source project, ProGuardCORE, provides a powerful basis to read, write, create, analyze, and process Java bytecode. After having proven itself for years as the basis of the ProGuard obfuscator and optimizer, ProGuardCORE is now available as a separate project. If you're into bytecode engineering, this library may be useful for you.

    The foundation of ProGuardCORE is a clean implementation of all the elements of the binary Java class file format. It can, of course, read, write, create and edit bytecode, but it also offers more advanced features, such as preverification and abstract evaluation. These building blocks are useful for automatic bytecode optimization, instrumentation, logging and debugging.

    Let's take a tour of some of the features of ProGuardCORE. We'll show you how you can perform some common tasks, both trivial and complex, with simple code snippets.

    1. Printing Out Bytecode

    ProGuardCORE contains assorted classes to read and write directories, archives, and, of course, class files. The following snippet prints out the bytecode of all classes in a given jar:

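    // Treat the input jar file as a source of data entries.
    DataEntrySource source =
        new FileSource(new File(inputJarFileName));

    // Unpack the jar, keep only the class files, parse them, and print them.
    source.pumpDataEntries(
        new JarReader(
        new ClassFilter(
        new ClassReader(false, false, false, false, null,
        new ClassPrinter()))));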

    Even without documentation, the snippet is largely self-explanatory. All classes are building blocks in a clean API based on the visitor pattern. The implementations typically perform small tasks that you can compose easily. You could add filters on the classes and class members, or transform them on the fly. You can find all details in the documentation and samples.
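
The composition style itself can be illustrated without the library. The following is a minimal, self-contained sketch, assuming a toy setting in which entries are plain strings. The names EntryVisitor, SuffixFilter, and ClassNameConverter are hypothetical and are not ProGuardCORE classes. It only shows how small visitors that each do one thing can be nested into a pipeline:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are ProGuardCORE classes.
final class VisitorChainSketch {

    /** Visits one named entry, for example a file inside a jar. */
    interface EntryVisitor {
        void visit(String entryName);
    }

    /** Delegates only the entries whose names end with the given suffix. */
    static final class SuffixFilter implements EntryVisitor {
        private final String suffix;
        private final EntryVisitor delegate;

        SuffixFilter(String suffix, EntryVisitor delegate) {
            this.suffix = suffix;
            this.delegate = delegate;
        }

        @Override
        public void visit(String entryName) {
            if (entryName.endsWith(suffix)) {
                delegate.visit(entryName);
            }
        }
    }

    /** Converts an entry name such as "a/B.class" to the class name "a.B" and delegates it. */
    static final class ClassNameConverter implements EntryVisitor {
        private final EntryVisitor delegate;

        ClassNameConverter(EntryVisitor delegate) {
            this.delegate = delegate;
        }

        @Override
        public void visit(String entryName) {
            String withoutSuffix =
                entryName.substring(0, entryName.length() - ".class".length());
            delegate.visit(withoutSuffix.replace('/', '.'));
        }
    }

    /** Pushes all entries through the chain and returns the names that reach its end. */
    static List<String> classNames(List<String> entryNames) {
        List<String> result = new ArrayList<>();
        EntryVisitor chain =
            new SuffixFilter(".class",
            new ClassNameConverter(
            result::add));
        for (String entryName : entryNames) {
            chain.visit(entryName);
        }
        return result;
    }
}
```

Adding another step, such as a filter on package names, means wrapping one more visitor around the chain; the existing steps stay untouched.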

    2. Creating Classes Programmatically

    Sometimes you just want to create classes from scratch. One option is to write assembly files and pass them through our ProGuard Assembler. For more flexibility, you can create the classes programmatically. For example, you may want to inject logging code, performance tracking, or glue code into existing software. Or maybe you want to generate a parser for a data format. The following concise code creates the iconic HelloWorld class:

    ProgramClass programClass =
        new ClassBuilder(
            VersionConstants.CLASS_VERSION_1_8,
            AccessConstants.PUBLIC,
            "HelloWorld",
            ClassConstants.NAME_JAVA_LANG_OBJECT)

            .addMethod(
                AccessConstants.PUBLIC |
                AccessConstants.STATIC,
                "main",
                "([Ljava/lang/String;)V",
                50, // Maximum length of the code fragment.

                code -> code
                    .getstatic("java/lang/System", "out", "Ljava/io/PrintStream;")
                    .ldc("Hello, world!")
                    .invokevirtual("java/io/PrintStream", "println", "(Ljava/lang/String;)V")
                    .return_())

            .getProgramClass();

    The data structures directly correspond to the bytecode specifications. You can create classes, with fields and methods, with sequences of instructions, all in a fluent style. ProGuardCORE can preverify the code for you and write it out. Again, you can find more information in the documentation and samples.
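
The strings "([Ljava/lang/String;)V" and "(Ljava/lang/String;)V" in the snippet are method descriptors in the JVM's internal notation. If that notation is unfamiliar, the JDK itself can decode it. The following small sketch uses only the standard java.lang.invoke.MethodType class, not ProGuardCORE; the class name DescriptorSketch is made up for this example:

```java
import java.lang.invoke.MethodType;

final class DescriptorSketch {

    /** Parses a JVM method descriptor with the JDK's own parser. */
    static MethodType parse(String descriptor) {
        // A null class loader means that the system class loader resolves the type names.
        return MethodType.fromMethodDescriptorString(descriptor, null);
    }

    public static void main(String[] args) {
        // The descriptor of "public static void main(String[] args)".
        MethodType mainType = parse("([Ljava/lang/String;)V");
        System.out.println(mainType.parameterType(0)); // the single String[] parameter
        System.out.println(mainType.returnType());     // the void return type

        // Converting back yields the original descriptor string.
        System.out.println(mainType.toMethodDescriptorString());
    }
}
```

In a descriptor, "[" marks an array, "Ljava/lang/String;" is a reference to java.lang.String, and "V" after the closing parenthesis is the void return type.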

    3. Replacing Instruction Sequences

    At other times, you have an existing code base and want to make small changes throughout. In many cases, you simply want to replace some bytecode instruction sequences with others. ProGuardCORE has an excellent instruction pattern matching engine to help. You can apply it to inject logging calls or to perform peephole optimizations. For example, you can replace the instruction sequence "putstatic X, getstatic X" with the equivalent but more compact sequence "dup, putstatic X", where the wildcard X can match any constant index:

    Instruction[][] replacements =
    {
        // The pattern to match; the wildcard X binds to any constant index.
        comp.putstatic(X)
            .getstatic(X).instructions(),
        // The replacement, reusing the constant index bound to X.
        comp.dup()
            .putstatic(X).instructions()
    };
    ...
    programClassPool.classesAccept(
        new AllMethodVisitor(
        new AllAttributeVisitor(
        new PeepholeEditor(branchTargetFinder, codeAttributeEditor,
        new InstructionSequenceReplacer(constants, replacements, branchTargetFinder, codeAttributeEditor)))));

    In short, the snippet defines the instruction sequences and then applies the instruction sequence replacer to all code attributes of all methods of all classes previously loaded in a class pool. You can find the full explanation in the documentation and samples.
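
To make the idea of a wildcard pattern concrete, here is a toy sketch that performs the same "putstatic X, getstatic X" rewrite on instructions modeled as plain strings. It is an assumption-laden simplification: the class PeepholeSketch and its string format are invented for this example, and it ignores branch targets, which a real replacer has to respect (presumably the role of the branch target finder in the snippet above).

```java
import java.util.ArrayList;
import java.util.List;

// A toy model for illustration only; it does not use or mirror ProGuardCORE's classes.
final class PeepholeSketch {

    /** Replaces each "putstatic N, getstatic N" pair with "dup, putstatic N". */
    static List<String> optimize(List<String> code) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < code.size()) {
            String current = code.get(i);
            String next = i + 1 < code.size() ? code.get(i + 1) : null;
            if (next != null
                    && current.startsWith("putstatic ")
                    && next.startsWith("getstatic ")
                    && operand(current).equals(operand(next))) {
                // The wildcard is bound by the first instruction and must match the second.
                result.add("dup");
                result.add(current);
                i += 2;
            } else {
                // No match at this position: copy the instruction unchanged.
                result.add(current);
                i++;
            }
        }
        return result;
    }

    /** Returns the text after the opcode, for example "7" for "putstatic 7". */
    private static String operand(String instruction) {
        return instruction.substring(instruction.indexOf(' ') + 1);
    }
}
```

For the input "iconst_1, putstatic 7, getstatic 7, ireturn", the sketch produces "iconst_1, dup, putstatic 7, ireturn". A pair such as "putstatic 7, getstatic 8" is left alone, because the two operands do not bind to the same X.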

    4. Analyzing Code

    ProGuardCORE provides a number of ways to analyze code, from simply inspecting all instructions to reconstructing the control flow and the data flow. One powerful technique is abstract evaluation (closely related to symbolic execution). It executes the code, but instead of computing with concrete values on the stack, in the local variables, in the fields, etc., it can compute with abstract values, like "an integer", or "an integer between 0 and 5", or "the sum of integer #3 and integer #7". At the same time, it tracks where the values come from and where they go to. For example, consider this basic method:

    public static int getAnswer() {
        int f1 = 6;
        int f2 = 7;
        return f1 * f2;
    }
    

    ProGuardCORE can read the compiled bytecode, evaluate it, and provide the following results. The table represents the bytecode instructions at their offsets, and the resulting evolving stacks and local variables:

    Each value slot has a numerical value and its origin (an instruction offset), presented as `origin:value`. You can see how the method returns, popping the integer value 42 from the stack. Of course, this is a simple case. The code can also have unknown parameters. For example:

    private static int getAnswer(int a, int b) {
        return 2 * a + b;
    }
    

    The evaluator can then proceed with symbolic representations i0 and i1 for the parameters:

    The code returns a symbolic expression of the method arguments.
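
The principle behind both examples can be sketched in a few lines of plain Java. The toy evaluator below is only an illustration under strong assumptions: it handles a handful of integer instructions written as strings, supports straight-line code only, and does not track origins. The class SymbolicEvalSketch and its instruction format are invented here and have nothing to do with ProGuardCORE's actual evaluator API. It folds constants where it can and otherwise builds a symbolic expression, naming unknown parameters i0, i1, and so on:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// A toy evaluator for illustration only; ProGuardCORE's own evaluator is far more general.
final class SymbolicEvalSketch {

    /** Evaluates straight-line integer code and returns the value that is returned. */
    static String evaluate(String... instructions) {
        Deque<String> stack = new ArrayDeque<>();
        Map<String, String> locals = new HashMap<>();
        for (String instruction : instructions) {
            String[] parts = instruction.split(" ");
            switch (parts[0]) {
                case "iconst":
                    stack.push(parts[1]);
                    break;
                case "istore":
                    locals.put(parts[1], stack.pop());
                    break;
                case "iload":
                    // A variable that was never stored is an unknown parameter, e.g. "i0".
                    stack.push(locals.getOrDefault(parts[1], "i" + parts[1]));
                    break;
                case "imul":
                case "iadd": {
                    String right = stack.pop();
                    String left = stack.pop();
                    stack.push(combine(parts[0].equals("imul") ? "*" : "+", left, right));
                    break;
                }
                case "ireturn":
                    return stack.pop();
                default:
                    throw new IllegalArgumentException("Unsupported instruction: " + instruction);
            }
        }
        throw new IllegalStateException("Code does not return");
    }

    /** Computes a concrete result for two constants, or a symbolic expression otherwise. */
    private static String combine(String operator, String left, String right) {
        if (isConstant(left) && isConstant(right)) {
            int l = Integer.parseInt(left);
            int r = Integer.parseInt(right);
            return Integer.toString(operator.equals("*") ? l * r : l + r);
        }
        return "(" + left + operator + right + ")";
    }

    private static boolean isConstant(String value) {
        return value.matches("-?\\d+");
    }
}
```

Feeding it the instructions of the first method (store 6 and 7, load both, multiply, return) yields the constant 42. For the second method, the sequence "iconst 2, iload 0, imul, iload 1, iadd, ireturn" yields the expression ((2*i0)+i1).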

    The documentation and samples show different levels of abstraction for different types of values and more complex code constructs.

    5. Reading, Writing & Modifying Kotlin Metadata

    ProGuardCORE can read, write, and modify the Kotlin metadata that is attached to Java classes. You can combine Java and Kotlin metadata visitors, as in the following example, which prints the names of all Kotlin functions in the metadata attached to the Java class "Foo":

    programClassPool.classesAccept(
       new ClassNameFilter("Foo",
       new ReferencedKotlinMetadataVisitor(
       new AllFunctionsVisitor(
           (clazz, container, function) -> System.out.println(function.name)))));

    Explore & Contribute—We Want to See Your Creations!

    We expect ProGuardCORE to be useful for all advanced developers who are working with Java bytecode. You can find the project on GitHub. It comes with documentation and samples, and plenty of working code for reference in ProGuard, the ProGuard Assembler and Disassembler, and the Kotlin Metadata Printer. You can use the library under the flexible Apache 2 license.


    We’d love to see your creations, so hit us up on Twitter @Guardsquare to share the work you do using ProGuardCORE.
     
