June 29, 2022

    Obstacles in Dart Decompilation & the Impact on Flutter™ App Security

    This is part 2 of a blog series. Read part 1 here: The Current State & Future of Reversing Flutter™ Apps

    In our recent blog about Flutter in a mobile security context, we set out to describe the current state of reverse engineering tooling, the problems attackers would face and the direction in which things are likely to evolve. Our goal was to investigate if Flutter apps are more resilient to reverse engineering than other types of mobile applications. We already demonstrated that the metadata included in the Dart snapshot could be extracted and used to ease reverse engineering. Namely, we showed that it is possible to recover class and function names and use them to automatically locate and rename functions in a reverse engineering database, which allows reverse engineers to focus on what matters: application-specific code. We also noticed that even though the resulting decompiled code had function names, it was still not very clean and was difficult to understand.

    In this post, we investigate whether the decompiled code is hard to understand because Dart code is fundamentally hard to reverse engineer, or because reverse engineering tools currently lack support for Flutter. We performed several experiments to verify if this code can be cleaned up in order to make it more closely resemble the original Dart code.

    Just like in the previous post, we will use an obfuscated build of NyaNya Rocket! (thanks again to CaramelDunes for letting us use it). Using an obfuscated build has no impact on our experiment since Flutter obfuscation just renames entities and does not perform any actual code obfuscation.

    Why does the decompiled code look weird?

    First let’s have a quick look at the initial decompiled code that we have after renaming, and see if we can identify some of the underlying issues that are making the decompiled code look weird.

    [Figure: initial decompiled code after renaming]

    All the strange-looking artifacts that you can see are linked to the third obstacle that we discussed in the previous post: the Dart code depends on the Dart VM to be executed.

    The three main issues that we can see in the previous image are linked to the following three characteristics of the Dart code:

    1. All Dart objects are accessed through the object pool. Thus, the decompiled code only contains the index of the object accessed and does not reference the object itself.
    2. The Dart VM uses a custom stack indexed by register X15, which prevents reverse engineering tools from correctly identifying local stack variables out of the box.
    3. Dart code uses a non-standard ARM64 ABI where most of the parameters of function calls are pushed to the Dart VM stack. Thus, IDA Pro isn’t able to identify function parameters and it considers that functions have no parameters.

    Dart code characteristic 1: Object pool indirection

    The first issue is closely linked to the way Dart snapshots work. When you compile a Flutter application, all Dart objects are serialized and stored inside the Dart snapshot. Thus, the Dart code can’t access them directly. When the Flutter runtime loads your application, these objects are deserialized and stored on the Flutter heap. Because of that, Dart code can’t know the addresses of these deserialized Dart objects at compilation time, so it needs to access them through an indirection at runtime.

    This is where the Dart object pool comes into play. At compilation time, a reference to each Dart object is stored in a big array called the Dart object pool, which is itself serialized (since it is a Dart object). At the same time, in the Dart code, all direct accesses to these objects are replaced by indirect access through the object pool using their corresponding object pool index.

    The following figure demonstrates what is happening at runtime:

    • Steps 1 and 2 are performed while the Flutter engine loads the Flutter application
    • Steps 3 and 4 are performed each time the Dart code wants to access a Dart object. As you can see, the Dart code never uses the serialized Dart objects.

    [Figure: Dart memory at runtime, steps 1 to 4]

    The usage of this object pool technique has the following impacts on Dart code: 

    • There are no direct cross-references between code and data; all these links are hidden by the object pool indirection.
    • It explains why we can only see the index of the Dart object in the decompiled code instead of the Dart object itself.

    Thus, if you look only at the compiled Dart code for a specific function, you have no way of understanding which data (i.e. Dart object) it is accessing. In the same way, if you are only looking at a serialized object of the Dart snapshot, you can’t figure out which Dart compiled function is using it, since it is never accessed directly.

    Not having these cross-references is a major hurdle for reverse engineering as it slows down the identification of code and data dependencies.

    However this information is inherently required to make the Dart code load the objects it needs at runtime, thus it needs to be stored somewhere. We will see later on that all the information required to restore this link between the data and the code is available in Dart metadata.
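
    To make the indirection concrete, here is a toy model in plain JavaScript. It is not Dart VM code: the object values, the addresses and the function names (loadSnapshot, loadFromPool) are made up for illustration. It only shows that the "compiled" side holds a pool index while the object's address is decided at load time.

```javascript
// Toy model of the object pool indirection (not real Dart VM code).
// All names, values and addresses below are hypothetical.

// "Snapshot": serialized objects as they are stored in the compiled application.
const serializedObjects = ['"gameData"', '"loginPromptText"', '42'];

// Load time: the runtime deserializes every object onto the heap, at addresses
// that cannot be known at compilation time, and fills the object pool.
function loadSnapshot(serialized, heapBase) {
  const heap = new Map();  // address -> deserialized object
  const objectPool = [];   // pool index -> address of the deserialized object
  serialized.forEach((text, i) => {
    const address = heapBase + i * 0x20;
    heap.set(address, JSON.parse(text));
    objectPool.push(address);
  });
  return { heap, objectPool };
}

// Access time: compiled code only embeds a pool index, never an address.
function loadFromPool(runtime, poolIndex) {
  const address = runtime.objectPool[poolIndex];
  return runtime.heap.get(address);
}

const runtime = loadSnapshot(serializedObjects, 0x7200000000);
console.log(loadFromPool(runtime, 0)); // gameData
```

    Looking only at loadFromPool(runtime, 0), nothing tells you that the string "gameData" is involved, which is exactly the missing cross-reference problem described above.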

    Dart code characteristic 2: Custom stack

    The second issue is caused by the Dart SDK having its own stack, which is indexed by the X15 register instead of the classical ARM64 SP register.

    Most reverse engineering tools are able to deduce a lot of things by analyzing the stack operations and accesses. For instance, they can automatically compute the function frame size, and identify local variable reads and writes. Additionally, they can find out function parameters by looking at variables pushed on the stack before function calls.

    Because the register used for stack operations by the Dart VM is not the classical ARM64 SP register, all these analysis passes don’t work on Dart code. This is the reason why, in the decompiled code, there are no local variables detected and functions don’t have any parameters.

    Understanding stack manipulation and how data is exchanged between functions is a critical part of reverse engineering. Without this information it is very hard to properly understand code. To make a parallel with development, it would be a bit like trying to understand a library where:

    • All functions have 0 parameters but use global variables to transmit data to each other.
    • All local variables of each function are stored in global variables.

    Dart code characteristic 3: Custom ABI

    The third issue is caused by the fact that the Dart SDK does not use the standard ARM64 calling convention. Rather than using registers X0-X7 to pass arguments to called functions, it pushes most arguments on the custom Dart VM stack. The main impact is that reverse engineering tools aren’t able to identify function parameters correctly.

    Let’s see why it happens, using one of the heuristics that reverse engineering tools can use to identify input parameters of a function (assuming the default ARM64 ABI):

    • It checks whether one of the X0-X7 registers is used before being written. If this is the case, that register is probably a function parameter (e.g. if X3 is used before being written, the function has at least 4 parameters, X0-X3).
    • Then, it looks for stack access to detect parameters (for functions with more than 8 parameters)

    Thus, because the Dart custom ABI doesn’t use X0-X7, the function will not use any of these registers unless first writing a value to them itself, so reverse engineering tools will consider that the first 8 parameters are not used. Additionally, because of the Dart custom stack, the real parameters are not pushed on the system stack, and thus they are not detected as function parameters by reverse engineering tools.

    For instance, in the decompiled code image above, you can see that IDA Pro thinks that none of the functions have parameters.
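
    The first heuristic can be sketched in a few lines of JavaScript. This is a simplified illustration, not how IDA Pro actually implements it: instructions are modeled as hypothetical objects listing the registers they read and write, and control flow is ignored.

```javascript
// Simplified "used before written" heuristic, assuming the standard ARM64 ABI.
// Each instruction is modeled as { reads: [...], writes: [...] } (hypothetical
// representation, straight-line code only).
function inferRegisterParamCount(instructions) {
  const written = new Set();
  let highestParamReg = -1;
  for (const { reads = [], writes = [] } of instructions) {
    for (const reg of reads) {
      const match = /^X([0-7])$/.exec(reg);
      // An argument register that is read before any write is probably a parameter.
      if (match && !written.has(reg)) {
        highestParamReg = Math.max(highestParamReg, Number(match[1]));
      }
    }
    for (const reg of writes) written.add(reg);
  }
  return highestParamReg + 1;
}

// Standard ABI: X3 is read before being written, so at least 4 parameters.
const standardAbiFunction = [
  { reads: ['X3'], writes: ['X8'] },
  { reads: ['X8'], writes: ['X0'] },
];

// Dart-style: the argument is loaded from the X15-indexed stack, so X0 is
// written before it is ever read.
const dartStyleFunction = [
  { reads: ['X15'], writes: ['X0'] },
  { reads: ['X0'], writes: ['X1'] },
];

console.log(inferRegisterParamCount(standardAbiFunction)); // 4
console.log(inferRegisterParamCount(dartStyleFunction));   // 0
```

    The second result mirrors what we see in the decompiled code: the heuristic finds no register parameters in Dart functions.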

    One important thing to keep in mind is that all this information is public since the Dart SDK is open source. For instance, all special register values and purposes can be found here. Additionally, while trying to get more information on the second and third issues, we found an open issue on the Dart SDK GitHub repo which is pushing for the adoption of the standard ARM64 ABI and usage of SP instead of X15. If these changes are implemented, they will directly fix the second and third issues identified.

    Experimenting with Dart code cleaning

    Now that we understand the specifics of what makes decompiled Dart code look strange and hard to work with, let’s investigate if these are fundamental issues inherent to the language or rather symptoms of the available tooling. Through some experiments we hope to understand how feasible it could be to automatically and consistently overcome the identified issues. The results should give us an indication and rough timeline on the capabilities of future tooling and the implications for the security of Flutter apps.

    We will tackle each of the 3 characteristics and concretely try to achieve the following respective goals:

    1. Adding cross-references between Dart objects and Dart code and making them visible in decompiled code.
    2. Ensuring that the Dart VM custom stack is considered as a regular stack by reverse engineering tools.
    3. Fixing the ABI of Dart functions in reverse engineering tools so that function parameters are identified correctly.

    Adding cross-references to Dart objects

    The Dart snapshot contains serialized Dart objects, whereas the Dart code uses deserialized Dart objects. Moreover, the Dart code doesn’t directly access deserialized Dart objects; it uses the Dart object pool to access them indirectly through their object pool index. Thus, re-introducing cross-references will be a multi-step process:

    1. Getting deserialized Dart objects.
    2. Adding these objects to the reverse engineering database.
    3. Adding cross-references to Dart objects:
      1. Detecting where the Dart code is accessing the Dart object pool
      2. Finding out which Dart object is accessed
      3. Adding a cross-reference between the Dart assembly code and corresponding deserialized Dart object.
    4. Making the used Dart objects visible in decompiled code.

    Recovering the deserialized Dart objects

    As we explained in our previous post, it is possible to use a parser to recover serialized Dart objects from a Dart snapshot statically. Thus, it is possible to just extend a parser to also allow it to deserialize these objects. Since the deserialization code is available to everyone, the main challenge here is to support multiple versions of the Dart SDK, since the Dart snapshot format is changing frequently.

    Rather than doing that, a more robust approach would be to let the Flutter runtime parse and deserialize all Dart objects, and to then dump the deserialized objects from memory. As long as you can do dynamic analysis, using this approach is fairly simple. For instance, it can be done with a simple Frida script that hooks a Dart function, reads the Dart object pool pointer (i.e. the value of X27), and dumps the Flutter heap. The Frida script looks like this (full version here):

    // Address range to dump (the Flutter heap).
    var FLUTTER_MEM_START = 0x7200000000;
    var FLUTTER_MEM_END = 0x7300000000;
    var FLUTTER_MEM_MASK = 0xff00000000;
    // Offset of SharedPreferences::getInstance() inside libapp.so.
    var SHARED_PREF_GET_INSTANCE_OFFSET = 0x6D4F88;
    var APP_DATA_DIR = "/data/data/fr.carameldunes.nyanyarocket/";

    function hook_libapp() {
       var base_address = Module.findBaseAddress("libapp.so");
       console.log(`Hooking libapp: ${base_address} `);
       // Hook a Dart function: inside Dart code, X27 holds the object pool pointer.
       Interceptor.attach(base_address.add(SHARED_PREF_GET_INSTANCE_OFFSET), {
           onEnter: function (args) {
               console.log(`Calling SharedPreferences::getInstance()`);
               console.log(` Object Pool pointer (X27): ${this.context.x27}`);
               // dump_memory() is not shown in this excerpt (see the full version).
               dump_memory(FLUTTER_MEM_START, FLUTTER_MEM_END, APP_DATA_DIR);
           }
       });
    }

    When the script is run against NyaNya Rocket!, it successfully dumps the memory containing the deserialized Dart objects to disk:

    ➜  obfu frida -U -f fr.carameldunes.nyanyarocket -l dump_flutter_memory.js --no-pause
    …
    Spawned `fr.carameldunes.nyanyarocket`. Resuming main thread!           
    [Pixel 6::fr.carameldunes.nyanyarocket ]-> Hooking libapp: 0x73428a4000 
    Calling SharedPreferences::getInstance() 
     Object Pool pointer (X27): 0x7200600040
    Dumping memory into /data/data/fr.carameldunes.nyanyarocket/0x7200000000
    Dumping memory into /data/data/fr.carameldunes.nyanyarocket/0x7200080000

    Then, this memory dump can be imported into IDA to have access to all deserialized Dart objects using this script:

    [Figure: new segment created in IDA]

    Note that some deserialized Dart objects include pointers to libapp.so. Thus, it is better to rebase libapp.so so that its base address matches the one used when the memory was dumped (shown in Frida’s output). Doing that will allow IDA to automatically identify these pointers.

    Creating Dart objects

    At this point, all Dart objects are mapped into the IDA database. However, they are still considered raw data.

    [Figure: Dart objects before object creation]

     

    In order to better parse and understand these Dart objects, we need to deserialize them. We decided to automate this task by creating structures for them in IDA. So, we looked at the Dart SDK source code which performs the Dart objects deserialization to find characteristics shared by all of them.

    Each Dart object starts with a 4-byte tag that contains its class ID (cid).

    struct DartObjectTag
    {
     char is_canonical_and_gc;
     char size_tag;
     __int16 cid;
    };
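
    As an illustration, the tag can be decoded from raw dumped bytes as follows. This is a minimal sketch that follows the field layout of the structure above; the function name and the sample bytes are hypothetical, and little-endian byte order is assumed.

```javascript
// Decode the 4-byte tag, following the DartObjectTag layout:
// byte 0 = is_canonical_and_gc, byte 1 = size_tag, bytes 2-3 = cid.
function readDartObjectTag(bytes, offset = 0) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    isCanonicalAndGc: view.getUint8(offset),
    sizeTag: view.getUint8(offset + 1),
    cid: view.getUint16(offset + 2, true), // assumption: little-endian
  };
}

// Hypothetical tag bytes; 0x0055 is class ID 85.
const sampleTagBytes = Uint8Array.from([0x0e, 0x03, 0x55, 0x00]);
const tag = readDartObjectTag(sampleTagBytes);
console.log(tag.cid); // 85
```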
    

    The Dart object pool is a Dart object itself, thus it starts with a DartObjectTag followed by the number of Dart objects in the pool (at offset 8) and ends with an array of pointers to Dart objects.

    struct DartObjectPool
    {
     DartObjectTag tag;                       // offset 0
     __int64 nb_dart_objects_in_object_pool;  // offset 8
     DartObject *object_pool_array[];         // offset 16
    };
    

    You can also observe that the addresses stored in the Dart object pool array are odd. This is due to Dart pointer tagging: the least significant bit (LSB) of each value distinguishes pointers to Dart objects (LSB set) from small integers (LSB clear), which avoids creating Dart objects (a.k.a. boxing) for small integers. Thus, a pointer must be untagged (i.e. one must be subtracted from it) to find the Dart object’s actual address.

    All this information can be used to create custom structures in IDA and deserialize all Dart objects into these structures. By using the address of the Dart object pool (object_pool_ptr) recovered with the previous Frida script, we can do this as follows:

    • Get the number of objects (nb_dart_objects_in_object_pool) in the object pool.
    • For each item in object_pool_array:
      • If it is an odd number, it is a pointer to a Dart object: we first untag it, then parse it to get its class ID and deserialize it into the corresponding structure.
      • Otherwise, we ignore it as it is a small integer and not a Dart object.
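
    The steps above can be sketched in JavaScript on a synthetic memory dump. This is only an illustration of the walk, not the IDA script used in the post: walkObjectPool is a hypothetical helper, the dump is modeled as a single byte array starting at dumpBase, and little-endian byte order is assumed.

```javascript
// Walk a Dart object pool inside a memory dump (hypothetical helper).
// `dump` holds the bytes dumped from address `dumpBase`; `poolAddress` is the
// object pool pointer (the X27 value printed by the Frida script).
function walkObjectPool(dump, dumpBase, poolAddress) {
  const view = new DataView(dump.buffer, dump.byteOffset, dump.byteLength);
  const toOffset = (address) => Number(address - dumpBase);

  const poolOffset = toOffset(poolAddress);
  // nb_dart_objects_in_object_pool is stored at offset 8.
  const count = Number(view.getBigUint64(poolOffset + 8, true));
  const objects = [];
  for (let index = 0; index < count; index++) {
    // object_pool_array starts at offset 16, one 8-byte entry per object.
    const entry = view.getBigUint64(poolOffset + 16 + index * 8, true);
    if ((entry & 1n) === 0n) continue; // even value: small integer, skip it
    const address = entry - 1n;        // odd value: untag the pointer
    const cid = view.getUint16(toOffset(address) + 2, true);
    objects.push({ index, address, cid });
  }
  return objects;
}

// Synthetic dump: one object with class ID 85 and a pool holding two entries.
const dumpBase = 0x7200000000n;
const dump = new Uint8Array(0x80);
const writer = new DataView(dump.buffer);
writer.setUint16(0x02, 85, true);               // object at dumpBase: cid 85
writer.setBigUint64(0x48, 2n, true);            // pool at dumpBase + 0x40: 2 entries
writer.setBigUint64(0x50, dumpBase + 1n, true); // entry 0: tagged pointer
writer.setBigUint64(0x58, 84n, true);           // entry 1: small integer

const found = walkObjectPool(dump, dumpBase, dumpBase + 0x40n);
console.log(found.length, found[0].cid); // 1 85
```

    A real dump is split over several files and address ranges, so a practical version would need a lookup from address to the right file before reading.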

    As a first step, we decided to create one structure per class ID each with the same generic Dart object contents and not yet containing any class specific fields:

    
    // Will be used to deserialize Dart object with class ID 45
    struct DartUnkObj45
    {
     char is_canonical_and_gc;
     char size_tag;
     __int16 cid;
     <int padding>
     <unknown class specific data>
    };
     
    // Will be used to deserialize Dart object with class ID 11
    struct DartUnkObj11
    {
     char is_canonical_and_gc;
     char size_tag;
     __int16 cid;
     <int padding>
     <unknown class specific data>
    };
     
    // Will be used to deserialize Dart object with class ID 85
    struct DartUnkObj85
    {
     char is_canonical_and_gc;
     char size_tag;
     __int16 cid;
     <int padding>
     <unknown class specific data>
    };
    

    When we parse a Dart object we deserialize it using the structure associated with its class ID. Since this is still just a generic structure, the deserialization will be incomplete. However, it is still useful:

    • Each structure can be extended later in IDA and IDA will automatically apply the extension to all Dart objects associated with it.
    • IDA Pro has a feature that allows it to list all objects of a specific type. Thus, it is already possible to quickly find all Dart objects of a specified class ID.

    Since we are most interested in Dart String objects, we decided to add the complete structure for Dart strings, which means that we replaced the DartUnkObj85 structure by:

    struct DartString
    {
     char is_canonical_and_gc;
     char size_tag;
     __int16 cid;
     <int padding>
     __int64 s_len;
     char s[];
    };
    

    After running this script, a lot of new information is added to the IDA database:

    [Figure: IDA database after Dart object creation]

    As you can see, all Dart objects referenced by the Dart object pool are created (with meaningful names when possible) and some convenient features, like getting the list of all objects of a certain class ID, are already available:

    [Figure: list of all objects of a given type]

    But it is not yet possible to find out where a Dart object is used in the code.

    Adding Dart object cross-references in assembly code

    Now that all Dart objects are created, the next step is to link Dart objects to Dart code.

    As explained earlier, all accesses to Dart objects are made using indirections through the Dart object pool. The three ARM64 assembly patterns that are used for this are shown below (remember that the pointer to the Dart object pool is stored in the X27 register):

    [Figure: the three ARM64 assembly patterns used to access the object pool]

    * Note that we need to subtract 16 from the address to calculate the object index because X27 points to the beginning of the Dart object pool, but object_pool_array starts at offset 16.

    * These patterns can be found in the Dart SDK source code

    Based on this observation, the strategy to add cross-references is straightforward:

    • Search for these patterns throughout all Dart code
    • Each time one of the patterns is found:
      • Compute the relative offset to X27 used in the found pattern
      • Use the address of the Dart object pool recovered with the previous Frida script to compute the address of the pointer to the referenced Dart object
      • Add a cross-reference between the pattern address and the Dart object address

    The full script which finds these patterns and adds the cross-references to Dart objects can be found here.

    After running it, cross-references are added in IDA and it suddenly becomes very easy to find out which Dart function is using which Dart objects. For instance, we can now find all functions that are using the gameData Dart String with IDA’s built-in cross-reference search:
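
    To give an idea of what the pattern matching involves, here is a minimal JavaScript sketch that handles only the simplest form, a single LDR Xt, [X27, #imm] instruction (64-bit load with an unsigned immediate offset). Treating this as one of the patterns is an assumption on our side, and the function names are hypothetical; this is not the script linked above.

```javascript
const POOL_REGISTER = 27;     // X27 holds the object pool pointer
const POOL_ARRAY_OFFSET = 16; // object_pool_array starts at offset 16
const POINTER_SIZE = 8;

// Decode `LDR Xt, [X27, #imm]` (64-bit, unsigned immediate offset).
// Returns null for any other instruction.
function decodePoolLoad(insn) {
  if (((insn & 0xffc00000) >>> 0) !== 0xf9400000) return null;
  const baseReg = (insn >>> 5) & 0x1f;
  if (baseReg !== POOL_REGISTER) return null;
  const byteOffset = ((insn >>> 10) & 0xfff) * POINTER_SIZE; // imm12 is scaled by 8
  return {
    destReg: insn & 0x1f,
    byteOffset,
    poolIndex: (byteOffset - POOL_ARRAY_OFFSET) / POINTER_SIZE,
  };
}

// Address of the pool slot that holds the tagged pointer to the object.
function poolSlotAddress(poolAddress, load) {
  return poolAddress + BigInt(load.byteOffset);
}

const load = decodePoolLoad(0xf9401760); // LDR X0, [X27, #0x28]
console.log(load.poolIndex);                                    // 3
console.log(poolSlotAddress(0x7200600040n, load).toString(16)); // 7200600068
```

    The value read from that slot is still a tagged pointer, so it has to be untagged before it can be used as the target of the cross-reference.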

    [Figure: cross-references from the gameData Dart String to code]

    However, the cross-references are only added to the assembly code, not to the decompiled code, where we still see object pool accesses instead of direct references to Dart objects.

    Resolving object pool accesses in decompiled code

    The reason why Dart objects don’t yet appear in decompiled code is that IDA Pro doesn't know the initial value of the X27 register when decompiling a function, thus it can’t resolve the indirection.

    However, this can be easily fixed by writing a decompilation plugin (as pointed out by Rolf Rolles on StackExchange). The idea is simple: each time IDA sees the X27 register during the decompilation process, the plugin replaces it with the address of the Dart object pool (which was recovered by the Frida script). The associated microcode plugin can be found here.

    In the following image, you can see the impact on decompiled code. On the left-hand side, the v5 variable is used to indirectly access Dart objects from the object pool. On the right-hand side, the plugin is enabled and all indirect accesses have been replaced by direct references to the Dart objects, which considerably eases the reverse engineering of this function.

    [Figure: decompiled code before and after enabling the microcode plugin]

    Dealing with the Dart VM stack

    Now that the first of the three issues has been resolved, we can investigate the second one: the custom stack used by the Dart VM.

    As a reminder, on ARM64, Dart code doesn’t use the standard ARM64 system stack; instead, it uses a separate stack managed by the Dart VM, and the X15 register is used as the Dart VM stack pointer. Thus, when a function wants to call another function, it pushes the parameters on the Dart VM stack and updates X15 accordingly. Similarly, all local variables of a function are stored on the Dart VM stack and accessed through X15 or the frame pointer (X29).

    This has a strong impact on heuristics used by IDA, and other decompilers, to analyze and decompile code. Concretely, IDA is unable to identify any function parameters and local variables.

    Patching stack accesses

    In order to handle the Dart VM stack automatically, we initially tried several approaches such as:

    • Using the __usercall ABI feature in IDA which lets us define custom calling conventions.
    • Creating a custom structure for the Dart VM stack.

    But none of these approaches led to satisfying results.

    We decided to try a more basic approach: find all instructions that are using the Dart VM stack pointer register X15 and patch them so that they use the SP register instead (script can be found here).

    Once the program has been re-analyzed after the patching, the decompiled code looks much cleaner and IDA is able to (almost) detect function parameters and local variables automatically:

    [Figure: decompiled code before and after stack patching]

    Note that there might be some corner cases where this patching script can generate incorrect code, e.g. when dealing with boundaries between Dart and native code. However, it worked fine in our experiments and should generally not impact reverse engineering of the application code significantly.
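
    The register swap can be illustrated on a single instruction class. The JavaScript sketch below only handles load/store instructions with an unsigned immediate offset, where register number 31 in the base register field means SP; the function name is hypothetical and this is not the patching script linked above, which has to cover every instruction form that references X15.

```javascript
const DART_STACK_POINTER = 15; // X15: Dart VM stack pointer
const SYSTEM_STACK_POINTER = 31; // register number 31 encodes SP as a load/store base

// Rewrite the base register of a load/store (unsigned immediate offset)
// instruction from X15 to SP. Any other instruction is returned unchanged.
function patchStackBase(insn) {
  const isLoadStoreUnsignedImm = ((insn & 0x3b000000) >>> 0) === 0x39000000;
  const baseReg = (insn >>> 5) & 0x1f;
  if (!isLoadStoreUnsignedImm || baseReg !== DART_STACK_POINTER) return insn;
  // Clear the 5-bit base register field (bits 5-9) and set it to 31.
  return ((insn & ~(0x1f << 5)) | (SYSTEM_STACK_POINTER << 5)) >>> 0;
}

const before = 0xf94005e0; // LDR X0, [X15, #8]
const after = patchStackBase(before);
console.log(after.toString(16)); // f94007e0, i.e. LDR X0, [SP, #8]
```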

    Dealing with Dart’s custom function ABI

    After patching the stack pointer, the Dart stack is considered to be the regular stack by IDA. But as you can see in the previous image, functions now have too many parameters. This is linked to the third issue: Dart’s custom ABI with its own calling convention.

    During the initial analysis, IDA Pro used the standard ARM64 ABI, which assumes that function arguments are stored into registers X0-X7 before being pushed on the stack. Thus, IDA Pro detects (real) arguments being pushed on the stack but it also still considers that there are (false) arguments in X0-X7.

    When it comes to Dart code, by default, all parameters are pushed on the Dart VM stack, but due to our patching they are now all pushed on the regular stack. Thus, we have to fix it by specifying a custom calling convention stating that the parameters are solely located on the stack and not in the usual registers. In IDA, this can be done using the __usercall keyword (see this post for more info on how to use it). For instance, the result looks like this for the handlePublishTapped function:

    void __usercall LocalPuzzles___handlePublishTapped(void *context@<^16.8>, void *uuid@<^8.8>, void *user@<^0.8>)

    After applying the correct calling convention, the function parameters are correctly identified:

    [Figure: decompiled code after applying __usercall]
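
    Writing such prototypes by hand for every function is tedious, so they can be generated. The JavaScript sketch below builds the prototype string from a list of parameter names. It extrapolates from the single handlePublishTapped example: every parameter is assumed to be a void * occupying one 8-byte stack slot, with the last parameter at stack offset 0. The helper name is hypothetical and the string still has to be applied in IDA separately.

```javascript
// Build an IDA __usercall prototype string for a function whose parameters
// all live on the stack (hypothetical helper; layout assumptions are above).
function buildUsercallPrototype(functionName, paramNames) {
  const SLOT_SIZE = 8;
  const args = paramNames.map((paramName, i) => {
    // The first parameter sits at the highest offset, the last one at offset 0.
    const stackOffset = (paramNames.length - 1 - i) * SLOT_SIZE;
    return `void *${paramName}@<^${stackOffset}.${SLOT_SIZE}>`;
  });
  return `void __usercall ${functionName}(${args.join(', ')})`;
}

const prototype = buildUsercallPrototype(
  'LocalPuzzles___handlePublishTapped',
  ['context', 'uuid', 'user'],
);
console.log(prototype);
```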

    Back to classical native reverse engineering

    After going through all these steps, we end up back at the classical scenario of reverse engineering native code. Reverse engineers can now work the usual way and focus on understanding application code.

    Let’s see what this could look like for the following Dart code example:

    void _handlePublishTapped(BuildContext context, String uuid, User user) {
      if (user.isConnected) {
        PuzzleStore.read(uuid).then((NamedPuzzleData? puzzle) {
          if (puzzle != null) {
            _verifyAndPublish(context, puzzle);
          }
        });
      } else {
        final snackBar = SnackBar(
            content: Text(NyaNyaLocalizations.of(context).loginPromptText));
        ScaffoldMessenger.of(context).showSnackBar(snackBar);
      }
    }
    

    After a bit of manual work, the associated decompiled code looks like this:

    [Figure: final decompiled code]

    As you can see, we can identify all parts of the original Dart code in the resulting decompiled code. But the decompiled code is way bigger and contains more operations than the Dart source code. The main reason is that Dart code abstracts a lot of details, whereas in the decompiled code everything is explicit:

    • We can see the allocation of local Dart objects (e.g. SnackBar, Text).
    • Similarly, the closure which calls _verifyAndPublish is a separate function. And we can also see that it is itself allocated in a ClosureStub object.
    • We can see the resolution of loginPromptText and the three possible values depending on the language being English, French or German.

    Conclusion

    In this post, we took a deeper look at some of the particularities of compiled Dart code. We investigated the reason behind the absence of cross-references between Dart code and objects, and demonstrated that it was possible to reintroduce them by analyzing the Dart object pool. We also explored the impact of the custom Dart VM stack on reverse engineering tools and showed how this can be easily fixed by taking the custom calling convention into account in tooling. At the end of this analysis, we end up with decompiled Dart code that can be reverse engineered in almost the same way as native code. 

    Based on our results, it seems that enhancing reverse engineering tools to allow them to properly analyze and decompile Dart code is not an insurmountable task and we will probably soon see new features or plugins to help with reverse engineering Flutter apps. All experiments presented in this blog post are automated, so all these transformations can be done in a matter of minutes while reverse engineering any other Flutter application. Furthermore, changes to the Dart SDK should have a limited impact on the taken approach as it only relies on Dart’s internal structure to correctly parse Dart Strings. 

    That being said, the code provided should be considered as a proof of concept and not as a Flutter reverse engineering solution. It provides only a set of limited features and there are plenty of corner cases where it won’t work as expected.

    In the next post, we will focus on dynamic analysis to explore how tampering and hooking could be used to cheat in a Flutter game.

    Notes

    Some comments on the demonstrations made in this blog post:

    • We used ARM64 as the basis for our investigation as this is the most common architecture in mobile platforms, but similar results can be achieved on all architectures.
    • We used IDA Pro for the experiments in this blog, and the materials for reproducing our results are made available. All things discussed could just as well be performed using alternative reverse engineering tools.

    In this post, we used the real function names to ease reading. Remember that when performing reverse engineering on an obfuscated build, you can only retrieve the framework function names. If you want to have the same function names as the ones shown in this post, you can use the script from the previous post to add class and function names based on debug symbols.

    Boris Batteux - Security Researcher
