October 26, 2020

    The C++ Mocking Tool

    Available at: https://github.com/Guardsquare/mocxx

    C++ is a very powerful general-purpose programming language. However, this power comes at a cost. Big applications written in C++ become very complicated, very fast. Maintaining such applications and testing them is a challenge in its own right. Many modern programming languages, Python being one of them, already have excellent testing and mocking tools that let you switch type and object implementations at any time to limit the testing scope. Typically this is harder to do in a compiled language that’s not designed with this functionality in mind. C++, thanks to its compilers, has very little dynamic data present at runtime, which makes it difficult to make any substitutions. Most existing solutions (for instance GoogleMock, Cmock, FakeIt) require direct source-code and/or build system modifications, which is admittedly the last thing you would want to do. But wait…

    Here at Guardsquare we work days and (sometimes) nights on cutting-edge protection technologies to prevent reverse-engineers from reaching their nefarious goals. This requires us to analyze all these fancy tools, libraries and techniques used by that motley bunch obsessed with the color of their hats. One such tool is Frida, which according to its description on GitHub is a dynamic instrumentation toolkit for developers, reverse-engineers, and security researchers. You can think of it as a debugger of sorts: it attaches to the running process and provides you with a JavaScript interpreter and an API that let you manipulate process memory, thread state, control flow and so forth. This is a remarkable library.

    Frida is based on portable, battle-tested dynamic code instrumentation technology. Used correctly, it makes it possible to create a new C++ mocking framework that stands on par with similar frameworks for dynamic languages in functionality and flexibility. So we wrote such a framework.

    Mocxx

    Mocxx is a versatile function mocking framework. It replaces a target function with the provided implementation, and integrates well with existing testing and mocking frameworks. It requires no macros and virtually no source code modification, and it allows you to replace any function, including system functions such as open. It is type safe and follows RAII. You can find Mocxx at https://github.com/Guardsquare/mocxx. The rest of the article dives deep into its capabilities, design and implementation. Keep reading.

    Replacing free functions

    To replace a function with Mocxx you are required to construct an instance of it and make a single call to Replace:

    Mocxx mocxx;
    
    mocxx.Replace([](const std::filesystem::path&) { return true; },
                  std::filesystem::exists);
    
    std::string file = "/this/file/now/exists";
    
    // Returns true
    std::filesystem::exists(file);

    That’s it, simply pass in the replacement lambda and the target function to the Replace method. The lambda/target passing order is necessary to drive target function type resolution, because in C++ it is possible to have many overloads of the same function with different arguments. To save you some typing, the target type is derived from the provided lambda. If you want a different overload, simply change the type of the lambda:

    mocxx.Replace([](const std::filesystem::path&, std::error_code&) { return true; },
                  std::filesystem::exists);

    If the type of the lambda cannot be matched against any of the overloads, then the replacement call will fail to compile.

    The target function type resolution is done via a set of function declarations and type aliases, which effectively take the lambda’s call operator type and strip the lambda (closure) type off it. Take a look:

    template<typename ResultType, typename LambdaType, typename... Args>
    constexpr auto
    LambdaToFreeFunctionImpl(ResultType (LambdaType::*)(Args...) const) -> ResultType (*)(Args...);
    
    ...
    
    template<typename Lambda>
    using LambdaToFreeFunction = decltype(LambdaToFreeFunctionImpl(&Lambda::operator()));

    This is enough to resolve the overload sets of free functions, but resolving member function types requires a bit more effort. As a convention, the very first parameter of the replacement lambda indicates the type the target member function belongs to. The constness of this parameter selects the const or non-const version of the member function.
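
    To see the free-function resolution in isolation, here is a minimal, self-contained sketch. The two templates mirror the snippet above; the Parse overloads and the SelectParse helper are hypothetical names invented for this illustration and are not part of Mocxx.

```cpp
#include <cassert>
#include <type_traits>

// Maps the type of a non-generic lambda's call operator to a plain function
// pointer type with the same result and argument types. Declaration only:
// it is used exclusively inside decltype.
template<typename ResultType, typename LambdaType, typename... Args>
constexpr auto
LambdaToFreeFunctionImpl(ResultType (LambdaType::*)(Args...) const)
  -> ResultType (*)(Args...);

template<typename Lambda>
using LambdaToFreeFunction =
  decltype(LambdaToFreeFunctionImpl(&Lambda::operator()));

// Hypothetical overload set, present only for this illustration.
inline int Parse(int value) { return value; }
inline int Parse(int value, int base) { return value * base; }

// Returns the overload of Parse whose signature matches the lambda.
template<typename Lambda>
LambdaToFreeFunction<Lambda> SelectParse(Lambda)
{
  // Initializing a typed function pointer forces overload resolution.
  LambdaToFreeFunction<Lambda> target = &Parse;
  return target;
}
```

    With these definitions, SelectParse([](int) { return 0; }) yields a pointer to the one-argument overload, and a lambda taking (int, int) yields the two-argument one. A lambda whose parameters match neither overload fails to compile at the pointer initialization.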

    Replacing member functions

    The differences between member and free function types are substantial, and the fact that overload sets are resolved at the call site makes it almost impossible to handle such replacements with a single Replace API. Consequently, to replace a member function you have to call ReplaceMember:

    struct Name
    {
      using SizeType = std::string::size_type;
    
      SizeType Size() const { return name.size(); }
      SizeType Size() { return name.size(); }
    
      std::string name;
    };
    
    mocxx.ReplaceMember([](Name* foo) -> Name::SizeType { return 13; },
                        &Name::Size);
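
    The first-parameter convention can be sketched in the same style as the free-function resolution. This is an illustration of the idea rather than the Mocxx implementation: MemberFunctionOf, LambdaToMemberFunction, Counter and SelectGet are all hypothetical names.

```cpp
#include <cassert>

// Primary template, specialized below on the lambda's call operator type.
template<typename CallOperator>
struct MemberFunctionOf;

// A pointer to non-const Class in the first lambda parameter selects the
// non-const member function type.
template<typename Result, typename Lambda, typename Class, typename... Args>
struct MemberFunctionOf<Result (Lambda::*)(Class*, Args...) const>
{
  using type = Result (Class::*)(Args...);
};

// A pointer to const Class selects the const-qualified member function type.
template<typename Result, typename Lambda, typename Class, typename... Args>
struct MemberFunctionOf<Result (Lambda::*)(const Class*, Args...) const>
{
  using type = Result (Class::*)(Args...) const;
};

template<typename Lambda>
using LambdaToMemberFunction =
  typename MemberFunctionOf<decltype(&Lambda::operator())>::type;

// Hypothetical type whose two overloads differ only in constness.
struct Counter
{
  int Get() const { return 1; }
  int Get() { return 2; }
};

// Returns the overload of Counter::Get selected by the lambda's first parameter.
template<typename Lambda>
LambdaToMemberFunction<Lambda> SelectGet(Lambda)
{
  LambdaToMemberFunction<Lambda> target = &Counter::Get;
  return target;
}
```

    Passing [](Counter*) { return 0; } picks the non-const Get, while [](const Counter*) { return 0; } picks the const one.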

    The interceptor

    Now that we have a replacement lambda and a properly typed replacement target, how do we actually switch the target implementation? As already mentioned, we use a dynamic instrumentation toolkit called Frida, specifically its instrumentation component called Gum. This library has a wide range of capabilities: it can hook a function to replace its implementation; it can stealthily trace a program, rewriting it on the go; it provides memory scanning and monitoring, symbol lookup, code generation, you name it. For Mocxx we needed only two API sets: function hooking and symbol lookup. To replace a function with Gum we do the following:

    gum_interceptor_begin_transaction(mInterceptor);
    gum_interceptor_replace_function(
      /*        self */ mInterceptor,
      /*      target */ targetPtr,
      /* replacement */ details::TargetToVoidPtr(&ProxyType::Invoke),
      /*        data */ targetPtr);
    gum_interceptor_end_transaction(mInterceptor);

    The Gum API takes an object of type GumInterceptor as its context. You can obtain it by calling gum_interceptor_obtain() and release it by calling g_object_unref(interceptorInstance). Gum implements a transaction-style API: in the snippet above you can see the calls that begin and end the transaction, and wedged between them a request to replace the target function. This request takes the interceptor instance, the target function as a void pointer, the replacement function, and arbitrary data that can be queried later. We have already seen how to obtain the interceptor instance, so let’s figure out how to deal with the rest of the arguments.

    The target

    Since Gum is written in C, any function pointer it accepts is necessarily type-erased, so Mocxx converts it to a void pointer via TargetToVoidPtr(). After this conversion such pointers cannot be treated as functions, because void pointers are inherently pointers to data. This is all perfectly safe because Mocxx never invokes the target function via this pointer.

    A free function can easily be converted to a void pointer, assuming it is treated as data thereafter, or some additional logic is implemented to invoke it correctly. This is not necessarily true for member functions. In C++ such functions are not represented in the same way as free functions. A pointer to a member function might be a data structure rather than a single pointer. Consider a virtual member function. At compile time it is not possible to resolve it to a valid address; at runtime you can resolve it, but the result will change depending on the underlying object. So how would member function type erasure even work? The C++ standard is intentionally vague on that account. The next best thing that can act as an authority on member pointer representation is the compiler. There are two mainstream open-source compilers in the wild: Clang and GCC. All our development is done with Clang (with its libc++) on machines running x86_64 processors, so the following explanation should be viewed from this position. Other compilers (and I am looking at you, MSVC) and other architectures might employ a different kind of witchcraft.

    Clang is based on the open-source compiler toolkit called LLVM. There, if you look carefully in the codegen package, you will find a file ItaniumCXXABI.cpp with the following comment:

    // In the Itanium and ARM ABIs, method pointers have the form:
    // struct { ptrdiff_t ptr; ptrdiff_t adj; } memptr;

    As you can see, member function pointers are not even the size of a regular pointer. In the Itanium ABI, the semantics of this structure are as follows:

    • Method pointer is virtual if (memptr.ptr & 1) != 0
    • The this-adjustment is memptr.adj
    • The virtual offset is (memptr.ptr - 1)

    From this description we can assert that the first field in this structure is always a legal pointer if the member function is not virtual. If the function is virtual, the ptr value (minus one) should be treated as an offset into the object’s vtable. Mocxx currently cannot deal with virtual member functions, but it works well with regular member functions, because of the conversion trick it uses:

    union
    {
      TargetType pf;
      void* p;
    };
    
    pf = target;
    return p;

    Admittedly this is a grey area. This will work until Clang changes its C++ ABI, which happens quite rarely, at least in this part.
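
    If you want to look at this layout yourself, the sketch below copies a member function pointer into a struct shaped like the one in the comment. It assumes a compiler that follows the Itanium C++ ABI (or its ARM variant, which keeps the virtual flag in adj instead of ptr); Shape, MemberPointerLayout, Inspect and IsVirtual are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Mirror of the two-word layout quoted from ItaniumCXXABI.cpp.
struct MemberPointerLayout
{
  std::ptrdiff_t ptr;
  std::ptrdiff_t adj;
};

// Copies the raw bytes of a member function pointer into the mirror struct.
template<typename MemberFunction>
MemberPointerLayout Inspect(MemberFunction function)
{
  static_assert(sizeof(MemberFunction) == sizeof(MemberPointerLayout),
                "this sketch assumes the two-word Itanium/ARM representation");
  MemberPointerLayout layout;
  std::memcpy(&layout, &function, sizeof(layout));
  return layout;
}

// The virtual flag is the low bit of ptr in the Itanium ABI and the low bit
// of adj in the ARM variant of that ABI.
inline bool IsVirtual(const MemberPointerLayout& layout)
{
#if defined(__arm__) || defined(__aarch64__)
  return (layout.adj & 1) != 0;
#else
  return (layout.ptr & 1) != 0;
#endif
}

// Hypothetical type with one virtual and one regular member function.
struct Shape
{
  virtual ~Shape() = default;
  virtual int Sides() const { return 0; }
  int Id() const { return 7; }
};
```

    Under those assumptions, Inspect(&Shape::Id) reports a non-virtual pointer with a zero adjustment, while Inspect(&Shape::Sides) has the virtual flag set. On a compiler with a different representation the static_assert is expected to fire.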

    The second field in this structure called memptr.adj is an adjustment value to be added to this pointer available in member functions. Why would you need it? Consider these three types:

    struct A {
        auto get_number() const { return number; }
        double number;
    };
    
    struct B {
        auto get_flag() const { return flag; }
        bool flag;
    };
    
    struct C : public A, B { };

    If you construct A or B directly, this pointer will contain the address of the object of its respective type. And the reference to any of the fields would be equal to this pointer plus the relative offset to this field, in this case fields are number and flag. When class C inherits from A and B the reality changes slightly. For inheritance to work, C++ compilers must be able to reserve space for all involved types (A, B and C in this case) in the same memory object. Since the memory is linear in nature, the allocation is linear as well, so the compiler simply reserves space sequentially. In our case, an object of type C consists of object A followed by object B followed by object C, and alignment padding in between if necessary.

    This is where memptr.adj comes into play. It is simply an offset from the memory object start to a sub-object of some inherited type. In the case above, if you invoke get_number() on an object of type C the adjustment to this pointer will be 0, but if you invoke get_flag() adjustment will be 8 bytes, to slide past the number field of the sub-object A.
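
    The layout is easy to observe. The sketch below reuses the three types from above and measures where each sub-object starts inside a C object. OffsetInC is a hypothetical helper, and the 8-byte figure assumes a typical 64-bit ABI where A occupies exactly the 8 bytes of its double.

```cpp
#include <cassert>
#include <cstddef>

struct A {
    auto get_number() const { return number; }
    double number;
};

struct B {
    auto get_flag() const { return flag; }
    bool flag;
};

struct C : public A, B { };

// Byte distance from the start of a C object to its Base sub-object.
// This is the value the compiler adds to the this pointer before calling
// a member function inherited from Base.
template<typename Base>
std::ptrdiff_t OffsetInC(const C& object)
{
  const auto* start = reinterpret_cast<const char*>(&object);
  const auto* base =
    reinterpret_cast<const char*>(static_cast<const Base*>(&object));
  return base - start;
}
```

    Under that assumption OffsetInC<A> is 0 and OffsetInC<B> equals sizeof(A), which matches the adjustment described above.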

    Mocxx does not require special treatment for regular member functions even if the call requires this pointer adjustment, because such adjustment is done at the call site, before the replacement is invoked.

    With all the information above it is possible to freely convert a free function and member function pointer (albeit not virtual yet) to void pointer. Now that we have the target, let’s talk about the replacement.

    The replacement

    In the provided function replacement example you can see that we don’t pass the lambda directly but instead we pass something like this:

    details::TargetToVoidPtr(&ProxyType::Invoke)

    Gum requires the replacement to be passed as a void pointer. However, a lambda that replaces some target is implemented as a struct with state, so passing only its call operator would not work. To solve this, a static proxy of the following form must be used:

    template<typename ResultType, typename... Args>
    class ReplacementProxy : public ReplacementProxyBase

    The proxy is a template class parameterized by the return and argument types of the replacement (and therefore the ones of the target). It extends ReplacementProxyBase type, that serves the type-erasure purpose, to be able to store instances of this typed proxy in a single map.

    The replacement proxy template class (not its objects) instantiation is unique per function signature, the ResultType and Args... parameters. This means that multiple functions of the same signature will share the same replacement proxy. This will become relevant in a moment.

    When such a replacement proxy is constructed, it is passed the void pointer of the target function and the replacement lambda. The mapping between the target and the replacement is stored in a static std::unordered_map of that proxy template class instantiation.

    The static Invoke method of the proxy template class instantiation is parameterized by the ResultType and Args... template parameters, and upon template class instantiation it can be used as a substitution for the target, because it can accept all the arguments target can, and returns the same result as the target does. This static method is converted to void pointer, to pass it further to Gum. The implementation of this method is rather simple:

    static ResultType Invoke(Args... args)
    {
      auto* context = gum_interceptor_get_current_invocation();
      const auto* target = gum_invocation_context_get_replacement_function_data(context);
      return Replacements.at(target)(std::forward<Args>(args)...);
    }

    First, the method requests current invocation context. Its content is outside the scope of this article, but for all intents and purposes it is unique per target invocation. From the context we get the target pointer, preserved as data at the replacement request above. And as the last action, the method looks up the replacement static map keyed by target pointers, to find the lambda it needs to invoke and return its result. This is where the titbit about uniqueness per signature is important. The static Invoke method can be invoked for very different functions matching the same signature, and to invoke the correct lambda instance we look up this static replacements map at runtime.

    A small but still important caveat about this proxy is that it stores the target pointer per its object instance. This is required for proxy destruction. Recall that the static replacement map is keyed by the target pointers. When a proxy is destroyed the mapping between target pointer and its replacement is removed from the static map.
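
    The moving parts of the proxy can be modelled without Gum. In the sketch below, CurrentTarget() is a hand-rolled stand-in for the data Gum returns from its invocation context, and CallThroughProxy simulates an intercepted call; both are assumptions of this example, as are the ProxySketch, Width and Height names. The real proxy also derives from ReplacementProxyBase, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Stand-in for the replacement data Gum exposes through its invocation
// context. In this sketch the simulated call site sets it by hand.
inline const void*& CurrentTarget()
{
  thread_local const void* target = nullptr;
  return target;
}

template<typename ResultType, typename... Args>
class ProxySketch
{
public:
  using Replacement = std::function<ResultType(Args...)>;

  ProxySketch(const void* target, Replacement replacement) : mTarget(target)
  {
    Replacements()[mTarget] = std::move(replacement);
  }

  ProxySketch(const ProxySketch&) = delete;
  ProxySketch& operator=(const ProxySketch&) = delete;

  // The proxy remembers its target so destruction can drop the mapping.
  ~ProxySketch() { Replacements().erase(mTarget); }

  // One Invoke exists per signature; the map lookup picks the replacement.
  static ResultType Invoke(Args... args)
  {
    return Replacements().at(CurrentTarget())(std::forward<Args>(args)...);
  }

  static std::size_t Count() { return Replacements().size(); }

private:
  // Shared by every proxy object with this signature.
  static std::unordered_map<const void*, Replacement>& Replacements()
  {
    static std::unordered_map<const void*, Replacement> replacements;
    return replacements;
  }

  const void* mTarget;
};

// Hypothetical targets that share one signature.
inline int Width() { return 1; }
inline int Height() { return 2; }

// Simulates an intercepted call to target.
template<typename ResultType, typename... Args>
ResultType CallThroughProxy(ResultType (*target)(Args...), Args... args)
{
  CurrentTarget() = reinterpret_cast<const void*>(target);
  return ProxySketch<ResultType, Args...>::Invoke(std::forward<Args>(args)...);
}
```

    Two proxies created for Width and Height share the same Invoke and the same map, yet each simulated call reaches its own lambda. Once the proxies go out of scope, the map is empty again.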

    The extended API

    So far we have been discussing the basic replacement facilities inside Mocxx. The base API is already powerful enough to do 80% of the work. At times, though, you’d wish for a shortcut to just replace the target invocation result, return a new instance of some type on every target invocation, or limit the replacement to a single invocation. All this is relatively easy to implement, so let’s dive in.

    Replacing the result

    Replacing a target invocation result, without having to specify all the necessary types, can be achieved via the Result API:

    mocxx.Result((std::FILE*)nullptr, fopen);

    With this mock in place any attempt to fopen will result in nullptr. The implementation for this method is straightforward, but not without caveats:

    template<typename ResultValue, typename TargetResult, typename... TargetArgs>
    bool Result(ResultValue&& value, TargetResult (*target)(TargetArgs...))
    {
      return Replace(
        [capture = details::Capture(std::forward<ResultValue>(value))]
        (TargetArgs... /* unused */) -> TargetResult {
          return std::forward<ResultValue>(std::get<0>(std::move(*capture)));
      }, target);
    }

    In short, the Result method simply wraps the desired result value in a lambda and passes it to the Replace method. Two important questions might pop into your head while looking at this code sample: how can the type of the target be resolved at the call site, and what does that call to details::Capture do?

    The answer to the first question is irritatingly simple: there is no target type resolution at call site with this method, except by providing the exact type for the target function:

    mocxx.Result(false, (bool(*)(const std::filesystem::path&))std::filesystem::exists);

    To answer the second question consider what happens when you pass a reference or a value to this method. It does use perfect forwarding into and out of the synthetic lambda. However, the way C++ lambda capture syntax is organised does not allow you to choose the storage type, by value or by reference, based on the value or its type. In other words, this decision is syntactic, not semantic. To solve this problem we need a wrapper type that can make this decision semantically. It just happens that std::tuple is one such type. It is perfectly suited to storing both values and references, so we use it to “capture” the result value and pass it to the synthetic lambda.

    But this is not everything that is required to capture the result value. Recall that ReplacementProxy contains a map from the target pointer to its replacement. This replacement is wrapped in std::function, which requires its callable to be copyable, and this is problematic for values that can only be moved. The reason behind this is outside the scope of this article, but we still have to deal with it somehow. The current solution simply wraps the std::tuple in a std::shared_ptr. Here it is:

    template<typename Value>
    auto
    Capture(Value&& value)
    {
      return std::make_shared<std::tuple<Value>>(
        std::tuple<Value>(std::forward<Value>(value)));
    }
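
    A short experiment shows what Capture buys us. The Capture function below follows the definition above; MakeReader is a hypothetical helper added to show that a lambda holding the shared capture stays copyable, and therefore fits into std::function, even when the payload is move-only.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <tuple>
#include <type_traits>
#include <utility>

// Stores an lvalue argument as a reference and an rvalue argument by value,
// because Value is deduced as T& in the first case and as T in the second.
template<typename Value>
auto Capture(Value&& value)
{
  return std::make_shared<std::tuple<Value>>(
    std::tuple<Value>(std::forward<Value>(value)));
}

// Hypothetical helper: the lambda copies only the shared_ptr, so it can be
// stored in std::function although std::unique_ptr itself cannot be copied.
inline std::function<int()> MakeReader(std::unique_ptr<int> payload)
{
  auto capture = Capture(std::move(payload));
  return [capture] { return *std::get<0>(*capture); };
}
```

    Calling Capture with a named int produces a std::tuple<int&> that observes later changes to the original, while calling it with a temporary std::unique_ptr<int> produces a tuple that owns the pointer.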

    Replacing once

    A useful extension to the base API is the ability to replace targets only once. Here is how it looks:

    template<typename ResultValue, typename TargetResult, typename... TargetArgs>
    bool ResultOnce(ResultValue&& value, TargetResult (*target)(TargetArgs...))
    {
      return Replace(
        [this, target, capture = details::Capture(std::forward<ResultValue>(value))]
        (TargetArgs... /* unused */) -> TargetResult {
          auto tuple = std::move(*capture);
          Restore(target);
          return std::forward<ResultValue>(std::get<0>(tuple));
        },
        target);
    }

    The code above is pretty straightforward, except for one subtlety. The restoration of the target must occur after the result value is read, because upon restoration this enclosing synthetic lambda is destroyed and with it the capture storage.

    Replacing with a generator

    Another useful extension is the ability to return a new value on every target invocation, without the need to specify every parameter:

    template<typename Generator, typename TargetResult, typename... TargetArgs>
    bool ResultGenerator(Generator&& generator, TargetResult (*target)(TargetArgs...))
    {
      return Replace(
        [capture = details::Capture(std::forward<Generator>(generator))]
        (TargetArgs... /* unused */) -> TargetResult {
          return std::get<0>(*capture)();
        },
        target);
    }

    No surprises here, a new synthetic lambda is created that simply wraps invocations to the passed generator.
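
    Stripped of the Mocxx plumbing, the wrapping step looks roughly like this. MakeGeneratorReplacement and NextId are hypothetical names, and the generator is kept in a plain std::shared_ptr here instead of going through details::Capture, so treat it as a sketch of the idea only.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <type_traits>
#include <utility>

// Wraps a generator in a callable with the target's signature. The target
// pointer is used only to deduce that signature; the arguments of each call
// are ignored and the generator supplies the result.
template<typename Generator, typename TargetResult, typename... TargetArgs>
std::function<TargetResult(TargetArgs...)>
MakeGeneratorReplacement(Generator&& generator,
                         TargetResult (*)(TargetArgs...))
{
  auto state = std::make_shared<std::decay_t<Generator>>(
    std::forward<Generator>(generator));
  return [state](TargetArgs... /* unused */) -> TargetResult {
    return (*state)();
  };
}

// Hypothetical target whose result we want to generate.
inline int NextId(const char* /* prefix */) { return -1; }
```

    A generator such as [&next] { return ++next; } then yields a fresh value on every call of the wrapper, regardless of the arguments passed.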

    Limitations

    Like all good things, this one has some drawbacks. One major issue you might encounter while using Mocxx is that your functions are not being replaced; this is especially true for functions declared in system headers. This section goes over the most common problems and potential solutions.

    Optimisations

    C++ is a powerful language and it usually comes with a powerful compiler that tries to optimise every bit of your code. The function you are trying to replace might be inlined, or simply removed because the program no longer needs it. This is especially evident in the very first example this article showcases. In order to successfully replace the std::filesystem API you would want to wrap its header inclusion with the following pragmas (Clang only):

    #pragma clang optimize off
    #include <filesystem>
    #pragma clang optimize on

    For such wraps we create a dedicated header, so that all call sites are left unoptimised.

    Overloading sets

    The problem with overload sets in C++ is that they are not first-class citizens: you cannot bind an overload set to a name, or pass it through a function call. Overload sets are always resolved at the call site. At the moment there is no straightforward solution to this problem, and you have to provide the target function type explicitly when using the lambda-less API.
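
    A small example makes the limitation concrete. Area and ArityOf are hypothetical; ArityOf merely stands in for any API that, like Result, can learn the target signature only from the function pointer it is given.

```cpp
#include <cassert>

// Hypothetical overload set.
inline int Area(int side) { return side * side; }
inline int Area(int width, int height) { return width * height; }

// Deduces the signature from the pointer alone, as a lambda-less API must.
template<typename TargetResult, typename... TargetArgs>
constexpr int ArityOf(TargetResult (*)(TargetArgs...))
{
  return static_cast<int>(sizeof...(TargetArgs));
}

// ArityOf(&Area) would not compile: both overloads match the parameter, so
// the template arguments cannot be deduced. A cast or a typed pointer names
// one overload explicitly and makes the call well-formed.
inline int SquareArity()
{
  return ArityOf(static_cast<int (*)(int)>(&Area));
}
```

    The cast in SquareArity plays the same role as the (bool(*)(const std::filesystem::path&)) cast in the Result example earlier.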

    Templates

    The major issue with template functions is the fact that they are not actual functions. Mocxx is a runtime tool, it can only replace a function that has an address. What this means is that you first need to instantiate the template function, and then pass its address to the tool.
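
    In practice this just means naming a concrete specialization. In the sketch below, Twice, IntFunction and AddressOfTwiceInt are hypothetical names:

```cpp
#include <cassert>

// A function template: a recipe for functions, with no address of its own.
template<typename T>
T Twice(T value)
{
  return value + value;
}

using IntFunction = int (*)(int);

// Taking the address of Twice<int> instantiates that specialization, so
// there is now an actual function to point at.
inline IntFunction AddressOfTwiceInt()
{
  return &Twice<int>;
}
```

    The resulting pointer is the kind of value you would pass as the target. Other specializations, such as Twice<long>, are separate functions and would need their own replacement.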

    Virtual methods

    The nature of virtual methods (aka dynamic dispatch, or late method binding) in C++ makes it impossible to home in on the target member function using language means, which is what Mocxx requires. You can of course pass in the virtual member function pointer, but it contains no information about the actual function. So what can be done with it?

    The C++ standard committee in its infinite wisdom decided not to specify the way virtual functions should be implemented, but most commonly used compilers use a technique that can be colloquially called vtables. Every class implementing a virtual method, or inheriting another class with virtual methods will have a vtable associated with it. This vtable contains pointers to all virtual methods of the current class and every class it inherits from. Every object of such class will have a hidden pointer called vpointer. When a virtual call needs to be done, compilers generate a load of vtable address from the vpointer, add (memptr.ptr - 1), as explained above, to it, make another load and invoke the resulting method pointer. All of this is done at the call site, which Mocxx cannot control.

    But we don’t actually need the call site: it is enough to load the target method from the vtable using the offset given in the virtual method pointer and the class associated with this pointer. Luckily, the C++ type system allows us to do at least that. This of course depends heavily on the compiler vendor or even the compiler version, so we haven’t implemented it yet. You are welcome to try.

    Forbidden mocks

    This tool makes use of various standard library facilities such as std::string, std::variant, the memory allocation API and so forth. To save your sanity, suppress the urge to mock commonly used generic APIs.

    A potential solution to this limitation would be an ability to invoke the original target from its replacement. This way you could for example check this pointer or other critical argument to invoke a desired action only when actually required. In other cases you would default to the original function.

    In conclusion

    Mocxx proved itself an invaluable tool for our internal testing. With its help we were able to isolate the code under test down to the actually important functions, detaching it from the execution environment and the feature context.

    The work on this mocking tool is very much in progress. If you find Mocxx helpful, please consider contributing by testing and porting it to other compilers and systems.

    Tag(s): iOS , Technical

    Artyom Goncharov - Software Architect
