November 12, 2019

iOS App Security: An Introduction to Objective-C Metadata & Symbols in Swift & Objective-C Apps (Part 1 of 2)

Written by: André Jacobs - Software Engineer

iOS applications are bundled as IPAs. These IPAs consist of various files, including assets, resources, plist files, etc. One of these files is the main executable binary containing the machine code that defines the app's functionality. This compiled machine code is unreadable to most humans. However, binaries also contain a lot more human-friendly information than one might think.

Identifiers (function names, class names, etc.) are names introduced by the developer to write comprehensible code. Some of these names remain in the binary after compilation, potentially serving as a blueprint for reverse engineers. Obfuscating or, where possible, removing these identifiers from your binary is an important part of proper, multi-layered code hardening. To obfuscate or hide them correctly, it is important to understand what they look like.

To help you better understand how to execute this type of obfuscation, the following article explains two identifier types, Objective-C metadata and Mach-O symbols, through a sample application.

Inspecting a binary

Objective-C and Swift applications contain two layers of identifiers:

1) Metadata: Layer of information required for the inner workings and functionality of the programming language. This layer may be used for reflection purposes or Objective-C method invocation, for example.

2) Symbols: Names used by various system processes to load the correct data, correctly resolve indirections, perform linking (both statically and dynamically), and more.

We take a closer look at the differences between these two types of identifiers by inspecting the following small program:

#import <Foundation/Foundation.h>

@interface Class1 : NSObject
@end

@interface Class2 : Class1
@end

@implementation Class1

// Defined only in the implementation; the runtime still records it in the class metadata.
- (void)myMethod {
    NSLog(@"Hello world");
}

@end

// Class2 adds nothing of its own and inherits myMethod from Class1.
@implementation Class2
@end

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        Class1 *class1 = [[Class1 alloc] init];
        Class2 *class2 = [[Class2 alloc] init];
        [class1 myMethod];
        [class2 myMethod];
    }
    return 0;
}

Compiling and executing the sample application simply prints ‘Hello world’ twice: it instantiates Class1 and Class2 and calls myMethod on each instance, with Class2 inheriting the method from Class1.

Retrieving Objective-C metadata

For the Objective-C runtime to execute the program correctly, the necessary metadata must be made available by the compiler. This is done by storing it as strings inside the binary, which can be verified by, for example, running the strings tool on it:

» strings test.bin
Hello world
Class1
Class2
myMethod
alloc
init
v16@0:8

Running a tool like class-dump on the binary illustrates how much information the metadata contains. Class-dump semantically parses the metadata, just as the Objective-C runtime would, and outputs it in a nicely formatted way:

» class-dump test.bin
 
@interface Class1 : NSObject
{
}
 
- (void)myMethod;
@end
 
@interface Class2 : Class1
{
}
@end

This shows that the metadata contains all the information needed to fully reconstruct the original class declarations.
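The same metadata can also be queried from inside a running process through the Objective-C runtime API, which gives a feel for what a tool like class-dump has to work with. The following is a minimal sketch and not how class-dump itself is implemented; the helper name DescribeMethods is an assumption made for this example, and Class1 is repeated so that the listing stands on its own.

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>

@interface Class1 : NSObject
@end

@implementation Class1
- (void)myMethod {
    NSLog(@"Hello world");
}
@end

// Hypothetical helper: returns one "selector typeEncoding" string for each
// instance method that the class itself defines (inherited methods are not listed).
static NSArray<NSString *> *DescribeMethods(Class cls) {
    NSMutableArray<NSString *> *result = [NSMutableArray array];
    unsigned int count = 0;
    Method *methods = class_copyMethodList(cls, &count);
    for (unsigned int i = 0; i < count; i++) {
        const char *name = sel_getName(method_getName(methods[i]));
        const char *types = method_getTypeEncoding(methods[i]);
        [result addObject:[NSString stringWithFormat:@"%s %s", name, types]];
    }
    free(methods); // class_copyMethodList returns a malloc'ed array (or NULL)
    return result;
}

int main(int argc, const char *argv[]) {
    @autoreleasepool {
        // Look the class up by the name stored in the metadata.
        Class cls = objc_getClass("Class1");
        NSLog(@"%s : %s", class_getName(cls), class_getName(class_getSuperclass(cls)));
        for (NSString *line in DescribeMethods(cls)) {
            NSLog(@"  %@", line);
        }
    }
    return 0;
}
```

On a 64-bit build, the type encoding reported for myMethod would be expected to correspond to the v16@0:8 string seen in the strings output earlier.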

Dumping symbols

Symbols, on the other hand, are very different from metadata. They are not stored in the text or data sections, but separately, in a dedicated string table for symbols located in the `__LINKEDIT` segment. Although the default `strings` tool installed on macOS will not check this segment, we can print the symbols using another tool made specifically for inspecting them:

» nm test.bin
0000000100000e50 t -[Class1 myMethod]
                 U _NSLog
00000001000011f8 S _OBJC_CLASS_$_Class1
0000000100001248 S _OBJC_CLASS_$_Class2
                 U _OBJC_CLASS_$_NSObject
00000001000011d0 S _OBJC_METACLASS_$_Class1
0000000100001220 S _OBJC_METACLASS_$_Class2
                 U _OBJC_METACLASS_$_NSObject
                 U ___CFConstantStringClassReference
0000000100000000 T __mh_execute_header
                 U __objc_empty_cache
0000000100000e80 T _main
                 U _objc_autoreleasePoolPop
                 U _objc_autoreleasePoolPush
                 U _objc_msgSend
                 U dyld_stub_binder

Symbols serve a different purpose than metadata: they point to addresses inside the binary and are required for linking and some other activities.

After inspecting the application, it's clear that both sets of names are present and that they can give insight into the binary's functionality. But what's the difference between these two layers?

The Difference Between Symbols and Metadata (And Why it Matters)

From the samples above, it's clear that symbols and metadata are both connected to the code written by the developer. They are the compiler's, linker's and Objective-C runtime's perspective on the compiled code, and they are used to connect all the pieces of the binary.

An important observation is that symbol names can be renamed independently of the metadata that is derived from the same source code. In other words, it is possible to have an Objective-C class defined in the source code as "A" that is renamed in the Objective-C metadata to "B" while the class symbol says "_OBJC_CLASS_$_C".

If metadata and symbols can be renamed independently of each other, and both are used to point out specific parts of the binary, then what's the difference? When is which one used?

Objective-C metadata

Objective-C metadata is used by the Objective-C runtime and is required for the inner workings of the application. The compiler will generate various tables describing available classes, methods, etc. This metadata is parsed and used during execution by the Objective-C runtime, similar to how developer code can just access data in structs. Every class, selector, protocol, variable, etc. is encoded in this metadata so that the runtime has an overview of everything and knows what code to execute and when.

When calling a method, you provide the Objective-C runtime with a receiver (an object or a class) and a selector. This selector is basically just a string, not a direct reference to a function. The runtime looks the selector up in the metadata of the receiver's class, finds the function that implements it, and calls that function.
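To make the "selector is basically a string" point concrete, the sketch below builds a selector from a plain string at run time and asks the runtime which function implements it. The helper CallByName and the counter gCallCount are assumptions introduced for illustration only, and the helper assumes the target method takes no arguments and returns void.

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>

// Hypothetical counter so the effect of the call can be observed.
static int gCallCount = 0;

@interface Class1 : NSObject
@end

@implementation Class1
- (void)myMethod {
    gCallCount++;
    NSLog(@"Hello world");
}
@end

// Hypothetical helper: invokes a no-argument, void-returning method whose
// name is only known as a string. Returns NO if the receiver has no such method.
static BOOL CallByName(id receiver, NSString *selectorName) {
    SEL selector = NSSelectorFromString(selectorName);
    if (![receiver respondsToSelector:selector]) {
        return NO;
    }
    // Ask the runtime which function implements this selector for this class.
    IMP implementation = class_getMethodImplementation(object_getClass(receiver), selector);
    // Every method implementation receives the receiver and the selector first.
    ((void (*)(id, SEL))implementation)(receiver, selector);
    return YES;
}

int main(int argc, const char *argv[]) {
    @autoreleasepool {
        Class1 *object = [[Class1 alloc] init];
        BOOL found = CallByName(object, @"myMethod");
        BOOL missing = CallByName(object, @"noSuchMethod");
        NSLog(@"found=%d missing=%d calls=%d", found, missing, gCallCount);
    }
    return 0;
}
```

Because the lookup is driven by the string "myMethod", that string has to be present in the binary, which is why it shows up in the metadata.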

It's this detailed description of the code that allows Objective-C to provide powerful features like method swizzling and reflection.
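As a small illustration of method swizzling, the sketch below swaps the implementations behind two selectors at run time. The Greeter class, its two methods, and the SwapGreetings helper are hypothetical names made up for this example; they are not part of the sample application.

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>

// Hypothetical class used only for this illustration.
@interface Greeter : NSObject
- (NSString *)greeting;
- (NSString *)replacementGreeting;
@end

@implementation Greeter
- (NSString *)greeting {
    return @"Hello world";
}
- (NSString *)replacementGreeting {
    return @"Hello swizzled world";
}
@end

// Hypothetical helper: exchanges the implementations of the two selectors
// by editing the method entries the runtime keeps for the class.
static void SwapGreetings(void) {
    Method original = class_getInstanceMethod([Greeter class], @selector(greeting));
    Method replacement = class_getInstanceMethod([Greeter class], @selector(replacementGreeting));
    method_exchangeImplementations(original, replacement);
}

int main(int argc, const char *argv[]) {
    @autoreleasepool {
        Greeter *greeter = [[Greeter alloc] init];
        NSLog(@"before: %@", [greeter greeting]);
        SwapGreetings();
        // The same message now reaches the other implementation.
        NSLog(@"after: %@", [greeter greeting]);
    }
    return 0;
}
```

The call site is unchanged; only the runtime's mapping from selector to implementation was altered, which is possible because that mapping lives in the metadata.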

Symbols

Symbols, on the other hand, are used in the compilation process itself and during linking. There are two types of linking: static and dynamic. The first is an essential step of the build process and happens on the application developer's local machine. The second is performed when the application is executed on a device.

Static linking is an intermediate step when building a binary and happens between object files. The final binary is a collection of statically linked object files. Even when linking with a static library, the idea is the same, as a static library is just a collection of unlinked object files waiting to be linked into a binary.

Dynamic linking is performed by the dynamic linker (dyld on macOS/iOS). The dynamic linker loads a binary file into memory and finds the dynamic dependencies of that binary. Effectively, dynamic linking happens between two binary files.

As mentioned before, symbols point to locations in the binary. In the nm output, the symbols defined in the binary are preceded by the address. This address tells the linkers where certain parts of code or information are inside this binary.

For example, the main function for this binary is at address 0000000100000e80. While the binary is loaded, the dynamic linker searches for the main symbol and finds this address. The address then helps the operating system to find the starting point in the binary, after the loading process. This address can be verified by looking inside the binary via a disassembler. When we go to address 0000000100000e80, we find our main symbol and the compiled code of our main function.
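The undefined (U) entries in the nm output, such as _NSLog, have no address in our binary and are bound by the dynamic linker. As a rough sketch of that name-to-address relationship, the dynamic linker can be asked for such an address at run time with dlsym, and dladdr maps an address back to the image that contains it. The helper name AddressOfSymbol is an assumption made for this example.

```objc
#import <Foundation/Foundation.h>
#import <dlfcn.h>

// Hypothetical helper: asks the dynamic linker for the address bound to a
// C-level name in any image loaded into this process. dlsym takes the name
// as written in source ("NSLog"); the symbol table entry is "_NSLog".
static void *AddressOfSymbol(const char *name) {
    return dlsym(RTLD_DEFAULT, name);
}

int main(int argc, const char *argv[]) {
    @autoreleasepool {
        void *address = AddressOfSymbol("NSLog");
        Dl_info info;
        // dladdr reports the image and the nearest symbol for an address.
        if (address != NULL && dladdr(address, &info) != 0) {
            NSLog(@"NSLog resolved to %p in %s (symbol: %s)",
                  address,
                  info.dli_fname,
                  info.dli_sname ? info.dli_sname : "(none)");
        }
    }
    return 0;
}
```

The printed address varies between runs and systems, so no particular value should be expected.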

For Objective-C code, the function symbol is not needed, because the Objective-C runtime uses the metadata, as explained earlier. Therefore, symbols that point to Objective-C methods are rarely left in the symbol table. In Xcode, you can remove these private symbols by setting the strip style to "Non-Global Symbols". This runs a tool called ‘strip’ at the end of the build. For IPAs, the strip style is set to "All Symbols" by default, which removes all symbols that are not needed for dynamic linking.

Summarizing the Differences Between Metadata and Symbols

The diagram below visually summarizes the differences between Objective-C metadata and symbols:

Knowing the difference between metadata and symbols helps teams better understand the inner workings of Objective-C (and Swift) binaries/apps. But most importantly, it also shows that these binaries still contain a lot of semantic information on more than one level. Obfuscating the identifiers in a binary requires action on two levels, especially for SDKs. The more obvious of the two is obfuscating metadata. The metadata is required for the inner workings of the Objective-C runtime and directly translates to the original source code. This is commonly associated with name obfuscation. Yet the importance of protecting symbols in SDKs is usually overlooked, as a symbol's purpose and function are often less clear.

Tag(s): iOS, Technical, Protection, iXGuard