The Problem:

Imagine you are working on your newest, super awesome Android app. You compile your work and upload it to the Google Play Store. Because you have put so much work into it and because it is so super awesome, you decide to ask for a small payment of 2€, so that you can buy some food after work.

Your app is so good that its user base grows every day. After a few days, though, you notice that nobody is buying it anymore. “Why?”, you may ask. You search for reasons and find the answer: someone has reuploaded your app. They have removed your license checks, they have removed your copyright information, and they have rebranded the app.

This is possible because Java, the standard language for Android apps, compiles to bytecode. Bytecode is an “in between” thing: it is no longer high-level like Java, but it is also not as low-level as assembly. Java bytecode (and Dalvik bytecode, which is used for Android applications) may not be as easily readable as Java source code, but with a bit of thought and an additional hour or two of reading, good developers can still understand and modify it. The bigger issue is that tools can do it too.

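Software like jadx takes the bytecode of an app as input (jadx works directly on the Dalvik bytecode inside an APK) and recreates Java source code that is very close to the original. It will even give classes, methods and fields their correct names, because this information is stored in the bytecode as well. Local variable names can be recovered too if debug information was kept in the build.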

Reverse engineering these applications is this easy because of an inherent weakness in how bytecode is generated: the output is still very similar to the original source code.
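To get a feeling for how much naming information a bytecode format can carry, here is a small sketch that uses Python's own bytecode as a stand-in. This is only an analogy: Python bytecode is not Java or Dalvik bytecode, and the function names below are made up for illustration.

```python
def calculate_mathematical_problem(n):
    """Hypothetical stand-in for some real work."""
    return n * n


def do_something():
    counter = 0
    while counter < 100:
        calculate_mathematical_problem(counter)
        counter += 1


# The compiled code object still carries the identifiers from the source.
code = do_something.__code__
local_names = code.co_varnames      # names of local variables
referenced_names = code.co_names    # names of globals the bytecode refers to

print(local_names)
print(referenced_names)
```

Nothing had to be decompiled here; the names are simply stored next to the instructions, which is exactly what a decompiler takes advantage of.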

The “solution”: Obfuscation

Luckily, there is a way to slow down those annoying app thieves. It is a concept called “obfuscation”. The Cambridge Dictionary defines “to obfuscate” as “to make something less clear and harder to understand, especially intentionally”. That is exactly what we need. Look at the following short method:

void doSomething(){
	int counter = 0;
	while (counter < 100){
		calculateMathematicalProblem(counter);
		counter++; // advance the counter so the loop terminates
	}
}

Now look at the same method after an obfuscation tool has renamed the method and the variables and moved some of the numbers to other places:

static int c1 = 100;
static int c2 = 0;

void f1(){
	int v1 = c2;
	while (v1 < c1){
		f2(v1);
		v1++;
	}
}

It is much harder to understand; even seasoned developers take longer to realize exactly what this piece of code does, especially if it is only one method among a few hundred or thousand and all the others are obfuscated in the same way. For Android apps, there is an obfuscator included in the build toolchain, ProGuard (recent versions of the Android Gradle plugin use the ProGuard-compatible R8 by default). Enabling it is as easy as setting two options in the Gradle build file (minifyEnabled and proguardFiles).
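To show the idea behind renaming, here is a minimal sketch in Python that works on source code using the standard ast module. It is not how ProGuard works internally (ProGuard operates on Java bytecode); the sample program and the renaming table are hypothetical, and ast.unparse needs Python 3.9 or newer.

```python
import ast


class Renamer(ast.NodeTransformer):
    """Replace known identifiers with meaningless ones."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # also rename parameters and the body
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


SOURCE = """
def calculate(n):
    return n * n

def do_something():
    total = 0
    counter = 0
    while counter < 100:
        total += calculate(counter)
        counter += 1
    return total
"""

# Hypothetical renaming table; a real tool would generate it automatically.
MAPPING = {
    "do_something": "f1",
    "calculate": "f2",
    "counter": "v1",
    "total": "v2",
    "n": "v3",
}

tree = Renamer(MAPPING).visit(ast.parse(SOURCE))
obfuscated = ast.unparse(tree)
print(obfuscated)

# The renamed program must still behave exactly like the original.
original_ns, obfuscated_ns = {}, {}
exec(SOURCE, original_ns)
exec(obfuscated, obfuscated_ns)
same_result = original_ns["do_something"]() == obfuscated_ns["f1"]()
print(same_result)
```

The important property is the last check: renaming changes what a human sees, not what the program does.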

Renaming, the technique shown above, is just one of the ways obfuscation is done nowadays. Others include:

Encryption: Applying XOR (or stronger encryption) to strings or other data hides it from plain sight. A string might be

s = "Hello World!"

after XOR with the repeating key ‘secretpass’ it will only show up as:

s = "\x3b\x00\x0f\x1e\x0a\x54\x27\x0e\x01\x1f\x17\x44"

This will not help against someone who really tries to understand your application, as there are multiple ways to recover the original string (the program has to decrypt it at runtime, after all), but it will slow them down.
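A minimal sketch of such a string encryption could look like this in Python. The helper name xor_bytes is my own; the string and the key are the ones from the example above, with the key repeated as often as needed.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the (repeating) key."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


plain = b"Hello World!"
key = b"secretpass"

hidden = xor_bytes(plain, key)      # what ends up in the shipped program
restored = xor_bytes(hidden, key)   # XOR with the same key undoes itself

print(hidden)
print(restored)
```

Because XOR is its own inverse, the same function both hides and restores the string. That is convenient for the obfuscator, but also for an attacker who finds the key.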

Control Flow Modifications: There are quite a few control flow modification techniques when it comes to obfuscation: adding bogus instructions that in the end change nothing, adding dead branches, increasing how often a loop is run, moving instructions from outside a loop into it or the other way round, etc.

a = 5
b = a * 3 + 2

might be changed to:

a = 5
b = a
for i in range(2):
  if b < 0:
    b += 7
  b += a + 1

This example shows a few techniques. First of all, a dead if has been included: the variable b is never smaller than 0 here, so the part “b += 7” is never executed. Also, the simple calculation has been put inside a loop, which now adds instead of multiplying: each of the two iterations adds a + 1, so b ends up as a + 2 * (a + 1) = 3 * a + 2, which is 17 in both versions. To make things a bit more confusing, the first addition of a has been left outside the loop as the initial value of b (this is usually called “loop peeling”; “loop unrolling” means repeating the loop body to save iterations).
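To convince yourself that both versions really compute the same thing, you can wrap them in functions and compare them. This is just a sketch: the function names are mine, and making a a parameter is an assumption that goes beyond the fixed a = 5 above.

```python
def original(a: int) -> int:
    return a * 3 + 2


def obfuscated(a: int) -> int:
    b = a                  # one addition of a happens outside the loop
    for _ in range(2):
        if b < 0:          # dead branch, but only as long as a >= 0
            b += 7
        b += a + 1         # two iterations add 2 * (a + 1)
    return b


print(original(5), obfuscated(5))

# The two versions agree for every non-negative input ...
agree_non_negative = all(original(a) == obfuscated(a) for a in range(1000))
print(agree_non_negative)

# ... but not for negative ones, where the "dead" branch comes alive.
print(original(-1), obfuscated(-1))
```

This also shows why obfuscators have to be careful: the branch is only dead because the tool knows something about the value of a. For arbitrary input, the transformation would change the behavior of the program.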

Packing: Packers compress and/or encrypt the complete code. They then prepend their own start routine, which unpacks the original application and runs it. Commonly used packers are UPX and VMProtect for native binaries; for Java there was pack200, a JAR compression tool that has been removed from the JDK in Java 14.
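The principle of a packer can be sketched in a few lines of Python. This is a toy under stated assumptions, not how UPX or VMProtect are implemented: the “application” is a hypothetical one-function program, the payload is only compressed (not encrypted), and the start routine is a single exec call.

```python
import base64
import zlib

# Hypothetical original program that we want to hide.
ORIGINAL_SOURCE = '''
def greet(name):
    return "Hello " + name + "!"
'''


def pack(source: str) -> str:
    """Return a new program: a compressed payload plus a tiny unpacking stub."""
    payload = base64.b64encode(zlib.compress(source.encode("utf-8"))).decode("ascii")
    return (
        "import base64, zlib\n"
        f"exec(zlib.decompress(base64.b64decode('{payload}')).decode('utf-8'))\n"
    )


packed = pack(ORIGINAL_SOURCE)
print(packed)  # only the stub and an unreadable blob are visible

# Running the packed program unpacks and runs the original one.
namespace = {}
exec(packed, namespace)
print(namespace["greet"]("World"))
```

The weakness is visible right away: the stub has to contain everything needed to unpack the payload, so anyone can replace the exec with a print and get the original code back.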

Why are there quotation marks around “solution”?

Because obfuscation is not a perfect solution. It will not stop someone dedicated to reverse engineering your application, but it will slow them down. It will also stop script kiddies and people who decide it is not worth the hard work. That makes it definitely worth your time, especially when obfuscating is as easy as it is in the Android toolchain. It cannot fully protect your intellectual property, but it increases the time needed to steal it.

In a few days or weeks, I will post an article about automatically reversing obfuscation. I cannot do this before the paper for my master's thesis is released, so it may take a bit of time. Consider this post an introduction to the deobfuscation topic!

That’s it for today! See you next time. Also, take a look at some obfuscation contests (yes, obfuscation is a sport, kind of) like the International Obfuscated C Code Contest (IOCCC).