Code obfuscation (often referred to simply as ‘obfuscation’) is the practice of deliberately writing or transforming machine code or source code so that it is difficult for humans and analysis tools to read. The code is made hard to follow by making it ambiguous and convoluted.
The purpose of obfuscating code and commands is to hide their intent and logic in order to prevent intrusion, deter reverse engineering or mask source code. It can be done manually or automatically, with automation being the preferred method. Techniques include naming variables in misleading ways, double coding, and disguising data as comments or complex structures.
Outside of cybersecurity, code obfuscation can be undertaken as part of competitions and recreational activities.
The following example is taken from this site.
Below is an example of obfuscated code:
Below is the de-obfuscated version of the above code. As you can see, it is much more readable for humans.
How to make sure an obfuscation method is reliable
A good obfuscation method has the following four qualities:
- Stealth. One of the main objectives of code obfuscation is to hide how the given program actually behaves.
- Cost effectiveness. With cyber attacks, including state-sponsored ones, continuing to ramp up in 2023, preventative measures need to be applied at a scale that matches the size of the organisation, so the method has to be affordable to apply that widely.
- Efficiency. Like any good software, preventative measure or cybersecurity solution, an obfuscation method should reduce complexity for the people applying it rather than add to it. This saves time and money.
- Resistance to de-obfuscation. This is a combination of the programmer effort required to create a de-obfuscator and the time and resources that takes. The highest level of resistance belongs to code that cannot be de-obfuscated at all.
Ways to obfuscate code in Python
This section walks through several obfuscation methods by obfuscating a piece of code step by step. The example is taken from this site. The methods are shown in Python, one of the most popular scripting languages.
We will obfuscate this:
- Remove the comments and docstrings. This makes it less obvious what the code is intended for. This gives you the modification below.
- Remove helpful syntax. Most code contains some syntax that is not required but provides extra helpful information. Removing it helps to obfuscate the code. Here, ‘solution_path’ is marked as a string, and that marking can be dropped.
- Now, remove any additional whitespace. Python code is usually written with generous spacing, which is part of what makes it readable and popular with beginners. Indentation is significant in Python and has to stay, but optional spaces, such as those around operators and after commas, can be removed.
The whitespace removed was between solution_path and encoding, between data and =, and between = and solution.readlines().
- Give functions and variables unrecognisable, random names to help hide what they do.
Now, the code provided at the beginning has been obfuscated. To be honest though, these four steps are the bare minimum. You can take it further for more security.
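The four steps above can be sketched on a hypothetical function. The listing from the cited site is not shown here, so `read_solution` below is an assumed stand-in built from the names the text mentions (`solution_path`, `encoding`, `data`, `solution.readlines()`); the short names `a`, `b`, `c` and `d` are likewise invented for illustration.

```python
import os
import tempfile


# Assumed starting point (hypothetical, not the original listing).
def read_solution(solution_path: str) -> list:
    """Return the lines of the solution file."""
    # Open the file as UTF-8 text and read every line.
    with open(solution_path, encoding = 'utf8') as solution:
        data = solution.readlines()
    return data


# After the four steps:
#   1. comments and docstring removed
#   2. type hints removed
#   3. optional whitespace removed (indentation has to stay)
#   4. function and variables renamed to meaningless names
def a(b):
    with open(b,encoding='utf8') as c:
        d=c.readlines()
    return d


# Quick check that both versions behave identically on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "solution.txt")
    with open(path, "w", encoding="utf8") as handle:
        handle.write("first\nsecond\n")
    readable_result = read_solution(path)
    obfuscated_result = a(path)
```

The behaviour is unchanged; only the clues a reader would rely on have been stripped out.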
- Replace hardcoded strings with numbers. For example, replace ‘utf8’ with a sequence of numbers, one for each character.
Here, ‘u’ has been replaced with 100, ‘t’ with 70, ‘f’ with 86 and ‘8’ with 94. This modification can still be traced back fairly easily, though, so more needs to be done.
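One way this substitution could work is with a lookup table that turns the numbers back into characters at run time. The numbers in the text are not ASCII codes, so the table below is an assumption about how the mapping might be stored; `_TABLE` and `decode` are hypothetical names.

```python
# Hypothetical lookup table: arbitrary numbers standing in for characters.
_TABLE = {100: "u", 70: "t", 86: "f", 94: "8"}


def decode(numbers):
    """Rebuild the hidden string from its numeric stand-ins."""
    return "".join(_TABLE[n] for n in numbers)


# The literal 'utf8' no longer appears anywhere in the source.
encoding_name = decode([100, 70, 86, 94])
encoded = "obfuscation".encode(encoding_name)
```

A reader searching the source for the string ‘utf8’ will not find it, although anyone who locates the table can reverse the mapping quickly.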
You can reverse the list:
You can also disguise the numbers, for example by writing ‘25×4’ instead of 100.
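Reversing the list and disguising the numbers can be combined, as in the sketch below. It assumes the same number-to-character mapping mentioned in the text, held in a hypothetical `_TABLE`; the particular arithmetic expressions are invented examples.

```python
# Hypothetical lookup table, assumed to match the mapping in the text.
_TABLE = {100: "u", 70: "t", 86: "f", 94: "8"}

# Stored back to front, with each number written as a small calculation:
# 47 * 2 is 94, 43 + 43 is 86, 140 // 2 is 70 and 25 * 4 is 100.
hidden = [47 * 2, 43 + 43, 140 // 2, 25 * 4]

# Undo the reversal, then map each number back to its character.
encoding_name = "".join(_TABLE[n] for n in reversed(hidden))
```

Neither the string nor the numbers from the table now appear literally in the list, which makes a simple text search less useful.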
Other ways to obfuscate code:
To make things harder for a cybersecurity attacker, you can also:
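- Include a dead branch, that is, a conditional path that can never actually run.
- Add dead parameters, that is, parameters that a function accepts but never uses.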
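Both ideas can be sketched in a few lines. The function `total` and its parameters `seed` and `mode` are hypothetical and exist only to illustrate the two techniques.

```python
def total(values, seed=None, mode=7):
    # `seed` and `mode` are dead parameters: they are accepted but
    # never influence the result, so they only distract the reader.
    if len(values) < 0:
        # Dead branch: len() is never negative, so this never runs.
        return -1
    return sum(values)


plain = total([1, 2, 3])
decorated = total([1, 2, 3], seed="x", mode=0)
```

Someone reading the code has to work out that the extra parameters and the extra branch do nothing before they can ignore them.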
Things to bear in mind
Code obfuscation cuts both ways: it is used by cyber attackers as well as by those looking to protect their systems. Additionally, defense through obfuscation is, at best, an outdated method of genuine cybersecurity, so it should only be used in conjunction with other proven methods of stopping the adversary in their tracks.
The following list covers some of the ways attackers use code obfuscation to target your system:
- Dead code insertion – Code that has no effect is added to change how the malicious code looks. An established anti-virus scanner should be able to detect and delete such insertions.
- Code transposition/code re-ordering – This is where code is reorganised or moved around, either by shuffling it or by reordering it, typically held together by unconditional branches or jumps. To counter this, the unconditional branches or jumps need to be removed.
- Code integration – The malicious code decompiles the target program, inserts itself into it and then reassembles the result into a new, parasitic program. This is one of the most advanced forms of obfuscation, as well as one of the most costly.
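As a toy illustration of how dead code insertion might be spotted, the sketch below uses Python's standard `ast` module to flag `if` statements whose condition is a constant false value. Real anti-virus scanners are far more sophisticated; the function name `find_constant_false_branches` and the sample source are assumptions made for this example only.

```python
import ast


def find_constant_false_branches(source):
    """Return the line numbers of `if` statements that can never run."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.If)
        and isinstance(node.test, ast.Constant)
        and not node.test.value
    ]


# Hypothetical sample: the branch on line 2 is dead, the one on line 4 is not.
sample = "x = 1\nif False:\n    x = 2\nif x:\n    x = 3\n"
flagged = find_constant_false_branches(sample)
```

This only catches the most obvious case; a dead branch whose condition is computed at run time would need deeper analysis.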
Do we need to use code obfuscation?
Code obfuscation is best taken step by step. It may seem intimidating at first, but it is not too difficult in practice. It is a good way to help keep your system safe, and understanding it makes you more aware of how cybercriminals can use it. In particularly high-profile cases, cybercriminals have hidden obfuscated code in the hex data of a particular file. Remember, just because defense through obfuscation is mostly gone, it doesn’t mean offense through obfuscation is.