May 22, 2023
8 mins read
In this post, I’ll use pyinstxtractor and pycdc to reverse engineer a Discord infostealer written in Python 3.10 and packaged with PyInstaller. I’ll then use CyberChef to further deobfuscate the malware and learn about the wonders of open source. Lastly, I’ll abuse some interesting Discord design decisions to effectively kill the malware.
Recently, a friend’s Discord account was hacked. They started messaging everyone the link http://redacted.com/redactedmalware, claiming it was a way to earn free Robux or something.
Shocker: it’s malware. What kind of malware, exactly? I was curious whether I could reverse engineer it. I started off by running strings on it and quickly found some indicators of PyInstaller.
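If you want to script that first check, a rough Python stand-in for strings is below. It's a sketch: it pulls out runs of printable ASCII the way strings does, then keeps the ones containing a few markers I'd assume are typical of PyInstaller binaries (PyInstaller, _MEIPASS, pyi-). That marker list is my own heuristic, not an official signature.

```python
import re
import sys

# Assumed markers, chosen heuristically; not an exhaustive or official list.
PYINSTALLER_MARKERS = ("PyInstaller", "_MEIPASS", "pyi-")

def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII, similar to the Unix strings tool."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

def pyinstaller_hits(data: bytes) -> list[str]:
    """Return the printable strings that contain any assumed PyInstaller marker."""
    return [s for s in printable_strings(data)
            if any(marker in s for marker in PYINSTALLER_MARKERS)]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 find_markers.py file.exe  (script name is hypothetical)
    with open(sys.argv[1], "rb") as f:
        for hit in pyinstaller_hits(f.read()):
            print(hit)
```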
If you don’t know what PyInstaller is, it’s a Python package that bundles Python .py scripts, along with a Python interpreter, into a single .exe file. This adds a (minor) layer of obfuscation, and the user doesn’t need Python installed to run the program. If we want to view the Python source code, we’ll have to convert the .exe back to the original .py files. How?
.exe to .py
Some research brought me to pyinstxtractor, a GitHub repository that claims to extract the contents of a PyInstaller .exe. It works! Download the script and run python3 pyinstxtractor.py file.exe. It outputs the extracted contents to file.exe_extracted.
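pyinstxtractor also reports which Python version built the executable, which matters for the decompilation step later. As far as I understand it, that information comes from a small "cookie" PyInstaller appends near the end of the file, starting with the magic bytes MEI\x0c\x0b\x0a\x0b\x0e. Below is a minimal sketch that reads it, assuming the cookie layout pyinstxtractor uses for PyInstaller 2.1 and later; it's for illustration, not a replacement for the tool.

```python
import struct

# Magic bytes marking PyInstaller's archive cookie (the same constant
# pyinstxtractor searches for).
PYINST_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"

# Assumed cookie layout for PyInstaller 2.1+, big-endian: magic, package
# length, TOC offset, TOC length, Python version, Python library name.
COOKIE_FORMAT = "!8sIIii64s"
COOKIE_SIZE = struct.calcsize(COOKIE_FORMAT)  # 88 bytes

def pyinstaller_python_version(data: bytes) -> tuple[int, int]:
    """Return (major, minor) of the Python that built a PyInstaller exe."""
    pos = data.rfind(PYINST_MAGIC)  # the cookie sits near the end of the file
    if pos == -1:
        raise ValueError("no PyInstaller cookie found")
    cookie = data[pos:pos + COOKIE_SIZE]
    if len(cookie) < COOKIE_SIZE:
        raise ValueError("truncated PyInstaller cookie")
    pyver = struct.unpack(COOKIE_FORMAT, cookie)[4]
    # Encoded as e.g. 310 for 3.10; very old builds used two digits (27 for 2.7).
    if pyver >= 100:
        return pyver // 100, pyver % 100
    return pyver // 10, pyver % 10

# Usage (hypothetical path):
# with open("file.exe", "rb") as f:
#     print(pyinstaller_python_version(f.read()))
```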
We’re not done just yet. As the program says, we only have .pyc files, not .py files.
.pyc files contain the compiled bytecode of a .py program, so we still can’t read the source code. Luckily, since this is Python bytecode (rather than machine code), it keeps enough structure that another program can reconstruct source very close to the original .py files!
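To see that a .pyc really is just a small header plus serialized bytecode, you can load one with the standard library. This sketch assumes the Python 3.7+ header layout (16 bytes) and only works when your interpreter is the same version that produced the file, since both the magic number and marshal's format change between versions. That version lock is one reason a standalone decompiler is useful here.

```python
import importlib.util
import marshal
import types

def load_pyc(path: str) -> types.CodeType:
    """Load the module-level code object from a .pyc (Python 3.7+ layout)."""
    with open(path, "rb") as f:
        data = f.read()
    # 16-byte header: 4-byte magic number, 4-byte flags, then 8 bytes holding
    # either the source mtime and size or a hash of the source.
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("this .pyc was produced by a different Python version")
    # Everything after the header is the marshalled code object (the bytecode).
    return marshal.loads(data[16:])

# Usage (hypothetical path, and only on a matching interpreter):
# import dis; dis.dis(load_pyc("main.pyc"))
```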
.pyc to .py
There are some more popular .pyc decompilers out there, like uncompyle6, that you might think would work on these, but they don’t.
This is because they’re outdated. uncompyle6, for instance, only works up to Python 3.8. Our code is Python 3.10! We need to find a program supporting more recent versions of Python. Luckily, one exists: pycdc/decompyle++.
There are no binaries we can download, but compiling is pretty easy:
git clone https://github.com/zrax/pycdc
cd pycdc
cmake .
make
./pycdc -h
We can try it with ./pycdc main.pyc, and it almost works. It clearly starts printing out some of the source code, but then errors out with Unsupported opcode: CALL_FUNCTION_EX.
Some more googling led me to this savior of a GitHub comment. We have to change a little bit of code, but it’s nothing crazy. Pretty much, every time we encounter an opcode error:
1. Open ASTree.cpp and scroll to line 1200, where there should be a big set of case statements. (If you are reading this in the far future, it might not be at this line anymore; check the commit history.)
2. Add a case for the opcode named in the error. My error was CALL_FUNCTION_EX, so I added the line case Pyc::CALL_FUNCTION_EX_A:
3. Re-run make and try the decompilation again. If you get another opcode error, repeat the same steps.
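Rather than finding unsupported opcodes one crash at a time, you could list every opcode a module uses up front with the standard dis module and compare that against the case statements in ASTree.cpp. This is a sketch with the usual bytecode caveat: it has to run under the Python version that produced the bytecode (3.10 for this sample). The example below just compiles a snippet in the running interpreter.

```python
import dis
import types

def opcodes_used(code: types.CodeType) -> set[str]:
    """Collect opcode names from a code object and all nested code objects."""
    names = {instr.opname for instr in dis.get_instructions(code)}
    # Function, class and lambda bodies are stored as nested code objects
    # in co_consts, so recurse into those too.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opcodes_used(const)
    return names

# Example: compile a snippet in the running interpreter and list its opcodes.
sample = compile("def f(*a, **k):\n    return g(*a, **k)\n", "<sample>", "exec")
print(sorted(opcodes_used(sample)))
```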
Once you’ve added enough, it fully decompiles, and we have our source main.py file! There’s some obfuscation in the source code itself, but luckily it’s pretty bad. Here’s the main.py file, if you want to try reversing it before I explain how:
# Source Generated with Decompyle++
# File: main.pyc (Python 3.10)
from components.antidebug import AntiDebug
from components.browsers import Browsers
from components.discordtoken import DiscordToken
from components.injection import Injection
from components.startup import Startup
from components.systeminfo import SystemInfo
from config import __CONFIG__
def main():
Warning: Stack history is not empty!
Warning: block stack is not empty!
funcs = [
AntiDebug,
Browsers,
DiscordToken,
Injection,
Startup,
SystemInfo]
for func in funcs:
if __CONFIG__[func.__name__.lower()]:
try:
if [
iter(__import__('base64').b64decode(__import__('zlib').decompress(b'x\xda\xf3\x0bq4\x05\x00\x02\xef\x01\x19')).decode())] * 3 or int.from_bytes == map((lambda O, i: 511 - int(O) + i)(map, __import__('base64').b64decode(__import__('zlib').decompress(b'x\xda\x03\x00\x00\x00\x00\x01')).decode().join(zip, [
iter(__import__('base64').b64decode(__import__('zlib').decompress(b'x\xda\xf3\x0bq4\x05\x00\x02\xef\x01\x19')).decode())] * 3), range(1)), __import__('base64').b64decode(__import__('zlib').decompress(b'x\xdaKr\xcf1Hq\xaf\xc8\x01\x00\x0cB\x02\xd5')).decode(), False, **('signed',)):
func(__CONFIG__[__import__('base64').b64decode(__import__('zlib').decompress(b'x\xdaK1\n\xcbLt\xb7,K,\xb7\xb5\x05\x00\x1a,\x03\xff')).decode()])
else:
func()
finally:
continue
if Exception:
e = None
try:
print(__import__('base64').b64decode(__import__('zlib').decompress(b'x\xda\x0b\x8a\xf0\xaaL2\xf6LO\x0c7IO560\xf3\xf4(\xb1\x04\x00D_\x06B')).decode().format(func.__name__, e))
finally:
e = None
del e
continue
e = None
del e
if None:
if None:
continue
return None
if __name__ == __import__('base64').b64decode(__import__('zlib').decompress(b'x\xda\x8b0\xb4,\x89\x0c\xcf)\x8d0\xb4\xb0\x05\x00\x19/\x03\xc6')).decode():
main()
return None
We see a bunch of the same pattern: __import__('base64').b64decode(__import__... ending with a bytestring. We could decode each of these in the Python interpreter, but that’s a little slow.
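For reference, here's what decoding one of these by hand in Python looks like. The malware inflates the zlib data first and then base64-decodes the result, so the helper does the same. Nothing from the malware is executed; only base64 and zlib run on the bytes. The sample bytestring is copied from main.py above.

```python
import base64
import zlib

def decode_blob(blob: bytes) -> str:
    """Undo one __import__('base64').b64decode(__import__('zlib').decompress(...)).decode()."""
    # Same order as the malware: zlib-inflate first, then base64-decode.
    return base64.b64decode(zlib.decompress(blob)).decode()

print(decode_blob(b'x\xda\xf3\x0bq4\x05\x00\x02\xef\x01\x19'))  # 509
```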
Is it possible to automatically deobfuscate all of this without any risk to my VM?
Yes! We can use CyberChef to decode all of them for us. A recipe of Unescape string, Zlib Inflate, and From Base64, in that order, deobfuscates a single bytestring.
Of course, this is just as slow as the Python interpreter version, since we still have to copy and paste each one in. To decode all of them at once, we can use the Subsection operation. It takes a regex, and every operation after it is applied only to the text matching that regex (until a Merge operation), leaving the rest of the input untouched. For instance, setting the Subsection regex to match base64 strings and following it with From Base64 decodes every base64 string while keeping the other parts of the text intact. We’ll do the same with our malware, just with a few more steps.
To achieve the same effect on our malware, we can set our regex to match the entire __import__('base64....decode() part, strip off everything but the bytestring, then do the unescape/zlib/base64 steps like before. The following recipe does just that:
Subsection('__import__\\(\'base64\'\\).b64decode\\(__import__\\(\'zlib\'\\).decompress\\(b[\'"].*?[\'"]\\)\\).decode\\(\\)',true,true,false)
Drop_bytes(0,63,false)
Find_/_Replace({'option':'Regex','string':'\'\\)\\).decode\\(\\)'},'',true,false,true,false)
Unescape_string()
Zlib_Inflate(0,0,'Adaptive',false,false)
From_Base64('A-Za-z0-9+/=',true,false)
Merge(true)
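If you'd rather stay in Python, the same recipe can be approximated with re.sub and a callback. This is a sketch under two assumptions: the regex only matches the exact spelling used in this sample, and ast.literal_eval stands in for CyberChef's Unescape string step (it parses the bytes literal without executing anything).

```python
import ast
import base64
import re
import zlib

# Whole obfuscated expression; group 1 captures the bytes literal
# (single- or double-quoted, with backslash escapes inside).
OBFUSCATED = re.compile(
    r"__import__\('base64'\)\.b64decode\(__import__\('zlib'\)\.decompress\("
    r"(b'(?:[^'\\]|\\.)*'"
    r'|b"(?:[^"\\]|\\.)*")'
    r"\)\)\.decode\(\)"
)

def deobfuscate(source: str) -> str:
    """Replace every obfuscated string expression with its decoded literal."""
    def replace(match: re.Match) -> str:
        blob = ast.literal_eval(match.group(1))  # plays the "Unescape string" role
        decoded = base64.b64decode(zlib.decompress(blob)).decode()  # inflate, then base64
        return repr(decoded)  # keep it a quoted Python literal
    return OBFUSCATED.sub(replace, source)

# Usage (hypothetical path):
# with open("main.py") as f:
#     print(deobfuscate(f.read()))
```

One design difference from the CyberChef output: each match is replaced with the repr() of the decoded string, so literals keep their quotes (if __name__ == '__main__': rather than if __name__ == __main__:) and the result stays closer to valid Python.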
This works, and here’s the semi-deobfuscated code. It’s not 100%, but it’s definitely good enough:
from components.antidebug import AntiDebug
from components.browsers import Browsers
from components.discordtoken import DiscordToken
from components.injection import Injection
from components.startup import Startup
from components.systeminfo import SystemInfo
from config import __CONFIG__
def main():
Warning: Stack history is not empty!
Warning: block stack is not empty!
funcs = [
AntiDebug,
Browsers,
DiscordToken,
Injection,
Startup,
SystemInfo]
for func in funcs:
if __CONFIG__[func.__name__.lower()]:
try:
if [
iter(509)] * 3 or int.from_bytes == map((lambda O, i: 511 - int(O) + i)(map, .join(zip, [
iter(509)] * 3), range(1)), little, False, **('signed',)):
func(__CONFIG__[webhook])
else:
func()
finally:
continue
if Exception:
e = None
try:
print(Error in {}: {}.format(func.__name__, e))
finally:
e = None
del e
continue
e = None
del e
if None:
if None:
continue
return None
if __name__ == __main__:
main()
return None
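There are a couple things I immediately notice: each piece of functionality (AntiDebug, Browsers, DiscordToken, Injection, Startup, SystemInfo) is imported from its own components module, and everything is driven by a __CONFIG__ dictionary imported from config, which includes a webhook entry.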
Let’s start with the DiscordToken part. Looking at the third import line, DiscordToken is loaded from the components.discordtoken module. This means the code being run lives in a different file bundled with the malware, discordtoken.pyc, which we also need to decompile. I found it with find . -name 'discordtoken.pyc' and ran pycdc on it like normal.
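The import line tells you where to look: a dotted module name maps onto a directory path, so components.discordtoken should correspond to a components/discordtoken.pyc somewhere in the extracted tree. Here's a small Python alternative to find that also checks the parent directory, in case several files share a name. It's a sketch that assumes the package layout was preserved during extraction.

```python
from pathlib import Path

def find_module_pyc(root: str, module: str) -> list[Path]:
    """Find .pyc files for a dotted module name anywhere under root."""
    # "components.discordtoken" -> components/discordtoken.pyc
    wanted = Path(*module.split(".")).with_suffix(".pyc")
    return sorted(
        path for path in Path(root).rglob(wanted.name)
        # Keep only files whose trailing path components match the package path.
        if path.parts[-len(wanted.parts):] == wanted.parts
    )

# Usage: find_module_pyc("file.exe_extracted", "components.discordtoken")
```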
This file is much, much bigger. Looking at the deobfuscated code, I noticed something interesting at the bottom.
A GitHub link? And a name, ‘Empyrean’? What’s this?
Just when you thought you’d seen it all. That GitHub link points to an '''educational purpose only''' infostealer. So this malware is actually just some script kiddie taking already-existing malware (in Python? really??) and configuring it to send the data to them. There’s little reason to deobfuscate any further when the source is right there. The wonders of open source, I guess.
We aren’t done just yet! Let’s look at that config file I mentioned earlier.
# Source Generated with Decompyle++
# File: config.pyc (Python 3.10)
__CONFIG__ = {
    'systeminfo': True,
    'startup': True,
    'injection': True,
    'discordtoken': True,
    'browsers': True,
    'antidebug': True,
    'webhook': 'https://discordapp.com/api/webhooks/[REDACTEDID]/[REDACTED]'}
At the very end we get a juicy webhook. Once the malware finishes running, all your data is sent to this webhook, meaning it pops up in Mister Skid’s Discord server. And interestingly enough (this is intended), sending an HTTP DELETE request to this endpoint deletes the webhook. Completely unauthenticated! If you don’t believe me, I made an example webhook, copied its link, and sent a DELETE request to it with cURL (curl -X DELETE followed by the webhook URL). Again, clearly no authentication is being done here, and when I went back to check, the webhook was gone! I don’t have a screenshot of this part, but I quickly sent a DELETE request to the malware’s webhook, and now that one’s gone too. The program may still run on infected computers, but with the webhook gone it has nowhere to send the stolen data, so it’s essentially dormant.
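For completeness, here's the same request in Python using only the standard library. Discord's API documentation describes deleting a webhook through its token URL without any other authentication, returning 204 No Content on success. The URL-prefix check and the User-Agent string are my own assumptions; the sketch builds the request separately so you can inspect it before anything is sent.

```python
import urllib.request

# Assumption: webhook URLs look like the one in the config above, on either
# of Discord's API hostnames.
WEBHOOK_PREFIXES = (
    "https://discord.com/api/webhooks/",
    "https://discordapp.com/api/webhooks/",
)

def build_webhook_delete(webhook_url: str) -> urllib.request.Request:
    """Build (but do not send) the same DELETE request the cURL command makes."""
    if not webhook_url.startswith(WEBHOOK_PREFIXES):
        raise ValueError("not a Discord webhook URL")
    # The token embedded in the URL is the only credential; no auth header.
    # The User-Agent value is a placeholder, not something Discord prescribes.
    return urllib.request.Request(
        webhook_url,
        method="DELETE",
        headers={"User-Agent": "webhook-takedown-sketch"},
    )

def delete_webhook(webhook_url: str) -> int:
    """Send the DELETE and return the HTTP status (204 means it was deleted)."""
    request = build_webhook_delete(webhook_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```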