Who’s Talking about Polyglot Files? 

Recent investigations by the Deep Instinct team point to a steady rise in polyglot files carrying malicious JARs, and that is causing concern. But the most pressing question for a lot of people reading this will be: “What is a polyglot file, and why should I be worried?” 

Don’t worry, we’ve got you. 

This week, we’re exploring polyglot files and examining the ways in which the adversary has started to abuse them. Next week, we’ll look at how you can identify malicious polyglot files and how to deal with them. 

Wait – what’s a polyglot file? 

Great question! Polyglot isn’t a common word in English, so when we then get ultraspecific and start to use it in relation to computing, we’re even likelier to confuse people. So, let’s start at the basics. 

A polyglot is anyone who speaks multiple languages, although the term generally refers to people who speak three or more. The word comes from the Greek for many languages, so its history is pretty obvious. As the majority of _secpro readers are multilingual, you should understand the issues we polyglots face when we’re interacting with people who don’t share all the languages we understand. 

But what’s a polyglot file? A text document written in English, French, and German? Not quite… 

Just like humans can speak multiple languages, a computer can “speak” multiple languages. The odds are that many of our readers are very comfortable with the most popular languages that computers can interpret: Python, Java, Go, Malbolge (if you’re a sadist…). File formats work the same way. When a programmer builds a single file that is valid in two or more formats at once, they are creating a polyglot file. It’s all about combining syntax from different formats so that different programs each read the same bytes in their own way. 

What’s up with polyglot files? 

At face value, polyglot files aren’t out-and-out malicious. They have useful applications, although these are generally fringe cases or simply acts of intellectual or scholarly experimentation. One interesting example is combining HTML5 and XHTML so that a document can be interpreted as either HTML or XML. 

But, despite these benign edge cases, polyglot files cause deep issues for security teams. They can be used to bypass validation and to exploit vulnerabilities. A threat actor who understands the internal architecture of a business could create a file that seems innocent (even to initial human inspection), and the malicious polyglottery might not be caught until it’s too late. And here’s where our story begins…

How has the adversary been using polyglot files? 

Well, this isn’t actually new. A quick search of VirusTotal shows us that this “MSI+JAR polyglot technique” was discovered in 2019. It has also been assigned a CVE number: CVE-2020-1464, in fact. Hopefully, you, like any good cybersecurity expert, are thinking, “Well, why are you telling me about this, then?” 

As Deep Instinct noted, the fix seems to have been inadequate. Because the technique is so flexible, it has been adjusted just enough to get around the problems that we thought were solved. Let’s investigate. 

Why are JAR files problematic? 

If you ask the most cynical programmer what a JAR file is, they might say that it’s just a different kind of ZIP archive. And what’s the problem with ZIP files? ZIP readers locate an archive’s contents from a directory stored at the end of the file. So, when junk code or files are attached to the beginning of the ZIP, the victim computer might not recognise it as corrupted, damaged, or otherwise problematic. 
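
To make that concrete, here is a minimal Python sketch of the general idea (not the actors’ actual tooling). It builds a tiny JAR-like ZIP in memory, puts junk in front of it, and shows that Python’s standard zipfile module still opens it. The helper names build_zip and list_members are made up for illustration, and the prefix is only the 8-byte header used by OLE compound files (the container format behind MSI) plus zero padding, not a real installer.

```python
import io
import zipfile

# First 8 bytes of an OLE compound file (the container MSI uses).
# This is a stand-in header for illustration, not a real installer.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def build_zip(members: dict) -> bytes:
    """Create a small in-memory ZIP (a JAR is a ZIP with a manifest)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in members.items():
            zf.writestr(name, text)
    return buf.getvalue()


def list_members(blob: bytes) -> list:
    """Open the bytes as a ZIP archive and return the member names."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.namelist()


jar_like = build_zip({"META-INF/MANIFEST.MF": "Manifest-Version: 1.0\n"})

# Junk first, archive last: the ZIP directory still sits at the end.
prefixed = OLE_MAGIC + b"\x00" * 504 + jar_like

print("starts like a ZIP:", prefixed.startswith(b"PK"))
print("members found:", list_members(prefixed))
```

The file no longer starts with the usual ZIP signature, yet the archive is still readable, which is exactly why a check that only looks at the first few bytes can be fooled.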

Over 2022, Deep Instinct tracked both StrRAT and Ratty (both imaginatively named remote access trojans) infiltrating systems via the polyglot tactic or through JAR files. Both of these known threats seem to report back to the same C2 server, which suggests that a single group is focusing its exploitation on a weakness it has uncovered. While an in-depth analysis of StrRAT and Ratty would be useful to explain the issue, you will have to wait until next week to find out the specifics about these malicious files. 

How are these polyglot files being used? 

While you will have to wait for the specifics, here is a rundown of the relevant MITRE ATT&CK techniques that we have seen from this mysterious, multilingual threat actor gang. 

Initial access 

T1566.002 – Phishing: Spearphishing Link 

The attackers use URL shorteners to trick unwitting victims. Cybersecurity pros have noticed this malicious link in different malicious environments: Rebrand[.]ly/afjlfvp 

Defense evasion 

T1036.001 – Masquerading: Invalid Code Signature 

A signed MSI file is used to evade detection, observed as: 85d8949119dad6215ae0a21261b037af 

T1027.001 – Obfuscated Files or Information: Binary Padding 

JAR files love junk data. Here, the junk data changes what the file command sees, so it reports a different file type from the JAR that is actually in there. It has been observed as: cb17f27671c01cd27a6828faaac08239 
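
As a rough illustration of why a header-based check gets this wrong, the Python sketch below compares what the first bytes of a file claim with whether a ZIP end-of-central-directory record is sitting near the end. The function names and the is_suspect_polyglot heuristic are assumptions made for this example, not part of Deep Instinct’s analysis or of the file utility. Legitimate self-extracting archives would also trip it, so treat a hit as a prompt to look closer, not as a verdict.

```python
import io
import zipfile

OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # MSI and other OLE compound files
ZIP_LOCAL_MAGIC = b"PK\x03\x04"                # first local file header in a ZIP
ZIP_EOCD_MAGIC = b"PK\x05\x06"                 # end-of-central-directory record
# The EOCD record is 22 bytes, plus an optional comment of up to 65535 bytes.
EOCD_SEARCH_WINDOW = 22 + 65535


def type_from_header(blob: bytes) -> str:
    """Guess a type from the leading bytes only, as a naive check would."""
    if blob.startswith(OLE_MAGIC):
        return "ole-compound"
    if blob.startswith(ZIP_LOCAL_MAGIC):
        return "zip"
    return "unknown"


def has_zip_trailer(blob: bytes) -> bool:
    """True if a ZIP end-of-central-directory signature is near the end."""
    return ZIP_EOCD_MAGIC in blob[-EOCD_SEARCH_WINDOW:]


def is_suspect_polyglot(blob: bytes) -> bool:
    """Hypothetical heuristic: ZIP trailer present, but no ZIP header."""
    return has_zip_trailer(blob) and not blob.startswith(b"PK")


# Build a tiny ZIP and a padded copy to exercise the checks.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Main.class", b"\xca\xfe\xba\xbe")
plain_zip = buf.getvalue()
padded = OLE_MAGIC + b"\x00" * 504 + plain_zip

print(type_from_header(plain_zip), is_suspect_polyglot(plain_zip))
print(type_from_header(padded), is_suspect_polyglot(padded))
```

The design choice here is to look at both ends of the file: the header tells you what the file claims to be, while the trailer tells you what a ZIP or JAR reader would still find.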

Command and Control (C2) 

T1102 – Web Service 

Using Discord as a C2 centre isn’t a new trick, but I still get surprised every time I see it. Observed as: https://cdn[.]discordapp[.]com/attachments/938795529683480586/941658014962823208/Package_info[.]jar 
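
If you want to check files against the two sample values listed under Defense evasion, a small Python sketch along these lines would do it. It assumes those 32-character values are MD5 digests (the source does not say which algorithm was used), and the helper names md5_hex and matches_known_sample are invented for this example.

```python
import hashlib

# Values copied from the Defense evasion entries above.
# Assumption: they are MD5 digests (32 hex characters).
KNOWN_MD5 = {
    "85d8949119dad6215ae0a21261b037af",  # signed MSI sample
    "cb17f27671c01cd27a6828faaac08239",  # padded JAR sample
}


def md5_hex(data: bytes) -> str:
    """Return the lowercase hex MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def matches_known_sample(data: bytes, known=KNOWN_MD5) -> bool:
    """True if the data hashes to one of the known sample digests."""
    return md5_hex(data) in known


# Usage: read a file's bytes and compare them with the known values.
# The empty byte string here is just a harmless placeholder input.
print(matches_known_sample(b""))
```

A hash match only catches byte-identical copies of those samples, so it complements structural checks on the file and does not replace them.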
