Introducing an alternative to YARA: ROST

Development of Chrysalide remains active, and work on its analysis process has lately focused on binary scanning.

For this task, the YARA tool from VirusTotal is commonly adopted, but the program evolves quite slowly and adding any module requires recompiling the whole project. ROST was therefore created with these key principles in mind:

allowing new functions and namespaces to be defined as keywords for the match conditions;
offering extension capabilities through native modules or Python scripts;
making scan results easier to process, with an optional JSON output;
supporting many formats, relying on definitions from Kaitai Struct for the parsing process.

ROST's grammar is similar to YARA's, with some minor incompatibilities. A provided tool, yara2rost, may help to translate existing YARA rules on the fly without effort.

Here is a mandatory Hello World example to show ROST in action:

$ echo "Hello world!" | iconv -f ascii -t utf-16le > hello.bin

$ cat hello.rost

rule HelloWorld {

    bytes:
        $w = "world" wide

    condition:
        $w

}

$ rost -j hello.rost hello.bin | jq '.[] | [ .matched, .bytes_patterns[0].matches[0].offset ]'
[
  true,
  12
]
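The same extraction can be done from a script instead of jq. The sketch below parses a JSON report with Python; the sample report is hand-written for illustration and only uses keys visible in this post (name, matched, bytes_patterns, matches, offset), so the real output may carry more fields. The first_offsets helper is a hypothetical name.

```python
import json

# Hand-written sample, shaped after the keys used in the jq filter above.
# It is an assumption about the layout, not a captured ROST report.
SAMPLE_REPORT = """
[
   {
      "name": "HelloWorld",
      "matched": true,
      "bytes_patterns": [
         { "matches": [ { "offset": 12 } ] }
      ]
   }
]
"""

def first_offsets(report_text):
    """Return (rule name, matched flag, first match offset or None) per rule."""
    rows = []
    for rule in json.loads(report_text):
        offset = None
        # Walk the patterns until one of them carries at least one match
        for pattern in rule.get("bytes_patterns", []):
            matches = pattern.get("matches", [])
            if matches:
                offset = matches[0]["offset"]
                break
        rows.append((rule["name"], rule["matched"], offset))
    return rows

print(first_offsets(SAMPLE_REPORT))
```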

Regular output is also available:

$ rost -s hello.rost hello.bin
HelloWorld hello.bin
0xc:$w: w\x00o\x00r\x00l\x00d\x00
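The reported offset can be checked by hand: echo emits "Hello world!" followed by a newline, iconv turns each ASCII character into two bytes, and the wide modifier searches for the UTF-16LE form of the pattern. A quick Python check of that arithmetic, independent of ROST:

```python
# Rebuild the content of hello.bin: echo appends a newline to the text
content = "Hello world!\n".encode("utf-16-le")

# The 'wide' modifier corresponds to the UTF-16LE form of the pattern
pattern = "world".encode("utf-16-le")

# "Hello " is 6 characters, hence 12 bytes before the pattern starts
offset = content.find(pattern)

print(offset, hex(offset), pattern)
```

The six characters of "Hello " take twelve bytes once encoded, which is why the match lands at offset 12 (0xc).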

Even if some important features are still missing (such as support for loops or regular expressions), ROST is currently usable, and this blog post highlights a few new capabilities through real-world cases.

For more information, the documentation provides all the details about the current state of the implementation.

For the record, this article is based on commit ab6b87b7, so you can give this version of ROST a try by installing one of the available packages or by running:

git clone http://git.0xdeadc0de.fr/chrysalide.git
cd chrysalide
git checkout ab6b87b7

Three real world use cases

Reverse engineering usually targets regular binaries when auditing applications, but this kind of activity also helps to document malware behaviour. Scanning binaries with tools such as YARA or ROST is then the first step to identify malware.

The next cases illustrate the benefits of using ROST for hunting samples.

All the processed samples are available in an online repository. If needed, please ask @vxunderground for the ZIP passwords!

Keeping simple patterns using modifiers

Dynamic API resolution is commonly used by malware authors, as it leaves no string naming the imported functions in the malware. Function names are replaced by pre-computed hashes, which get compared at runtime to the hashes computed for all functions exported by a loaded DLL. If there is a match, then the function to import is recovered! This technique is referenced by MITRE ATT&CK as T1027.007.

There are plenty of hash algorithms which can be used for dynamic resolution: CRC32 (Panda), DJB2 (GULoader), MurmurHash2 (LummaC2), and so on. The trend is to replace such algorithms with custom implementations, in order to make analysis more difficult.

However, the old and simple ROR13 algorithm is still used, as shown recently by Mandiant in an alert about phishing operations from APT29. As explained by EclecticIQ, the dropped malware even relies on ROR13 to hide three DLL names: Kernel32.dll, Ntdll.dll, User32.dll.

Here these names are processed as UTF-16 strings and converted to upper case before getting hashed, but variants exist:

the old Buer hashed plain upper-case ASCII strings;
TrickBot uses plain upper-case bytes too, but stores the result as string;
LockBit performs a XOR operation to add extra obfuscation;
Mustang Panda computes hashes without changing the case;
Metasploit combines two ROR13 hashes (one for the DLL name and one for the function name).

The involved root hashing code is really basic:

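def ror(dword, bits):
    # Rotate a 32-bit value to the right
    return ( dword >> bits | dword << ( 32 - bits ) ) & 0xffffffff

def ror13_csum(name):
    # 'name' is a bytes object: iterating over it yields integers
    state = 0
    for b in name:
        # Keep the state on 32 bits, as a native DWORD would be
        state = ( b + ror(state, 0xd) ) & 0xffffffff
    return state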
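To make the resolution loop described earlier concrete, here is a minimal sketch of how a sample could map a pre-computed hash back to an export name. The resolve helper and the short EXPORTS list are hypothetical stand-ins for walking the export table of a loaded DLL; the checksum is restated, with a 32-bit mask applied to the state, so that the block runs on its own.

```python
def ror(dword, bits):
    """Rotate a 32-bit value to the right."""
    return (dword >> bits | dword << (32 - bits)) & 0xffffffff

def ror13_csum(name):
    """ROR13 checksum of a bytes object, kept on 32 bits."""
    state = 0
    for b in name:
        state = (b + ror(state, 0xd)) & 0xffffffff
    return state

# Hypothetical stand-in for the export table of a loaded DLL
EXPORTS = ["ExitProcess", "GetProcAddress", "LoadLibraryA", "VirtualAlloc"]

def resolve(target, exports):
    """Return the first export whose checksum equals the target hash."""
    for name in exports:
        if ror13_csum(name.encode("ascii")) == target:
            return name
    return None

# The malware only embeds the hash; the name is recovered at runtime
wanted = ror13_csum(b"VirtualAlloc")
print(hex(wanted), resolve(wanted, EXPORTS))
```

The point of the design is visible here: the binary only needs to carry the 32-bit value stored in wanted, never the function name itself.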

So it may be worthwhile to keep a scan rule targeting such API hashing as basic as possible: "define one pattern, detect many hashes" sounds like a good principle, which helps to keep rules short and readable.

ROST implements an idea which comes from a three-year-old RFC for composable modifiers in YARA. This implementation allows pattern modifications to be chained. For instance, the next rule covers the ROR13 hash of both ASCII and UTF-16 upper-case strings:

$ cat apihash.rost

rule ApiHashing {

     bytes:
        $imp_00 = "Kernel32.dll" plain wide | upper | ror13
        $imp_01 = "Ntdll.dll" plain wide | upper | ror13
        $imp_02 = "User32.dll" plain wide | upper | ror13

     condition:
        all of them

}

All the pre-computed values are detected inside the mso.dll binary:

$ rost -s apihash.rost Malwares/ae79aa17e6f3cc8e816e32335738b61b343e78c20abb8ae044adfeac5d97bf70
ApiHashing Malwares/ae79aa17e6f3cc8e816e32335738b61b343e78c20abb8ae044adfeac5d97bf70
0xc31:$imp_00: [\xbcJj
0xc41:$imp_01: ]h\xfa<
0x232e:$imp_01: ]h\xfa<
0xc51:$imp_02: \x83B\xc8c
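One way to picture what the modifier chain expands to is a small Python emulation. This is only a sketch of the semantics as described in this post, not ROST's implementation: it assumes that plain wide yields both the ASCII and the UTF-16LE encodings, that upper converts ASCII letters to upper case, and that ror13 emits the 32-bit checksum in little-endian byte order. All function names below are illustrative.

```python
import struct

def ror(dword, bits):
    """Rotate a 32-bit value to the right."""
    return (dword >> bits | dword << (32 - bits)) & 0xffffffff

def ror13_csum(data):
    """ROR13 checksum of a bytes object, kept on 32 bits."""
    state = 0
    for b in data:
        state = (b + ror(state, 0xd)) & 0xffffffff
    return state

def plain_wide(text):
    # Assumption: 'plain wide' keeps the ASCII form and adds the UTF-16LE one
    return [text.encode("ascii"), text.encode("utf-16-le")]

def upper(patterns):
    # bytes.upper() only touches ASCII letters, so UTF-16 zero bytes survive
    return [p.upper() for p in patterns]

def ror13(patterns):
    # Assumption: the checksum is searched as a little-endian DWORD
    return [struct.pack("<I", ror13_csum(p)) for p in patterns]

def expand(text):
    """Emulate: "text" plain wide | upper | ror13"""
    return ror13(upper(plain_wide(text)))

for dll in ("Kernel32.dll", "Ntdll.dll", "User32.dll"):
    print(dll, [p.hex() for p in expand(dll)])
```

Under these assumptions each name expands to two four-byte patterns, one per encoding, which can be compared with the bytes ROST reports for the sample.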

With more hash algorithms included, there can be plenty of modifier combinations. In order to ensure that matches share a common modification process, the modpath and maxcommon keywords may be useful.

Moreover, pre-computing a large quantity of Windows API function hashes is an idea Mandiant published about a long time ago, as did Kaspersky more recently. Other repositories keep adding implementations for new algorithms. Finally, to build a good hash list, the apihash_to_yara project provides a file with the top 100 Windows API functions found in samples hosted on Malpedia, which may be a good start.

Matching LNK files using Kaitai

Windows shortcut files are quite popular among attackers: easy to deploy, these .lnk files usually do not contain any explicit malicious payload and thus tend not to raise suspicion. Shortcuts are currently a good alternative to the well-known Microsoft Office macros for achieving execution, and the technique is documented by MITRE ATT&CK as T1204.002.

Here are a few recent articles about such LNK usage:

PlugX (EclecticIQ, 02/23);
DUCKTAIL (Deep Instinct, 03/23);
IcedID (The DFIR Report, 04/23);
RATs and stealers (Cyble, 08/23).

A thread on Twitter provides a few hints about hunting malicious LNK files: dates, machine ID, commands with LOLBins, and so on. Talos has also written a comparative analysis of several shortcut builders, showing all the format features which can be exploited from the LNK metadata in order to track malicious actors.

Such data can be explored from the command line:

$ exiftool 2c0273394cda1b07680913edd70d3438a098bb4468f16eebf2f50d060cdf4e96 
ExifTool Version Number         : 12.55
File Name                       : 2c0273394cda1b07680913edd70d3438a098bb4468f16eebf2f50d060cdf4e96
Directory                       : .
File Size                       : 2.2 kB
File Modification Date/Time     : [REDACTED]
File Access Date/Time           : [REDACTED]
File Inode Change Date/Time     : [REDACTED]
File Permissions                : -rw-r--r--
File Type                       : LNK
File Type Extension             : lnk
MIME Type                       : application/octet-stream
Flags                           : IDList, LinkInfo, RelativePath, WorkingDir, CommandArgs, IconFile, Unicode, ExpIcon, TargetMetadata
File Attributes                 : Archive
Create Date                     : 2022:01:20 06:01:22+01:00
Access Date                     : 2022:11:21 09:55:57+01:00
Modify Date                     : 2022:01:20 06:01:22+01:00
Target File Size                : 289792
Icon Index                      : 13
Run Window                      : Show Minimized No Activate
Hot Key                         : (none)
Target File DOS Name            : cmd.exe
Drive Type                      : Fixed Disk
Drive Serial Number             : 7659-109E
Volume Label                    : 
Local Base Path                 : C:\Windows\System32\cmd.exe
Relative Path                   : ..\..\..\..\..\Windows\System32\cmd.exe
Working Directory               : %cd%
Command Line Arguments          : /q /c "System Volume Information\  \test2022.ucp"
Icon File Name                  : .\iSsRFuZJGwxgCAjNKJmqfVC.doc

Using the LNK format definition provided by the Kaitai project, ROST is able to parse shortcut files and to evaluate match conditions about them without any dedicated module.

First, the LNK format definition has to be retrieved from Kaitai Struct and set up:

$ git clone https://github.com/kaitai-io/kaitai_struct_formats.git
$ ln -s windows_lnk_file.ksy kaitai_struct_formats/windows/lnk.ksy
$ export KSPATH=$PWD/kaitai_struct_formats/windows

Then, a rule has to be created to detect some particular properties of .lnk files linked to malware. As most fields are UTF-16 encoded, here is one way to write a rule detecting the previous LNK sample:

$ cat badlnk.rost

rule MaliciousLnk
{
    meta:
        hash = "2c0273394cda1b07680913edd70d3438a098bb4468f16eebf2f50d060cdf4e96"

    condition:

        // Run cmd.exe
        kaitai.lnk.rel_path.str endswith string.wide("cmd.exe")

           // Target something.ucp
           and kaitai.lnk.arguments.str contains string.wide(".ucp")

           // Show Minimized No Activate
           and kaitai.lnk.header.show_command == 0x7

}

Here the kaitai.lnk namespace refers to the lnk.ksy definition file found in the directory pointed to by $KSPATH.
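The string comparisons in this rule can be mimicked in Python, assuming string.wide() produces the UTF-16LE encoding of its argument (which is consistent with the wide modifier shown in the Hello World example). The field values are copied from the ExifTool output above; the wide helper is a hypothetical stand-in, not a ROST API.

```python
def wide(text):
    """Hypothetical equivalent of string.wide(): UTF-16LE encoding."""
    return text.encode("utf-16-le")

# Values reported by ExifTool, stored as UTF-16 in the shortcut
rel_path = wide(r"..\..\..\..\..\Windows\System32\cmd.exe")
arguments = wide('/q /c "System Volume Information\\  \\test2022.ucp"')

# kaitai.lnk.rel_path.str endswith string.wide("cmd.exe")
print(rel_path.endswith(wide("cmd.exe")))

# kaitai.lnk.arguments.str contains string.wide(".ucp")
print(wide(".ucp") in arguments)

# A plain ASCII needle would miss the UTF-16 encoded field
print(b"cmd.exe" in rel_path)
```

The last check shows why the conversion matters: without it, the needle and the field would not share the same encoding.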

Running ROST shows detection as expected:

$ rost -j badlnk.rost Malwares/2c0273394cda1b07680913edd70d3438a098bb4468f16eebf2f50d060cdf4e96
[
   {
      "name": "MaliciousLnk",
      "tags": [],
      "target": "Malwares/2c0273394cda1b07680913edd70d3438a098bb4468f16eebf2f50d060cdf4e96",
      "bytes_patterns": [
      ],
      "matched": true
   }
]
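For readers who want to check the show_command value without ROST or Kaitai, the fixed-size header can be decoded by hand. The sketch below follows the 76-byte ShellLinkHeader layout from Microsoft's MS-SHLLINK specification; the parse_lnk_header helper and its field names are illustrative choices, and the sample header is synthetic, built only from values ExifTool reported above.

```python
import struct

# ShellLinkHeader: size, CLSID, flags, attributes, three FILETIMEs,
# target size, icon index, show command, hot key, three reserved fields
HEADER = struct.Struct("<I16sIIQQQIiIHHII")

# CLSID 00021401-0000-0000-C000-000000000046, as stored on disk
LNK_CLSID = bytes.fromhex("0114020000000000c000000000000046")

SW_SHOWMINNOACTIVE = 7

def parse_lnk_header(data):
    """Decode the fixed header of a .lnk file into a dictionary."""
    if len(data) < HEADER.size:
        raise ValueError("content too short for a shell link header")
    (size, clsid, flags, attributes, _ctime, _atime, _wtime,
     file_size, icon_index, show_command, hot_key,
     _res1, _res2, _res3) = HEADER.unpack_from(data)
    if size != HEADER.size or clsid != LNK_CLSID:
        raise ValueError("not a shell link header")
    return {
        "flags": flags,
        "attributes": attributes,
        "file_size": file_size,
        "icon_index": icon_index,
        "show_command": show_command,
        "hot_key": hot_key,
    }

# Synthetic header: Archive attribute, target size 289792, icon index 13,
# show command 7; timestamps and flags are zeroed for brevity
sample = HEADER.pack(0x4C, LNK_CLSID, 0, 0x20, 0, 0, 0,
                     289792, 13, SW_SHOWMINNOACTIVE, 0, 0, 0, 0)

header = parse_lnk_header(sample)
print(header["show_command"] == SW_SHOWMINNOACTIVE)
```

On a real sample, the first 76 bytes of the file would be passed to parse_lnk_header instead of the synthetic buffer.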

Detecting ZIP bombs by extending the grammar

In a recent technical article, Deep Instinct's Threat Research team provides details about fresh Emotet samples involved in a malicious spam campaign.

The malware is delivered as a ZIP archive attachment. The archive is pretty small:

$ stat -c "%s" be2265d9*zip
698149

But the Word document inside this archive is heavy! More than 500 MB are thus extracted from a 700 kB ZIP file:

$ unzip -l be2265d9*zip
Archive:  be2265d9422d84713470e38e18184e8c5e035dc5a5db0f44760084fbcf52f701.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
557117952  2023-03-08 06:05   WY-4644 report.doc
---------                     -------
557117952                     1 file
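The two sizes above give a rough idea of the compression ratio involved. The short computation below uses the whole archive size as an upper bound for the compressed size of the document, since the archive also holds the ZIP headers:

```python
archive_size = 698149        # bytes, from stat
extracted_size = 557117952   # bytes, from unzip -l

# Compressed size over plain size: the smaller, the more suspicious
ratio = archive_size / extracted_size

print(f"{ratio:.5f}")                            # 0.00125
print(f"{extracted_size / archive_size:.0f}:1")  # 798:1
```

The extracted document is therefore roughly 800 times bigger than the archive carrying it.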

This kind of trick is well known: an overlay of zeros is appended at the end of the file to inflate its size. The expected side effect is to make some security products and sandboxes skip scanning such big content, and thus detection is avoided.

Relying on the uint() keyword is not enough here to parse binary content and to match this kind of archive.

The Kaitai format specification for ZIP archives may help to detect ZIP bombs; however, it may be simpler to write a plugin for this task: a native one or a Python script. The latter is chosen for this blog article, so detecting ZIP bombs using Kaitai is left as an exercise for the reader for the moment.

Here is the full simple Python code registering a new keyword and implementing the logic to detect a ZIP bomb inside ROST:

import io
import zipfile

from pychrysalide.analysis.scan import ScanRegisteredItem
from pychrysalide.analysis.scan.exprs import ScanLiteralExpression
from pychrysalide.core import get_rost_root_namespace
from pychrysalide.plugins import PluginModule

class ZipBombKeyword(ScanRegisteredItem):
    """Introduce a new keyword in the ROST grammar."""

    _name = 'is_zip_bomb'

    def _reduce(self, ctx, scope):

        expr = None

        try:

            has_bomb = False

            # compressed size < 1% of plain size => bomb
            ZIP_BOMB_RATIO = 0.01

            data = io.BytesIO(ctx.content.data)

            zf = zipfile.ZipFile(data)

            info = zf.infolist()

            for i in info:

                # Skip empty entries (e.g. directories): avoid dividing by zero
                if i.file_size == 0:
                    continue

                ratio = i.compress_size / i.file_size

                has_bomb = ratio < ZIP_BOMB_RATIO

                if has_bomb:
                    break

            expr = ScanLiteralExpression(has_bomb)

        except Exception:
            # Not a readable ZIP archive: expr stays None
            pass

        return expr

class ZipBombChecker(PluginModule):
    """Check for ZIP bombs."""

    _name = 'ZipBombChecker'
    _desc = 'Check if a ZIP archive may contain heavy files on extraction'
    _version = '0.1'
    _url = 'https://www.chrysalide.re/'

    _actions = ( )

    def __init__(self):
        """Create a ZipBombChecker plugin instance."""

        super().__init__()

        ns = get_rost_root_namespace()

        kw = ZipBombKeyword()

        ns.register_item(kw)

This code is available online. The following commands retrieve it and make ROST aware of its location:

$ git clone http://git.0xdeadc0de.fr/snippets.git
$ export PYTHONPATH=$PWD/snippets/python-plugins

The newly defined is_zip_bomb keyword can then be embedded into rules:

$ cat bomb.rost

rule ZipBomb {

    condition:
        is_zip_bomb

}

Once again, running ROST shows detection as expected:

$ rost bomb.rost Malwares/be2265d9422d84713470e38e18184e8c5e035dc5a5db0f44760084fbcf52f701.zip
ZipBomb Malwares/be2265d9422d84713470e38e18184e8c5e035dc5a5db0f44760084fbcf52f701.zip
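The ratio check at the heart of the plugin can also be exercised without Chrysalide. The sketch below reuses the same 1% threshold on archives built in memory; has_zip_bomb and make_archive are hypothetical helper names, and the payloads are synthetic (a run of zeros versus random bytes), not the Emotet sample.

```python
import io
import os
import zipfile

# Same threshold as the plugin: compressed size < 1% of plain size
ZIP_BOMB_RATIO = 0.01

def has_zip_bomb(raw, threshold=ZIP_BOMB_RATIO):
    """Tell if any archive member inflates suspiciously on extraction."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        for info in zf.infolist():
            # Empty members (e.g. directories) have no meaningful ratio
            if info.file_size == 0:
                continue
            if info.compress_size / info.file_size < threshold:
                return True
    return False

def make_archive(name, payload):
    """Build a one-member deflated ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buffer.getvalue()

# A megabyte of zeros mimics the padding trick; random bytes do not compress
padded = make_archive("report.doc", b"\x00" * 1_000_000)
regular = make_archive("notes.bin", os.urandom(4096))

print(has_zip_bomb(padded), has_zip_bomb(regular))
```

Unlike the plugin, this helper raises zipfile.BadZipFile on content which is not a ZIP archive, leaving the caller to decide what to do with it.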

What is next?

ROST works but is not yet a mature product. It still requires development, documentation and feedback!

Beyond extending its grammar, the next steps for ROST may be driven by the following ideas:

understanding why the Bitap implementation does not perform better;
introducing a real pre-processor for the rule's content;
automatically including enumerations provided by Kaitai's definitions.

Posted on October 13, 2023 at 4:36