DEV IN PROGRESS

Definition of search patterns for ROST

This page provides documentation about the bytes: part of a ROST rule definition.

This bytes: section defines search patterns aimed at identifying portions of binary content. The match or rejection conditions are stated later in the condition: part of a rule.

There are three kinds of patterns:

  • textual ones, to target human-readable content;
  • hexadecimal ones, suitable for binary files or executable code for instance;
  • regular-expression-based ones, providing a way to build complex textual patterns.

String patterns

Basic syntax

String patterns are the simplest way to create a search pattern.

The syntax is the following one: $<id> = "<text>" [<modifiers>] [<flags>].

The last two fields are optional and are detailed in the next sections. A simple rule looking for any "hello" bytes inside a text file is therefore:

Example:

rule Hello
{
    bytes:
        $str = "hello"

    condition:
        $str
}

The choice of the pattern identifier ($str here) is up to the writer, but the identifier has to be unique inside the containing rule.

Warning

The text defined in a string pattern is always interpreted as ASCII encoded, and therefore processed as raw bytes.

This should be fine in most cases, but any modifier working character by character may not produce the expected final pattern with UTF-8 encoded text, for instance.

The use of a backslash helps to insert some interpreted sequences into a string pattern, if needed:

Sequence  Generated byte  Description
\a        0x07            Alert (bell)
\b        0x08            Backspace
\t        0x09            Horizontal tabulation
\n        0x0a            New line
\v        0x0b            Vertical tabulation
\f        0x0c            Form feed
\r        0x0d            Carriage return
\e        0x1b            Escape
\"        0x22            Double quote
\\        0x5c            Backslash
\xNN      0xNN            Raw byte from a two-digit hexadecimal value
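
As a rough illustration of the table above, the following Python sketch expands these sequences into the raw bytes to search for. The function name expand_escapes is hypothetical, and rejecting unknown sequences is an assumption: the behaviour of ROST in that case is not described here.

```python
# Sketch only: mirrors the escape table above, not the actual ROST parser.
SIMPLE_ESCAPES = {
    "a": 0x07, "b": 0x08, "t": 0x09, "n": 0x0A, "v": 0x0B,
    "f": 0x0C, "r": 0x0D, "e": 0x1B, '"': 0x22, "\\": 0x5C,
}
HEX_DIGITS = "0123456789abcdefABCDEF"


def expand_escapes(text: str) -> bytes:
    """Turn the quoted body of a string pattern into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ord(ch))              # plain ASCII character
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling backslash")
        nxt = text[i + 1]
        if nxt == "x":
            digits = text[i + 2:i + 4]       # exactly two hexadecimal digits
            if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("\\x needs two hexadecimal digits")
            out.append(int(digits, 16))
            i += 4
        elif nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        else:
            # Assumption: unknown sequences are treated as errors.
            raise ValueError("unknown escape sequence \\" + nxt)
    return bytes(out)


print(expand_escapes(r"he\x6clo\n"))  # b'hello\n'
```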

Available modifiers

String patterns defined in a rule act as sources which can be used to produce other patterns automatically thanks to various modifiers.

Here is a list of these modifiers and their respective effects:

Modifier  Operation                                                          Plugin      Production for "AbcDef"
base64    All possible base64 encodings                                      encodings   QWJjRGVm, FiY0RlZg, BYmNEZWY
hex       Hexadecimal version using 2-digit representation                   -           416263446566
plain     Source as is, which may be useful when chaining several modifiers  -           AbcDef
rev       Original text in reversed order                                    -           feDcbA
ror13     ror13 operation on patterns                                        apihashing  \x871.I'
wide      Emulation of UTF-16 by interleaving null (0x00) bytes              -           A\x00b\x00c\x00D\x00e\x00f\x00
xor       XOR operation with a 1-byte key                                    -           AbcDef, @cbEdg, C`aFgd, ...

If defined, the quoted plugin is the ROST internal source of the involved modifier. This source is provided for information only as it is fully transparent for the user.
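
To make the productions listed for "AbcDef" more concrete, here is a minimal Python sketch reproducing most of them. All function names are hypothetical. The base64 trimming rule (dropping 0, 2 or 3 leading characters depending on the alignment of the source) is inferred from the values in the table and is not necessarily the way ROST computes them. The ror13 modifier is left out.

```python
import base64


def rev(data: bytes) -> bytes:
    """Source bytes in reversed order."""
    return data[::-1]


def hex_form(data: bytes) -> bytes:
    """Two hexadecimal digits per source byte."""
    return data.hex().encode("ascii")


def wide(data: bytes) -> bytes:
    """Interleave a null byte after each source byte."""
    return b"".join(bytes([b, 0]) for b in data)


def xor(data: bytes, key: int) -> bytes:
    """XOR every source byte with a 1-byte key."""
    return bytes(b ^ key for b in data)


def base64_variants(data: bytes) -> list:
    """One encoding per possible alignment (0, 1 or 2 bytes before the source)."""
    variants = []
    for shift, dropped in ((0, 0), (1, 2), (2, 3)):
        encoded = base64.b64encode(b"\x00" * shift + data).rstrip(b"=")
        # The first characters depend on the unknown preceding bytes: drop them.
        variants.append(encoded[dropped:])
    return variants


source = b"AbcDef"
print(rev(source))              # b'feDcbA'
print(hex_form(source))         # b'416263446566'
print(xor(source, 1))           # b'@cbEdg'
print(base64_variants(source))  # [b'QWJjRGVm', b'FiY0RlZg', b'BYmNEZWY']
```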

Some modifiers may accept one or several optional arguments:

  • base64: a 64-byte long custom alphabet;
  • xor: an 8-bit value (the 1-byte key) or a range of such values.

If a list of arguments is provided, the modifier is called as many times as there are items in the list.

Example:

rule Hello
{
    bytes:
        // The next line will produce 1 + 1 + 6 + 3 = 11 final patterns to search
        $str = "hello" rev xor(0xfe) xor(0xaa, 1 .. 0x5) base64

    condition:
        $str
}
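
The count given in the comment of this rule can be checked with a small Python sketch. It assumes that ranges are inclusive (so 1 .. 0x5 stands for five keys) and that base64 without argument yields three patterns, as in the modifier table; the helper name xor_count is hypothetical.

```python
def xor_count(*args) -> int:
    """Count the patterns produced by one xor(...) call.

    Each argument is either a single key or an inclusive (low, high) range.
    """
    total = 0
    for arg in args:
        if isinstance(arg, tuple):
            low, high = arg
            total += high - low + 1   # assumption: both bounds are included
        else:
            total += 1
    return total


counts = {
    "rev": 1,
    "xor(0xfe)": xor_count(0xFE),
    "xor(0xaa, 1 .. 0x5)": xor_count(0xAA, (1, 0x5)),
    "base64": 3,
}
total = sum(counts.values())
print(counts, total)  # the total is 11
```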





Final flags

Located at the end of a definition, a few flags can be attached to a string pattern:

Label     Meaning
fullword  The match must be neither preceded nor followed by an alphanumeric character
nocase    Ignore the case of the string pattern
private   Results are only used to build the match condition and are never displayed

As its name suggests, the fullword keyword is meant to select matches which are whole words. For instance, with the following rule, "Othello is worldwide known" gives a match for the $str_0 pattern but not for the $str_1 one.

Example:

rule Hello
{
    bytes:
        $str_0 = "hello"
        $str_1 = "world" fullword

    condition:
        all of ($str_*)
}
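
The effect of fullword on this sample can be emulated with the following Python sketch. It is only an approximation under one assumption: "alphanumeric" is taken to mean ASCII letters and digits. The function names are hypothetical.

```python
def find_all(data: bytes, needle: bytes):
    """Yield every offset at which needle occurs in data."""
    start = data.find(needle)
    while start != -1:
        yield start
        start = data.find(needle, start + 1)


def fullword_matches(data: bytes, needle: bytes) -> list:
    """Keep only the matches not surrounded by ASCII alphanumeric bytes."""
    hits = []
    for start in find_all(data, needle):
        end = start + len(needle)
        before_ok = start == 0 or not data[start - 1:start].isalnum()
        after_ok = end == len(data) or not data[end:end + 1].isalnum()
        if before_ok and after_ok:
            hits.append(start)
    return hits


content = b"Othello is worldwide known"
print(list(find_all(content, b"hello")))            # [2]: plain search matches inside "Othello"
print(fullword_matches(content, b"world"))          # []: "world" is followed by "w"
print(fullword_matches(b"hello world!", b"world"))  # [6]
```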

As a note, if the goal of the rule is to match the "hello world" string, the following rule is more suitable and better optimized. With the extra nocase flag, the rule matches "hello world" as well as "Hello World!":

Example:

rule OptimizedHello
{
    bytes:
        $str = "hello world" fullword nocase

    condition:
        $str
}

The private keyword hides some results from the final output, as shown by the following shell commands:

Shell:

$ cat > /tmp/Hello.rost << EOF
rule Hello
{
    bytes:
        \$str_0 = "hello" private
        \$str_1 = "world"

    condition:
        \$str_0 and \$str_1 and console.log(#str_0, " ", #str_1)
}
EOF


$ echo "Othello is worldwide known" > /tmp/Hello.txt


$ rost -s /tmp/Hello.rost /tmp/Hello.txt
0xb:$str_1: world
0x1 0x1
Rule 'Hello' has matched!

Binary patterns






Rejecting specific values

The ~ operator rejects a given binary pattern. It can be applied to bytes and masked bytes; for instance:

  • ~41 rejects matches with the letter "A";
  • ~?0 only matches bytes whose second nibble is not zero.

So the following rules provide the same results, even though the latter is better optimized than the former:

Example:

rule Search
{
    bytes:
        $want = { 41 ?? ?? 64 }
        $dont_want = { 41 ?? 63 64 }

    condition:
        $want and for all of @want : ( not( $ in @dont_want ) )
}


rule OptimizedSearch
{
    bytes:
        $want = { 41 ?? ~63 64 }

    condition:
        $want
}
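
The equivalence between these two rules can be illustrated, at the level of match offsets, with a small Python matcher for plain bytes, masked bytes and their negated forms. This is a sketch written for this page, not the ROST engine: the token syntax handled is limited to XX, ?X, X?, ?? and the same tokens prefixed by ~, and all function names are hypothetical.

```python
def parse(pattern: str) -> list:
    """Turn a pattern such as '41 ?? ~63 64' into (mask, value, negated) checks."""
    checks = []
    for token in pattern.split():
        negated = token.startswith("~")
        body = token[1:] if negated else token
        mask = value = 0
        for char in body:                  # one character per nibble
            mask <<= 4
            value <<= 4
            if char != "?":
                mask |= 0xF
                value |= int(char, 16)
        checks.append((mask, value, negated))
    return checks


def matches_at(data: bytes, offset: int, checks: list) -> bool:
    """Check every byte of the pattern against data, starting at offset."""
    if offset + len(checks) > len(data):
        return False
    return all(
        ((data[offset + i] & mask) == value) != negated
        for i, (mask, value, negated) in enumerate(checks)
    )


def find(data: bytes, pattern: str) -> list:
    """Return all the offsets where the pattern matches."""
    checks = parse(pattern)
    return [o for o in range(len(data)) if matches_at(data, o, checks)]


data = b"Abcd Axyd A1cd Azzd"
want = find(data, "41 ?? ?? 64")        # [0, 5, 10, 15]
dont_want = find(data, "41 ?? 63 64")   # [0, 10]
optimized = find(data, "41 ?? ~63 64")  # [5, 15]
print(optimized == [o for o in want if o not in dont_want])  # True
```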

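The ~ operator can also be applied to a group of bytes, but it provides no distributive property: in the following rule, $a and $b are not equivalent.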

Example:

rule Hello
{
    bytes:
        $a = { ~41 ~?2 }
        $b = { ~( 41 ?2 ) }

    condition:
        all of them
}






Involving ranges

A regular range [m-n] stands for any byte sequence of length between m and n.

The negative version produced by the ~ operator then stands for any byte sequence whose length is not between m and n.
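
The two definitions above can be sketched in Python for the simple case of a range located between two fixed byte sequences. The function gap_matches and its parameters are hypothetical, and the way the gap is measured (from the end of the first sequence to the start of the second one) is an assumption about the intended semantics.

```python
def find_all(data: bytes, needle: bytes):
    """Yield every offset at which needle occurs in data."""
    start = data.find(needle)
    while start != -1:
        yield start
        start = data.find(needle, start + 1)


def gap_matches(data, head, tail, low, high, negated=False) -> list:
    """Return (start, end) spans for 'head [low-high] tail'.

    With negated=True, the gap length must fall outside [low, high],
    which emulates 'head ~[low-high] tail'.
    """
    spans = []
    for start in find_all(data, head):
        gap_start = start + len(head)
        for tail_at in find_all(data, tail):
            gap = tail_at - gap_start
            if gap < 0:
                continue                      # tail must come after head
            in_range = low <= gap <= high
            if in_range != negated:
                spans.append((start, tail_at + len(tail)))
    return spans


data = b"AB" + b"x" * 5 + b"CD"               # 5 bytes between "AB" and "CD"
print(gap_matches(data, b"AB", b"CD", 2, 6))                # [(0, 9)]
print(gap_matches(data, b"AB", b"CD", 2, 6, negated=True))  # []
print(gap_matches(data, b"AB", b"CD", 0, 3, negated=True))  # [(0, 9)]
```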

The following table summarizes all the possible cases involving ranges:

Definition pattern  Description
~[m-n] alone        Create matches with all the byte sequences of size from 0 to (m - 1) and from (n + 1) to the data size
~[m-n] pattern      Extend the current matches when the pattern does not start at an offset between m and n after the previous pattern (or after the beginning of the content if the scan is just starting)
pattern ~[m-n]      Extend the current matches when the pattern does not end at an offset between m and n before the next pattern (or before the end of the content if there is no next pattern)
~( pattern )        Extend the current matches when the pattern (which may include ranges) does not match

Warning

As computing and building all the combinations of byte sequences excluding only one byte range may be very expensive, the first case from the previous table is forbidden.

So { ?? } is a valid binary pattern definition and { ~?? } is not.











For instance, ~41 will match bytes which are not "A" and ~?2 will match any byte whose second nibble is not equal to 2.

from 4 to 6 bytes can occupy the position of the jump. Any of the following strings will match the pattern:

F4 23 01 02 03 04 62 B4
F4 23 00 00 00 00 00 62 B4
F4 23 15 82 A3 04 45 22 62 B4

Any jump [X-Y] must meet the condition 0 <= X <= Y. In previous versions of YARA both X and Y must be lower than 256, but starting with YARA 2.0 there is no limit for X and Y.

All the previously described kinds of patterns are handled, but those which refer to any bytes (in other words any number of wildcards ?? or any range [n-m]) get a special meaning: