DEV IN PROGRESS

ROST rules' condition block

This page provides documentation about the condition: part of a ROST rule definition.

The match condition is composed of one or several expressions which have to produce a final analysis status: does the scanned content match the conditions specified by the rule?

Each expression is processed during the scan and can reduce itself to a simpler expression when possible. For instance, a call to hash.md5() is replaced by a string literal as soon as the scan begins; however, the (i + 1)th match $var[i] for the $var pattern definition, if available, may require some time before being translated.

Some expressions naturally reduce into booleans:

  • the expression true and false produces a false value;
  • the expression (6 - 3 * 2) is equal to 0, which is a common synonym of false;
  • an empty intersection of sets equals false;
  • expressions leading to an error or an undefined value produce a false state.

Basics for conditions

Basic expressions share a common feature: they can all be implicitly converted into booleans during the reduction process.

Literal values

A rule condition can rely on several basic types:

  • booleans: true, false;
  • signed and unsigned integers: -1, 0x10;
  • strings, which are handled as bytes as they may contain any kind of characters: "Hello\tWorld", "!";
  • regular expressions: /kb|mb|gb/i.

Strings can be handled as sets of characters if needed, so strings provide all the features of a regular set:

Example:

rule AlwaysTrue
{
    condition:
        "ABC"[2] == "B" and "x" in "x"
}

Regular expressions in the condition: section are not limited like the ones from the bytes: section: they follow the full POSIX Extended Regular Expression syntax.

All these basic types resolve to boolean values when reduced:

Type...             resolves to...  only if...
boolean             true            true
integer             false           its value is 0.
string              false           empty
regular expression  false           Regular expressions never translate to true directly.
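The table above can be modelled with a short Python sketch. It only illustrates the documented reduction rules and is not the ROST implementation; the function name reduces_to_true is hypothetical, and Python is used simply because the result can be run and checked.

```python
import re


def reduces_to_true(value):
    """Hypothetical model of the truth table: does a basic value reduce to true?"""
    if isinstance(value, bool):          # booleans keep their own value
        return value
    if isinstance(value, int):           # integers: false only when equal to 0
        return value != 0
    if isinstance(value, (str, bytes)):  # strings: false only when empty
        return len(value) > 0
    if isinstance(value, re.Pattern):    # regular expressions never become true
        return False
    return False                         # errors and undefined values are false


print(reduces_to_true(6 - 3 * 2))                       # integer 0
print(reduces_to_true("!"))                             # non-empty string
print(reduces_to_true(re.compile("kb|mb|gb", re.I)))    # regular expression
```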

Sets of expressions

In some cases it may be useful to group items so that they get processed by common operations which compute a boolean state for the condition.

Such items are collected between parentheses; for items relative to strings or rules, wildcards can also be applied:

Example:

// Define a list from the first prime numbers
(2, 3, 5, 7)

// Match the string names $foo1, $foo2, $foo3 and $bar
($foo*, $bar)

// Create a list of MD5 checksums
( "e1faffb3e614e6c2fba74296962386b7", "2bb225f0ba9a58930757a868ed57d9a3" )

Each kind of set provides an iterator over the contained expressions.

Sets resolve to the true boolean value when reduced, except for empty sets.

Sets can embed all kinds of expressions. However, if a set contains only one item, its definition has to end with a trailing comma:

Example:

// Empty set
()

// Set with one item
( strings.lower("SINGLE"), )

Integer ranges

Ranges handle a set of integers in a compact form: such expressions only memorize start and end points.

Example:

// Define a set of 1024 integers
1 .. 1024

// Define all valid values for a signed char
-128 .. 127

// Create a range for each byte of the scanned data
1 .. datasize

A range reduces to the true boolean value only if its start point is smaller than or equal to its end point.

Building conditional bricks

The condition validating a rule may involve a sequence of expressions which have to compute a final boolean state.

Expressions can be built from basic values or module outputs and mixed with several operators, in order to be combined into more complex expressions.

The following table provides information about all the available operators and how the ROST grammar handles them. The lower the precedence number of an operator, the higher the priority given to an expression involving this operator when the reduction process is performed.

Precedence  Operator     Description                           Associativity
1           []           Array subscripting                    Left-to-right
            .            Structure member access
2           -            Unary minus                           Right-to-left
            ~            Bitwise not
3           *            Multiplication                        Left-to-right
            /            Division
            %            Remainder
4           +            Addition                              Left-to-right
            -            Subtraction
5           <<           Bitwise left shift                    Left-to-right
            >>           Bitwise right shift
6           &            Bitwise AND                           Left-to-right
7           ^            Bitwise XOR                           Left-to-right
8           |            Bitwise OR                            Left-to-right
9           <            Less than                             Left-to-right
            <=           Less than or equal to
            >            Greater than
            >=           Greater than or equal to
10          contains     String contains substring             Left-to-right
            startswith   String starts with substring
            endswith     String ends with substring
            icontains    Like contains but case-insensitive
            istartswith  Like startswith but case-insensitive
            iendswith    Like endswith but case-insensitive
            iequals      Case-insensitive string comparison
            matches      String matches a regular expression
11          ==           Equal to                              Left-to-right
            !=           Not equal to
12          and          Logical AND                           Left-to-right
13          or           Logical OR                            Left-to-right

Performing computation

Arithmetic operations

The ROST grammar supports the classic arithmetic operators +, -, *, / and %, with their relative priority.

Example:

rule MulFirst
{
    // Should always be validated
    condition:
        1 + 4 * (3 + 2) == 21
        and
        (1 + 4) * (3 + 2) == 25
}

Bitwise operations

Integers are stored internally as 64-bit values, unsigned by default. Small values therefore get promoted, which has to be kept in mind, especially when dealing with bitwise operators: ~0x01 is not 0xfe but 0xfffffffffffffffe during rule processing.
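The effect of the 64-bit promotion can be reproduced in Python. Python integers are unbounded, so the 64-bit width has to be imposed with an explicit mask; the helper name bitwise_not_u64 is hypothetical and the sketch only models the behaviour described above.

```python
MASK64 = (1 << 64) - 1  # all 64 bits set


def bitwise_not_u64(value):
    """Bitwise NOT of a value stored as an unsigned 64-bit integer."""
    return ~value & MASK64


print(hex(bitwise_not_u64(0x01)))         # 0xfffffffffffffffe
print(hex(bitwise_not_u64(0x01) & 0xff))  # 0xfe once the low byte is isolated
```

When only the low byte matters, masking the result with the documented & operator is the usual way to discard the promoted upper bits.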

Extra operations

Mixing boolean expressions

If one of the relational operators == or != is used and one of the operands is a boolean expression, then the other operand is cast into a boolean value before the relationship is computed.
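A minimal Python sketch of this casting rule follows. The function name rost_equals is hypothetical, and Python's own truthiness stands in for the ROST boolean reduction of integers and strings, which is an assumption.

```python
def rost_equals(left, right):
    """Model of ==: if either operand is a boolean, compare both as booleans."""
    if isinstance(left, bool) or isinstance(right, bool):
        return bool(left) == bool(right)
    return left == right


print(rost_equals(True, 5))    # 5 is cast to true before the comparison
print(rost_equals(False, ""))  # the empty string is cast to false
print(rost_equals(3, 3))       # no boolean operand: plain comparison
```

The != operator would follow the same cast and negate the result.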

Operations with strings

Various operations can be performed against strings:

  • <haystack> contains <needle>
  • <haystack> startswith <needle>
  • <haystack> endswith <needle>
  • <haystack> icontains <needle>
  • <haystack> istartswith <needle>
  • <haystack> iendswith <needle>
  • <haystack> matches <regex>
  • <haystack> iequals <needle>

The contains, startswith and endswith keywords return true if haystack contains, starts with or ends with the needle string, respectively. The "i" prefix denotes a case-insensitive operation.

The matches keyword expects a regular expression as its right operand.

Case-insensitive string comparison can be done with the iequals keyword. For case-sensitive comparison, the == operator has to be used.
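The keywords above map naturally onto Python string methods. The sketch below is only a model of the documented behaviour with hypothetical function names; case folding is done with lower(), which is an assumption about how ROST compares letters, and matches is left out because it depends on the regular expression engine.

```python
def contains(haystack, needle):
    return needle in haystack


def startswith(haystack, needle):
    return haystack.startswith(needle)


def endswith(haystack, needle):
    return haystack.endswith(needle)


# The "i" variants compare lower-cased copies of both operands.
def icontains(haystack, needle):
    return needle.lower() in haystack.lower()


def istartswith(haystack, needle):
    return haystack.lower().startswith(needle.lower())


def iendswith(haystack, needle):
    return haystack.lower().endswith(needle.lower())


def iequals(left, right):
    return left.lower() == right.lower()


print(contains("Hello World", "lo W"), icontains("Hello World", "WORLD"))
print(iequals("ABC", "abc"), "ABC" == "abc")
```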

Available core functions

Quantity of items

The count function computes the size of given sets. Such sets may be produced by other functions, but also by strings (which are sets of characters) and search patterns.

The function can handle one or several arguments. The computed result is the total number of items inside the provided arguments.

Here are some illustrations of use cases with the count function:

Example:

rule AlwaysTrue
{
    bytes:
        $int_01 = "\x01"
        $int_02 = "\x02"
        $int_3 = "\x03"

    condition:
        count("AB", "C") == count("ABC")
            and count($int_0*, $int_3) == #int_*
}
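The summing behaviour of count can be sketched in Python as follows. This is a model under stated assumptions, not the ROST implementation: strings are treated as sets of characters, and the matches of a search pattern are represented by a hypothetical list of offsets.

```python
def count(*sets):
    """Sum the sizes of all provided sets (strings count their characters)."""
    return sum(len(items) for items in sets)


# Hypothetical match offsets for three search patterns.
int_01_matches = [0, 4]
int_02_matches = [9]
int_3_matches = []

print(count("AB", "C") == count("ABC"))
print(count(int_01_matches, int_02_matches, int_3_matches))
```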

Size of available data

The quantity of scanned data can be retrieved with the datasize keyword, which returns a size expressed in bytes as an integer value. For compatibility with YARA, a filesize keyword is introduced as an alias to the generic datasize.

To ease writing, suffixes can be used: KB for 1024 bytes, MB for 2^20 bytes and GB for 2^30 bytes. These suffixes are case-insensitive and can be used for any integer constant.

In order to speed up data processing, filtering content by size is a good way to skip scans for large contents without any expected matches:

Example:

rule KeepSmallFiles
{
    condition:
        datasize <= 200KB
}
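The suffix arithmetic can be checked with a small Python sketch. The parse_size helper is hypothetical and only handles decimal constants; it is meant to show the multipliers described above, not how the ROST grammar parses integers.

```python
import re

# Multipliers for the documented suffixes, keyed in lower case.
UNITS = {"": 1, "kb": 1 << 10, "mb": 1 << 20, "gb": 1 << 30}


def parse_size(literal):
    """Convert a decimal constant with an optional KB/MB/GB suffix to bytes."""
    match = re.fullmatch(r"(\d+)(kb|mb|gb)?", literal, re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a size literal: {literal!r}")
    number, unit = match.groups()
    return int(number) * UNITS[(unit or "").lower()]


print(parse_size("200KB"))  # 204800
print(parse_size("1mb"))    # 1048576
```

With these values, the KeepSmallFiles rule accepts contents of at most 204800 bytes.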

Most available item in a set

The number of occurrences of the most frequent item in a set can be retrieved with the maxcommon function.

These sets can be provided as a list of arguments or by other functions, such as modpath.

Here is a condition example which should always lead to a match:

Example:

rule AlwaysTrue
{
    condition:
        maxcommon("AAA", "A") == 1
            and maxcommon("AA", 1, "AA", 2) == 2
            and maxcommon("AA", 1, "AA", 2, 1, False, 1) == 3
}
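The three results in the rule above can be reproduced with a Python sketch built on collections.Counter. It is a model of the documented behaviour only; keying each item by its type and value is an assumption made so that Python does not merge booleans with the integers 0 and 1.

```python
from collections import Counter


def maxcommon(*items):
    """Return how many times the most frequent item appears (0 if no items)."""
    if not items:
        return 0
    # Key by (type, value) so that False and 0, or True and 1, stay distinct.
    counts = Counter((type(item), item) for item in items)
    return counts.most_common(1)[0][1]


print(maxcommon("AAA", "A"))                        # 1
print(maxcommon("AA", 1, "AA", 2))                  # 2
print(maxcommon("AA", 1, "AA", 2, 1, False, 1))     # 3
```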

Read values from binary content

Words of data can be read from the scanned data in order to produce literal integer expressions. These expressions can then be used in rule conditions.

Several functions are available:

  • uint8(<offset>)
  • int8(<offset>)
  • uint16(<offset>)
  • int16(<offset>)
  • uint16be(<offset>)
  • int16be(<offset>)
  • uint32(<offset>)
  • int32(<offset>)
  • uint32be(<offset>)
  • int32be(<offset>)
  • uint64(<offset>)
  • int64(<offset>)
  • uint64be(<offset>)
  • int64be(<offset>)

All these functions read 8-, 16-, 32- or 64-bit integers, signed or unsigned, from data at a given offset from the start of the scanned binary content. The encoding is little-endian by default; big-endian values can be read with the functions carrying a "be" suffix.

The offset argument is an expression which can be either a literal integer value or an expression providing such a value.

For instance, here is a rule matching PE files:

Example:

rule IsPE
{
    condition:
        // MZ signature at offset 0 and ...
        uint16(0) == 0x5a4d and
        // ... PE signature at offset stored in the MZ header at offset 0x3c
        uint32(uint32(0x3c)) == 0x00004550
}
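The reading functions can be modelled in Python with the standard struct module. The read_int helper and the synthetic buffer below are hypothetical; returning None for an out-of-range offset is an assumption standing in for an undefined value.

```python
import struct

# Map each documented function name to a struct format string.
FORMATS = {}
for bits, code in {8: "b", 16: "h", 32: "i", 64: "q"}.items():
    FORMATS[f"int{bits}"] = "<" + code            # signed, little-endian
    FORMATS[f"uint{bits}"] = "<" + code.upper()   # unsigned, little-endian
    if bits > 8:                                  # "be" variants: big-endian
        FORMATS[f"int{bits}be"] = ">" + code
        FORMATS[f"uint{bits}be"] = ">" + code.upper()


def read_int(data, kind, offset):
    """Read one integer of the given kind at offset; None if out of range."""
    fmt = FORMATS[kind]
    if offset < 0 or offset + struct.calcsize(fmt) > len(data):
        return None
    return struct.unpack_from(fmt, data, offset)[0]


# Synthetic buffer shaped like a minimal PE file, for illustration only.
sample = bytearray(0x84)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3c, 0x80)
sample[0x80:0x84] = b"PE\x00\x00"

pe_offset = read_int(sample, "uint32", 0x3c)
print(hex(read_int(sample, "uint16", 0)), hex(read_int(sample, "uint32", pe_offset)))
```

The nested call in the IsPE rule corresponds to the two-step read at the end: the first read yields an offset, the second reads the signature stored there.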

Conditions on bytes patterns

Pattern properties

For a given pattern definition $var, several properties can be used in a conditional expression:

Expression  Meaning                                                                               Production               Comment
$var        Main reference to the results of the searched pattern.                                [empty] set of bytes     -
#var        Number of times the pattern $var has been identified.                                 integer                  The expression is an alias for count($var).
@var        Start location of all matched patterns.                                               [empty] set of integers  -
!var        Length of all matches, which may vary when regular expressions are used as patterns.  [empty] set of integers  -
~var        End location of all matched patterns.                                                 [empty] set of integers  -

Fuzzy labels can be used to point to several patterns at once: for instance, @var* could refer to @var1, @var2 and @var_X. More sources are then considered during processing, which extends the produced results.

Each prefix can also be applied to individual items selected by an expression which should reduce to an integer:

Expression    Meaning                                           Production   Comment
$var[<expr>]  Selected result of the searched pattern.          bytes        -
#var[<expr>]  Number of appearances of one identified pattern.  integer (1)  This kind of valid expression should not be very useful as the resulting value will always be 1 if count($var) >= <expr>.
@var[<expr>]  Location of a selected identified pattern.        integer      -
!var[<expr>]  Length of a selected identified pattern.          integer      -
~var[<expr>]  End of a selected identified pattern.             integer      This location provides an offset pointing to the byte right after the last matched byte of the pattern.

For instance, the following rule triggers only if:

  • there is any number of matches for $a, $b1 and $b2;
  • there are at least 11 matches for the $c pattern;
  • the 11th match for $c does not end at offset 123.

Example:

rule DummyMatching
{
    bytes:
        $a = "dummyA"
        $b1 = "dummyB_1"
        $b2 = "dummyB_2"
        $c = "dummyc"

    condition:
        $a and $b* and ~c[10] != 123
}
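To make the relationship between the prefixes concrete, the Python sketch below derives the values behind #var, @var, !var and ~var from a plain byte search. The function names, the returned dictionary and the sample buffer are hypothetical, and reporting overlapping matches is an assumption.

```python
def find_all(data, pattern):
    """Return the start offset of every occurrence of pattern in data."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets


def pattern_properties(data, pattern):
    """Model the per-pattern properties exposed to conditions."""
    starts = find_all(data, pattern)
    lengths = [len(pattern)] * len(starts)
    # The end offset points to the byte right after the last matched byte.
    ends = [start + length for start, length in zip(starts, lengths)]
    return {"count": len(starts), "starts": starts, "lengths": lengths, "ends": ends}


props = pattern_properties(b"..dummyc..dummyc", b"dummyc")
print(props["count"], props["starts"], props["ends"])
```

With zero-based indexing, the end of the second match is props["ends"][1], which mirrors the ~c[10] form used for the 11th match in the rule above.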

Filtering on global status

<expected> of <set>

The set definition can be replaced by the special keyword them, referring to all defined search patterns.

The expected part is defined using one of the following expressions:

Expression  Meaning
none        None of the provided patterns has been found.
any         At least one pattern has one or more matches.
all         All the provided patterns have been found at least once.
n           At least n patterns have one or more matches.
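The four forms of the expected part can be modelled in Python over a list of match counts, one per pattern in the set. The function name expected_of and this representation are hypothetical; the sketch only restates the table above.

```python
def expected_of(expected, match_counts):
    """Evaluate '<expected> of <set>' given the match count of each pattern."""
    found = sum(1 for count in match_counts if count > 0)
    if expected == "none":
        return found == 0
    if expected == "any":
        return found >= 1
    if expected == "all":
        return found == len(match_counts)
    return found >= int(expected)  # numeric form: at least n patterns matched


counts = [3, 0, 1]  # hypothetical: three patterns, the second never matched
print(expected_of("any", counts), expected_of("all", counts), expected_of(2, counts))
```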

Special keywords

Strings and referenced rules

Referencing other rules

Because writing rules shares common points with coding, some basic bricks can be defined within simple rules. More complex rules can then be built by combining other rules, and each of these simple rules can be referenced from the complex one like an external variable:

Example:

rule SimpleRule
{
    condition:
        uint32(0) == 0x464c457f and datasize < 1MB
}

rule ComplexRule
{
    strings:
        $d = "dummy"

    condition:
        $d and SimpleRule
}

In the previous example, the ComplexRule rule matches all files beginning with an ELF signature, smaller than 1 MB and containing the byte sequence "dummy".

Note

As names inside conditions are resolved at runtime, it is not required to define a rule before referencing it in other rules.

Indeed, when the resolution process is launched, all rules are loaded and all their names are known.

A wildcard can be used to match several rules at once:

Example:

rule ElfHeader_byte0
{
    condition:
        uint8(0) == 0x7f
}

rule ElfHeader_byte1
{
    condition:
        uint8(1) == 0x45
}

rule ElfHeader_byte2
{
    condition:
        uint8(2) == 0x4c
}

rule ElfHeader_byte3
{
    condition:
        uint8(3) == 0x46
}

rule ElfHeader
{
    condition:
        all of (ElfHeader_byte*)
}
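The wildcard resolution used by the ElfHeader rule can be sketched in Python with fnmatch. The RULES registry, the helper names and the shell-style matching are all assumptions made for illustration; the sketch only shows late name resolution over a set of already loaded rules.

```python
from fnmatch import fnmatchcase

# Hypothetical registry: each simple rule checks one byte of the ELF magic.
ELF_MAGIC = b"\x7fELF"
RULES = {
    f"ElfHeader_byte{index}": (
        lambda data, i=index: len(data) > i and data[i] == ELF_MAGIC[i]
    )
    for index in range(4)
}


def rules_matching(pattern):
    """Resolve a wildcard against the names of all loaded rules."""
    return sorted(name for name in RULES if fnmatchcase(name, pattern))


def all_of(pattern, data):
    """Model of 'all of (<pattern>)': every referenced rule must match."""
    return all(RULES[name](data) for name in rules_matching(pattern))


print(rules_matching("ElfHeader_byte*"))
print(all_of("ElfHeader_byte*", b"\x7fELF\x02\x01\x01"))
print(all_of("ElfHeader_byte*", b"MZ\x90\x00"))
```

Because the names are looked up only when all_of runs, the order in which the rules were registered does not matter.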