ROST rules' condition block

This page provides documentation about the condition: part of a ROST rule definition.

This final match condition is composed of one or several expressions which have to produce a final analysis status: does the scanned content match the conditions specified by the rule?

Each expression is processed during the scan and reduces itself to a simpler expression whenever possible. For instance, a call to hash.md5() is replaced by a string literal as soon as the scan begins; however, the (i + 1)^th match $var[i] for the $var pattern definition, if available, may take some time to be resolved.

Some expressions naturally reduce into booleans:

the expression true and false produces a false value;
the expression (6 - 3 * 2) is equal to 0, which is a common synonym of false;
an empty intersection of sets is equal to false;
expressions leading to an error or an undefined value produce a false state.

Basics for conditions

Basic expressions share a common feature: they can all be implicitly converted into booleans during the reduction process.

Literal values

A rule condition can rely on several basic types:

booleans: true, false;
signed and unsigned integers: -1, 0x10;
strings, which are handled as bytes since they may contain any kind of characters: "Hello\tWorld", "!";
regular expressions: /kb|mb|gb/i.

Strings can be handled as sets of characters if needed, so strings provide all the features of a regular set:

Example:

rule AlwaysTrue

{

condition:

"ABC"[1] == "B" and "x" in "x"

}

Regular expressions in the condition: section are not restricted like the ones in the bytes: section: they follow the full POSIX Extended Regular Expression syntax.

All these basic types resolve to boolean values when reduced:

Type...	resolves to...	only if...
boolean	`true`	its value is `true`.
integer	`false`	its value is `0`.
string	`false`	it is empty.
regular expression	`false`	always: regular expressions never translate to `true` directly.
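
The reduction table above can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not the ROST implementation: the function name reduce_to_bool is hypothetical, Python values stand in for ROST literals, and nonzero integers and non-empty strings are assumed to reduce to true as the complement of the table.

```python
import re


def reduce_to_bool(value):
    """Hypothetical model of how a basic ROST value reduces to a boolean."""
    if isinstance(value, bool):
        # Booleans reduce to themselves.
        return value
    if isinstance(value, int):
        # An integer is false only if its value is 0.
        return value != 0
    if isinstance(value, (str, bytes)):
        # A string is false only if it is empty.
        return len(value) > 0
    if isinstance(value, re.Pattern):
        # Regular expressions never translate to true directly.
        return False
    # Errors and undefined values (modelled here as anything else,
    # for instance None) produce a false state.
    return False


print(reduce_to_bool(0x10), reduce_to_bool(""), reduce_to_bool(None))
```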

Sets of expressions

In some cases it may be useful to group items so that they get processed by common operations computing a boolean state for the condition.

Such items are collected between parentheses; for items relating to strings or rules, wildcards can also be applied:

Example:

// Define a list of the first prime numbers

(2, 3, 5, 7)

// Match the string names $foo1, $foo2, $foo3 and $bar

($foo*, $bar)

// Create a list of MD5 checksums

( "e1faffb3e614e6c2fba74296962386b7", "2bb225f0ba9a58930757a868ed57d9a3" )

Each kind of set provides an iterator over the contained expressions.

Sets resolve to the true boolean value when reduced, except for empty sets.

Sets can embed all kinds of expressions. However, if a set contains only one item, its definition has to end with a trailing comma:

Example:

// Empty set

()

// Set with one item

( strings.lower("SINGLE"), )

Integer ranges

Ranges handle a set of integers in a compact form: such expressions only store their start and end points.

Example:

// Define a set of 1024 integers

1 .. 1024

// Define all valid values for a signed char

-128 .. 127

// Create a range for each byte of the scanned data

1 .. datasize

A range reduces to the true boolean value only if its start point is smaller than or equal to its end point.

Building conditional bricks

The condition for validating a rule may involve a sequence of expressions which have to compute a final boolean state.

Expressions can be built from basic values or module outputs, mixed with several operators, and combined in order to create more complex expressions.

The following table lists all the available operators and how the ROST grammar handles them. The smaller the precedence value of an operator, the higher the priority given to an expression involving this operator when the reduction process gets performed.

Precedence	Operator	Description	Associativity
1	`[]`	Array subscripting	Left-to-right
1	`.`	Structure member access	Left-to-right
2	`-`	Unary minus	Right-to-left
2	`~`	Bitwise not	Right-to-left
3	`*`	Multiplication	Left-to-right
3	`/`	Division	Left-to-right
3	`%`	Remainder	Left-to-right
4	`+`	Addition	Left-to-right
4	`-`	Subtraction	Left-to-right
5	`<<`	Bitwise left shift	Left-to-right
5	`>>`	Bitwise right shift	Left-to-right
6	`&`	Bitwise AND	Left-to-right
7	`^`	Bitwise XOR	Left-to-right
8	`|`	Bitwise OR	Left-to-right
9	`<`	Less than	Left-to-right
9	`<=`	Less than or equal to	Left-to-right
9	`>`	Greater than	Left-to-right
9	`>=`	Greater than or equal to	Left-to-right
10	`contains`	String contains substring	Left-to-right
10	`startswith`	String starts with substring	Left-to-right
10	`endswith`	String ends with substring	Left-to-right
10	`icontains`	Like `contains` but case-insensitive	Left-to-right
10	`istartswith`	Like `startswith` but case-insensitive	Left-to-right
10	`iendswith`	Like `endswith` but case-insensitive	Left-to-right
10	`iequals`	Case-insensitive string comparison	Left-to-right
10	`matches`	String matches a regular expression	Left-to-right
11	`==`	Equal to	Left-to-right
11	`!=`	Not equal to	Left-to-right
12	`and`	Logical AND	Left-to-right
13	`or`	Logical OR	Left-to-right

Performing computation

Arithmetic operations

The ROST grammar supports the classic arithmetic operators +, -, *, / and %, with their usual relative precedence.

Example:

rule MulFirst

{

// Should always be validated

condition:

1 + 4 * (3 + 2) == 21

and

(1 + 4) * (3 + 2) == 25

}

Bitwise operations

Integers are stored internally as 64-bit values, unsigned by default. Small values thus get promoted, which has to be kept in mind, especially when dealing with bitwise operators: ~0x01 is not 0xfe but 0xfffffffffffffffe during rule processing.
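
The effect of the 64-bit promotion can be reproduced with plain Python integers by masking the result. The helper name bitwise_not_u64 is hypothetical; the sketch only illustrates the arithmetic described above and says nothing about how ROST implements it.

```python
MASK64 = (1 << 64) - 1  # all 64 bits set


def bitwise_not_u64(value):
    """Bitwise NOT of a value held in an unsigned 64-bit integer."""
    return ~value & MASK64


# The complement spans the full 64 bits, not just the low byte.
print(hex(bitwise_not_u64(0x01)))

# Masking afterwards keeps only the low 8 bits of the result.
print(hex(bitwise_not_u64(0x01) & 0xFF))
```

When only the low byte matters, masking the complement as in the last line restores the 8-bit value.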

Extra operations

Mixing boolean expressions

If one of the relational operators == or != is used and one of the operands is a boolean expression, then the other operand is cast into a boolean value before the relationship gets computed.

Operations with strings

Various operations can be performed against strings:

<haystack> contains <needle>
<haystack> startswith <needle>
<haystack> endswith <needle>
<haystack> icontains <needle>
<haystack> istartswith <needle>
<haystack> iendswith <needle>
<haystack> matches <regex>
<haystack> iequals <needle>

Each keyword returns true if haystack contains, starts with or ends with the needle string. The "i" prefix denotes a case-insensitive operation.

The matches keyword expects a regular expression as its right operand.

Case-insensitive string comparison can be done with the iequals keyword. For case-sensitive comparison, the == operator has to be used.
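
As a rough illustration, the string operators listed above can be modelled with Python string methods. The function names simply mirror the ROST keywords and are not part of any API. Two assumptions are made: case-insensitive variants are modelled by lowercasing both operands, and matches is modelled with re.search, that is an unanchored search using Python's regular expression dialect rather than POSIX ERE.

```python
import re


def contains(haystack, needle):
    return needle in haystack


def startswith(haystack, needle):
    return haystack.startswith(needle)


def endswith(haystack, needle):
    return haystack.endswith(needle)


def icontains(haystack, needle):
    return needle.lower() in haystack.lower()


def istartswith(haystack, needle):
    return haystack.lower().startswith(needle.lower())


def iendswith(haystack, needle):
    return haystack.lower().endswith(needle.lower())


def iequals(left, right):
    return left.lower() == right.lower()


def matches(haystack, regex, flags=0):
    # Assumption: an unanchored search; ROST may behave differently.
    return re.search(regex, haystack, flags) is not None


print(contains("Hello World", "lo W"), icontains("Hello", "hello"))
print(matches("size: 200KB", "kb|mb|gb", re.IGNORECASE))
```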

Available core functions

Quantity of items

The count function computes the size of given sets. Such sets may be produced by other functions, but also by strings (which are sets of characters) and search patterns.

The function can handle one or several arguments. The computed result is the total number of items inside all the provided arguments.

Here are some illustrations of use cases with the count function:

Example:

rule AlwaysTrue

{

bytes:

$int_01 = "\x01"

$int_02 = "\x02"

$int_3 = "\x03"

condition:

count("AB", "C") == count("ABC")

and count($int_0*, $int_3) == #int_*

}
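
The behaviour of count in the rule above can be sketched in Python. Everything here is a model under stated assumptions: the match offsets are invented for illustration, the select helper is hypothetical, and pattern wildcards are assumed to behave like shell-style globs.

```python
import fnmatch


def count(*sets):
    """Total number of items across all the provided sets."""
    return sum(len(s) for s in sets)


# Hypothetical match offsets for the three patterns of the rule above.
matches = {"int_01": [4], "int_02": [9, 20], "int_3": [31]}


def select(wildcard):
    """Collect the match lists of every pattern whose name fits the wildcard."""
    return [found for name, found in matches.items()
            if fnmatch.fnmatchcase(name, wildcard)]


# Strings are sets of characters, so both sides count three items.
print(count("AB", "C") == count("ABC"))

# count($int_0*, $int_3) compared with #int_*
print(count(*select("int_0*"), matches["int_3"]) == count(*select("int_*")))
```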

Size of available data

The quantity of scanned data can be retrieved with the datasize keyword, which returns a size expressed in bytes as an integer value. For compatibility with YARA, a filesize keyword is provided as an alias for the generic datasize.

To ease writing, suffixes can be used: KB for 1024 bytes, MB for 2^20 bytes and GB for 2^30 bytes. These suffixes are case-insensitive and can be applied to any integer constant.

In order to speed up data processing, filtering content by size is a good way to skip scans of large contents for which no match is expected:

Example:

rule KeepSmallFiles

{

condition:

datasize <= 200KB

}
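
The size suffixes described above amount to simple multipliers. The following Python sketch expands a literal such as 200KB into bytes; parse_size is a hypothetical helper, and for simplicity it only accepts decimal constants.

```python
import re

# Multipliers for the documented suffixes.
_SUFFIXES = {"": 1, "kb": 1024, "mb": 2 ** 20, "gb": 2 ** 30}
_LITERAL = re.compile(r"(\d+)(kb|mb|gb)?", re.IGNORECASE)


def parse_size(literal):
    """Expand a decimal integer constant with an optional size suffix."""
    found = _LITERAL.fullmatch(literal.strip())
    if found is None:
        raise ValueError(f"not a size literal: {literal!r}")
    digits, suffix = found.groups()
    # Suffixes are case-insensitive; a missing suffix means plain bytes.
    return int(digits, 10) * _SUFFIXES[(suffix or "").lower()]


# The threshold used by the KeepSmallFiles rule, in bytes.
print(parse_size("200KB"))
```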

Most available item in a set

The number of occurrences of the most frequent item in a set can be retrieved with the maxcommon function.

These sets can be provided as a list of arguments or by other functions, such as modpath.

Here is a condition example which should always lead to a match:

Example:

rule AlwaysTrue

{

condition:

maxcommon("AAA", "A") == 1

and maxcommon("AA", 1, "AA", 2) == 2

and maxcommon("AA", 1, "AA", 2, 1, false, 1) == 3

}
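
The three checks of the rule above can be reproduced with a small Python model built on collections.Counter. This is a sketch of the documented behaviour, not the ROST implementation. Items are keyed by type and value as an assumption, so that an integer and a boolean are never counted as the same item, and an empty argument list is assumed to yield 0.

```python
from collections import Counter


def maxcommon(*items):
    """Number of occurrences of the most frequent item."""
    if not items:
        return 0  # assumption: nothing to count
    # Key on (type, value) so that 1 and a boolean stay distinct items.
    counts = Counter((type(item), item) for item in items)
    return max(counts.values())


print(maxcommon("AAA", "A"))
print(maxcommon("AA", 1, "AA", 2))
print(maxcommon("AA", 1, "AA", 2, 1, False, 1))
```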

Read values from binary content

Words of data can be read from the scanned data in order to produce literal integer expressions. These expressions can then get used for rule conditions.

Several functions are available:

uint8(<offset>)
int8(<offset>)
uint16(<offset>)
int16(<offset>)
uint16be(<offset>)
int16be(<offset>)
uint32(<offset>)
int32(<offset>)
uint32be(<offset>)
int32be(<offset>)
uint64(<offset>)
int64(<offset>)
uint64be(<offset>)
int64be(<offset>)

All these functions read 8-, 16-, 32- or 64-bit integers, signed or unsigned, from the data at a given offset from the start of the scanned binary content. The encoding is little-endian by default; big-endian values can be read using the functions with a "be" suffix.

The offset argument is an expression which can be either a literal integer value or an expression providing such a value.

For instance, here is a rule matching PE files:

Example:

rule IsPE

{

condition:

// MZ signature at offset 0 and ...

uint16(0) == 0x5a4d and

// ... PE signature at offset stored in the MZ header at offset 0x3c

uint32(uint32(0x3c)) == 0x00004550

}
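
To make the byte order and the nested read of the IsPE rule concrete, here is a Python sketch based on the standard struct module. The helper names mirror the ROST functions but take the scanned data as an explicit first argument. Returning None for an out-of-range read is an assumption standing in for an undefined value.

```python
import struct


def read_int(data, offset, fmt):
    """Read one integer at the given offset, or None if out of range."""
    size = struct.calcsize(fmt)
    if offset < 0 or offset + size > len(data):
        return None  # assumption: such a read is undefined
    return struct.unpack_from(fmt, data, offset)[0]


def int8(data, offset):
    return read_int(data, offset, "<b")


def uint16(data, offset):
    return read_int(data, offset, "<H")  # little-endian by default


def uint32(data, offset):
    return read_int(data, offset, "<I")


def uint32be(data, offset):
    return read_int(data, offset, ">I")  # "be" suffix: big-endian


def is_pe(data):
    """Python counterpart of the IsPE condition."""
    if uint16(data, 0) != 0x5A4D:  # "MZ"
        return False
    pe_offset = uint32(data, 0x3C)
    return pe_offset is not None and uint32(data, pe_offset) == 0x00004550


# Minimal fake header: "MZ", then the PE offset at 0x3c, then "PE\0\0".
sample = bytearray(0x44)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x40)
sample[0x40:0x44] = b"PE\x00\x00"

print(is_pe(bytes(sample)))
```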

Conditions on bytes patterns

Pattern properties

For a given pattern definition $var, several properties can be used in a conditional expression:

Expression	Meaning	Production	Comment
`$var`	Main reference to the results of the searched pattern.	[empty] set of bytes	-
`#var`	Number of times the pattern `$var` has been identified.	integer	The expression is an alias for `count($var)`.
`@var`	Start location of all matched patterns.	[empty] set of integers	-
`!var`	Length of all matches, which may vary when regular expressions are used as patterns.	[empty] set of integers	-
`~var`	End location of all matched patterns.	[empty] set of integers	-

Fuzzy labels can be used to point to several patterns at once: for instance, @var* could refer to @var1, @var2 and @var_X. More sources are then considered while processing and extend the produced results.

Each prefix can also be applied to an individual item selected by an expression which should reduce to an integer:

Expression	Meaning	Production	Comment
`$var[<expr>]`	Selected result of the searched pattern.	bytes	-
`#var[<expr>]`	Number of appearances of one identified pattern.	integer (1)	This kind of expression is valid but not very useful, as the resulting value will always be `1` if `count($var) > <expr>`.
`@var[<expr>]`	Location of a selected identified pattern.	integer	-
`!var[<expr>]`	Length of a selected identified pattern.	integer	-
`~var[<expr>]`	End of a selected identified pattern.	integer	This location provides an offset pointing to the byte right after the last matched byte of the pattern.

For instance, the following rule triggers only if:

there is any number of matches for $a, $b1 and $b2;
there are at least 11 matches for the $c pattern;
the 11^th match for $c does not end at offset 123.

Example:

rule DummyMatching

{

bytes:

$a = "dummyA"

$b1 = "dummyB_1"

$b2 = "dummyB_2"

$c = "dummyc"

condition:

$a and $b* and ~c[10] != 123

}
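
The relationship between the @, ! and ~ prefixes can be illustrated with a small Python model. The match positions below are invented for the example and the helper names are hypothetical. Each match is stored as a (start, length) pair, and the end offset is derived as start plus length, which points to the byte right after the last matched byte.

```python
# Hypothetical matches for $c: eleven hits of the 6-byte string "dummyc".
matches_c = [(10 * index, 6) for index in range(11)]


def starts(matches):
    """Model of @var: start location of every match."""
    return [start for start, _ in matches]


def lengths(matches):
    """Model of !var: length of every match."""
    return [length for _, length in matches]


def ends(matches):
    """Model of ~var: offset right after the last byte of every match."""
    return [start + length for start, length in matches]


# #c is the number of matches; index 10 selects the 11th one.
print(len(matches_c), starts(matches_c)[10], ends(matches_c)[10])

# Last part of the DummyMatching condition: ~c[10] != 123
print(ends(matches_c)[10] != 123)
```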

Filtering on global status

<expected> of <set>

The set definition can be replaced by the special keyword them, referring to all defined search patterns.

The expected part is defined using one of the following expressions:

Expression	Meaning
`none`	None of the provided patterns has been found.
`any`	At least one pattern has one or more matches.
`all`	All the provided patterns have been found at least once.
`n`	At least n patterns have one or more matches.
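
The four forms of the expected part can be modelled in Python as follows. This is only a sketch: expected_of is a hypothetical name, the match counts are invented, and the set of patterns is represented as a dictionary mapping each pattern name to its number of matches. The result of all on an empty set is an assumption.

```python
def expected_of(expected, match_counts):
    """Evaluate '<expected> of <set>' for a mapping of pattern -> match count."""
    found = sum(1 for hits in match_counts.values() if hits > 0)
    if expected == "none":
        return found == 0
    if expected == "any":
        return found >= 1
    if expected == "all":
        # Assumption: an empty set trivially satisfies "all".
        return found == len(match_counts)
    # Otherwise the expected part is a number n: at least n patterns matched.
    return found >= int(expected)


# Hypothetical scan results: $b1 was not found.
counts = {"$a": 2, "$b1": 0, "$b2": 1}

print(expected_of("any", counts), expected_of("all", counts))
print(expected_of(2, counts), expected_of(3, counts))
```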

Special keywords

Strings and referenced rules

Referencing other rules

Because writing rules shares common points with coding, some basic bricks can be defined within simple rules. More complex rules can then be built by combining other rules: each simple rule can be referenced from the complex one like an external variable:

Example:

rule SimpleRule

{

condition:

uint32(0) == 0x464c457f and datasize < 1MB

}

rule ComplexRule

{

strings:

$d = "dummy"

condition:

$d and SimpleRule

}

In the previous example, the ComplexRule rule matches all files beginning with an ELF signature, smaller than 1 MB and containing the byte sequence "dummy".

Note

As names inside conditions are resolved at runtime, it is not required to define a rule before referencing it in other rules.

Indeed, when the resolution process is launched, all rules are loaded and all their names are known.

A wildcard can be used to match several rules at once:

Example:

rule ElfHeader_byte0

{

condition:

uint8(0) == 0x7f

}

rule ElfHeader_byte1

{

condition:

uint8(1) == 0x45

}

rule ElfHeader_byte2

{

condition:

uint8(2) == 0x4c

}

rule ElfHeader_byte3

{

condition:

uint8(3) == 0x46

}

rule ElfHeader

{

condition:

all of (ElfHeader_byte*)

}