ROST rules' condition block
Basics for conditions
Literal values
Sets of expressions
Integer ranges
Building conditional bricks
Performing computation
Mixing boolean expressions
Operations with strings
Available core functions
Conditions on bytes patterns
Pattern properties
Filtering on global status
Special keywords
Strings and referenced rules
Referencing other rules
This page documents the condition: part of a ROST rule definition.
The match condition is composed of one or more expressions which must produce a final analysis status: does the scanned content lead to a match with respect to the conditions specified by the rule?
Each expression is processed during the scan and is reduced to a simpler expression whenever possible. For instance, a call to hash.md5() is replaced by a string literal as soon as the scan begins; however, $var[i], the (i + 1)th match for the $var pattern definition, if available, may require some time before being translated.
Some expressions naturally reduce into booleans:
- the expression true and false produces a false value;
- the expression (6 - 3 * 2) is equal to 0, which is a common synonym of false;
- an empty intersection of sets equals false;
- expressions leading to an error or an undefined value produce a false state.
Basics for conditions
Basic expressions share a common feature: they can all be implicitly converted into booleans during the reduction process.
Literal values
A rule condition can rely on several basic types:
- booleans: true, false;
- signed and unsigned integers: -1, 0x10;
- strings, which are handled as bytes as they may contain any kind of characters: "Hello\tWorld", "!";
- regular expressions: /kb|mb|gb/i.
Strings can be handled as sets of characters if needed, so strings provide all the features of a regular set:
Example:
rule AlwaysTrue
{
    condition:
        "ABC"[2] == "B" and "x" in "x"
}
Regular expressions in the condition: section are not limited like the ones from the bytes: section: they follow the full POSIX Extended Regular Expression syntax.
All these basic types resolve to boolean values when reduced:
Type... | resolves to... | only if... |
boolean | true | its value is true. |
integer | false | its value is 0. |
string | false | it is empty. |
regular expression | false | Regular expressions never translate to true directly. |
Sets of expressions
In some cases it is useful to group items so that they are processed by a common operation which computes a boolean state for the condition.
Such items are collected between parentheses; for items relating to strings or rules, wildcards can also be applied:
Example:
// Define a list from the first prime numbers
(2, 3, 5, 7)
// Match the string names $foo1, $foo2, $foo3 and $bar
($foo*, $bar)
// Create a list of MD5 checksums
( "e1faffb3e614e6c2fba74296962386b7", "2bb225f0ba9a58930757a868ed57d9a3" )
Each kind of set provides an iterator over the contained expressions.
Sets resolve to the true boolean value when reduced, except for empty sets.
Sets can embed all kinds of expressions. However, if a set contains only one item, its definition has to end with a trailing comma:
Example:
// Empty set
()
// Set with one item
( strings.lower("SINGLE"), )
Integer ranges
Ranges handle a set of integers in a compact form: such expressions only memorize start and end points.
Example:
// Define a set of 1024 integers
1 .. 1024
// Define all valid values for a signed char
-128 .. 127
// Create a range for each byte of the scanned data
1 .. datasize
A range reduces to a boolean value only if its start point is smaller than or equal to its end point.
Building conditional bricks
The condition for validating a rule may involve a sequence of expressions which have to compute a final boolean state.
Expressions can be built from basic values or module outputs and mixed with several operators; they are meant to be combined in order to create more complex expressions.
The following table lists all the available operators and how the ROST grammar handles them. The lower the precedence number of an operator, the higher the priority given to an expression involving this operator when the reduction process is performed.
Precedence | Operator | Description | Associativity |
1 | [] | Array subscripting | Left-to-right |
1 | . | Structure member access | Left-to-right |
2 | - | Unary minus | Right-to-left |
2 | ~ | Bitwise not | Right-to-left |
3 | * | Multiplication | Left-to-right |
3 | / | Division | Left-to-right |
3 | % | Remainder | Left-to-right |
4 | + | Addition | Left-to-right |
4 | - | Subtraction | Left-to-right |
5 | << | Bitwise left shift | Left-to-right |
5 | >> | Bitwise right shift | Left-to-right |
6 | & | Bitwise AND | Left-to-right |
7 | ^ | Bitwise XOR | Left-to-right |
8 | | | Bitwise OR | Left-to-right |
9 | < | Less than | Left-to-right |
9 | <= | Less than or equal to | Left-to-right |
9 | > | Greater than | Left-to-right |
9 | >= | Greater than or equal to | Left-to-right |
10 | contains | String contains substring | Left-to-right |
10 | startswith | String starts with substring | Left-to-right |
10 | endswith | String ends with substring | Left-to-right |
10 | icontains | Like contains but case-insensitive | Left-to-right |
10 | istartswith | Like startswith but case-insensitive | Left-to-right |
10 | iendswith | Like endswith but case-insensitive | Left-to-right |
10 | iequals | Case-insensitive string comparison | Left-to-right |
10 | matches | String matches a regular expression | Left-to-right |
11 | == | Equal to | Left-to-right |
11 | != | Not equal to | Left-to-right |
12 | and | Logical AND | Left-to-right |
13 | or | Logical OR | Left-to-right |
Performing computation
Arithmetic operations
The ROST grammar supports the classic arithmetic operators +, -, *, / and %, with their relative priority.
Example:
rule MulFirst
{
    // Should always be validated
    condition:
        1 + 4 * (3 + 2) == 21
        and
        (1 + 4) * (3 + 2) == 25
}
Bitwise operations
Integers are stored internally as 64-bit values, unsigned by default. Small values are therefore promoted, which has to be kept in mind, especially when dealing with bitwise operators: ~0x01 is not 0xfe but 0xfffffffffffffffe during rule processing.
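The promotion to 64 bits can be reproduced outside ROST. The following Python sketch only models the arithmetic described above; the bitwise_not_u64 helper is a hypothetical name, and the explicit mask stands in for the fixed 64-bit storage.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # all 64 bits set


def bitwise_not_u64(value):
    """Return ~value as an unsigned 64-bit integer (hypothetical helper)."""
    # Python integers are unbounded, so the result is masked to 64 bits.
    return ~value & MASK64


print(hex(bitwise_not_u64(0x01)))         # 0xfffffffffffffffe
print(hex(bitwise_not_u64(0x01) & 0xFF))  # 0xfe
```

As the second line suggests, masking the result with 0xff is one way to get back the 8-bit value when only the low byte matters.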
Extra operations
Mixing boolean expressions
If one of the relational operators == or != is used and one of the operands is a boolean expression, then the other operand is cast into a boolean value before the relation is computed.
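This cast can be modelled in a few lines of Python. It is only a sketch of the stated behaviour, not ROST's implementation: the rost_equals name is hypothetical, and the cast relies on Python truthiness, which agrees with the rules given earlier for integers and strings (0 and empty values are false).

```python
def rost_equals(left, right):
    """Model of == when one operand may be a boolean (hypothetical helper)."""
    if isinstance(left, bool) or isinstance(right, bool):
        # One side is a boolean: cast the other side too, then compare.
        return bool(left) == bool(right)
    return left == right


print(rost_equals(True, 5))    # True: 5 is cast to true first
print(rost_equals(False, ""))  # True: the empty string is cast to false
print(rost_equals(3, 5))       # False: plain integer comparison
```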
Operations with strings
Various operations can be performed against strings:
- <haystack> contains <needle>
- <haystack> startswith <needle>
- <haystack> endswith <needle>
- <haystack> icontains <needle>
- <haystack> istartswith <needle>
- <haystack> iendswith <needle>
- <haystack> matches <regex>
- <haystack> iequals <needle>
Each keyword returns true if haystack contains, starts with or ends with a needle string. The "i" prefix denotes a case-insensitive operation.
The matches keyword implies a regular expression.
Case-insensitive string comparison can be done with the iequals keyword. For case-sensitive comparison, the == operator has to be used.
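For readers more familiar with a general-purpose language, the keywords above map roughly onto the Python sketch below. The function names simply mirror the ROST keywords. Two points are assumptions rather than documented ROST behaviour: matches is modelled as an unanchored search, and Python's re syntax is used in place of POSIX Extended Regular Expressions.

```python
import re


def contains(haystack, needle):
    return needle in haystack


def startswith(haystack, needle):
    return haystack.startswith(needle)


def endswith(haystack, needle):
    return haystack.endswith(needle)


def icontains(haystack, needle):
    # The "i" variants compare lowercased copies of both operands.
    return needle.lower() in haystack.lower()


def iequals(haystack, needle):
    return haystack.lower() == needle.lower()


def matches(haystack, regex, ignore_case=False):
    # Assumption: the regular expression may match anywhere in the string.
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(regex, haystack, flags) is not None


print(contains("Hello World", "lo W"))                      # True
print(icontains("Hello World", "WORLD"))                    # True
print(iequals("ROST", "rost"), "ROST" == "rost")            # True False
print(matches("12 MB", r"kb|mb|gb", ignore_case=True))      # True
```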
Available core functions
Quantity of items
The count function computes the size of given sets. Such sets may be produced by other functions, but also by strings (which are sets of characters) and search patterns.
The function can handle one or several arguments. The computed result is the sum of all items inside the provided arguments.
Here are some illustrations of use cases with the count function:
Example:
rule AlwaysTrue
{
    bytes:
        $int_01 = "\x01"
        $int_02 = "\x02"
        $int_3 = "\x03"
    condition:
        count("AB", "C") == count("ABC")
        and count($int_0*, $int_3) == #int_*
}
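The summing behaviour of count can be pictured with this short Python sketch. It is a model under assumptions, not the ROST implementation: each pattern is represented by a hypothetical list of match offsets, and the offsets themselves are invented for illustration.

```python
def count(*sets):
    """Sum the number of items of every provided set (model of count)."""
    return sum(len(items) for items in sets)


# Hypothetical scan results: one list of match offsets per pattern.
int_01 = [0, 7]
int_02 = [3]
int_3 = []

print(count("AB", "C") == count("ABC"))  # True: 2 + 1 == 3
print(count(int_01, int_02, int_3))      # 3
```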
Size of available data
The quantity of scanned data can be retrieved with the datasize keyword, which returns a size expressed in bytes as an integer value. For compatibility with YARA, a filesize keyword is introduced as an alias to the generic datasize.
To ease writing, suffixes can be used: KB for 1024 bytes, MB for 2^20 bytes and GB for 2^30 bytes. These suffixes are case-insensitive and can be used for any integer constant.
In order to speed up data processing, filtering content by size is a good way to skip scans for large contents without any expected matches:
Example:
rule KeepSmallFiles
{
    condition:
        datasize <= 200KB
}
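The value behind a suffixed constant such as 200KB can be checked with a small Python sketch. The parse_size helper is hypothetical and, as a simplifying assumption, only handles decimal constants; it merely applies the multipliers listed above.

```python
import re

# Multipliers for the case-insensitive suffixes described in the text.
_MULTIPLIERS = {"": 1, "kb": 1024, "mb": 2**20, "gb": 2**30}
_SIZE_RE = re.compile(r"(\d+)(kb|mb|gb)?", re.IGNORECASE)


def parse_size(text):
    """Convert a decimal constant with an optional suffix into bytes."""
    match = _SIZE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a size constant: {text!r}")
    suffix = (match.group(2) or "").lower()
    return int(match.group(1)) * _MULTIPLIERS[suffix]


print(parse_size("200KB"))  # 204800
print(parse_size("1mb"))    # 1048576
```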
Most available item in a set
The number of occurrences of the most frequent item in a set can be retrieved with the maxcommon function.
These sets can be provided as a list of arguments or by other functions, such as modpath.
Here is a condition example which should always lead to a match:
Example:
rule AlwaysTrue
{
    condition:
        maxcommon("AAA", "A") == 1
        and maxcommon("AA", 1, "AA", 2) == 2
        and maxcommon("AA", 1, "AA", 2, 1, False, 1) == 3
}
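The three comparisons in this rule can be reproduced with a Python sketch built on collections.Counter. This is a model, not the ROST implementation, and it assumes that each argument is compared as a whole item (so "AAA" and "A" are two distinct items). Note that Python treats True and 1 as the same key, which may differ from ROST; the example avoids that case.

```python
from collections import Counter


def maxcommon(*items):
    """Return how many times the most frequent item appears."""
    if not items:
        return 0
    return max(Counter(items).values())


print(maxcommon("AAA", "A"))                     # 1
print(maxcommon("AA", 1, "AA", 2))               # 2
print(maxcommon("AA", 1, "AA", 2, 1, False, 1))  # 3
```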
Read values from binary content
Words of data can be read from the scanned data in order to produce literal integer expressions. These expressions can then be used in rule conditions.
Several functions are available:
- uint8(<offset>)
- int8(<offset>)
- uint16(<offset>)
- int16(<offset>)
- uint16be(<offset>)
- int16be(<offset>)
- uint32(<offset>)
- int32(<offset>)
- uint32be(<offset>)
- int32be(<offset>)
- uint64(<offset>)
- int64(<offset>)
- uint64be(<offset>)
- int64be(<offset>)
All these functions read 8-, 16-, 32- or 64-bit integers, signed or unsigned, from data at a given offset position from the start of the scanned binary content. The considered encoding is little-endian by default; big-endian can be handled with the functions with a "be" suffix.
The offset argument is an expression which can be either a literal integer value or an expression providing such a value.
For instance, here is a rule matching PE files:
Example:
rule IsPE
{
    condition:
        // MZ signature at offset 0 and ...
        uint16(0) == 0x5a4d and
        // ... PE signature at offset stored in the MZ header at offset 0x3c
        uint32(uint32(0x3c)) == 0x00004550
}
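The same check can be written in Python with the standard struct module, which makes the byte order explicit. This is a sketch of what the rule computes, not of how ROST evaluates it. The helper names mirror the ROST functions but take the data as an extra argument, and an out-of-range read is turned into a false result, in line with the rule that expressions leading to an error produce a false state.

```python
import struct


def uint16(data, offset):
    """Read an unsigned little-endian 16-bit integer at offset."""
    return struct.unpack_from("<H", data, offset)[0]


def uint32(data, offset):
    """Read an unsigned little-endian 32-bit integer at offset."""
    return struct.unpack_from("<I", data, offset)[0]


def uint32be(data, offset):
    """Read an unsigned big-endian 32-bit integer at offset."""
    return struct.unpack_from(">I", data, offset)[0]


def is_pe(data):
    try:
        # MZ signature at offset 0, then PE signature at the offset
        # stored in the MZ header at 0x3c.
        return (uint16(data, 0) == 0x5A4D
                and uint32(data, uint32(data, 0x3C)) == 0x00004550)
    except struct.error:
        # Reading past the end of the data: no match.
        return False


# Minimal hand-made buffer, only large enough to hold both signatures.
sample = bytearray(0x80)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x40)
sample[0x40:0x44] = b"PE\x00\x00"

print(is_pe(bytes(sample)))  # True
print(is_pe(b"MZ"))          # False
```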
Conditions on bytes patterns
Pattern properties
For a given pattern definition $var, several properties can be used in a conditional expression:
Expression | Meaning | Production | Comment |
$var | Main reference to the results of the searched pattern. | [empty] set of bytes | - |
#var | Number of times the pattern $var has been identified. | integer | The expression is an alias for count($var). |
@var | Start location of all matched patterns. | [empty] set of integers | - |
!var | Length of all matches, which may vary when regular expressions are used as patterns. | [empty] set of integers | - |
~var | End location of all matched patterns. | [empty] set of integers | - |
Fuzzy labels can be used to point to several patterns at once: for instance, @var* could refer to @var1, @var2 and @var_X. More sources are then considered while processing and extend the produced results.
Each prefix can also be applied to an individual item selected by an expression which should reduce to an integer:
Expression | Meaning | Production | Comment |
$var[<expr>] | Selected result of the searched pattern. | bytes | - |
#var[<expr>] | Number of appearances of one identified pattern. | integer (1) | This expression is valid but not very useful: the resulting value is always 1 if count($var) >= <expr>. |
@var[<expr>] | Location of a selected identified pattern. | integer | - |
!var[<expr>] | Length of a selected identified pattern. | integer | - |
~var[<expr>] | End of a selected identified pattern. | integer | This location provides an offset pointing to the byte right after the last matched byte of the pattern. |
For instance, the following rule triggers only if:
- there is any number of matches for $a, $b1 and $b2;
- there are at least 11 matches for the $c pattern;
- the 11th match for $c does not end at offset 123.
Example:
rule DummyMatching
{
    bytes:
        $a = "dummyA"
        $b1 = "dummyB_1"
        $b2 = "dummyB_2"
        $c = "dummyc"
    condition:
        $a and $b* and ~c[10] != 123
}
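The relationship between the #, @, ! and ~ prefixes can be illustrated with Python's re module, whose match objects expose the same quantities. This is a sketch on an invented 16-byte buffer, not a description of the ROST scanner; the variable and helper names are hypothetical. As in the table above, the end location is the offset of the byte right after the match.

```python
import re

data = b"..dummyc..dummyc"  # invented sample content
matches = list(re.finditer(re.escape(b"dummyc"), data))

count_c = len(matches)                              # model of #c
starts_c = [m.start() for m in matches]             # model of @c
lengths_c = [m.end() - m.start() for m in matches]  # model of !c
ends_c = [m.end() for m in matches]                 # model of ~c


def end_differs(ends, index, offset):
    """Model of ~c[index] != offset; a missing match yields false."""
    if index >= len(ends):
        return False
    return ends[index] != offset


print(count_c, starts_c, lengths_c, ends_c)  # 2 [2, 10] [6, 6] [8, 16]
print(end_differs(ends_c, 1, 123))           # True
print(end_differs(ends_c, 10, 123))          # False: fewer than 11 matches
```

The last line shows why the rule implicitly requires at least 11 matches for $c: with fewer matches, the indexed expression is undefined and the comparison cannot succeed.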
Filtering on global status
<expected> of <set>
The set definition can be replaced by the special keyword them, referring to all defined search patterns.
The expected part is defined using one of the following expressions:
Expression | Meaning |
none | None of the provided patterns has been found. |
any | At least one pattern has one or more matches. |
all | All the provided patterns have been found at least once. |
n | At least n patterns have one or more matches. |
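The four forms of the expected part can be summarised with a Python sketch. It models the meanings listed in the table only; the of helper and the dictionary of match offsets per pattern are hypothetical, and passing the whole dictionary plays the role of the them keyword.

```python
def of(expected, matches_by_pattern):
    """Model of '<expected> of <set>' over a dict: pattern -> match offsets."""
    # Number of patterns with at least one match.
    found = sum(1 for offsets in matches_by_pattern.values() if offsets)
    if expected == "none":
        return found == 0
    if expected == "any":
        return found >= 1
    if expected == "all":
        return found == len(matches_by_pattern)
    # Otherwise expected is the integer n.
    return found >= expected


# Invented scan results for three patterns.
scan = {"$a": [4], "$b1": [], "$b2": [10, 42]}

print(of("none", scan))  # False
print(of("any", scan))   # True
print(of("all", scan))   # False: $b1 has no match
print(of(2, scan))       # True
print(of(3, scan))       # False
```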
Special keywords
Strings and referenced rules
Referencing other rules
Because writing rules shares common points with coding, some basic bricks can be defined within simple rules. More complex rules can then be built by combining other rules, and each of these simple rules can be linked into the complex one like an external variable:
Example:
rule SimpleRule
{
    condition:
        uint32(0) == 0x464c457f and datasize < 1MB
}
rule ComplexRule
{
    strings:
        $d = "dummy"
    condition:
        $d and SimpleRule
}
In the previous example, the ComplexRule rule matches all files beginning with an ELF signature, smaller than 1 MB and containing the byte sequence "dummy".
As names inside conditions are resolved at runtime, it is not required to define a rule before referencing it in other rules.
Indeed, when the resolution process is launched, all rules are loaded and all their names are known.
A wildcard can be used to match several rules at once:
Example:
rule ElfHeader_byte0
{
    condition:
        uint8(0) == 0x7f
}
rule ElfHeader_byte1
{
    condition:
        uint8(1) == 0x45
}
rule ElfHeader_byte2
{
    condition:
        uint8(2) == 0x4c
}
rule ElfHeader_byte3
{
    condition:
        uint8(3) == 0x46
}
rule ElfHeader
{
    condition:
        all of (ElfHeader_byte*)
}
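The wildcard selection used by the ElfHeader rule can be pictured with Python's fnmatch module. This is a sketch only: the all_of_rules helper and the dictionary of per-rule results are hypothetical, and treating a wildcard that selects no rule as a failed condition is an assumption, not documented ROST behaviour.

```python
from fnmatch import fnmatchcase


def all_of_rules(pattern, results):
    """Model of 'all of (<pattern>)' over a dict: rule name -> match status."""
    selected = [matched for name, matched in results.items()
                if fnmatchcase(name, pattern)]
    # Assumption: a wildcard selecting nothing does not validate the rule.
    return bool(selected) and all(selected)


data = b"\x7fELF\x02\x01"  # invented sample content
signature = (0x7F, 0x45, 0x4C, 0x46)

# One status per ElfHeader_byteN rule, as in the example above.
results = {f"ElfHeader_byte{i}": data[i] == value
           for i, value in enumerate(signature)}

print(all_of_rules("ElfHeader_byte*", results))  # True
```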