Definition of search patterns for ROST
This page documents the bytes: part of a ROST rule definition.
The bytes: section defines search patterns that identify portions of binary content. The match or rejection conditions are stated later, in the condition: part of a rule.
There are three kinds of patterns:
- textual ones, to target human-readable content;
- hexadecimal ones, suitable for binary files or executable code for instance;
- regular-expression-based ones, providing a way to build complex textual patterns.
String patterns
Basic syntax
String patterns are the simplest way to create a search pattern.
The syntax is the following: $<id> = "<text>" [<modifiers>] [<flags>]
The last two fields are optional and detailed in the next sections, so a simple rule to look for any "hello" bytes inside a text file is:
Example:
rule Hello
{
bytes:
$str = "hello"
condition:
$str
}
The choice of the pattern identifier ($str here) is up to the writer, but it has to be unique inside the containing rule.
The text defined in a string pattern is always interpreted as ASCII encoded, and therefore processed as raw bytes.
This should be fine in most cases, but any modifier working character per character may not produce the expected final pattern with UTF-8 encoded text for instance.
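The caveat above can be illustrated with a small Python sketch (this is not ROST code). It assumes that a character-per-character modifier behaves like the wide modifier listed later on this page, which inserts a null byte after every source byte; the helper name interleave_nulls is hypothetical.

```python
def interleave_nulls(raw: bytes) -> bytes:
    """Insert a 0x00 byte after every input byte (assumed 'wide'-like behaviour)."""
    out = bytearray()
    for value in raw:
        out += bytes((value, 0))
    return bytes(out)


ascii_text = "abc"
accented_text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE: two bytes in UTF-8

# For pure ASCII text, interleaving null bytes equals the real UTF-16LE encoding.
ascii_ok = interleave_nulls(ascii_text.encode("utf-8")) == ascii_text.encode("utf-16-le")

# For non-ASCII UTF-8 text, each byte of the multi-byte sequence is widened
# separately, so the result is not the UTF-16LE encoding of the character.
accented_ok = interleave_nulls(accented_text.encode("utf-8")) == accented_text.encode("utf-16-le")

print(ascii_ok, accented_ok)
```

In this sketch, the ASCII comparison succeeds while the accented one fails, which is the kind of mismatch the paragraph above warns about.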
A backslash introduces an interpreted sequence inside a string pattern, if needed:
Sequence | Generated byte | Description |
\a | 0x07 | Alert (bell) |
\b | 0x08 | Backspace |
\t | 0x09 | Horizontal tabulation |
\n | 0x0a | New line |
\v | 0x0b | Vertical tabulation |
\f | 0x0c | Form feed |
\r | 0x0d | Carriage return |
\e | 0x1b | Escape |
\" | 0x22 | Double quote |
\\ | 0x5c | Backslash |
\xNN | 0xNN | Raw byte from a two-digit hexadecimal value |
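As an illustration of the table above, the following Python sketch expands these sequences into raw bytes. It is not the ROST parser: the function name expand_escapes is hypothetical, the input is assumed to be ASCII, and raising an error on an unknown sequence is an assumption about behaviour the page does not describe.

```python
import string

# Single-character sequences, taken from the table above.
SIMPLE_ESCAPES = {
    "a": 0x07, "b": 0x08, "t": 0x09, "n": 0x0A, "v": 0x0B,
    "f": 0x0C, "r": 0x0D, "e": 0x1B, '"': 0x22, "\\": 0x5C,
}


def expand_escapes(text: str) -> bytes:
    """Turn the body of a string pattern into the bytes to search for."""
    out = bytearray()
    i = 0
    while i < len(text):
        char = text[i]
        if char != "\\":
            out += char.encode("ascii")  # ASCII-only input is assumed
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling backslash at end of pattern")
        kind = text[i + 1]
        if kind == "x":
            digits = text[i + 2:i + 4]
            if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
                raise ValueError("\\x must be followed by two hexadecimal digits")
            out.append(int(digits, 16))
            i += 4
        elif kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        else:
            # Assumption: unknown sequences are rejected rather than kept as is.
            raise ValueError("unknown escape sequence: \\" + kind)
    return bytes(out)


print(expand_escapes(r"hello\x20world\n"))
```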
Available modifiers
String patterns defined in a rule act as sources which can be used to produce other patterns automatically thanks to various modifiers.
Here is a list of these modifiers and their respective effects:
Modifier | Operation | Plugin | Production for "AbcDef" |
base64 | All possible base64 encodings | encodings | QWJjRGVm, FiY0RlZg, BYmNEZWY |
hex | Hexadecimal version using 2-digit representation | - | 416263446566 |
plain | Source as is, which may be useful when several modifiers are chained | - | AbcDef |
rev | Original text in reversed order | - | feDcbA |
ror13 | ror13 operation on patterns | apihashing | \x871.I' |
wide | Emulation of UTF-16 by interleaving null (0x00) bytes | - | A\x00b\x00c\x00D\x00e\x00f\x00 |
xor | XOR operation with a 1-byte key | - | AbcDef, @cbEdg, C`aFgd, ... |
When a plugin is listed, it is the ROST internal component providing the modifier. This information is given for reference only, as the origin of a modifier is fully transparent for the user.
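The productions listed in the table can be reproduced with a short Python sketch. This is not the ROST implementation: the mod_* function names are hypothetical, and the way the base64 variants are trimmed is an assumption chosen only because it yields the three values shown for "AbcDef"; ROST may trim differently for other inputs. The ror13 modifier is left out, as its exact algorithm is not described here.

```python
import base64


def mod_hex(data: bytes) -> bytes:
    """Two hexadecimal digits per source byte."""
    return data.hex().encode("ascii")


def mod_rev(data: bytes) -> bytes:
    """Source bytes in reversed order."""
    return data[::-1]


def mod_wide(data: bytes) -> bytes:
    """A null byte interleaved after every source byte."""
    return b"".join(bytes((value, 0)) for value in data)


def mod_xor(data: bytes, key: int) -> bytes:
    """Every source byte XORed with the same 1-byte key."""
    return bytes(value ^ key for value in data)


def mod_base64(data: bytes) -> list:
    """One variant per possible alignment of the data inside a base64 stream.

    Assumption: the source is shifted by 0, 1 or 2 filler bytes, then the
    leading characters depending on the filler and the '=' padding are dropped.
    """
    variants = []
    for shift, skip in ((0, 0), (1, 2), (2, 3)):
        encoded = base64.b64encode(b"\x00" * shift + data).decode("ascii")
        variants.append(encoded[skip:].rstrip("="))
    return variants


source = b"AbcDef"
print(mod_hex(source), mod_rev(source), mod_wide(source))
print([mod_xor(source, key) for key in range(3)])
print(mod_base64(source))
```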
Some modifiers may accept one or several optional arguments:
- base64: a 64-byte long custom alphabet;
- xor: an 8-bit value (the 1-byte key) or a range of values.
If a list of arguments is provided, the modifier is called as many times as there are items in the list.
Example:
rule Hello
{
bytes:
// The next line will produce 1 + 1 + 6 + 3 = 11 final patterns to search:
// rev (1), xor(0xfe) (1), xor(0xaa, 1 .. 0x5) (1 + 5 = 6) and base64 (3)
$str = "hello" rev xor(0xfe) xor(0xaa, 1 .. 0x5) base64
condition:
$str
}
Final flags
Located at the end of a definition, a few flags can be linked to a string pattern:
Label | Meaning |
fullword | Match must be neither preceded nor followed by an alphanumeric character |
nocase | Ignore the case of the string pattern |
private | Results are only used to build match condition but will never be displayed |
As its name suggests, the fullword keyword selects matches which are whole words. For instance, with the following rule, "Othello is worldwide known" gives a match for the $str_0 pattern but not for the $str_1 one.
Example:
rule Hello
{
bytes:
$str_0 = "hello"
$str_1 = "world" fullword
condition:
all of ($str_*)
}
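The effect of fullword on the sentence above can be sketched in Python with regular expression lookarounds. This only mimics the rule stated in the flags table (no alphanumeric character right before or after the match); the function names are hypothetical and ROST's own matching engine is not shown here.

```python
import re


def find_plain(pattern: bytes, data: bytes) -> list:
    """Offsets of every occurrence of the pattern, without any flag."""
    return [m.start() for m in re.finditer(re.escape(pattern), data)]


def find_fullword(pattern: bytes, data: bytes) -> list:
    """Offsets of occurrences not surrounded by ASCII alphanumeric characters."""
    regex = rb"(?<![0-9A-Za-z])" + re.escape(pattern) + rb"(?![0-9A-Za-z])"
    return [m.start() for m in re.finditer(regex, data)]


content = b"Othello is worldwide known"

print(find_plain(b"hello", content))     # "hello" inside "Othello"
print(find_fullword(b"world", content))  # "world" is followed by "wide"
```

In this sketch, "hello" is found at offset 2 without the flag, whereas "world" with fullword yields no offset because it is immediately followed by a letter.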
As a note, if the goal of the rule is to match the "hello world" string, the following rule is more suitable and better optimized. With the extra nocase flag, the rule will match "hello world" as well as "Hello World!":
Example:
rule OptimizedHello
{
bytes:
$str = "hello world" fullword nocase
condition:
$str
}
The private keyword hides some results from the final output, as shown by the following shell commands:
Shell:
$ cat > /tmp/Hello.rost << EOF
rule Hello
{
bytes:
\$str_0 = "hello" private
\$str_1 = "world"
condition:
\$str_0 and \$str_1 and console.log(#str_0, " ", #str_1)
}
EOF
$ echo "Othello is worldwide known" > /tmp/Hello.txt
$ rost -s /tmp/Hello.rost /tmp/Hello.txt
0xb:$str_1: world
0x1 0x1
Rule 'Hello' has matched!
Binary patterns
Rejecting specific values
The ~ operator rejects a defined binary pattern. It can be applied to bytes and masked bytes; for instance:
- ~41 rejects matches with the letter "A";
- ~?0 only matches bytes whose second nibble is not zero.
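The two cases above can be checked with a minimal Python sketch. It is not ROST code: the function name token_matches is hypothetical, and it assumes that a token is made of two hexadecimal digits or ? wildcards, the first one for the high nibble and the second one for the low nibble, optionally prefixed by ~.

```python
def token_matches(token: str, byte: int) -> bool:
    """Tell whether one byte satisfies a token such as 41, ?0, ~41 or ~?0."""
    negated = token.startswith("~")
    spec = token[1:] if negated else token
    if len(spec) != 2:
        raise ValueError("a token must describe exactly two nibbles")
    high, low = spec
    matched = True
    if high != "?":
        matched = matched and (byte >> 4) == int(high, 16)
    if low != "?":
        matched = matched and (byte & 0x0F) == int(low, 16)
    # The ~ prefix simply inverts the outcome for this single byte.
    return matched != negated


print(token_matches("~41", ord("A")))  # the letter "A" is rejected
print(token_matches("~?0", 0x11))      # second nibble is 1, not zero
```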
So the following rules provide the same results, although the latter is more optimized than the former:
Example:
rule Search
{
bytes:
$want = { 41 ?? ?? 64 }
$dont_want = { 41 ?? 63 64 }
condition:
$want and for all of @want : ( not( $ in @dont_want ) )
}
rule OptimizedSearch
{
bytes:
$want = { 41 ?? ~63 64 }
condition:
$want
}
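The ~ operator can also be applied to a parenthesized group of bytes, but it provides no distributive property: the two patterns of the following rule are not equivalent.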
Example:
rule Hello
{
bytes:
$a = { ~41 ~?2 }
$b = { ~( 41 ?2 ) }
condition:
all of them
}
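The lack of distributivity can be made concrete with a small Python sketch. It rests on an assumed reading of the two patterns, which this page does not spell out: $a is taken to require each byte to pass its own negated test, while $b is taken to reject only the two-byte sequence as a whole. The function names are hypothetical.

```python
def a_matches(first: int, second: int) -> bool:
    """Assumed reading of { ~41 ~?2 }: both negated tests must pass."""
    return first != 0x41 and (second & 0x0F) != 0x2


def b_matches(first: int, second: int) -> bool:
    """Assumed reading of { ~( 41 ?2 ) }: the pair must not be '41 ?2'."""
    return not (first == 0x41 and (second & 0x0F) == 0x2)


# Every two-byte input on which the two readings disagree.
differing = [
    (first, second)
    for first in range(256)
    for second in range(256)
    if a_matches(first, second) != b_matches(first, second)
]

# Bytes 41 33: the first byte is "A", so $a fails, but the pair is not "41 ?2".
print(a_matches(0x41, 0x33), b_matches(0x41, 0x33))
print(len(differing) > 0)
```

Under these assumptions, any input accepted by $a is also accepted by $b, but not the other way round.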
Involving ranges
A regular range [m-n] stands for any byte sequence of length between m and n.
The negative version produced by the ~ operator then stands for any byte sequence of length not between m and n.
The following table summarizes all the possible cases involving ranges:
Definition pattern | Description |
~[m-n] alone | datasize |
~[m-n] pattern | Extend the current matches when the pattern does not start at an offset between m and n after the previous pattern (or after the beginning of the content if the scan is just starting). |
pattern ~[m-n] | Extend the current matches when the pattern does not end at an offset between m and n before the next pattern (or before the end of the content if there is no next pattern). |
~( pattern ) | Extend the current matches when pattern (which may include ranges) does not match |
As computing and building all the combinations of byte sequences excluding only one byte range may be very expensive, the first case from the previous table is forbidden.
So { ?? } is a valid binary pattern definition and { ~?? } is not.
For instance, ~41 will match bytes which are not "A" and ~?2 will match any byte whose second nibble is not equal to 2.
All the previous kinds of patterns are handled, but those which refer to any bytes (in other words, any number of wildcards ?? or any range [n-m]) get a special meaning: