Definition of search patterns for ROST

This page provides documentation about the bytes: part of a ROST rule definition.

This bytes: section defines search patterns meant to identify portions of binary content. The match or rejection conditions are stated later, in the condition: part of a rule.

There are three kinds of patterns:

textual ones, to target human-readable content;
hexadecimal ones, suitable for binary files or executable code for instance;
regular-expression-based ones, providing a way to build complex textual patterns.

String patterns

Basic syntax

String patterns are the simplest way to create a search pattern.

The syntax is the following one: $<id> = "<text>" [<modifiers>] [<flags>].

The last two fields are optional and detailed in the next sections, so a simple rule to look for any "hello" bytes inside a text file is:

Example:

rule Hello

{

bytes:

$str = "hello"

condition:

$str

}

The choice of the pattern identifier ($str here) is up to the writer, but has to be unique inside the containing rule.

Warning

The text defined in a string pattern is always interpreted as ASCII encoded, and therefore processed as raw bytes.

This should be fine in most cases, but any modifier working character per character may not produce the expected final pattern with UTF-8 encoded text for instance.

The use of a backslash helps to insert some interpreted sequences into a string pattern, if needed:

Sequence	Generated byte	Description
`\a`	0x07	Alert (bell)
`\b`	0x08	Backspace
`\t`	0x09	Horizontal tabulation
`\n`	0x0a	New line
`\v`	0x0b	Vertical tabulation
`\f`	0x0c	Form feed
`\r`	0x0d	Carriage return
`\e`	0x1b	Escape
`\"`	0x22	Double quote
`\\`	0x5c	Backslash
`\xNN`	0xNN	Raw byte from a two-digit hexadecimal value
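The table above can be read as a small decoding procedure. The following Python sketch emulates it for illustration only: the function name decode_pattern is hypothetical, and rejecting unknown sequences is an assumption, since this page does not say how ROST handles them.

```python
import string

# Escape sequences and the byte each one generates, as listed in the table.
ESCAPES = {
    "a": 0x07, "b": 0x08, "t": 0x09, "n": 0x0A, "v": 0x0B,
    "f": 0x0C, "r": 0x0D, "e": 0x1B, '"': 0x22, "\\": 0x5C,
}


def decode_pattern(text: str) -> bytes:
    """Turn the quoted text of a string pattern into raw bytes (hypothetical helper)."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            # Plain characters are taken as ASCII, hence as raw bytes.
            out += ch.encode("ascii")
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling backslash")
        nxt = text[i + 1]
        if nxt == "x":
            digits = text[i + 2:i + 4]
            if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
                raise ValueError("\\x expects two hexadecimal digits")
            out.append(int(digits, 16))
            i += 4
        elif nxt in ESCAPES:
            out.append(ESCAPES[nxt])
            i += 2
        else:
            # Assumption: unknown sequences are treated as errors.
            raise ValueError(f"unknown escape sequence: \\{nxt}")
    return bytes(out)


print(decode_pattern(r"hi\n\x41\e"))
```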

Available modifiers

String patterns defined in a rule act as sources which can be used to produce other patterns automatically thanks to various modifiers.

Here is a list of these modifiers and their respective effects:

Modifier	Operation	Plugin	Production for "AbcDef"
`base64`	All possible base64 encodings	encodings	`QWJjRGVm`, `FiY0RlZg`, `BYmNEZWY`
`hex`	Hexadecimal version using 2-digit representation	-	`416263446566`
`plain`	Source as this, which may be useful in case of chained multi modifiers	-	`AbcDef`
`rev`	Original text in reversed order	-	`feDcbA`
`ror13`	ror13 operation on patterns	apihashing	`\x871.I'`
`wide`	Emulation of UTF-16 by interleaving null (0x00) bytes	-	`A\x00b\x00c\x00D\x00e\x00f\x00`
`xor`	XOR operation with a 1-byte key	-	`AbcDef`, `@cbEdg`, ``C`aFgd``, ...

If defined, the quoted plugin is the ROST internal source of the involved modifier. This source is provided for information only as it is fully transparent for the user.
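To make the productions in the table easier to check, here is a Python sketch that recomputes several of them for "AbcDef". It is an emulation written to agree with the listed values, not ROST's implementation. All function names are hypothetical. The base64 variant logic (prefix with 0 to 2 bytes, drop the leading characters that depend on those bytes, strip the padding) is an assumption inferred from the three listed outputs. The ror13 modifier is left out because its exact definition is not given here.

```python
import base64


def base64_variants(data: bytes) -> list[str]:
    """Base64 forms of data for the three possible alignments (assumed logic)."""
    variants = []
    # (number of prefix bytes, number of leading characters to drop)
    for shift, drop in ((0, 0), (1, 2), (2, 3)):
        encoded = base64.b64encode(b"\x00" * shift + data).decode("ascii")
        variants.append(encoded[drop:].rstrip("="))
    return variants


def rev(data: bytes) -> bytes:
    """Source bytes in reversed order."""
    return data[::-1]


def wide(data: bytes) -> bytes:
    """Interleave a null byte after each source byte."""
    return bytes(b for ch in data for b in (ch, 0))


def xor_with(data: bytes, key: int) -> bytes:
    """XOR every byte with a 1-byte key."""
    return bytes(b ^ key for b in data)


source = b"AbcDef"
print("base64:", base64_variants(source))
print("hex:   ", source.hex())
print("rev:   ", rev(source))
print("wide:  ", wide(source))
print("xor:   ", [xor_with(source, key) for key in range(3)])
```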

Some modifiers may accept one or several optional arguments:

base64: a 64-byte long custom alphabet;
xor: a 1-byte key value or a range of such values.

If a list of arguments is provided, the modifier is called as many times as there are items in the list.

Example:

rule Hello

{

bytes:

// The next line will produce 1 + 1 + 6 + 3 = 11 final patterns to search

$str = "hello" rev xor(0xfe) xor(0xaa, 1 .. 0x5) base64

condition:

$str

}
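The count of 11 in the comment of the rule above can be reproduced with a short Python sketch. It assumes that each modifier is applied to the source independently, that a range such as 1 .. 0x5 includes both bounds, and that base64 yields three patterns as in the modifier table. The helper name expand_xor_args is hypothetical.

```python
def expand_xor_args(*args):
    """Flatten xor arguments into a list of keys.

    A tuple (low, high) stands for the range 'low .. high',
    assumed to include both bounds.
    """
    keys = []
    for arg in args:
        if isinstance(arg, tuple):
            low, high = arg
            keys.extend(range(low, high + 1))
        else:
            keys.append(arg)
    return keys


# One entry per modifier of: "hello" rev xor(0xfe) xor(0xaa, 1 .. 0x5) base64
counts = {
    "rev": 1,
    "xor(0xfe)": len(expand_xor_args(0xFE)),
    "xor(0xaa, 1 .. 0x5)": len(expand_xor_args(0xAA, (1, 0x5))),
    "base64": 3,  # three alignments, per the modifier table
}

total = sum(counts.values())
print(counts, total)
```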

Final flags

Located at the end of a definition, a few flags can be linked to a string pattern:

Label	Meaning
fullword	Match must be neither preceded nor followed by an alphanumeric character
nocase	Ignore the case of the string pattern
private	Results are only used to build match condition but will never be displayed

As its name suggests, the fullword keyword selects matches which form whole words. For instance, with the following rule, "Othello is worldwide known" gives a match for the $str_0 pattern but not for the $str_1 one.

Example:

rule Hello

{

bytes:

$str_0 = "hello"

$str_1 = "world" fullword

condition:

all of ($str_*)

}
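The fullword behaviour described above can be emulated in a few lines of Python. This is only a sketch of the rule stated in the flags table (no alphanumeric character right before or right after the match). The function find_matches is hypothetical, and restricting "alphanumeric" to ASCII is an assumption.

```python
def find_matches(data: bytes, needle: bytes, fullword: bool = False) -> list[int]:
    """Return the offsets where needle occurs, optionally as a full word."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        end = start + len(needle)
        # Neighbouring bytes; an empty slice means the content starts or ends there.
        before = data[start - 1:start] if start > 0 else b""
        after = data[end:end + 1]
        # bytes.isalnum() is False for an empty slice and only accepts ASCII.
        is_word = not before.isalnum() and not after.isalnum()
        if not fullword or is_word:
            offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


text = b"Othello is worldwide known"
print(find_matches(text, b"hello"))                  # $str_0
print(find_matches(text, b"world", fullword=True))   # $str_1
```

In this emulation "hello" is found inside "Othello", while "world" is rejected because it is immediately followed by the "w" of "worldwide".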

As a note, if the goal of the rule is to match the "hello world" string, the following rule is more suitable and better optimized. With the extra nocase flag, the rule will match "hello world" and "Hello World!" as well:

Example:

rule OptimizedHello

{

bytes:

$str = "hello world" fullword nocase

condition:

$str

}

The private keyword hides some results from the final output, as shown by the following shell commands:

Shell:

$ cat > /tmp/Hello.rost << EOF
rule Hello
{
    bytes:
        \$str_0 = "hello" private
        \$str_1 = "world"

    condition:
        \$str_0 and \$str_1 and console.log(#str_0, " ", #str_1)
}
EOF

$ echo "Othello is worldwide known" > /tmp/Hello.txt

$ rost -s /tmp/Hello.rost /tmp/Hello.txt
0xb:$str_1: world
0x1 0x1
Rule 'Hello' has matched!

Binary patterns

Rejecting specific values

The ~ operator rejects a defined binary pattern. It can be applied to bytes and masked bytes; for instance:

~41 rejects matches with the letter "A";
~?0 only matches bytes whose second nibble is not zero.

So the following rules provide the same results, even if the latter is more optimized than the former:

Example:

rule Search
{
    bytes:
        $want = { 41 ?? ?? 64 }
        $dont_want = { 41 ?? 63 64 }

    condition:
        $want and for all of @want : ( not( $ in @dont_want ) )
}

rule OptimizedSearch
{
    bytes:
        $want = { 41 ?? ~63 64 }

    condition:
        $want
}
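The byte-level rejection used by OptimizedSearch can be emulated with the following Python sketch. It only covers single tokens (plain bytes, masked bytes and their ~ versions), and the names token_matches and find_pattern are hypothetical. Rejecting the ~?? token is based on the rule stated further down this page that { ~?? } is not a valid definition.

```python
def token_matches(token: str, byte: int) -> bool:
    """Check one byte against a token such as '41', '?0', '??', '~41' or '~?0'."""
    negate = token.startswith("~")
    spec = token[1:] if negate else token
    if len(spec) != 2:
        raise ValueError(f"bad token: {token}")
    if negate and spec == "??":
        raise ValueError("~?? is not a valid definition")
    high, low = spec
    ok = True
    if high != "?":
        ok = ok and (byte >> 4) == int(high, 16)   # first nibble
    if low != "?":
        ok = ok and (byte & 0x0F) == int(low, 16)  # second nibble
    # A negated token accepts exactly the bytes the plain token refuses.
    return ok != negate


def find_pattern(tokens: list[str], data: bytes) -> list[int]:
    """Return every offset where all tokens match consecutive bytes."""
    size = len(tokens)
    return [
        offset
        for offset in range(len(data) - size + 1)
        if all(token_matches(tok, data[offset + i]) for i, tok in enumerate(tokens))
    ]


data = b"Abcd" + b"Abwd"
print(find_pattern(["41", "??", "??", "64"], data))   # $want from Search
print(find_pattern(["41", "??", "~63", "64"], data))  # $want from OptimizedSearch
```

On this sample, the first pattern is found at both offsets while the second one only keeps the occurrence whose third byte is not 0x63.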

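The ~ operator can also be applied to a parenthesized group of patterns, but it provides no distributive property: in the next rule, $a and $b are not equivalent.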

Example:

rule Hello

{

bytes:

$a = { ~41 ~?2 }

$b = { ~( 41 ?2 ) }

condition:

all of them

}

Involving ranges

A regular range [m-n] stands for any byte sequence of length between m and n.

The negative version produced by the ~ operator then stands for any byte sequence whose length is not between m and n.

The following table summarizes all the possible cases involving ranges:

Definition pattern	Description
~~`~[m-n]` alone~~	~~Create matches with all the byte sequences with size from 0 to (m - 1) and (n + 1) to `datasize`~~
`~[m-n] pattern`	Extend the current matches when the pattern is not starting at an offset between m and n after the previous pattern (or after the beginning of the content if the scan gets started).
`pattern ~[m-n]`	Extend the current matches when the pattern is not ending at an offset between m and n before the next matches (or before the end of the content if there is no next pattern).
`~( pattern )`	Extend the current matches when pattern (which may include ranges) does not match
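As a minimal illustration of the length rule behind [m-n] and ~[m-n], the following Python sketch decides whether a gap of a given size between two patterns is acceptable. The function gap_allowed is hypothetical and ignores how ROST actually extends matches; it only encodes "length between m and n" and its negation, with both bounds assumed inclusive.

```python
def gap_allowed(gap: int, m: int, n: int, negated: bool = False) -> bool:
    """Tell whether a gap of 'gap' bytes satisfies [m-n], or ~[m-n] if negated."""
    inside = m <= gap <= n
    # The negated range accepts exactly the lengths the regular range refuses.
    return inside != negated


for gap in (2, 5, 9):
    print(gap, gap_allowed(gap, 4, 6), gap_allowed(gap, 4, 6, negated=True))
```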

Warning

As computing and building all the combinations of byte sequences excluding only one byte range may be very expensive, the first case from the previous table gets forbidden.


So { ?? } is a valid binary pattern definition and { ~?? } is not.

For instance, ~41 will match bytes which are not "A" and ~?2 will match any byte whose second nibble is not equal to 2.

from 4 to 6 bytes can occupy the position of the jump. Any of the following strings will match the pattern:

F4 23 01 02 03 04 62 B4
F4 23 00 00 00 00 00 62 B4
F4 23 15 82 A3 04 45 22 62 B4

Any jump [X-Y] must meet the condition 0 <= X <= Y. In previous versions of YARA both X and Y must be lower than 256, but starting with YARA 2.0 there is no limit for X and Y.

All the previous kinds of patterns are handled, but those which refer to any bytes (in other words any number of wildcards ?? or any range [n-m]) get a special meaning: