DEV IN PROGRESS

Core modules

This page describes namespaces which are part of the ROST core grammar.

These namespaces provide utility functions for match conditions and may accept different kinds of arguments. The features are grouped by module.

Console

The console module provides a way to log information during the evaluation of match conditions.

By default, messages are written to the standard output (stdout), but this behavior can be changed on the command line with the --log switch or from the Python API.

Similarly, logging can be disabled entirely from a Python script or with the --quiet parameter.

log(arg, [arg, [arg...]])

Write the provided arguments to the current output and append a line feed to the produced text.

Each arg can be a literal value (boolean, integer, string, and so on) or an expression, which the scanner reduces to a literal value before calling the log function. Integers are printed using a decimal format.

If an error occurs, ??? is printed; the function always returns true.

Example:

console.log("Hello: ", uint32(0) ^ 0xffffffff)

log_hex(arg, [arg, [arg...]])

Behave like the log() function, except that integers are printed using a hexadecimal format.

Magic

The magic module tries to identify the class of provided data.

The module relies on the Magic Number Recognition Library (libmagic) included in the file command, so some tests can be done from a shell prompt before writing scan conditions involving the magic module.

Warning

ROST can scan both file and memory contents; the file shell command does not. ROST uses the generic magic_buffer() API function in order to cover all its needs, whereas the file command uses the magic_file() function.

Unfortunately, this slight difference between API calls leads to differences in the results. Indeed, the manual page for libmagic states in its bug section:

The results from magic_buffer() and magic_file() where the buffer and the file contain the same data can produce different results, because in the magic_file() case, the program can lseek(2) and stat(2) the file descriptor.

The result of the buffer version is usually shorter. For instance, with the /bin/ls program:

  • content types:
    • magic_buffer() produces: LSB shared object, x86-64, version 1 (SYSV)
    • magic_file() produces: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=15dfff3239aa7c3b16a71e6b2e3b6e4009dab998, for GNU/Linux 3.2.0, stripped
  • MIME types:
    • magic_buffer() produces: application/x-sharedlib
    • magic_file() produces: application/x-pie-executable

Users are therefore advised to check their rules using the console module output facilities rather than the file command output.

mime_encoding()

Extract a content encoding by scanning the first n bytes of the data (n = 64 KB by default).

Guessed encodings can be utf8, ucs32 or latin1 for instance. More examples of encodings can be extracted from the encoding.c file of the original file command.

The result should be the same as the one provided by the file --mime-encoding command.

Encoding examples:

$ file --mime-encoding fe20091e32e612a1b5b7043895ddf7d0131a544a6f86d177218645241070f32d.exe

fe20091e32e612a1b5b7043895ddf7d0131a544a6f86d177218645241070f32d.exe: binary


$ file --mime-encoding pefile.py

pefile.py: us-ascii


$ file --mime-encoding README

README: utf-8

mime_type()

Describe the scanned content using a MIME type instead of human-friendly text.

IANA is the official registry of MIME media types and maintains a list of all the official MIME types.

The returned string should be similar to the output of the file --mime-type command.

Type examples:

$ file --mime-type fe20091e32e612a1b5b7043895ddf7d0131a544a6f86d177218645241070f32d.exe

fe20091e32e612a1b5b7043895ddf7d0131a544a6f86d177218645241070f32d.exe: application/vnd.microsoft.portable-executable


$ LANG=C file --mime-type pefile.py

pefile.py: text/x-script.python

type()

Summarize the scanned content by providing its key points.

The returned string should be similar to the output of the file command.

Type examples:

$ file malware.apk

malware.apk: Android package (APK), with APK Signing Block


$ file README

README: Unicode text, UTF-8 text, with very long lines (359)


$ file peutils.py

peutils.py: Python script, ASCII text executable

Math

The math module computes values from the scanned binary content or from direct values provided by a rule.

to_string(int, [base])

Convert an integer into a string value, according to a given base (10 by default).

Only a few base values are supported: 2 for binary, 8 for octal, 10 for decimal and 16 for hexadecimal. A prefix may be prepended to the resulting string: 0b for binary, 0 for octal, 0x for hexadecimal.

Examples:

math.to_string(123) == "123"

math.to_string(291, 16) == "0x123"

math.to_string(-83, 8) == "-0123"

math.to_string(123, 2) == "0b1111011"
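The conversion rules above can be reproduced outside ROST to predict the string a rule will compare against. The following Python sketch is only an illustration of the documented behaviour, not the ROST implementation: the to_string name mirrors the module function, while the lowercase hexadecimal digits and the placement of the sign before the prefix are assumptions inferred from the examples.

```python
def to_string(value: int, base: int = 10) -> str:
    """Illustrative re-implementation of the documented math.to_string()."""
    # Prefixes as listed in the documentation; decimal has none.
    prefixes = {2: "0b", 8: "0", 10: "", 16: "0x"}
    # Python format codes for the four supported bases.
    codes = {2: "b", 8: "o", 10: "d", 16: "x"}
    if base not in prefixes:
        raise ValueError(f"unsupported base: {base}")
    # Assumption: the sign comes first, then the prefix, then the digits,
    # as suggested by the "-0123" example.
    sign = "-" if value < 0 else ""
    digits = format(abs(value), codes[base])
    return f"{sign}{prefixes[base]}{digits}"


if __name__ == "__main__":
    for args in [(123,), (291, 16), (-83, 8), (123, 2)]:
        print(args, "->", to_string(*args))
```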

String

The string module provides simple functions performing operations on characters or bytes.

The module is intended for use inside the condition: section of a rule, so input data comes from other functions or from match results after a successful scan.

Data is processed as bytes, not characters, so ASCII is the only supported encoding.

lower(data)

Convert all uppercase characters in a byte sequence into lowercase characters.

Example:

string.lower("ABcd123! Z") == "abcd123! z"
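Because data is handled as bytes, only the ASCII letters A to Z are affected by case conversion. The Python sketch below illustrates that consequence; lower_ascii is a hypothetical helper written for this example, and leaving every other byte untouched is an assumption about how ROST treats non-ASCII input.

```python
def lower_ascii(data: bytes) -> bytes:
    """Lowercase ASCII letters only; every other byte is left as-is."""
    # 0x41..0x5A is 'A'..'Z'; adding 0x20 yields the lowercase letter.
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in data)


if __name__ == "__main__":
    print(lower_ascii(b"ABcd123! Z"))
    # "É" is encoded in UTF-8 as the two bytes C3 89, neither of which
    # is an ASCII uppercase letter, so it is not converted.
    print(lower_ascii("ÉCOLE".encode("utf-8")))
```

A rule comparing lowercased data against text containing accented characters would therefore need the operand to keep its original non-ASCII bytes.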

to_int(str, [base])

Convert a string into an integer.

The base argument, if defined, must be between 2 and 36 inclusive.

The final value may or may not be signed: the input string may begin with an arbitrary amount of whitespace followed by a single optional + or - sign.

If base is undefined or 16, the string may then include a 0x or 0X prefix, and the number will be read in base 16; otherwise, an undefined base is taken as 10 (decimal) unless the next character is 0, in which case it is taken as 8 (octal).

Examples:

string.to_int("123") == 123

string.to_int("123", 16) == 291

string.to_int("0x123") == 291

string.to_int("-0123") == -83
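The parsing rules above can be checked with a small Python model before writing a rule. This is a sketch of the documented behaviour only, not the ROST code: the to_int name mirrors the module function, and both stopping at the first invalid digit and raising an error when no digit is found are assumptions, since the documentation does not describe how malformed input is handled.

```python
from typing import Optional

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_int(text: str, base: Optional[int] = None) -> int:
    """Illustrative model of the documented string.to_int() parsing rules."""
    if base is not None and not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36 inclusive")
    s = text.lstrip()  # leading whitespace is ignored
    sign = 1
    if s[:1] in ("+", "-"):  # a single optional sign
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    if base in (None, 16) and s[:2].lower() == "0x":
        s, base = s[2:], 16  # hexadecimal prefix
    elif base is None:
        base = 8 if s.startswith("0") else 10  # leading 0 means octal
    value, seen = 0, False
    for ch in s.lower():
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= base:
            break  # assumption: parsing stops at the first invalid digit
        value = value * base + digit
        seen = True
    if not seen:
        raise ValueError(f"no digits found in {text!r}")  # assumption
    return sign * value


if __name__ == "__main__":
    for args in [("123",), ("123", 16), ("0x123",), ("-0123",)]:
        print(args, "->", to_int(*args))
```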

upper(data)

Convert all lowercase characters in a byte sequence into uppercase characters.

Since all hash functions produce lowercase values, upper() can be used to change this default behaviour when needed.

Example:

string.upper("81e073b428b50247daba38531dcf412a") == "81E073B428B50247DABA38531DCF412A"
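When the reference value comes from an external tool that prints uppercase digests, the expected operand can be prepared the same way outside ROST. The Python sketch below uses the standard hashlib module; upper_hex_digest is a hypothetical helper, and SHA-256 is chosen only for illustration (the example above shows a 32-character digest).

```python
import hashlib


def upper_hex_digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as uppercase hexadecimal text."""
    # hexdigest() always yields lowercase digits, hence the explicit upper().
    return hashlib.sha256(data).hexdigest().upper()


if __name__ == "__main__":
    print(upper_hex_digest(b"sample data"))
```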

wide(data)

Translate the provided data into UTF-16 encoding.

This feature may be useful when dealing with some (Windows) UTF-16 fields: comparisons can be performed with readable operands, relying on ROST to run the appropriate translation.

Example:

// Look for a path actually containing the

// c\x00m\x00d\x00.\x00e\x00x\x00e\x00 byte sequence

kaitai.lnk.rel_path.str endswith string.wide("cmd.exe")
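The byte sequence mentioned in the comment can be reproduced in Python to see what the comparison operates on. This sketch assumes that wide() produces little-endian UTF-16 without a byte order mark, which is what the c\x00m\x00... sequence suggests; the wide helper and the sample path are made up for the illustration.

```python
def wide(data: str) -> bytes:
    """Encode text as UTF-16 little-endian, without a byte order mark."""
    return data.encode("utf-16-le")


if __name__ == "__main__":
    # Hypothetical raw field content, as it could be stored in a LNK file.
    rel_path = "..\\..\\Windows\\System32\\cmd.exe".encode("utf-16-le")
    print(wide("cmd.exe"))
    print(rel_path.endswith(wide("cmd.exe")))
```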

Time

The time module introduces a few functions to deal with timestamps, for instance to build time-relative conditions.

make(year, month, day, [hour, min, sec])

Compute the seconds elapsed since the Epoch from a given date.

The year, month, day, hour, min and sec arguments must fall within the expected intervals: [1900:-] for years, [1:12] for months, [1:31] for days, [0:23] for hours and [0:59] for minutes and seconds.

Example:

// Keep all PE files built after a given date

pe.timestamp >= time.make(2023, 7, 14, 22, 50, 8) and pe.timestamp <= time.now()
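To know which integer a given date corresponds to, the computation can be modelled in Python. The sketch below is not the ROST implementation: it assumes the date is interpreted as UTC and that the optional hour, min and sec arguments default to 0, neither of which is stated above. The make name mirrors the module function; the range checks follow the documented intervals.

```python
import calendar


def make(year: int, month: int, day: int,
         hour: int = 0, minute: int = 0, sec: int = 0) -> int:
    """Seconds since the Epoch for a date, interpreted as UTC (assumption)."""
    if year < 1900:
        raise ValueError("year must be 1900 or later")
    bounds = {
        "month": (month, 1, 12),
        "day": (day, 1, 31),
        "hour": (hour, 0, 23),
        "min": (minute, 0, 59),
        "sec": (sec, 0, 59),
    }
    for name, (value, low, high) in bounds.items():
        if not low <= value <= high:
            raise ValueError(f"{name} must be in [{low}:{high}]")
    # calendar.timegm() converts a UTC time tuple into an Epoch timestamp.
    return calendar.timegm((year, month, day, hour, minute, sec))


if __name__ == "__main__":
    print(make(2023, 7, 14, 22, 50, 8))
```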

now()

Compute the number of seconds elapsed since the Epoch (1970-01-01 00:00:00 +0000 (UTC)).

The returned integer can be used to filter timestamps that point to the past or to the future.

Example:

// Detect a forged timestamp

pe.timestamp > time.now()
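The same check can be expressed in Python when triaging samples outside a scan. In this sketch, is_future_timestamp is a hypothetical helper, and the optional now parameter exists only to make the comparison reproducible.

```python
import time
from typing import Optional


def is_future_timestamp(timestamp: int, now: Optional[int] = None) -> bool:
    """Tell whether an Epoch timestamp lies after the current time."""
    # time.time() returns the seconds elapsed since the Epoch as a float.
    current = int(time.time()) if now is None else now
    return timestamp > current


if __name__ == "__main__":
    one_hour_ahead = int(time.time()) + 3600
    print(is_future_timestamp(one_hour_ahead))
    print(is_future_timestamp(0))
```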