DEV IN PROGRESS

Module pychrysalide.format

Documentation

This module contains the basic definitions required for dealing with file formats.

Support for specific formats (such as ELF files for instance) needs extra definitions in a specific module.

Sub modules

Classes

Class BinFormat

The BinFormat class is the major part of binary format support. It is the core class used for loading most binary files.

One item has to be defined as a class attribute in the final class:

  • _endianness: a SourceEndian value indicating the endianness of the format.

Calls to the __init__ constructor of this abstract object expect no particular argument.

Hierarchy

builtins.object
 ╰── gi._gi.GObject
      ╰── pychrysalide.format.KnownFormat
           ╰── pychrysalide.format.BinFormat

Implements: pychrysalide.analysis.storage.SerializableObject

Known subclass: pychrysalide.format.ExeFormat

Methods

add_error(self, type, addr, desc)

Extend the list of detected errors linked to the format.

The type of error has to be one of the BinaryFormatError flags. The location of the error is a vmpa instance and a one-line description should give some details about what has failed.

add_symbol(self, symbol)

Register a new symbol for the format.

The symbol has to be a BinSymbol instance.

find_next_symbol_at(self, addr)

Find the symbol next to the one found at a given address, provided as a vmpa instance.

The result is a BinSymbol instance, or None if no symbol was found.

find_symbol_at(self, addr)

Find the symbol located at a given address, provided as a vmpa instance.

The result is a BinSymbol instance, or None if no symbol was found.

find_symbol_by_label(self, label)

Find the symbol with a given label, provided as a string.

The result is a BinSymbol instance, or None if no symbol was found.
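Because the lookup methods return None rather than raising when nothing matches, callers usually need an explicit check. A minimal sketch of that pattern follows. The helper find_first_by_labels and the fake_format object are hypothetical stand-ins written for illustration, so that the snippet runs without Chrysalide; only the find_symbol_by_label(label) call and its None result come from the description above.

```python
from types import SimpleNamespace


def find_first_by_labels(fmt, labels):
    """Return the first symbol matching one of the labels, or None."""
    for label in labels:
        symbol = fmt.find_symbol_by_label(label)
        # None is the documented "not found" result.
        if symbol is not None:
            return symbol
    return None


# Hypothetical stand-in for a loaded format: a dict lookup plays the
# role of the real symbol table (dict.get returns None when missing).
_table = {"_start": SimpleNamespace(label="_start")}
fake_format = SimpleNamespace(find_symbol_by_label=_table.get)

entry = find_first_by_labels(fake_format, ["main", "_start"])
print(entry.label)
```

With a real BinFormat instance, the same helper should work unchanged, assuming the format has already been analyzed.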

has_flag(self, flag)

Test if a binary format has a given property.

This property is one of the values listed in the FormatFlag enumeration.

The result is a boolean value.

register_code_point(self, point, level)

Register a virtual address as entry point or basic point.

The point is an integer value for the virtual memory location of the new (entry) point. The level of this entry has to be a DisassPriorityLevel value.

remove_symbol(self, symbol)

Unregister a symbol from the format.

The symbol has to be a BinSymbol instance.

resolve_symbol(self, addr, strict)

Search for a position inside a symbol by a given address.

The result is a pair of (BinSymbol, offset) values, or None if no symbol was found. The offset is the distance between the start location of the symbol and the location provided as argument.

If the search is run in strict mode, then the offset is always 0 upon success.
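The (symbol, offset) result lends itself to rendering an address as "label+offset". The sketch below illustrates this, along with the effect of strict mode. FakeFormat is a hypothetical stand-in that uses plain integers as addresses, and describe_location is an illustrative helper; neither is part of pychrysalide. Only the resolve_symbol(addr, strict) signature and its documented result are taken from the text above.

```python
from types import SimpleNamespace


def describe_location(fmt, addr):
    """Render an address as 'label' or 'label+0xoffset', or None."""
    # Non-strict search: the address may fall anywhere inside a symbol.
    found = fmt.resolve_symbol(addr, False)
    if found is None:
        return None
    symbol, offset = found
    if offset == 0:
        return symbol.label
    return "%s+0x%x" % (symbol.label, offset)


class FakeFormat:
    """Hypothetical stand-in; addresses are plain integers, not vmpa."""

    def __init__(self, symbols):
        # Each entry is (start, length, label).
        self._symbols = symbols

    def resolve_symbol(self, addr, strict):
        for start, length, label in self._symbols:
            if strict:
                inside = addr == start
            else:
                inside = start <= addr < start + length
            if inside:
                return SimpleNamespace(label=label), addr - start
        return None


fmt = FakeFormat([(0x1000, 0x20, "main")])
print(describe_location(fmt, 0x1000))     # main
print(describe_location(fmt, 0x1004))     # main+0x4
print(fmt.resolve_symbol(0x1004, True))   # None: strict needs the start
```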

set_flag(self, flag)

Add a property to a binary format.

This property is one of the values listed in the FormatFlag enumeration.

If the flag was not set before the operation, True is returned, else the result is False.

unset_flag(self, flag)

Remove a property from a binary format.

This property is one of the values listed in the FormatFlag enumeration.

If the flag was not set before the operation, False is returned, else the result is True.
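The return values of set_flag and unset_flag both report whether the call actually changed the state. The following pure-Python model is only a sketch of those documented semantics: FlagHolder and SOME_FLAG are hypothetical names, and the real implementation is not shown here.

```python
class FlagHolder:
    """Hypothetical model reproducing the documented return values."""

    def __init__(self):
        self.flags = 0

    def has_flag(self, flag):
        return (self.flags & flag) != 0

    def set_flag(self, flag):
        # True only if the flag was not set before the operation.
        changed = not self.has_flag(flag)
        self.flags |= flag
        return changed

    def unset_flag(self, flag):
        # True only if the flag was set before the operation.
        changed = self.has_flag(flag)
        self.flags &= ~flag
        return changed


SOME_FLAG = 0x1  # illustrative value, not a named FormatFlag member

holder = FlagHolder()
print(holder.set_flag(SOME_FLAG))    # True: state changed
print(holder.set_flag(SOME_FLAG))    # False: already set
print(holder.unset_flag(SOME_FLAG))  # True: state changed
print(holder.unset_flag(SOME_FLAG))  # False: already cleared
```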

Attributes

endianness

Endianness of the format. The return value is of type SourceEndian.

errors

List of all detected errors which occurred while loading the binary.

The result is a tuple of (BinaryFormatError, vmpa, string) values, providing a location and a description for each error.
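Each entry of errors can be unpacked directly into its three parts. A minimal sketch, assuming only the (type, location, description) layout given above; format_errors and fake_format are hypothetical, and the stand-in uses a plain integer and a string where the real attribute provides BinaryFormatError and vmpa values.

```python
from types import SimpleNamespace


def format_errors(fmt):
    """Build one printable line per error recorded for the format."""
    lines = []
    for err_type, addr, desc in fmt.errors:
        lines.append("[%s] %s: %s" % (err_type, addr, desc))
    return lines


# Hypothetical stand-in for a loaded format with a single error.
fake_format = SimpleNamespace(errors=((4, "0x400", "truncated header"),))

for line in format_errors(fake_format):
    print(line)
```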

flags

Provide all the flags set for a format. The return value is of type FormatFlag.

symbols

Iterable list of all symbols found in the binary format.

The returned iterator is a SymIterator instance and remains valid as long as the list from the format does not change.
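Since the iterator is tied to the state of the symbol list, one cautious approach is to copy what is needed before adding or removing symbols. The sketch below shows that pattern; collect_labels and fake_format are hypothetical, and a plain list stands in for the real SymIterator.

```python
from types import SimpleNamespace


def collect_labels(fmt):
    """Snapshot the labels of all symbols in one pass."""
    # The list is built immediately, so later changes to the format's
    # symbols cannot affect an iteration still in progress.
    return [sym.label for sym in fmt.symbols]


# Hypothetical stand-in for a loaded format.
fake_format = SimpleNamespace(
    symbols=[SimpleNamespace(label="main"), SimpleNamespace(label="helper")]
)

print(collect_labels(fake_format))
```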

Constants

BinaryFormatError

Flags for errors occurring while loading a binary format.

4= 4

FormatFlag

Extra indications for formats.

1= 0x1

Class BinSymbol

BinSymbol represents all kinds of symbols, such as strings, routines or objects. If something can be linked to a physical or virtual location, it can be a symbol.

Instances can be created using the following constructor:

    BinSymbol(range, stype)

Where range is a memory space defined by mrange and stype a SymbolType value.

The following methods have to be defined for new classes:

The object can be compared using rich methods (like <= or !=).

Hierarchy

builtins.object
 ╰── gi._gi.GObject
      ╰── pychrysalide.format.BinSymbol

Implements:

Known subclasses:

Methods

_get_label(self)

Abstract method used to provide the default label for a symbol.

The returned value has to be a string.

has_flag(self, flag)

Test if a binary symbol has a given property.

This property is one of the values listed in the SymbolFlag enumeration.

The result is a boolean value.

set_flag(self, flag)

Add a property to a binary symbol.

This property is one of the values listed in the SymbolFlag enumeration.

If the flag was not set before the operation, True is returned, else the result is False.

unset_flag(self, flag)

Remove a property from a binary symbol.

This property is one of the values listed in the SymbolFlag enumeration.

If the flag was not set before the operation, False is returned, else the result is True.

Attributes

flags

Provide all the flags set for a symbol. The return value is of type SymbolFlag.

label

Label of the symbol, provided by the internal component or by the user.

nm_prefix

Single-byte string for an optional nm prefix, or None if there is none.

range

Memory range covered by the symbol.

This property is a mrange instance.

status

Status of the symbol's visibility, as a value of type SymbolStatus.

stype

Type of the current symbol, as a value of type SymbolType.

Constants

SymbolFlag

Extra indications for symbols.

1= 0x1

SymbolStatus

Status of a symbol's visibility.

0= 0
1= 1
2= 2
3= 3

SymbolType

Available values for symbol types.

0= 0
1= 1
2= 2
3= 3
4= 4
5= 5
6= 6
7= 7

Class ExeFormat

PyChrysalide executable format

Hierarchy

builtins.object
 ╰── gi._gi.GObject
      ╰── pychrysalide.format.KnownFormat
           ╰── pychrysalide.format.BinFormat
                ╰── pychrysalide.format.ExeFormat

Implements: pychrysalide.analysis.storage.SerializableObject

Known subclasses:

Methods

register_user_portion(self, portion)

Remember a given user-defined binary portion as part of the executable format content.

translate_address_into_vmpa(self, addr)

Translate a virtual address to a full location.

translate_offset_into_vmpa(self, off)

Translate a physical offset to a full location.

Class FlatFormat

FlatFormat is suitable for all executable contents without a proper file format, such as shellcodes or eBPF programs.

Instances can be created using the following constructor:

    FlatFormat(content, machine, endian)

Where content is a BinContent object, machine defines the target architecture and endian provides the right endianness of the data.

Hierarchy

builtins.object
 ╰── gi._gi.GObject
      ╰── pychrysalide.format.KnownFormat
           ╰── pychrysalide.format.BinFormat
                ╰── pychrysalide.format.ExeFormat
                     ╰── pychrysalide.format.FlatFormat

Implements: pychrysalide.analysis.storage.SerializableObject

Class KnownFormat

KnownFormat is a small class providing basic features for recognized formats.

One item has to be defined as a class attribute in the final class:

  • _key: a string providing a small name used to identify the format.

The following methods have to be defined for new classes:

The following method may also be defined for new classes:

Calls to the __init__ constructor of this abstract object expect only one argument: a binary content, provided as a BinContent instance.

Hierarchy

builtins.object
 ╰── gi._gi.GObject
      ╰── pychrysalide.format.KnownFormat

Implements: pychrysalide.analysis.storage.SerializableObject

Known subclasses:

Methods

_analyze(self, gid, status)

Abstract method used to start the analysis of the known format and return its status.

The identifier refers to the working queue used to process the analysis. A reference to the main status bar may also be provided, as a StatusStack instance if running in graphical mode or None otherwise.

The expected result of the call is a boolean.

_complete_analysis(self, gid, status)

Abstract method used to complete an analysis of a known format.

The identifier refers to the working queue used to process the analysis. A reference to the main status bar may also be provided, as a StatusStack instance if running in graphical mode or None otherwise.

_get_description(self)

Abstract method used to build a description of the format.

The result is expected to be a string.

analyze(self, gid, status)

Start the analysis of the known format and return its status.

Once this analysis is done, a few early symbols and the mapped sections are expected to be defined, if any.

The identifier refers to the working queue used to process the analysis. A reference to the main status bar may also be provided, as a StatusStack instance if running in graphical mode or None otherwise.

The return value is a boolean status of the operation.

complete_analysis(self, gid, status)

Complete an analysis of a known format.

This process is usually done once the disassembling process is completed.

The identifier refers to the working queue used to process the analysis. A reference to the main status bar may also be provided, as a StatusStack instance if running in graphical mode or None otherwise.

The return value is a boolean status of the operation.

Attributes

content

Binary content linked to the known format.

The result is a BinContent instance.

description

Human description of the known format, as a string.

key

Internal name of the known format, provided as a (tiny) string.
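The relationship between the _key class attribute, the abstract methods and the public attributes can be sketched with a small pure-Python model. Everything below is hypothetical and written for illustration: KnownFormatModel is not the real KnownFormat, the content is a plain bytes object rather than a BinContent instance, and the assumption that analyze() and description delegate to _analyze() and _get_description() is inferred from the naming only.

```python
class KnownFormatModel:
    """Hypothetical pure-Python model of KnownFormat; not the real class."""

    _key = None  # to be overridden as a class attribute

    def __init__(self, content):
        self._content = content

    @property
    def content(self):
        return self._content

    @property
    def key(self):
        return self._key

    @property
    def description(self):
        # Assumption: the attribute relies on the abstract method.
        return self._get_description()

    def analyze(self, gid, status):
        # Assumption: the public method relies on the abstract one.
        return self._analyze(gid, status)

    def _analyze(self, gid, status):
        raise NotImplementedError

    def _get_description(self):
        raise NotImplementedError


class RawBlobFormat(KnownFormatModel):
    """Illustrative final class for an unstructured blob."""

    _key = "rawblob"

    def _get_description(self):
        return "Raw blob (%d bytes)" % len(self.content)

    def _analyze(self, gid, status):
        # status may be None when not running in graphical mode.
        return len(self.content) > 0


fmt = RawBlobFormat(b"\x90\x90\xc3")
print(fmt.key, "-", fmt.description)
print(fmt.analyze(0, None))
```

A real subclass would derive from pychrysalide.format.KnownFormat instead of the model and receive a BinContent instance in its constructor.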

Class PreloadInfo

The PreloadInfo object stores all kinds of disassembling information available from the analysis of a file format itself.

Instances can be created using the following constructor:

    PreloadInfo()

Hierarchy

builtins.object
 ╰── gi._gi.GObject
      ╰── pychrysalide.format.PreloadInfo

Known subclass: pychrysalide.arch.ProcContext

Class StrSymbol

StrSymbol is a special symbol object dedicated to strings.

Instances can be created using one of the following constructors:

    StrSymbol(encoding, format=KnownFormat, range=mrange)
    StrSymbol(encoding, string=string, addr=vmpa)

The first constructor is intended for read-only strings available from the raw data of the analyzed binary. The format provides the raw content, and the memory range specifies the location of the string.

The second constructor is useful for strings which cannot be extracted directly from the original content, such as obfuscated strings. A dynamic string is then provided here, along with the start point of this string.

In both cases, the encoding remains the first argument, as a StringEncodingType value.

Hierarchy

builtins.object
 ╰── gi._gi.GObject
      ╰── pychrysalide.format.BinSymbol
           ╰── pychrysalide.format.StrSymbol

Implements:

Attributes

encoding

Encoding of the string, provided as a StringEncodingType value.

raw

Raw data of the string, provided as bytes.

structural

True if the string symbol is linked to the file structure, else False.

utf8

String content as UTF-8 data.

Constants

StringEncodingType

Kinds of encoding for strings.

0= 0
1= 1
2= 2
3= 3
4= 4

Class SymIterator

Iterator for Chrysalide symbols registered in a given format.

This iterator is built when accessing the symbols field.

Hierarchy

builtins.object
 ╰── pychrysalide.format.SymIterator

Methods

__iter__(self)

Implement iter(self).

__next__(self)

Implement next(self).