DEV IN PROGRESS

Python tutorial

There are two ways to use Chrysalide from Python:

  • as a standalone extension from the Python interpreter.
  • as an embedded extension when running the GUI.

Here are some basic steps to introduce both of these usages.

Loading a binary file

First case: known format

If the binary format is known and fixed, the loading process is straightforward:

1
2
3
4
5
6
from pychrysalide.features import *

cnt = FileContent('/path/to/binary_file')
fmt = ElfFormat(cnt)
binary = LoadedBinary(fmt)
binary.analyze_and_wait()
Here is a detailed explanation:
  • line 1: import all items available from Chrysalide bindings.
    For a more selective loading, the following lines import only used module items.
    1
    2
    3
    from pychrysalide.analysis.contents import FileContent
    from pychrysalide.analysis import LoadedBinary
    from pychrysalide.format.elf import ElfFormat
    
  • line 3: load content from a binary file.
    Other sources could be memory, or content encapsulated in another content.
  • line 4: set up a matching file format.
    ElfFormat could have been replaced by DexFormat depending on the case.
  • line 5: create an abstract layer to deal with all high level analysis requests.
    The related ArchProcessor object can be retrieved from this loaded binary when needed.
  • line 6: analysis of file format and code instructions starts here.
    Execution flow will wait for the end of the analysis.
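Line 4 assumes the format is known in advance. When it is not obvious whether ElfFormat or DexFormat applies, the magic bytes at the start of the file can be checked first. The following is a minimal sketch in plain Python: guess_format_name and guess_format_name_from_file are hypothetical helpers, not part of the Chrysalide bindings, and they only return the name of the class to use.

```python
# Hypothetical helpers (not part of pychrysalide): choose a format class
# name from the first bytes of a file.

ELF_MAGIC = b'\x7fELF'   # every ELF file starts with these four bytes
DEX_MAGIC = b'dex\n'     # Dalvik executables start with 'dex\n' + version


def guess_format_name(header):
    """Return 'ElfFormat', 'DexFormat' or None for unknown content."""

    if header.startswith(ELF_MAGIC):
        return 'ElfFormat'

    if header.startswith(DEX_MAGIC):
        return 'DexFormat'

    return None


def guess_format_name_from_file(path):
    """Read only the first bytes of the file and guess its format."""

    with open(path, 'rb') as fd:
        return guess_format_name(fd.read(8))
```

When neither signature matches, the function returns None and the format has to be resolved some other way.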

Base for other cases

Here is the general process to load binaries, which remains quite simple:

1
2
3
4
5
6
7
8
9
from pychrysalide.features import *

prj = StudyProject()

cnt = FileContent('/path/to/binary_file')
prj.discover(cnt)
wait_for_all_global_works()

binary = prj.contents[0]
Once again some explanation:
  • line 3: create a new, empty project, which acts as a placeholder for the loaded binary.
  • line 5: load content from a binary file.
    Other sources could be memory, or content encapsulated in another content.
  • line 6: two processes are launched at once:
    • one to explore the provided binary and its inner contents (for targets like APK files).
    • another one to resolve discovered contents (to match a given file format such as ELF for instance).
  • line 7: we must wait for the end of these two processes before dealing with loaded binary contents.
  • line 9: for a simple file format without inner binaries, only one format is resolved and only one binary content is loaded and analyzed.

Working with binaries

Browsing symbols

Strings are a special kind of symbol, which can be handled with code like:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
for s in binary.format.symbols:

    if s.stype not in (SymbolType.RO_STRING, SymbolType.DYN_STRING):
        continue

    print('0x%04x - %s' % (s.range.addr.phys, s.label))

    origin = binary.content.read_raw(s.range.addr, s.range.length)

    print('  -> origin:', origin)
    print('  -> raw:', s.raw)
    print('  -> utf8:', s.utf8)

    assert s.stype == SymbolType.DYN_STRING or s.utf8 == origin.decode('utf-8')
A few comments:
  • line 3: even though it belongs to the BinSymbol class, the SymbolType enumeration can be accessed directly here, thanks to the initial import of all the features.
    RO_STRING is the type of strings loaded directly from the original binary content.
    DYN_STRING marks strings rebuilt during analysis.
  • line 8: if the string has been encrypted using an XOR-like algorithm, the original source data can still be read from the binary content.
  • line 11: s.raw provides the raw content of the current string value.
    If the string was decrypted by Chrysalide, then the displayed bytes are the plain text values.
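To make the difference between the original bytes and the decoded value concrete, here is a small sketch in plain Python. It does not call Chrysalide: xor_bytes and the 0x5a key are assumptions chosen for illustration, standing in for whatever scheme a real binary uses.

```python
# Illustration only: a single-byte XOR stands in for an "XOR-like" scheme.
# xor_bytes and KEY are hypothetical; they are not part of pychrysalide.

KEY = 0x5a


def xor_bytes(data, key):
    """XOR every byte of data with a one-byte key."""
    return bytes(b ^ key for b in data)


plain = 'hello'.encode('utf-8')

# What would sit in the file, comparable to the 'origin' bytes above.
origin = xor_bytes(plain, KEY)

# Applying the same XOR again restores the plain text, comparable to
# what s.raw shows once a string has been decrypted.
raw = xor_bytes(origin, KEY)

print('  -> origin:', origin)
print('  -> raw:', raw)
print('  -> utf8:', raw.decode('utf-8'))
```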

Analysing instructions

Disassembled instructions can be accessed using the processor attribute of loaded binaries, but also from basic blocks:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
biggest = None

for s in binary.format.symbols:

    if s.stype != SymbolType.ROUTINE:
        continue

    if biggest is None or s.basic_blocks.count > biggest.basic_blocks.count:
        biggest = s


def show_block(blk, grp):

    first, last = blk.boundaries

    print('Block @ 0x%x: %s - %s' \
          % (first.range.addr.phys, first.keyword, last.keyword), end='')

    for db, dt in blk.destinations:

        desc = str(dt)

        # desc is for instance: 'InstructionLinkType.JUMP_IF_TRUE'
        desc = desc[desc.find('.') + 1:]

        print('  |-> 0x%x (%s)' % (db.boundaries[0].range.addr.phys, desc), end='')

    print()


for bb in biggest.basic_blocks:
    show_block(bb, biggest.basic_blocks)
Here are the key points:
  • line 1: basic blocks are stored in the routine owning them.
    This first loop only looks for the biggest routine, according to the number of basic blocks.
  • line 8: basic_blocks is an iterable object, and its count attribute gives the number of blocks directly.
  • line 14: block boundaries refer to a start instruction and an ending instruction.
  • line 16: instructions are classes with some attributes such as keywords and locations.
  • line 19: for sources as well as destinations, each link between code blocks is a pair made of the linked block and the link type.
  • line 26: each instruction link type (such as JUMP_IF_TRUE for instance) is defined using a constant value.
    For Python these values are exported in the ArchInstruction class using the InstructionLinkType enumeration.
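The slicing on line 24 relies on how Python renders enumeration members as strings. The sketch below reproduces it with a stand-in enumeration from the standard enum module. LinkKind is hypothetical and its member names other than JUMP_IF_TRUE are placeholders; whether InstructionLinkType members also expose a name attribute is an assumption to check against the reference documentation.

```python
from enum import Enum


class LinkKind(Enum):
    """Stand-in for InstructionLinkType (values are arbitrary here)."""
    PLACEHOLDER_A = 0
    JUMP_IF_TRUE = 1
    PLACEHOLDER_B = 2


def short_name(member):
    """Strip the class prefix, as done in show_block()."""
    desc = str(member)                 # 'LinkKind.JUMP_IF_TRUE'
    return desc[desc.find('.') + 1:]   # 'JUMP_IF_TRUE'


kind = LinkKind.JUMP_IF_TRUE

print(short_name(kind))
print(kind.name)   # standard Enum members carry the bare name directly
```

With a standard Enum, the name attribute avoids the string manipulation altogether.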

GUI integration

Custom panel

There are plenty of panels in the GUI main window.

Creating a new plugin is a way to build a new kind of panel. The first step is to set up a directory for this new plugin:

1
2
mkdir hellopanel
echo "from hellopanel.core import HelloPlugin as AutoLoad" > hellopanel/__init__.py

The directory containing the hellopanel directory has to be in $PYTHONPATH.

At startup, Chrysalide will load the HelloPlugin class as a plugin because AutoLoad is an alias for it.
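The aliasing itself is plain Python and can be checked without Chrysalide. The sketch below builds a throwaway package with the same layout and confirms that AutoLoad and the plugin class are one and the same object. The package name hellopanel_demo and the empty stub class are placeholders used only for this check.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package mirroring the layout above. The stub class
# replaces the real plugin so that pychrysalide is not required.
root = tempfile.mkdtemp()
pkg = os.path.join(root, 'hellopanel_demo')
os.mkdir(pkg)

with open(os.path.join(pkg, '__init__.py'), 'w') as fd:
    fd.write('from hellopanel_demo.core import HelloPlugin as AutoLoad\n')

with open(os.path.join(pkg, 'core.py'), 'w') as fd:
    fd.write('class HelloPlugin:\n    pass\n')

# Same effect as adding the parent directory to $PYTHONPATH.
sys.path.insert(0, root)
importlib.invalidate_caches()

module = importlib.import_module('hellopanel_demo')
core = importlib.import_module('hellopanel_demo.core')

# AutoLoad is only another name bound to the HelloPlugin class.
same_object = module.AutoLoad is core.HelloPlugin
print('AutoLoad is HelloPlugin:', same_object)
```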

The content of the hellopanel/core.py file is as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
from pychrysalide.features import *
from .panel import HelloPanel

class HelloPlugin(PluginModule):
    """Simple demo plugin to build a GUI panel."""

    def __init__(self):
        """Initialize the plugin for Chrysalide."""

        interface = {

            'name' : 'HelloPanel',
            'desc' : 'Say hello in the main GUI',
            'version' : '0.1',

            'actions' : ( )

        }

        super(HelloPlugin, self).__init__(**interface)

        p = HelloPanel()

        register_panel(p)
  • line 2: import the class providing the widget used for the panel.
  • line 10: a dictionary is defined to provide the plugin description.
  • line 20: the call to super().__init__() actually builds the new object.
  • line 24: the built panel is registered as a legit panel for Chrysalide.

The last file is hellopanel/panel.py; its content is:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
from gi.repository import Gtk
from pychrysalide.features import *

DEFAULT_MSG = 'No active loaded binary'

class HelloPanel(PanelItem):

    def __init__(self):
        """Initialize the GUI panel."""

        self._label = Gtk.Label()
        self._label.set_text(DEFAULT_MSG)

        params = {

            'name' : 'Hello',
            'widget' : self._label,

            'personality' : PanelItem.PIP_SINGLETON,
            'lname' : 'Hello panel description',
            'dock' : True,
            'path' : 'MEN'

        }

        super(HelloPanel, self).__init__(**params)

    def _change_content(self, old, new):
        """Get notified about loaded content change."""

        if isinstance(new, LoadedBinary):

            count = len(list(new.processor.instrs))

            self._label.set_text('Loaded binary with %u instructions' % count)

        else:

            self._label.set_text(DEFAULT_MSG)

A few last words:

  • line 6: the panel has to be a subclass of PanelItem.
  • line 26: some properties are transmitted to the parent constructor:
    • name: label for the GUI tab.
    • widget: GTK widget for the panel.
    • personality: defines how many panels can be created at the same time.
    • lname: long description for tooltips.
    • dock: True if the panel should be displayed at startup.
    • path: location of the panel in the tiled grid (M = Main area, E = East, N = North, and so on).
      In this case, the panel will be docked at the upper right corner of the main window.
  • line 28: if defined, the _change_content method will be called each time a new content becomes active in the GUI.
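As a reading aid for the path value, the sketch below spells out each letter of 'MEN'. describe_path and the AREAS table are hypothetical helpers, not part of the Chrysalide API, and only the letters named above are covered.

```python
# Only the three letters documented above are mapped; any other letter
# is reported as unknown rather than guessed.
AREAS = {
    'M': 'Main area',
    'E': 'East',
    'N': 'North',
}


def describe_path(path):
    """Translate a panel path such as 'MEN' into readable area names."""
    return [AREAS.get(letter, 'unknown (%s)' % letter) for letter in path]


print(' > '.join(describe_path('MEN')))
```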

Other resources

This tutorial does not cover the whole Python API yet.

The reference documentation provides information about all available features.

The snippets repository contains more advanced examples. Some real-world plugins are also defined in Chrysalide's source code.