There are two ways to use Chrysalide from Python:
- as a standalone extension from the Python interpreter.
- as an embedded extension when running the GUI.
Here are some basic steps to introduce both of these usages.
Loading a binary file
First case: known format
If the binary format is known and fixed, the loading process is straightforward:
```python
from pychrysalide.features import *

cnt = FileContent('/path/to/binary_file')

fmt = ElfFormat(cnt)

binary = LoadedBinary(fmt)

binary.analyze_and_wait()
```
- Line 1: import all items available from the Chrysalide bindings. For a more selective loading, the following lines import only the module items actually used:

```python
from pychrysalide.analysis.contents import FileContent
from pychrysalide.analysis import LoadedBinary
from pychrysalide.format.elf import ElfFormat
```

- Line 3: load content from a binary file. Other sources could have been memory or content encapsulated in another content.
- Line 5: set up a matching file format. `ElfFormat` could have been replaced by `DexFormat` depending on the case.
- Line 7: create an abstract layer to deal with all high-level analysis requests. The related `ArchProcessor` object can be retrieved from this loaded binary (through its `processor` attribute) if requested.
- Line 9: the analysis of the file format and of the code instructions starts here. The execution flow waits for the end of the analysis.
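Whether `ElfFormat` or `DexFormat` fits can be decided from the first bytes of the content. The sketch below is a hypothetical helper, not part of pychrysalide: it only relies on the standard ELF signature (`\x7fELF`) and DEX signature (`dex\n`), and it returns the *name* of the class to use rather than building the format itself.

```python
# Standard leading signatures ("magic numbers") of both formats.
ELF_MAGIC = b'\x7fELF'
DEX_MAGIC = b'dex\n'


def guess_format_name(header):
    """Return the name of the format class matching the leading bytes, or None."""
    if header.startswith(ELF_MAGIC):
        return 'ElfFormat'
    if header.startswith(DEX_MAGIC):
        return 'DexFormat'
    return None


def guess_format_name_from_file(path):
    """Read the first bytes of a file and guess its format class name."""
    with open(path, 'rb') as f:
        return guess_format_name(f.read(8))
```

With pychrysalide, the returned name could then select the class to instantiate, for instance `{'ElfFormat': ElfFormat, 'DexFormat': DexFormat}[name](cnt)`. The `StudyProject` approach presented next lets Chrysalide perform this resolution by itself.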
Base for other cases
Here is the general process to load binaries, which remains quite simple:
```python
from pychrysalide.features import *

prj = StudyProject()

cnt = FileContent('/path/to/binary_file')

prj.discover(cnt)

wait_for_all_global_works()

binary = prj.contents[0]
```
- Line 3: create a new, empty project, which will hold the loaded binary.
- Line 5: load content from a binary file. Other sources could have been memory or content encapsulated in another content.
- Line 7: two processes are launched at once:
  - one to explore the provided binary and its inner contents (for targets like APK files);
  - another one to resolve the discovered contents (to match a given file format such as ELF, for instance).
- Line 9: we must wait for the end of these two processes before dealing with the loaded binary contents.
- Line 11: for a simple file format without inner binaries, only one format is resolved and only one binary content is loaded and analyzed.
Working with binaries
Browsing symbols
Strings are a special kind of symbol; they can be handled with code such as:
```python
for s in binary.format.symbols:

    if s.target_type not in (BinSymbol.STP_RO_STRING, BinSymbol.STP_DYN_STRING):
        continue

    print('0x%04x - %s' % (s.range.addr.phys, s.label))

    origin = binary.content.read_raw(s.range.addr, s.range.length)
    print(' -> origin:', origin)

    # Values provided by the symbol itself (no trailing NUL byte).
    print(' -> raw:', s.raw)

    print(' -> utf8:', s.utf8)
    # Read-only strings match their origin, minus its trailing NUL byte.
    assert s.target_type == BinSymbol.STP_DYN_STRING or s.utf8 == origin.decode('utf-8')[:-1]
```
- Line 3: `STP_RO_STRING` is the type of strings loaded directly from the original binary content; `STP_DYN_STRING` marks strings rebuilt during the analysis.
- Line 8: if the string has been encrypted using a XOR-like algorithm, the original (encrypted) source data can still be read from the binary content.
- Line 12: `s.raw` provides the raw content of the current string value, without the trailing NUL byte. If the string was decrypted by Chrysalide, the displayed bytes are the plain-text values.
- Line 14: same thing for the UTF-8 string value: no NUL byte here either!
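The condition checked by the final `assert` can be reproduced without Chrysalide, using made-up data. In the sketch below, the bytes, the single-byte XOR key and the `xor_bytes()` and `matches_origin()` helpers are all hypothetical; Chrysalide's actual decryption logic is not shown. A read-only string keeps its trailing NUL byte in the content, which `[:-1]` drops before the comparison, whereas a decrypted string no longer matches its origin bytes: this is why `STP_DYN_STRING` symbols are excluded from the check.

```python
def xor_bytes(data, key):
    """Apply a single-byte XOR key to every byte (hypothetical 'xor-like' scheme)."""
    return bytes(b ^ key for b in data)


def matches_origin(is_dynamic, origin, utf8):
    """Mirror of the tutorial's assert: only read-only strings must match their origin."""
    return is_dynamic or utf8 == origin.decode('utf-8')[:-1]


# Read-only string: the content stores the text followed by a NUL byte.
ro_origin = b'hello\x00'
ro_utf8 = 'hello'  # value exposed without the trailing NUL byte

# Dynamic string (assumption: only the text is encrypted, the NUL byte is kept as is).
KEY = 0x5a
dyn_origin = xor_bytes(b'secret', KEY) + b'\x00'
dyn_raw = xor_bytes(dyn_origin[:-1], KEY)  # decrypted plain text, no NUL byte
dyn_utf8 = dyn_raw.decode('utf-8')

print(matches_origin(False, ro_origin, ro_utf8))    # True
print(matches_origin(False, dyn_origin, dyn_utf8))  # False: origin holds encrypted bytes
print(matches_origin(True, dyn_origin, dyn_utf8))   # True, as in the tutorial's assert
```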
Analysing instructions
Disassembled instructions can be accessed using the `processor` attribute of loaded binaries, but also from basic blocks:
```python
biggest = None

for s in binary.format.symbols:
    # Skip every symbol which is not a routine.
    if s.target_type != BinSymbol.STP_ROUTINE:
        continue

    if biggest is None or s.basic_blocks.count > biggest.basic_blocks.count:
        biggest = s


def link_type_to_str(t):
    # Every ILT_* constant exported by ArchInstruction.
    links = [getattr(ArchInstruction, a) for a in dir(ArchInstruction) if a.startswith('ILT_')]
    # Translate t into the matching object, then drop the 'ILT_' prefix.
    return str(links[links.index(t)])[4:]


def show_block(blk, grp):
    # First and last instructions of the block.
    first, last = blk.boundaries

    print('Block @ 0x%x: %s - %s' % (first.range.addr.phys, first.keyword, last.keyword), end='')
    # Each destination is a (block, link type) pair.
    for db, dt in blk.destinations:
        print(' |-> 0x%x (%s)' % (db.boundaries[0].range.addr.phys, link_type_to_str(dt)), end='')

    print()


for bb in biggest.basic_blocks:
    show_block(bb, biggest.basic_blocks)
```
- Line 1: basic blocks are stored in the routine owning them. This first loop only looks for the biggest routine, according to its number of basic blocks.
- Line 8: `basic_blocks` is an iterable object, which also provides a useful `count` attribute giving the number of blocks directly.
- Line 14: each instruction link type (such as `ILT_JUMP_IF_TRUE`) is defined as a constant value in the C code. In Python, these values are exported in the `ArchInstruction` class under the same names.
- Line 16: instruction link types are special number objects, which hold a value and support the `str()` function. To get a link description from a plain number, this number first has to be translated into the matching special number object.
- Line 21: block boundaries are the start instruction and the ending instruction of the block.
- Line 23: instructions are objects with attributes such as keywords and locations.
- Line 25: for both sources and destinations, a link between code blocks is a pair made of the linked block and the link type.
- Line 26: as link types are more likely to be processed than printed, their value is a plain integer. Translating this value into a human-readable string therefore requires some extra processing, which is the job of `link_type_to_str()`.
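The translation step described for lines 16 and 26 can be modelled with the standard `enum.IntEnum` type, whose members compare equal to plain integers while also carrying a name. This is only an analogy: `LinkType` is a hypothetical stand-in for the `ILT_*` constants of `ArchInstruction` (only `ILT_JUMP_IF_TRUE` comes from the text above; the other names and all values are made up). `LinkType(t)` plays the role of `links[links.index(t)]`, and the sketch reads the member's `name` because `str()` on an `IntEnum` member does not return the bare constant name.

```python
from enum import IntEnum


class LinkType(IntEnum):
    """Hypothetical stand-in for the ILT_* constants (values are made up)."""
    ILT_JUMP = 1
    ILT_JUMP_IF_TRUE = 2
    ILT_JUMP_IF_FALSE = 3


def link_type_to_str(t):
    """Translate a plain integer into a readable link type name."""
    # Step 1: turn the number into the matching special object
    # (raises ValueError for an unknown value).
    member = LinkType(t)
    # Step 2: use its name, without the 'ILT_' prefix.
    return member.name[4:]


dt = 2  # what a destination pair would carry: a plain integer
print(dt == LinkType.ILT_JUMP_IF_TRUE)  # True: IntEnum members compare as integers
print(link_type_to_str(dt))             # JUMP_IF_TRUE
```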
GUI integration
Custom panel
There are plenty of panels in the GUI main window.
Creating a new plugin is a way to build a new kind of panel. The first step is to set up a directory for this new plugin:
```shell
mkdir hellopanel
echo "from hellopanel.core import HelloPlugin as AutoLoad" > hellopanel/__init__.py
```
The directory containing the `hellopanel` directory has to be listed in `$PYTHONPATH`. At startup, Chrysalide will load the `HelloPlugin` class as a plugin because `AutoLoad` is an alias for it.
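The `AutoLoad` convention relies on plain Python imports: the package's `__init__.py` exposes the plugin class under that name. The sketch below is a simplified, hypothetical model of such a lookup; Chrysalide's actual loader is part of its own source code and may work differently. It builds a throwaway `demopanel` package (a made-up name, with a made-up `DemoPlugin` class) in a temporary directory instead of touching `hellopanel`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def find_autoload(package_name):
    """Import a package and return the object it exposes as AutoLoad, if any."""
    module = importlib.import_module(package_name)
    return getattr(module, 'AutoLoad', None)


# Build a throwaway package laid out like hellopanel.
root = Path(tempfile.mkdtemp())
package = root / 'demopanel'
package.mkdir()
(package / 'core.py').write_text('class DemoPlugin:\n    pass\n')
(package / '__init__.py').write_text(
    'from demopanel.core import DemoPlugin as AutoLoad\n')

# Same effect as listing the parent directory in $PYTHONPATH.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

plugin_class = find_autoload('demopanel')
print(plugin_class.__name__)  # DemoPlugin
```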
The content of the `hellopanel/core.py` file is the following:
```python
from pychrysalide.features import *
from .panel import HelloPanel


class HelloPlugin(PluginModule):
    """Simple demo plugin to build a GUI panel."""


    def __init__(self):
        """Initialize the plugin for Chrysalide."""

        interface = {
            'name': 'HelloPanel',
            'desc': 'Say hello in the main GUI',
            'version': '0.1',
            'actions': ()
        }

        # Hand the description over to the parent constructor:
        # this call is what actually builds the plugin object, and
        # the keys above become its keyword arguments.
        super().__init__(**interface)

        # Create the panel and declare it to Chrysalide.
        p = HelloPanel()
        register_panel(p)
```
- Line 2: import the class providing the widget used for the panel.
- Line 12: a dictionary is defined to provide the plugin description.
- Line 22: the call to `super().__init__()` actually builds the new object.
- Line 26: the newly built panel is registered as a legitimate panel for Chrysalide.
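The `**interface` syntax on line 22 unpacks the description dictionary into keyword arguments, so every key has to match a keyword accepted by the parent constructor. The sketch below illustrates this with `FakePluginModule`, a hypothetical stand-in whose parameters simply mirror the tutorial's keys; whether the real `PluginModule` rejects unknown keys in the same way is an assumption not covered here.

```python
class FakePluginModule:
    """Hypothetical stand-in for PluginModule; its keyword parameters simply
    mirror the keys of the tutorial's description dictionary."""

    def __init__(self, name, desc, version, actions):
        self.name = name
        self.desc = desc
        self.version = version
        self.actions = actions


class DemoPlugin(FakePluginModule):

    def __init__(self):
        interface = {
            'name': 'HelloPanel',
            'desc': 'Say hello in the main GUI',
            'version': '0.1',
            'actions': (),
        }
        # Same as super().__init__(name='HelloPanel', desc=..., version=..., actions=())
        super().__init__(**interface)


def accepts(description):
    """Return True if the keys exactly match the constructor keywords."""
    try:
        FakePluginModule(**description)
    except TypeError:
        return False
    return True


plugin = DemoPlugin()
print(plugin.name, plugin.version)  # HelloPanel 0.1
print(accepts({'name': 'x', 'descr': 'typo', 'version': '0.1', 'actions': ()}))  # False
```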
The last file is `hellopanel/panel.py`; its content is:
```python
from gi.repository import Gtk
from pychrysalide.features import *


DEFAULT_MSG = 'No active loaded binary'


class HelloPanel(PanelItem):

    def __init__(self):
        """Initialize the GUI panel."""

        self._label = Gtk.Label()
        self._label.set_text(DEFAULT_MSG)

        params = {
            'name': 'Hello',
            'widget': self._label,
            'personality': PanelItem.PIP_SINGLETON,
            'lname': 'Hello panel description',
            'dock': True,
            'path': 'MEN'
        }

        # Properties are transmitted to the parent constructor; with
        # 'path' set to 'MEN' (M = main area, E = East, N = North), the
        # panel ends up docked at the upper right of the main window.
        super().__init__(**params)


    def _change_content(self, old, new):
        """Get notified about loaded content change."""

        if isinstance(new, LoadedBinary):
            count = len(list(new.processor.instrs))
            self._label.set_text('Loaded binary with %d instructions' % count)
        else:
            self._label.set_text(DEFAULT_MSG)
```
A few last words:
- Line 8: the panel has to be a subclass of the `PanelItem` class.
- Line 28: some properties are transmitted to the parent constructor:
  - name: label for the GUI tab.
  - widget: GTK widget for the panel.
  - personality: defines how many panels can be created at the same time (`PIP_SINGLETON` allows a single one).
  - lname: long description for tooltips.
  - dock: True if the panel should be displayed at startup.
  - path: location of the panel in the tiled grid (M = main area, E = East, N = North, and so on).
- Line 31: if defined, the `_change_content` method will be called each time a new content becomes active in the GUI.

With the `'MEN'` path used here, the panel is docked at the upper right corner of the main window.
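Since `_change_content` mixes GTK calls with the actual logic, one option is to move the message computation into a plain function that can be exercised without the GUI. The sketch below is a suggestion rather than part of the tutorial's plugin: `describe_content()` is a hypothetical helper, `FakeBinary` and `FakeProcessor` are hypothetical stand-ins for `LoadedBinary` and its `processor`, and the instructions are counted with `sum()` so that no intermediate list is built.

```python
DEFAULT_MSG = 'No active loaded binary'


def describe_content(content, binary_type):
    """Return the text to display for the newly active content."""
    if isinstance(content, binary_type):
        # Count instructions without materializing a list.
        count = sum(1 for _ in content.processor.instrs)
        return 'Loaded binary with %d instructions' % count
    return DEFAULT_MSG


class FakeProcessor:
    """Hypothetical stand-in exposing an iterable of instructions."""

    def __init__(self, instrs):
        self.instrs = instrs


class FakeBinary:
    """Hypothetical stand-in for LoadedBinary."""

    def __init__(self, instrs):
        self.processor = FakeProcessor(instrs)


print(describe_content(FakeBinary(['nop', 'ret']), FakeBinary))  # Loaded binary with 2 instructions
print(describe_content(None, FakeBinary))                        # No active loaded binary
```

Inside `HelloPanel._change_content`, the body could then reduce to `self._label.set_text(describe_content(new, LoadedBinary))`.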
Other resources
This tutorial does not cover the whole Python API yet.
The reference documentation provides information about all available features.
The snippets repository contains more advanced examples, and some real-case plugins are also defined in Chrysalide's source code.