DEV IN PROGRESS

More than one year has passed since the last blog post.

No news is good news.

A lot of improvements have been committed, as the statistics show:

git diff --stat 3d2576f..HEAD | tail -1
 1818 files changed, 62736 insertions(+), 68424 deletions(-)

A small Python script has also been created to plot the development activity for 2018:

The number of past and upcoming changes is huge, so here is a quick summary of three major ones.

The following article is based on commit ce43a13d, so you can give this version of Chrysalide a try by running:

git clone http://git.0xdeadc0de.fr/chrysalide.git
cd chrysalide
git checkout ce43a13d

As usual, the next step is to follow the installation procedure.

Instruction definition enhancement

The first major change in the definition file format is the new @assert tag. It allows alternative syntaxes to be specified for a given encoding.

For instance, in the ARM architecture, the S bit specifies whether the instruction sets the CPU flags or not. Thus there may be "add" and "adds" instructions for additions.

In the previous version, the final keyword was computed dynamically from a global addition instruction, and that came with a memory cost.

Now the @assert extension makes it possible to distinguish the two cases and to define a static final keyword directly in the definition files.

To illustrate the current definition syntax, here is the resulting definition for the third encoding of ADD (SP plus register, Thumb) (A8.8.10):

@encoding (T3) {

        @word 1 1 1 0 1 0 1 1 0 0 0 S(1) 1 1 0 1 0 imm3(3) Rd(4) imm2(2) type(2) Rm(4)

        @syntax {

                @assert {

                        S == 0

                }

                @conv {
 
                        reg_D = Register(Rd)
                        reg_SP = Register(13)
                        reg_M = Register(Rm)
                        shift = DecodeImmShift(type, imm3:imm2)

                }

                @asm add.w ?reg_D reg_SP reg_M ?shift

        }

        @syntax {

                @assert {
 
                        S == 1

                }

                @conv {

                        reg_D = Register(Rd)
                        reg_SP = Register(13)
                        reg_M = Register(Rm)
                        shift = DecodeImmShift(type, imm3:imm2)

                }
 
                @asm adds.w ?reg_D reg_SP reg_M ?shift

        }

}
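As a rough illustration of what the two @assert blocks express, here is a minimal Python sketch that extracts the fields of the T3 word above and picks the static keyword according to S. It is not code generated by d2c: the names field, SYNTAXES and decode_t3 are made up for this example, and the fixed bits of the encoding are not checked.

```python
# Hypothetical illustration; this is not code from Chrysalide or d2c.

def field(word, high, low):
    """Return bits high..low (inclusive) of a 32-bit word."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)

# Each syntax pairs an assertion on the decoded fields with a static keyword,
# mirroring the two @syntax blocks of the T3 encoding.
SYNTAXES = [
    (lambda f: f["S"] == 0, "add.w"),
    (lambda f: f["S"] == 1, "adds.w"),
]

def decode_t3(word):
    """Pick the keyword of the first syntax whose assertion holds."""
    fields = {
        "S": field(word, 20, 20),
        "imm3": field(word, 14, 12),
        "Rd": field(word, 11, 8),
        "imm2": field(word, 7, 6),
        "type": field(word, 5, 4),
        "Rm": field(word, 3, 0),
    }
    for assertion, keyword in SYNTAXES:
        if assertion(fields):
            return keyword, fields
    raise ValueError("no syntax matches this word")
```

With this layout, the only difference between 0xEB0D0201 and 0xEB1D0201 is the S bit, so the first word maps to "add.w" and the second one to "adds.w".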

There is also a new @id tag which provides a unique identifier for each instruction. This will be useful for binary diffing (see the Small Primes Product method).

It also saves memory at runtime.

Before the change, each instruction contained a pointer to its keyword and another one to its description. After the change, an instruction only stores a small unique number, which is used as an index into global arrays of keywords (when available) or descriptions.
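The idea can be sketched in a few lines of Python. This only shows the lookup scheme, not the real C structures: the identifiers, the KEYWORDS table and the Instruction class are assumptions made for the example.

```python
from enum import IntEnum

# Hypothetical identifiers, standing in for the values given by @id tags.
class InstrId(IntEnum):
    ADD_W = 0
    ADDS_W = 1

# One global table shared by all instructions, indexed by identifier.
KEYWORDS = ("add.w", "adds.w")

class Instruction:
    """An instruction which only stores its small unique identifier."""
    __slots__ = ("uid",)

    def __init__(self, uid):
        self.uid = uid

    @property
    def keyword(self):
        # The keyword is looked up on demand instead of being stored.
        return KEYWORDS[self.uid]
```

Every Instruction created with InstrId.ADDS_W shares the same entry of the global table, whatever the number of instances.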

The second major change lies in the way definition files are translated into source code.

The past situation was clean but rather complex.

For instance, the Dalvik architecture offers plenty of "move" instructions: "move/from16", "move/16", "move-wide", and so on.

Each one produced a template C file, and all these templates were gathered into a final "move.c" C file.

The Makefile rules kept everything consistent: if a definition was modified, its template was regenerated and, as the final C file depended on all its templates, all the templates were then merged again.

All of this worked well, but the generated Makefile was quite hard to read...
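The dependency rule described above boils down to a timestamp comparison, the same check make performs for each target. Here is a minimal sketch of that check, with a made-up needs_rebuild helper; it is not taken from the Chrysalide build system.

```python
import os

def needs_rebuild(target, sources):
    """Return True when target is missing or older than one of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    # One single newer source is enough to trigger a new merge.
    return any(os.path.getmtime(src) > target_time for src in sources)
```

In the old scheme, a check like this was applied twice: once between a definition and its template, and once between all the templates and the final merged C file.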

The d2c compiler has been improved to directly produce the final merged files. But as the dependency tracking is lost, the whole opcodes directory has to be cleaned after each definition update.

This is a fair tradeoff, as definitions are not meant to change very often, and the compiler may be updated again to restore the old behavior for updates.

Python bindings

A massive effort has begun for the Python bindings in Chrysalide. The development of the monolithic plugin has stopped.

As all the features which do not belong to the core (mainly support for formats, architectures and debugging protocols) are being moved into external plugins, a single plugin for the Python bindings cannot easily cover all of them.

Thus the Python plugin now only provides bindings for the core features, and each plugin is free to extend the provided Python API with its own features. This extra code is located in the python directory of the plugin's source code.

The Pythonic way of coding is kept in mind during the process of creating nice Python bindings, but there is still a lot of work!

Automatically generated documentation of the Python API is available online and has been updated to match the current state of the bindings.

Moreover, a new git repository has been populated to show a few examples of processing binaries from Python: https://git.0xdeadc0de.fr/cgi-bin/cgit.cgi/snippets.git/.

American Fuzzy Lop

AFL is a well-known fuzzer, and it helped a lot to track down hidden bugs in Chrysalide. The fuzzing campaign mainly targeted the Dex format and the Dalvik VM so far, and several bugs have been fixed.

The use of AFL is made possible by the new ability of Chrysalide to process files from the command line in batch mode. Some GUI features are disabled to speed up analysis in these cases.

Almost all the bugs found are located in error code paths. One of them is a beautiful integer overflow (shame on me!) and another one is an interesting case of mismatch between the GLib documentation and the underlying mechanisms.

After a few warm-up hours, here is the list of all fixed bugs:

  • Dalvik:
    • lock the register singleton list during multithreaded updates (commit a585058f).
    • fix a size overflow in the global register list (commit ce422fd3).
  • Dex:
    • fix a simple copy-paste mistake on a reference counter in an error path (commit 52354297).
    • do not use a partial demangled type (commit 1001a818).
    • respect the specification alignments when reading Dex structures (commit f2f74f95).
    • avoid memory exhaustion by handling corrupted Dex headers (commit 8327e2ed).
  • ARMv7:
    • avoid fetching a value the CPU can fetch (commit be8f899a).
  • Core:
    • discover an interesting case of optional lock, unless a "predictable scheduling behavior" is expected (commit 5d99d1bb).
    • fix a double free in an error path (commit ec14d57b).
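To give an idea of the kind of check behind the "corrupted Dex headers" item, here is a minimal Python sketch which validates the string_ids table announced by a Dex header against the real size of the data before anything gets allocated. It is not the fix from commit 8327e2ed: the checked_string_ids name is made up, only one table is checked, and a little-endian file is assumed. The offsets come from the Dex format specification.

```python
import struct

HEADER_SIZE = 0x70          # size of the Dex header_item
STRING_IDS_SIZE_OFF = 0x38  # string_ids_size, followed by string_ids_off
STRING_ID_ITEM_SIZE = 4     # each string_id_item is one 32-bit offset

def checked_string_ids(data):
    """Return (count, offset) of the string_ids table, or raise ValueError."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated Dex header")
    count, offset = struct.unpack_from("<II", data, STRING_IDS_SIZE_OFF)
    # Refuse any table which would not fit in the file, so that a corrupted
    # count can never drive a huge allocation later on.
    end = offset + count * STRING_ID_ITEM_SIZE
    if end > len(data):
        raise ValueError("string_ids table exceeds file size")
    return count, offset
```

Python integers do not overflow, so the computation of end is safe here; the same check written in C would also have to guard the multiplication and the addition against wrapping.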

Posted on April 30, 2018 at 16:18