CHDK Wiki

Introduction

finsig_thumb2 is a tool used to automatically identify functions and variables in Digic 6 firmware dumps. It serves the same role as the original Signature finder but, aside from some shared utility code, is a completely separate implementation, because Digic 6 uses the thumb2 instruction set.

Configuration

You must configure the capstone library as described in Capdis Disassembly Tool.

Usage

Once configured, usage is the same as for the original sig finder: build with OPT_GEN_STUBS=1, with the firmware PRIMARY.BIN either in the source tree or pointed to by PRIMARY_ROOT.

It is normal for finsig_thumb2 to produce some warnings, and in general they are not a problem. If a warning refers to a specific function match, the addresses it reports may help when finding the function manually.

Note: if you see "WARNING! Incorrect disassembly is likely", it means your capstone library has not been correctly patched. This must be fixed, or addresses reported by finsig_thumb2 will be incorrect.

Development

Overview

finsig_thumb2 consists of two major components:

  • firmware_load_ng.c is a generic library for analyzing firmware dumps with capstone. The public functions are mostly documented in firmware_load_ng.h.
  • finsig_thumb2.c provides the methods to identify specific firmware functions and variables.

The main purpose of finsig_thumb2 is to find functions, variables and constants and output them to stubs_entry.S. The items to be found are defined in sig_names (for functions) and misc_vals (for most other values).

Additional functions used to find required functions and aid reverse engineering are found automatically and added to sig_names.

Unlike the original finsig_dryos, matching is done by executing defined rules and hard-coded bootstrap functions in the order specified in code.

Because the variable instruction size and alignment of thumb2 code makes searching for particular instruction sequences inefficient, full searches of the firmware code are avoided as much as possible.

Disassembly basics

Disassembly is done using functions that operate on the iter_state_t structure, typically named "is" in the code. This structure encapsulates:

  • The most recently disassembled instruction (insn member)
  • Current address
  • Current ARM/thumb state
  • A history of recently disassembled addresses, to allow back-tracking which would otherwise be unreliable for variable instruction size thumb2 code.
  • Capstone disassembly state

To analyze firmware code, the state is initialized to a specific address with disasm_iter_init (or disasm_iter_set). Each subsequent call to disasm_iter disassembles one instruction and advances the current address by the size of the instruction.
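The iterate-and-advance model can be illustrated with a small standalone C sketch. It does not use capstone or the real iter_state_t: demo_iter_t, demo_iter_init, demo_iter and demo_iter_prev_adr are hypothetical names. Instead of fully disassembling, it only determines each instruction's size from its first halfword (in Thumb-2, a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 begins a 32-bit instruction), advances the address by that size, and records recent addresses in a small history buffer, which is what makes stepping backwards reliable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HIST 8 /* hypothetical history depth */

/* Hypothetical, much simplified stand-in for iter_state_t */
typedef struct {
    const uint8_t *buf;       /* firmware bytes */
    uint32_t base;            /* address of buf[0] */
    size_t size;              /* number of bytes in buf */
    uint32_t adr;             /* address of the next instruction */
    uint32_t insn_adr;        /* address of the most recent instruction */
    unsigned insn_size;       /* its size in bytes, 2 or 4 */
    uint32_t hist[DEMO_HIST]; /* ring buffer of recent instruction addresses */
    unsigned hist_count;      /* number of instructions stepped so far */
} demo_iter_t;

/* A first halfword with bits [15:11] = 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction */
static unsigned thumb2_insn_size(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return (top5 == 0x1D || top5 == 0x1E || top5 == 0x1F) ? 4 : 2;
}

static void demo_iter_init(demo_iter_t *is, const uint8_t *buf, uint32_t base, size_t size, uint32_t adr)
{
    assert(is != NULL && buf != NULL);
    is->buf = buf;
    is->base = base;
    is->size = size;
    is->adr = adr & ~1u; /* clear the thumb bit to get the real address */
    is->insn_adr = 0;
    is->insn_size = 0;
    is->hist_count = 0;
}

/* Step over one instruction. Returns 1 on success, 0 if the instruction is outside the buffer */
static int demo_iter(demo_iter_t *is)
{
    if (is->adr < is->base || (size_t)(is->adr - is->base) + 2 > is->size) {
        return 0;
    }
    size_t off = is->adr - is->base;
    uint16_t hw = (uint16_t)(is->buf[off] | (is->buf[off + 1] << 8)); /* little endian */
    unsigned sz = thumb2_insn_size(hw);
    if (off + sz > is->size) {
        return 0;
    }
    is->insn_adr = is->adr;
    is->insn_size = sz;
    is->hist[is->hist_count % DEMO_HIST] = is->adr;
    is->hist_count++;
    is->adr += sz;
    return 1;
}

/* Address of the instruction n steps back (n = 0 is the current one), or 0 if no longer in the history */
static uint32_t demo_iter_prev_adr(const demo_iter_t *is, unsigned n)
{
    if (n >= is->hist_count || n >= DEMO_HIST) {
        return 0;
    }
    return is->hist[(is->hist_count - 1 - n) % DEMO_HIST];
}
```

For example, starting at an address with the thumb bit set, repeated calls walk a NOP (2 bytes), a BL (4 bytes) and a BX LR (2 bytes), and the history then allows looking back at the BL without guessing its size.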

Matches

Most values are identified using "match rules" defined in arrays of sig_rule_t. sig_rules_initial defines matches used to bootstrap eventproc and task identification.

After sig_rules_initial is processed, find_generic_funcs identifies eventprocs and tasks.

sig_rules_main defines matches for the remaining values.

Rules

Each rule consists of:

  • A match function
  • The name of the value to find
  • A reference string whose meaning is defined by the match function, typically either a function name or a firmware string
  • Options passed to the match function
  • Options to restrict the rule to particular DryOS versions

Match functions identify the target value and add it to sig_names or misc_vals. Most only add the value given in the name field, but a few have side effects, where multiple related values are obtained from the same firmware code.

Match functions may be generic, using the name and reference string to identify the target value, or specific to a particular function.

Rules in each list are called in the order they appear.

By convention, specific match functions are named sig_match_...
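The rule-processing model described above (a table of rules run in order, each calling a match function that saves its result and returns 1 or 0, with optional DryOS version limits) can be sketched as follows. demo_rule_t, demo_run_rules and the example match function are hypothetical simplifications; the real sig_rule_t fields, option flags and version handling in finsig_thumb2.c differ. Ending the table with an entry whose match function is NULL is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_rule;
typedef int (*demo_match_fn)(const struct demo_rule *rule);

/* Hypothetical, simplified rule; the real sig_rule_t has its own fields */
typedef struct demo_rule {
    demo_match_fn match; /* match function, NULL ends the table */
    const char *name;    /* name of the value to find */
    const char *ref;     /* reference: function name or firmware string */
    int param;           /* options passed to the match function */
    int dryos_min;       /* 0 = no lower version limit */
    int dryos_max;       /* 0 = no upper version limit */
} demo_rule_t;

/* Stand-in for sig_names / misc_vals: names of values found so far */
#define DEMO_MAX_FOUND 8
static char demo_found[DEMO_MAX_FOUND][32];
static int demo_found_count;

static void demo_save(const char *name)
{
    if (demo_found_count < DEMO_MAX_FOUND) {
        snprintf(demo_found[demo_found_count], sizeof demo_found[0], "%s", name);
        demo_found_count++;
    }
}

/* Example match function: "finds" the value if the reference string is not empty.
   Like real match functions, it saves the result and returns 1, or returns 0 on failure */
static int demo_match_ref_nonempty(const demo_rule_t *rule)
{
    assert(rule != NULL);
    if (rule->ref == NULL || rule->ref[0] == '\0') {
        return 0;
    }
    demo_save(rule->name);
    return 1;
}

/* Run each rule in table order, skipping rules outside the DryOS version range.
   Returns the number of rules that matched */
static int demo_run_rules(const demo_rule_t *rules, int dryos_ver)
{
    int matched = 0;
    for (const demo_rule_t *r = rules; r->match != NULL; r++) {
        if (r->dryos_min && dryos_ver < r->dryos_min) continue;
        if (r->dryos_max && dryos_ver > r->dryos_max) continue;
        matched += r->match(r);
    }
    return matched;
}
```

Because rules run strictly in table order, a rule whose reference is a function found by an earlier rule must appear after it, which is the same constraint the real rule lists have.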

Generic rules

The most common generic rules are sig_match_named and sig_match_near_str.

sig_match_named

The reference string for sig_match_named is the name of an already known function. The options (SIG_NAMED... macros) define whether the target is simply an alias for an eventproc (no options) or a call from the named function.

sig_match_near_str

The reference string is a string referenced by firmware code, where the target function is assumed to be "near" the location where the string is referenced. The SIG_NEAR... macros define where the function is found in relation to the string reference.
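The first step of a near-string match, locating the string and then a reference to it, can be sketched in C as below. demo_find_str and demo_find_word are hypothetical helpers, not the firmware_load_ng API (the real code uses find_str_bytes and disassembly). The sketch only finds references stored as 32-bit literal-pool words and assumes the buffer base is 4-byte aligned; Thumb-2 code can also form string addresses with ADR or MOVW/MOVT, which it does not handle. Applying the SIG_NEAR... options to the reference is left to the real match functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: address of the first occurrence of str, including its
   NUL terminator, in buf (which starts at address base), or 0 if not found */
static uint32_t demo_find_str(const uint8_t *buf, size_t size, uint32_t base, const char *str)
{
    assert(buf != NULL && str != NULL);
    size_t len = strlen(str) + 1; /* include terminator so "Fo" does not match inside "Foo" */
    for (size_t i = 0; i + len <= size; i++) {
        if (memcmp(buf + i, str, len) == 0) {
            return base + (uint32_t)i;
        }
    }
    return 0;
}

/* Hypothetical helper: address of the first 4-byte aligned little-endian word equal to value,
   or 0. This models only literal-pool references */
static uint32_t demo_find_word(const uint8_t *buf, size_t size, uint32_t base, uint32_t value)
{
    for (size_t i = 0; i + 4 <= size; i += 4) {
        uint32_t w = (uint32_t)buf[i] | ((uint32_t)buf[i + 1] << 8)
                   | ((uint32_t)buf[i + 2] << 16) | ((uint32_t)buf[i + 3] << 24);
        if (w == value) {
            return base + (uint32_t)i;
        }
    }
    return 0;
}
```

In a real match, the literal-pool address found this way would then be used to locate the code that loads it, and the target function is searched for relative to that code.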

Writing match functions

Match functions are passed:

  • The firmware data object fw, required for calls to most analysis functions
  • An iter_state_t ("is") to use for disassembling with capdis
  • The rule object rule, for flags and the reference string

On success, match functions should add the target(s) using save_sig_with_j (for functions) or save_misc_val for other values and return 1. On failure, they return 0.

Most match functions initialize "is" to the start of the function named by the reference string using init_disasm_sig_ref, or to somewhere near the reference string using find_str_bytes and disasm_iter_init.

After initializing "is", match functions disassemble forward using search and analysis functions to identify "signposts" (e.g. calls to known functions, references to known strings etc.) and the target value.

Existing match functions provide many examples, and new matches can often be pieced together by finding previous matches that do similar things. To understand how a match works, it's usually a good idea to try to follow the logic in the disassembly of a known firmware.

Analysis functions overview

Many of the basic analysis functions are briefly documented in firmware_load_ng.h.

The descriptions below are intended to provide a general overview of commonly used functions and jumping-off points into the code, not comprehensive documentation.

Disassembly control and instruction search functions

Most match functions involve starting disassembly at a known point and then attempting to match an expected sequence of instructions. The functions listed below control the disassembly process.

disasm_iter_init

Initializes the iter_state to a given address. This prepares for disassembly starting at the specified address, but does not disassemble anything. This is often used to "follow" a call using an address provided by get_branch_call_insn_target.

Note that the thumb bit (bit 0 of the address) specifies whether disassembly should be in ARM or thumb mode.
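In ARM interworking, bit 0 of a code address selects the state (1 for thumb, 0 for ARM), and the instruction itself starts at the address with that bit cleared. The hypothetical helpers below only illustrate this convention; they are not functions from firmware_load_ng.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers illustrating the ARM/thumb interworking convention */

/* Bit 0 set means the code at this address is thumb code */
static int demo_is_thumb(uint32_t adr)
{
    return (adr & 1u) != 0;
}

/* Address of the first instruction byte: bit 0 cleared */
static uint32_t demo_insn_adr(uint32_t adr)
{
    return adr & ~1u;
}

/* Build a code pointer from an instruction address and the desired state */
static uint32_t demo_code_ptr(uint32_t insn_adr, int thumb)
{
    assert((insn_adr & 1u) == 0); /* instructions are at least halfword aligned */
    return thumb ? (insn_adr | 1u) : insn_adr;
}
```

A consequence is that passing an address without the thumb bit set to an initialization function would start disassembling thumb code as ARM, which typically produces garbage rather than an obvious error.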

init_disasm_sig_ref

Initializes the iter_state to the start of the named function.

disasm_iter

Disassemble and advance the iter_state_t by one instruction. After disassembling, the insn member can be used to get information about the current instruction. firmware_load_ng contains various helpers to identify specific classes of instructions and extract operands.

Returns 1 on success, 0 if disassembly failed.

find_next_sig_call

Disassemble up to the specified number of bytes (NOT instructions, unlike many other functions) looking for a call to an already identified firmware function.

The iter_state is left pointing to the call if found, or the last instruction analyzed.

Returns 1 on success, 0 if not found or disassembly failed.
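The difference between a byte limit and an instruction limit, and how a call target is obtained, can be illustrated with a standalone sketch that decodes the Thumb-2 BL instruction (encoding T1 in the ARM Architecture Reference Manual) by hand. demo_decode_bl and demo_find_next_call are hypothetical; the real find_next_sig_call uses capstone and looks for a call to a named function rather than a raw address, and get_branch_call_insn_target handles more branch forms than this sketch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian halfword */
static uint16_t demo_hw(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Thumb-2 instruction size from its first halfword */
static unsigned demo_t2_size(uint16_t hw)
{
    return (hw >> 11) >= 0x1D ? 4 : 2; /* 0b11101, 0b11110, 0b11111 => 32-bit */
}

/* Decode a Thumb-2 BL (encoding T1) located at adr.
   Returns 1 and stores the call target in *target, or 0 if hw1/hw2 are not a BL */
static int demo_decode_bl(uint16_t hw1, uint16_t hw2, uint32_t adr, uint32_t *target)
{
    assert(target != NULL);
    if ((hw1 & 0xF800) != 0xF000 || (hw2 & 0xD000) != 0xD000) {
        return 0;
    }
    uint32_t s = (hw1 >> 10) & 1u;
    uint32_t j1 = (hw2 >> 13) & 1u;
    uint32_t j2 = (hw2 >> 11) & 1u;
    uint32_t i1 = !(j1 ^ s);
    uint32_t i2 = !(j2 ^ s);
    uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22)
                 | ((uint32_t)(hw1 & 0x3FFu) << 12) | ((uint32_t)(hw2 & 0x7FFu) << 1);
    if (s) {
        imm |= 0xFE000000u; /* sign extend the 25-bit offset */
    }
    *target = adr + 4 + imm; /* PC reads as instruction address + 4; arithmetic wraps mod 2^32 */
    return 1;
}

/* Scan at most max_bytes (bytes, not instructions) from start for a BL to target.
   Returns 1 and stores the address of the call in *call_adr, or 0 if not found */
static int demo_find_next_call(const uint8_t *buf, uint32_t base, size_t size, uint32_t start,
                               uint32_t max_bytes, uint32_t target, uint32_t *call_adr)
{
    uint32_t first = start & ~1u;
    uint32_t adr = first;
    while (adr - first < max_bytes && adr >= base && (size_t)(adr - base) + 2 <= size) {
        size_t off = adr - base;
        uint16_t hw1 = demo_hw(buf + off);
        unsigned sz = demo_t2_size(hw1);
        uint32_t t;
        if (sz == 4 && off + 4 <= size && demo_decode_bl(hw1, demo_hw(buf + off + 2), adr, &t) && t == target) {
            *call_adr = adr;
            return 1;
        }
        adr += sz;
    }
    return 0;
}
```

With a byte limit, the number of instructions examined depends on the mix of 16-bit and 32-bit instructions, so a limit chosen for one firmware may cover more or fewer instructions in another.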

insn_match_find_next

Disassemble up to the specified number of instructions looking for any of the instructions defined in the match argument.

Returns 1 on success, 0 if not found or disassembly failed.

insn_match_find_nth

As above, but find the Nth matching instruction.

insn_match_find_next_seq

Disassemble up to the specified number of instructions looking for the sequence of instructions defined in the match argument.

Returns 1 on success, 0 if not found or disassembly failed.
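The difference between a list of alternatives (insn_match_find_next, insn_match_find_nth) and a sequence (insn_match_find_next_seq) can be sketched on a pre-decoded instruction array. All names below are hypothetical, instructions are reduced to an id, and positions are array indices rather than addresses. Where the real functions leave the iter_state after a match may differ from this sketch, which reports the first instruction of the match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction ids; the real code uses capstone ids and ARM_INS_ENDING */
enum { D_INS_ENDING = 0, D_INS_PUSH, D_INS_MOV, D_INS_LDR, D_INS_BL, D_INS_POP };

typedef struct { int id; } demo_insn_t;  /* decoded instruction, reduced to its id */
typedef struct { int id; } demo_match_t; /* one match entry; lists end with D_INS_ENDING */

static int demo_match_one(const demo_insn_t *insn, const demo_match_t *m)
{
    return insn->id == m->id;
}

/* Alternatives: examine up to max_insns instructions from *pos, counting those that match
   any entry in match. On the nth hit, set *pos to its index and return 1 */
static int demo_find_nth(const demo_insn_t *code, size_t count, size_t *pos,
                         int max_insns, int n, const demo_match_t *match)
{
    assert(pos != NULL && match != NULL);
    for (size_t i = *pos; i < count && (int)(i - *pos) < max_insns; i++) {
        for (const demo_match_t *m = match; m->id != D_INS_ENDING; m++) {
            if (demo_match_one(&code[i], m)) {
                if (--n == 0) {
                    *pos = i;
                    return 1;
                }
                break; /* count each instruction at most once */
            }
        }
    }
    return 0;
}

static int demo_find_next(const demo_insn_t *code, size_t count, size_t *pos,
                          int max_insns, const demo_match_t *match)
{
    return demo_find_nth(code, count, pos, max_insns, 1, match);
}

/* Sequence: look for consecutive instructions matching match[0], match[1], ...
   starting within max_insns of *pos. On success set *pos to the first instruction */
static int demo_find_seq(const demo_insn_t *code, size_t count, size_t *pos,
                         int max_insns, const demo_match_t *match)
{
    assert(pos != NULL && match != NULL);
    size_t len = 0;
    while (match[len].id != D_INS_ENDING) {
        len++;
    }
    for (size_t i = *pos; i + len <= count && (int)(i - *pos) < max_insns; i++) {
        size_t k = 0;
        while (k < len && demo_match_one(&code[i + k], &match[k])) {
            k++;
        }
        if (k == len) {
            *pos = i;
            return 1;
        }
    }
    return 0;
}
```

The same terminated array means "any of these" when passed to the find_next/find_nth style functions and "these, in order" when passed to the sequence function, which is why the match array alone does not tell you how it is used.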

fw_search_insn

Call the specified callback for every instruction in the given address range. If disassembly fails, the address is advanced by 2 bytes (for thumb) or 4 (ARM). Frequently used with search_disasm_const_ref to find a reference to a string.
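A simplified model of this search loop is shown below. The decoder and callback types are hypothetical, and the sketch assumes that a non-zero callback return ends the search and is returned to the caller; check firmware_load_ng.h for the real callback signature and behavior. demo_example_decode and demo_collect_cb exist only to exercise the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: returns the instruction size in bytes, or 0 if adr cannot be disassembled */
typedef unsigned (*demo_decode_fn)(uint32_t adr, int thumb);

/* Hypothetical callback: called for each decoded instruction. A non-zero return
   (e.g. the address of a match) ends the search and is returned to the caller */
typedef uint32_t (*demo_search_fn)(uint32_t adr, unsigned size, void *user);

/* Visit every instruction in [start, end). On a decode failure, skip 2 bytes (thumb) or 4 (ARM) */
static uint32_t demo_search_insn(uint32_t start, uint32_t end, int thumb,
                                 demo_decode_fn decode, demo_search_fn cb, void *user)
{
    assert(decode != NULL && cb != NULL);
    uint32_t adr = start & ~1u;
    while (adr < end) {
        unsigned size = decode(adr, thumb);
        if (size == 0) {
            adr += thumb ? 2 : 4;
            continue;
        }
        uint32_t r = cb(adr, size, user);
        if (r != 0) {
            return r;
        }
        adr += size;
    }
    return 0;
}

/* Example decoder for illustration only: 0x1000 holds a 32-bit instruction,
   0x1004 cannot be decoded, everything else is a 16-bit instruction */
static unsigned demo_example_decode(uint32_t adr, int thumb)
{
    (void)thumb;
    if (adr == 0x1004) {
        return 0;
    }
    return adr == 0x1000 ? 4 : 2;
}

/* Example callback: records visited addresses and stops at stop_at */
typedef struct {
    uint32_t visited[16];
    int count;
    uint32_t stop_at;
} demo_collect_t;

static uint32_t demo_collect_cb(uint32_t adr, unsigned size, void *user)
{
    demo_collect_t *c = user;
    (void)size;
    if (c->count < 16) {
        c->visited[c->count++] = adr;
    }
    return adr == c->stop_at ? adr : 0;
}
```

Skipping by the minimum instruction size after a failure lets the search resynchronize on data embedded in code (literal pools, tables) without skipping over a real instruction that starts immediately after it.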

Instruction matching

firmware_load_ng contains a variety of functions for identifying specific types of instructions.

Instruction match structure

Arrays of insn_match_t structures are used to define instruction matches for the insn_match_* functions.

Depending on the function called, the match may represent a sequence of instructions (for ...find_next_seq) or a list of alternative instructions.

Matches are defined using the MATCH_... macros from firmware_load_ng.h.

Each definition consists of:

  • An instruction match, defined using the MATCH_OP or MATCH_OP_CC macros. The instruction match defines the instruction and the number of operands to match. MATCH_OPCOUNT_IGNORE ignores all operands. MATCH_OPCOUNT_ANY matches any number of operands, but requires that all specified operands match.
  • Zero or more operand matches, defined using the MATCH_OP_... macros. Operand matches can match by type (register, immediate etc.) or by a specific register or value.

Match definitions end with ARM_INS_ENDING.
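The operand-count semantics above can be modelled with a small matcher. The types, constants and functions below are hypothetical stand-ins for insn_match_t and the MATCH_... macros, not their real definitions; in particular, the explicit nops count, the -1 "any value" convention and the DM_INS_ENDING terminator standing in for ARM_INS_ENDING are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for capstone ids and the MATCH_... definitions */
enum { DM_INS_ENDING = 0, DM_INS_MOV, DM_INS_LDR, DM_INS_BX };
enum { DM_OP_ANY = 0, DM_OP_REG, DM_OP_IMM }; /* operand match types */
#define DM_OPCOUNT_IGNORE (-1) /* do not look at operands at all */
#define DM_OPCOUNT_ANY    (-2) /* any operand count, but listed operands must match */
#define DM_MAX_OPS 4

typedef struct {
    int type;  /* DM_OP_REG or DM_OP_IMM; in a match, DM_OP_ANY accepts any operand */
    int value; /* register number or immediate; in a match, -1 accepts any value */
} demo_op_t;

typedef struct { int id; int op_count; demo_op_t ops[DM_MAX_OPS]; } demo_insn_t;

typedef struct {
    int id;       /* instruction id; DM_INS_ENDING terminates a list */
    int op_count; /* exact operand count, DM_OPCOUNT_IGNORE or DM_OPCOUNT_ANY */
    int nops;     /* number of operand matches listed in ops */
    demo_op_t ops[DM_MAX_OPS];
} demo_insn_match_t;

static int demo_op_match(const demo_op_t *op, const demo_op_t *m)
{
    if (m->type == DM_OP_ANY) {
        return 1;
    }
    return op->type == m->type && (m->value < 0 || op->value == m->value);
}

static int demo_insn_match(const demo_insn_t *insn, const demo_insn_match_t *m)
{
    assert(insn != NULL && m != NULL);
    if (insn->id != m->id) return 0;
    if (m->op_count == DM_OPCOUNT_IGNORE) return 1;
    if (m->op_count != DM_OPCOUNT_ANY && insn->op_count != m->op_count) return 0;
    if (m->nops > insn->op_count) return 0; /* listed operands must exist */
    for (int i = 0; i < m->nops; i++) {
        if (!demo_op_match(&insn->ops[i], &m->ops[i])) return 0;
    }
    return 1;
}

/* Index of the first alternative in a DM_INS_ENDING-terminated list that matches, or -1 */
static int demo_insn_match_any(const demo_insn_t *insn, const demo_insn_match_t *list)
{
    for (int i = 0; list[i].id != DM_INS_ENDING; i++) {
        if (demo_insn_match(insn, &list[i])) return i;
    }
    return -1;
}
```

In this model, a match for "MOV with R0 as its first operand" can be written either with an exact count of 2 or with the ANY count and a single operand match; the second form also accepts encodings with a different number of operands.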

Instruction identification functions

Separate from the matching functions, firmware_load_ng also provides various functions for identifying classes of instructions. These are generally named is<mnemonic>_<operands>. An "x" at the end of the mnemonic indicates that the function checks for a class of similar instructions, so isSUBx_imm identifies all subtract instructions (SUB, SUBW, SUBS etc.) with an immediate operand.

... to be continued ...
