Introduction

finsig_thumb2 is a tool used to automatically identify functions and variables in Digic 6 firmware dumps. It serves the same role as the original Signature finder but, aside from some shared utility code, is a completely new implementation, because Digic 6 uses the thumb2 instruction set.

Configuration

You must configure the capstone library as described in Capdis Disassembly Tool.

Usage

Once configured, use is the same as for the original sig finder. Build with OPT_GEN_STUBS=1 and the firmware PRIMARY.BIN either in the source tree or pointed to by PRIMARY_ROOT.

It is normal for finsig_thumb2 to produce some warnings. In general, they are not a problem. If a warning refers to a specific function match, inspecting the reported addresses may help to find the function manually.

Note: if you see "WARNING! Incorrect disassembly is likely", it means your capstone library has not been correctly patched. This must be fixed, or addresses reported by finsig_thumb2 will be incorrect.

Development

Overview

finsig_thumb2 consists of two major components:

  • firmware_load_ng.c is a generic library for analyzing firmware dumps with capstone. The public functions are mostly documented in firmware_load_ng.h.
  • finsig_thumb2.c provides the methods to identify specific firmware functions and variables.

The main purpose of finsig_thumb2 is to find functions, variables and constants and output them to stubs_entry.S. The items to be found are defined in sig_names (for functions) and misc_vals (for most other values).

Additional functions used to find required functions and aid reverse engineering are found automatically and added to sig_names.

Unlike the original finsig_dryos, matching is done by executing defined rules and hard-coded bootstrap functions in the order specified in code.

Also unlike finsig_dryos, finsig_thumb2 only supports one match per function, and does not score partial matches.

Because the variable instruction size and alignment of thumb2 code make searching for particular instruction sequences inefficient, full searches of the firmware code are avoided as much as possible.

Disassembly basics

Disassembly is done using functions that operate on the iter_state_t structure, typically named is in the code. This structure encapsulates:

  • The most recently disassembled instruction (insn member)
  • Current address
  • Current ARM/thumb state
  • A history of recently disassembled addresses, to allow back-tracking which would otherwise be unreliable for variable instruction size thumb2 code.
  • Capstone disassembly state

To analyze firmware code, the state is initialized to a specific address with disasm_iter_init (or disasm_iter_set). Each subsequent call to disasm_iter disassembles one instruction and advances the current address by the size of the instruction.
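The iteration pattern described above can be illustrated with a self-contained toy model. This is only a sketch: it does not use capstone or firmware_load_ng, and every `toy_` name is hypothetical. It relies on one architectural fact, that a thumb2 instruction is 32 bits wide when the top five bits of its first halfword are 0b11101, 0b11110 or 0b11111, and 16 bits wide otherwise. The history ring shows why recording visited addresses makes back-tracking reliable when instruction sizes vary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HIST_SIZE 8  /* hypothetical history depth */

/* Toy stand-in for iter_state_t: current address plus a ring of past addresses. */
typedef struct {
    uint32_t adr;                  /* address of the next instruction */
    uint32_t hist[TOY_HIST_SIZE];  /* addresses of recently visited instructions */
    unsigned hist_count;           /* total instructions visited so far */
} toy_iter_t;

/* Size in bytes of the thumb2 instruction whose first halfword is hw. */
static unsigned toy_thumb2_insn_size(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return (top5 == 0x1D || top5 == 0x1E || top5 == 0x1F) ? 4 : 2;
}

/* Point the state at adr without stepping over anything. */
static void toy_iter_init(toy_iter_t *is, uint32_t adr)
{
    is->adr = adr;
    is->hist_count = 0;
}

/* Step over one instruction. code holds n_hw halfwords loaded at address base.
   Returns the instruction size, or 0 if the instruction is out of range. */
static unsigned toy_iter_step(toy_iter_t *is, const uint16_t *code,
                              size_t n_hw, uint32_t base)
{
    if (is->adr < base) return 0;
    size_t idx = (is->adr - base) / 2;
    if (idx >= n_hw) return 0;
    unsigned size = toy_thumb2_insn_size(code[idx]);
    if (size == 4 && idx + 1 >= n_hw) return 0;
    is->hist[is->hist_count % TOY_HIST_SIZE] = is->adr;
    is->hist_count++;
    is->adr += size;
    return size;
}

/* Address of the instruction visited n steps ago (1 = most recent),
   or 0 if it is no longer in the history. */
static uint32_t toy_iter_back(const toy_iter_t *is, unsigned n)
{
    if (n == 0 || n > is->hist_count || n > TOY_HIST_SIZE) return 0;
    return is->hist[(is->hist_count - n) % TOY_HIST_SIZE];
}
```

Stepping over the halfwords 0xB510, 0xF000, 0xF800, 0xBD10 advances by 2, 4 and 2 bytes. Without the recorded history there would be no way to tell, looking backwards from the last instruction, whether the preceding one started 2 or 4 bytes earlier.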

The loaded firmware dump and information about it are encapsulated in the firmware structure, typically called fw.

Capstone cs_insn

Most analysis is done using the cs_insn structure which is populated by capstone disassembly. The capstone headers capstone.h and arm.h can be helpful to understand the fields and constants.

Matches

Most values are identified using "match rules" defined in arrays of sig_rule_t. sig_rules_initial defines matches used to bootstrap eventproc and task identification.

After sig_rules_initial is processed, find_generic_funcs identifies eventprocs and tasks.

sig_rules_main defines matches for the remaining values.

Rules

Rules consist of:

  • A match function
  • The name of the value to find
  • A reference string whose meaning is defined by the match function, typically either a function name or firmware string
  • Options passed to the match function
  • Options to restrict the rule to particular DryOS versions

Match functions identify the target value and add it to sig_names or misc_vals. Most only add the value named in the rule's name field, but a few have side effects where multiple related values are obtained from the same firmware code.

Match functions may be generic, using the name and reference string to identify the target value, or specific to a particular function.

Rules in each list are called in the order they appear.

By convention, specific match functions are named sig_match_...
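Because rules run strictly in list order, a rule that depends on a previously found function must appear after the rule that finds it. The following is a minimal sketch of that behaviour, not the real sig_rule_t or rule runner: the struct layout, the field names, the `toy_` functions and the version numbers are all hypothetical and only mirror the list of rule components above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified rule record; field names are illustrative only. */
typedef struct toy_rule toy_rule_t;
typedef int (*toy_match_fn)(const toy_rule_t *rule);

struct toy_rule {
    toy_match_fn match_fn;  /* returns 1 on success, 0 on failure */
    const char *name;       /* value to find */
    const char *ref_name;   /* meaning defined by the match function */
    int param;              /* options passed to the match function */
    int dryos_min;          /* 0 = no lower bound */
    int dryos_max;          /* 0 = no upper bound */
};

/* Names found so far, in the order they were found. */
static const char *toy_found[16];
static size_t toy_found_count;

static int toy_is_found(const char *name)
{
    for (size_t i = 0; i < toy_found_count; i++)
        if (strcmp(toy_found[i], name) == 0) return 1;
    return 0;
}

/* Toy generic match: succeeds only if the referenced name is already known. */
static int toy_match_named(const toy_rule_t *rule)
{
    if (!toy_is_found(rule->ref_name) || toy_found_count >= 16) return 0;
    toy_found[toy_found_count++] = rule->name;
    return 1;
}

/* Toy bootstrap match: needs no previous result. */
static int toy_match_bootstrap(const toy_rule_t *rule)
{
    if (toy_found_count >= 16) return 0;
    toy_found[toy_found_count++] = rule->name;
    return 1;
}

/* Run rules in array order, skipping those outside the DryOS version range.
   The list ends at the first entry with a NULL match_fn.
   Returns the number of rules that matched. */
static int toy_run_rules(const toy_rule_t *rules, int dryos_ver)
{
    int matched = 0;
    for (const toy_rule_t *r = rules; r->match_fn; r++) {
        if (r->dryos_min && dryos_ver < r->dryos_min) continue;
        if (r->dryos_max && dryos_ver > r->dryos_max) continue;
        matched += r->match_fn(r);
    }
    return matched;
}
```

In this sketch a rule that references a name found later in the list simply fails, since there is no second pass and no scoring of partial matches.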

Generic rules

The most common generic rules are sig_match_named and sig_match_near_str.

sig_match_named

The reference string for sig_match_named is the name of an already known function. The options (SIG_NAMED... macros) define whether the target is simply an alias for an eventproc (no options) or a call from the named function.

sig_match_near_str

The reference string is a string referenced by firmware code, where the target function is assumed to be "near" the place where the string is referenced. The SIG_NEAR... macros are used to define where the function is found in relation to the string reference.

sig_rule match functions overview

Match functions are passed:

  • Firmware data object fw, required for calls to most analysis functions
  • An iter_state_t state, is, to use for disassembling with capdis
  • The rule object rule, for the function name, flags and reference string

On success, match functions should add the target(s) using save_sig_with_j (for functions) or save_misc_val (for other values) and return 1. On failure, they return 0.

Most match functions initialize is to the function named by the reference string using init_disasm_sig_ref, or to somewhere near the reference string using find_str_bytes and disasm_iter_init.

After initializing is, match functions disassemble forward using search and analysis functions to identify "signposts" (e.g. calls to known functions, references to known strings etc.) and the target value.

Existing match functions provide many examples, and new matches can often be pieced together by finding previous matches that do similar things. To understand how a match works, it's usually a good idea to try to follow the logic in the disassembly of a known firmware.

The firmware structure contains an iter state, fw->is, which is used by some functions that require an additional temporary state, e.g. for backtracking. It can also be used as a temporary state in match code, but care must be taken not to call functions which also modify it.

Analysis functions overview

Many of the basic analysis functions are briefly documented in firmware_load_ng.h.

The descriptions below are intended to provide a general overview of commonly used functions and jumping-off points into the code, not comprehensive documentation.

Disassembly control and instruction search functions

Most match functions involve starting disassembly at a known point and then attempting to match an expected sequence of instructions. The functions listed below control the disassembly process.

disasm_iter_init

Initializes the iter_state to a given address. This prepares for disassembly starting at the specified address, but does not disassemble anything. This is often used to "follow" a call using an address provided by get_branch_call_insn_target.

Note: the thumb bit specifies whether disassembly should be in arm or thumb mode.
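As a brief illustration of the thumb bit convention: on ARM, bit 0 of a code address selects the instruction set (1 for thumb, 0 for ARM), while the instruction itself is located at the address with that bit cleared. The helper names below are hypothetical and are not taken from firmware_load_ng.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of a code address selects the instruction set: 1 = thumb, 0 = ARM. */
static int toy_adr_is_thumb(uint32_t adr)
{
    return (int)(adr & 1u);
}

/* Address where the instruction bytes are actually located. */
static uint32_t toy_adr_clear_thumb(uint32_t adr)
{
    return adr & ~1u;
}

/* Mark an address as thumb code, e.g. before passing it on as a function address. */
static uint32_t toy_adr_set_thumb(uint32_t adr)
{
    return adr | 1u;
}
```

Passing an address with the wrong thumb bit would cause the code to be disassembled in the wrong mode, so it is worth checking which convention a given function expects.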

init_disasm_sig_ref

Initializes the iter_state to the start of the named function.

disasm_iter

Disassemble and advance the iter_state_t by one instruction. After disassembling, the insn member can be used to get information about the current instruction. firmware_load_ng contains various helpers to identify specific classes of instructions and extract operands.

Returns 1 on success, 0 if disassembly failed.

find_next_sig_call

Disassemble up to the specified number of bytes (NOT instructions, unlike many other functions) looking for calls to already identified firmware functions.

The iter_state is left pointing to the call if found, or the last instruction analyzed.

Returns 1 on success, 0 if not found or disassembly failed.

insn_match_find_next

Disassemble up to the specified number of instructions looking for any of the instructions defined in the match argument.

Returns 1 on success, 0 if not found or disassembly failed.

insn_match_find_nth

As above, but find the Nth matching instruction.

insn_match_find_next_seq

Disassemble up to the specified number of instructions looking for the sequence of instructions defined in the match argument.

Returns 1 on success, 0 if not found or disassembly failed.

fw_search_insn

Call the specified callback for every instruction in the given address range. If disassembly fails, the address is advanced by 2 bytes (for thumb) or 4 (arm). Frequently used with search_disasm_const_ref to find a reference to a string.

Instruction matching

firmware_load_ng contains a variety of functions for identifying specific types of instructions.

Instruction match structure

Arrays of insn_match_t structures are used to define instruction matches for the insn_match_* functions.

Depending on the function called, the match may represent a sequence of instructions (for ...find_next_seq) or a list of alternative instructions.

Matches are defined using the MATCH_... macros from firmware_load_ng.h.

Each definition consists of:

  • An instruction match, defined using the MATCH_OP or MATCH_OP_CC macros. The instruction match defines the instruction and number of operands to match. MATCH_OPCOUNT_IGNORE ignores all operands. MATCH_OPCOUNT_ANY matches any number of operands, but requires that all specified operands match.
  • 0 or more operand matches, defined using the MATCH_OP_... macros. Operand matches can match by type (register, immediate etc) or specific register or value.

Match definitions end with ARM_INS_ENDING.

Instruction identification functions

Separate from the matching functions, firmware_load_ng also provides various functions for identifying classes of instructions. These are generally named isMNEMONIC_operands. An "x" at the end of the mnemonic indicates that the function checks for a class of similar instructions, so isSUBx_imm identifies all subtract instructions (SUB, SUBW, SUBS etc.) with an immediate operand.

Extracting values

Functions are provided to obtain values of operands, variables etc.

Note: many of the functions dealing with firmware addresses (jump targets, ADR, PC relative LDR) return 0 on failure. While 0 could theoretically be a valid value, it is unlikely to be in practice.

ADRx2adr

Extract address calculated by various ADD and SUB instructions using PC as an operand to generate a nearby address. Returns 0 if instruction isn't ADR-like.
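The arithmetic behind an ADR-like instruction can be sketched as follows. This is not the ADRx2adr implementation; the `toy_` functions are hypothetical and cover only the immediate ADD/SUB forms. They rely on the architectural rules that PC reads as the instruction address plus 4 in thumb state (plus 8 in ARM state), and that the ADR forms use that value aligned down to a multiple of 4.

```c
#include <assert.h>
#include <stdint.h>

/* Value of PC as read by an instruction located at insn_adr.
   insn_adr is the real instruction address, without the thumb bit. */
static uint32_t toy_pc_value(uint32_t insn_adr, int thumb)
{
    return insn_adr + (thumb ? 4u : 8u);
}

/* Address produced by ADD Rd, PC, #imm or SUB Rd, PC, #imm (the ADR forms). */
static uint32_t toy_adr_target(uint32_t insn_adr, int thumb,
                               uint32_t imm, int is_sub)
{
    uint32_t base = toy_pc_value(insn_adr, thumb) & ~3u;  /* Align(PC, 4) */
    return is_sub ? base - imm : base + imm;
}
```

For example, a thumb ADD with immediate 0x10 at address 0xFC001002 reads PC as 0xFC001006, aligns it down to 0xFC001004 and yields 0xFC001014. The alignment step is easy to forget, because it only matters for thumb instructions that are not at a 4 byte aligned address.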

LDR_PC2val

Return the value that would be loaded by a PC relative load.
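To make the idea of a PC relative load concrete, here is a small self-contained sketch. It is an assumption-laden toy rather than the LDR_PC2val code: the `toy_fw_t` structure and both functions are hypothetical, only the thumb form with a positive offset is handled, and returning 0 on failure simply follows the convention noted earlier in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy firmware image: size bytes of data loaded at address base. */
typedef struct {
    const uint8_t *data;
    size_t size;
    uint32_t base;
} toy_fw_t;

/* Read a little-endian 32-bit word at adr.
   Returns 1 on success, 0 if the word is outside the image. */
static int toy_read_u32(const toy_fw_t *fw, uint32_t adr, uint32_t *out)
{
    if (adr < fw->base) return 0;
    size_t off = adr - fw->base;
    if (off > fw->size || fw->size - off < 4) return 0;
    const uint8_t *p = fw->data + off;
    *out = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return 1;
}

/* Value loaded by a thumb LDR Rt, [PC, #imm] located at insn_adr,
   or 0 if the literal is outside the image. */
static uint32_t toy_ldr_pc_value(const toy_fw_t *fw, uint32_t insn_adr,
                                 uint32_t imm)
{
    uint32_t lit = ((insn_adr + 4u) & ~3u) + imm;  /* Align(PC, 4) + imm */
    uint32_t val;
    return toy_read_u32(fw, lit, &val) ? val : 0;
}
```

The literal usually lives in a constant pool shortly after the function, which is why string addresses and other constants in firmware code are typically reached through this kind of load.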

B_target and similar

B_target, CBx_target and similar return the target of various immediate branch instructions, or zero if the instruction is not of the specified type.

Note: these functions return the address as it is encoded in the underlying instruction, without modifying the thumb bit. In general, this means the thumb bit is not set, except for LDR_PC_PC_target. The get_ ... functions described below are more convenient in many cases.

get_direct_jump_target

Checks if the code starting at is_init is a direct jump (e.g. B, LDR PC,#const, or multi-instruction variants involving IP). These kinds of instructions are frequently generated as veneers in the thumb2 code.

Returns the target address with the thumb bit set appropriately, or zero if no matching instruction is found.

Modifies fw->is, does not modify is_init.

get_branch_call_insn_target

Checks if the current instruction of is is a single instruction function call or branch instruction.

Returns the target address with the thumb bit set appropriately, or zero if the instruction does not match.

get_call_const_args

Uses the address and history in is_init to disassemble backwards attempting to identify constant values that would end up as function arguments (in r0-r3).

Returns a bitmask of the registers for which values were identified, and stores the values identified in res.

Modifies fw->is, does not modify is_init.

Note: this function works reasonably well in practice, but there are many cases in which it can produce incorrect results.
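A caller typically needs to check the returned bitmask before trusting any entry of res. The sketch below shows one way to do that. It assumes that bit n of the mask corresponds to register rn and to res[n]; that mapping is an assumption of this example and should be checked against firmware_load_ng.h. The `toy_` helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 1 if the mask marks r<reg> as known. Assumes bit n <-> register rn. */
static int toy_arg_known(unsigned known_mask, unsigned reg)
{
    return reg < 4 && ((known_mask >> reg) & 1u);
}

/* Fetch the constant identified for r<reg>.
   Returns 1 and sets *val on success, 0 if the register was not identified. */
static int toy_get_const_arg(unsigned known_mask, const uint32_t res[4],
                             unsigned reg, uint32_t *val)
{
    if (!toy_arg_known(known_mask, reg)) return 0;
    *val = res[reg];
    return 1;
}
```

With a mask of 0x5, only the entries for r0 and r2 are meaningful. The entries for r1 and r3 must be ignored, even if they happen to contain plausible looking values.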

Storing found values

When a match is found, it needs to be saved in the sig_names or misc_vals structure.

save_sig_with_j

Saves an address for a function that already exists in sig_names. If the passed address is a veneer (a direct jump to another address), the veneer is saved with the name j_name and the function is added with the target of the veneer. This currently only handles one level of veneer. Adding matches via a commonly used veneer is helpful for reverse engineering.

The given address must have the thumb bit set appropriately. Typical usage from a match function is:

return save_sig_with_j(fw,rule->name,get_branch_call_insn_target(fw,is));
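The veneer handling described above can be modelled with a short toy. It is not the real save_sig_with_j: the veneer table replaces disassembly-based detection, the toy adds new entries instead of updating names that already exist in sig_names, and treating a zero address as failure is an assumption about how a failed target lookup is handled. All `toy_` names and addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { char name[32]; uint32_t adr; } toy_sig_t;
static toy_sig_t toy_sigs[8];
static int toy_sig_count;

/* Toy veneer table standing in for disassembly-based detection. */
typedef struct { uint32_t adr; uint32_t target; } toy_veneer_t;
static const toy_veneer_t toy_veneers[] = { {0xFC100001u, 0xFC2000A1u} };

/* Target of the veneer at adr, or 0 if adr is not a veneer. */
static uint32_t toy_direct_jump_target(uint32_t adr)
{
    for (size_t i = 0; i < sizeof(toy_veneers) / sizeof(toy_veneers[0]); i++)
        if (toy_veneers[i].adr == adr) return toy_veneers[i].target;
    return 0;
}

static int toy_add_sig(const char *name, uint32_t adr)
{
    if (toy_sig_count >= 8 || strlen(name) >= sizeof(toy_sigs[0].name)) return 0;
    strcpy(toy_sigs[toy_sig_count].name, name);
    toy_sigs[toy_sig_count++].adr = adr;
    return 1;
}

/* Saved address for name, or 0 if the name has not been saved. */
static uint32_t toy_lookup_sig(const char *name)
{
    for (int i = 0; i < toy_sig_count; i++)
        if (strcmp(toy_sigs[i].name, name) == 0) return toy_sigs[i].adr;
    return 0;
}

/* Save name -> adr. If adr is a veneer, save j_<name> -> adr and
   name -> veneer target instead. Follows one level of veneer only. */
static int toy_save_sig_with_j(const char *name, uint32_t adr)
{
    if (!adr) return 0;
    uint32_t target = toy_direct_jump_target(adr);
    if (target) {
        char j_name[32];
        int n = snprintf(j_name, sizeof(j_name), "j_%s", name);
        if (n < 0 || (size_t)n >= sizeof(j_name)) return 0;
        if (!toy_add_sig(j_name, adr)) return 0;
        adr = target;
    }
    return toy_add_sig(name, adr);
}
```

Saving both names means that a call through the veneer and a direct call to the underlying function can each be recognised later by name.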

add_func_name

Adds a new function to sig_names. Used for functions which are automatically identified and named using generic analysis, like eventprocs and tasks.

save_misc_val

Saves the address for the named misc value. The address is specified as a base and offset, to document values that are structure members in the stubs file. A reference address may also be provided to indicate where the value was found.

Typical usage from a match function is:

save_misc_val(rule->name,is->insn->detail->arm.operands[1].imm,0,(uint32_t)is->insn->address);

... to be continued ...
