Don't optimize prematurely
Consider what overall impact the execution of your code will have. There's no point in saving a few instructions if you are going to spend several seconds reading or writing flash memory. If your code only runs when the user changes a setting in a menu, a few instructions more or less will make no difference.
Optimize for size unless you really need speed
CHDK operates in very limited memory. This means that limiting the compiled code and static data sizes is frequently more important than execution speed.
See CHDK Coding Guidelines for general recommendations regarding memory use.
Optimizing for speed
What follows are general guidelines. If you optimize, you should always verify your optimization with actual testing. If you are trying to optimize C code, check the assembler output.
- Optimize your algorithms first.
- The first four word-sized function arguments are passed in registers (r0-r3); any further arguments are passed on the stack. Functions with four or fewer arguments are therefore very cheap to call.
- Use full words rather than bytes or half-words when operating on bulk data. Even if you need to access individual bytes, it is frequently faster to load full words and then manipulate the individual bytes in registers. Many of these techniques fall under SIMD within a register (SWAR); when applied correctly, they can improve performance considerably.
- Avoid memory access. A few extra instructions that operate directly on registers may be faster than fewer instructions with more memory accesses.
- Interleave memory access and calculation. This is only relevant in assembler, since GCC will re-order things as it sees fit.
- TODO EXAMPLE
- Use the multiple-register forms of load and store (LDM/STM). GCC often does not use them well, so rewriting memory-intensive code in assembler can give significant gains. Multiple loads are pipelined, so the lowest-numbered register becomes available in roughly the same time as with a single load.
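- Avoid using division or modulo, since they are done in software: the compiler emits a call to a library division routine, which takes many cycles. Unsigned division or modulo by a constant power of two is the exception, since it reduces to a shift or a mask.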
- Operations on small constants, or on values that can be generated by rotating a small constant, do not require an additional memory access; other constants must be built with extra instructions or loaded from a literal pool in memory. GCC tries to pick encodable constants, but can do a poor job in many circumstances. In ARM mode, an immediate operand must be an 8-bit value rotated right by an even number of bits, so choosing constants such as powers of two results in code that is both faster and smaller.
- Multiplications that can be expressed as a short sequence of shifts and adds are cheap.
- Shifts are extremely cheap: most ARM data-processing instructions can shift their second operand by an immediate amount at little or no extra cost. As noted above, GCC already replaces many multiplications by constants with shifts and adds, but shifts can also help where no constant multiplication is involved. Several SWAR techniques rely on clever uses of shifts that can improve both code size and speed.
- TODO MORE DETAIL
- Optimize for cache use. All CHDK supported cameras tested so far have independent 4KB caches for data and instructions.
- Unroll short loops (at most about 8 iterations). Note that unrolling large, complicated loops conflicts with the goal of optimizing for size. For bulk data processing, unrolling fits naturally with using LDM.
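- Prefer loops that count down to zero. On ARM, a decrement with SUBS sets the condition flags, so the loop can branch on the result without a separate CMP. GCC sometimes makes this transformation itself, so check the assembler output.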
- When optimizing for speed, make sure you take into account the effects of using Thumb code: it is more compact than ARM code, but usually needs more instructions for the same work.
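- The ARM Technical Reference Manual for the specific processor core is the place to find how many cycles an instruction takes to execute (the ARM Architecture Reference Manual describes what each instruction does, but not its timing). This information is very useful when you want to interleave memory accesses with other operations.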
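The advice about using full words and SWAR techniques can be illustrated with a byte sum. This is a minimal sketch, not CHDK code; the function name `sum_bytes_swar` is hypothetical. Masking a word with `0x00FF00FF` splits it into two 16-bit lanes, so each addition accumulates two bytes at once, and the lanes are flushed into a 32-bit total before they can overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum every byte in a word-aligned buffer, reading
 * one 32-bit word at a time. Masking with 0x00FF00FF splits a word into
 * two 16-bit lanes, so each ADD accumulates two bytes at once. Each word
 * adds at most 2 * 255 = 510 to a lane, so a lane is safe for 128 words
 * (128 * 510 = 65280) before it must be flushed into the 32-bit total. */
uint32_t sum_bytes_swar(const uint32_t *words, size_t nwords)
{
    uint32_t total = 0;

    while (nwords > 0) {
        size_t chunk = nwords < 128 ? nwords : 128;
        uint32_t lanes = 0;  /* two independent 16-bit partial sums */

        nwords -= chunk;
        while (chunk--) {
            uint32_t w = *words++;
            lanes += w & 0x00FF00FFu;         /* bits 0-7 and 16-23 */
            lanes += (w >> 8) & 0x00FF00FFu;  /* bits 8-15 and 24-31 */
        }
        /* Flush: add the low and high 16-bit lanes to the total. */
        total += (lanes & 0xFFFFu) + (lanes >> 16);
    }
    return total;
}
```

When applied to a byte buffer, the buffer must be 4-byte aligned, and any trailing bytes that do not fill a whole word have to be summed separately.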
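A common case of avoidable memory access is accumulating through a pointer. In the sketch below (both function names are hypothetical), the first version forces the compiler to assume that `out` may point into `data`, so it typically loads and stores `*out` on every iteration. The second keeps the running sum in a local variable that GCC can hold in a register. Whether a given GCC version really keeps the store inside the loop depends on the version and flags, so check the assembler output.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulates through a pointer: because 'out' and 'data' have the same
 * type, the compiler must assume they may alias and usually reloads and
 * stores *out on each iteration. */
void sum_to_ptr(const int32_t *data, size_t n, int32_t *out)
{
    *out = 0;
    for (size_t i = 0; i < n; i++)
        *out += data[i];
}

/* Accumulates in a local variable, which can live in a register, and
 * stores the result to memory once at the end. Only equivalent to
 * sum_to_ptr() when 'out' does not point into 'data'. */
void sum_to_local(const int32_t *data, size_t n, int32_t *out)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    *out = sum;
}
```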
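For unsigned values, division and modulo by a power of two avoid the software division routine entirely, because they reduce to a shift and a mask. The helpers below and the `RING_SIZE` constant are hypothetical, for illustration only. Note that for signed values `x / 4` and `x >> 2` are not the same when `x` is negative (division rounds toward zero, an arithmetic shift rounds down), so this rewrite is only safe as written for unsigned data.

```c
#include <stdint.h>

/* Hypothetical ring buffer size; must be a power of two. */
#define RING_SIZE 64u

/* x / (1 << log2_div) for unsigned x; log2_div must be less than 32. */
static inline uint32_t div_pow2(uint32_t x, unsigned log2_div)
{
    return x >> log2_div;
}

/* x % (1 << log2_div) for unsigned x; log2_div must be less than 32. */
static inline uint32_t mod_pow2(uint32_t x, unsigned log2_div)
{
    return x & ((1u << log2_div) - 1u);
}

/* Advance a ring buffer index with a mask instead of '%'. */
static inline uint32_t ring_next(uint32_t i)
{
    return (i + 1u) & (RING_SIZE - 1u);
}
```

Choosing buffer sizes that are powers of two is what makes the mask form possible; with an arbitrary size, the wrap needs either `%` or a compare and reset.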
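The ARM-mode immediate rule (an 8-bit value rotated right by an even number of bits) can be checked mechanically. The helper `is_arm_immediate` below is a hypothetical sketch for exploring which constants qualify; it covers ARM-mode data-processing instructions only, since Thumb encodings follow different rules. Rotating the candidate left by each even amount undoes a possible right rotation, and the constant is encodable if any result fits in 8 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if 'v' can be encoded as an ARM-mode
 * data-processing immediate, i.e. an 8-bit value rotated right by an
 * even number of bits (0, 2, ..., 30). */
bool is_arm_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotate left by 'rot' to undo a right rotation by 'rot'.
         * rot == 0 is handled separately to avoid shifting by 32. */
        uint32_t r = rot ? (v << rot) | (v >> (32 - rot)) : v;
        if (r <= 0xFFu)
            return true;
    }
    return false;
}
```

For example, `0x3FC` (0xFF rotated right by 30) is encodable, while `0x1FE` is not, because it would need an odd rotation, and `0x101` is not, because its set bits span more than 8 bits.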
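Multiplying by a constant with shifts and adds is illustrated below. The function names are hypothetical, and the 320-wide row is only an example value. `x * 10` is written as `(x + (x << 2)) << 1`, because on ARM `x + (x << 2)` fits in a single ADD with a shifted operand. GCC usually performs this kind of rewrite on its own for constant multipliers, so writing it by hand is mostly useful when you want a particular instruction sequence; check the assembler output either way.

```c
#include <stdint.h>

/* x * 10 = (x * 5) * 2, where x * 5 = x + (x << 2) is one ADD with a
 * shifted operand on ARM. */
static inline uint32_t mul10(uint32_t x)
{
    return (x + (x << 2)) << 1;
}

/* x * 320 = x * 256 + x * 64, e.g. a row offset in a hypothetical
 * buffer that is 320 elements wide. */
static inline uint32_t mul320(uint32_t x)
{
    return (x << 8) + (x << 6);
}
```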
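Loop unrolling for bulk data can be sketched as a word copy that moves four words per iteration. This is an illustration under stated assumptions, not a replacement for `memcpy`; the function name is hypothetical. Loading four words into locals before storing them gives GCC the opportunity to emit LDMIA/STMIA, but whether it actually does so depends on the GCC version and flags, which is why hand-written assembler can still be worthwhile.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy 'nwords' 32-bit words, unrolled by four.
 * The remaining 0-3 words are copied one at a time. */
void copy_words_unrolled(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    size_t blocks = nwords >> 2;  /* number of 4-word blocks */
    size_t rest = nwords & 3;     /* leftover words */

    while (blocks--) {
        /* Four independent loads, then four stores: a candidate for
         * a single LDM followed by a single STM. */
        uint32_t a = src[0], b = src[1], c = src[2], d = src[3];
        dst[0] = a;
        dst[1] = b;
        dst[2] = c;
        dst[3] = d;
        src += 4;
        dst += 4;
    }
    while (rest--)
        *dst++ = *src++;
}
```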
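A count-down loop looks like the sketch below (the function name is hypothetical). Because the counter is decremented toward zero, the compiler can use SUBS to decrement and set the flags, then branch with BNE, instead of comparing an up-counter against `n` with a separate CMP. GCC may already rewrite an up-counting loop this way, so this is a habit worth checking in the assembler output rather than a guaranteed gain.

```c
#include <stdint.h>

/* Sum 'n' words using a loop counter that runs from n down to zero. */
uint32_t sum_countdown(const uint32_t *p, uint32_t n)
{
    uint32_t sum = 0;

    for (uint32_t i = n; i != 0; i--)  /* test against zero, no CMP with n */
        sum += *p++;
    return sum;
}
```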
TODO HOW GOOD IS GCC ABOUT DOING THE THINGS MENTIONED ABOVE FOR YOU
TODO ADD SOME LINKS TO GENERAL ARM OPTIMIZATION INFORMATION