Speed Gain
- On an image of 256x256 pixels:
- Old C code executes 26*256*256 = 1,703,936 instructions
- Optimised MMX code executes 6*256*32 = 49,152 instructions
- Note that no current compiler generates this optimised code; it has to be hand-assembled.