>> LONG LIVE ASSEMBLER - CANNOT BEAT IT FOR PERFORMANCE
001100010010011110100001101101110011*
I have always wondered whether programmers like myself are a dying breed,
with the focus now on high level languages such as Python, Java and C# and
the languages that have been built on top of their runtime environments -
but what about good old C and low level assembly programming?
Thankfully, with the surge of IoT and the use of low powered, resource
constrained microcontrollers, there is a great opportunity for a number of
us to enjoy programming the good old fashioned way - to the point where
even C can be considered too high level.
Consider the following piece of C, which performs a right shift (>>) by
one bit on a 256 bit value stored as 32 bytes:
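unsigned char *p, val, carry;
int i;
p = (unsigned char *)x;   // x points at the 32 byte value
carry = 0;
for (i = 0; i < 32; i++)  // 32 * 8 = 256 bits
{
    val = p[i];
    p[i] = ((val >> 1) & 0x7f) | carry;  // shift, then bring in the previous carry
    carry = (val & 1) ? 0x80 : 0x00;     // low bit becomes the next byte's high bit
}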
Pretty standard stuff: go through every byte, shift it right by one bit,
apply the carry from the previous byte, and check whether the least
significant bit was set to define the carry for the next byte. It is
natural to assume that when targeting an 8 bit Atmel AVR based device such
as the Arduino, code written with 8 bit types would be optimal.
Unfortunately, this isn't the case, as you can quite easily see by
compiling with the -S flag and reviewing the assembly produced.
The C programming language requires that an int be at least 16 bits in
size, and operands narrower than an int (such as unsigned char) are
promoted to int before arithmetic and bitwise operations - so the shifts
and masks above are performed as if they were on 16 bit integers. You can
force gcc to use an 8 bit int with the -mint8 compile option, but it does
not conform to the C standard and it is documented that if you do, you
are on your own and use it at your own risk.
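The promotion rule is easy to observe from C itself. The following is only a minimal sketch, assuming a conforming compiler with 8 bit chars and without -mint8; the function names are made up for illustration. It shows that a shift on an unsigned char produces an int sized result, and that the complement of an 8 bit value is computed in an int.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of the expression (val >> 1) when val is an unsigned char.
   The operand is promoted before the shift, so this is sizeof(int), not 1. */
size_t shifted_size(void)
{
    unsigned char val = 0x81;
    return sizeof(val >> 1);
}

/* Complement of an 8 bit value. The work is done in an int, so the upper
   bits come out set and the result is negative until it is narrowed again. */
int complement_as_int(unsigned char val)
{
    return ~val;
}

/* Hypothetical self-check tying the two observations together. */
void check_promotion(void)
{
    assert(shifted_size() == sizeof(int));
    assert(complement_as_int(0x81) < 0);
    assert((unsigned char)complement_as_int(0x81) == 0x7e);
}
```

On an AVR target sizeof(int) is 2, so every one of these byte operations is nominally a 16 bit operation that the optimizer then has to narrow back down where it can.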
In contrast, here is the same code in AVR assembly:
    movw r26, r24            ; copy x (r25:r24) into X (r27:r26)
    ldi  r18, 32             ; i = 32
    clc                      ; clear carry bit
BigInt_shiftRight_loop:
    ld   r23, X              ; load r23 with value at X
    ror  r23                 ; rotate right r23, through carry
    st   X+, r23             ; store r23 at X, move to next byte
    dec  r18                 ; loop until r18 is zero
    brne BigInt_shiftRight_loop
The resulting difference? For starters, the code is much smaller and, as
a result, much faster. The ror instruction does all the work of handling
the carry logic for us - and since the dec instruction doesn't affect the
carry flag, the carry survives from one byte to the next and the code is
kept to a bare minimum.
When I converted most of my low level "BigInt" functions to assembler,
I saw a speed up of almost two fold.
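To convince yourself that the assembly loop really does the same job as the C loop, it can help to model it on a desktop machine first. The sketch below is an illustration only, not the code that runs on the device: the function names and the BIGINT_BYTES constant are made up here, and the variable c stands in for the AVR carry flag.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BIGINT_BYTES 32  /* 32 * 8 = 256 bits */

/* The C loop from earlier in the post, wrapped in a function. */
void shift_right_reference(uint8_t *p)
{
    uint8_t carry = 0;
    for (int i = 0; i < BIGINT_BYTES; i++) {
        uint8_t val = p[i];
        p[i] = (uint8_t)((val >> 1) | carry);
        carry = (val & 1) ? 0x80 : 0x00;
    }
}

/* Step by step model of the assembly loop. */
void shift_right_ror_model(uint8_t *x)
{
    uint8_t count = BIGINT_BYTES;            /* ldi r18, 32 */
    uint8_t c = 0;                           /* clc */
    do {
        uint8_t r = *x;                      /* ld r23, X */
        uint8_t out = r & 1;                 /* bit 0 leaves through carry */
        r = (uint8_t)((r >> 1) | (c << 7));  /* ror r23: old carry enters bit 7 */
        c = out;
        *x++ = r;                            /* st X+, r23 */
    } while (--count != 0);                  /* dec r18 / brne */
}

/* Returns 1 when both versions produce the same bytes for the given input. */
int shift_models_agree(const uint8_t *input)
{
    uint8_t a[BIGINT_BYTES], b[BIGINT_BYTES];
    assert(input != NULL);
    memcpy(a, input, BIGINT_BYTES);
    memcpy(b, input, BIGINT_BYTES);
    shift_right_reference(a);
    shift_right_ror_model(b);
    return memcmp(a, b, BIGINT_BYTES) == 0;
}
```

Feeding shift_models_agree a handful of 32 byte patterns is a cheap sanity check before flashing the hand written version onto the microcontroller.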
The bottom line is that when you are dealing with 8 bit, 16 MHz
microcontrollers with very limited space for your applications, you
simply need to consider assembly level programming, or your code just
won't cut it if you intend to do CPU intensive operations.
On the other hand, this is specific to 8 bit processors; the compiler
optimization in gcc for larger processors is quite impressive. gcc has
been in active development, for almost every processor made, since its
origin back on 22 March 1987.