How small can a Hello World on AVR get?



Warning: this article relies heavily on dirty hacks. Treat it only as a guide to what not to do!

As soon as I saw the article “Little Hello World for a small microcontroller - 24 bytes”, my inner assembly programmer was outraged: “How can you squander precious bytes like this?!”. Although I switched to C long ago, it never hurts to check the compiler's output in critical places. If it is bad, you can sometimes tweak the C code slightly and get a noticeable gain in speed and/or size. Or you can simply rewrite that piece in assembler.

So, the conditions of our task:

  1. An AVR microcontroller. I had plenty of ATmega48s in my parts bins, so let it be that one;
  2. Clocking from an internal source. An AVR can be clocked externally at an arbitrarily low frequency, and that would immediately make our task unsporting;
  3. We blink an LED at a frequency visible to the eye;
  4. The program must be as small as possible;
  5. All of the microcontroller's outstanding power is devoted to the task.


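For indication, we connect an LED with a series resistor between the VCC supply rail and pin B7 of our little mega. When DDB7 = 1, the pin is an output driving low (PORTB is 0 after reset), and the LED lights. When DDB7 = 0, the pin is a high-impedance input, and the LED goes dark.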

We will write in AVR Studio.

To avoid diving straight into the wilds of asm, here is the obvious first pseudo-code in C:

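#include <avr/io.h>
#include <stdint.h>

int main(void)
{
	volatile uint16_t x = 0;			// Delay counter, initialized to avoid reading garbage
	while (1) {					// Infinite loop
		while (++x)				// Delay: spin until x wraps around to 0
			;
		DDRB ^= (1 << PB7);			// Toggle B7 between output (LED on) and input (LED off)
	}
}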

Since nothing else needs our attention, using timers is clearly overkill. The usual avr-libc delay function _delay_us() for GCC is based on a busy loop similar to the inner while loop shown here. Note that we have already mistreated the variable x: the loop relies on its overflow, which is unacceptable in real code.
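To make "a loop based on overflow" concrete, here is a host-side sketch in plain C, meant to run on a PC rather than on the AVR. It counts how many times the body of `while (++x)` spins before the 16-bit counter wraps to 0. The function name `spins_until_wrap` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side helper: counts how many times the loop
 * `while (++x) ;` spins before exiting, starting from x = start.
 * The loop exits only when ++x wraps from 0xFFFF back to 0. */
static uint32_t spins_until_wrap(uint16_t start)
{
    uint16_t x = start;
    uint32_t spins = 0;
    while (++x)          /* storing back into uint16_t wraps modulo 65536 */
        spins++;
    return spins;
}
```

Starting from 0, the loop spins 65535 times. It leaves x at 0, so every later pass spins the same 65535 times.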

We look at the listing, are horrified by the compiler's wastefulness, and create an assembler project. After throwing the excess out of the compiler's output, this is what remains:

	.include "m48def.inc"		; Use the ATmega48 definitions
	.CSEG				; Code segment
	ldi		r16, 0x80		; r16 = 0x80 (mask for bit 7)
start:
	adiw	XL, 1			; Add 1 to the register pair r27:r26 (X)
	brcc	start			; Branch back if there was no carry
	in		r28, DDRB		; r28 = DDRB
	eor		r28, r16		; r28 ^= r16
	out		DDRB, r28		; DDRB = r28
	rjmp	start			; goto start

Since we don't use interrupts, we place the code directly where the interrupt vector table would be, because a reset takes execution to address 0x0000. When x rolls over from 0xFFFF to 0x0000, both the carry (overflow) flag C and the zero flag Z are set, so you can catch either one with brcc or brne.
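Either branch works only if both flags are really set on the wrap-around. Below is a host-side C model of `adiw`, written from the flag definitions in the AVR instruction set manual: C is the carry out of bit 15, and Z means a zero result. The struct and function names are our own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of AVR `adiw Rd+1:Rd, K` (K = 0..63):
 * returns the 16-bit sum and the resulting C and Z flags. */
typedef struct {
    uint16_t result;
    bool carry;  /* C: the 16-bit addition carried out of bit 15 */
    bool zero;   /* Z: the 16-bit result is 0x0000 */
} AdiwResult;

static AdiwResult adiw(uint16_t pair, uint8_t k)
{
    assert(k <= 63);                     /* adiw only takes a 6-bit immediate */
    uint32_t wide = (uint32_t)pair + k;  /* compute in 32 bits to expose the carry */
    AdiwResult r;
    r.result = (uint16_t)wide;
    r.carry = wide > 0xFFFFu;
    r.zero = r.result == 0;
    return r;
}
```

For an increment by 1, C and Z both become set only on the 0xFFFF to 0x0000 step. So `brcc start` and `brne start` leave the loop at exactly the same moment.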

We got 14 bytes of machine code and a counter loop of 4 clock cycles (2 for adiw plus 2 for the taken brcc). Since x is two bytes wide, the half-period of the LED blinking is 65536 * 4 = 262144 clock cycles. Let's choose the slowest internal oscillator, namely the 128 kHz RC oscillator. Then our half-period is 262144/128000 = 2.048 s. The task conditions are met, but the firmware size can clearly be reduced.
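The same arithmetic as a host-side sketch. The helper name is hypothetical. Like the estimate in the text, it ignores the few cycles spent on `in`/`eor`/`out`/`rjmp` once per toggle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: LED half-period in seconds for a busy loop that
 * takes `cycles_per_iter` clocks and runs `iterations` times per toggle.
 * The cycles spent on the toggle itself are ignored, as in the text. */
static double half_period_s(uint32_t cycles_per_iter, uint32_t iterations,
                            double f_clk_hz)
{
    assert(f_clk_hz > 0.0);
    return (double)cycles_per_iter * iterations / f_clk_hz;
}
```

For example, `half_period_s(4, 65536, 128000.0)` evaluates to about 2.048 s.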

First, we sacrifice reading the state of DDRB. Why would we need it, when we already know it always holds either 0x00 or 0x80? Yes, doing so is bad practice, but here everything is under control! Secondly, the other pins of port B are not used, so it's okay if garbage gets written there!

Let's look at the high byte of x. Its top bit changes strictly every 65536/2 = 32768 increments, i.e., every 65536/2 * 4 = 131072 clock cycles. So let's output the high byte xh directly to the port, getting rid of the branch, the read-modify-write of DDRB, and the r16 variable:

start:
	adiw	XL, 1			; Add 1 to the register pair r27:r26 (X)
	out		DDRB, XH		; DDRB = r27 (high byte of X)
	rjmp	start			; goto start
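The claim that the top bit of xh changes only every 32768 increments can be checked with a small host-side simulation in plain C on a PC. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side check: increment a 16-bit counter from `start`
 * and count the increments until bit 7 of its high byte (bit 15 of the
 * whole counter, the bit that lands on DDB7) changes. */
static uint32_t increments_until_bit15_flips(uint16_t start)
{
    uint16_t x = start;
    const uint8_t old_bit = (uint8_t)(x >> 15);
    uint32_t n = 0;
    do {
        x++;                                   /* wraps 0xFFFF -> 0x0000 */
        n++;
    } while ((uint8_t)(x >> 15) == old_bit);
    return n;
}
```

From 0x0000 or 0x8000 the count is exactly 32768. Our program never initializes X, so an arbitrary starting value only affects the length of the very first half-period.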

Perfect! We're down to 6 bytes! Let's calculate the timing: (2 + 1 + 2) * 65536/2 = 163840 cycles, so the LED flashes with a half-period of 163840/128000 = 1.28 s. The remaining pins of port B will twitch much faster; we just turn a blind eye to that.
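The figures behind "(2 + 1 + 2)" are the per-instruction cycle counts from the AVR instruction set manual: adiw takes 2 cycles, out 1, and rjmp 2. The sketch below adds them up. The array and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clock cycles of each instruction in the loop (AVR instruction set manual). */
static const uint32_t loop_cycles[] = { 2 /* adiw */, 1 /* out */, 2 /* rjmp */ };

/* Hypothetical helper: cycles between toggles of bit 7 of XH, which
 * flips once every 65536 / 2 = 32768 passes through the loop. */
static uint32_t cycles_per_toggle(void)
{
    uint32_t per_pass = 0;
    for (size_t i = 0; i < sizeof loop_cycles / sizeof loop_cycles[0]; i++)
        per_pass += loop_cycles[i];
    return per_pass * (65536u / 2u);
}
```

Dividing `cycles_per_toggle()` by 128000 gives the 1.28 s half-period from the text.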

We could stop here. However, a real assembler programmer has an even dirtier trick up his sleeve than all the previous ones combined! Why not throw away this rjmp, which occupies (just think!) a third of the program?! Let's dig deeper. After the flash memory of the microcontroller is erased, all cells read 0xFF. So once the processor runs past the end of the program, it meets only 0xFFFF instructions. These are undocumented, but they execute like 0x0000 (nop): the processor does nothing except increment the program counter (PC). The PC addresses 16-bit words, so its limit here is the program memory size in words minus one: 2048 - 1 = 2047 (0x7FF). After reaching that limit, the PC overflows and becomes 0 again. That points to the start of the program, and execution continues there! Now the delay is determined by one pass through the entire program memory: 2048 two-byte instructions, each executed in one clock cycle. Therefore, a single-byte counter variable is enough:

	inc		r16			; r16++
	out		DDRB, r16		; DDRB = r16

Or in C:

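uint8_t b;		// uint8_t is declared in <stdint.h>
	DDRB = ++b;	// no explicit loop: the PC wrapping around flash repeats this line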

The half-period of the LED blinking is 2048 * 256/2 = 262144 cycles, or 2.048 s (as in the first example).
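Both steps of this reasoning can be replayed in a host-side sketch. First, the PC walks all 2048 words and wraps to 0. Second, r16 is incremented once per pass, and DDB7 follows its bit 7. One assumption comes only from the author's observation, not from the datasheet: an erased 0xFFFF word executes as a 1-cycle nop. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_WORDS 2048u              /* ATmega48: 4 KB flash = 2048 16-bit words */
#define PC_MASK     (FLASH_WORDS - 1u) /* 11-bit PC: 0x7FF + 1 wraps to 0x000 */

/* Cycles for one trip through flash: inc r16 and out DDRB take 1 cycle each;
 * every erased 0xFFFF word is ASSUMED to behave like a 1-cycle nop. */
static uint32_t cycles_per_pass(void)
{
    uint16_t pc = 0;
    uint32_t cycles = 0;
    do {
        cycles += 1;                           /* each word costs 1 cycle */
        pc = (uint16_t)((pc + 1u) & PC_MASK);  /* PC wraps back to the reset address */
    } while (pc != 0);
    return cycles;
}

/* r16 is incremented once per pass and DDB7 follows bit 7 of r16:
 * count the cycles until that bit changes, starting from r16 = start. */
static uint32_t cycles_until_led_toggles(uint8_t start)
{
    uint8_t r16 = start;
    const uint8_t old_bit = (uint8_t)(r16 >> 7);
    uint32_t cycles = 0;
    do {
        r16++;                                 /* inc r16: 0xFF wraps to 0x00 */
        cycles += cycles_per_pass();
    } while ((uint8_t)(r16 >> 7) == old_bit);
    return cycles;
}
```

Starting from r16 = 0, the LED toggles after 128 * 2048 = 262144 cycles, which matches the figure above.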

In total, our program is 4 bytes and it works. However, this victory came at such a price that we are ashamed to look in the mirror. By the way, the original C program compiled to 110 bytes with the -Os option (optimize for size).

Conclusions


We looked at several ways to shoot ourselves in the foot.
If you feel cramped within a language, go down to the bottom; there is nothing complicated there. Once you have studied how the processor works, high-level languages become much easier. Yes, abstraction is in vogue now: frameworks, Linux in a coffee maker, even embedded x86. However, assembler is not going to give up its position where you need hard real time, maximum performance, or work within limited resources. Assembler has poor portability (sometimes even within one family), is hard to modify, makes it easy to lose track of what is happening, and makes large programs difficult to write. Despite all that, small fast functions and inserts are written in it quite successfully, and it seems it will never be pushed out of this niche. This applies primarily to embedded developers, though; most x86 programmers meet assembler mainly while debugging.

For me, there is no "asm vs C": I use them together, and C makes up by far the larger part.

Wielding a sword demands the utmost care.

Thanks for your attention!

UPD1
I took the trouble to flash it into real hardware, and yes, it works!

UPD2
But never do this!
Since the idea of cutting the program down even further won't leave us alone, let's continue.

I have not tried it myself, but some people on the Internet say that writing a one to a bit of the PINx register toggles the corresponding PORTx bit (on all but the oldest AVR microcontrollers). Since our pin stays an input, this connects and disconnects the internal pull-up resistor between VCC and the pin.
Take an LED that is more sensitive to low currents and connect it between pin B0 and ground.
We program the CKDIV8 fuse, and the clock frequency drops another 8 times, to 16 kHz. (Now, however, not every programmer can reprogram the microcontroller. The original AVRISP mkII can, but I can't vouch for its clones.)
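Some programmers fail here because of the ISP clock. As we read the ATmega48 serial programming requirements, the SCK low and high phases must each last more than 2 CPU cycles (below 12 MHz). That keeps SCK below f_cpu / 4, which is only 4 kHz at 16 kHz. Check the datasheet of your exact part. A sketch of the arithmetic with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: CPU clock after the CKDIV8 fuse (programmed = divide by 8). */
static uint32_t cpu_clock_hz(uint32_t osc_hz, int ckdiv8_programmed)
{
    return ckdiv8_programmed ? osc_hz / 8u : osc_hz;
}

/* Hypothetical helper: upper bound for the ISP SCK frequency, assuming SCK
 * low and high phases must each exceed 2 CPU cycles (f_cpu below 12 MHz).
 * SCK must stay strictly below the returned value. */
static uint32_t isp_sck_limit_hz(uint32_t f_cpu_hz)
{
    return f_cpu_hz / 4u;
}
```

With the 128 kHz oscillator and CKDIV8 programmed, the programmer has to clock SCK below 4 kHz.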
Let's bring the program down to one instruction (2 bytes):
	sbi		PINB, 0		; PINB = 0x01, i.e. PORTB ^= 0x01
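The ATmega48 datasheet describes the PINx trick this way: writing a logic one to a PINx bit toggles the matching PORTx bit, and written zeros leave PORTx unchanged. A host-side model of that rule, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model: new PORTx value after writing `written`
 * to PINx on AVRs that support pin toggling (ones toggle, zeros keep). */
static uint8_t port_after_pin_write(uint8_t portx, uint8_t written)
{
    return (uint8_t)(portx ^ written);
}
```

DDB0 stays 0, so each pass of `sbi PINB, 0` flips PORTB0. That switches the internal pull-up, and with it the LED current, on and off.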

We flash it and observe a faint flicker in the dark. The frequency is 16000/2049/2 ≈ 4 Hz. For a microcontroller with more flash memory, this frequency will be correspondingly lower.
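Where the 2049 comes from: one pass through flash consists of `sbi` (1 word, 2 cycles) plus 2047 erased words. We assume, per the author's observation, that each erased word executes in 1 cycle. A full on/off period takes two passes. A sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: blink frequency of the one-instruction program.
 * One pass = sbi (2 cycles) + (flash_words - 1) erased words (1 cycle each,
 * assumed). Each pass toggles PORTB0 once, so a full period is two passes. */
static double blink_hz(double f_cpu_hz, uint32_t flash_words)
{
    assert(flash_words >= 1u);
    uint32_t cycles_per_pass = 2u + (flash_words - 1u);
    return f_cpu_hz / (2.0 * cycles_per_pass);
}
```

`blink_hz(16000.0, 2048)` gives about 3.9 Hz. A part with 4096 words of flash would blink at roughly half that rate.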

UPD3
Let's move on.
Can an AVR microcontroller signal that it is working without any program at all?
Of course! It is enough to program the CKOUT fuse. The CLKO pin (again PB0) will then output the system clock, including the internal one. If that frequency is reduced by the clock prescaler, the divided clock is output.
So we erase the chip, write our 0-byte program (that is, nothing at all), and program the fuses. Applying 16 kHz to an LED with a resistor makes little sense, although we notice that it glows at half brightness.
However, besides the visual low-frequency Hello World, there is an audible high-frequency one! This option does not meet our original specification, of course, but it fully signals that the MCU is working. We connect a piezo element between pin B0 and either ground or the supply rail, and “enjoy” the nasty squeal.
