Firecore - a fun game on AVR

I want to share another evening long-term construction, which shows that you can make games even on weak hardware.

About what you had to do, how it was decided, and how to do something more than another Pong clone - welcome to Cat.

Caution: great article, traffic and multiple code inserts!

Briefly about the game

Shoot`em up! - now on AVR.

In fact, this is another shmap, so once again the main character ~~Shepard~~ must save the galaxy from a sudden attack by unknown people, making his way through space through the stars and fields of asteroids simultaneously clearing each star system.
The whole game is written in C and C ++ without using the Wire library from Arduino.

The game has 4 ships to choose from (the latter is available after passing), each with its own characteristics:

maneuverability;
strength;
gun power.

Also implemented:

2D color graphics;
power up for weapons;
bosses at the end of levels;
levels with asteroids (and their animation of rotation);
background color change at levels (and not just black space);
the movement of stars in the background at different speeds (for the effect of depth);
scoring and saving in EEPROM;
the same sounds (shots, explosions, etc.);
a sea of identical opponents.

Platform

The return of the ghost.

I’ll clarify in advance that this platform should be perceived as the old game console of the ~~first~~ third generation (80s, shiru8bit ).

Also, hardware modifications over the original hardware are prohibited, which ensures that the launch on any other identical board is straight out of the box.
This game was written for the Arduino Esplora board, but transferring to GBA or any other platform, I think, will not be difficult.
Nevertheless, even on this resource this board was covered only a couple of times, and other boards were not worth mentioning at all, despite the rather large community of each:

GameBuino META:
Pokitto;
makerBuino;
Arduboy;
UzeBox / FuzeBox;
and many others.

To begin with, what is not on Esplora:

a lot of memory (ROM 28kb, RAM 2.5kb);
power (8 bits CPU at 16 MHz);
DMA
character generator;
allocated memory areas or special registers. Destinations (palette, tiles, background, etc.);
control the brightness of the screen (oh, so many effects in the trash);
address space expanders (mappers);
debugger ( ~~but who needs it when there is a whole screen!~~ ).

I will continue with the fact that there is:

hardware SPI (can run at F_CPU / 2 speed);
screen based on ST7735 160x128 1.44 ";
a pinch of timers (only 4 pcs.);
a pinch of GPIO;
a handful of buttons (5pcs. + two-axis joystick);
few sensors (lighting, accelerometer, thermometer);
piezo buzzer ~~irritation~~ emitter.

Apparently there is almost nothing there. It is not surprising that no one wanted to do anything with her except the Pong clone and a couple of three games for all this time!
Perhaps the fact is that writing under the ATmega32u4 controller (and the like) is similar to programming for Intel 8051 (which is almost 40 years old at the time of publication), where you need to observe a huge number of conditions and resort to various tricks and tricks.

Peripheral processing

One for everything!

Having looked at the circuit, it was clearly visible that all peripherals are connected through the GPIO expander (74HC4067D multiplexer further MUX) and switched using the GPIO PF4, PF5, PF6, PF7 or the senior PORTF nibble, and the MUX output is read on GPIO - PF1.
It is very convenient to switch the input by simply assigning values to the PORTF port by mask and by no means forgetting the minor nibble:

uint16_t getAnalogMux(uint8_t chMux)
{
  MUX_PORTX = ((MUX_PORTX & 0x0F) | ((chMux<<4)&0xF0));
  return readADC();
}

Button click poll:

#define SW_BTN_MIN_LVL 800boolreadSwitchButton(uint8_t btn){
  bool state = true;
  if(getAnalogMux(btn) > SW_BTN_MIN_LVL) { // low state == pressed
    state = false;
  }
  return state;
}

The following are the values for port F:

#define SW_BTN_1_MUX   0#define SW_BTN_2_MUX   8#define SW_BTN_3_MUX   4#define SW_BTN_4_MUX   12

Adding a little more:

#define BUTTON_A   SW_BTN_4_MUX#define BUTTON_B   SW_BTN_1_MUX#define BUTTON_X   SW_BTN_2_MUX#define BUTTON_Y   SW_BTN_3_MUX#define buttonIsPressed(a)  readSwitchButton(a)

You can safely interview the right cross:

voidupdateBtnStates(void){
  if(buttonIsPressed(BUTTON_A))
    btnStates.aBtn = true;
  if(buttonIsPressed(BUTTON_B))
    btnStates.bBtn = true;
  if(buttonIsPressed(BUTTON_X))
    btnStates.xBtn = true;
  if(buttonIsPressed(BUTTON_Y))
    btnStates.yBtn = true;
}

Please note that the previous state is not reset, otherwise you can miss the fact of pressing the key (it also works as an additional protection against chatter).

Sfx

A buzzing bit.

What if there is no DAC, no chip from Yamaha, and there is only a 1-bit PWM rectangle for sound?
At first, it seems not so much, but, despite this, it uses a cunning PWM to recreate the technique of “PDM audio” and with it you can do this.

Something similar is provided by the library from Gamebuino and all that is needed is to transfer the popping generator to another GPIO and the timer to Esplora (timer4 and OCR4D output). For correct operation, timer1 is also used to generate interrupts and reload the OCR4D register with new data.

The Gamebuino engine uses sound patterns (as in tracker music), which saves a lot of space, but you need to do all the samples yourself, there are no libraries with ready-made ones.
It is worth mentioning that this engine is tied to an update period of about 1/50 sec or 20 frames / sec.

To read the sound patterns, after reading the Wiki on audio format, I sketched a simple GUI on Qt. It does not output sound in the same way, but gives an approximate idea of how the pattern will sound and allows you to load, save and edit it.

Graphics

Immortal Pixelart.

The display encodes colors in two bytes (RGB565), but since images in this format will take up a lot, all of them have been made indexed by the palette to save space, which I have already described more than once in my earlier articles.
Unlike Famicom / NES, there are no color limits for the image and there are more colors available in the palette.

Each image in the game is an array of bytes in which the following data is stored:

width height;
start data marker;
dictionary (if any, but more on that later);
payload;
end of data marker.

For example, such a picture (enlarged 10 times):

in the code it will look like this:

pic_t weaponLaserPic1[] PROGMEM = {
  0x0f,0x07,
  0x02,
  0x8f,0x32,0xa2,0x05,0x8f,0x06,0x22,0x41,0xad,0x03,0x41,0x22,0x8f,0x06,0xa2,0x05,
  0x8f,0x23,0xff,
};

Where without a ship in this genre? After hundreds of test sketches with a pixel difference, only these ships remained for the player:

It is noteworthy that the ships do not have a flame in tiles (here it is for clarity), it is applied separately to create an exhaust animation from the engine.

Don’t forget about the pilots of each ship at all:

The variation of the enemy’s ships is not too big, but I’ll remind you that there aren’t too many places, so here are three ships:

Without canonical bonuses in the form of improving weapons and restoring health, the player will not last long:

Of course, with growth the power of the guns changes the type of emitted shells:

As it was written at the beginning, the game has a level with asteroids, it comes after every second boss. It is interesting in that there are many moving and rotating objects of different sizes. In addition, when a player hits them, they partially collapse, becoming smaller in size.

Hint: Large asteroids earn more points.

To create this simple animation, 12 small images

are enough: They are divided into three for each size (large, medium and small) and for each rotation angle you need 4 more rotated 0, 90, 180 and 270 degrees. In the game, it is enough to replace the pointer to the array with the image at an equal interval thereby creating the illusion of rotation.

voidrotateAsteroid(asteroid_t &asteroid){
  if(RN & 1) {
    asteroid.sprite.pPic = getAsteroidPic(asteroid);
    ++asteroid.angle;
  }
}
voidmoveAsteroids(void){
  for(auto &asteroid : asteroids) {
    if(asteroid.onUse) {
      updateSprite(&asteroid.sprite);
      rotateAsteroid(asteroid);
...

This is done only because of the lack of hardware capabilities, and a software implementation like Affine transformation will take more than the images themselves and will be very slow.

A piece of satin for those who are interested.

You can notice part of the prototypes and what appears only in the credits after passing the game.

In addition to simple graphics, space glyphs and all glyphs up to 30 and after 127 bytes of ASCII were thrown out of the font to save space and add a retro effect.

Important!
Do not forget that const and constexpr on AVR does not mean at all that the data will be in the program memory, here for this you need to additionally use PROGMEM.
This is due to the fact that the AVR core is based on the Harvard architecture, so special access codes for the CPU are needed to access the data.

Squeezing the galaxy

The easiest way to pack is RLE.

Having studied the packed data, you can notice that the most significant bit in the payload byte in the range from 0x00 to 0x50 is not used. This allows you to add the data and the start marker for the start of the repeat (0x80), and the next byte to indicate the number of repetitions, which allows you to pack a series of 257 (+2 from the fact that RLE of two bytes is stupid) of identical bytes in only two.
Unpacker implementation and display:

voiddrawPico_RLE_P(uint8_t x, uint8_t y, pic_t *pPic){
  uint16_t repeatColor;
  uint8_t tmpInd, repeatTimes;
  alphaReplaceColorId = getAlphaReplaceColorId();
  auto tmpData = getPicSize(pPic, 0);
  tftSetAddrWindow(x, y, x+tmpData.u8Data1, y+tmpData.u8Data2);
  ++pPic; // make offset to picture datawhile((tmpInd = getPicByte(++pPic)) != PIC_DATA_END) { // get color index or repeat timesif(tmpInd & RLE_MARK) { // is it color index?
      tmpInd &= DATA_MARK; // get color index to repeat
      repeatTimes = getPicByte(++pPic)+1; // zero RLE does not exist!
    }
    ++repeatTimes;
    // get color from colorTable by color index
    repeatColor = palette_RAM[(tmpInd == ALPHA_COLOR_ID) ? alphaReplaceColorId : tmpInd];
    do {
      pushColorFast(repeatColor);
    } while(--repeatTimes);
  }
}

The main thing is not to display the image outside the screen, otherwise it will be garbage, since there is no border check here.
The test image is unpacked in ~ 39ms. at the same time, occupying 3040 bytes, while without compression it would take 11,200 bytes or 22,400 bytes without indexing.

Test image (enlarged 2 times):

Interlace can be seen in the image above, but on the screen it is smoothed out by hardware, creating an effect similar to CRT and at the same time significantly increasing the compression ratio.

RLE is not a panacea

We are treated for deja vu.

As you know, RLE goes well with LZ-like packers. WiKi came to the rescue with a list of compression methods. The impetus was the video from "GameHut" about the analysis of the impossible intro in Sonic 3D Blast.
Having studied many packers (LZ77, LZW, LZSS, LZO, RNC, etc.), I came to the conclusion that their unpackers:

require a lot of RAM for unpacked data (at least 64kb. and more);
bulky and slow (some need to build Huffman trees for each subunit);
have a low compression ratio with a small window (very stringent RAM requirements);
ambiguities with licensing.

After months of futile adaptations, it was decided to modify the existing packer.
By analogy with LZ-like packers, to achieve maximum compression, dictionary access was used, but at the byte level - the most frequently repeated pairs of bytes are replaced with one byte pointer in the dictionary.
But there is a catch, how to distinguish a byte of “how many repetitions” from a “dictionary marker”?
After a long sitting with a piece of paper and a magical game with bats, this appeared:

“Dictionary marker” is a RLE marker (0x80) + data byte (0x50) + position number in the dictionary;
limit the byte “how many repetitions” to the size of the dictionary marker - 1 (0xCF);
the dictionary cannot use the value 0xff (it is for the marker for the end of the image).

Applying all this, we get a fixed dictionary size: no more than 46 pairs of bytes and RLE reduction to 209 bytes. Obviously, not all images can be packaged like this, but they won’t become any more.
In both algorithms, the structure of the packed image will be as follows:

1 byte per width and height;
1 byte for the size of the dictionary, it is a marker pointer to the beginning of the packed data;
from 0 to 92 bytes of the dictionary;
1 to N bytes of packed data.

The resulting packer utility on D (pickoPacker) is enough to put in a folder with indexed * .png files and run from the terminal (or cmd). If you need help, run with the option “-h” or “--help”.
After the utility has worked, the output is * .h files, the contents of which are convenient to transfer to the desired location in the project (therefore there is no protection).

Before unpacking, preparing the screen, dictionary and reading the initial data:

voiddrawPico_DIC_P(uint8_t x, uint8_t y, pic_t *pPic){
  auto tmpData = getPicSize(pPic, 0);
  tftSetAddrWindow(x, y, x+tmpData.u8Data1, y+tmpData.u8Data2);
  uint8_t tmpByte, unfoldPos, dictMarker;
  alphaReplaceColorId = getAlphaReplaceColorId();
  auto pDict = &pPic[3];             // save dictionary pointer
  pPic += getPicByte(&pPic[2]);  // make offset to picture datado {
    unfoldPos = dictMarker = 0;
    do {
      if((tmpByte = getPicByte(++pPic)) != PIC_DATA_END) {
        if(tmpByte < DICT_MARK) {
          buf_packed[unfoldPos] = tmpByte;
        } else {
          dictMarker = 1;
          setPicWData(&buf_packed[unfoldPos]) = getPicWData(pDict, tmpByte);
          ++unfoldPos;
        }
        ++unfoldPos;
      } else {
        break;
      }
    } while((unfoldPos < MAX_UNFOLD_SIZE) //&& (unfoldPos)
            && ((tmpByte > DATA_MARK) || (tmpByte > MAX_DATA_LENGTH)));
    if(unfoldPos) {
      buf_packed[unfoldPos] = PIC_DATA_END; // mark end of chunk
      printBuf_RLE( dictMarker ? unpackBuf_DIC(pDict) : &buf_packed[0] ); // V2V3 decoder
    }
  } while(unfoldPos);
}

A read piece of data can be packed in a dictionary, so we check and unpack it:

inline uint8_t findPackedMark(uint8_t *ptr){
  do {
    if(*ptr >= DICT_MARK) {
       return1;
    }
  } while(*(++ptr) != PIC_DATA_END);
  return0;
}
inline uint8_t *unpackBuf_DIC(constuint8_t *pDict){
  bool swap = false;
  bool dictMarker = true;
  auto getBufferPtr = [&](uint8_t a[], uint8_t b[]) {
    return swap ? &a[0] : &b[0];
  };
  auto ptrP = getBufferPtr(buf_unpacked, buf_packed);
  auto ptrU = getBufferPtr(buf_packed, buf_unpacked);
  while(dictMarker) {
    if(*ptrP >= DICT_MARK) {
      setPicWData(ptrU) = getPicWData(pDict, *ptrP);
      ++ptrU;
    } else {
      *ptrU = *ptrP;
    }
    ++ptrU;
    ++ptrP;
    if(*ptrP == PIC_DATA_END) {
      *ptrU = *ptrP; // mark end of chunk
      swap = !swap;
      ptrP = getBufferPtr(buf_unpacked, buf_packed);
      ptrU = getBufferPtr(buf_packed, buf_unpacked);
      dictMarker = findPackedMark(ptrP);
    }
  }
  return getBufferPtr(buf_unpacked, buf_packed);
}

Now from the received buffer we unpack RLE in a familiar way and display it on the screen:

inlinevoidprintBuf_RLE(uint8_t *pData){
  uint16_t repeatColor;
  uint8_t repeatTimes, tmpByte;
  while((tmpByte = *pData) != PIC_DATA_END) { // get color index or repeat timesif(tmpByte & RLE_MARK) { // is it RLE byte?
      tmpByte &= DATA_MARK; // get color index to repeat
      repeatTimes = *(++pData)+1; // zero RLE does not exist!
    }
    ++repeatTimes;
    ++pData;
    // get color from colorTable by color index
    repeatColor = palette_RAM[(tmpByte == ALPHA_COLOR_ID) ? alphaReplaceColorId : tmpByte];
    do {
      pushColorFast(repeatColor);
    } while(--repeatTimes);
  }
}

Surprisingly, replacing the algorithm did not greatly affect the unpacking time and is ~ 47ms. This is almost 8ms. longer, but the test image takes only 1650 bytes!

Until the last measure

Almost everything can be done faster!

Despite the presence of hardware SPI, the AVR core delivers a sea of headaches when using it.
It has long been known that SPI on AVR, in addition to running at F_CPU / 2 speed, also has a data register of only 1 byte (it is not possible to load 2 bytes at once).
Moreover, almost all the SPI code on AVR that I met works according to this scheme:

Download SPDR data
interrogate the SPIF bit in the SPSR in a loop.

As you can see, the continuous supply of data, as is done on the STM32, does not smell here. But, even here you can speed up the output of both unpackers by ~ 3ms!

By opening the datasheet and looking at the “Instruction set clocks” section, you can calculate the CPU costs when transmitting a byte via SPI:

1 cycle for register loading with new data;
2 beats per bit (or 16 beats per byte);
1 bar per clock line magic (a bit later about “NOP”);
1 clock cycle for checking the status bit in SPSR (or 2 clock cycles for branching);

In total, to transmit one pixel (two bytes), 38 clock cycles or ~ 425600 clock cycles for the test image (11,200 bytes) should be spent.
Knowing that F_CPU == 16 MHz we get ~~0.0000000625~~ 62.5 nanoseconds per clock cycle ( Process0169 ), multiplying the values, we get ~ 26 milliseconds. The question arises: “From whence did I write earlier that the unpacking time is 39ms. and 47ms. "? Everything is simple - unpacker logic + interrupt handling.

Here is an example of output with interruptions:

and without interruptions:

The graphs show that the time between setting the address window in the VRAM screen and the beginning of data transfer in the version without interruptions is less and there are almost no gaps between bytes during transmission (the graph is uniform).
Unfortunately, you cannot disable interrupts for each image output, otherwise the sound and core of the whole game will break (more on that later).

It was written above about a certain “magic NOP” for a clock line. The fact is that to stabilize the CLK and set the SPIF flag, exactly 1 clock cycle is needed and by the time this flag is read, it is already set, which avoids branching into 2 clock cycles on the BREQ instruction.
Here is an example without NOP:

and with it:

The difference seems insignificant, just a few microseconds, but if you take a different scale:
Without NOP it’s big:

and with it it’s also big:

the difference becomes much more noticeable, reaching ~ 4.3ms.

Now let's do the following dirty trick:

We swap the order of loading and reading the registers and you can not wait on every second byte of the SPIF flag, but check it only before loading the first byte of the next pixel.

We apply knowledge and deploy the function "pushColorFast (repeatColor);":

#define SPDR_TX_WAIT(a)  asm volatile(a); while((SPSR & (1<<SPIF)) == 0);typedefunion {
  uint16_t val;
  struct {uint8_t lsb;
    uint8_t msb;
  };
} SPDR_t;
...    
    do {
#ifdef ESPLORA_OPTIMIZE
      SPDR_t in = {.val = repeatColor};
      SPDR_TX_WAIT("");
      SPDR = in.msb;
      SPDR_TX_WAIT("nop"); 
      SPDR = in.lsb;
#else
      pushColorFast(repeatColor);
#endif
    } while(--repeatTimes);
  }
#ifdef ESPLORA_OPTIMIZE 
  SPDR_TX_WAIT("");  // dummy wait to stable SPI#endif
}

Despite the interruption from the timer, using the trick above gives a gain of almost 6ms .:

This is how simple knowledge of iron allows you to squeeze a little more out of it and output something like this:

Colosseum collisions

The battle of the boxes.

To begin with, the whole set of objects (ships, shells, asteroids, bonuses) are structures (sprites) with parameters:

current X, Y coordinates;
new coordinates X, Y;
pointer to the image.

Since the image stores the width and height, there is no need to duplicate these parameters; moreover, such an organization simplifies the logic in many aspects.

The calculation itself is made simple to the banal - based on the intersection of the rectangles. Although it is not accurate enough and does not calculate future conflicts, this is more than enough.
The verification takes place alternately on the X and Y axes. Due to this, the absence of intersection on the X axis reduces the calculation of the collision.

First, the right side of the first rectangle with the left side of the second rectangle is checked for the common part of the X axis. If successful, a similar check is performed for the left side of the first and right side of the second rectangle.
After successful detection of intersections along the X axis, a check is carried out in the same way for the upper and lower sides of the rectangles along the Y axis.

The above looks much easier than it seems:

boolcheckSpriteCollision(sprite_t *pSprOne, sprite_t *pSprTwo){
  auto tmpDataOne = getPicSize(pSprOne->pPic, 0);
  auto tmpDataTwo = getPicSize(pSprTwo->pPic, 0);
  /* ----------- Check X position ----------- */uint8_t objOnePosEndX = (pSprOne->pos.Old.x + tmpDataOne.u8Data1);
  if(objOnePosEndX >= pSprTwo->pos.Old.x) {
    uint8_t objTwoPosEndX = (pSprTwo->pos.Old.x + tmpDataTwo.u8Data1);
    if(pSprOne->pos.Old.x >= objTwoPosEndX) {
      returnfalse; // nope, different X positions
    }
    // ok, objects on same X lines; Go next...
  } else {
    returnfalse; // nope, absolutelly different X positions
  }
  /* ---------------------------------------- *//* ----------- Check Y position ----------- */uint8_t objOnePosEndY = (pSprOne->pos.Old.y + tmpDataOne.u8Data2);
  if(objOnePosEndY >= pSprTwo->pos.Old.y) {
    uint8_t objTwoPosEndY = (pSprTwo->pos.Old.y + tmpDataTwo.u8Data2);
    if(pSprOne->pos.Old.y <= objTwoPosEndY) {
      // ok, objects on same Y lines; Go next...// yep, if we are here// then, part of one object collide wthith another objectreturntrue;
    } else {
      returnfalse; // nope, different Y positions
    }
  } else {
    returnfalse; // nope, absolutelly different Y positions
  }
}

It remains to add this to the game:

voidcheckInVadersCollision(void){
  decltype(aliens[0].weapon.ray) gopher;
  for(auto &alien : aliens) {
    if(alien.alive) {
      if(checkSpriteCollision(&ship.sprite, &alien.sprite)) {
        gopher.sprite.pos.Old = alien.sprite.pos.Old;
        rocketEpxlosion(&gopher); // now make gopher to explode \(^_^)/
        removeSprite(&alien.sprite);
        alien.alive = false;
        score -= SCORE_PENALTY;
        if(score < 0) score = 0;
      }
    }
  }
}

Bezier curve

Space rails.

As in any other game with this genre, enemy ships must move along curves.
It was decided to implement quadratic curves as the simplest for the controller and this task. Three points are enough for them: the initial (P0), final (P2) and imaginary (P1). The first two specify the beginning and end of the line, the last point describes the type of curvature.
Great article on curves.
Since this is a parametric Bezier curve, it also needs another parameter - the number of intermediate points between the start and end points.

So we get this structure:

typedefstruct {// 7 bytesposition_t P0;
  position_t P1;
  position_t P2;
  uint8_t totalSteps;
} bezier_t;

In it, position_t is a structure of two bytes of coordinates X and Y.
Finding a point for each coordinate is calculated by the following formula (thx Wiki):
B = ((1.0 - t) ^ 2) P0 + 2t (1.0 - t) P1 + (t ^ 2) P2,
t [> = 0 && <= 1]

For a long time, its implementation was solved head-on without a fixed point math:

...
float t = ((float)pItemLine->step)/((float)pLine->totalSteps);
pPos->x = (1.0 - t)*(1.0 - t)*pLine->P0.x + 2*t*(1.0 - t)*pLine->P1.x + t*t*pLine->P2.x;
pPos->y = (1.0 - t)*(1.0 - t)*pLine->P0.y + 2*t*(1.0 - t)*pLine->P1.y + t*t*pLine->P2.y;
...

Of course, this cannot be left. After all, getting rid of the float could not only give an improvement in speed, but also free up the ROM, so the following implementations were found:

avrfix;
stdfix;
libfixmath;
fixedptc.

The first remains a dark horse, as it is a compiled library and did not want to mess with the disassembler.

The second candidate from the GCC bundle also did not work out, since the avr-gcc used was not patched and the "short _Accum" type remained unavailable.

The third option, despite the fact that it has a large number of mat. functions, it has hard-coded bit operations on specific bits to the Q16.16 format, which makes it impossible to control the values of Q and I.

The latter can be considered a simplified version from “fixedmath”, but the main advantage is the ability to control not only the size of the variable, which is 32 bits by default with Q24.8 format, but also with Q and I.

Test results at different settings:

Type of	IQ	Additional flags	ROM byte	Tms. *
float	-	-	4236	35
fixedmath	16.16	-	4796	119
fixedmath	16.16	FIXMATH_NO_OVERFLOW	4664	89
fixedmath	16.16	FIXMATH_OPTIMIZE_8BIT	5036	92
fixedmath	16.16	_NO_OVERFLOW + _8BIT	4916	89
fixedptc	24.8	FIXEDPT_BITS 32	4420	64
fixedptc	9.7	FIXEDPT_BITS 16	3490	31

* The check was carried out on the pattern: "195,175,145,110,170,70,170" and the key "-Os".

It can be seen from the table that both libraries took up more ROM and showed themselves worse than the compiled code from GCC when using float.
It is also seen that a small revision for the Q9.7 format and a decrease in the variable to 16bit gave an acceleration of 4ms. and freeing ROM at ~ 50 bytes.

The expected effect was a decrease in accuracy and an increase in the number of errors:

which in this case is uncritical.

Allocating resources

Work on Tuesday and Thursday is only an hour.

In most cases, all calculations are performed every frame, which is not always justified, since there may not be enough time in the frame to shorten something and you will have to trick with alternating, counting frames or skipping them. Therefore, I went further - completely abandoned the personnel tie.

Having broken everything into small tasks, be it: calculating collisions, processing sound, buttons and displaying graphics, it is enough to perform them at a certain interval, and the inertia of the eye and the ability to update only part of the screen will do the trick.

We manage all of this not once with the OS, but with the state machine that I created a couple of years ago, or, more simply, not the crowding-out tinySM task manager.

I will repeat the reasons for using it instead of any of the RTOS:

lower ROM requirements (~ 250 bytes core);
lower RAM requirements (~ 9 bytes per task);
simple and clear working principle;
determinism of behavior;
less CPU time is wasted;
leaves access to iron;
platform independent;
written in C and easy to wrap in C ++;
~~needed my own bike.~~

As I once described, tasks for it are organized into an array of pointers to structures, where a pointer to a function and its call interval are stored. Such a grouping simplifies the description of the game in separate stages, which also allows you to reduce the number of branches and dynamically switch the set of tasks.
For example, during the start screen, 7 tasks are performed, and during the game there are already 20 tasks (all tasks are described in the gameTasks.c file).

First you need to define some macros for your convenience:

#define T(a) a##Task#define TASK_N(a)     const taskParams_t T(a)#define TASK(a,b)     TASK_N(a) PROGMEM = {.pFunc=a, .timeOut=b}#define TASK_P(a)     (taskParams_t*)&T(a)#define TASK_ARR_N(a) const tasksArr_t a##TasksArr[]#define TASK_ARR(a)   TASK_ARR_N(a) PROGMEM#define TASK_END      NULL

The task declaration is actually creating a structure, initializing its fields and placing it in ROM:

TASK(updateBtnStates, 25);

Each such structure occupies 4 bytes of ROM (two per pointer and two per interval).
A nice bonus to macros is that it’s not possible to create more than one unique structure for each function.
Having declared the necessary tasks, we add them to the array and also put them in ROM:

TASK_ARR( game ) = {
  TASK_P(updateBtnStates),
  TASK_P(playMusic),
  TASK_P(drawStars),
  TASK_P(moveShip),
  TASK_P(drawShip),
  TASK_P(checkFireButton),
  TASK_P(pauseMenu),
  TASK_P(drawPlayerWeapon),
  TASK_P(checkShipHealth),
  TASK_P(drawSomeGUI),
  TASK_P(checkInVaders),
  TASK_P(drawInVaders),
  TASK_P(moveInVaders),
  TASK_P(checkInVadersRespawn),
  TASK_P(checkInVadersRay),
  TASK_P(checkInVadersCollision),
  TASK_P(dropWeaponGift),
  TASK_END
};

When setting the USE_DYNAMIC_MEM flag to 0 for static memory, the main thing to remember is to initialize the pointers to the task store in RAM and set the maximum number of them that will be executed:

...
tasksContainer_t tasksContainer;
taskFunc_t tasksArr[MAX_GAME_TASKS];
...
initTasksArr(&tasksContainer, &tasksArr[0], MAX_GAME_TASKS);
…

Setting tasks for execution:

...
addTasksArray_P(gameTasksArr);
…

Overflow protection is controlled by the USE_MEM_PANIC flag, if you are sure about the number of tasks, you can disable it to save ROM.

It remains only to run the handler:

...
runTasks();
...

Inside is an infinite loop that contains the basic logic. Once inside it, the stack is also restored thanks to "__attribute__ ((noreturn))".
In the loop, the elements of the array are alternately scanned for the need to call the task after the interval has expired.
The countdown of the intervals was made on the basis of timer0 as a system with a quantum of 1ms ...

Despite the successful distribution of tasks in time, they were sometimes superimposed (jitter), which caused short-term fading of everything and everything in the game.
It definitely had to be decided, but how? About how everything was profiled the next time, but for now try to find the Easter egg in the source.

the end

So, using a lot of tricks (and many more of which I have not described), everything turned out to fit in 24kb ROM and 1500 bytes of RAM. If you have any questions, I will be glad to answer them.

For those who did not find or did not look for an Easter egg:

dig to the side:

voidinvadersMagicRespawn(void){
  for(auto &alien : aliens) {
    if(!alien.alive) {
      alien.respawnTime = 1;
    }
  }
}

Nothing remarkable, right?

Raaaaazvorachivaem macro invadersMagicRespawn:

voidaction(){
  tftSetTextSize(1);
  for(;;) {
    tftSetCP437(RN & 1);
    tftSetTextColorBG((((RN % 192 + 64) & 0xFC) << 3), COLOR_BLACK);
    tftDrawCharInt(((RN % 26) * 6), ((RN & 15) * 8), (RN % 255));
    tftPrintAt_P(32, 58, (constchar *)creditP0);
  }
} a(void)
{
  for(auto &alien : aliens) {
    if(!alien.alive) {
      alien.respawnTime = 1;
    }
  }
}

Получаем что «а(void)» не более чем пустышка, а «action()» вызывается только после входа в режим паузы больше чем 10 раз, счетчик скрыт в «disablePause();». Программа остается в бесконечном цикле и выводит случайные символы в случайном месте в стиле «Matrix Falling code» с текстом по центру. Вот такая простая обсурфикация занимающая всего 130 байт ROM.

To build and run, just put the folder (or link) "esploraAPI" in "/ arduino / libraries /".

References:

PS You can see and hear how it all looks a bit later when I make an acceptable video.

Only registered users can participate in the survey. Please come in.

Have you been looking for an easter egg?

Tags: