Developing a game in 115 KB: hacks, bugs and frustration


In early November, I took part in the 115th contest held by the Independent Games Developers Contests (IGDC) community. The theme was to develop an arcade shooter within a limit of 115 kilobytes, in one week. Below the cut: the story of developing the game with OpenGL + Free Pascal, experiments with LZO, working around FPC compiler bugs for uFMOD, the simplest possible texture generation, and an annoying bug on NVidia video cards that spoiled everything.

A video, a Windows binary and the source code are also attached - see the end of the article.

Lyrical introduction


Game development is my main, favorite and very old hobby. I have been knocking about in amateur gamedev for about 10 years, without much success, high-profile releases or titanic long-running projects. There are a lot of frozen projects and few finished ones. At some point I despaired that nothing was coming of it. And then I realized that I like not only the final result but also the process of game development itself. Since then life has become calmer, though I still have not let go of the thought that someday I will push myself and bring some project to a commercial level.

At one point I came across the IGDC community, where short (from a couple of days up to 3 weeks) game development contests on a given theme are held. These are, you know, very warm and cozy contests, in which the main thing is participation, not the prize: the experience and pleasure of the work done, not marketing and monetization.

Nowadays, when the barrier to entry into gamedev has dropped several times over, and mobile platforms are flooded with frankly unsuccessful games whose authors dream of earning a comfortable life with another Flappy Bird... At such a time, it is difficult to find people who make games for fun, not profit.

Of course, there is the mighty Ludum Dare, but its hardcore deadlines still clash with my family life.

Start


So, on November 7, 2014, the next contest was announced on the community site. The conditions:

  • “Pew-pew” and enemies - this is how the moderator described the arcade shooter
  • Size - strictly up to 115 kilobytes, because the contest is the 115th in a row
  • Deadline - one week

The standing mandatory conditions apply on top of these: the game must work offline and without installing third-party packages or redistributables. Everything that is required to run the game and is not bundled with the system must be shipped with the release and fit into those same 115 kilobytes.

The limit is indecently generous by true demoscene standards (4k, 32k ...), but tight enough to call for some proper contortions. A cursory review of development tools gives a rather impressive list of what fits these requirements:

  • Flash
  • html5 + js
  • C, C++, C#
  • Delphi, FreePascal
  • ...

Unity, CryEngine, Unreal Engine, JVM-based languages (they require a preinstalled JRE), as well as most game makers, are ruled out.

About Flash
Although Flash requires Adobe Flash Player to be installed, it is still allowed as an exception (it is assumed that most people still have Adobe Flash Player). That is how it happened historically.

Over the years I have tried many languages and technologies, and I know most of the list above firsthand. But your humble servant chose not the easiest option: Free Pascal. Why not the easiest?

Firstly, in 2014 Free Pascal (like Delphi) is considered unfashionable; as a result the FPC compiler has few users and quite a few bugs, despite being open source and cross-platform. Secondly, the size of an exe compiled with the Lazarus IDE + FPC bundle is the subject of a separate wiki page. Thirdly, there is very little syntactic sugar, which is acutely felt when you regularly use many other languages and technologies.

Of course, there are upsides too:

  • A properly prepared exe is self-sufficient and comparable in size to a C / C++ exe with a statically linked CRT
  • With the default settings, it does not let you shoot yourself in the foot the way C / C++ does
  • With the right settings, it lets you blow off both feet completely (which is sometimes what you want)
  • Quite by chance, I already had a mini-framework in Free Pascal + OpenGL

And I had already made a loose remake of Lunar Lander with it.


Go!



First of all I settled on the concept of the future game: a top-down 2D arcade shooter in which the player controls a “tank” and shoots crowds of enemies in every possible way; the enemies drop bonuses for even more fun shooting. The bacchanalia continues until death parts you from your alter ego. The closest analogue is Crimsonland.

With a practiced motion I created a project from the template, and got my first problem: the compiled exe with my framework, with all the cunning compiler options applied, took 120 kilobytes. Of course, given what the framework can do (and keeping in mind that this is still FPC), this is even an achievement. But it does not suit us at all, so we ruthlessly squeeze the exe with UPX - 48 kilobytes. That is something we can work with.

Of course, I could have won a few more kilobytes here by cutting the framework's functionality down to the bare essentials. I decided against it for lack of time and, as it turned out later, rightly so. In the end, the 115-kilobyte limit was enough for me.

Implement LZO


A rare game can do without displaying text or numbers. Admittedly, I did toy at first with the idea of making exactly such a game, but I could not come up with an interesting implementation.

Hence the task: display text on screen via OpenGL. Unless you resort to the archaic method of vector text output, you have to use bitmap fonts. My framework already supported text output using pre-generated, carefully “baked” bitmap fonts. Problem solved?

Briefly about the implementation
There is a home-grown utility that packs the required characters fairly compactly and then “bakes” them into a bmp file, to the end of which service information about the character metrics (coordinates, size, original size, etc.) is ruthlessly appended. Graphics editors do not notice the trick and open the file quite correctly. If only those editors could be taught not to rewrite the whole file on save, it would be possible to apply post-effects to such a font...
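The trick of appending data to the end of a bmp file works because image readers only consume what the header describes. Below is a minimal sketch of how a loader could find such a trailer. It assumes (this is not confirmed by the utility's actual format) that the bfSize field of the file header holds the size of the image part only, so the metrics start right after it. The name BmpTrailerOffset and the fake file in the demo are purely illustrative.

```pascal
program BmpTrailerSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Returns the offset of the first byte after the BMP data, taken from the
// bfSize field (bytes 2..5, little-endian) of the 14-byte file header.
// Returns -1 if the buffer does not look like a BMP file.
// Assumption: bfSize covers only the image, not the appended metrics.
function BmpTrailerOffset(const FileData: TBytes): Integer;
var
  DeclaredSize: LongWord;
begin
  Result := -1;
  if Length(FileData) < 14 then
    Exit;
  if (FileData[0] <> Ord('B')) or (FileData[1] <> Ord('M')) then
    Exit;
  DeclaredSize := FileData[2] or (LongWord(FileData[3]) shl 8) or
    (LongWord(FileData[4]) shl 16) or (LongWord(FileData[5]) shl 24);
  if DeclaredSize > LongWord(Length(FileData)) then
    Exit;
  Result := Integer(DeclaredSize);
end;

var
  Fake: TBytes;
  Offset: Integer;
begin
  // Hypothetical file: 20 bytes of "image" followed by 4 bytes of metrics.
  SetLength(Fake, 24);
  Fake[0] := Ord('B');
  Fake[1] := Ord('M');
  Fake[2] := 20; // bfSize = 20, the remaining size bytes stay zero
  Fake[20] := $DE; Fake[21] := $AD; Fake[22] := $BE; Fake[23] := $EF;

  Offset := BmpTrailerOffset(Fake);
  WriteLn('Trailer starts at byte ', Offset,
    ', length ', Length(Fake) - Offset);
end.
```

An alternative layout would store the trailer length in the last bytes of the file, which does not depend on how bfSize was written.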


No, it gets more complicated. The resulting file with Russian and Latin letters (plus special characters and digits) took 135 kilobytes. We remove the Russian characters and reduce the physical size of the font itself; the image halves in one dimension and, accordingly, halves in size - 67 kilobytes. But this is still no good, since together with the “empty” project (48 kilobytes) it adds up to exactly 115 kilobytes.

Now I realize that the most correct and simple step would have been to generate the font right at launch from a system font; fortunately, the code for that is simple copy-paste material. Moreover, in my previous framework that is exactly how fonts were generated - at runtime from system fonts or otf / ttf files.

But the soul craved romance, and my backside craved torment. And I remembered that my comrade XProger, back in 2010, committed an act of violence against the MiniLZO library by ripping out its binary dump and wrapping it in simple asm directives. Here is what it looks like for decompression:

function lzo_decompress(const CData; CSize: LongInt; var Data; var Size: LongInt): LongInt; cdecl;
asm
  DB $51
  DD $458B5653,$C558B08,$F08BD003,$33FC5589,$144D8BD2,$68A1189,$3C10558B,$331C7611,$83C88AC9
  DD $8346EFC1,$820F04F9,$1C9,$8846068A,$75494202,$3366EBF7,$460E8AC9,$F10F983,$8D83,$75C98500,$8107EB18
  DD $FFC1,$3E804600,$33F47400,$83068AC0,$C8030FC0,$83068B46,$28904C6,$4904C283,$F9832F74,$8B217204,$83028906
  DD $C68304C2,$4E98304,$7304F983,$76C985EE,$46068A14,$49420288,$9EBF775,$8846068A,$75494202,$8AC933F7
  DD $F983460E,$C12B7310,$828D02E9,$FFFFF7FF,$C933C12B,$C1460E8A,$C12B02E1,$8840088A,$88A420A,$420A8840
  DD $288008A,$113E942,$F9830000,$8B207240,$FF428DD9,$8302EBC1,$C32B07E3,$1E8ADB33,$3E3C146,$2B05E9C1
  DD $D9E949C3,$83000000,$2F7220F9,$851FE183,$EB1875C9,$FFC18107,$46000000,$74003E80,$8AC033F4,$1FC08306
  DD $F46C803,$FBC11EB7,$FF428D02,$C683C32B,$8369EB02,$457210F9,$D98BC28B,$C108E383,$C32B0BE3,$8507E183
  DD $EB1875C9,$FFC18107,$46000000,$74003E80,$8ADB33F4,$7C3831E,$F46CB03,$FBC11EB7,$83C32B02,$D03B02C6
  DD $9A840F,$2D0000,$EB000040,$2E9C11F,$2BFF428D,$8AC933C1,$E1C1460E,$8AC12B02,$A884008,$88008A42
  DD $51EB4202,$7206F983,$2BDA8B37,$4FB83D8,$188B2E7C,$8904C083,$4C2831A,$8B02E983,$831A8918,$C08304C2
  DD $4E98304,$7304F983,$76C985EE,$40188A20,$49421A88,$15EBF775,$8840188A,$188A421A,$421A8840,$8840188A
  DD $7549421A,$8AC933F7,$E183FE4E,$FC98503,$FFFE4284,$46068AFF,$49420288,$C933F775,$E9460E8A,$FFFFFECA
  DD $8B10552B,$10891445,$75FC753B,$EBC03304,$FFF8B80D,$753BFFFF,$830372FC,$5B5E04C0,$90C35D59
end;

... and a similar piece of witchcraft for compression itself. It works fine (although I did not get it running right away), but I would not recommend this for production code. Debugging it is slightly inconvenient ...

After compressing the font we get 17 kilobytes instead of 67. And it could have been 2-3 kilobytes, had I simply implemented generation on the fly.


Using uFMOD for audio output


Nobody wants to play games without sound or, at least, music. Before this contest I had experience with the BASS library, but I had to leave it behind - the required dll alone ate up as much as 97 kilobytes. The contest thread mentioned uFMOD, a miniature library for playing xm music, written in assembler. Looking ahead, I will say that adding it to the project ultimately had virtually no effect on the size of the exe file.

But there was one little catch. The library did not work with more or less modern versions of the FPC compiler (above 2.2.x). The problem lies in ambiguous behavior of the linker. I doubt I can describe the technical side of this problem accurately - namely, why external functions declared in the header file are not visible to the object file linked in right there. I leave this behavior on the conscience of the compiler developers. Here is an example of a workaround for one of the functions.

It was like this:

function waveOutClose(hwo: Pointer): LongWord; stdcall; external 'winmm.dll';

And I had to wrap it like this:

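// Import the real function from winmm.dll under a private name...
function my_waveOutClose(hwo: Pointer): LongWord; stdcall; external 'winmm.dll' name 'waveOutClose';
// ...and export a wrapper under the name the uFMOD object file expects.
function _waveOutClose(hwo: Pointer): LongWord; stdcall; public name 'waveOutClose';
begin
  Result := my_waveOutClose(hwo);
end;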

And so on for a couple of dozen functions required by the library. To clarify, I took the winmm-based build of uFMOD, as the most lightweight and easy to use. The downside of that simplicity: it could play only one stream at a time. So music appeared in my game, but I had to give up sound effects.

I took the xm track itself from here, after which a utility from the uFMOD package turned it into a pas file, which saved me a bit more space.

Texture generation


To paraphrase a classic: what demoscene project can do without texture generation? Initially I wanted to skip this topic - my generation of a single (!) texture is painfully simple and done head-on. But perhaps this approach will be useful to some beginner.

Almost all sprites in the game use the same texture:



It is just a striped texture with a “border” around it for aesthetics. With color tinting (that is, “multiplying” the texture by the desired color) we get striped textures of any color.
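To make the “multiplying” concrete: each grayscale texel is scaled per channel by the tint color, with 255 treated as 1.0. In the game this presumably happens on the GPU; the CPU sketch below only illustrates the arithmetic. The names TRGB, MulChannel and Tint and the sample red color are illustrative, not taken from the framework.

```pascal
program TintSketch;
{$mode objfpc}{$H+}

type
  TRGB = record
    R, G, B: Byte;
  end;

// Multiplies two 8-bit channels, treating 255 as 1.0 (rounded to nearest).
function MulChannel(a, b: Byte): Byte;
begin
  Result := (a * b + 127) div 255;
end;

// Applies a tint color to a grayscale texel value.
function Tint(Texel: Byte; const Color: TRGB): TRGB;
begin
  Result.R := MulChannel(Texel, Color.R);
  Result.G := MulChannel(Texel, Color.G);
  Result.B := MulChannel(Texel, Color.B);
end;

const
  // Hypothetical enemy color.
  Red: TRGB = (R: 255; G: 64; B: 64);

var
  Light, Dark: TRGB;
begin
  // 255 and 196 are the two gray levels used by the striped texture.
  Light := Tint(255, Red);
  Dark := Tint(196, Red);
  WriteLn('light stripe: ', Light.R, ' ', Light.G, ' ', Light.B);
  WriteLn('dark stripe:  ', Dark.R, ' ', Dark.G, ' ', Dark.B);
end.
```

A texel of 255 leaves the tint color unchanged, while 196 darkens it, which is what produces the visible stripes in any color.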

I do not like the generation code at all; it could clearly be written better. Moreover, I have a persistent feeling that the border is drawn incorrectly.

function TGame.GenerateTexture(aWidth, aHeight, aBorderSize: Integer): TglrTexture;
var
  m, m_origin: PByte;
  i, j: Integer;
  value: Byte;
begin
  // 3 bytes (R, G, B) per pixel
  m := GetMemory(aWidth * aHeight * 3);
  m_origin := m;
  for j := 0 to aHeight - 1 do
    for i := 0 to aWidth - 1 do
    begin
      // Border pixels get the darker shade
      if (i < aBorderSize) or (j < aBorderSize)
        or (i > aWidth - aBorderSize - 1) or (j > aHeight - aBorderSize - 1) then
        value := 196
      // Inside: diagonal stripes, alternating every 8 pixels
      else if ((i + j) mod 16) >= 8 then
        value := 255
      else
        value := 196;
      // Same value for R, G and B gives a gray pixel
      m^ := value; m += 1;
      m^ := value; m += 1;
      m^ := value; m += 1;
    end;
  Result := TglrTexture.Create(m_origin, aWidth, aHeight, tfRGB8);
end;
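One cheap way to eyeball the pattern, including the border, is to pull the per-pixel rule out into a pure function and print it as ASCII. The sketch below repeats the same conditions as the generation code above; the function name PixelValue and the 32x16 size with a 2-pixel border are chosen only for illustration.

```pascal
program StripePatternSketch;
{$mode objfpc}{$H+}

// Same per-pixel rule as in the texture generator:
// 196 on the border, diagonal 255/196 stripes inside.
function PixelValue(i, j, aWidth, aHeight, aBorderSize: Integer): Byte;
begin
  if (i < aBorderSize) or (j < aBorderSize)
    or (i > aWidth - aBorderSize - 1) or (j > aHeight - aBorderSize - 1) then
    Result := 196
  else if ((i + j) mod 16) >= 8 then
    Result := 255
  else
    Result := 196;
end;

const
  W = 32;
  H = 16;
  Border = 2;

var
  i, j: Integer;
begin
  // '#' marks the light stripe (255), '.' marks the dark shade (196).
  for j := 0 to H - 1 do
  begin
    for i := 0 to W - 1 do
      if PixelValue(i, j, W, H, Border) = 255 then
        Write('#')
      else
        Write('.');
    WriteLn;
  end;
end.
```

Since the border uses the same shade as the dark stripes, it visually merges with them wherever a dark stripe touches the edge.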

For variety, the shade of red of the enemies varies slightly using random().

Annoying bug on NVidia


After the works are submitted, the host collects them into an archive and publishes it. The participants' task is to rank each other's entries, excluding their own. Three days are given for voting, so you can relax after the crunch. No such luck!

One by one the contestants report that my game does not work correctly for them: the player's “tank” is not visible at all, many enemy tanks are not visible either, and sometimes they flicker. Enemies can only be located by the smoke from their exhaust pipes. These problems show up for every contestant with an NVidia video card. Moreover, one contestant has a configuration identical to mine, yet the bug does not show up on my machine at all. For some, running in Windows 95 compatibility mode (!) helped, but they were very few.

I posted different builds (which slightly bends the contest rules), cleaned up every suspicious place in the code and suggested various settings, but it was all in vain. Finally, the most meticulous of the participants, to whom many thanks (hello, pelmenka!), discovered the cause: if you turn off threaded optimization in the NVidia control panel, the game works correctly. The frustrating part is that I have had this setting turned off since time immemorial, ever since it caused BSODs in some games.

Thanks to this valuable information, I was able to reproduce the bug on my own machine and start fixing it, although deep down I already knew that most of the contestants had already voted, and that I had only myself to blame for not checking the game with the default NVidia control panel settings.

I will not describe the search for a solution at length. I will only say that along the way I realized that:

  • Threaded optimization brings more problems than benefits. A simple search for “nvidia threading optimization” turns up links to game forums where people are strongly advised to disable this setting to avoid “flickering” in games.
  • There is no specification that explains what this optimization does at the driver level (or is there?)
  • There are a number of threads from indie developers complaining about bugs when threaded optimization is enabled. No answers except “just turn it off, mate”

Finally, my attention was drawn to a thread complaining about the glBufferSubData function misbehaving with the Setting That Must Not Be Named enabled. This gave me a clue, and after a while (debugging, checks) I isolated the essence of the problem:

The glBufferSubData function updates data in a vertex buffer. Its main purpose is to update a “piece” of the buffer, but it should also be used when you need to update the entire buffer without reallocating memory (my case).

With threaded optimization enabled, the NVidia driver, following a logic known only to itself, sometimes (always?) moves calls to this function to a separate thread and returns control immediately, which leads to a disastrous result: it is unknown how much data will have been “poured” into the buffer by the time it is drawn. And NVidia's OpenGL driver sees no problem in that. Anticipating your question: no, the amount of data transferred is negligible, a couple of kilobytes (especially compared with the bus bandwidth), so heavy data is not the issue.

The OpenGL specification says nothing about this function being allowed to run in a separate thread. Personal initiative by the guys at NVidia?

With a single glBufferSubData call per frame you will not notice anything, because the driver will “cache” the draw call for this buffer and execute it once all operations on the buffer have finished (probably).

In my case, though, one buffer is used several times per frame. That is:

glBindBuffer(GL_ARRAY_BUFFER, BufferId);
glBufferSubData(GL_ARRAY_BUFFER, 0, Size, Data);
glDrawElements(...);
glBindBuffer(GL_ARRAY_BUFFER, 0);
...
glBindBuffer(GL_ARRAY_BUFFER, BufferId);
glBufferSubData(GL_ARRAY_BUFFER, 0, OtherSize, OtherData);
glDrawElements(...);
glBindBuffer(GL_ARRAY_BUFFER, 0);

And this is where you feel threaded optimization in all its glory. What ends up on screen is decided by blind chance.

A glFinish() call helps, although it is a so-so solution. Personally, I slightly changed the logic of updating data in the buffer to avoid such delicate situations.

Q: Why not merge data and draw at once?
A: Other elements are rendered between these calls, and the drawing order cannot be changed.

Q: Why not use different buffers?
A: Buffer data is updated every frame. Keeping multiple buffers is a waste of resources.

Summary


In the end I took fourth place in the contest, although I could easily have competed for second or third. What caused the disappointment was not so much missing the top three as the realization that I had “ruined” a decent game and 7 days of my work with one small oversight.

Short gameplay video:



The total release size was just under 80 kilobytes. It includes:

  • 62.0 kb - exe
  • 17.3 kb - font
  • 0.52 kb - shaders

- Download release (all necessary sources are attached).
- Sources on github
