MicroCephalis May 29, 2019 at 10:28 a.m.

Masked bugs in embedded

From the sandbox

Snags are inevitable when developing any software. In embedded work, hardware problems can chip in their generous five cents too, but that is a separate story. But purely software ambushes, when you get stuck in a seemingly empty spot ... For me there are three kinds of them.

The easiest kind is when the manual, the standard, or, say, the procedure for configuring a library for the hardware is not fully understood. Here it is clear: not all moves have been tried yet. Patience and work, another handful of experiments, and it will come to life. An oscilloscope and the scientific poke method will help.

Choosing a frequency divider to configure the CAN bus

It is worse when the problem is a typo or a logic mistake that you cannot see even point blank, until you have gone over the spot twenty times with your eyes and a step-by-step debugger. Then it dawns on you: a resounding slap on the forehead, a cry of “Well, what a blockhead you are!”, an edit. It works.

And the gloomy third kind: a glitch entrenched in someone else's library and crawling out at the junction with the hardware. The steady glow of the monitor witnesses Shakespearean passions. “But it can't, the system cannot behave this way, because it never can! Come on, really! Huh?!” Nope. Sign here to confirm receipt.

In the end, reality is wider than expected. A couple of examples:

Story No. 1. A microSD card and DMA

Anamnesis

I need to dump data to a file on an SD card. Of course, I have neither the time nor the desire to write the file system and the SDIO driver myself, so I take a ready-made library. I set it up for the hardware, and everything works fine. At first. And then it turns out that the data is being written in a wild way: the sizes are exact, but inside the files separate pairs of bytes are sometimes duplicated and sometimes missing, with no regularity. Not good!

The experiments begin. I write test data: everything is OK. I write real data: some kind of devilry. I change the size of the data buffers, how often they are flushed, the data patterns. No use. In the buffers themselves everything is always perfect; the data in memory is exactly what it should be everywhere. And yet there they are, the glitches on the card.

It took a couple of days to dig up where the dog was buried.

Diagnosis

The problem was in how the library interacts with the DMA hardware.

SD cards have a peculiarity: they are written only in blocks of 512 bytes. So the library buffers the data in a 512-byte array and, once it is full, flushes it to the card via DMA. But!

If I pass a fragment for writing that is larger than <512xN + the free space in the library buffer> bytes, the library (evidently to avoid shuffling memory back and forth) does this: it tops up its own buffer, writes it to the card, and hands the next 512xN bytes to DMA directly from my buffer! And if a tail is left over, it copies that into its own buffer again, until next time.

And all would be fine, but the DMA controller requires the data to be located in memory aligned on a 4-byte boundary. The library buffer is always aligned like that; the language guarantees it. But at what address those remaining 512xN-and-a-bit bytes start in my buffer after part of the data has been copied out, God only knows. And the library does not check this at all: the address is passed to the DMA controller as is.

“They've sent something crooked ... Oh, to hell with it.” The controller silently clears the lower 2 bits of the address it was given. And starts the transfer.

The address, originally not a multiple of 4, is replaced by one that is, and voila: up to three of the last bytes that went into the library buffer are written to the file again from mine, and the same number of bytes at the end of that direct transfer are lost without a trace. As a result, the total amount of data is correct, the operations complete without errors, but what ends up on the disk is nonsense.
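To make the shift concrete, here is a small host-side model of what is described above. It is only a sketch: `dma_effective_offset`, `dma_shift` and `dma_transfer` are hypothetical names, the "RAM" is just a vector whose offset 0 is assumed to sit on a 4-byte boundary, and no real controller is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the controller: the two low address bits are ignored.
constexpr std::size_t dma_effective_offset(std::size_t offset) {
    return offset & ~static_cast<std::size_t>(3);
}

// How many bytes the transfer window slides backwards (0..3).
constexpr std::size_t dma_shift(std::size_t offset) {
    return offset & static_cast<std::size_t>(3);
}

// "Transfers" len bytes that were requested from `offset`, but reads them
// from the truncated offset, the way the controller in the story behaves.
std::vector<std::uint8_t> dma_transfer(const std::vector<std::uint8_t>& ram,
                                       std::size_t offset, std::size_t len) {
    const std::size_t start = dma_effective_offset(offset);
    assert(start + len <= ram.size());
    const auto first = ram.begin() + static_cast<std::ptrdiff_t>(start);
    return std::vector<std::uint8_t>(first, first + static_cast<std::ptrdiff_t>(len));
}
```

With a request starting at offset 10, the model sends the bytes from offset 8: two bytes that were already written go out a second time, and the last two requested bytes never leave memory. The length is untouched, which is why the file sizes looked right.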

Treatment

I had to add one more buffer right before the call to the hardware write function. If the write address is not a multiple of 4, the data is first copied into that buffer. As a bonus, the average speed went up thanks to a sensible choice of buffer size. Of course, it cost memory, but what are 4 kilobytes for a good cause when you have a boundless 192 at your disposal!
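The shape of such a fix can be sketched roughly like this. It is not the actual patch: `write_dma_safe`, `HwWriteFn` and `g_bounce` are made-up names, the hardware write routine is passed in as a plain function pointer so the idea can be tried on a PC, and the 4096-byte size is simply taken from the 4 kilobytes mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kBounceSize = 4096;              // assumption: the 4 KB from the text
alignas(4) static std::uint8_t g_bounce[kBounceSize];  // always 4-byte aligned

// Hypothetical signature of the low-level routine that starts the DMA write.
using HwWriteFn = void (*)(const std::uint8_t* data, std::size_t len);

inline bool is_dma_aligned(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 3u) == 0;
}

// Passes aligned data straight through; unaligned data goes through the
// bounce buffer in chunks, so the hardware only ever sees aligned addresses.
void write_dma_safe(const std::uint8_t* data, std::size_t len, HwWriteFn hw_write) {
    assert(hw_write != nullptr);
    if (len == 0) return;
    if (is_dma_aligned(data)) {
        hw_write(data, len);
        return;
    }
    while (len > 0) {
        const std::size_t chunk = len < kBounceSize ? len : kBounceSize;
        std::memcpy(g_bounce, data, chunk);
        hw_write(g_bounce, chunk);
        data += chunk;
        len -= chunk;
    }
}
```

The extra copy happens only on the unaligned path. Since the chunk size is a multiple of 512, every chunk except possibly the last still covers whole SD blocks.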

Story No. 2. The runtime and the heap

Prologue

After yet another change, the program started crashing, and crashing hard, throwing the processor into the Hard Fault handler. And it ended up there right after startup, before execution even reached main(), that is, before a single line of my code had a chance to run.

The first impression: “it's all over, the chip needs replacing.” And the programmer has kicked the bucket along with it. But no, the old version of the firmware runs stably, while the new one just as stably crashes somewhere in obscure assembly depths between startup and my code. I had no idea what kind of heresy this was.

Chapter 1

The Internet helped me look up how to get at least some additional information. I googled the procedure for dissecting the aftermath of a hard fault: the register state, a stack dump. Finished it up. Used it.

It turned out that the crash is caused by a bus access error. I decided that this was unaligned access again, a problem of the same type as in the first story, just from a different angle. But the nastiest part was where the error occurred. It arose inside the runtime library, that is, in code which, in theory, has been polished until it shines.

Further analysis showed that the glitch is a consequence of an attempt to initialize local static variables.
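For reference, the core of such a post-mortem usually looks something like the sketch below. It assumes an ARM Cortex-M core (the chip is not named here, so this is a guess from the Hard Fault handler and the RAM addresses) and the basic eight-word exception frame without floating-point state. `StackedFrame` and `decode_stacked_frame` are hypothetical names; on real hardware the pointer would come from MSP or PSP inside the fault handler.

```cpp
#include <cassert>
#include <cstdint>

// Registers the core pushes on exception entry, in stacking order
// (basic Cortex-M frame; assumed, see the note above).
struct StackedFrame {
    std::uint32_t r0, r1, r2, r3, r12;
    std::uint32_t lr;    // return address of the interrupted function
    std::uint32_t pc;    // address of the instruction that faulted
    std::uint32_t xpsr;  // program status at the moment of the fault
};

// Copies the eight stacked words into a named structure for inspection.
StackedFrame decode_stacked_frame(const std::uint32_t* sp) {
    assert(sp != nullptr);
    return StackedFrame{sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7]};
}
```

The `pc` field is the one to look up in the map file or disassembly first: it points at the place where the processor gave up.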

Lyrical digression

By the way, while studying the disassembled code, I also found the answer to a question I had sometimes asked myself but was always too lazy to google: how is the situation resolved when two or more threads may try to initialize such a variable at the same time? It turned out that the compiler wraps the initialization in semaphores, guaranteeing that only one thread at a time goes through the whole procedure while the rest wait until the first one finishes. This behavior has been standardized since C++11. Did you know? I didn't.
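The construct in question is the ordinary function-local static. A minimal illustration, with the made-up names `Logger` and `logger_instance`: the object is built on the first call only, and since C++11 that stays true even if several threads make the first call at once. The sketch itself is single-threaded and only shows the "once, lazily" part.

```cpp
#include <cassert>

struct Logger {
    static int constructed;  // counts constructor runs, for illustration only
    Logger() { ++constructed; }
    ~Logger() {}             // user-provided destructor: has to be run at exit
};
int Logger::constructed = 0;

// The static below is initialized the first time control passes through
// its declaration; later calls just return the same object.
Logger& logger_instance() {
    static Logger instance;
    return instance;
}
```

Calling `logger_instance()` any number of times leaves `Logger::constructed` at 1.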

Chapter 2

Since the runtime takes care of constructing variables, it is also the one that has to call their destructors when the program terminates (even if the program never actually terminates, which is the absolute norm for microcontrollers). To do that, it needs to store information somewhere about all the variables it has managed to initialize.

And it was exactly at the spot where this information gets saved into some internal list that the runtime crashed. Because the malloc() function, which allocates memory for the elements of this list and which, according to the standard, returns blocks guaranteed to be aligned on at least an 8-byte boundary, after some number of successful calls returned a block that was not aligned on that boundary.

Did the changes in the new firmware code break malloc?! But how is that even possible? I certainly did not redefine malloc; I do not even use it anywhere myself!

I dug into the compiler options and searched the help for keywords, but everywhere it said clearly: malloc() is guaranteed to return memory aligned on the fundamental boundary. Or a null pointer if there is not enough memory.
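That promise is easy to turn into a check. A small sketch with hypothetical helper names, using `alignof(std::max_align_t)` as the "fundamental" alignment of whatever platform the code is built for:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// True if p is a multiple of `alignment` (which must be a power of two).
inline bool is_aligned_to(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// What a conforming malloc() promises for any non-null result.
inline bool malloc_result_is_fundamentally_aligned(const void* p) {
    return p != nullptr && is_aligned_to(p, alignof(std::max_align_t));
}
```

A check like this looks at the whole address rather than at one digit of it, although on its own it still does not say whether the address points into RAM.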

Chapter 3

For a long time I stared blankly at the code, set breakpoints, suffered and understood nothing, until at some point it clicked and I looked carefully at the addresses returned by malloc. Until then, the whole analysis had consisted of checking whether the last digit of the address was 0x4. Now I began comparing in full the addresses returned by successive malloc calls.

And oh, a miracle!

All the successful calls returned addresses from the RAM space (0x20000000 and above for this chip), increasing sequentially from call to call. And the first unsuccessful one returned 0x00000036. That is, the address was not merely unaligned, it was not in the RAM address space at all! The processor tried to write something there and, naturally, crashed.

And, curiously, even if malloc() had behaved according to the standard and returned 0 when it ran out of space, that would have changed nothing as far as the crash is concerned (except that the cause of the bug would have been found sooner). The value returned by malloc is not checked in any way and is put to use immediately. And this is in the runtime.
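A sanity check of this kind fits in a few lines. In the sketch below the RAM base is the 0x20000000 mentioned above, while treating the RAM as a single contiguous 192 KB region is purely an assumption (on some chips part of it lives at another address). `is_in_ram` and `looks_like_heap_block` are made-up names.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kRamBase = 0x20000000u;   // from the text
constexpr std::uintptr_t kRamSize = 192u * 1024u;  // assumption: one contiguous block

// True if addr falls inside the assumed RAM window.
constexpr bool is_in_ram(std::uintptr_t addr) {
    return addr >= kRamBase && addr - kRamBase < kRamSize;
}

// A plausible malloc() result: inside RAM and on an 8-byte boundary.
inline bool looks_like_heap_block(const void* p) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    return is_in_ram(addr) && (addr & 7u) == 0;
}
```

Both 0x00000036 and a null pointer fail this check, so watching the debugger through it would have flagged the bad allocation at once.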

Epilogue

I increased the heap size in the configuration file, and everything was fixed.

But until that moment I had not given its size a thought. What the hell do I need it for, I thought. All my variables and objects are either static or on the stack anyway. So, purely by inertia, I left 0x300 bytes for it, since every template project allocates some space for the heap. But no, the C++ runtime still needs dynamically allocated memory, and in quite noticeable amounts by microcontroller standards.

Live and learn.
