Memory Profiling on STM32 and Other Microcontrollers: Static Stack Size Analysis

    Hello, Habr!

    In the previous article I mentioned it myself, and readers asked in the comments: OK, by scientific poking we picked a stack size and nothing seems to crash, but can we somehow estimate more reliably what it should be and who ate so much of it?

    The short answer: yes, but no.

    No: static analysis methods cannot measure exactly how much stack a program needs. Nevertheless, these methods can be useful.

    The slightly longer answer is below the cut.

    As is widely known to a narrow circle of people, stack space is allocated, in essence, for the local variables of the function that is currently executing (along with saved registers and the return address). The exception is variables declared with the static modifier: they live in statically allocated memory (the .bss section, or .data if they have a non-zero initializer), because they must keep their values between function calls.
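    A minimal illustration of the difference (the function name call_counter is made up for this sketch): the local buffer exists only in the stack frame of the current call, while the static counter sits in .bss and survives between calls.

```c
#include <stdio.h>

/* Hypothetical example: returns how many times it has been called. */
int call_counter(void)
{
    static int calls = 0;   /* static, zero-initialized: placed in .bss,
                               keeps its value between calls */
    char label[16];         /* local: lives in this call's stack frame and
                               is released when the function returns */

    calls++;
    snprintf(label, sizeof label, "call #%d", calls);
    puts(label);
    return calls;
}
```

    Calling it three times returns 1, 2 and 3: the counter persists, while label is a fresh 16-byte stack buffer on every call.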

    When a function runs, the compiler reserves space on the stack for the variables it needs, and when the function returns, that space is released. It would seem simple, but (and it is a very bold but) we have several problems:

    1. functions call other functions, which also need stack
    2. sometimes functions call other functions not directly by name, but through a function pointer
    3. recursion is possible in principle, although it should be avoided by all means: A calls B, B calls C, and C calls A again
    4. an interrupt may occur at any moment, and its handler is yet another function that wants its own piece of the stack
    5. if you have interrupt priorities, another interrupt may fire inside an interrupt handler!

    Recursive calls should be struck from this list right away: their presence is not a reason to calculate the stack size, but a reason to go and share your opinion with the author of the code. Everything else, alas, cannot be crossed out in the general case (although specific systems have their nuances: for example, all interrupts may have the same priority by design, as in RIOT OS, and then there are no nested interrupts).

    Now picture the following scene:

    • function A, eating 100 bytes on the stack, calls function B, which needs 50 bytes
    • while B is executing, A obviously has not finished yet, so its 100 bytes are not freed, and we already have 150 bytes on the stack
    • function B calls function C through a pointer which, depending on the program logic, can point to any of half a dozen different functions consuming from 5 to 50 bytes of stack
    • while C is executing, an interrupt occurs whose handler is heavy, runs relatively long, and consumes 20 bytes of stack
    • during interrupt processing, another higher-priority interrupt occurs, the handler of which wants 10 bytes of stack

    In this beautiful setup, with a particularly unlucky coincidence of circumstances, you will have at least five simultaneously active functions: A, B, C and two interrupt handlers. Moreover, one of them has no constant stack consumption, because it may be a different function on different passes. And to understand whether the handlers can interrupt each other, you need at the very least to know whether you have interrupts with different priorities at all, and at most to understand whether they can actually overlap.
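    The arithmetic of this scenario fits in a small C sketch. The figures for A, B and the two handlers come from the text above; the six pointer-target sizes are invented for illustration, with only the 5 and 50 byte extremes taken from the text. Since C cannot be pinned down statically, the sketch takes the largest candidate.

```c
#include <stddef.h>

/* Stack figures (bytes) from the scenario above. */
#define STACK_A        100
#define STACK_B         50
#define STACK_ISR_LOW   20   /* the heavy, long-running handler */
#define STACK_ISR_HIGH  10   /* the higher-priority handler that preempts it */

/* Hypothetical stack needs of the six functions B may call via its pointer. */
static const unsigned c_candidates[] = { 5, 12, 20, 33, 41, 50 };

/* Worst case for A -> B -> (*ptr) with both interrupts nested on top. */
unsigned worst_case_stack(void)
{
    unsigned worst_c = 0;
    for (size_t i = 0; i < sizeof c_candidates / sizeof c_candidates[0]; i++) {
        if (c_candidates[i] > worst_c)
            worst_c = c_candidates[i];   /* C is unknown: take the largest target */
    }
    /* All five frames are alive at the same moment, so they add up. */
    return STACK_A + STACK_B + worst_c + STACK_ISR_LOW + STACK_ISR_HIGH;
}
```

    With these numbers worst_case_stack() returns 100 + 50 + 50 + 20 + 10 = 230 bytes, even though on most passes the real figure will be lower.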

    Obviously, for any automatic static code analyzer this task is extremely close to intractable, and it can be solved only as a rough upper-bound approximation:

    • sum the stacks of all interrupt handlers
    • sum the stacks of the functions along each call chain, and take the largest chain
    • try to find all function pointers and the calls made through them, and for each such call take the maximum stack size among the functions the pointer can point to
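    The three rules above can be sketched as a small call-graph walk in C. This is a hypothetical helper, not an existing tool: each node carries its own frame size (as a .su file would report it), an indirect call is modeled by listing every function the pointer might hold as a callee, and all interrupt handlers are assumed to be able to nest, so their worst paths are simply added.

```c
#include <stddef.h>

#define MAX_CALLEES 4

/* Hypothetical call-graph node: own frame size and callees.
   An indirect call is modeled by listing every possible pointer target. */
struct fn {
    const char *name;
    unsigned frame;              /* bytes used by this function's own frame */
    int callees[MAX_CALLEES];    /* indices into the table, -1 terminates */
};

/* Worst-case stack along any branch starting at fn_index.
   Assumes no recursion: a cycle in the table would never terminate. */
unsigned worst_path(const struct fn *table, int fn_index)
{
    const struct fn *f = &table[fn_index];
    unsigned deepest = 0;
    for (int i = 0; i < MAX_CALLEES && f->callees[i] >= 0; i++) {
        unsigned d = worst_path(table, f->callees[i]);
        if (d > deepest)
            deepest = d;          /* branches are alternatives: keep the max */
    }
    return f->frame + deepest;    /* caller's frame stays live under callees */
}

/* Crude upper bound: deepest path from the entry point plus the deepest path
   of every interrupt handler, as if all of them could nest at once. */
unsigned stack_upper_bound(const struct fn *table, int entry,
                           const int *isrs, size_t n_isrs)
{
    unsigned total = worst_path(table, entry);
    for (size_t i = 0; i < n_isrs; i++)
        total += worst_path(table, isrs[i]);
    return total;
}
```

    Taking the maximum over callees handles ordinary branches and function pointers alike, because pointer targets are just alternative branches. The result is deliberately pessimistic: it charges every handler as if it could preempt every other one.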

    In most cases you get, on the one hand, a heavily overestimated figure and, on the other, a chance to miss some particularly tricky call through a pointer.

    Therefore, in the general case, we can simply say: this task cannot be solved automatically. A manual solution, by a person who knows the logic of the program, requires crunching quite a lot of numbers.

    Nevertheless, a static estimate of the stack size can be very useful for optimizing software, if only for the mundane purpose of understanding who eats how much and whether anyone eats too much.

    There are two extremely useful tools for this in the GNU/gcc toolchain:

    • the -fstack-usage flag
    • the cflow utility

    If you add -fstack-usage to the gcc flags (for example, to the CFLAGS line of your Makefile), then for each compiled file %filename%.c the compiler will create a file %filename%.su containing simple, readable text.
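    To see where the qualifier column of a .su file comes from, here is a hypothetical su_demo.c with two functions. Compiling it with something like gcc -O0 -c -fstack-usage su_demo.c should give a su_demo.su in which fixed_frame is marked static and vla_frame, whose variable-length array depends on an argument, is marked dynamic. The exact byte counts depend on the target, the compiler version and the optimization level, and at higher optimization levels GCC may simplify either function.

```c
#include <stddef.h>
#include <string.h>

/* Fixed-size frame: GCC knows its size at compile time ("static"). */
size_t fixed_frame(const char *s)
{
    char buf[64];
    size_t n = strlen(s);
    if (n >= sizeof buf)
        n = sizeof buf - 1;          /* truncate to fit the fixed buffer */
    memcpy(buf, s, n);
    buf[n] = '\0';
    return strlen(buf);
}

/* Variable-length array: the frame grows with n at run time,
   so no single number can be reported ("dynamic"). */
size_t vla_frame(size_t n)
{
    char buf[n + 1];
    memset(buf, 'x', n);
    buf[n] = '\0';
    return strlen(buf);
}
```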

    Take, for example, target.su for this gigantic wall of code:

    target.c:159:13:save_settings	8	static
    target.c:172:13:disable_power	8	static
    target.c:291:13:adc_measure_vdda	32	static
    target.c:255:13:adc_measure_current	24	static
    target.c:76:6:cpu_setup	0	static
    target.c:81:6:clock_setup	8	static
    target.c:404:6:dma1_channel1_isr	24	static
    target.c:434:6:adc_comp_isr	40	static
    target.c:767:6:systick_activity	56	static
    target.c:1045:6:user_activity	104	static
    target.c:1215:6:gpio_setup	24	static
    target.c:1323:6:target_console_init	8	static
    target.c:1332:6:led_bit	8	static
    target.c:1362:6:led_num	8	static
    

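    Here we see the stack consumption of each listed function's own frame. Each line gives the location (file:line:column), the function name, the size in bytes, and a qualifier: static means the size is fixed at compile time, while GCC reports dynamic for sizes that depend on run-time values (with bounded added when an upper limit is known). From this we can draw some conclusions, for example, what is worth optimizing first if we run short of RAM.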
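    Finding the heaviest frame by eye is easy in a file this short, but across a whole project it helps to automate it. A minimal sketch in C, assuming the tab-separated layout shown above and plain C function names (C++ names containing :: would need smarter parsing); parse_su_line and largest_frame are hypothetical helpers, not part of any toolchain.

```c
#include <stdio.h>
#include <string.h>

/* One parsed record: "file:line:column:function<TAB>bytes<TAB>qualifiers" */
struct su_entry {
    char function[64];
    unsigned long bytes;
    char qualifiers[32];
};

/* Parse a single .su line; returns 1 on success, 0 otherwise. */
int parse_su_line(const char *line, struct su_entry *out)
{
    const char *tab = strchr(line, '\t');
    if (tab == NULL)
        return 0;

    /* The function name follows the last ':' before the first tab. */
    const char *name = tab;
    while (name > line && name[-1] != ':')
        name--;
    size_t len = (size_t)(tab - name);
    if (len == 0 || len >= sizeof out->function)
        return 0;
    memcpy(out->function, name, len);
    out->function[len] = '\0';

    /* Size in bytes, then the qualifier word (e.g. static, dynamic,bounded). */
    return sscanf(tab + 1, "%lu %31s", &out->bytes, out->qualifiers) == 2;
}

/* Return the index of the line with the largest own frame, or -1. */
int largest_frame(const char *const *lines, size_t n, struct su_entry *best)
{
    int best_i = -1;
    struct su_entry e;
    for (size_t i = 0; i < n; i++) {
        if (parse_su_line(lines[i], &e) && (best_i < 0 || e.bytes > best->bytes)) {
            *best = e;
            best_i = (int)i;
        }
    }
    return best_i;
}
```

    Fed the lines from target.su above, largest_frame picks user_activity with its 104 bytes. Keep in mind that this is still only the function's own frame.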

    Note, however, that this file does not give accurate information about the actual stack consumption of functions that call other functions: each number covers only the function's own frame, not its callees!

    To find the total consumption, we need to build a call tree and sum the stacks of all functions along each of its branches. This can be done, for example, with the GNU cflow utility by running it on one or more source files.

    The output here is an order of magnitude bulkier, so I will show only part of it for the same target.c:

    olegart@oleg-npc /mnt/c/Users/oleg/Documents/Git/dap42 (umdk-emb) $ cflow src/stm32f042/umdk-emb/target.c
    adc_comp_isr() :
        TIM_CR1()
        ADC_DR()
        ADC_ISR()
        DMA_CCR()
        GPIO_BSRR()
        GPIO_BRR()
        ADC_TR1()
        ADC_TR1_HT_VAL()
        ADC_TR1_LT_VAL()
        TIM_CNT()
        DMA_CNDTR()
        DIV_ROUND_CLOSEST()
        NVIC_ICPR()
    clock_setup() :
        rcc_clock_setup_in_hsi48_out_48mhz()
        crs_autotrim_usb_enable()
        rcc_set_usbclk_source()
    dma1_channel1_isr() :
        DIV_ROUND_CLOSEST()
    gpio_setup() :
        rcc_periph_clock_enable()
        button_setup() :
            gpio_mode_setup()
        gpio_set_output_options()
        gpio_mode_setup()
        gpio_set()
        gpio_clear()
        rcc_peripheral_enable_clock()
        tim2_setup() :
            rcc_periph_clock_enable()
            rcc_periph_reset_pulse()
            timer_set_mode()
            timer_set_period()
            timer_set_prescaler()
            timer_set_clock_division()
            timer_set_master_mode()
        adc_setup_common() :
            rcc_periph_clock_enable()
            gpio_mode_setup()
            adc_set_clk_source()
            adc_calibrate()
            adc_set_operation_mode()
            adc_disable_discontinuous_mode()
            adc_enable_external_trigger_regular()
            ADC_CFGR1_EXTSEL_VAL()
            adc_set_right_aligned()
            adc_disable_temperature_sensor()
            adc_disable_dma()
            adc_set_resolution()
            adc_disable_eoc_interrupt()
            nvic_set_priority()
            nvic_enable_irq()
            dma_channel_reset()
            dma_set_priority()
            dma_set_memory_size()
            dma_set_peripheral_size()
            dma_enable_memory_increment_mode()
            dma_disable_peripheral_increment_mode()
            dma_enable_transfer_complete_interrupt()
            dma_enable_half_transfer_interrupt()
            dma_set_read_from_peripheral()
            dma_set_peripheral_address()
            dma_set_memory_address()
            dma_enable_circular_mode()
            ADC_CFGR1()
        memcpy()
        console_reconfigure()
        tic33m_init()
        strlen()
        tic33m_display_string()

    And that’s not even half the tree.

    To find the actual stack consumption, we need to take the consumption of each function mentioned in the tree and sum these values along each branch.
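    The summation itself is mechanical, so here is a rough sketch of it in C under stated assumptions: cflow output uses four spaces per nesting level (as in the listing above), each call line begins with the function name followed by (, and frame sizes have already been collected from the .su files into a lookup table. Names with no entry, such as calls into separately compiled libraries or what appear to be register-access macros (TIM_CR1, ADC_DR), count as zero bytes and are reported separately, because every such zero may hide real stack usage. The names deepest_branch and frame_size are made up for this sketch.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 32

/* Hypothetical lookup table built from .su files: function -> own frame. */
struct frame_size {
    const char *function;
    unsigned bytes;
};

static unsigned lookup(const struct frame_size *t, size_t n,
                       const char *name, size_t len, int *found)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(t[i].function) == len && strncmp(t[i].function, name, len) == 0) {
            *found = 1;
            return t[i].bytes;
        }
    }
    *found = 0;
    return 0;   /* no .su entry: macro, library code, assembly... */
}

/* Walk cflow output and return the largest sum of frames along any
   root-to-node branch; *unknown receives the count of unmatched names. */
unsigned deepest_branch(const char *const *lines, size_t n_lines,
                        const struct frame_size *t, size_t n_t,
                        unsigned *unknown)
{
    unsigned sum[MAX_DEPTH + 1] = { 0 };  /* sum[d + 1]: total down to depth d */
    unsigned worst = 0;
    *unknown = 0;

    for (size_t i = 0; i < n_lines; i++) {
        const char *p = lines[i];
        size_t spaces = 0;
        while (p[spaces] == ' ')
            spaces++;
        size_t depth = spaces / 4;         /* assumed: 4 spaces per level */
        if (depth >= MAX_DEPTH)
            continue;                      /* too deep for this sketch */

        const char *name = p + spaces;
        const char *paren = strchr(name, '(');
        if (paren == NULL)
            continue;                      /* not a call line */

        int found;
        unsigned bytes = lookup(t, n_t, name, (size_t)(paren - name), &found);
        if (!found)
            (*unknown)++;

        /* A callee's frame sits on top of all of its ancestors' frames. */
        sum[depth + 1] = sum[depth] + bytes;
        if (sum[depth + 1] > worst)
            worst = sum[depth + 1];
    }
    return worst;
}
```

    Because cflow prints the tree in pre-order, the most recent line one level up is always the current line's parent, so one running sum per depth is enough.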

    And even then we still do not account for calls through function pointers or for interrupts, including nested ones (and in this particular code they can be nested).

    As you might guess, doing this every time you change the code is, to put it mildly, tedious, which is why nobody usually does it.

    Nevertheless, it is important to understand how the stack fills up: this understanding can lead to certain restrictions on the project code that make it more robust against stack overflows (for example, a ban on nested interrupts or on calls through function pointers), and -fstack-usage in particular can be a great help when optimizing code for systems that are short on RAM.
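    One way to apply the restriction on calls through function pointers is to replace the pointer with an explicit switch over direct calls. This is a general technique, not something taken from the project above, and the handler names are hypothetical: with direct calls, cflow lists every possible callee under dispatch, and the worst branch is simply the larger of the two handlers' frames as reported by -fstack-usage.

```c
/* Hypothetical handlers that a function pointer might otherwise select. */
static int handler_small(int x)
{
    return x + 1;
}

static int handler_large(int x)
{
    int buf[8] = { 0 };          /* deliberately bigger frame */
    buf[x & 7] = x;
    return buf[x & 7] * 2;
}

enum mode { MODE_SMALL, MODE_LARGE };

/* Direct calls only: the full set of callees is visible to static tools. */
int dispatch(enum mode m, int x)
{
    switch (m) {
    case MODE_SMALL:
        return handler_small(x);
    case MODE_LARGE:
        return handler_large(x);
    }
    return -1;                   /* unknown mode */
}
```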
