STM32 Energy Optimization: A Practical Guide
Hi, Habr!
There are quite a few articles about using STM32 microcontrollers in energy-efficient devices (as a rule, battery-powered ones), but regrettably few of them go beyond enumerating the power-saving modes and the SPL/HAL calls that enable them (the same complaint, however, applies to the vast majority of articles about working with STM32).
Meanwhile, with the rapid development of smart homes and all kinds of IoT, the topic is becoming increasingly relevant: in such systems many components are battery-powered and are expected to run continuously for years.
We will fill this gap using the STM32L1 as an example: a very popular controller, quite economical, and at the same time one with some problems specific to this series. Virtually everything said here also applies to the STM32L0 and STM32L4 and, as far as the general problems and approaches are concerned, to other controllers with Cortex-M cores.

The end result should look something like the photo above (and yes, we will also talk about how applicable multimeters and other measurement tools are to such tasks).
The foundation of battery saving is the processor's basic power-saving modes. They differ for each manufacturer and each controller series (a specific set is a vendor extension of the standard Cortex-M core modes, with various nuances regarding peripherals, supply voltages, and so on).
The STM32L1, which belongs to the low-power series and has therefore received, among other things, an expanded set of power settings, offers the following modes: Run, Low Power Run (LP Run), Sleep, Low Power Sleep (LP Sleep), Stop, and Standby.
Entering each of the modes is quite simple: you set flags in three to five registers and then (for the sleep modes) execute the WFI or WFE instruction. These are standard Cortex-M instructions meaning "Wait For Interrupt" and "Wait For Event". Depending on the flags (they are described in the processor's Reference Manual, RM0038 for the STM32L1), the processor enters the desired mode on this instruction.
In addition, it is a good idea to disable interrupts (this does not affect the ability of external and internal events to bring the processor out of sleep) and to wait, with a DSB instruction, until any pending writes from registers to memory have completed.
For example, entering Stop mode goes like this: clear the PDDS flag in PWR->CR (it selects between Stop and Standby), set CWUF to clear the wake-up flag, set LPSDSR to put the regulator into low-power mode and ULP to switch off the Vref reference automatically, set SLEEPDEEP in SCB->SCR, disable interrupts, execute DSB and then WFI; after waking up, re-initialize the clocks and re-enable interrupts.
WFI is a blocking instruction: on it the processor goes into deep sleep and does not leave it until some interrupt occurs. Yes, I repeat: even though we have explicitly disabled interrupts, the processor will react to them and wake up, but it will start handling them only after we re-enable them. And there is a deep reason for this.
The re-initialization of the clocks right after WFI is there for a reason: the L1 always comes out of deep sleep at 4.2 MHz, with the internal MSI oscillator as the clock source. In many situations you obviously do not want an interrupt handler to start running at this frequency as soon as the processor wakes up, for example because the clocks of all the timers, UARTs, and other buses would be off. So we first restore the operating frequencies (or, if we want to stay on MSI, recalculate the relevant buses for 4.2 MHz) and only then dive into the interrupt handlers.
In practice, two modes are used most often: Run and Stop. LP Run is painfully slow and makes no sense if the processor has to perform calculations rather than just wait for external events, while Sleep and LP Sleep are not very economical (consumption of up to 2 mA) and are needed when you want to save at least a little while keeping the peripherals running and/or ensuring the fastest possible reaction to events. Such requirements do occur, but on the whole not very often.
Standby mode is usually not used: because RAM is lost, you cannot continue from where you stopped, and there are also some problems with external devices, discussed below, that require hardware solutions. However, if the device was designed with this in mind, Standby can serve as an "off" mode, for example during long-term storage of the device.
And at this point most manuals usually solemnly break off.
The problem is that, by following them, you will get a sad 100-200 μA of real consumption instead of the promised 1.4 μA in Stop with the clock running, even on a reference Nucleo board, which has no external chips, sensors, or anything else to blame it on.
And no, your processor is healthy, there is nothing in the errata, and you did everything right.
Just not all the way.
The first problem of the STM32L1 is the default state of the pins. Some articles mention it, but more often it comes up only on forums, when on the third day of discussing where those 100-200 μA come from someone recalls the existence of AN3430 and gets to page 19 of it.
Note that even STMicro itself treats the question too casually: most documents that deal with energy optimization limit themselves to one or two phrases advising you to pull unused pins to ground or switch them to analog input mode, without explaining why.
The sad part is that by default all pins are configured as digital inputs (0x00 in the GPIOx_MODER register). A digital input always has a Schmitt trigger, which improves the noise immunity of the input and is completely autonomous: it is the simplest logic element, a buffer with hysteresis, and needs no external clocking.
In our case this means that we switched off the clocks in Stop mode, while the Schmitt triggers carried on as if nothing had happened: depending on the input signal level they switch their outputs between 0 and 1.
At the same time, in a typical circuit some of the processor's pins are left floating, that is, there is no defined signal on them. It would be wrong to think that the absence of a defined signal means these pins read 0. No: because of their high input impedance, these pins carry random noise of unspecified magnitude, from pickup and leakage currents from neighboring tracks to Channel One TV, if the trace is long enough to serve as an antenna (however, analog TV in Russia will soon be switched off, which should somewhat reduce the power consumption of incorrectly configured microcontrollers).
Following these fluctuations, the pin switches randomly between 0 and 1, and CMOS logic consumes current when switching. In other words, a floating processor pin configured as a digital input consumes a noticeable current all by itself.
The way out is simple: at program start, all pins should be configured as analog inputs. On STM32 this mode is formally available for every pin without exception, whether or not it is connected to the ADC, and differs from a digital input only in the absence of the Schmitt trigger at the input.

To do this, it is enough to write 0xFF...FF to all the GPIOx_MODER registers. The easiest way, as mentioned above, is to do it right at start-up and then reconfigure individual pins as the device needs them.
Here, however, a second-order problem arises. It is fine if your firmware runs on one particular controller, so you always know which GPIOx ports exist. It is worse if the firmware is universal: an STM32 can have up to 8 ports, but it can have fewer, and if you try to write to a port that does not exist in a given model, you get a Hard Fault, i.e. the core halts abnormally.
However, even this case can be handled: Cortex-M allows addresses to be checked for validity. On M3 and M4 the check is quite trivial, while on M0 it requires some magic but is doable (the details are described elsewhere and we will not retell them here).
So, in the general case, the processor starts up, sets up its clocks, and immediately walks through all the available GPIO ports, writing ones into MODER (the original code was written for RIOT OS, but the approach is understandable without comment and can be ported to any other platform).
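As a rough illustration of that start-up pass, here is a minimal sketch. It is not the RIOT OS code from the original: gpio_port_t is a host-side stand-in for the real register block, and gpio_all_analog and keep_mask are names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for one GPIO port (assumption of this sketch).
 * On a real STM32L1 this would be the vendor register block, and the
 * port clock has to be enabled before MODER can be written. */
typedef struct {
    volatile uint32_t MODER; /* 2 bits per pin: 00 input, 01 output, 10 AF, 11 analog */
} gpio_port_t;

/* Switch every pin of every listed port to analog mode, except the pins
 * flagged in keep_mask[i] (bit n = pin n), which are left untouched. */
static void gpio_all_analog(gpio_port_t *ports[], const uint16_t keep_mask[],
                            size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t moder = ports[i]->MODER;
        for (unsigned pin = 0; pin < 16; pin++) {
            if (keep_mask[i] & (1u << pin)) {
                continue; /* e.g. debug pins we still need */
            }
            moder |= 3u << (pin * 2); /* 11 = analog */
        }
        ports[i]->MODER = moder;
    }
}
```

On real hardware the list would contain only the ports that actually exist on the chip. The keep mask is there because switching PA13/PA14 to analog also cuts off SWD, which is inconvenient while debugging.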
Note that this applies only to the L1 series: in the L0 and L4 this experience was taken into account, and by default they configure all ports as analog inputs at start-up.
Having carefully done all of this, you flash the firmware into the finished device... and get 150 μA in Stop mode with the processor and all external chips switched off, even though your most pessimistic estimates, based on the datasheets of everything soldered onto the board, give no more than 10 μA.
Moreover, you then try to put the processor into Standby instead of Stop, i.e. switch it off almost completely, and instead of falling, the consumption triples, getting close to half a milliampere!
No need to panic. As you may have guessed, you did everything right. Just not all the way.
The next problem consists of two parts.
The first is fairly obvious: if your device consists of more than a single microcontroller, it is important not to forget that external chips also have input signals with Schmitt triggers on them, and these inputs can, moreover, wake up the chip's internal logic. For example, a chip that is put to sleep and woken up by a command over UART will try to read data from the bus on any activity on it.
Accordingly, if all these pins are left floating, nothing good will come of it.
Under what conditions do they end up floating?
First, when the controller goes into Standby mode, all GPIOs are switched to the high-impedance (High-Z) state, which means that the inputs of external chips connected to them are in fact floating. On the STM32L1 this cannot be fixed in software (in other series and other controllers it varies), so the only way out is hardware: in a system that is meant to use Standby mode, the inputs of external chips must be pulled to ground or to the supply rail with external resistors.
The specific level is chosen so that the line is inactive from the chip's point of view.
Second, on STM32 in Stop mode (sic!), the state of GPIOs connected to internal hardware interface blocks may be... different. That is, the same SPI interface, once configured, suddenly turns out in Stop to be either a digital input or High-Z altogether, with the corresponding consequences for the external chips hanging on it. Although the documentation claims that pin states are preserved, you can rely on this a priori only for pins used as ordinary GPIO.
This cannot be understood or forgiven, but it can be remembered and fixed: interfaces that behave this way must be forcibly switched to ordinary GPIO functions before sleep, with levels matching the inactive levels of the interface. After waking up, the interfaces can be restored.
Take the same SPI before going to sleep (the original example used RIOT OS code; the same is easy to implement on registers): its lines are switched to plain GPIO inputs with pulls matching the idle levels.
Note that the outputs here are configured not as GPIO_OUT with a level of 0 or 1 but as inputs with a pull-down or pull-up. This is not crucial, but it provides extra safety if you make a mistake and end up in a tug-of-war with an external chip driving the same line the other way: with GPIO_OUT you can create a short circuit, with a GPIO_IN with a pull resistor you never can.
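At register level, parking an interface line as an input with a pull could look roughly like the sketch below. The struct is a host-side stand-in for the GPIO register block, gpio_park_pin is a hypothetical helper, and which level counts as inactive depends on the SPI mode and on the external chip.

```c
#include <stdint.h>

/* Host-side stand-in for the two registers involved (assumption). */
typedef struct {
    volatile uint32_t MODER; /* 2 bits per pin: 00 = digital input */
    volatile uint32_t PUPDR; /* 2 bits per pin: 00 none, 01 pull-up, 10 pull-down */
} gpio_port_t;

typedef enum { PULL_NONE = 0, PULL_UP = 1, PULL_DOWN = 2 } gpio_pull_t;

/* Turn one pin into a digital input held at its idle level by the
 * internal pull resistor instead of being actively driven. */
static void gpio_park_pin(gpio_port_t *port, unsigned pin, gpio_pull_t pull)
{
    uint32_t shift = pin * 2;

    port->MODER &= ~(3u << shift); /* 00 = input */
    port->PUPDR = (port->PUPDR & ~(3u << shift)) | ((uint32_t)pull << shift);
}
```

For an SPI bus idling low on SCK, for instance, SCK and MOSI would be parked with PULL_DOWN just before entering Stop.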
In addition, the SPI CS signal is left untouched: in this case it is generated in software, i.e. by an ordinary GPIO, and reliably keeps its state during sleep.
To restore the pin states on waking up, it is enough to save the values of the registers that will be changed (MODER, PUPDR, OTYPER, OSPEEDR, depending on the particular case) into variables on entering sleep, and on waking up to write them back from the variables into the registers.
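A minimal sketch of that save/restore step, again with a host-side stand-in for the register block; gpio_snapshot_t, gpio_save and gpio_restore are names invented for the example.

```c
#include <stdint.h>

/* Host-side stand-in for the GPIO configuration registers (assumption). */
typedef struct {
    volatile uint32_t MODER, OTYPER, OSPEEDR, PUPDR;
} gpio_port_t;

/* Plain copy of the registers we are about to change. */
typedef struct {
    uint32_t moder, otyper, ospeedr, pupdr;
} gpio_snapshot_t;

/* Called right before the pins are reconfigured for sleep. */
static gpio_snapshot_t gpio_save(const gpio_port_t *port)
{
    gpio_snapshot_t snap = {
        port->MODER, port->OTYPER, port->OSPEEDR, port->PUPDR
    };
    return snap;
}

/* Called after wake-up, once the clocks are back to normal. */
static void gpio_restore(gpio_port_t *port, const gpio_snapshot_t *snap)
{
    port->OTYPER  = snap->otyper;
    port->OSPEEDR = snap->ospeedr;
    port->PUPDR   = snap->pupdr;
    port->MODER   = snap->moder; /* mode last, so the pin reappears fully configured */
}
```

Writing MODER last is a design choice of this sketch rather than something the article prescribes.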
And now... ta-da! The title picture. One and a half microamperes.
But it is too early to celebrate. With this we have completed the static optimization of power consumption, and the dynamic part still lies ahead.
What is better: eat more and run faster, or eat less but run slower? For microcontrollers, the answer is non-trivial twice over.
First, the operating frequency can be varied over a very wide range, from 65 kHz (LP Run) to 32 MHz in the normal mode. Like any CMOS chip, the STM32 has two components of power consumption, static and dynamic; the second depends on frequency, the first is constant. As a result, power consumption does not fall as fast as the operating frequency and performance, and the optimal frequency in terms of energy efficiency depends on the task: where you have to wait for some event but for some reason cannot go to sleep, low frequencies are efficient; where you just need to crunch numbers, high ones. In typical "average" tasks it usually makes no sense to go below 2-4 MHz.
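The trade-off can be put into a toy model, assuming the run current is simply a constant static part plus a part proportional to frequency. The function names and any coefficients you feed in are hypothetical; real values have to come from the datasheet or from measurement.

```c
/* Charge in microcoulombs spent executing a fixed number of CPU cycles
 * at f_mhz, for a current of i_static_ua + ua_per_mhz * f_mhz. */
static double charge_for_work_uc(double i_static_ua, double ua_per_mhz,
                                 double f_mhz, double cycles)
{
    double current_ua = i_static_ua + ua_per_mhz * f_mhz;
    double seconds = cycles / (f_mhz * 1e6);

    return current_ua * seconds; /* uA * s = uC */
}

/* Charge in microcoulombs spent staying awake for a fixed time at f_mhz. */
static double charge_for_wait_uc(double i_static_ua, double ua_per_mhz,
                                 double f_mhz, double seconds)
{
    return (i_static_ua + ua_per_mhz * f_mhz) * seconds;
}
```

In this model the charge for fixed work is i_static * cycles / f plus a constant, so it shrinks as the frequency rises, while the charge for a fixed wait grows with frequency, which is the asymmetry described above.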
Second, and this is the less trivial point, the speed of waking up depends on the operating frequency and on how it is generated.
The worst case is waking up to 32 MHz from an external crystal (recall that the STM32L1 wakes up on the internal oscillator at 4 MHz), because it consists of three stages: the processor's exit from sleep, stabilization of the crystal oscillator, and start-up of the PLL.
The processor's exit from sleep is actually the smallest problem here: at 4.2 MHz it takes about 10 μs. But crystal stabilization can take up to 1 ms (although for high-frequency resonators it is usually faster, on the order of several hundred microseconds), and bringing up the PLL takes another 160 µs.
These delays may be insignificant in terms of energy for a system that wakes up rarely (no more than once per second), but where the period between wake-ups is tens of milliseconds or less and the wake-ups themselves are short, the overhead starts to add a quite measurable contribution, given the current the processor draws while waking up.
What can be done about it? In general the answer is obvious: try to avoid using the external crystal. For example, a program that has rare heavy subtasks requiring precise clocking (a trivial example: data exchange over UART) and frequent simple subtasks can decide on each wake-up whether the external crystal is needed, or whether it is simpler (and faster!) to run the current task on the MSI oscillator the processor has already woken up on, without spending a lot of time on initializing the clocks.
In this case, however, you may need to adjust the clock frequencies of the peripherals, as well as the flash access mode (the number of wait states), the core supply voltage (on the STM32L1 it is selected from three possible values), and so on. As for the core and memory operating modes, though, you can often skip adjusting them and pick the ones recommended for the maximum frequency used, since non-optimal operation of the core at lower frequencies will not noticeably change practical performance or power consumption, given how few tasks are run at those frequencies.
Although all such measures belong to fine-tuning (and most operating systems and libraries, for example, can do nothing of the sort out of the box), in some cases they can reduce average consumption by a few percent, and sometimes more. Imagine, for example, a water meter that polls its reed switch contacts every 50 ms, where the poll itself takes several tens of microseconds: do you want to add ~500 μs of controller wake-up time to that?
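To put numbers on the water meter example, here is a small sketch that computes the share of time spent awake and the resulting average current. The function names are made up, and the run and stop currents are parameters you would take from your own measurements.

```c
/* Fraction of each period during which the MCU is awake: the useful
 * work plus the time spent getting the clocks ready. */
static double awake_fraction(double period_us, double task_us, double wakeup_us)
{
    return (task_us + wakeup_us) / period_us;
}

/* Average current for a given awake fraction, assuming one constant
 * current while awake and another while in Stop. */
static double average_current_ua(double fraction, double run_ua, double stop_ua)
{
    return fraction * run_ua + (1.0 - fraction) * stop_ua;
}
```

With a 50 ms period and a 30 µs poll, a 10 µs wake-up keeps the device awake 0.08% of the time, while an extra 500 µs of clock start-up raises that to about 1.08%, more than an order of magnitude.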
Another problem, not directly related to energy saving but inevitably arising along with it: how do you count time intervals shorter than 1 second?
The point is that the STM32L1 has only one timer that runs in Stop mode, the RTC, whose standard unit of time is 1 second. Yet programs constantly need intervals of a few, tens, and hundreds of milliseconds; take that same water meter.
What to do? Move to processors with LPTIM timers that can be clocked from 32768 Hz? A good option, actually, but not always necessary. You can do without it.
Not on every STM32L1, but starting with Cat. 2 (the STM32L151CB-A, STM32L151CC, and newer processors), the RTC block gained a new register: SSR, the SubSeconds Register. More precisely, it was not so much added as made visible to the user, together with the subsecond alarm registers ALRMASSR and ALRMBSSR.
This register does not hold any human-readable units of time; it was simply exposed from an internal technical counter. In the STM32L1 the 32768 Hz clock passes through two counter-dividers, asynchronous and synchronous, which together divide it by 32768 in normal mode to produce the 1-second tick for the clock. SSR is simply the current value of the second counter.
Although SSR counts in its own units rather than milliseconds, the size of these units can be changed by changing the ratio between the synchronous and asynchronous dividers while keeping their overall factor at 32768, so that the RTC still gets its standard 1 second. Knowing these coefficients, you can calculate the value of one SSR count in milliseconds and from there move on to programming subsecond alarms.
Note that the asynchronous prescaler is more economical than the synchronous SSR counter, so setting it to 1 and dividing the input frequency by 32768 in the synchronous counter, which would give a resolution of about 30 µs, is energetically unprofitable. For ourselves we settled on 7 for the asynchronous prescaler and 4095 for the synchronous one ((7 + 1) * (4095 + 1) = 32768). Reducing the asynchronous prescaler further makes the RTC's consumption grow measurably, by fractions of a microampere, but since we are comparing against the "reference" 1.4 μA in Stop mode, even fractions matter. By default the STM32L1 uses 127 and 255, i.e. one count is about 4 ms, which is rather coarse.
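The arithmetic behind these prescaler values can be sketched as follows. The helper names are invented for this example and have nothing to do with the actual RIOT OS driver; prediv_a and prediv_s stand for the asynchronous and synchronous prescaler values as written to the RTC.

```c
#include <stdint.h>

#define RTC_CLOCK_HZ 32768u

/* Non-zero if the prescaler pair still produces a 1 Hz calendar tick. */
static int rtc_prescalers_valid(uint32_t prediv_a, uint32_t prediv_s)
{
    return (prediv_a + 1u) * (prediv_s + 1u) == RTC_CLOCK_HZ;
}

/* Duration of one SSR count in microseconds. */
static double rtc_ssr_tick_us(uint32_t prediv_a)
{
    return (prediv_a + 1u) * 1e6 / RTC_CLOCK_HZ;
}

/* Number of SSR counts closest to an interval given in milliseconds;
 * with a valid pair, SSR ticks (prediv_s + 1) times per second. */
static uint32_t rtc_ms_to_ssr_ticks(uint32_t ms, uint32_t prediv_s)
{
    return (uint32_t)(((uint64_t)ms * (prediv_s + 1u) + 500u) / 1000u);
}
```

For the default 127/255 pair one count is 3906.25 µs, so a 50 ms interval rounds to 13 counts; for 7/4095 one count is about 244 µs and the same interval is 205 counts.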
If you want to dig into the code, at one point we reworked the standard RTC driver from RIOT OS to support RTC_SSR and millisecond intervals. We have been using it at literally every step ever since (and since we work under an OS, there is also a service on top of it that lets us hang almost any number of tasks with arbitrary periods on a single hardware timer).
The same approach carries over to the STM32L0 and STM32L4 controllers, all models of which have the RTC_SSR register; this lets you avoid fiddling with the LPTIM timers and unify the code across platforms.
Of course, after all these optimizations a legitimate question arises: what have we actually achieved? Without knowing the answer, we could just as well have settled for a single WFE with properly configured flags, gone to sleep, and got our 200-500 μA.
The most traditional way to measure current is, of course, a multimeter. Telling whether it is lying on a load like a microcontroller with its dynamic consumption is very simple: if it is switched on, it is lying.
This, however, does not mean that a multimeter is useless here. You just need to know how to use it.
First of all, a multimeter is a very slow thing: a typical time for one reading is on the order of a second, while a typical time for a microcontroller to change state is on the order of microseconds. In a system whose consumption changes at that pace, the multimeter will simply show random values.
However, one of the non-random values we are interested in is the consumption of the microcontroller in sleep mode; if it significantly exceeds what we estimated from the datasheets, something is clearly wrong. This is the consumption of a static system, so it can be measured with a multimeter.
The most trivial method, shown in the title photo, is a multimeter in microammeter mode, which most mid-range models now have, with good accuracy and excellent resolution. The UT120C has a resolution of 0.1 µA with a rated accuracy of ±1% ±3 digits, which is more than enough for us.
This mode has one problem: the multimeter has a large series resistance in it, on the order of hundreds of ohms, so a microcontroller in normal mode simply will not start with such a multimeter in its power circuit. Fortunately, on practically all instruments the "mA" and "uA" positions are next to each other on the dial and share the same input sockets, so you can safely start the controller on the "mA" range and, once it goes to sleep, click over to "uA": this happens quickly enough that the controller does not have time to lose power and reboot.
Note that this method does not work if the controller has bursts of activity. In the firmware of the device in the photo, for example, the watchdog timer is reset every 15 seconds; at those moments the multimeter manages to show something around 27 μA, which of course has no relation to anything except the weather on Mars. If anything at all, however short, happens in your system more often than once every 5-10 seconds, the multimeter will simply lie.
Another way to measure static (I stress this word) consumption with a multimeter is to measure the voltage drop across an external shunt. To measure ultra-low currents from units to tens of microamperes, use a large shunt (for example, 1 kΩ) with a forward-biased Schottky diode in parallel with it. When the drop across the shunt exceeds 0.3 V, the diode opens and limits the voltage drop; up to 0.3 V you can safely measure the drop with the multimeter on the millivolt range, where 1 mV = 1 μA.
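The conversion is trivial, but the validity limit is easy to forget, so here is a small sketch. The function name is made up, and treating the diode as a hard cut-off at the clamp voltage is a simplification.

```c
#include <stdbool.h>

/* Convert a voltage drop measured across the shunt into a current.
 * Returns false when the drop has reached clamp_mv, i.e. when the
 * parallel diode is already carrying part of the current and the
 * reading can no longer be trusted. */
static bool shunt_current_ua(double drop_mv, double shunt_ohm, double clamp_mv,
                             double *current_ua)
{
    if (drop_mv < 0.0 || drop_mv >= clamp_mv) {
        return false;
    }
    *current_ua = drop_mv * 1000.0 / shunt_ohm; /* mV / ohm = mA, times 1000 = uA */
    return true;
}
```

With a 1 kΩ shunt and a 300 mV clamp, a reading of 1.5 mV corresponds to 1.5 µA, and anything from 300 mV upwards is rejected.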
Measuring the drop across a low-resistance shunt with a typical multimeter, unfortunately, will not work: mid-range instruments, even if they show something below 100 µV, have regrettable accuracy in that range. If you have a good bench instrument capable of showing 1 µV, you no longer need my advice.
Statics are all well and good, but what about dynamics? How do you evaluate, say, the effect of different frequencies on average power consumption?
Here everything is difficult.
Let's write down the basic requirements:
If we translate this directly into numbers, we get a relatively fast ADC of at least 18 bits with an input offset below 30 µV, an analog front end capable of measuring voltages from 1 µV, and a fast interface to a computer that lets us transfer and store all of this.
And all this for one single application.
You can see why such things are not lying around on every corner for ten bucks, right? The Keysight N6705C meets our requirements to a first approximation, only it costs $7960.
Among more affordable solutions, SiLabs, for example, builds current measurement into its development boards. The characteristics of its Advanced Energy Monitoring (AEM) system depend on the specific board model, and the biggest problem is measurement speed. On the old STK3300/3400 starter kits it is only 100 Hz, on the newer STK3700/3800 (easily recognized by their black PCB) 6.25 kHz, and the higher-end DK series can reach 10 kHz, but those already cost $300+. For serious tasks SiLabs officially recommends the aforementioned Keysight.
In principle, you can design such a device yourself. First of all you need very good op-amps with minimal input offset, such as the OPA2335. Two or three such op-amps with different gains are placed on the same shunt, each feeding a separate ADC input (with this approach the ADC built into a microcontroller will do), and at each sampling instant the reading is taken from the amplifier that is not overloaded at that moment.
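The range selection can be sketched like this. The struct, the function name and the rule "take the highest-gain channel that is not saturated" are choices of this example; the reference voltage and shunt value are parameters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One amplifier channel: its gain and the latest raw ADC code. */
typedef struct {
    double gain;
    uint16_t raw;
} adc_channel_t;

/* Channels must be ordered from highest gain to lowest. Returns false
 * if every channel reads full scale, i.e. the current is out of range. */
static bool pick_current_ua(const adc_channel_t *ch, size_t n, uint16_t adc_max,
                            double vref_v, double shunt_ohm, double *current_ua)
{
    for (size_t i = 0; i < n; i++) {
        if (ch[i].raw >= adc_max) {
            continue; /* this amplifier is overloaded, try a lower gain */
        }
        double v_adc = vref_v * ch[i].raw / adc_max;
        *current_ua = v_adc / ch[i].gain / shunt_ohm * 1e6;
        return true;
    }
    return false;
}
```

Starting from the highest gain gives the best resolution for small currents and falls back to coarser channels only when needed.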
The problem of data transfer speed to the computer is solved quite simply: since for practical purposes we are primarily interested in the average consumption of the system in real life, microsecond samples can be accumulated in the meter's on-board microcontroller, which then outputs the arithmetic mean over a reasonable, millisecond-scale interval.
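A minimal sketch of such an on-board averager; the type and function names are invented, and the window length is whatever number of samples fits your reporting interval.

```c
#include <stdbool.h>
#include <stdint.h>

/* Running accumulator for one averaging window. */
typedef struct {
    uint64_t sum;
    uint32_t count;
    uint32_t window; /* samples per reported value */
} averager_t;

static void averager_init(averager_t *a, uint32_t window)
{
    a->sum = 0;
    a->count = 0;
    a->window = window;
}

/* Feed one sample. Returns true and stores the arithmetic mean in *mean
 * once a full window has been collected, then starts a new window. */
static bool averager_feed(averager_t *a, uint32_t sample, uint32_t *mean)
{
    a->sum += sample;
    if (++a->count < a->window) {
        return false;
    }
    *mean = (uint32_t)(a->sum / a->count);
    a->sum = 0;
    a->count = 0;
    return true;
}
```

The feed function is cheap enough to call from the ADC interrupt, and only the returned mean needs to travel to the computer.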
In addition, as practice shows, it is very useful to have a logging meter, even a simple and not very accurate one, always at hand, so that a firmware change that breaks power saving does not come as a surprise.
For example, we built one into our standard UMDK-RF USB adapter, which is constantly used when debugging firmware. It already had an SWD programmer with DAPLink protocol support, a USB-UART bridge, and power management logic, so the consumption meter came as an almost free add-on. The meter itself is a 1-ohm shunt and an INA213 amplifier (gain of 50, typical zero offset 5 µV):

The amplifier is connected directly to an ADC input of the microcontroller (STM32F042F6P6), the ADC samples with a period of 10 µs driven by a hardware timer, and averaged data for each 100 ms interval is sent up over USB. As a result, having changed something in the firmware logic, you can simply go for a smoke or a coffee, leave the device on the table, and come back to look at a graph like this:

Of course, the accuracy of such a "free" instrument is low: with a 12-bit ADC and a single amplifier the minimum quantum is 16 μA. Still, it is extremely useful for quick and regular assessment of how the devices being debugged behave in terms of power consumption. After all, if you do something wrong in the firmware or the device, then with very high probability you will jump from units of microamperes to at least hundreds, and that will be clearly visible.
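The 16 μA figure can be checked with a few lines. The function name is made up, and the 3.3 V reference used in the note below is an assumption of this sketch, since the article does not state the ADC reference voltage.

```c
/* Current corresponding to one ADC step for a shunt-plus-amplifier
 * front end: LSB voltage divided by the gain and the shunt resistance. */
static double adc_lsb_current_ua(double vref_v, unsigned bits, double gain,
                                 double shunt_ohm)
{
    double lsb_v = vref_v / (double)(1ul << bits);

    return lsb_v / gain / shunt_ohm * 1e6;
}
```

For a 12-bit ADC, a gain of 50 and a 1-ohm shunt, an assumed 3.3 V reference gives roughly 16.1 µA per step, in line with the figure above.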

A separate nice bonus: since the data is output to a virtual COM port as text (values in microamperes), you can place the terminal window next to the window showing the device console and watch the power consumption alongside the debug messages.
I am bragging about this for a reason: to invite everyone to use this minimal (and very cheap!) programmer-debugger in their own projects.
You can grab the schematic here (DipTrace source) and the firmware here (the umdk-rf branch, UMDK-RF target when building; it is based on the dap42 project). The schematic is drawn somewhat messily, but I hope the main points are clear; the firmware is written in C using libopencm3 and builds with the usual arm-none-eabi-gcc. As extra features, the firmware handles power management, captures overload signals from the power switches, and puts the controller connected to it into its built-in bootloader on a long press of the button.
NB: if you want the boot button to put the programmer's own controller into its bootloader in the regular way, you need to reverse the polarity of its connection, remove the option-byte programming performed on first boot, and change the button's bootloader-entry function in the firmware.
You can see how current is measured with a pair of op-amps with different gains (for example, to adapt the debugger described above to your needs) here (p. 9); a more traditional alternative, with a single op-amp and an expensive 24-bit ADC, is offered by TI (EnergyTrace, p. 5).
PS. Note that when debugging with UART or JTAG/SWD connected, a small current can also flow through their pins, which will not be there in actual operation of the device. For instance, the UMDK-RF leaks about 15 µA through SWD (which is why, in the title picture, the multimeter measurement was made on an old version of the board without SWD), and on STM32 Nucleo boards there have been cases of parasitic consumption through SWD of about 200 µA. Debug boards used for measurement should be checked for such quirks, either by disconnecting their interface lines, if possible, or by comparing the results with the device's consumption measured without the debug setup, for example with a multimeter in static mode.
I hope you already understand what a mistake you made by choosing microcontroller programming as your main specialty.
Power saving modes in STM32L1
- Run - normal mode. Everything is on, all peripherals are available, frequency up to 32 MHz.
- Low Power Run (LP Run) - a special mode with an operating frequency of up to 131 kHz and a maximum consumption, including all peripherals, of 200 μA. In LP Run the processor's voltage regulator switches to a special economy mode, which saves up to fifty microamperes compared with running at the same frequency in Run mode.
- Sleep - the core is suspended, but all clocks keep running. Peripherals can continue to work if they do not need the core, but can also be disabled automatically.
- Low Power Sleep (LP Sleep) - a combination of Sleep with the regulator in economy mode. Clock frequency no higher than 131 kHz, total consumption no higher than 200 µA.
- Stop - all clocks are stopped except for the 32768 Hz "clock" oscillator, external or internal. On the STM32L1 only the real-time clock keeps running in this mode and everything else stops completely; on newer processors some peripherals can be clocked from the low frequency. Almost all processor pins retain their state. RAM contents are preserved, and external interrupts keep working.
- Standby - the processor core, RAM, and all peripherals except the real-time clock are switched off completely. RAM is not preserved (so from the software's point of view, entering Standby is almost the same as power-cycling: execution starts from the beginning), while the RTC keeps ticking. External interrupts do not work, except for three special WKUPx pins, where a 0-to-1 transition wakes the processor.
For example, this is how the care to Stop mode looks like:
/* the PDDS flag selects between Stop and Standby; it must be cleared for Stop */
PWR->CR &= ~(PWR_CR_PDDS);
/* the Wakeup flag must be cleared, otherwise there is a chance of waking up immediately */
PWR->CR |= PWR_CR_CWUF;
/* voltage regulator to low-power mode; there is almost no consumption in Stop anyway */
PWR->CR |= PWR_CR_LPSDSR;
/* switch the Vref voltage reference off automatically */
PWR->CR |= PWR_CR_ULP;
/* from the Cortex-M core's point of view, both Stop and Standby are Deep Sleep, */
/* so Deep Sleep has to be enabled in the core */
SCB->SCR |= (SCB_SCR_SLEEPDEEP_Msk);
/* interrupts disabled; this does not prevent them from waking the core */
unsigned state = irq_disable();
/* complete any outstanding memory writes */
__DSB();
/* go to sleep */
__WFI();
/* re-initialize the operating clocks */
init_clk();
/* restore interrupts after waking up */
irq_restore(state);
WFI is a blocking instruction: on it the processor falls into deep sleep and does not leave it until some interrupt occurs. Yes, I repeat: even though we have explicitly disabled interrupts, the processor will react to them and wake up, but it will start handling them only after we enable them again. And there is a good reason for this.
The code after WFI does not re-initialize the operating clocks for nothing: the L1 always leaves deep sleep running at 4.2 MHz from the internal MSI oscillator. In many situations you obviously do not want an interrupt handler to start running at this frequency right after wakeup, for example because the clocks of all timers, UARTs and other buses will be off. So we first restore the operating frequencies (or, if we want to stay on MSI, recalculate the bus settings for 4.2 MHz), and only then dive into the interrupts.
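To make the cost of waking up on MSI concrete, here is a minimal sketch of recalculating a UART baud-rate divider for the two clocks. It assumes a USART with 16x oversampling, where the value written to the baud-rate register is the peripheral clock divided by the baud rate. The function names and the exact MSI frequency of 4194304 Hz are my illustrative assumptions, not part of the firmware discussed here.

```c
#include <assert.h>
#include <stdint.h>

/* Divider for a USART with 16x oversampling: clock / baud, rounded to
 * nearest. Hypothetical helper, not a vendor API. */
uint32_t usart_brr(uint32_t clk_hz, uint32_t baud)
{
    assert(baud != 0u);
    return (clk_hz + baud / 2u) / baud;
}

/* Baud rate that actually results from a given integer divider. */
uint32_t usart_actual_baud(uint32_t clk_hz, uint32_t brr)
{
    assert(brr != 0u);
    return clk_hz / brr;
}
```

With a 32 MHz clock, 115200 baud gives a divider of 278. If that divider is left untouched after waking up on MSI, `usart_actual_baud(4194304, 278)` returns 15087, so the port is effectively dead until either the clock is restored or the divider is recalculated (to 36 in this case).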
In practice two modes are used most often: Run and Stop. LP Run is painfully slow and makes no sense if the processor has to compute something rather than just wait for external events, while Sleep and LP Sleep are not very economical (up to 2 mA) and are needed when you want to save at least a little while keeping peripherals running and/or getting the fastest possible reaction to events. Such requirements do occur, but on the whole not very often.
Standby is usually not used: because RAM is lost, execution cannot continue from where it stopped, and there are also some problems with external devices, discussed below, that require hardware solutions. However, if the device was designed with this in mind, Standby can serve as an "off" mode, for example during long-term storage of the device.
This, in fact, is where most manuals solemnly break off.
The problem is that if you follow them, you will get a sad 100-200 μA of real consumption instead of the promised 1.4 μA in Stop with the clock running, even on a reference Nucleo board, which has no external chips, sensors or anything else to blame it on.
And no, your processor is fine, there is nothing in the errata, and you did everything right.
Just not all the way.
Restless legs syndrome
The first problem of the STM32L1 is mentioned in some articles, but more often it is recalled only on forums, when on the third day of discussing where those 100-200 μA come from, someone remembers the existence of AN3430 and gets to page 19 of it: the default state of the pins.
I note that even STMicro itself treats the question too casually: in most documents that deal with power optimization it limits itself to one or two sentences advising you to pull unused pins to ground or switch them to analog input mode, without giving any reason.
The sad part is that by default all pins are configured as digital inputs (0x00 in the GPIOx_MODER register). A digital input always has a Schmitt trigger, which improves the noise immunity of the input, and it is completely self-contained: it is the simplest logic element, a buffer with hysteresis, which needs no external clock.
In our case this means that we switched off the clocks in Stop mode, and the Schmitt triggers kept working as if nothing had happened: depending on the input signal level they switch their outputs between 0 and 1.
At the same time, some of the processor's pins in a typical circuit are left floating, that is, no definite signal is applied to them. It would be wrong to think that the absence of a definite signal means these pins read 0. No: because of their high input impedance they carry random noise of unspecified magnitude, from pickup and leakage currents from neighboring traces to the First TV channel if the trace is long enough to act as an antenna (however, analog TV in Russia will soon be switched off, which should somewhat reduce the power consumption of incorrectly configured microcontrollers).
Following these fluctuations, the pin switches randomly between 0 and 1. CMOS logic consumes current when it switches. That is, a floating processor pin configured as a digital input consumes a noticeable current all by itself.
The way out is simple: at program start, all pins must be configured as analog inputs. On the STM32 this mode is formally available for every pin without exception, whether or not it is connected to the ADC, and it differs from a digital input only in the absence of the Schmitt trigger at the input.

To do this, it is enough to write 0xFFFFFFFF to all the GPIOx_MODER registers. The easiest way, as said above, is to do it right at startup and then reconfigure individual pins as the device needs them.
Here, however, a second-order problem arises. It is fine if your firmware runs on one specific controller, so you always know which values x takes in GPIOx. It is worse if the firmware is universal: an STM32 can have up to 8 ports, but it can have fewer, and if you try to write to a port that does not exist in this particular model you will get a Hard Fault, that is, an emergency stop of the core.
However, even this case can be handled: Cortex-M lets you check addresses for validity. On M3 and M4 the check is quite trivial; on M0 it requires some magic but is feasible (the details are described elsewhere, we will not go into them here).
So in the general case the processor starts up, sets up its clocks and immediately walks through all available GPIO ports, writing ones into MODER (the code below was written for RIOT OS, but is generally understandable without comments and can be ported to any other platform).
#if defined(CPU_FAM_STM32L1)
/* switch all GPIOs to AIN mode to minimize power consumption */
GPIO_TypeDef *port;
/* remember which GPIO clocks were on, then enable all of them */
uint32_t ahb_gpio_clocks = RCC->AHBENR & 0xFF;
periph_clk_en(AHB, 0xFF);
for (uint8_t i = 0; i < 8; i++) {
    port = (GPIO_TypeDef *)(GPIOA_BASE + i*(GPIOB_BASE - GPIOA_BASE));
    if (cpu_check_address((char *)port)) {
        port->MODER = 0xffffffff;
    } else {
        break;
    }
}
/* restore GPIO clock */
uint32_t tmpreg = RCC->AHBENR;
tmpreg &= ~((uint32_t)0xFF);
tmpreg |= ahb_gpio_clocks;
/* written directly so that clocks that were off before are switched off again */
/* (assuming periph_clk_en() only sets bits and cannot clear them) */
RCC->AHBENR = tmpreg;
#endif
I note that this applies only to the L1 series; in the L0 and L4 the lesson was learned, and by default all ports start up configured as analog inputs.
Having carefully done all this, you flash the firmware into the finished device ... and get 150 μA in Stop mode with the processor and all external chips switched off, even though your most pessimistic estimates, based on the datasheets of everything soldered onto the board, give no more than 10 μA.
Moreover, you then try putting the processor into Standby instead of Stop, i.e. switching it off almost completely, and instead of dropping, the consumption triples, getting close to half a milliamp!
No need to panic. As you may have guessed, you did everything right. Just not all the way.
Restless Legs Syndrome - 2
The next problem has two parts.
The first is fairly obvious: if your device does not consist of a microcontroller alone, it is important to remember that external chips also have input signals with Schmitt triggers on them, which, moreover, can wake up the chip's internal logic. For example, a chip that is put to sleep and woken by a command over UART will try to read data from the bus whenever anything moves on it.
Accordingly, if all these pins are left floating, nothing good will come of it.
Under what conditions do they end up floating?
First, when the controller goes into Standby, all GPIOs switch to the high-impedance (High-Z) state, so the external chips connected to them are in fact left floating. On the STM32L1 this cannot be fixed in software (other series and other controllers behave differently), so the only way out is that in a system where Standby is to be used, the inputs of external chips must be pulled to ground or to the supply by external resistors.
The specific level is chosen so that the line is inactive from the point of view of the chip:
- 1 for UART TX
- 0 for SPI MOSI
- 0 for SPI CLK with SPI Mode 0 or 1
- 1 for SPI CLK with SPI Mode 2 or 3
- 1 for SPI CS
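The list above can be written down as a small lookup, which is handy when the same table drives both the choice of resistors in the schematic and the firmware's pin setup. This is only a sketch: the enum and function names are hypothetical, and the SPI mode number is assumed to follow the usual convention in which CPOL is bit 1 of the mode.

```c
#include <assert.h>

/* Lines from the list above; the enum itself is a hypothetical helper. */
typedef enum { LINE_UART_TX, LINE_SPI_MOSI, LINE_SPI_CLK, LINE_SPI_CS } line_t;

/* Returns the inactive level (0 or 1) that the pull resistor should hold.
 * spi_mode (0..3) matters only for LINE_SPI_CLK: CPOL is bit 1 of the mode. */
int idle_level(line_t line, unsigned spi_mode)
{
    assert(spi_mode <= 3u);
    switch (line) {
    case LINE_UART_TX:  return 1;
    case LINE_SPI_MOSI: return 0;
    case LINE_SPI_CLK:  return (int)((spi_mode >> 1) & 1u);
    case LINE_SPI_CS:   return 1;
    }
    return 0; /* not reached for valid enum values */
}
```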
Secondly, on the STM32 in Stop mode (sic!), the state of GPIOs connected to internal hardware interface blocks may be ... different. That is, the same SPI interface, once configured, suddenly turns out in Stop to be either a digital input or High-Z altogether, with the corresponding consequences for the external chips hanging on it. Although the documentation claims that pin states are preserved, a priori you can rely on this only if you use the pins as ordinary GPIO.
You cannot understand and forgive this, but you can remember and correct it: the pins of interfaces that behave this way must be forcibly switched to plain GPIO functions with levels corresponding to the inactive state of the interface. After leaving sleep, the interfaces can be restored.
For example, the same SPI before going to sleep (for simplicity I take the code from RIOT OS; the same is clearly easy to implement on registers):
/* specifically set GPIOs used for external SPI devices */
/* MOSI = 0, SCK = 0, MISO = AIN for SPI Mode 0 & 1 (CPOL = 0) */
/* MOSI = 0, SCK = 1, MISO = AIN for SPI Mode 2 & 3 (CPOL = 1) */
for (i = 0; i < SPI_NUMOF; i++) {
    /* check if SPI is in use */
    if (is_periph_clk(spi_config[i].apbbus, spi_config[i].rccmask) == 1) {
        /* SPI CLK polarity */
        if (spi_config[i].dev->CR1 & (1<<1)) {
            gpio_init(spi_config[i].sclk_pin, GPIO_IN_PU);
        } else {
            gpio_init(spi_config[i].sclk_pin, GPIO_IN_PD);
        }
        gpio_init(spi_config[i].mosi_pin, GPIO_IN_PD);
        gpio_init(spi_config[i].miso_pin, GPIO_AIN);
    }
}
Note that the outputs here are configured not as GPIO_OUT with a level of 0 or 1 but as inputs with a pull-down or pull-up. This is not crucial, but it gives extra safety if you make a mistake and end up in a tug-of-war with an external chip driving the pin the other way. With GPIO_OUT you can create a short circuit; with GPIO_IN and a pull resistor, never.
In addition, the SPI CS signal is not touched: in this case it is generated in software, that is, by an ordinary GPIO, and it keeps its state reliably during sleep.
To restore the state of the pins on leaving sleep, it is enough to save the values of the registers that will be changed (MODER, PUPDR, OTYPER, OSPEEDR, depending on the particular case) into variables on entry, and to write them back from the variables into the registers on exit.
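A minimal sketch of such a save and restore is shown below. So that it can be tried on a PC, the port is modelled by a plain struct with the four registers named above; on real hardware these would be fields of the vendor's GPIO_TypeDef. The type and function names are hypothetical, and the order of writes in the restore function is my own choice, not something the documentation prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for the four GPIO configuration registers. */
typedef struct {
    volatile uint32_t MODER;
    volatile uint32_t OTYPER;
    volatile uint32_t OSPEEDR;
    volatile uint32_t PUPDR;
} gpio_regs_t;

/* Plain copy kept in RAM, which Stop mode preserves. */
typedef struct {
    uint32_t moder, otyper, ospeedr, pupdr;
} gpio_snapshot_t;

/* Call before reconfiguring the pins for sleep. */
void gpio_save(const gpio_regs_t *port, gpio_snapshot_t *snap)
{
    assert(port != NULL && snap != NULL);
    snap->moder   = port->MODER;
    snap->otyper  = port->OTYPER;
    snap->ospeedr = port->OSPEEDR;
    snap->pupdr   = port->PUPDR;
}

/* Call after wakeup. MODER is written last so that a pin becomes an
 * output or alternate function only after its other settings are back. */
void gpio_restore(gpio_regs_t *port, const gpio_snapshot_t *snap)
{
    assert(port != NULL && snap != NULL);
    port->PUPDR   = snap->pupdr;
    port->OTYPER  = snap->otyper;
    port->OSPEEDR = snap->ospeedr;
    port->MODER   = snap->moder;
}
```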
And now ... ta-da! The title picture. One and a half microamps.
But it is too early to celebrate. With this we have finished the static optimization of power consumption, and the dynamic one still lies ahead.
Achilles vs the tortoise
Which is better: eat more and run faster, or eat less but run slower? For microcontrollers the answer is doubly non-trivial.
First, the operating frequency can be changed within very wide limits, from 65 kHz (LP Run) to 32 MHz in normal mode. Like any CMOS chip, the STM32 has two components of power consumption, static and dynamic; the second depends on frequency, the first is constant. As a result, consumption does not fall as fast as frequency and performance do, and the optimum frequency in terms of energy efficiency depends on the task: where you have to wait for some event but for some reason cannot go to sleep, low frequencies are efficient; where you just need to crunch numbers, high ones are. In typical "average" tasks it usually makes no sense to go below 2-4 MHz.
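The static-plus-dynamic reasoning can be illustrated with a toy model. Everything here is an assumption made for the sake of the example: the linear model itself, the struct and function names, and any coefficients you feed in. Real figures have to come from the datasheet or from measurement.

```c
#include <assert.h>

/* Hypothetical consumption model: a constant static part plus a dynamic
 * part proportional to frequency. */
typedef struct {
    double static_ua;       /* frequency-independent current, uA */
    double dynamic_ua_mhz;  /* additional current per MHz, uA/MHz */
} power_model_t;

double run_current_ua(const power_model_t *m, double f_mhz)
{
    return m->static_ua + m->dynamic_ua_mhz * f_mhz;
}

/* Charge (uA * us = pC) spent on a compute-bound task of a fixed number
 * of cycles: the higher the frequency, the shorter the task. */
double compute_charge_pc(const power_model_t *m, double f_mhz, double cycles)
{
    assert(f_mhz > 0.0);
    double time_us = cycles / f_mhz;  /* f in MHz = cycles per microsecond */
    return run_current_ua(m, f_mhz) * time_us;
}

/* Charge spent on waiting a fixed time without being able to sleep. */
double wait_charge_pc(const power_model_t *m, double f_mhz, double wait_us)
{
    return run_current_ua(m, f_mhz) * wait_us;
}
```

With made-up coefficients of 100 μA static and 200 μA/MHz dynamic, a 32000-cycle calculation costs less charge at 32 MHz than at 4 MHz, because the static part is paid for a shorter time, while a fixed 1 ms wait is cheaper at 4 MHz. That is the whole argument in two function calls.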
Secondly, and this is less trivial, the time it takes to leave sleep depends on the operating frequency and on how that frequency is produced.
The worst case is waking up to 32 MHz from an external crystal (I remind you that the STM32L1 wakes up on the internal oscillator at 4.2 MHz), because it consists of three stages:
- the processor leaving sleep
- stabilization of the crystal oscillator (1-24 MHz)
- stabilization of the PLL (32 MHz)
The processor's own exit from sleep is the smallest problem here: at 4.2 MHz it takes about 10 μs. But crystal stabilization can take up to 1 ms (although for high-frequency resonators it is usually faster, on the order of several hundred microseconds), and PLL lock is another 160 µs.
These delays may be insignificant in terms of energy for a system that wakes up rarely (no more than once per second), but where the interval between wakeups is tens of milliseconds or less and the wakeups themselves are short, the overhead starts to make a quite measurable contribution, even allowing for the fact that the processor draws a relatively small current while it is waking up.
What can be done about it? In general the answer is obvious: try to avoid using the external crystal. For example, a program with rare heavy subtasks that need precise clocking (say, trivially, UART communication) and frequent simple subtasks can decide at each wakeup whether to start the external crystal at all, or whether it is simpler (and faster!) to do the current job on the MSI oscillator on which the processor has already woken up, without spending much time initializing the clocks.
In this case, however, it may be necessary to adjust the peripheral clocks, as well as the flash access mode (the number of wait states), the core supply voltage (on the STM32L1 it is selected from three possible values), and so on. As for the core and memory operating modes, though, you can often skip adjusting them and pick the ones recommended for the maximum frequency used, since non-optimal operation of the core at lower frequencies will not noticeably change practical performance or power consumption, because only a small share of the work is done at those frequencies.
Although all such measures are fine-tuning (and, for example, most operating systems and libraries can do nothing even remotely like this out of the box), in some cases they can reduce average consumption by a few percent, and sometimes more. Imagine, for example, a water meter that polls reed-switch contacts every 50 ms, where the poll itself takes a few tens of microseconds. Do you want to add ~500 μs of controller wakeup to that?
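The water meter example is easy to put into numbers. The sketch below computes the average current over one wakeup period; the function name is hypothetical, and the currents used in the note after it are placeholders I picked for illustration, not measured values.

```c
#include <assert.h>

/* Average current (uA) of a device that, once per period, spends wake_us
 * starting its clocks at wake_ua, then active_us working at active_ua,
 * and sleeps at sleep_ua for the rest of the period. */
double avg_current_ua(double period_us,
                      double wake_us, double wake_ua,
                      double active_us, double active_ua,
                      double sleep_ua)
{
    assert(period_us > 0.0);
    assert(wake_us >= 0.0 && active_us >= 0.0);
    assert(wake_us + active_us <= period_us);
    double sleep_us = period_us - wake_us - active_us;
    return (wake_ua * wake_us + active_ua * active_us + sleep_ua * sleep_us)
           / period_us;
}
```

Take a 50 ms period, a 50 μs poll at an assumed 1 mA and 1.4 μA in Stop. With no clock start-up at all the average comes out at about 2.4 μA; adding 500 μs of start-up at an assumed 300 μA raises it to about 5.4 μA, more than double, even though the start-up current is lower than the run current.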
Unbearable long second
Another problem, not directly related to power saving but inevitably arising alongside it: how do you count time intervals shorter than 1 second?
The point is that the STM32L1 has only one timer that runs in Stop mode, the RTC, whose standard time unit is 1 second. Meanwhile, intervals of units, tens and hundreds of milliseconds come up in programs all the time; take that same water meter.
What to do? Move to processors with LPTIM timers that can be clocked from 32768 Hz? A good option, actually, but not always necessary. You can do without it.
Not on every STM32L1, but starting with Cat. 2 (that is, STM32L151CB-A, STM32L151CC and newer parts), the RTC block gained a new register, SSR, the SubSeconds Register. More precisely, it was not so much added as made visible to the user, along with the sub-second alarm registers ALRMASSR and ALRMBSSR.
This register does not hold any human-readable units of time; it was hastily made out of an internal technical counter. In the STM32L1, the 32768 Hz clock passes through two divider counters, asynchronous and synchronous, which together in normal mode divide it by 32768 to produce a 1-second tick for the clock. SSR is simply the current value of the second counter.
Although SSR counts not in milliseconds but in its own units, the size of those units can be changed by changing the ratio between the synchronous and asynchronous dividers while keeping their product equal to 32768, so that the RTC still gets its standard 1 second. Knowing these coefficients, you can calculate the value of one SSR count in milliseconds and from there move on to programming sub-second alarms.
Note that the asynchronous prescaler is more economical than the synchronous SSR counter, so setting it to 1 and dividing the input frequency by 32768 in the SSR counter, which gives a count of only 30 µs, is energetically unprofitable. For ourselves we found the optimal value to be 7 for the asynchronous prescaler and 4095 for the synchronous one ((7 + 1) * (4095 + 1) = 32768). With a further decrease of the asynchronous prescaler, RTC consumption starts to grow measurably: by fractions of a microamp, but since we are comparing against the "reference" 1.4 μA in Stop mode, even fractions matter. By default these values on the STM32L1 are 127 and 255, i.e. one count is about 4 ms, which is a bit coarse.
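The arithmetic behind these prescaler values fits in a few lines. The helpers below are a sketch with hypothetical names; they only do the unit conversion and say nothing about how the resulting count is written into the alarm registers.

```c
#include <assert.h>
#include <stdint.h>

#define RTC_CLOCK_HZ 32768u

/* The two prescalers must still divide 32768 Hz down to exactly 1 Hz. */
int rtc_prescalers_valid(uint32_t prediv_a, uint32_t prediv_s)
{
    return (uint64_t)(prediv_a + 1u) * (prediv_s + 1u) == RTC_CLOCK_HZ;
}

/* Duration of one SSR count in nanoseconds: the synchronous counter is
 * clocked at 32768 / (prediv_a + 1) Hz. */
uint32_t ssr_tick_ns(uint32_t prediv_a)
{
    assert(prediv_a <= 127u);  /* values used in the text fit in 7 bits */
    return (uint32_t)(((uint64_t)(prediv_a + 1u) * 1000000000u) / RTC_CLOCK_HZ);
}

/* Number of SSR counts closest to an interval given in milliseconds. */
uint32_t ms_to_ssr_ticks(uint32_t ms, uint32_t prediv_a)
{
    uint64_t denom = (uint64_t)(prediv_a + 1u) * 1000u;
    return (uint32_t)(((uint64_t)ms * RTC_CLOCK_HZ + denom / 2u) / denom);
}
```

For the 7 / 4095 pair one count is 244140 ns, roughly a quarter of a millisecond, and a 50 ms interval is 205 counts. For the default 127 / 255 pair one count is 3906250 ns and the same 50 ms is only 13 counts.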
If you want to dig into the code, at one point we reworked the stock RTC driver from RIOT OS to support RTC_SSR and millisecond intervals. We have been using it literally at every step ever since (and since we work in an OS, there is also a service on top of it that lets us hang almost any number of tasks with arbitrary periods on a single hardware timer).
The same approach carries over to the STM32L0 and STM32L4 controllers, all models of which have the RTC_SSR register; this lets you avoid messing with the LPTIM timers and unify the code across platforms.
How to understand that a multimeter is lying
Of course, after all the optimizations a legitimate question arises: what, in fact, have we achieved? Without knowing the answer, we could have limited ourselves to a single WFE with properly configured flags, gone to sleep and got our 200-500 μA.
The most traditional way to measure current is, of course, a multimeter. Telling that it is lying on a load like a microcontroller with its dynamic consumption is very simple: if it is switched on, it is lying.
This does not mean, however, that a multimeter is useless here. You just need to know how to use it.
First of all, a multimeter is a very slow thing: a typical reading takes on the order of a second, while a microcontroller changes state on the order of microseconds. In a system whose consumption changes at that pace, the multimeter will simply show random values.
However, one of the non-random values we care about is the microcontroller's consumption in sleep mode; if it significantly exceeds what we estimated from the datasheets, something is clearly wrong. This is the consumption of a static system, so it can be measured with a multimeter.
The most trivial method, shown in the title photo, is a multimeter in microammeter mode, which most mid-range models now have, with good accuracy and excellent resolution. The UT120C has a resolution of 0.1 µA with a rated accuracy of ± 1% ± 3 digits, which is more than enough for us.
This mode has one problem: in it multimeters have a large series resistance, on the order of hundreds of ohms, so in normal mode a microcontroller with such a multimeter in its supply line simply will not start. Fortunately, the "mA" and "uA" positions are next to each other on the dial of practically every instrument and both ranges share the same input socket, so you can safely start the controller on the "mA" range and, once it goes to sleep, click over to "uA". This happens fast enough that the controller does not lose power and reboot.
Note that if the controller has bursts of activity, this method is not applicable. In the firmware of the device in the photo, for example, the watchdog timer is reset every 15 seconds; at those moments the multimeter manages to show something around 27 μA, which of course has no more to do with reality than with the weather on Mars. If something, however short, happens in your system more often than once every 5-10 seconds, the multimeter will simply lie.
Another way to measure static (I stress this word) consumption with a multimeter is to measure the drop across an external shunt. To measure ultra-low currents from units to tens of microamps, use a large shunt (for example, 1 kΩ) with a forward-connected Schottky diode in parallel. When the drop across the shunt exceeds 0.3 V, the diode opens and limits the voltage drop; up to 0.3 V you can safely measure the drop with the multimeter on the millivolt range, 1 mV = 1 μA.
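For those who prefer to log such readings rather than convert them in their head, the conversion can be sketched as follows. The function name is hypothetical, and the clamp voltage is passed in as a parameter because the 0.3 V figure depends on the particular diode.

```c
#include <assert.h>

/* Converts a voltage measured across the shunt into current in uA.
 * Returns a negative value when the reading is at or above the clamp
 * voltage, where the parallel diode conducts and the result is invalid. */
double shunt_current_ua(double measured_mv, double shunt_ohm, double clamp_mv)
{
    assert(shunt_ohm > 0.0);
    if (measured_mv >= clamp_mv) {
        return -1.0;
    }
    return measured_mv * 1000.0 / shunt_ohm;  /* mV -> uV, uV / ohm = uA */
}
```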
Measuring the drop across a low-resistance shunt with a typical multimeter, unfortunately, will not work: mid-range instruments, even if they show something below 100 µV, have deplorable accuracy in that range. If you have a good bench instrument capable of showing 1 µV, you no longer need my advice.
However, static is fine, but what about dynamics? How do you evaluate, say, the effect of different frequencies on average power consumption?
Here everything is difficult.
Let's write down the basic requirements:
- current range of at least 1 μA - 100 mA (10 ^ 5)
- measurement period not more than 10 μs
- voltage drop not higher than 100 mV
- duration of measurement - unlimited
If we translate this directly into numbers, we get a relatively fast ADC of at least 18 bits with an input offset below 30 µV, an analog front end capable of measuring voltages from 1 µV, and a fast interface to a computer that lets us transfer and store all of this.
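Where these numbers come from can be checked with a few integer helpers. This is a rough sketch with hypothetical function names; it covers only the shunt value, the smallest voltage and the bare minimum of ADC bits, not the offset budget.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bits that can represent `steps` distinct levels. */
unsigned bits_for_steps(uint32_t steps)
{
    unsigned bits = 0;
    while (bits < 32u && ((uint32_t)1 << bits) < steps) {
        bits++;
    }
    return bits;
}

/* Largest shunt (milliohms) that keeps the drop within max_drop_mv
 * at a current of max_ma. */
uint32_t max_shunt_mohm(uint32_t max_drop_mv, uint32_t max_ma)
{
    assert(max_ma != 0u);
    return max_drop_mv * 1000u / max_ma;  /* mV / mA = ohm; x1000 -> mohm */
}

/* Voltage (nanovolts) produced by current_ua on a shunt of shunt_mohm. */
uint32_t shunt_drop_nv(uint32_t current_ua, uint32_t shunt_mohm)
{
    return current_ua * shunt_mohm;  /* uA * mohm = nV */
}
```

A 100 mV drop at 100 mA allows a shunt of at most 1000 mΩ, on which 1 μA produces 1000 nV, that is, the 1 µV mentioned above. Covering 10^5 steps of 1 μA takes 17 bits as a bare minimum, so "at least 18" leaves one bit of margin.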
And all this for one single application.
You see now why such things are not lying around on every corner for ten bucks? The Keysight N6705C meets our requirements to a first approximation, only it costs $7960.
Among more affordable solutions, SiLabs, for example, builds current measurement into its development boards. The characteristics of their Advanced Energy Monitoring (AEM) system depend on the specific board, and its biggest problem is measurement speed. On the old STK3300/3400 starter kits it is only 100 Hz, on the newer STK3700/3800 boards (easily recognized by the black PCB) it is 6.25 kHz, and on the higher-end DK series it can reach 10 kHz, but those already cost $300+. For serious tasks SiLabs officially recommends the aforementioned Keysight.
In principle, such a device can be designed by yourself. First of all you need very good op-amps with minimal input offset, such as the OPA2335. Two or three of them with different gains are put on the same shunt, each feeding its own ADC input (with this approach the ADC built into a microcontroller can be used), and on every sample the reading is taken from the op-amp that is not overloaded at that moment.
The problem of the data rate to the computer is solved quite simply: since for practical purposes we are interested mainly in the average consumption of the system in real life, the microsecond samples can be collected in the meter's on-board microcontroller, which outputs their arithmetic mean over a reasonable interval on the order of milliseconds.
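Both ideas, picking the amplifier that is not overloaded and averaging on board, can be sketched in portable C. The structures and function names are hypothetical, a 12-bit ADC is assumed, and "overloaded" is simplified to "the ADC code has reached full scale".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADC_FULL_SCALE 4095u  /* 12-bit ADC assumed */

typedef struct {
    uint32_t gain;  /* amplifier gain of this channel */
    uint16_t raw;   /* latest ADC code read from this channel */
} gain_channel_t;

/* Channels must be ordered from highest gain to lowest. Returns the index
 * of the first channel whose code is below full scale, or n - 1 if every
 * channel is clipping. */
size_t pick_channel(const gain_channel_t *ch, size_t n)
{
    assert(ch != NULL && n > 0u);
    for (size_t i = 0; i < n; i++) {
        if (ch[i].raw < ADC_FULL_SCALE) {
            return i;
        }
    }
    return n - 1u;
}

/* Running sum used to report one arithmetic mean per output interval. */
typedef struct {
    uint64_t sum_ua;
    uint32_t count;
} averager_t;

void averager_add(averager_t *a, uint32_t sample_ua)
{
    a->sum_ua += sample_ua;
    a->count++;
}

/* Returns the mean of the collected samples and starts a new interval. */
uint32_t averager_flush(averager_t *a)
{
    uint32_t mean = a->count ? (uint32_t)(a->sum_ua / a->count) : 0u;
    a->sum_ua = 0u;
    a->count = 0u;
    return mean;
}
```

In a real meter `averager_add()` would be called from the ADC interrupt on every sample and `averager_flush()` from the code that sends a value to the computer.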
In addition, as practice shows, it is very useful to have a logging meter, however simple and inaccurate, always at hand, so as not to be surprised by power saving broken by some firmware change.
For example, we built one into our standard UMDK-RF USB adapter, which is used constantly when debugging firmware. It already had an SWD programmer with DAPLink protocol support, a USB-UART bridge and power management logic, so the consumption meter came as an almost free addition. The meter itself is a 1-ohm shunt and an INA213 amplifier (gain of 50, typical zero offset 5 µV).

The amplifier is connected directly to an ADC input of the microcontroller (STM32F042F6P6), the ADC samples with a period of 10 µs driven by a hardware timer, and the data averaged over a 100 ms interval is sent up over USB. As a result, having changed something in the firmware logic, you can just go for a smoke or a coffee, leaving the device on the table, and on returning look at a graph like this:

Of course, the accuracy of such a "free" device is low: with a 12-bit ADC and a single amplifier the minimum step is 16 μA. But it is extremely useful for quick, regular assessment of how the devices being debugged behave in terms of power consumption. After all, if you do something wrong in the firmware or the device, you will almost certainly jump from units of microamps to at least hundreds, and that will be clearly visible.
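The 16 μA figure follows directly from the parts used, as the sketch below shows. The function name is hypothetical, and the 3.3 V ADC reference is my assumption; the shunt, gain and ADC resolution are the ones given above.

```c
#include <assert.h>
#include <stdint.h>

/* Current corresponding to one ADC step, in nanoamperes.
 * vref_uv: ADC reference in microvolts, adc_bits: ADC resolution,
 * gain: amplifier gain, shunt_mohm: shunt resistance in milliohms. */
uint32_t adc_step_na(uint32_t vref_uv, unsigned adc_bits,
                     uint32_t gain, uint32_t shunt_mohm)
{
    assert(adc_bits > 0u && adc_bits < 32u);
    assert(gain != 0u && shunt_mohm != 0u);
    /* one ADC code at the ADC pin, in nanovolts */
    uint64_t step_nv = ((uint64_t)vref_uv * 1000u) >> adc_bits;
    /* the same step referred back to the shunt */
    uint64_t shunt_nv = step_nv / gain;
    /* nV / mohm = uA; multiplied by 1000 to get nA */
    return (uint32_t)(shunt_nv * 1000u / shunt_mohm);
}
```

`adc_step_na(3300000, 12, 50, 1000)` returns 16113, that is, about 16 μA per ADC code. At a 10 µs sampling period each 100 ms report averages 10000 such samples.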

A separate nice bonus: since the data is output to a virtual COM port as text (values in microamperes), you can place the terminal window next to the window showing the device console and watch power consumption alongside the debug messages.
I am bragging about this for a reason: to invite everyone to use this minimal (and very cheap!) programmer-debugger in their own projects.
You can get the schematic here (source in DipTrace) and the firmware here (the umdk-rf branch, the UMDK-RF target when building; it is based on the dap42 project). The schematic is drawn somewhat messily, but I hope the main points are clear; the firmware is written in C using libopencm3 and is built with the usual arm-none-eabi-gcc. As additional functions, the firmware has power management, capture of overload signals from the power control switches, and putting the connected controller into its native bootloader by a long press of the button.
NB: if you want the boot button to put the programmer's own controller into its bootloader in the standard way, the polarity of the button connection must be changed, the rewriting of the controller's option bytes on first boot should be removed, and the software entry into the bootloader should be changed in the firmware functions that handle this button.
You can see how current is measured with a pair of op-amps with different gains (for example, to adapt the debugger described above to your needs) here (p. 9); a more traditional alternative, with a single op-amp and an expensive 24-bit ADC, is TI's (EnergyTrace, p. 5).
PS Note that when debugging with UART or JTAG/SWD connected, a small current can also flow through their pins, which will not be there in real operation of the device. For instance, the UMDK-RF leaks about 15 µA through SWD (which is why in the title picture the multimeter measurement is made on an old version of the board, without SWD), and on the STM32 Nucleo there have been cases of parasitic consumption through SWD of about 200 µA. Debug boards used for measurement should be checked for such quirks, either by disconnecting their interface lines, if that is possible, or by comparing the results with the device's consumption measured without the debug hardware, for example with a multimeter in static mode.
Instead of a conclusion
I hope you already understand what mistake you made by choosing programming microcontrollers as your main specialty.