Voice control: digital signal processing on an 8-bit AVR in hardcore assembler
So, it's time to tell you about something more interesting than simple AVR gadgets. In this article I will show how to build an AVR-based device that performs fairly serious audio signal processing in order to recognize voice commands.
First of all, I would like to show the result that I have come to.
The device is built in hardware and fully operational. The recognition rate is, of course, not very high, but for a device like this it is already a real achievement: remember that its core is an 8-bit ATmega88 microcontroller running at 20 MHz, with no DSP instructions at all. The device recognizes two commands (the number can be extended to about a dozen, resources permitting): one switches the power load on, the other switches it off. In addition, the load can be switched on and off from any IR remote control. The device can switch up to 250 V / 8 A.
For the most part, I designed it out of academic interest to test whether it is possible to implement such DSP algorithms on a cheap and low-power general purpose microcontroller. The result was quite satisfactory, and the device works at my place 24/7.
As for whether it makes sense to use low-power microcontrollers for this, I'll be brief: in general, better not to :)
For a task like this, a low-end chip from the dsPIC line is a much better fit: it has 16-bit DSP instructions, now costs about as much as an AVR, and comes in packages with just as few pins. Something from TI's MSP430F2xxx line would also do.
But if, like me, you are curious how much can be squeezed out of an AVR, this article is for you.
Circuitry
Let's start with the circuit design and work out what we need from it.
We need the following:
1) Power from the 220 V mains. In principle the device could even run on batteries, but with 220 V at hand it is, IMHO, more logical to take power from there.
2) Switching a 220 V / 5 A load under 5 V control. I picked the current offhand: 5 A is enough for a small kettle drawing about a kilowatt, or for about ten 100 W incandescent bulbs :)
3) An additional control is desirable, in case the voice recognition lets you down or you don't want to make noise.
4) Audio capture with suitable parameters. More on the parameters a little later.
The first point is trivial: any power supply you like will do, because the circuit draws very little. But since the device is on 24/7, I chose a simple and reliable transformer power supply: a TPG-0.7 transformer that steps 220 V down to 12 V, a diode bridge, a smoothing capacitor and two linear voltage regulators giving me stable 5 V and 9 V.
The 5 V rail, of course, feeds the digital part. I needed 9 V for the analog part because the maximum output voltage of the LM324 op-amp is the supply voltage minus 1.5 V. It is easy to see that from a 5 V supply the output tops out at 3.5 V, which did not suit me.
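The headroom argument can be checked in a couple of lines. This is only a sketch: the 1.5 V figure is the one quoted above, and the helper name is made up for illustration.

```python
# Sketch: highest output level of an LM324 stage for a given supply voltage.
# HEADROOM_V is the 1.5 V figure quoted in the text; the helper name is hypothetical.
HEADROOM_V = 1.5

def max_output_voltage(supply_v: float) -> float:
    """Highest voltage the op-amp output can reach, in volts."""
    return supply_v - HEADROOM_V

for supply in (5.0, 9.0):
    print(f"{supply:.0f} V supply -> up to {max_output_voltage(supply):.1f} V at the output")
```

With a 9 V supply the output can swing up to 7.5 V, which leaves far more room for the amplified signal than the 3.5 V available from a 5 V supply.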
On to point two. To switch the load I chose a reliable, proven solid-state relay, the S202S02, which can switch up to 250 V / 8 A.
It has no mechanical parts, and the switching circuit is extremely simple. The relay has 4 pins: two go to the load and are normally open, the other two are the control input. When a logic 1 is applied to the control input, the relay closes and conducts current.
Point 3 is also simple. The TSOP1736 integrated IR receiver comes to the rescue: a small three-legged wonder with two pins going to ground and 5 V respectively, and a third that outputs logic 1 when there is no input signal and logic 0 when one is detected. The input signal is IR radiation modulated with a 36 kHz carrier, which is close to the carrier used by most IR remotes. Thanks to the modulation, the TSOP is well protected against stray IR noise and steady light such as sunlight.
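The text does not say how the firmware reacts to the receiver, so the following is only a guess at one simple approach: treat any transition of the TSOP output from 1 to 0 as "some button on some remote was pressed", toggle the load, and then ignore the input for a while so that one press does not toggle the load several times. The sample trace, the lockout length and all names are assumptions.

```python
# Sketch (assumed logic, not the author's firmware): toggle a load whenever the
# TSOP1736 output goes from idle (1) to active (0), then ignore edges for a while.

def simulate_ir_toggle(samples, lockout_samples=3):
    """Return the load state after each sample of the TSOP output (1 = idle, 0 = IR detected)."""
    load_on = False
    previous = 1          # the receiver idles at logic 1
    lockout = 0           # number of samples during which edges are still ignored
    states = []
    for level in samples:
        if lockout > 0:
            lockout -= 1
        elif previous == 1 and level == 0:
            load_on = not load_on       # falling edge: a remote button was pressed
            lockout = lockout_samples
        previous = level
        states.append(load_on)
    return states

# Two bursts separated by a pause: the load turns on, then off again.
trace = [1, 0, 1, 0, 1, 1, 1, 1, 0, 1]
print(simulate_ir_toggle(trace))
```

The second falling edge inside the first burst is swallowed by the lockout, so a single key press, which produces many pulses, results in a single toggle.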
Now for the most interesting part, audio capture. Here is the circuit I designed:
As I said, the analog part runs from 9 V. The circuit is based on Texas Instruments application notes on single-supply op-amp circuits. For the op-amp I chose the LM324, a dirt-cheap quad op-amp. It is sold everywhere for no more than 10 rubles, so the entire analog part is built on a single chip.
The electret microphone is powered through resistor R4, and its signal passes through a coupling capacitor to the input of the preamplifier and then to the amplifier (the upper "floor" of the circuit). Both amplifiers are wired as inverting single-supply stages, so half the supply voltage from a divider is applied to their non-inverting inputs.
After the first stage we get an inverted signal, amplified 25 times and offset by 4.5 V. The second stage needs no capacitor in front of it, because its "ground" is the same 4.5 V by which the signal is already offset; it inverts the signal again and amplifies it another 80 times.
The total gain of the two stages is 2000, i.e. a 2 mV bipolar signal from the microphone comes out as a 4 V signal offset by half the supply voltage. Exactly what is needed.
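The gain arithmetic can be written out as a small sketch. The resistor values below are not taken from the schematic; they are picked only so that the two inverting stages give the 25x and 80x quoted above.

```python
# Sketch: gain of two cascaded inverting stages and the resulting signal swing.
# Resistor values are NOT from the schematic; they are chosen only to give 25x and 80x.

def inverting_gain(r_feedback: float, r_input: float) -> float:
    """Voltage gain of an inverting op-amp stage (negative sign = inversion)."""
    return -r_feedback / r_input

stage1 = inverting_gain(25_000, 1_000)    # -25
stage2 = inverting_gain(80_000, 1_000)    # -80
total = stage1 * stage2                   # the two inversions cancel out

mic_swing_v = 0.002                       # 2 mV from the microphone
out_swing_v = mic_swing_v * total         # about 4 V
bias_v = 9.0 / 2                          # virtual ground from the divider

print(f"total gain: {total:.0f}")
print(f"output: {out_swing_v:.1f} V swing centred on {bias_v:.1f} V")
```

Because each stage inverts, the product of the two gains is positive, so the output has the same polarity as the microphone signal.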
I chose the gain for my particular microphone. Of course, if yours outputs 20 mV rather than 2 mV, the gain should be reduced. Alternatively, you can fit trimmer resistors and adjust the gain as needed.
The second "floor" of the circuit consists of two second-order anti-aliasing filters in the Sallen-Key topology. Since most of the speech signal lies at low frequencies, I chose a sampling rate of 5 kHz, which gives a maximum signal frequency of 2500 Hz. The filters are tuned to about 2 kHz, which, combined with the overall 4th order, provides good anti-aliasing filtering.
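To get a feel for these numbers, here is a rough sketch of the standard cutoff formula for a unity-gain Sallen-Key low-pass stage, together with the attenuation of an ideal 4th-order Butterworth response. The component values are hypothetical (chosen to land near 2 kHz), and the Butterworth shape is an assumption about how the two stages are tuned.

```python
import math

# Sketch: cutoff of a second-order Sallen-Key low-pass stage and the attenuation
# of an ideal Butterworth low-pass. Component values are hypothetical, and the
# Butterworth response is an assumption, not something stated in the article.

def sallen_key_cutoff_hz(r1, r2, c1, c2):
    """Cutoff frequency of a Sallen-Key low-pass: 1 / (2*pi*sqrt(R1*R2*C1*C2))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(r1 * r2 * c1 * c2))

def butterworth_attenuation_db(f_hz, cutoff_hz, order):
    """Attenuation in dB of an ideal Butterworth low-pass of the given order."""
    return 10.0 * math.log10(1.0 + (f_hz / cutoff_hz) ** (2 * order))

SAMPLE_RATE_HZ = 5000
nyquist_hz = SAMPLE_RATE_HZ / 2

fc = sallen_key_cutoff_hz(10e3, 10e3, 8.2e-9, 8.2e-9)
print(f"cutoff: {fc:.0f} Hz, Nyquist: {nyquist_hz:.0f} Hz")
for f in (2500, 3000, 5000):
    print(f"{f} Hz: {butterworth_attenuation_db(f, 2000, 4):.1f} dB down")
```

Under these assumptions a component at 3 kHz, which would alias down to 2 kHz, is attenuated by roughly 14 dB, and the attenuation grows quickly above that.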
As the last step, capacitor C10 removes the 4.5 V DC offset coming from the amplifiers, and a new 2.5 V DC offset is added to suit the controller's ADC, which is of course powered from 5 V and expects a signal in the 0 to 5 V range.
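As an illustration of why 2.5 V is a convenient offset, here is a sketch of what an ideal 10-bit ADC (the resolution of the ATmega88's converter) would report. The 5 V reference, the reading of the 4 V figure as a peak-to-peak swing, and the helper name are assumptions.

```python
# Sketch: codes an ideal 10-bit ADC would produce for the re-biased signal.
# Assumptions: 5 V reference, 4 V treated as peak-to-peak swing, hypothetical helper name.
ADC_BITS = 10
V_REF = 5.0

def adc_code(voltage: float) -> int:
    """Ideal conversion result, clamped to the valid code range."""
    code = int(voltage * (1 << ADC_BITS) / V_REF)
    return max(0, min((1 << ADC_BITS) - 1, code))

bias_v = 2.5            # new DC level added after C10
swing_v = 4.0           # amplified signal swing from the gain stages
low, high = bias_v - swing_v / 2, bias_v + swing_v / 2

print(f"silence -> code {adc_code(bias_v)}")
print(f"signal spans {low:.1f}..{high:.1f} V -> codes {adc_code(low)}..{adc_code(high)}")
```

Silence sits at mid-scale, so the firmware can recover a signed sample simply by subtracting 512 from the raw reading.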
The last part of the circuit is the controller with its supporting components:
Also shown here are the TSOP1736, the power relay, a pair of control buttons (which I never ended up using in the project), an indicator LED and the programming header.
The complete schematic looks like this:
Schematic Development Results
The developed platform turned out to be very convenient for various DSP experiments.
I had no complaints about the analog and digital parts during the whole period of testing and use. With the power supply, though, I made a small mistake: when picking a 12 V transformer, I did not take into account that it delivers 12 V at its rated load (about 100 mA). Since the circuit draws much less, the transformer gives not 12 V but about 15 V, so the linear regulators run warm, especially the 5 V one, which has to drop a full ten volts.
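The effect on the regulator is easy to estimate, since a linear regulator turns the whole voltage drop into heat. A minimal sketch, where the 30 mA load current is a made-up illustrative value rather than a measurement:

```python
# Sketch: heat dissipated by a linear regulator, P = (V_in - V_out) * I_load.
# The 30 mA load current is a made-up illustrative value, not a measurement.

def regulator_dissipation_w(v_in: float, v_out: float, i_load_a: float) -> float:
    """Power turned into heat by a linear regulator, in watts."""
    return (v_in - v_out) * i_load_a

I_LOAD_A = 0.03
for v_in in (12.0, 15.0):
    p = regulator_dissipation_w(v_in, 5.0, I_LOAD_A)
    print(f"{v_in:.0f} V in, 5 V out: {p * 1000:.0f} mW")
```

Whatever the real current is, going from a 7 V drop to a 10 V drop raises the dissipation by over 40 percent.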
Otherwise the circuit turned out very well, and I often reuse a similar analog part in my projects. Since the controller is clocked at 20 MHz and the sampling rate is 5 kHz, it has 4000 clock cycles per sample for digital signal processing.
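The cycle budget follows directly from the two frequencies. The sketch below also derives a timer compare value that would produce a 5 kHz tick; the text does not say how sampling is actually triggered, so the use of a clear-on-compare timer and the prescaler of 8 are assumptions.

```python
# Sketch: per-sample cycle budget, plus a timer compare value for a 5 kHz tick.
# The clear-on-compare timer and the prescaler of 8 are assumptions for illustration.
F_CPU_HZ = 20_000_000
SAMPLE_RATE_HZ = 5_000

def cycles_per_sample(f_cpu_hz: int, sample_rate_hz: int) -> int:
    """CPU clock cycles available between two consecutive samples."""
    return f_cpu_hz // sample_rate_hz

def compare_value(f_cpu_hz: int, prescaler: int, tick_hz: int) -> int:
    """Compare value for a timer that resets on compare match:
    tick = f_cpu / (prescaler * (1 + value))."""
    return f_cpu_hz // (prescaler * tick_hz) - 1

print(cycles_per_sample(F_CPU_HZ, SAMPLE_RATE_HZ))
print(compare_value(F_CPU_HZ, 8, SAMPLE_RATE_HZ))
```

Those 4000 cycles have to cover everything done per sample, including interrupt entry and exit and reading the ADC, which is why the budget is tight.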
That's all for now. In the next part of the article I will describe the recognition algorithm I implemented. To save processing time it was written in pure assembler, so get ready :)