Hands-free, but not for the phone: an obedient home for when you run out of hands


Hello Community!

I wanted to dig deeper into microcontrollers and build something useful. The goal took shape almost immediately, because one thing in my apartment kept annoying me.

As you know, a computer desk doubles as a dinner table, so that you can watch Drobyshevsky or read Geektimes / Green Cat / etc. while eating. But there is a problem: I usually leave the kitchen with both hands full, and come back the same way, ~~because the cups pile up three at a time.~~ So the kitchen light (a triple switch: kitchen / bath / toilet) has to be switched on and off with a shoulder, nose, or little finger. That is awkward however you do it, and the switch cannot be moved lower. Hence the task: control the light remotely somehow.

All kinds of presence and pass-through sensors were ruled out immediately: they are not accurate enough, and they give the owner no deliberate control. The solution was sound control, by voice. I will say right away that I did not plan to do speech recognition; it is not needed here. A light that turns on at a clap was described in Radio magazine back in the 80s, but I did not want that either. The result is a kind of hands-free for when your hands are full. Details follow.

Hardware part

I had a SEM0007M-32A board with an ATmega32, a crystal and some peripherals, plus a scattering of electronic parts.

I also had a microphone and an operational amplifier. For the output there is a transistor in a sot32 package on the board, plus a 7 A relay. Everything is crammed into a business card box; the relay is wired in parallel with the switch, and the microphone is hidden under the outlet. The circuit is trivial, so I did not even draw it. Only one analog input and one digital output of the MCU are used. The SEM board is overkill, but it will do for now.

The board and loose wires. I later redid it more neatly.

The switch itself; the microphone is out of sight under the dismantled outlet.

Searching for an algorithm

Goal: the sensor must respond to a word, for example "light!", with a minimum of code.
Task: pick out the command word against a background of possible noise, knocks, and clicks from the switch itself. Simple amplitude analysis is therefore not enough, while spectral analysis showed that the word contains too many harmonics, and they vary from one utterance to the next. So I needed a solution that is simple but still reasonably immune to noise. One could build several time-frequency filters and compare against a sample word, but there is no need for real recognition. I decided to detect only the presence of a vowel sound, for example "A" or "E".

Sound "E". There are many harmonics, which makes the analysis difficult.

Sound "A". The spectrum looks cleaner, there is a main frequency.

Software part

To find the spectral components of a signal you can use digital filters. There is a good program on the Internet for designing digital FIR and IIR filters and calculating their coefficients; it is very visual, and the C code is generated automatically.

But I decided against digital filters

For acceptable filtering (4th order or higher) there were a lot of coefficients, and floating-point ones at that. Something like this, plus all the filter arithmetic in float:

float ACoef[NCoef+1] = {
    0.00000347268864059354,
    0.00000000000000000000,
    -0.00001389075456237415,
    0.00000000000000000000,
    0.00002083613184356122,
    0.00000000000000000000,
    -0.00001389075456237415,
    0.00000000000000000000,
    0.00000347268864059354
};
float BCoef[NCoef+1] = {
    1.00000000000000000000,
    -7.09708794063733790000,
    22.77454294680684300000,
    -43.03321836036351300000,
    52.29813665034108500000,
    -41.84199842886619100000,
    21.53121088556695300000,
    -6.52398963450972500000,
    0.89383378261285684000
};

The microcontroller could probably have handled it, but debugging would be a problem: the filter's band edges are not easy to move, since every change means a new set of coefficients.
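For a sense of the cost, here is a minimal sketch of what one step of a direct-form IIR filter with such coefficient arrays looks like. It is not the code produced by the filter design program: the names iir_state, iir_step and iir_settle, and the order-1 demo size, are assumptions made for illustration. Every input sample costs a float multiply-add per coefficient, and on an 8-bit AVR without a floating-point unit each of those is done in software.

```c
#include <stddef.h>

#define NCOEF 1  /* filter order for this demo (assumption); the arrays above are order 8 */

/* Delay lines for the most recent input (x) and output (y) samples. */
typedef struct {
    float x[NCOEF + 1];
    float y[NCOEF + 1];
} iir_state;

/* One direct-form I step:
   y[0] = sum(a[k] * x[k], k = 0..N) - sum(b[k] * y[k], k = 1..N), with b[0] == 1.
   Here a[] plays the role of ACoef and b[] the role of BCoef. */
static float iir_step(iir_state *s, const float *a, const float *b, float in)
{
    int k;
    /* Shift both delay lines by one sample. */
    for (k = NCOEF; k > 0; k--) {
        s->x[k] = s->x[k - 1];
        s->y[k] = s->y[k - 1];
    }
    s->x[0] = in;

    float acc = a[0] * s->x[0];
    for (k = 1; k <= NCOEF; k++)
        acc += a[k] * s->x[k] - b[k] * s->y[k];

    s->y[0] = acc;
    return acc;
}

/* Feeds a constant input n times and returns the last output sample. */
static float iir_settle(const float *a, const float *b, float in, int n)
{
    iir_state s = {{0}, {0}};
    float out = 0.0f;
    for (int i = 0; i < n; i++)
        out = iir_step(&s, a, b, in);
    return out;
}
```

With a hypothetical first-order low-pass, a = {0.25, 0} and b = {1, -0.75}, a constant input of 1.0 settles to an output close to 1.0. Moving the pass band means recomputing every entry of both arrays, which is the tuning problem described above.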

After some searching, I settled on a single-frequency Fourier transform computed on the fly. That is, the classical discrete Fourier transform, updated as each signal sample arrives at the sampling frequency (1600 Hz). There is no sweep over frequencies: there is only one frequency, so it is easy to change over RS-232 during tuning. In the end the analysis was done at 128 Hz.

Because the sample blocks are short and the window is rectangular, the frequency resolution is low, which gives selective sensitivity in the range 114 ... 140 Hz. That is exactly the band-pass (rectangular, Π-shaped) filter I wanted.
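As a rough check of that range, assume the block length of 135 samples that is used later in the article. For a rectangular window, the distance from the centre of the main lobe to its first nulls equals the bin width:

```latex
\Delta f = \frac{f_s}{N} = \frac{1600\ \text{Hz}}{135} \approx 11.9\ \text{Hz},
\qquad
f \pm \Delta f = 128 \pm 11.9\ \text{Hz} \approx 116 \ldots 140\ \text{Hz}
```

This back-of-the-envelope estimate lands close to the 114 ... 140 Hz quoted above; the exact edges depend on where the threshold cuts the lobe.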

First you have to work out where the ~~screaming~~ voice command signal starts. To do this, the zero level of the signal is calculated first, using exponential smoothing with a smoothing constant of 1/64. The code is below.

Part of the timer code for signal processing. The timer frequency is 1600 Hz.

The signal is normalized to its mean. To estimate the sound intensity, the absolute values of the signal are also averaged, with a constant of 1/16, which filters out the high-frequency ripple of individual half-waves (this is an analog of RMS, but cheaper to compute). The moment this level exceeds the threshold is taken as the start of the voice command, and sequential analysis of 5 blocks of 135 samples (84.3 ms) each begins.

// Timer 0 output compare interrupt service routine
interrupt [TIM0_COMP] void timer0_comp_isr(void)
{
    a = adc_data[0] << 2;          // read the ADC and multiply by 4 for better precision
    a0 = (a0*63 + a + 63) >> 6;    // exponential smoothing for the "zero" level; decays to 10% in 150 samples
    ae = (int)(a - a0);
    a = ae;                        // from here on, a is the value normalized to the mean
    ae = abs(ae);
    if (ae < 32) {                 // zero out the error of integer averaging
      ae = 0;
    };
    d = (int)((15 * (long int) d + ae + 15) >> 4);  // exponentially averaged level; decays to 10% in 35 samples
    if (d > 100) {                 // the signal level exceeds the threshold
       if (snd == 0) { Yz=0; snd++; }  // the first block starts
       PORTB.1 = 1;                // light the LED that indicates the threshold is exceeded
    };
.....
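The fragment above depends on the compiler's interrupt syntax and on globals that are not shown, so here is a portable sketch of the same onset detector that can be tried on a PC. The struct onset_state and the function onset_update are hypothetical names; the constants 1/64, 1/16, 32 and 100 are taken from the fragment, and 32-bit integers are used throughout for simplicity.

```c
#include <stdint.h>

#define LEVEL_THRESHOLD 100  /* level threshold, as in the fragment above */
#define DEAD_ZONE        32  /* smaller deviations are zeroed, as in the fragment above */

typedef struct {
    int32_t zero;   /* slow average of the input: the "zero" level (a0) */
    int32_t level;  /* fast average of |signal - zero|: the sound level (d) */
} onset_state;

/* Feeds one raw ADC sample (0..1023).
   Returns 1 while the averaged level is above the threshold, 0 otherwise. */
static int onset_update(onset_state *s, int adc_sample)
{
    int32_t a = (int32_t)adc_sample * 4;        /* scale up for precision */

    /* Exponential smoothing with constant 1/64 tracks the DC offset. */
    s->zero = (s->zero * 63 + a + 63) >> 6;

    /* Deviation from the zero level, rectified. */
    int32_t dev = a - s->zero;
    if (dev < 0)
        dev = -dev;
    if (dev < DEAD_ZONE)
        dev = 0;                                /* hide integer averaging error */

    /* Exponential smoothing with constant 1/16 gives an RMS-like level. */
    s->level = (15 * s->level + dev + 15) >> 4;

    return s->level > LEVEL_THRESHOLD;
}
```

Fed a constant mid-scale value, the detector goes quiet once the zero level has settled; a square wave of a couple of hundred ADC counts around the same value pushes the level over the threshold within a few samples.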

The figure below shows the signal, signal level, threshold and 5 blocks.

Interference protection

The signal is divided into blocks to protect against impulses such as a click or a knock. An impulse, as is well known, has a flat frequency response, so it produces a non-zero result in any frequency band, probably above the threshold. But ~~the hair is long, yet the~~ impulse ~~mind~~ is short. In the next block the impulse is already gone, so the level in the frequency band falls below the threshold. The price of this advantage is that short blocks give low frequency resolution, so signals at somewhat different frequencies still fall into the selected spectral line.

Frequency conversion

In each block, a single-frequency Fourier transform is performed — a transform for one frequency f.

As usual, to speed up the calculations, the sin and cos functions are tabulated and scaled to -127 ... +127.
The index ps into the array si is calculated from the sine argument (2 * π * f * t / T), wrapping around within one period, of course. The index pc for cos (2 * π * f * t / T) is simply ps shifted 12 positions forward in the same array si.
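The table lookup described above can be sketched as follows. The article does not give the table length or the way ps is advanced, so both are assumptions here: a 48-entry table (which makes a quarter period equal to the 12-position shift mentioned above) and a fixed-point phase accumulator. The names build_sine_table, osc_state, osc_init and osc_next are hypothetical; si, ps and pc follow the article.

```c
#include <stdint.h>

#define TABLE_LEN 48               /* assumed: one full period in the table */
#define QUARTER   (TABLE_LEN / 4)  /* 12 entries, the cos shift from the article */

/* First quarter of round(127 * sin(2*pi*k/48)) for k = 0..12. */
static const int8_t quarter_sin[QUARTER + 1] = {
    0, 17, 33, 49, 64, 77, 90, 101, 110, 117, 123, 126, 127
};

static int8_t si[TABLE_LEN];       /* full-period sine table */

/* Fills si[] from the quarter table using the symmetry of the sine. */
static void build_sine_table(void)
{
    for (int k = 0; k < TABLE_LEN; k++) {
        int m = k % (TABLE_LEN / 2);   /* position inside a half period */
        int v = quarter_sin[m <= QUARTER ? m : TABLE_LEN / 2 - m];
        si[k] = (int8_t)(k < TABLE_LEN / 2 ? v : -v);
    }
}

/* Phase accumulator in units of 1/256 of a table entry (assumption). */
typedef struct {
    uint16_t phase;
    uint16_t step;
} osc_state;

/* step = f / fs * TABLE_LEN, expressed in 1/256 of a table entry. */
static void osc_init(osc_state *o, uint32_t f_hz, uint32_t fs_hz)
{
    o->phase = 0;
    o->step = (uint16_t)((f_hz * TABLE_LEN * 256u + fs_hz / 2) / fs_hz);
}

/* Returns the current sin and cos table values, then advances one sample. */
static void osc_next(osc_state *o, int8_t *s, int8_t *c)
{
    unsigned ps = o->phase >> 8;               /* index for sin */
    unsigned pc = (ps + QUARTER) % TABLE_LEN;  /* cos: a quarter period ahead */
    *s = si[ps];
    *c = si[pc];
    o->phase = (uint16_t)((o->phase + o->step) % (TABLE_LEN * 256u));
}
```

With this layout, changing the analysed frequency only means changing step, which fits the article's point that a single frequency is easy to adjust over RS-232.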

The result Y, the level of the spectral line, is obtained as the sum of the absolute values of the real and imaginary parts accumulated over one block.

Strictly speaking

The proper way would be a sum of squares followed by a square root, but that is a nightmare for an 8-bit MCU.

In the same timer:

    a_si = ((long int) a * si[ps]) >> 4;  // a*sin; cast first so the product is computed in 32 bits
    a_co = ((long int) a * si[pc]) >> 4;  // a*cos
    Ysi = Ysi + a_si;    // running sums over the block
    Yco = Yco + a_co;
    Y = (labs(Ysi) + labs(Yco)) >> 7;    // divide by 128 so the result fits in an int and can be sent over RS-232

At the end of each block, Y is compared with a threshold, and the number of blocks that exceed it (the triggered blocks) is counted. Experiments showed that the minimum number of triggered blocks should be 3 out of 5.

An example of the spectral intensity per block during a voice command. The command was accepted.

Three or more triggered blocks are interpreted as a correctly received command. The signal on the MCU's digital output is then toggled, switching the light on or off. Since all of the analysis happens inside the blocks, there is no delay after the last block.
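The block voting can be sketched in a few lines. This is an illustration under assumptions, not the author's code: vote_state and vote_block are hypothetical names, and the threshold on Y is passed in as a parameter because the article does not give its value.

```c
#include <stdint.h>

#define BLOCKS_PER_COMMAND 5   /* from the article */
#define MIN_TRIGGERED      3   /* from the article: at least 3 of 5 */

typedef struct {
    uint8_t blocks_seen;       /* blocks processed in the current series */
    uint8_t blocks_triggered;  /* blocks whose level exceeded the threshold */
    uint8_t light_on;          /* state of the digital output */
} vote_state;

/* Call once at the end of each block with that block's spectral level y.
   Returns 1 when a series of blocks has just been accepted as a command. */
static int vote_block(vote_state *v, int y, int y_threshold)
{
    if (y > y_threshold)
        v->blocks_triggered++;

    if (++v->blocks_seen < BLOCKS_PER_COMMAND)
        return 0;              /* series not finished yet */

    int accepted = v->blocks_triggered >= MIN_TRIGGERED;
    if (accepted)
        v->light_on = !v->light_on;  /* toggle the output */

    /* Get ready for the next command. */
    v->blocks_seen = 0;
    v->blocks_triggered = 0;
    return accepted;
}
```

A click that fills only the first block with energy, for example levels 900, 10, 5, 0, 0 against a threshold of 200, yields one triggered block and is rejected, while a drawn-out vowel with levels such as 300, 500, 450, 150, 90 yields three and toggles the light.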

The computation takes about 1600 clock cycles and the timer fires every 9000 clock cycles, so the MCU load is low and there is room for further experiments with recognition. Alternatively, the whole thing could be made smaller and built on a weaker MCU.

The algorithms were verified by exchanging the relevant variables (a log) over RS-232 with a program written in VBasic. The frequency f and the thresholds are stored in the EEPROM.

The result: the sensor turned out to be very convenient. It responds to words with an "A" in them, for example "Waaau", "Taaam", "Laait", "Yaaaau", "Yao-Yao", spoken at normal conversational volume. It stubbornly refuses to respond to the word "Shine". It ignores clicks, slamming doors, footsteps, and running water. Now I can walk around with my hands full of cups and plates)).
