Kalman Filter



On the Internet, including on Habr, you can find a lot of information about the Kalman filter. But it is hard to find an easily digestible derivation of the formulas themselves. Without a derivation, all this science is perceived as a kind of shamanism, the formulas look like a faceless set of symbols, and, most importantly, many simple statements lying on the surface of the theory remain beyond understanding. The purpose of this article is to explain this filter in the most accessible language possible.
The Kalman filter is a powerful data filtering tool. Its main principle is that filtering uses information about the physics of the phenomenon itself. For example, if you are filtering data from a car's speedometer, the inertia of the car gives you the right to treat overly fast jumps in speed as measurement errors. The Kalman filter is interesting in that, in a certain sense, it is the best filter. We will discuss below what exactly the word "best" means. At the end of the article I will show that in many cases the formulas can be simplified to such an extent that almost nothing remains of them.

A quick primer


Before getting acquainted with the Kalman filter, I suggest recalling some simple definitions and facts from probability theory.

Random variables


When we say that a random variable $\xi$ is given, we mean that this quantity can take random values. It takes different values with different probabilities. When you throw, say, a die, a discrete set of values falls out: $1, 2, \dots, 6$. When it comes, for example, to the speed of a wandering particle, we obviously have to deal with a continuous set of values. The values that the random variable "drops out" will be denoted by $x$, but sometimes we will use the same letter as for the random variable itself: $\xi$.
In the case of a continuous set of values, the random variable is characterized by its probability density $\rho(x)$, which dictates that the probability that the random variable "drops out" in a small neighborhood $dx$ of the point $x$ is equal to $\rho(x)\,dx$. As the picture shows, this probability equals the area of the hatched rectangle under the graph:
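In symbols, the statement above can be written like this (the normalization condition on the right is the standard one, not spelled out in the prose):

$$P\big(x < \xi < x + dx\big) = \rho(x)\,dx, \qquad \int_{-\infty}^{+\infty} \rho(x)\,dx = 1.$$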


Quite often in life, random variables are distributed according to Gauss, with the probability density equal to

$$\rho(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$

We see that the function $\rho(x)$ has the shape of a bell centered at the point $\mu$ with a characteristic width of order $\sigma$.
Since we have mentioned the Gaussian distribution, it would be a sin not to mention where it comes from. Just as the numbers $e$ and $\pi$ are firmly established in mathematics and pop up in the most unexpected places, the Gaussian distribution has deep roots in probability theory. One remarkable statement that partly explains its omnipresence is the following:
Let there be a random variable $\xi$ with an arbitrary distribution (in fact, there are some restrictions on this arbitrariness, but they are not at all rigid). Let us conduct $n$ experiments and compute the sum $\xi_1 + \dots + \xi_n$ of the "dropped out" values. We will perform many such series of experiments. It is clear that every time we will receive a different value of the sum. In other words, this sum is itself a random variable with its own specific distribution law. It turns out that for sufficiently large $n$ the distribution law of this sum tends to the Gaussian distribution (by the way, the characteristic width of the "bell" grows as $\sqrt{n}$). Read more on Wikipedia: the central limit theorem. In life there are very often quantities that are made up of a large number of identically distributed independent random variables, and which are therefore distributed according to Gauss.
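This is easy to see numerically. Here is a quick MATLAB sketch (an illustration only; the uniform distribution and the sample sizes are arbitrary choices) that sums independent uniform random values and compares the histogram of the sums with a Gaussian curve of the same mean and width:

n = 50;                  % how many values are summed in one experiment
M = 10000;               % number of experiments
s = sum(rand(n, M));     % each column of rand(n,M) is one experiment; s holds M sums
histogram(s, 50, 'Normalization', 'pdf'); hold on;
mu = mean(s); sigma = std(s);        % empirical center and width (the width grows as sqrt(n))
t = linspace(min(s), max(s), 200);
plot(t, exp(-(t - mu).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi)), 'r');   % Gaussian density
hold off;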

Mean


The mean value of a random variable is what we get in the limit if we conduct very many experiments and compute the arithmetic mean of the values that fell out. The mean value is denoted in different ways: mathematicians like to write $M\xi$ (mathematical expectation), foreign mathematicians write $E\xi$ (expectation), and physicists write $\langle \xi \rangle$ or $\bar{\xi}$. We will denote it in the foreign way: $E\xi$.
For example, for the Gaussian distribution written above, the mean value is $E\xi = \mu$.

Variance


In the case of the Gaussian distribution, we clearly see that the random variable prefers to fall out in some neighborhood of its mean value $\mu$.

Once again admire the Gaussian distribution


As can be seen from the graph, the characteristic scatter of values is of order $\sigma$. How do we estimate this scatter of values for an arbitrary random variable if we know its distribution? We could draw the graph of its probability density and estimate the characteristic width by eye. But we prefer to go the algebraic way. One could compute the mean length (absolute value) of the deviation from the mean, $E|\xi - E\xi|$. This value would be a good estimate of the characteristic scatter of values. But you and I know very well that using absolute values in formulas is a headache, so this formula is rarely used to estimate the characteristic scatter.
A simpler way (simpler in the sense of calculations) is to find $E(\xi - E\xi)^2$. This value is called the variance (dispersion) and is often denoted $\sigma_\xi^2$. The square root of the variance is a good estimate of the scatter of the random variable; it is called the standard deviation $\sigma_\xi$.
For example, for the Gaussian distribution one can calculate that the variance defined above is exactly equal to $\sigma^2$, which means that the standard deviation equals $\sigma$. This agrees very well with our geometric intuition.
In fact, a little fraud is hidden here. The point is that in the definition of the Gaussian distribution the exponent contains the expression $-\frac{(x-\mu)^2}{2\sigma^2}$. The two in the denominator stands there precisely so that the standard deviation equals the coefficient $\sigma$. That is, the Gaussian distribution formula itself is written in a form specially tailored to the fact that we will be computing its standard deviation.
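A quick numerical check of these statements (a sketch only; it uses the same normrnd generator as the script later in the article, and the parameter values are arbitrary):

mu = 3; sigma = 2;
xi = normrnd(mu, sigma, 1, 100000);       % many samples of a Gaussian random variable
meanEstimate = mean(xi)                   % should be close to mu
stdEstimate  = std(xi)                    % should be close to sigma
absDeviation = mean(abs(xi - mean(xi)))   % the "absolute value" estimate, about 0.8*sigma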

Independent Random Variables


Random variables can be dependent or independent. Imagine that you throw a needle onto a plane and record the coordinates of both of its ends. These two coordinates are random variables, but they are dependent: they are connected by the condition that the distance between them always equals the length of the needle.
Random variables are independent if the outcome of the first does not depend in any way on the outcome of the second. If the random variables $\xi$ and $\eta$ are independent, then the mean value of their product equals the product of their mean values:

$$E(\xi\eta) = E\xi \cdot E\eta$$


Proof
For example, having blue eyes and graduating from school with a gold medal are independent random variables. If a certain fraction of people are blue-eyed and a certain fraction are gold medalists, then the fraction of blue-eyed medalists is the product of these two fractions. This example tells us that if the random variables $\xi$ and $\eta$ are given by their probability densities $\rho_\xi(x)$ and $\rho_\eta(y)$, then the independence of these values is expressed in the fact that the joint probability density (the first value drops out $x$ and the second $y$) is given by:

$$\rho(x, y) = \rho_\xi(x)\,\rho_\eta(y)$$

It immediately follows from this that:

$$E(\xi\eta) = \int\!\!\int x\,y\,\rho(x,y)\,dx\,dy = \int x\,\rho_\xi(x)\,dx \int y\,\rho_\eta(y)\,dy = E\xi \cdot E\eta$$
As you can see, the proof is carried out for random variables that have a continuous spectrum of values ​​and are given by their probability density. In other cases, the idea of ​​the proof is similar.
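A quick Monte Carlo sanity check of this identity (a sketch only; the two independent variables below are arbitrary choices):

xi  = normrnd(2, 1, 1, 100000);   % Gaussian values
eta = rand(1, 100000);            % uniform values, generated independently of xi
lhs = mean(xi .* eta)             % estimate of E(xi*eta)
rhs = mean(xi) * mean(eta)        % estimate of E(xi)*E(eta); should be close to lhs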

Kalman Filter


Formulation of the problem


Let us denote by $x_k$ the value that we will measure and then filter. It can be a coordinate, speed, acceleration, humidity, degree of stench, temperature, pressure, etc.
We begin with a simple example, which will lead us to the formulation of the general problem. Imagine that we have a radio-controlled car that can only go back and forth. Knowing the weight of the car, its shape, the road surface, etc., we have calculated how the controlling joystick affects the speed $v_k$.



Then the coordinate of the car will change according to the law:

$$x_{k+1} = x_k + v_k\,dt$$

In real life, we cannot take into account in our calculations the small disturbances acting on the car (wind, bumps, pebbles on the road), so the real speed of the car will differ from the calculated one. A random variable $\xi_k$ gets added to the right-hand side of the written equation:

$$x_{k+1} = x_k + v_k\,dt + \xi_k$$

We have a GPS sensor mounted on the car that tries to measure the true coordinate $x_k$ and, of course, cannot measure it exactly, but measures it with an error $\eta_k$, which is also a random variable. As a result, we receive erroneous data from the sensor:

$$z_k = x_k + \eta_k$$

The problem is: knowing the incorrect sensor readings $z_k$, find a good approximation to the true coordinate of the car $x_k$. We will denote this good approximation by $x^{opt}_k$.
In the formulation of the general problem, the coordinate $x_k$ can stand for anything (temperature, humidity, ...), and the term responsible for controlling the system from outside will be denoted $u_k$ (in the example with the car, $u_k = v_k\,dt$). The equations for the coordinate and the sensor readings will look like this:

$$x_{k+1} = x_k + u_k + \xi_k, \qquad z_{k+1} = x_{k+1} + \eta_{k+1} \qquad (1)$$

Let's discuss in detail what we know:

  • $u_k$ is a known quantity that controls the evolution of the system. We know it from our physical model.
  • The model error $\xi_k$ and the sensor error $\eta_k$ are random variables, and their distribution laws do not depend on time (on the iteration number $k$).
  • The mean values of the errors are zero: $E\xi_k = E\eta_k = 0$.
  • The distribution laws of the random variables themselves may be unknown to us, but their variances $\sigma_\xi^2$ and $\sigma_\eta^2$ are known. Note that the variances do not depend on $k$, because the distribution laws do not.
  • It is assumed that all the random errors are independent of one another: the error at time $k$ does not depend in any way on the error at another time $k'$.

It is worth noting that the filtering task is not a smoothing task. We do not strive to smooth the data from the sensor; we strive to obtain the value closest to the real coordinate $x_k$.

Kalman Algorithm


We will reason by induction. Imagine that at the $k$-th step we have already found the filtered sensor value $x^{opt}_k$, which approximates the true coordinate of the system $x_k$ well. Do not forget that we know the equation that controls the change of the unknown coordinate:

$$x_{k+1} = x_k + u_k + \xi_k$$

Therefore, before receiving the value from the sensor, we may assume that at step $k+1$ the system evolves according to this law and the sensor will show something close to $x^{opt}_k + u_k$. Unfortunately, so far we cannot say anything more precise. On the other hand, at step $k+1$ we will have the inaccurate sensor reading $z_{k+1}$ in our hands.
Kalman's idea is that, in order to get the best approximation to the true coordinate $x_{k+1}$, we must choose a middle ground between the reading of the inaccurate sensor, $z_{k+1}$, and our prediction $x^{opt}_k + u_k$ of what we expected it to show. We give the sensor reading the weight $K$, and the predicted value keeps the weight $1-K$:

$$x^{opt}_{k+1} = K \cdot z_{k+1} + (1-K) \cdot (x^{opt}_k + u_k)$$

The coefficient $K$ is called the Kalman coefficient. It depends on the iteration step, so it would be more correct to write $K_{k+1}$, but for now, in order not to clutter the formulas, we will omit its index.
We must choose the Kalman coefficient so that the resulting optimal coordinate value $x^{opt}_{k+1}$ is as close as possible to the true coordinate $x_{k+1}$. For example, if we know that our sensor is very accurate, then we will trust its reading more and give the value $z_{k+1}$ more weight ($K$ close to one). If the sensor, on the contrary, is not at all accurate, then we will rely more on the theoretically predicted value $x^{opt}_k + u_k$.
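The two extreme cases make this intuition explicit (this is simply the update formula above with $K = 1$ and $K = 0$):

$$K = 1: \quad x^{opt}_{k+1} = z_{k+1} \quad \text{(trust only the sensor)}, \qquad K = 0: \quad x^{opt}_{k+1} = x^{opt}_k + u_k \quad \text{(trust only the model prediction)}.$$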
In the general case, to find the exact value of the Kalman coefficient $K$, we simply minimize the error

$$e_{k+1} = x_{k+1} - x^{opt}_{k+1}$$

We use equations (1) to rewrite the expression for the error:

$$e_{k+1} = (1-K)(e_k + \xi_k) - K\eta_{k+1}$$

Proof

$$e_{k+1} = x_{k+1} - x^{opt}_{k+1} = x_{k+1} - K z_{k+1} - (1-K)(x^{opt}_k + u_k)$$
$$= (1-K)(x_{k+1} - u_k - x^{opt}_k) - K\eta_{k+1} = (1-K)(x_k + \xi_k - x^{opt}_k) - K\eta_{k+1} = (1-K)(e_k + \xi_k) - K\eta_{k+1}$$
Now it is time to discuss what it means to minimize the error. After all, the error, as we see, is itself a random variable and takes different values each time. In fact, there is no unique approach to defining what it means for the error to be minimal. Just as in the case of the variance of a random variable, when we estimated the characteristic width of its scatter, here we choose the simplest criterion for calculations: we will minimize the mean value of the squared error,

$$E\,e^2_{k+1} \rightarrow \min$$

Let us expand the last expression:

$$E\,e^2_{k+1} = (1-K)^2\big(E\,e^2_k + \sigma^2_\xi\big) + K^2\sigma^2_\eta$$

Key to the proof
From the fact that all the random variables entering the expression for $e_{k+1}$ are independent and the mean values of the sensor and model errors are zero ($E\eta_{k+1} = E\xi_k = 0$), it follows that all the "cross" terms are equal to zero:
$$E(e_k\xi_k) = E(e_k\eta_{k+1}) = E(\xi_k\eta_{k+1}) = 0.$$
Plus, the formulas for the variances look much simpler: $\sigma^2_\eta = E\eta^2_{k+1}$ and $\sigma^2_\xi = E\xi^2_k$ (since $E\eta_{k+1} = E\xi_k = 0$).
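Spelled out step by step, the expansion looks like this (each cross term vanishes by the argument above):

$$E\,e^2_{k+1} = E\big[(1-K)(e_k + \xi_k) - K\eta_{k+1}\big]^2$$
$$= (1-K)^2\big(E\,e^2_k + 2E(e_k\xi_k) + E\,\xi^2_k\big) - 2K(1-K)\,E\big[(e_k+\xi_k)\,\eta_{k+1}\big] + K^2 E\,\eta^2_{k+1} = (1-K)^2\big(E\,e^2_k + \sigma^2_\xi\big) + K^2\sigma^2_\eta$$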


This expression takes its minimum value when (setting the derivative with respect to $K$ to zero)

$$K_{k+1} = \frac{E\,e^2_k + \sigma^2_\xi}{E\,e^2_k + \sigma^2_\xi + \sigma^2_\eta}$$

Here we already write the Kalman coefficient with the step index $k+1$, thereby emphasizing that it depends on the iteration step.
We substitute this minimizing value of the Kalman coefficient into the expression for the mean squared error $E\,e^2_{k+1}$ and obtain

$$E\,e^2_{k+1} = \frac{\sigma^2_\eta\big(E\,e^2_k + \sigma^2_\xi\big)}{E\,e^2_k + \sigma^2_\xi + \sigma^2_\eta} = K_{k+1}\,\sigma^2_\eta.$$

Our problem is solved. We got an iterative formula to calculate the Kalman coefficient.
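For a feel of the numbers, take the parameter values used in the MATLAB example below: $\sigma_\xi = 1$, $\sigma_\eta = 50$, and start from $E\,e^2_0 = \sigma^2_\eta$ (the initialization used in the script). Then one step of the recursion gives

$$K_1 = \frac{50^2 + 1^2}{50^2 + 1^2 + 50^2} = \frac{2501}{5001} \approx 0.50, \qquad E\,e^2_1 = K_1\,\sigma^2_\eta \approx 1250, \qquad \sqrt{E\,e^2_1} \approx 35.4,$$

so after a single measurement the filter already trusts the sensor and the prediction about equally, and the error estimate drops from 50 to about 35.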

All formulas in one place


$$x^{opt}_{k+1} = K_{k+1}\,z_{k+1} + (1 - K_{k+1})\,(x^{opt}_k + u_k)$$
$$K_{k+1} = \frac{E\,e^2_k + \sigma^2_\xi}{E\,e^2_k + \sigma^2_\xi + \sigma^2_\eta}, \qquad E\,e^2_{k+1} = K_{k+1}\,\sigma^2_\eta$$

Example


In the advertising picture at the beginning of the article, data was filtered from a fictitious GPS sensor installed on a fictitious car that moves with a known constant fictitious acceleration $a$.



Look again at the filtering result.


Matlab code
clear all;
N = 100;         % number of samples
a = 0.1;         % acceleration
sigmaPsi = 1;    % standard deviation of the model error (xi in the text)
sigmaEta = 50;   % standard deviation of the sensor error (eta in the text)
k = 1:N;
x = k;           % preallocate the true coordinate
x(1) = 0;
z(1) = x(1) + normrnd(0, sigmaEta);
for t = 1:(N-1)
  x(t+1) = x(t) + a*t + normrnd(0, sigmaPsi);  % true coordinate evolution
  z(t+1) = x(t+1) + normrnd(0, sigmaEta);      % noisy sensor reading
end;
% kalman filter
xOpt(1) = z(1);
eOpt(1) = sigmaEta;  % eOpt(t) is the square root of the error variance; it is not a random variable
for t = 1:(N-1)
  eOpt(t+1) = sqrt((sigmaEta^2)*(eOpt(t)^2 + sigmaPsi^2)/(sigmaEta^2 + eOpt(t)^2 + sigmaPsi^2));
  K(t+1) = (eOpt(t+1))^2/sigmaEta^2;                          % Kalman coefficient
  xOpt(t+1) = (xOpt(t) + a*t)*(1 - K(t+1)) + K(t+1)*z(t+1);   % filtered value
end;
plot(k, xOpt, k, z, k, x)



Analysis


If we trace how the Kalman coefficient $K_{k+1}$ changes with the iteration step $k$, we can show that it always stabilizes to a certain value $K_{stab}$. For example, when the root-mean-square errors of the sensor and of the model relate to each other as ten to one, the graph of the Kalman coefficient as a function of the iteration step looks like this:



In the following example, we will discuss how this will help to greatly simplify our life.
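By the way, the stabilized value can be derived directly from the recursion above (a short derivation, not spelled out earlier): in the stationary regime $E\,e^2_{k+1} = E\,e^2_k = E_\infty$, so

$$E_\infty = \frac{\sigma^2_\eta\,(E_\infty + \sigma^2_\xi)}{E_\infty + \sigma^2_\xi + \sigma^2_\eta} \;\Longrightarrow\; E_\infty^2 + \sigma^2_\xi E_\infty - \sigma^2_\xi\sigma^2_\eta = 0 \;\Longrightarrow\; E_\infty = \frac{-\sigma^2_\xi + \sqrt{\sigma^4_\xi + 4\,\sigma^2_\xi\sigma^2_\eta}}{2},$$

and $K_{stab} = E_\infty / \sigma^2_\eta$. For example, for $\sigma_\eta = 10\,\sigma_\xi$ this gives $K_{stab} \approx 0.095$.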

Second example


In practice it often happens that we know nothing at all about the physical model of what we are filtering. For example, suppose you want to filter the readings from your favorite accelerometer. You do not know in advance by what law you are going to wave the accelerometer around. The most information you can extract is the variance of the sensor error $\sigma^2_\eta$. In such a difficult situation, all our ignorance of the motion model can be pushed into the random variable $\xi_k$:

$$x_{k+1} = x_k + \xi_k, \qquad z_{k+1} = x_{k+1} + \eta_{k+1}$$

But, frankly speaking, such a system no longer satisfies the conditions that we imposed on the random variable $\xi_k$: now all the physics of motion unknown to us is hidden there, and therefore we cannot claim that at different moments of time the model errors are independent of each other and that their mean values are zero. In this case, strictly speaking, the Kalman filter theory is not applicable. But we could ignore this fact and stupidly apply the whole colossus of formulas, choosing the coefficients $\sigma^2_\xi$ and $\sigma^2_\eta$ by eye so that the filtered data looks nice.
But one can take a different, much simpler path. As we saw above, the Kalman coefficient always stabilizes to the value $K_{stab}$ as the step number $k$ grows. Therefore, instead of choosing the coefficients $\sigma^2_\xi$ and $\sigma^2_\eta$ and computing the Kalman coefficient from the complicated formulas, we can assume that this coefficient is simply a constant and select only this constant. This assumption spoils almost nothing: first, we are already using the Kalman theory illegally, and second, the Kalman coefficient quickly stabilizes to a constant anyway. In the end everything is greatly simplified. We do not need any formulas from Kalman's theory at all; we just need to select an acceptable value $K_{stab}$ and insert it into the iterative formula

$$x^{opt}_{k+1} = K_{stab}\,z_{k+1} + (1 - K_{stab})\,x^{opt}_k$$

The following graph shows data from a fictitious sensor filtered in two different ways, assuming that we know nothing about the physics of the phenomenon. The first way is the honest one, with all the formulas from Kalman's theory. The second is the simplified one, without formulas.



As we can see, the two methods give almost identical results. A small difference is observed only at the beginning, when the Kalman coefficient has not yet stabilized.
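For completeness, here is a minimal MATLAB sketch of the simplified filter (it reuses N, k, x and z from the script above, and kStab is simply picked by eye):

kStab = 0.1;               % constant Kalman coefficient, chosen by hand
xSimple(1) = z(1);         % start from the first sensor reading
for t = 1:(N-1)
  xSimple(t+1) = (1 - kStab)*xSimple(t) + kStab*z(t+1);  % exponential smoothing
end;
plot(k, xSimple, k, z, k, x)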

Discussion


As we saw, the main idea of the Kalman filter is to find a coefficient $K$ such that the filtered value

$$x^{opt}_{k+1} = K\,z_{k+1} + (1 - K)\,(x^{opt}_k + u_k)$$

differs on average as little as possible from the real coordinate value $x_{k+1}$. We see that the filtered value $x^{opt}_{k+1}$ is a linear function of the sensor reading $z_{k+1}$ and the previous filtered value $x^{opt}_k$. And the previous filtered value is, in turn, a linear function of the sensor reading $z_k$ and the filtered value before it, $x^{opt}_{k-1}$. And so on, until the chain is completely unrolled. That is, the filtered value depends linearly on all the previous sensor readings:

$$x^{opt}_{k+1} = \lambda + \lambda_1 z_1 + \lambda_2 z_2 + \dots + \lambda_{k+1} z_{k+1}$$

Therefore, the Kalman filter is called a linear filter.
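To see how the chain unrolls, substitute the update formula into itself once; the coefficients below follow directly from that substitution:

$$x^{opt}_{k+1} = K_{k+1}\,z_{k+1} + (1-K_{k+1})\big(K_k\,z_k + (1-K_k)(x^{opt}_{k-1} + u_{k-1}) + u_k\big),$$

so the coefficient of $z_{k+1}$ is $K_{k+1}$, the coefficient of $z_k$ is $(1-K_{k+1})\,K_k$, the next one is $(1-K_{k+1})(1-K_k)\,K_{k-1}$, and so on.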
It can be proved that of all the linear filters, the Kalman filter is the best. The best in the sense that the average square of the filter error is minimal.

Multidimensional case


The whole Kalman filter theory can be generalized to the multidimensional case. The formulas there look a bit scarier, but the very idea of their derivation is the same as in the one-dimensional case. You can see them in this wonderful article: http://habrahabr.ru/post/140274/ .
And in this wonderful video, an example of how to use them is analyzed.

Literature


Kalman's original article can be downloaded here: http://www.cs.unc.edu/~welch/kalman/media/pdf/Kalman1960.pdf .
This post can also be read in English http://david.wf/kalmanfilter
