# Visual linear approximation using Gnuplot

Nonlinear approximation is said to be an art, but sometimes even ordinary linear approximation is not easy to handle.

Many probably remember that the simplest and most accurate way to construct a least-squares straight line is the "transparent ruler by eye" method. Back when calculations were done on calculators, this method saved many hours of monotonous work. For obviously linear processes it is no longer relevant: even Excel can compute and draw a fit instantly.

However, real problems often involve processes for which the model is unknown. In such cases it is wise to construct piecewise linear approximations. And here, where exact construction criteria simply do not exist, the “transparent ruler” method, based on the “art of approximation” (in simple terms, on intuition), becomes relevant again.

Printing charts and drawing straight lines on them with a pencil and a transparent ruler still works (and is sometimes even fun). Here, though, we will use Gnuplot: it can plot data in various representations, it can compute approximations, and it still leaves the user enough room to maneuver.

As an example of “real-life” data with an unknown model, let us consider how the body mass index (BMI) of Playboy magazine's Playmates of the Month changes over time. The task is to capture the general trends.

The initial data is taken from the article "Correlations for beginners" by Vadim Markov (@BubaVV). A link to the Gnuplot files used to build the figures is given below. A small remark on the data: strictly speaking, the X axis should show time (in months), but to keep things simple we use the serial number of each record instead.

To begin with, we plot all the available points and draw a linear trend through them. We immediately mark the problem areas with question marks.
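For readers who want to see what a linear trend fit actually computes, here is a minimal sketch in Python rather than Gnuplot. It uses the closed-form least-squares formulas for a straight line. The helper name `fit_line` and the `bmi` values are made up for illustration; they are not the article's data or scripts.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of x-y cross deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values standing in for the BMI series (not real data)
bmi = [19.0, 18.8, 18.9, 18.5, 18.6, 18.2]
# As in the article, x is the serial number of the record, not the date
record_no = list(range(len(bmi)))

slope, intercept = fit_line(record_no, bmi)
print(f"slope={slope:.4f} intercept={intercept:.4f}")
```

A single line through all points is exactly what hides a change of trend: the fit averages the two regimes into one slope.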

There is clearly something wrong with the linear approximation: the trend seems to have changed along the way. We construct a quadratic approximation, which lets us catch the change in the slope of the straight lines.
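As an illustration of the same idea outside Gnuplot, the sketch below fits a least-squares polynomial in plain Python by solving the normal equations. The names `fit_poly` and `vertex_x` and the sample values are assumptions made for this example. For a quadratic, the vertex of the parabola gives a rough numerical hint of where the slope changes sign; in the article that spot is simply picked by eye.

```python
def fit_poly(xs, ys, degree):
    """Least-squares polynomial; returns [c0, c1, ...] for c0 + c1*x + ..."""
    m = degree + 1
    # Normal equations A c = b, built from sums of powers of x
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def vertex_x(coeffs):
    """x position of the vertex of a quadratic c0 + c1*x + c2*x^2."""
    return -coeffs[1] / (2.0 * coeffs[2])


# Hypothetical series that falls and then rises (not real data)
bmi = [19.0, 18.7, 18.5, 18.3, 18.2, 18.3, 18.5, 18.8]
record_no = list(range(len(bmi)))

c = fit_poly(record_no, bmi, 2)
print("coefficients:", [round(v, 4) for v in c])
print("trend turns near record", round(vertex_x(c), 2))
```

The same function accepts higher degrees, but the normal equations become poorly conditioned as the degree or the range of record numbers grows, so rescaling x first is advisable in that case.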

The quadratic approximation looks better (no wonder, with one extra parameter). It can be seen that the linear trend changes in the middle of the set; we mark this area with a colored circle. The dynamics to the left of the circle differ from those to the right, and to make this easier to see, the left and right areas are also marked with different colors.

We approximate the points in the left area with one straight line and those in the right area with another. At the same time, we put timestamps on the X axis instead of abstract numbers; we don't need much detail, so we mark only a few years.

Now, at least by eye, the approximation by straight lines is quite good. The values of the line parameters can be taken from the log file that Gnuplot writes during the fitting process.
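The two-line fit can also be sketched in a few lines of Python, assuming the breakpoint has already been chosen by eye, as in the article. The function names, the `break_index` parameter, and the sample values are hypothetical and serve only to show the mechanics: split the records at the breakpoint and fit each side independently.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_two_segments(ys, break_index):
    """Fit separate lines to records [0, break_index) and [break_index, end).

    break_index is picked by eye; each side needs at least two points.
    """
    xs = list(range(len(ys)))
    left = fit_line(xs[:break_index], ys[:break_index])
    right = fit_line(xs[break_index:], ys[break_index:])
    return left, right


# Hypothetical series: falling trend, then rising trend (not real data)
bmi = [19.0, 18.8, 18.6, 18.4, 18.2, 18.1, 18.3, 18.5, 18.7, 18.9]

left, right = fit_two_segments(bmi, break_index=5)
print("left  slope=%.3f intercept=%.3f" % left)
print("right slope=%.3f intercept=%.3f" % right)
```

Note that the two lines are fitted independently, so they are not forced to meet at the breakpoint; moving `break_index` and refitting is the numerical equivalent of sliding the ruler.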

Summary. Without calculating anything by hand, just by looking at the graphs and drawing lines, we determined the main trends in the data. By the way, I wonder what happened around 1985 that brought girls with a higher BMI into fashion?

PS. All the data and the Gnuplot files for building the figures can be downloaded from: drive.google.com/file/d/0BwHQSqFOG-7lU1BfbkdqTTFxdkU/view?usp=sharing

PPS. Out of curiosity, this is what an approximation by a 4th-degree polynomial looks like. Judging by the graph, it is worth checking whether the trend toward thinner models is coming back into fashion.
