The Welford method and one-dimensional linear regression
One-dimensional linear regression is one of the simplest regression methods (and one of the simplest machine learning methods in general): it describes a linear dependence of the observed value on a single feature. In general, machine learning problems involve a large number of different features; one-dimensional linear regression then selects the one feature that achieves the best correlation with the objective function.
In the previous post of this series, we discussed the accuracy of computing means and covariances and got acquainted with the Welford method, which in many cases avoids computational errors in these problems. Today we will look at a practical application of the Welford method in the problem of one-dimensional linear regression.
Contents
- 1. One-dimensional linear regression
- 2. Solution for centered samples
- 3. Solution in the general case
- 4. Application of the Welford method
- 5. Experimental comparison of the methods
- Conclusion
- Literature
1. One-dimensional linear regression
In the one-dimensional linear regression problem, we assume that there are two sequences of real numbers: features $x_1, x_2, \ldots$ and answers $y_1, y_2, \ldots$. In addition, there is a sequence of corresponding weights $w_1, w_2, \ldots$. As always, we will assume that these sequences contain a potentially infinite number of elements, but that at the moment only the first $n$ elements of each sequence are available.
Our task is to recover the linear dependence between the features and the answers, that is, to construct a linear decision function $f$:

$$f(x) = \alpha \cdot x + \beta,$$

which minimizes the root-mean-square loss functional:

$$Q(f, x, y, w) = \sqrt{\frac{1}{\sum_{i=1}^{n} w_i} \cdot \sum_{i=1}^{n} w_i \cdot (f(x_i) - y_i)^2}$$

For the analysis, it is easier to work with a functional that is free of the radical and the normalization:

$$Q_1(f, x, y, w) = \sum_{i=1}^{n} w_i \cdot (f(x_i) - y_i)^2$$

Since the minimum points of the functionals $Q$ and $Q_1$ coincide, this replacement is correct.
2. Solution for centered samples
For the loss functional $Q_1$, it is easy to write out the derivatives with respect to $\alpha$ and $\beta$:

$$\frac{\partial Q_1}{\partial \alpha} = 2 \cdot \sum_{i=1}^{n} w_i \cdot x_i \cdot (\alpha \cdot x_i + \beta - y_i)$$

$$\frac{\partial Q_1}{\partial \beta} = 2 \cdot \sum_{i=1}^{n} w_i \cdot (\alpha \cdot x_i + \beta - y_i)$$

Equating them to zero, we obtain:

$$\alpha \cdot \sum_{i=1}^{n} w_i x_i^2 + \beta \cdot \sum_{i=1}^{n} w_i x_i = \sum_{i=1}^{n} w_i x_i y_i$$

$$\alpha \cdot \sum_{i=1}^{n} w_i x_i + \beta \cdot \sum_{i=1}^{n} w_i = \sum_{i=1}^{n} w_i y_i$$
An important digression. Equating the derivatives to zero is correct in this case because:
1. The loss functional is convex in the optimized parameters, so any local optimum is also a global optimum.
2. The loss functional in each of the optimized parameters is a parabola, so the extremum points found are minimum points.
If the optimal parameter $\beta$ were equal to zero, finding the solution would not be difficult. One may notice that centering, a standard preprocessing method in machine learning, leads to exactly this effect. Indeed, consider the problem for the centered variables:

$$x_i' = x_i - \frac{\sum_{j=1}^{n} w_j x_j}{\sum_{j=1}^{n} w_j}, \qquad y_i' = y_i - \frac{\sum_{j=1}^{n} w_j y_j}{\sum_{j=1}^{n} w_j}$$

The sum of the weighted features is now equal to zero, as is the sum of the weighted answers:

$$\sum_{i=1}^{n} w_i x_i' = 0, \qquad \sum_{i=1}^{n} w_i y_i' = 0$$

Then, from the second of the equations above, the optimal value of the free parameter is zero:

$$\beta' = \frac{\sum_{i=1}^{n} w_i y_i' - \alpha' \cdot \sum_{i=1}^{n} w_i x_i'}{\sum_{i=1}^{n} w_i} = 0$$

And this means that the optimal value of the parameter $\alpha'$ is easy to find from the first equation:

$$\alpha' = \frac{\sum_{i=1}^{n} w_i x_i' y_i'}{\sum_{i=1}^{n} w_i x_i'^2}$$
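To make the centered formula concrete, here is a minimal two-pass sketch in C++; the function name and signature are illustrative, not part of the article's solver. It centers the data explicitly and then applies the closed-form expression for $\alpha'$.

```cpp
#include <cstddef>
#include <vector>

// Illustrative two-pass computation of alpha' for a weighted sample:
// the first pass computes the weighted means, the second accumulates the
// centered sums from the closed-form solution.
double SolveCenteredSLR(const std::vector<double>& features,
                        const std::vector<double>& goals,
                        const std::vector<double>& weights) {
    double sumWeights = 0., featuresMean = 0., goalsMean = 0.;
    for (size_t i = 0; i < features.size(); ++i) {
        sumWeights += weights[i];
        featuresMean += weights[i] * features[i];
        goalsMean += weights[i] * goals[i];
    }
    featuresMean /= sumWeights;
    goalsMean /= sumWeights;

    double covariation = 0., featuresDeviation = 0.;
    for (size_t i = 0; i < features.size(); ++i) {
        const double featureDiff = features[i] - featuresMean;
        covariation += weights[i] * featureDiff * (goals[i] - goalsMean);
        featuresDeviation += weights[i] * featureDiff * featureDiff;
    }
    // Guard against a constant feature (zero deviation).
    return featuresDeviation ? covariation / featuresDeviation : 0.;
}
```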
3. Solution in the general case
Now let us return to the general case of non-centered data. If $f'$ is the decision function for the centered case, whose values are given by the formula

$$f'(x') = \alpha' \cdot x'$$

and which approximates the value $y - \bar{y}_n$ given $x - \bar{x}_n$, then the following decision function approximates the quantity $y$:

$$f(x) = \alpha' \cdot (x - \bar{x}_n) + \bar{y}_n$$

Therefore, the solution to the original problem of one-dimensional linear regression can be written as follows:

$$\alpha = \alpha', \qquad \beta = \bar{y}_n - \alpha' \cdot \bar{x}_n$$

Here we use the notation for the weighted mean introduced in the previous article:

$$\bar{x}_n = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
One can see that this transition is correct in another way as well. If the solution for the centered data is optimal, then the parameters $\alpha'$ and $\beta' = 0$ deliver the minimum of the loss functional

$$Q_1' = \sum_{i=1}^{n} w_i \cdot (\alpha' \cdot x_i' + \beta' - y_i')^2$$

Now let us change variables, returning to the non-centered data:

$$Q_1' = \sum_{i=1}^{n} w_i \cdot \big(\alpha' \cdot (x_i - \bar{x}_n) - (y_i - \bar{y}_n)\big)^2 = \sum_{i=1}^{n} w_i \cdot \big(\alpha' \cdot x_i + (\bar{y}_n - \alpha' \cdot \bar{x}_n) - y_i\big)^2$$

The resulting expression is exactly the value of the loss functional on the non-centered data for the parameters $\alpha$ and $\beta$ obtained above. Since this value is the minimum of the functional, the problem is solved correctly!
4. Application of the Welford method
Now note that the computation of the parameter $\alpha$ uses the same covariances whose calculation we dealt with in the previous article. Indeed, one can write:

$$\alpha = \frac{\sum_{i=1}^{n} w_i \cdot (x_i - \bar{x}_n)(y_i - \bar{y}_n)}{\sum_{i=1}^{n} w_i \cdot (x_i - \bar{x}_n)^2}$$

The numerator is the weighted covariance of the features with the answers, and the denominator is the weighted covariance of the features with themselves. This means that to compute the regression coefficient it suffices to run the Welford covariance computation twice. In the course of these computations, we simultaneously obtain the mean values needed to compute the free regression coefficient.
The code for adding another element to the sample consists of updating the means and the sums of squared deviations for the features and the answers, as well as the covariance sum between the features and the answers:
```cpp
void TWelfordSLRSolver::Add(const double feature, const double goal, const double weight) {
    SumWeights += weight;
    if (!SumWeights) {
        return;
    }
    // Welford-style updates: the differences are taken against the old means,
    // the corrections against the new ones.
    const double weightedFeatureDiff = weight * (feature - FeaturesMean);
    const double weightedGoalDiff = weight * (goal - GoalsMean);

    FeaturesMean += weightedFeatureDiff / SumWeights;
    FeaturesDeviation += weightedFeatureDiff * (feature - FeaturesMean);

    GoalsMean += weightedGoalDiff / SumWeights;
    GoalsDeviation += weightedGoalDiff * (goal - GoalsMean);

    Covariation += weightedFeatureDiff * (goal - GoalsMean);
}
```
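For readability, here is a plausible sketch of the solver state that the Add method above relies on. The field names follow the code, but the exact class layout is an assumption, not a verbatim copy from the repository:

```cpp
class TWelfordSLRSolver {
private:
    double FeaturesMean = 0.;      // weighted mean of the features
    double FeaturesDeviation = 0.; // weighted sum of squared feature deviations
    double GoalsMean = 0.;         // weighted mean of the answers
    double GoalsDeviation = 0.;    // weighted sum of squared answer deviations
    double Covariation = 0.;       // weighted sum of feature-answer cross-deviations
    double SumWeights = 0.;        // total weight of the added elements

public:
    void Add(const double feature, const double goal, const double weight = 1.);
    // Solve(...) is defined below.
};
```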
And here is the method that solves the problem of finding the optimal parameters of one-dimensional linear regression once all the elements have been added:
```cpp
template <typename TFloatType>
void TWelfordSLRSolver::Solve(TFloatType& factor, TFloatType& intercept, const double regularizationParameter = 0.1) const {
    if (!FeaturesDeviation) {
        // Degenerate case: the feature is constant, so the best model
        // is the constant equal to the mean of the answers.
        factor = 0.;
        intercept = GoalsMean;
        return;
    }
    factor = Covariation / (FeaturesDeviation + regularizationParameter);
    intercept = GoalsMean - factor * FeaturesMean;
}
```
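Putting the two methods together, a typical usage might look as follows (a hypothetical example, with data chosen so that $y = 2x + 3$):

```cpp
TWelfordSLRSolver solver;
solver.Add(1., 5., 1.);  // y = 2*1 + 3
solver.Add(2., 7., 1.);  // y = 2*2 + 3
solver.Add(3., 9., 1.);  // y = 2*3 + 3

double factor = 0.;
double intercept = 0.;
solver.Solve(factor, intercept);
// factor is close to 2 and intercept is close to 3; the small discrepancy
// comes from the regularization term in the denominator.
```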
The value GoalsDeviation is not used here, but we will need it in future articles.
Combining all the computations in one class avoids some overhead. For instance, if the implementation used two separate objects for storing the means and three for storing the covariances (features with features, answers with answers, features with answers), the sum of weights would be updated five times for each element of the sample.
5. Experimental comparison of the methods
For a practical comparison, I wrote a program that implements various methods for solving one-dimensional and multidimensional linear regression problems. We will discuss multidimensional regression in the following articles; for now we focus on the one-dimensional case.
We will compare, as usual, the "naive" methods, the methods based on Kahan summation, and the methods based on the Welford method.
The "naive" methods directly apply the standard formulas for computing the covariances through raw sums:
```cpp
void Add(const double feature, const double goal, const double weight = 1.) {
    SumFeatures += feature * weight;
    SumSquaredFeatures += feature * feature * weight;
    SumGoals += goal * weight;
    SumSquaredGoals += goal * goal * weight;
    SumProducts += goal * feature * weight;
    SumWeights += weight;
}
```
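For reference, the naive counterpart of Solve can be recovered from these sums via the textbook identities $\sum w_i (x_i - \bar{x})(y_i - \bar{y}) = \sum w_i x_i y_i - \bar{y} \sum w_i x_i$ and $\sum w_i (x_i - \bar{x})^2 = \sum w_i x_i^2 - \bar{x} \sum w_i x_i$. The following is a sketch under the same naming assumptions, not the repository's exact code:

```cpp
void Solve(double& factor, double& intercept) const {
    const double featuresMean = SumFeatures / SumWeights;
    const double goalsMean = SumGoals / SumWeights;
    // Both subtractions below can suffer catastrophic cancellation when
    // the means are large compared to the spread of the values.
    const double covariation = SumProducts - goalsMean * SumFeatures;
    const double featuresDeviation = SumSquaredFeatures - featuresMean * SumFeatures;
    if (!featuresDeviation) {
        factor = 0.;
        intercept = goalsMean;
        return;
    }
    factor = covariation / featuresDeviation;
    intercept = goalsMean - factor * featuresMean;
}
```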
The class is templated and has specializations with accumulators of type double and of type TKahanAccumulator.
In addition, a class TTypedBestSLRSolver has been implemented that selects the best feature for constructing a one-dimensional regression model. This is done very simply: the one-dimensional linear regression problem is solved for each feature, after which the best of the resulting models is selected (see the sketch below).
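The selection logic can be sketched roughly as follows; the function and its signature are illustrative, and the real TTypedBestSLRSolver in the repository may differ in details:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative sketch: fit one solver per feature and keep the feature whose
// model yields the smallest weighted sum of squared errors on the sample.
size_t SelectBestFeature(const std::vector<std::vector<double>>& featureColumns,
                         const std::vector<double>& goals,
                         const std::vector<double>& weights,
                         double& bestFactor, double& bestIntercept) {
    size_t bestFeature = 0;
    double bestError = std::numeric_limits<double>::max();
    for (size_t f = 0; f < featureColumns.size(); ++f) {
        TWelfordSLRSolver solver;
        for (size_t i = 0; i < goals.size(); ++i) {
            solver.Add(featureColumns[f][i], goals[i], weights[i]);
        }
        double factor = 0., intercept = 0.;
        solver.Solve(factor, intercept);
        double error = 0.;
        for (size_t i = 0; i < goals.size(); ++i) {
            const double diff = factor * featureColumns[f][i] + intercept - goals[i];
            error += weights[i] * diff * diff;
        }
        if (error < bestError) {
            bestError = error;
            bestFeature = f;
            bestFactor = factor;
            bestIntercept = intercept;
        }
    }
    return bestFeature;
}
```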
To test the developed methods, we will use model data from the LIAC collection. For convenience, some of the data sets are placed in the data directory in a format that the program understands.
The data in the tasks is "spoiled" in a simple way: the feature and answer values are multiplied by some number, after which another number is added to them. In this way we can obtain a case that is problematic from the computational point of view: large mean values compared to the spread of the values.
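The transformation itself is a one-liner; a sketch using the parameter names injure factor and injure offset that appear in the log below:

```cpp
// Illustrative "spoiling" of a single value: scale it, then shift it.
double Injure(const double value, const double injureFactor, const double injureOffset) {
    return value * injureFactor + injureOffset;
}
```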
In the research-bslr mode, the sample is modified in this way several times in a row, and each time a cross-validation procedure is run on it. The result of the check is the mean value of the coefficient of determination $R^2$ over the test samples.
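For reference, the coefficient of determination compares the model's squared error with the squared error of the constant (mean) predictor; here is a minimal sketch of this computation (illustrative, not the program's exact code):

```cpp
#include <cstddef>
#include <vector>

// R^2 = 1 - (residual sum of squares) / (total sum of squares).
double DeterminationCoefficient(const std::vector<double>& predictions,
                                const std::vector<double>& goals) {
    double mean = 0.;
    for (const double goal : goals) {
        mean += goal;
    }
    mean /= goals.size();

    double residualError = 0., totalError = 0.;
    for (size_t i = 0; i < goals.size(); ++i) {
        residualError += (goals[i] - predictions[i]) * (goals[i] - predictions[i]);
        totalError += (goals[i] - mean) * (goals[i] - mean);
    }
    return 1. - residualError / totalError;
}
```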
For example, for the kin8nm sample, the results are as follows:
```
injure factor: 1
injure offset: 1
fast_bslr                time: 0.001322  R^2: 0.27359
kahan_bslr               time: 0.002999  R^2: 0.27359
welford_bslr             time: 0.00432   R^2: 0.27359
normalized_welford_bslr  time: 0.004288  R^2: 0.27359
injure factor: 0.1
injure offset: 10
fast_bslr                time: 0.001256  R^2: 0.27359
kahan_bslr               time: 0.002948  R^2: 0.27359
welford_bslr             time: 0.004303  R^2: 0.27359
normalized_welford_bslr  time: 0.004275  R^2: 0.27359
injure factor: 0.01
injure offset: 100
fast_bslr                time: 0.001283  R^2: 0.27359
kahan_bslr               time: 0.003015  R^2: 0.27359
welford_bslr             time: 0.004304  R^2: 0.27359
normalized_welford_bslr  time: 0.004285  R^2: 0.27359
injure factor: 0.001
injure offset: 1000
fast_bslr                time: 0.001262  R^2: 0.27324
kahan_bslr               time: 0.002977  R^2: 0.27359
welford_bslr             time: 0.004329  R^2: 0.27359
normalized_welford_bslr  time: 0.00428   R^2: 0.27359
injure factor: 0.0001
injure offset: 10000
fast_bslr                time: 0.00128   R^2: -59.271
kahan_bslr               time: 0.003009  R^2: -0.0005269
welford_bslr             time: 0.004304  R^2: 0.27359
normalized_welford_bslr  time: 0.00428   R^2: 0.27359
full learning time:
fast_bslr                0.006403s
kahan_bslr               0.014948s
welford_bslr             0.02156s
normalized_welford_bslr  0.021408s
```
In this case, decreasing all values in the sample by a factor of ten thousand while simultaneously adding 10,000 to them makes the standard algorithm inoperable, even with Kahan summation. Similar results are obtained on other samples, including those encountered in real production practice.
Conclusion
So, today we talked about the problem of one-dimensional linear regression, figured out how to obtain analytical solutions to this problem, and saw how to use the Welford method to find those solutions.
The Welford method makes the solution substantially more robust to possible problems in the data. However, this method is 2-4 times slower than the standard algorithm, so in practice you need to decide for yourself what is more important at the moment: being independent of possible problems in the data, or working as fast as possible.
If you need to build models on varied data many times and there is no way to control the quality of each resulting model, I would recommend using the Welford method.
In the next article, we will talk about using the Welford method to solve the multidimensional linear regression problem.
Literature
- habrahabr.ru: Accurate calculation of means and covariances by the Welford method
- github.com: Linear regression problem solvers with different computation methods
- github.com: The collection of ARFF datasets of the Connectionist Artificial Intelligence Laboratory (LIAC)
- machinelearning.ru: One-dimensional linear regression