A data-driven decision, using the choice of a wall paint color as an example

    When I started choosing a paint color for the walls of my own room, I ran into something interesting. From the very beginning, the whole process resembled work on some IT/ML/blah-blah-analytics project.

    There is a customer who does not really understand what he wants, but wants everything to be good and pleasing to him. There are several more stakeholders on the customer's side who cannot agree on what “good” means. There are reformulations of the problem whose relevance to that “good” is highly questionable, but which are at least somehow solvable. There is a selection of solutions and attempts to implement them. There is iteration, which implicitly but monotonically leads to a solution that suits everyone. And there are some odd conclusions that could hardly be drawn in a “real” project, because when nerves are frayed and money is involved, attention rarely lingers on these points of the process.

    In general, if you look at choosing a room color as an analytical process, it can be interesting.

    Formulation of the problem

    There are things more disgusting and annoying than several people choosing a color to paint a wall, but not many. Most of them involve illness or injury.

    Methodically browsing colors evokes a sea of associations with hospitals, supermarkets, government offices, Soviet kitchens in communal apartments, and public toilets. And if a color evokes nothing for you, then it is certainly the exact color of the wall in the school corridor where the other person spent the worst years of his life. Irritation grows, and no decision gets made.

    Thus the task arises. The quality criteria for solving it are as follows:

    KPI: the degree to which all stakeholders are satisfied with the color of the wall in the room. This KPI cannot be measured directly before the wall is actually painted (and you don't want to repaint it 10 times), so something else has to be used.

    Realistic KPI: the degree to which all stakeholders are satisfied with the choice of a color for painting the wall. (It is worth noting right away that a satisfying choice of color does not guarantee satisfaction with that color on the wall, but there is not much of an alternative. Decisions have to be made while their outcome is still unknown.)

    Input data:

    • Room
    • RGB palette
    • Interested parties who choose a color for the room

    And here begins a process familiar to anyone who has implemented corporate projects.

    I wasn't born yesterday, I'll come up with something clever and mathematical right now!

    For the most part, only people with an art education can look at a color and clearly see how it will look on a wall. But that is not us. We need to figure out how to relate colors to walls so that the decision is made faster and is somehow better justified. Image editing is our salvation. You can take a panoramic photo of the room, erase the walls from the photo, and then paint them with something. Maybe this will help reach a decision that suits everyone? In our case, it all started with this:

    Here is the room, with the walls crudely erased. Now you can paint the walls any color and see whether we like it or not. For example, something like this:

    And it feels great.

    But no!

    A world of pink ponies with gouged-out eyes is not for us. The stakeholders' discussion of colors applied to the photo also quickly reaches a dead end: a couple of hours of messing around with paint and no consensus. But! We now have a tool, and it can be used. If we cannot work it out ourselves, maybe there are people who can do it for us?

    Expert solution

    All interested VK subscribers were appointed as experts. The bet was that, although it is 2018 outside, we still live in a “country of Soviets” (in Russian, “soviet” also means “advice”). So let everyone start advising.

    Based on the above masterpiece of photo art, a “site” was made containing the photograph of the room and a color picker that changed the color of the walls. A link to the site was posted on VK with a request to help choose a color.

    The experts expressed a number of opinions and shared their options. Alas, this did not lead to a consensus, but it did shift the stakeholders' aesthetic priorities, and those shifts later allowed us to reach a consensus.

    Path to Machine Learning

    There were two problems with the experts:

    • Expert opinion did not always coincide with the stakeholders' opinion.
    • There were few experts, so it was impossible to pool their opinions and arrive at one averaged decision.

    Together, the two problems made a decision impossible:

    • If there had been many experts, the sheer number of votes for some option could have settled the matter.
    • If there had been few, but all of them had agreed with the stakeholders, one could simply have agreed with the experts.

    But the very formulation of the problem with expert opinion gave the key to the next iteration.

    We need a lot of ratings. If there are many of them and the votes cluster around something specific, that option is easier to accept. But how to get them?

    For example, by making a small desktop application that applies random colors to the picture, with the stakeholders rating each variant from 0 to 10. In this way, more than 300 color ratings were collected in the format:

    $ \begin{pmatrix} r \in [0,255] \\ g \in [0,255] \\ b \in [0,255] \\ V \in [0,10] \end{pmatrix} $

    Most colors were rated 0, of course.

    Now we have a sample with ratings, although in a form not very convenient for analysis. We can reinterpret V, from 0 to 10, as not a rating of the color but the number of people who voted for it, and so transform each 4-dimensional vector $ (r, g, b, V)^T $ into V copies of the 3-dimensional vector $ (r, g, b)^T $. Naturally, the zeros disappear, and each vector is now one equal vote for the corresponding shade. It is convenient to add small random offsets to the RGB values at the same time, so that the sample does not end up with many exactly identical shades.

    In this formulation we get a typical problem of finding the maximum density of a multidimensional distribution, i.e. we are looking for the region in which people “voted” most often.
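
One crude way to make "the region where people voted most often" concrete is to count votes inside a ball around a candidate color. The sketch below is only an illustration: the function names, the radius of 30 RGB units, and the sample votes are assumptions, not the author's code or data.

```python
def neighborhood_density(votes, center, radius=30.0):
    """Fraction of votes lying within `radius` of `center` in RGB space."""
    limit = radius ** 2
    inside = sum(
        1 for vote in votes
        if sum((v - c) ** 2 for v, c in zip(vote, center)) <= limit
    )
    return inside / len(votes)


def densest_vote(votes, radius=30.0):
    """Return the vote whose neighborhood contains the most other votes."""
    return max(votes, key=lambda vote: neighborhood_density(votes, vote, radius))


# Hypothetical votes: three bluish shades close together and one lone red.
votes = [(100, 100, 200), (110, 105, 210), (105, 110, 205), (240, 30, 30)]
print(neighborhood_density(votes, (105, 105, 205)))
print(densest_vote(votes))
```

The radius controls how much smoothing is applied: a small radius finds sharp local peaks, a large one blurs neighboring groups of votes together.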

    If we want to increase the influence of high ratings, we can take V² when converting from 4-d to 3-d. Then a rating of 2 turns into 4 votes, and a rating of 4 into 16 votes. This reduces the relative influence of colors that received votes, but only lukewarm ones.
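
The conversion from ratings to votes described above can be sketched in a few lines. This is a minimal illustration, not the author's application: the function name, the `power` and `jitter` parameters, the fixed seed, and the sample ratings are all assumptions.

```python
import random


def ratings_to_votes(ratings, power=1, jitter=3.0, seed=0):
    """Expand (r, g, b, V) ratings into a list of (r, g, b) votes.

    Each rating contributes V**power votes, so zero ratings vanish.
    Every vote gets a small random offset (clamped to 0..255) so that
    repeated votes for one shade do not coincide exactly.
    """
    rng = random.Random(seed)
    votes = []
    for r, g, b, v in ratings:
        for _ in range(v ** power):
            vote = tuple(
                min(255.0, max(0.0, channel + rng.uniform(-jitter, jitter)))
                for channel in (r, g, b)
            )
            votes.append(vote)
    return votes


# Hypothetical ratings, not the author's data.
ratings = [(120, 110, 200, 4), (30, 200, 40, 0), (250, 180, 190, 2)]
votes = ratings_to_votes(ratings)             # 4 + 0 + 2 votes
squared = ratings_to_votes(ratings, power=2)  # 16 + 0 + 4 votes
print(len(votes), len(squared))
```

With `power=2` the shade rated 4 outweighs the shade rated 2 by 16 votes to 4 instead of 4 to 2, which is the effect described in the text.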


    The two simplest things that come to mind are to approximate the resulting distribution with a multivariate normal distribution or with a quadratic function in 3-d (something dome-shaped with a pronounced maximum). Not that everything was really described well by such simple functions, but in this case one could certainly turn a blind eye to that.

    To fit the multivariate normal, you only need to estimate from the data the covariance matrix and the means of all the marginal distributions. After that you can find the maximum analytically, but it is much easier (on the brain) to simply walk through all possible color combinations with some step, substituting the distribution parameters and the color values into the density formula, and then take the point with the maximum.

    $ \max \left( N(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{Color}) \right), $ where

    $ \boldsymbol{\mu} $ is the vector of means, $ \boldsymbol{\Sigma} $ is the covariance matrix, $ \boldsymbol{Color} = perm((r, g, b)^T) $, and $ perm() $ is the set of all possible color combinations with the given step.
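
The fit-and-scan procedure can be sketched in plain Python. This is an illustration under assumptions, not the author's MATLAB code: the function names, the grid step of 5, and the five sample votes are made up for the example.

```python
import itertools


def mean_and_cov(points):
    """Sample mean vector and 3x3 sample covariance matrix of RGB points."""
    n = len(points)
    mu = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (n - 1)
            for j in range(3)] for i in range(3)]
    return mu, cov


def inverse_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def log_density(x, mu, inv_cov):
    """Gaussian log-density at x, up to an additive constant."""
    d = [x[i] - mu[i] for i in range(3)]
    return -0.5 * sum(d[i] * inv_cov[i][j] * d[j]
                      for i in range(3) for j in range(3))


def best_color(points, step=5):
    """Scan an RGB grid and return the color with the highest density."""
    mu, cov = mean_and_cov(points)
    inv_cov = inverse_3x3(cov)
    grid = range(0, 256, step)
    return max(itertools.product(grid, grid, grid),
               key=lambda color: log_density(color, mu, inv_cov))


# Hypothetical votes, not the author's data.
votes = [(100, 100, 200), (120, 110, 210), (110, 130, 190),
         (130, 120, 220), (90, 115, 205)]
print(best_color(votes))
```

For a single Gaussian the density peaks at the mean, so the grid scan simply lands on the grid point closest to the mean in the metric defined by the covariance. The scan becomes genuinely useful only if the density model is swapped for something with several peaks.
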
    Approximation by a multivariate normal and the search for its maximum gave a nice blue-violet color with the desired degree of haziness:

    It is even easier to approximate with a quadratic function.

    To do this, a function of the following form, over all possible combinations of $ i, j, k $, was defined in MATLAB:

    $ C(r, g, b, \boldsymbol{c}) = \sum_{i,j,k} c_{ijk} r^i g^j b^k, \quad (i, j, k) \in \mathbb{N}_0^3, \; i + j + k \le 2 $

    The coefficients $ c_{ijk} $ were varied by simulated annealing so as to minimize, over all data points, the squared error against the density of points in the neighborhood:

    $ \min_{\boldsymbol{c}} \sum_{D} \left( C(r, g, b, \boldsymbol{c}) - p(r, g, b) \right)^2, $ where

    $ p(r, g, b) $ is the density of votes in the neighborhood of the point $ (r, g, b) $.
    After fitting the polynomial coefficients, the maximum was found by computing the derivative of the function $ -C $ with the symbolic toolbox and then finding the roots of the derivative. There were several roots, but only one fell within the allowed color range.

    Approximating with a quadratic function and finding its extremum gave a result similar to that of the normal distribution, with a little more blue and less red:
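
A rough Python sketch of the quadratic approach follows. Note what is swapped in: ordinary least squares (via the normal equations) replaces the simulated annealing, and a direct linear solve replaces the symbolic differentiation. The channels are assumed to be scaled to 0..1, and the target density is synthetic, with a known peak at (0.4, 0.45, 0.8); none of this is the author's data or code.

```python
import itertools

# Exponent triples (i, j, k) with i + j + k <= 2: ten monomials in total.
EXPONENTS = [e for e in itertools.product(range(3), repeat=3) if sum(e) <= 2]


def monomials(r, g, b):
    return [r ** i * g ** j * b ** k for i, j, k in EXPONENTS]


def solve(a, y):
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[idx]] for idx, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_quadratic(points, density):
    """Least-squares coefficients c_ijk, keyed by the exponent triple."""
    rows = [monomials(*p) for p in points]
    n = len(EXPONENTS)
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(n)]
           for i in range(n)]
    aty = [sum(row[i] * d for row, d in zip(rows, density)) for i in range(n)]
    return dict(zip(EXPONENTS, solve(ata, aty)))


def stationary_point(c):
    """Solve grad C = 0; for a quadratic this is a 3x3 linear system."""
    hessian = [
        [2 * c[2, 0, 0], c[1, 1, 0], c[1, 0, 1]],
        [c[1, 1, 0], 2 * c[0, 2, 0], c[0, 1, 1]],
        [c[1, 0, 1], c[0, 1, 1], 2 * c[0, 0, 2]],
    ]
    linear = [c[1, 0, 0], c[0, 1, 0], c[0, 0, 1]]
    return solve(hessian, [-v for v in linear])


# Synthetic dome-shaped density sampled on a 3x3x3 grid of scaled colors.
points = list(itertools.product((0.0, 0.5, 1.0), repeat=3))
density = [1 - (r - 0.4) ** 2 - (g - 0.45) ** 2 - (b - 0.8) ** 2
           for r, g, b in points]
coeffs = fit_quadratic(points, density)
peak = stationary_point(coeffs)
print([round(255 * x) for x in peak])
```

In this sketch the gradient is linear in (r, g, b), so the stationary point comes from a single linear solve. Whether that point is a maximum and lies inside the color range still has to be checked separately.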

    But sometimes you can skip thinking about approximations and just write a neural network and feed it all the data. There is no point in discussing the network's parameters: there were many variants, many architectures, and several different transformations of the variables. In the end we managed to extract from it a fairly stable opinion about the best color. It agreed with the previous estimates of color “goodness” in almost everything except green. Green it added generously.

    By the way, if we recast the density problem as classification, for example by treating ratings greater than 5 as the positive class and using a Gradient Boosting Classifier to find the regions of space where positive ratings are densest, the result is about the same.

    And here a problem showed up again that often shows up in real projects during the presentation of so-called “intermediate results”. It sounds like this:

    This is all very good, of course: neural networks, distributions. But somehow it's not what I want... I'd like it to be, you know, good.

    The interesting point here is that what is rated as beautiful in the photo is not necessarily what you want on your wall once you evaluate it again more carefully!

    The same happens in real projects: metrics that previously seemed objective and constructive to rely on suddenly lose their appeal once a result is obtained on them (even if, on the whole, the numbers come out well). And then only one thing remains: to find everything good in what we did and present some beautiful pictures that suit everyone.


    Let's draw, somehow beautifully, how people voted for colors. For example, we can make a 3-d scatter plot in MATLAB for HSV (255,255,255), where the colors that received votes are shown as balls of the corresponding color. The bigger the ball, the more votes. In our case, it looked like this:

    Sorry for not converting HSV to 360/100/100: I was too lazy, and in MATLAB everything is from 0 to 1.

    The bigger the ball, the more the color was liked.
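
For readers who do want the 360/100/100 scale, the conversion is a one-liner in most environments. The sketch below uses Python's standard `colorsys` module, which, like the plot above, works with values from 0 to 1. The helper name is an assumption made for this example.

```python
import colorsys


def rgb255_to_hsv(r, g, b):
    """Convert 0..255 RGB to HSV as (degrees, percent, percent)."""
    # colorsys expects and returns values in the 0..1 range.
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s * 100, v * 100


hue, saturation, value = rgb255_to_hsv(0, 0, 255)  # pure blue
print(round(hue), round(saturation), round(value))
```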

    From the projections of this plot it becomes obvious why the different methods gave the results they gave. For example:

    Chroma / Hue Projection

    From the Chroma / Hue projection it can be seen that if we fit a Gaussian or approximate with a quadratic function, the maximum will be somewhere between blue and violet. In that case the maximum lands at a value of around 100, and very bright colors are not selected. A neural network, whose fitted surface is fairly complex with many local extrema, is able to take the green part into account. Apparently there was also a pinkish peak somewhere, but a little lower, so it did not make it into the final result. And so on.

    Looking at the visualization revealed 3 notable pseudo-clusters.

    The averaging nature of all the crude methods happily merged light blue and dark blue into one medium-blue result. At the same time, common sense suggested that the room should not be painted in dark colors (even if they look good in the photo), so that cluster should be discarded. And nobody really liked green, because on a longer look it evoked all sorts of hospital and communal-apartment associations. So why was any math needed here at all, huh?

    In the remaining cluster, all that was left was to take the conditional mean and finish it off by hand.
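
Taking a conditional mean over one cluster amounts to filtering the votes and averaging what is left. In the sketch below the function names, the brightness threshold, the "green" hue band of roughly 60 to 180 degrees, and the sample votes are all hypothetical; the real cut-offs were chosen by eye from the plot.

```python
import colorsys


def cluster_mean(votes, keep):
    """Average the RGB votes for which keep(h, s, v) is true (HSV in 0..1)."""
    kept = [
        (r, g, b) for r, g, b in votes
        if keep(*colorsys.rgb_to_hsv(r / 255, g / 255, b / 255))
    ]
    if not kept:
        raise ValueError("no votes left in the cluster")
    return tuple(sum(channel) / len(kept) for channel in zip(*kept))


def light_and_not_green(h, s, v):
    # Hypothetical thresholds: drop dark colors and greenish hues.
    return v >= 0.6 and not (60 / 360 <= h <= 180 / 360)


# Hypothetical votes: two light violets, one dark blue, one green.
votes = [(200, 180, 230), (210, 190, 240), (40, 30, 90), (120, 200, 130)]
print(cluster_mean(votes, light_and_not_green))
```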

    At this point all stakeholders agreed that, on the whole, this was a compromise worth settling on.

    A couple of weeks later, reality did not deviate from expectations too much (although it turned out somewhat pinker than expected)...

    This pinkish color is, of course, questionable in itself, but we wanted it ourselves. Now all that remains is to buy furniture of a suitable color. Hmm. But why not think about automatically finding the optimal arrangement of furniture in the room? It's a typical convex optimization problem...


    Being on both the customer's side and the contractor's side at once, I was surprised to see that I could not come to an agreement with myself on a matter in which I was an interested party on both sides. I was also surprised to see that things I considered objective and directly derived from the data, and that satisfied me as the contractor, might not satisfy me as the customer.

    The madness is compounded by the fact that I produced the data myself, directly indicating in it what could potentially satisfy me. And apparently I could not produce it entirely adequately, for reasons that are not too clear even to me. For example, I gave high ratings to dark colors, although they clearly had no chance of being chosen in the end. Why? Because they looked nice in the picture. The real task was quite different, but choosing a color on the computer from more complex considerations was not realistic.

    Ultimately, the solution was hidden in the data I generated. But it was not very clear how to arrive at it properly without all this strange shamanic dancing, which corrected my own idea of what I needed (or absolutely did not need). I had looked at the visualization with the balls even before starting the search for the maximum with the different methods. But I saw the “clusters” in it only afterwards.
