Defining Dominant Colors: Python and the k-Means Method


    Assorium

    On Habré several articles were published with algorithms and scripts for choosing the dominant colors in the image: 1 , 2 , 3 . In the comments to those articles, you can find links to a dozen similar programs and services. But there is no limit to perfection - and why not consider a method that seems most optimal? We are talking about using clustering using the k-means method (k-means).

    Like many before him, American web developer Charles Leifer used the k-means methodfor clustering colors in the image. The idea of ​​the method for clustering any data is to minimize the total quadratic deviation of the points of the clusters from the centers of these clusters. At the first stage, the starting points (centers of mass) are randomly selected and the affiliation of each element to a particular center is calculated. Then, at each iteration of the execution of the algorithm, the centers of mass are recalculated - until the algorithm converges.

    The result is something like this. The points are colored, depending on the color of the cluster, black points represent the centers of mass.



    As applied to images, each pixel is positioned in a three-dimensional RGB space, where the distance to the centers of mass is calculated. For optimization, images are reduced to 200x200 using the libraryPIL . It is also used to extract RGB values.

    The code

    from collections import namedtuple
    from math import sqrt
    import random
    try:
        import Image
    except ImportError:
        from PIL import Image
    Point = namedtuple('Point', ('coords', 'n', 'ct'))
    Cluster = namedtuple('Cluster', ('points', 'center', 'n'))
    def get_points(img):
        points = []
        w, h = img.size
        for count, color in img.getcolors(w * h):
            points.append(Point(color, 3, count))
        return points
    rtoh = lambda rgb: '#%s' % ''.join(('%02x' % p for p in rgb))
    def colorz(filename, n=3):
        img = Image.open(filename)
        img.thumbnail((200, 200))
        w, h = img.size
        points = get_points(img)
        clusters = kmeans(points, n, 1)
        rgbs = [map(int, c.center.coords) for c in clusters]
        return map(rtoh, rgbs)
    def euclidean(p1, p2):
        return sqrt(sum([
            (p1.coords[i] - p2.coords[i]) ** 2 for i in range(p1.n)
        ]))
    def calculate_center(points, n):
        vals = [0.0 for i in range(n)]
        plen = 0
        for p in points:
            plen += p.ct
            for i in range(n):
                vals[i] += (p.coords[i] * p.ct)
        return Point([(v / plen) for v in vals], n, 1)
    def kmeans(points, k, min_diff):
        clusters = [Cluster([p], p, p.n) for p in random.sample(points, k)]
        while 1:
            plists = [[] for i in range(k)]
            for p in points:
                smallest_distance = float('Inf')
                for i in range(k):
                    distance = euclidean(p, clusters[i].center)
                    if distance < smallest_distance:
                        smallest_distance = distance
                        idx = i
                plists[idx].append(p)
            diff = 0
            for i in range(k):
                old = clusters[i]
                center = calculate_center(plists[i], old.n)
                new = Cluster(plists[i], center, old.n)
                clusters[i] = new
                diff = max(diff, euclidean(old.center, new.center))
            if diff < min_diff:
                break
        return clusters

    Examples













    Identifying dominant colors is a pretty useful thing that can always be used. This is a palette choice for a website or some UI elements. For example, the Chrome browser uses the k-means method to select the dominant color from the favicon.


    Also popular now: