alizar October 25, 2012 at 03:06

Defining Dominant Colors: Python and the k-Means Method

^Assorium

On Habré several articles were published with algorithms and scripts for choosing the dominant colors in the image: 1 , 2 , 3 . In the comments to those articles, you can find links to a dozen similar programs and services. But there is no limit to perfection - and why not consider a method that seems most optimal? We are talking about using clustering using the k-means method (k-means).

Like many before him, American web developer Charles Leifer used the k-means methodfor clustering colors in the image. The idea of the method for clustering any data is to minimize the total quadratic deviation of the points of the clusters from the centers of these clusters. At the first stage, the starting points (centers of mass) are randomly selected and the affiliation of each element to a particular center is calculated. Then, at each iteration of the execution of the algorithm, the centers of mass are recalculated - until the algorithm converges.

The result is something like this. The points are colored, depending on the color of the cluster, black points represent the centers of mass.

As applied to images, each pixel is positioned in a three-dimensional RGB space, where the distance to the centers of mass is calculated. For optimization, images are reduced to 200x200 using the libraryPIL . It is also used to extract RGB values.

The code

from collections import namedtuple
from math import sqrt
import random
try:
    import Image
except ImportError:
    from PIL import Image
Point = namedtuple('Point', ('coords', 'n', 'ct'))
Cluster = namedtuple('Cluster', ('points', 'center', 'n'))
def get_points(img):
    points = []
    w, h = img.size
    for count, color in img.getcolors(w * h):
        points.append(Point(color, 3, count))
    return points
rtoh = lambda rgb: '#%s' % ''.join(('%02x' % p for p in rgb))
def colorz(filename, n=3):
    img = Image.open(filename)
    img.thumbnail((200, 200))
    w, h = img.size
    points = get_points(img)
    clusters = kmeans(points, n, 1)
    rgbs = [map(int, c.center.coords) for c in clusters]
    return map(rtoh, rgbs)
def euclidean(p1, p2):
    return sqrt(sum([
        (p1.coords[i] - p2.coords[i]) ** 2 for i in range(p1.n)
    ]))
def calculate_center(points, n):
    vals = [0.0 for i in range(n)]
    plen = 0
    for p in points:
        plen += p.ct
        for i in range(n):
            vals[i] += (p.coords[i] * p.ct)
    return Point([(v / plen) for v in vals], n, 1)
def kmeans(points, k, min_diff):
    clusters = [Cluster([p], p, p.n) for p in random.sample(points, k)]
    while 1:
        plists = [[] for i in range(k)]
        for p in points:
            smallest_distance = float('Inf')
            for i in range(k):
                distance = euclidean(p, clusters[i].center)
                if distance < smallest_distance:
                    smallest_distance = distance
                    idx = i
            plists[idx].append(p)
        diff = 0
        for i in range(k):
            old = clusters[i]
            center = calculate_center(plists[i], old.n)
            new = Cluster(plists[i], center, old.n)
            clusters[i] = new
            diff = max(diff, euclidean(old.center, new.center))
        if diff < min_diff:
            break
    return clusters

Examples

Identifying dominant colors is a pretty useful thing that can always be used. This is a palette choice for a website or some UI elements. For example, the Chrome browser uses the k-means method to select the dominant color from the favicon.

Tags:

Defining Dominant Colors: Python and the k-Means Method

The code

Examples

Also popular now: