p0is0n July 29, 2013 at 09:03

Picture Factory - How Does It Work? Part 2

I have finally gotten around to writing the second part, as promised in the first. In this part I want to talk about the client side of the project.

What is used:

As mentioned before, the project is written entirely in Python (with Cython inserts). All information about images, users, and statistics is stored in a MySQL database.

A Sphinx server handles both the main search and the filter. The Twisted client for it is txsphinx.

Redis is used for "likes", image view counts, and download counts. Redis also stores the top images (shown on the main page) and the “similar images” (shown on each image's own page). The Twisted client is txredis, which I found online and slightly modified for my needs (the modified version is not public yet).

Web: TwistedWeb with the Jinja2 template engine; the front end is built with Bootstrap and jQuery. Nginx sits at the end of the chain.

The interesting part:

The first (and most interesting) task was building the image filter. To start, a list of fields to search by was drawn up:

Categories
Minimum Image Resolution
Keywords
Colors

It was decided to build the filter on Sphinx. Indexing goes through xmlpipe2. The definition in the Sphinx config is very simple:

source images {
    type              = xmlpipe2
    xmlpipe_command   = bin/sphinx.py --indexer=images
}
index images {
    source            = images
    path              = /var/lib/sphinx/data/images
    morphology        = stem_enru
    charset_type      = utf-8
    min_word_len      = 2
    min_infix_len     = 3
    enable_star       = 1
    docinfo           = extern
    html_strip        = 1
    index_exact_words = 1
    expand_keywords   = 0
    wordforms         = images_wordforms.txt
}
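
The contents of bin/sphinx.py are not shown here, so the following is only a minimal sketch of the kind of xmlpipe2 stream such a script might print to stdout. The field and attribute names (title, tags, keywords, width, height, categories) and the row layout are assumptions made for illustration, not the project's actual schema.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical schema: these names are illustrative, not taken from the real script.
FIELDS = ("title", "tags", "keywords")
ATTRS = (("width", "int"), ("height", "int"), ("categories", "multi"))


def docset(rows):
    """Yield the lines of an xmlpipe2 stream for an iterable of dict rows."""
    yield '<?xml version="1.0" encoding="utf-8"?>'
    yield "<sphinx:docset>"
    # The schema is declared inside the stream itself.
    yield "<sphinx:schema>"
    for name in FIELDS:
        yield "<sphinx:field name=%s/>" % quoteattr(name)
    for name, kind in ATTRS:
        yield "<sphinx:attr name=%s type=%s/>" % (quoteattr(name), quoteattr(kind))
    yield "</sphinx:schema>"
    for row in rows:
        yield "<sphinx:document id=%s>" % quoteattr(str(row["id"]))
        for name in FIELDS:
            # Escape &, < and > so the text cannot break the XML.
            yield "<%s>%s</%s>" % (name, escape(row[name]), name)
        yield "<width>%d</width>" % row["width"]
        yield "<height>%d</height>" % row["height"]
        # MVA values are written as a comma-separated list of integers.
        yield "<categories>%s</categories>" % ",".join(str(c) for c in row["categories"])
        yield "</sphinx:document>"
    yield "</sphinx:docset>"
```

A script used as xmlpipe_command would simply print each line, for example `for line in docset(rows): print(line)`, where rows come from the database.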

Categories: an MVA attribute holding the list of category IDs. In addition, there is a text field with the list of category names (so that searches match category names correctly and they add weight to the results).

Minimum Image Resolution: two attributes, width and height. This is simple too: each attribute is filtered by a range, from the value set by the user up to a maximum (the magic number 10000).
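
The range logic can be sketched as below. This is an illustration only: the function name and the clamping of out-of-range input are assumptions, and how the resulting ranges are handed to the search client is not shown. With the stock sphinxapi Python client, each tuple would correspond to one SetFilterRange(attribute, min, max) call.

```python
MAX_SIDE = 10000  # the "magic number" upper bound from the text


def resolution_ranges(min_width, min_height, max_side=MAX_SIDE):
    """Return (attribute, low, high) tuples for the width and height filters."""
    def clamp(value):
        # Assumption: keep user input inside [0, max_side] so low <= high always holds.
        return max(0, min(int(value), max_side))

    return [
        ("width", clamp(min_width), max_side),
        ("height", clamp(min_height), max_side),
    ]
```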

Keywords: three text fields: title, tags, and keywords. Title is the image title; a hit here gets the maximum weight. Tags is the list of image tags, with medium weight. Keywords is a set of keywords the user never sees, taken from the image's page; it may contain garbage, so it gets a low weight.

Colors: this was the hardest part, so I will go into more detail. A color palette {ID => RGB} was created. When an image is added to the database, we extract its dominant colors and match them to our palette. Each image color is stored in the database as two values: the color ID and the percentage of the image it occupies. The index has ten MVA attributes “c_X”, where X is a number from 0 to 9. All colors of the image go into c_0, colors with percent >= 10 go into c_1, colors with percent >= 20 go into c_2, and so on.
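
The bucketing rule above can be sketched like this. The function names are hypothetical, and the palette matching uses squared Euclidean distance in RGB purely as an assumption, since the actual matching method is not described.

```python
def nearest_palette_id(rgb, palette):
    """Return the palette ID closest to rgb (squared RGB distance; an assumed metric)."""
    return min(
        palette,
        key=lambda pid: sum((a - b) ** 2 for a, b in zip(rgb, palette[pid])),
    )


def color_buckets(colors, levels=10):
    """Map {color_id: percent} to the MVA attributes {"c_0": [...], ..., "c_9": [...]}.

    c_0 receives every color; c_X receives colors with percent >= X * 10.
    """
    buckets = {"c_%d" % level: [] for level in range(levels)}
    for color_id, percent in sorted(colors.items()):
        for level in range(levels):
            if percent >= level * 10:
                buckets["c_%d" % level].append(color_id)
    return buckets
```

For an image with colors {2: 35, 5: 12, 9: 4}, all three IDs land in c_0, IDs 2 and 5 in c_1, and only ID 2 in c_2 and c_3.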

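Filter by color: when searching by color, all images that have the color in c_1 are selected, and then the color's weight is computed. The weight is the number of attributes from c_1 to c_9 that contain the color, so images where the color covers more area rank higher. For a search by the color with ID 2 (pseudo-code):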

setSelect('(IN(c_1,2)*1) + (IN(c_2,2)*1) + (IN(c_3,2)*1) + (IN(c_4,2)*1) + (IN(c_5,2)*1) + (IN(c_6,2)*1) + (IN(c_7,2)*1) + (IN(c_8,2)*1) + (IN(c_9,2)*1) AS colors_weight')
setOrder('colors_weight DESC')
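
Since the expression differs only in the color ID, it can be generated rather than written by hand. A minimal sketch, with a hypothetical helper name, that reproduces the select string from the pseudo-code:

```python
def colors_weight_expr(color_id, levels=range(1, 10)):
    """Build the select expression that sums the c_1..c_9 hits for one color ID."""
    # int() guards against anything but a numeric ID ending up in the expression.
    terms = ["(IN(c_%d,%d)*1)" % (level, int(color_id)) for level in levels]
    return " + ".join(terms) + " AS colors_weight"
```

An image in which color 2 covers 35% appears in c_1, c_2, and c_3, so its colors_weight is 3.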

The color search may not be done in the most optimal way, but it is the best approach I have come up with.

Summary:

I am happy with the filter speed: it currently takes about 50-80 milliseconds on 70,000 images. If anything else about the project interests you, please ask and I will be glad to answer. Once again, the project itself: http://picsfab.com
