Guess the Celebrity

    Kinopoisk has a quiz called “Guess the Celebrity.” You have 10 seconds to name the actor (or director, screenwriter, or simply a famous person) shown in the photo. The rules are simple, but recognizing a person is not so easy, especially if you don’t know them. That is where the idea of "helping" myself was born.

    First we need to settle on an approach. The first thing that comes to mind is to find out each person’s ID and load their photo. Finding the ID is not difficult: Kinopoisk is constantly being updated, and one of the recent additions is autocomplete in the search box (previously the search redirected to another domain, s.kinopoisk.ru, which would have complicated the task even more). To search for people specifically, it uses queries of the form:

    www.kinopoisk.ru/handler_search_people.php?q={query}

    The response is nicely formatted JSON. Now that we have the person IDs, all that remains is to load the photos. To speed up loading, we will use the small copies of the photos, which are located at:

    st.kinopoisk.ru/images/sm_actor/{id}.jpg

    As you can see, the static files live on another domain (which will cause us problems later). We have all the data; it remains to add a few styles and wrap everything up as a user script:




    (function(){
    	function doMain(){
    		$('img[name="guess_image"]').css({"border":"1px solid black","margin":"10px 0 10px 0"});
    		$("#answer_table").parent().css({"background":"#f60","padding-left":"130px","padding-bottom":"30px"});
    		for (var i=0; i<4; ++i){
    			$('<div><img src="http://st.kinopoisk.ru/images/users/user-no-big.gif" \
    			class="cheet_image" width=52 height=82 /></div>')
    			.bind("click", function(){
    				$(".cheet_image").css({'box-shadow':'','border':''});
    			})
    			.bind("load", function() {
    				$(this).css({'box-shadow':'0 0 10px rgba(0,0,0,0.9)',"border":"1px solid red"});
    			})
    			.appendTo("#a_win\\["+i+"\\]");
    		}
    		$('img[name="guess_image"]').bind("load", function(){
    				doLoader(0);
    		});
    	}	
    	function doLoader(i){
    		$.getJSON(
    			"/handler_search_people.php",
    			{
    				q: $("#win_text\\["+i+"\\]").html()
    			},
    			function(data){
    				$(".cheet_image").eq(i)
    					.attr('src','http://st.kinopoisk.ru/images/sm_actor/'+data[0].id+'.jpg');
    				// there are only four answer options (0..3), so stop after the last one
    				if (i < 3) doLoader(i + 1);
    			}
    		);
    	}	
    	window.addEventListener('DOMContentLoaded', doMain, false);
    })();
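
    The two URL patterns used by the script can be wrapped in small helpers. This is only a sketch: the function names are hypothetical, the http:// scheme is copied from the script above, and the response shape (an array of objects with an id field) is assumed from the data[0].id access.

```javascript
// Hypothetical helpers; the URL patterns are the ones quoted in the article.
function searchUrl(query) {
  // encodeURIComponent keeps spaces and non-Latin names safe in the query string
  return "http://www.kinopoisk.ru/handler_search_people.php?q=" +
    encodeURIComponent(query);
}

function photoUrl(id) {
  return "http://st.kinopoisk.ru/images/sm_actor/" + id + ".jpg";
}

// Assumed response shape: an array of objects with an `id` field.
// Returns null when the search found nobody.
function firstPhotoUrl(data) {
  return (data && data.length > 0) ? photoUrl(data[0].id) : null;
}
```

    Returning null for an empty result lets the caller keep the placeholder image instead of failing on data[0].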
    

    Now, each time a new quiz image is loaded, the photos for the answer options are loaded as well.



    The drawback of this method is that we still have to do some work ourselves: recognize the photos visually. Ideally, only one action should be required of us: clicking the "start" button.

    Let's improve the script so that it compares the images and picks the correct option based on the comparison. First, let's try comparing image hashes. For that we need to make sure that the quiz image and its statically served counterpart are identical. We open the images in a hex editor and see that they are not.





    As you can see, the images are generated dynamically. The only way out is to compare the images pixel by pixel. This is where HTML5 comes to the rescue, in particular the <canvas> element. All we need to do is draw the image and call the getImageData(x, y, width, height) method. However, remember that the image is stored on another domain, and CORS access is out of the question.



    The way out is cross-window messaging: the postMessage() method and the message event. In a hidden frame we load the main page of the domain that hosts the photos, fetch the image itself, convert it to a base64 string, and send it to the parent window. Of course, there is another option: load the image, dynamically create a canvas element, and get the array of pixel values from it. Since the resulting object is not a plain Array but a Uint8ClampedArray (a simple array of 8-bit values), which did not have a join method at the time of writing, you would have to use JSON to serialize and deserialize the data. This is very expensive and performs worse than the first method, which is the one we will use.
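
    For comparison, the rejected alternative (passing raw pixel values as JSON) could look roughly like this. It is a minimal sketch with hypothetical function names, not code from the script.

```javascript
// Turn the pixel buffer into a JSON string that postMessage can carry as text.
function serializePixels(pixels) {
  // Copy into a plain Array first so JSON.stringify emits "[r,g,b,a,...]"
  // rather than an object keyed by index.
  return JSON.stringify(Array.prototype.slice.call(pixels));
}

// Rebuild the typed array on the receiving side.
function deserializePixels(text) {
  return new Uint8ClampedArray(JSON.parse(text));
}
```

    Even for a 25x40 thumbnail this means stringifying and parsing 4000 numbers per image, which illustrates why the base64 route is cheaper.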

    First of all, we need to get the base64-encoded image. We load the main page in the hidden frame and pass the image ID and the number of the answer option in the URL fragment (the part after #). Inside the frame, we load the desired image and build its base64 representation:

    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/images/sm_actor/'+hash[0]+'.jpg', false);
    xhr.overrideMimeType('text/plain;charset=x-user-defined');
    xhr.onload = function() {
        if (xhr.readyState == 4){
            var resp = xhr.responseText;
            var data = 'data:';
            data += xhr.getResponseHeader('Content-Type');
            data += ';base64,';
            var decodedResp = '';
            for(var i = 0, L = resp.length; i < L; ++i)
                decodedResp += String.fromCharCode(resp.charCodeAt(i) & 255);
            data += btoa(decodedResp);
        }
    };
    xhr.send(null);
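
    The conversion step on its own can be sketched as a pure function. The name toDataUri is hypothetical, and the sketch assumes a global btoa (available in browsers and in recent Node.js versions).

```javascript
// Build a data: URI from a response that was read with the
// 'text/plain; charset=x-user-defined' MIME override.
function toDataUri(responseText, contentType) {
  var bytes = "";
  for (var i = 0; i < responseText.length; ++i) {
    // With x-user-defined, each original byte sits in the low 8 bits
    // of the character code, so masking with 255 recovers it.
    bytes += String.fromCharCode(responseText.charCodeAt(i) & 255);
  }
  return "data:" + contentType + ";base64," + btoa(bytes);
}
```

    The mask matters because btoa throws on any character code above 255, and x-user-defined maps bytes 0x80 to 0xFF into the U+F780 to U+F7FF range.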
    

    When sending the image in Chrome, an unpleasant quirk surfaced: an image obtained this way is still subject to the cross-origin policy, and its data cannot be read from the canvas. The way out of this dead end is to inject a script into the page and send the image from there (and, as it turned out, even this does not work on the first try):

    if (typeof window.chrome == 'undefined')
    	window.parent.postMessage(hash[1]+"|"+data, "http://www.kinopoisk.ru/");
    else {
    	var scr = document.createElement("script");
    	scr.setAttribute('type','application/javascript');
    	scr.textContent = "window.parent.postMessage('"+hash[1]+"|"+data+"', 'http://www.kinopoisk.ru/');";
    	document.body.appendChild(scr);
    }
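
    Both branches send a string of the form "index|data". A minimal sketch of that convention, with hypothetical helper names, might look like this:

```javascript
// Sender side: prefix the data URI with the number of the answer option.
function packMessage(index, dataUri) {
  return index + "|" + dataUri;
}

// Receiver side: split at the first "|" only, so the payload is kept whole.
function parseMessage(message) {
  var sep = message.indexOf("|");
  return {
    index: parseInt(message.slice(0, sep), 10),
    dataUri: message.slice(sep + 1)
  };
}
```

    Base64 never contains "|", so the separator cannot collide with the payload.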
    

    Now the fun begins: image comparison. My first choice was the IM.js library (the name stands for Image Match; it has nothing to do with instant messengers). For reasons unknown, it refused to run for me, so I had to read up on image comparison. I settled on the simplest approach: the ΔE* metric in its simplest form, CIE76. Although CIE76 is defined in the LAB color space, we will apply it to ordinary RGB. This inevitably introduces errors, but even so the result is quite acceptable. Besides, converting RGB to LAB requires going through the intermediate XYZ space, which would introduce even larger errors. The essence of CIE76 is to compute the Euclidean distance between two colors.
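
    For reference, this is the standard CIE76 definition, followed by the RGB variant that the code uses (the symbol for the RGB distance is my own notation):

```latex
\Delta E^{*}_{ab} = \sqrt{(L^{*}_{2} - L^{*}_{1})^{2} + (a^{*}_{2} - a^{*}_{1})^{2} + (b^{*}_{2} - b^{*}_{1})^{2}}

% Applied directly to RGB channels, as in this article:
\Delta_{RGB} = \sqrt{(R_{2} - R_{1})^{2} + (G_{2} - G_{1})^{2} + (B_{2} - B_{1})^{2}}
```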



    In the code, it looks like this:

    // Takes as its parameter the 2D context of the image received from the frame
    function doDiff(context) {
    	var all_pixels = 25*40*4;	// length of the RGBA data array (4 values per pixel)
    	var changed_pixels = 0;
    	var first_data = context.getImageData(0, 0, 25, 40);
    	var first_pixel_array = first_data.data;
    	// get the data of the quiz image from a canvas
    	// that was created and drawn in advance
    	var second_ctx = $("#guess_transformed").get(0).getContext('2d');
    	var second_data = second_ctx.getImageData(0, 0, 25, 40);
    	var second_pixel_array = second_data.data;
    	for(var i = 0; i < all_pixels; i+=4) {
    		if (first_pixel_array[ i ] != second_pixel_array[ i ] ||	// R
    			first_pixel_array[i+1] != second_pixel_array[i+1] ||	// G
    			first_pixel_array[i+2] != second_pixel_array[i+2])		// B
    			{
    				changed_pixels+=Math.sqrt(
    					Math.pow( first_pixel_array[ i ] - second_pixel_array[ i ] , 2) +
    					Math.pow( first_pixel_array[i+1] - second_pixel_array[i+1] , 2) +
    					Math.pow( first_pixel_array[i+2] - second_pixel_array[i+2] , 2)
    				) / (255*Math.sqrt(3));
    			}
    	}
    	return 100 - Math.round(changed_pixels / all_pixels * 100);
    }
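
    The same arithmetic can be tried without a canvas by passing two RGBA arrays directly. The sketch below mirrors doDiff under that assumption; similarityPercent is a hypothetical name.

```javascript
// Compare two RGBA buffers of equal length (as returned by getImageData().data).
function similarityPercent(first, second) {
  var length = first.length;              // RGBA entries, 4 per pixel
  var maxDistance = 255 * Math.sqrt(3);   // largest possible RGB distance
  var total = 0;
  for (var i = 0; i < length; i += 4) {   // alpha (i+3) is ignored
    var dr = first[i] - second[i];
    var dg = first[i + 1] - second[i + 1];
    var db = first[i + 2] - second[i + 2];
    total += Math.sqrt(dr * dr + dg * dg + db * db) / maxDistance;
  }
  // As in doDiff, the divisor is the array length, not the pixel count.
  return 100 - Math.round(total / length * 100);
}

var black = [0, 0, 0, 255, 0, 0, 0, 255];
var white = [255, 255, 255, 255, 255, 255, 255, 255];
similarityPercent(black, black); // 100
similarityPercent(black, white); // 75
```

    Because each pixel contributes at most 1 while the divisor counts four entries per pixel, this variant cannot score below 75 even for opposite images.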
    

    Everything is ready. It remains to assemble all the parts into a user script and test it.



    As we can see, everything works. The most expensive part is downloading the images, which is why they are loaded sequentially (each one after the message event for the previous one arrives). When the images were loaded simultaneously, processing all four results sometimes took more than 10 seconds. The similarity percentage also deserves attention: it is never above 96% and never below 75%, even for completely different images.
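
    The sequential loading described above can be sketched as a small callback chain. The names loadSequentially and loadOne are hypothetical; in the real script the "finished" signal is the message event from the frame.

```javascript
// Load items one at a time: the next load starts only after the
// previous one has reported its result through the callback.
function loadSequentially(items, loadOne, done) {
  var results = [];
  function next(i) {
    if (i >= items.length) {
      done(results);
      return;
    }
    loadOne(items[i], function (result) {
      results.push(result);
      next(i + 1);
    });
  }
  next(0);
}
```

    A call would pass the four person IDs, a loader that points the hidden frame at the next image, and a final callback that starts the comparison.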

    The final chord of our opera will be the addition of automatic comparison and clicking on the desired button:

    // handler for the message event
    function doMessage(e) {
    	var data = e.data.split("|", 2);
    	var index = parseInt(data[0], 10);
    	// ...
    	if (index == 3)
    		$(document).trigger("cheetcompare");
    	//...
    }
    // in main, attach a handler for our cheating event
    function doMain(){
    	// ...
    	$(document).bind("cheetcompare", function(e){
    		var max = 0;
    		// hidden inputs that store the comparison results
    		var cheetd = $(".cheet_diff");
    		for(var i = 0; i < 4; ++i) {
    			// .val() returns strings, so compare them as numbers
    			max = (Number(cheetd.eq(max).val()) > Number(cheetd.eq(i).val())) ? max : i;
    		}
    		$("#a_win\\["+max+"\\]").trigger("click");
    	});
    	// ...
    }
    

    Alas, visual control could not be dropped entirely: from time to time the quiz shows a photo from the person's gallery rather than their avatar. Still, such cases are in the minority. A simple filter for deciding when to fall back on visual control is to look for a result with a similarity above 93. The script in action can be seen in the video.
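
    The similarity filter can be sketched as a pure function over the four scores. This is an illustration with a hypothetical name and a -1 return value of my own choosing, not part of the published script.

```javascript
// Return the index of the best-scoring option, or -1 when even the best
// score does not exceed the threshold (time to look at the photos yourself).
function pickAnswer(scores, threshold) {
  var best = 0;
  for (var i = 1; i < scores.length; ++i) {
    if (scores[i] > scores[best]) best = i;
  }
  return scores[best] > threshold ? best : -1;
}

pickAnswer([80, 95, 78, 82], 93); // 1
pickAnswer([80, 85, 78, 82], 93); // -1
```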



    The script was tested in Opera 12 and in Chrome 22 with Tampermonkey (if it doesn’t work, refresh the page; it doesn’t work the first time). In Firefox 16.0.1 the script refused to run: getImageData does not work on the requested image.

    You can download the script from userscripts.org: DOWNLOAD

    Literature

    1. Getting cross-domain data in Google Chrome through user script
    2. canvas same origin security violation work-around
    3. Canvas training
    4. Uint8ClampedArray
    5. IM.js: Quick image comparison pixel by pixel
    6. Comparing Images and Generating Difference Images in Ruby
    7. Color_difference formula



    UPD #1. As Monder rightly pointed out, a bug crept into the formula, namely in the divisor, which is the maximum color difference. You can visualize this as follows:



    If you picture the RGB space as a cube, the maximum color difference is the length of its main diagonal.
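
    For 8-bit channels, where each edge of the cube has length 255, the diagonal works out as follows (this matches the 255*Math.sqrt(3) divisor in doDiff; the symbol is my own notation):

```latex
d_{\max} = \sqrt{255^{2} + 255^{2} + 255^{2}} = 255\sqrt{3} \approx 441.67
```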



    Note that the spread of values has become more reasonable: 60% to 95%. The threshold of the visual filter can now be lowered to 90%. If no result reaches it, there is almost certainly no matching photo among the options and you have to guess yourself.

    UPD #2. Habr user nick4fake suggested a long-forgotten formula for normalizing values to the range 0...100:

    new = (old - min) / (max - min) * 100

    Applied to our task, it looks as follows: Y = (X - 55) / 38 * 100. The spread of values has become even more pronounced, especially for photos dominated by different tones (light versus dark); it is now about 30% to 90%.
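
    In code, the normalization is a one-liner. The sketch below uses hypothetical function names and takes min = 55 and max = 93 from the formula above (93 - 55 = 38).

```javascript
// Generic min-max normalization to the 0...100 range.
function normalize(value, min, max) {
  return (value - min) / (max - min) * 100;
}

// The article's constants: Y = (X - 55) / 38 * 100
function normalizeSimilarity(x) {
  return normalize(x, 55, 93);
}

normalizeSimilarity(55); // 0
normalizeSimilarity(74); // 50
normalizeSimilarity(93); // 100
```

    Inputs outside 55...93 map below 0 or above 100, so a caller that needs a strict percentage would have to clamp the result.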
