Using Yandex.Clean Web to protect against spam

Yandex has long offered a free service for detecting spam in messages, Yandex.Clean Web, but so far it has remained little known.

In this post I will show the basics of working with the Yandex.Clean Web API, using a simple PHP class as an example.

The service supports four methods: spam detection, obtaining a CAPTCHA, checking the entered CAPTCHA, and appealing a verdict of the spam detector. We will cover the first three.

For convenience, we will wrap all of this in a simple static class.

class YandexCW {
        public static $api_key          = '12345';
        /* API endpoint URLs */
        const check_data_url            = 'http://cleanweb-api.yandex.ru/1.0/check-spam';
        const get_captcha_url           = 'http://cleanweb-api.yandex.ru/1.0/get-captcha';
        const check_captcha_url         = 'http://cleanweb-api.yandex.ru/1.0/check-captcha';
        // ... the methods described below go here ...
}

Now let's implement the class methods. Depending on the method, the Clean Web API accepts GET or POST requests and returns the result as XML. So first we add a simple private method to the class for sending requests and reading responses. We will parse the responses with SimpleXML but do without cURL: the standard file_get_contents function can make both GET and POST requests through stream contexts.

    /* Send a request to the service and parse the XML response */
    private static function xml_query($url, $parameters = array(), $post = false)
    {
        // Add the API key unless the caller supplied one
        if (!isset($parameters['key'])) $parameters['key'] = self::$api_key;
        $parameters_query = http_build_query($parameters);
        if ($post) {
            $http_options = array(
                'http' => array(
                    'method'  => 'POST',
                    'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
                    'content' => $parameters_query
                )
            );
            $context = stream_context_create($http_options);
            $contents = file_get_contents($url, false, $context);
        } else {
            $contents = file_get_contents($url . '?' . $parameters_query);
        }
        if (!$contents) return false;
        return new SimpleXMLElement($contents);
    }

This method greatly simplifies working with the API. It adds the key automatically, builds the context for file_get_contents when a POST request is needed, and returns the response as a SimpleXML object (or false if nothing came back). The code hardly needs further commentary, so let's move on to the API methods themselves.
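To make the GET branch of xml_query() concrete, here is a small standalone sketch that reproduces only how the URL is built. It adds the key when absent and encodes the rest with http_build_query(). The function name build_get_url() is hypothetical and not part of the class, and '12345' is the same placeholder key used above.

```php
<?php
// Hypothetical helper mirroring the GET branch of xml_query():
// add the key if the caller did not pass one, then encode everything.
function build_get_url(string $url, array $parameters, string $api_key): string
{
    if (!isset($parameters['key'])) {
        $parameters['key'] = $api_key;
    }
    // http_build_query() URL-encodes values (spaces become '+')
    // and skips entries whose value is null
    return $url . '?' . http_build_query($parameters);
}

$url = build_get_url(
    'http://cleanweb-api.yandex.ru/1.0/check-captcha',
    array('captcha' => 'abc', 'value' => 'x y', 'id' => null),
    '12345'
);
echo $url, "\n";
// prints http://cleanweb-api.yandex.ru/1.0/check-captcha?captcha=abc&value=x+y&key=12345
```

Because null values are dropped, optional parameters such as id can simply be left as null instead of being removed from the array by hand.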

Checking messages for spam

First, we implement a method that sends the contents of a message to Yandex to be checked for spam. Before showing the code, one clarification is needed. According to the documentation of the check-spam method, it accepts the following parameters describing the message:

  • ip - The IP address of the sender.
  • email - The email address of the sender.
  • name - The sender name displayed in message signatures.
  • login - The name of the user account on the site.
  • realname - The user's full name, taken, for example, from their registration data.
  • subject-plain - The subject of the post in text/plain format.
  • subject-html - The subject of the post in text/html format.
  • subject-bbcode - The subject of the post in BBCode format.
  • body-plain - The content (body) of the comment or post in text/plain format.
  • body-html - The content (body) of the comment or post in text/html format.
  • body-bbcode - The content (body) of the comment or post in BBCode format.

Any combination of these fields can be sent, except that within the subject and body families only one format may be given: plain, html, or bbcode. None of the parameters is required. Therefore, instead of passing all this data to our method as separate positional parameters, we will pass a single array containing any subset of the fields.
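Because the class forwards the array as-is, a caller could check the "one format per family" rule before sending. The sketch below is a hypothetical helper, not part of YandexCW or of the API. It only encodes the rule as described above.

```php
<?php
// Hypothetical validation helper: returns false if more than one format
// (plain, html, bbcode) is present within the subject or body family.
function has_single_format_per_family(array $message_data): bool
{
    foreach (array('subject', 'body') as $family) {
        $count = 0;
        foreach (array('plain', 'html', 'bbcode') as $format) {
            if (isset($message_data[$family . '-' . $format])) {
                $count++;
            }
        }
        if ($count > 1) {
            return false; // e.g. both body-plain and body-html were given
        }
    }
    return true;
}

var_dump(has_single_format_per_family(array('email' => 'a@example.com', 'body-plain' => 'Hi')));
var_dump(has_single_format_per_family(array('body-plain' => 'Hi', 'body-html' => '<p>Hi</p>')));
```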

    /* Check a message for spam */
    public static function is_spam($message_data, $return_full_data = false)
    {
        // Fill in the sender's IP if the caller did not provide one
        if (!isset($message_data['ip'])) $message_data['ip'] = $_SERVER['REMOTE_ADDR'];
        $response = self::xml_query(self::check_data_url, $message_data, true);
        $spam_detected = (isset($response->text['spam-flag'])
                          && (string)$response->text['spam-flag'] === 'yes');
        if (!$return_full_data) return $spam_detected;
        return array(
                    'detected'      => $spam_detected,
                    'request_id'    => isset($response->id) ? (string)$response->id : null,
                    'spam_links'    => isset($response->links) ? $response->links : array()
        );
    }

This method sends data for checking and fills in the user's IP address automatically. Depending on the second parameter, it returns either a plain true or false, or an array with details: the list of links suspected of being spam and the request ID generated by Yandex. The request ID will come in handy later.
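The detailed result can be tried offline by feeding a sample response to the same logic. Treat the XML below as an assumption: its shape is inferred only from the elements is_spam() reads (id, text with a spam-flag attribute, links), and the url attribute on link is a guess. Check the official check-spam documentation for the real format. The function summarize_check_result() is hypothetical. Unlike is_spam(), it flattens the links into plain strings.

```php
<?php
// ASSUMED response shape, not taken from the Yandex documentation.
$sample = <<<XML
<check-spam-result>
  <id>request-123</id>
  <text spam-flag="yes"/>
  <links>
    <link url="http://spam.example.com/"/>
  </links>
</check-spam-result>
XML;

// Hypothetical helper: same checks as is_spam(), with values cast to strings
function summarize_check_result(SimpleXMLElement $response): array
{
    $detected = isset($response->text['spam-flag'])
        && (string)$response->text['spam-flag'] === 'yes';
    $links = array();
    if (isset($response->links)) {
        foreach ($response->links->link as $link) {
            $links[] = (string)$link['url']; // 'url' attribute is an assumption
        }
    }
    return array(
        'detected'   => $detected,
        'request_id' => isset($response->id) ? (string)$response->id : null,
        'spam_links' => $links,
    );
}

print_r(summarize_check_result(new SimpleXMLElement($sample)));
```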

Getting a CAPTCHA

Yandex offers its own CAPTCHA, and this has clear advantages. First, it reduces the load on our server. Second, keeping the CAPTCHA resistant to cracking becomes Yandex's concern. The method is extremely simple:

    /* Get a CAPTCHA */
    public static function get_captcha($id = null)
    {
        $response = self::xml_query(self::get_captcha_url, array('id' => $id));
        if (!$response || !isset($response->captcha)) return false;
        return array('captcha_id' => (string)$response->captcha, 'captcha_url' => (string)$response->url);
    }

As the return statement shows, the method returns the CAPTCHA ID and a link to the image itself.
The link usually has the following form:
u.captcha.yandex.net/image?key=<CAPTCHA ID>

It is better to keep and use both returned values rather than building the link from the ID yourself, so that the protection does not break if Yandex changes the link format.
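Here is a sketch of what using both values can look like in a form. The image URL is used exactly as returned, and the ID travels in a hidden field so it can be passed to the check later. The helper render_captcha_fields() and the field names captcha_id and captcha_value are hypothetical choices, and the array passed in is sample data, not a real API response.

```php
<?php
// Hypothetical rendering helper for the array returned by get_captcha().
function render_captcha_fields(array $captcha): string
{
    // Escape both values before placing them into HTML attributes
    $url = htmlspecialchars($captcha['captcha_url'], ENT_QUOTES);
    $id  = htmlspecialchars($captcha['captcha_id'], ENT_QUOTES);
    return '<img src="' . $url . '" alt="CAPTCHA">' . "\n"
         . '<input type="hidden" name="captcha_id" value="' . $id . '">' . "\n"
         . '<input type="text" name="captcha_value">';
}

echo render_captcha_fields(array(
    'captcha_id'  => 'abc123',
    'captcha_url' => 'http://u.captcha.yandex.net/image?key=abc123',
)), "\n";
```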

CAPTCHA Validation

Finally, the third method of the class validates the value the user entered for the CAPTCHA.
It takes the CAPTCHA ID issued by the previous method and the user's input. It is also worth passing the request ID we received when we sent the message for checking, although this is optional.

    /* Check the CAPTCHA */
    public static function check_captcha($captcha_id, $captcha_value, $id = null)
    {
        $parameters = array(
                            'captcha'   => $captcha_id,
                            'value'     => $captcha_value,
                            'id'        => $id  // omitted from the query when null
        );
        $response = self::xml_query(self::check_captcha_url, $parameters);
        return isset($response->ok);
    }

Usage examples

To try Clean Web in full, you can download a simple demo script. Before running it, remember to obtain your own Clean Web API key and put it into the script!
You can also download the class on its own or view its full source in the browser.

Checking the contents of the form:
// Use the class and set your own API key
YandexCW::$api_key = '12345';
// Send the form data for checking
$allowed_keys = array('email', 'name', 'login', 'realname', 'subject-plain', 'body-plain');
$post_data = array_intersect_key($_POST, array_fill_keys($allowed_keys, null));
$is_spam = YandexCW::is_spam($post_data, true);
// Print the check results
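One way to act on the detailed result is to accept clean messages and challenge suspected spam with a CAPTCHA. The request ID is carried along, since get_captcha() and check_captcha() both accept it. The function next_step() and its return strings are hypothetical. The sample arrays stand in for what is_spam($post_data, true) would return.

```php
<?php
// Hypothetical decision helper for the array returned by is_spam(..., true).
function next_step(array $result): string
{
    if (!$result['detected']) {
        return 'accept'; // save the message as usual
    }
    // Suspected spam: show a CAPTCHA and keep the request ID
    // so it can be passed to get_captcha() and check_captcha()
    return 'captcha:' . ($result['request_id'] ?? '');
}

echo next_step(array('detected' => false, 'request_id' => 'r1', 'spam_links' => array())), "\n";
echo next_step(array('detected' => true, 'request_id' => 'r1', 'spam_links' => array())), "\n";
```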


Most parameters of the API methods are optional.
For example, you can skip spam checking altogether and simply plug the Yandex CAPTCHA into your site, much as you would with reCAPTCHA.
More details are available at api.yandex.ru.
