Yandex Translate API: PHP and a small study of the service

After Google closed its translation APIs, finding an online machine translation service became an especially pressing problem.
There are many big-name translation services on the Internet: Promt, Pragma, etc. Simulating access to a service's pages from PHP and fetching the translation is not hard. But there is a catch: in response to a simple GET or POST request, almost all services return not the translation itself but the entire page in all its glory, starting with the DTD. As we say in Ukraine, “bad nemes.”
Analysis showed that only two services return just the translation in response to a request: Yandex and Bing from Microsoft.

Jumping well ahead, here are the scope and quirks of each:

Yandex is easier to use and translates well to and from Russian, but it has a drawback: it translates only from Russian or only into Russian. Translating from Ukrainian into English with Yandex in one operation is not possible.

Bing does not suffer from this, but:
- translations involving Russian or Ukrainian have a strong “accent” and always need editing
- using Bing in free mode has some limitations
- to use Bing you need a web application identifier, an appID. Getting one involves no legal difficulties, it is really just a registration, but it turns into a long and fascinating quest.


So, what tasks should the library / class solve for translation?

1. Obtaining the languages you can translate from and into, and their valid combinations
2. The translation of the text itself

One remark right away. Common sense says that translating “War and Peace” in one go will not work. Coming down to the technical level gives a sharper restriction: the Yandex translator uses GET requests, so, very roughly, you get about 2000 characters at a time and no more. That is not much, about two short paragraphs of text. Even a small publication on a site will exceed it.
Hence the following task:

3. Translation of large fragments of text.

Now imagine the task: a multilingual site. Calling the translator every time to translate interface elements and other texts on the site is, to put it mildly, unreasonable. Hence the task:

4. Caching.

Caching serves one more purpose: the Yandex translator is good but not perfect, especially given the richness of the Russian language. Often you will want to correct a translation, and for that it has to be stored somewhere.
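The idea behind task 4 can be sketched as follows. This is only an illustration, not the library's caching: the cachedTranslate() helper, the one-file-per-entry layout, and the cache directory are all assumptions, and the translator is passed in as a callable so the sketch does not depend on the real class.

```php
<?php
// Hypothetical helper, not part of Yandex_Translate: caches translations on disk.
// $translate is any callable taking ($from, $to, $text) and returning a string.
function cachedTranslate(callable $translate, $cacheDir, $from, $to, $text)
{
    // One file per (language pair, text); the hash keeps file names short and safe.
    $file = $cacheDir . '/' . $from . '-' . $to . '-' . md5($text) . '.txt';

    if (is_file($file)) {
        // Cache hit: the stored text is returned, including any manual corrections.
        return file_get_contents($file);
    }

    // Cache miss: ask the translator once and remember the answer.
    $result = call_user_func($translate, $from, $to, $text);
    file_put_contents($file, $result);

    return $result;
}
```

With the real class, the callable could be array($translator, 'yandexTranslate'). Because each entry is a plain text file, a translation can be corrected by hand, and the corrected version is what later calls return.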

So, Yandex.Translate
The sources are available in a Google repository and are documented in Russian.

1. Translation languages
The Yandex_Translate class contains three methods with self-explanatory names:
yandexGetLangsPairs() - gets the available FROM->TO language pairs
yandexGet_FROM_Langs()
yandexGet_TO_Langs()

Example (this one is complete; in the later examples the file include, class instantiation, output formatting, and so on are omitted):

include_once 'Yandex_Translate.php';
// Assumes the constructor takes no arguments
$translator = new Yandex_Translate();
$pairs = $translator->yandexGetLangsPairs();
print_r($pairs);

We get these combinations (by the way, they change from time to time):

[0] => en-ru
[1] => ru-en
[2] => ru-uk
[3] => uk-ru
[4] => pl-ru
[5] => ru-pl
[6] => tr-ru
[7] => ru-tr
[8] => de-ru
[9] => ru-de
[10] => fr-ru
[11] => ru-fr
[12] => it-ru
[13] => es-ru
[14] => ru-es

Note that every pair includes ru, as already mentioned above.

The other two methods return the source and target languages separately and can be used, for example, to build selects or other selection elements.
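As a rough illustration of feeding such lists into a select: the sketch below assumes that pairs arrive as "from-to" strings, as in the output shown earlier, and that a language list is a flat array of codes. The return format of yandexGet_FROM_Langs() is not shown in this article, so both helpers, fromLangs() and langSelect(), are hypothetical.

```php
<?php
// Hypothetical helper: collects the distinct source ("from") languages
// out of pairs written as "from-to", e.g. "en-ru".
function fromLangs(array $pairs)
{
    $langs = array();
    foreach ($pairs as $pair) {
        $parts = explode('-', $pair, 2);
        $langs[$parts[0]] = true; // array keys give uniqueness, first-seen order is kept
    }
    return array_keys($langs);
}

// Hypothetical helper: renders language codes as an HTML <select>.
function langSelect($name, array $langs)
{
    $html = '<select name="' . htmlspecialchars($name) . '">';
    foreach ($langs as $lang) {
        $code = htmlspecialchars($lang);
        $html .= '<option value="' . $code . '">' . $code . '</option>';
    }
    return $html . '</select>';
}

// Usage: a select with the options en, ru, uk
echo langSelect('from', fromLangs(array('en-ru', 'ru-en', 'ru-uk', 'uk-ru')));
```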

2. Translation
One method, three arguments: the source language, the target language, and the text to translate.
Note also the important eolSymbol property, the line ending. If it is set incorrectly, the output text will lose its formatting (see the comments in the source).

Example. The beginning of the text.txt file:

Mario Puzo
The Godfather
Dedicated to Anthony Cleary
BOOK ONE
Behind every great fortune lies a crime.

$text = file_get_contents('text.txt');
$translatedText = $translator->yandexTranslate('ru', 'uk', $text);
echo $translatedText;

The result of the script:

Marіo p'yuzo
Baptisms of the father
Privyachyatsya Entony Klіr
PERSHA BOOK
Behind the skin great camp zoozheny shout.


Note right away: the translation is good, but it needs editing.

3. Translation of large texts.
Large texts are translated with the help of the abstract class Big_Text_Translate.
The principle is as follows.
First, the text is split into sentences using the sentensesDelimiter delimiter, which defaults to a period.
It would be more correct, of course, to use a period followed by a space, but in real texts, comments for example, the space after a period often goes missing. The default therefore causes no problems in practice, and the property can be overridden.
Then the sentences are gathered into text fragments whose size does not exceed the symbolLimit value, 2000 by default.
The fragments are then ready for translation, with meaning and formatting preserved. The fragments are built by the static method toBigPieces, which returns an array.
The fromBigPieces method glues the translated fragments back into a whole text.
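The splitting described above can be sketched roughly like this. It is not the code of toBigPieces: the packSentences() function is a hypothetical stand-in that only shows the split-then-pack idea, and gluing the pieces back (the role of fromBigPieces) is a plain implode() here.

```php
<?php
// Hypothetical stand-in for the idea behind toBigPieces, not the library's code.
// Splits $text on $delimiter and packs the sentences into pieces of at most
// $limit bytes. strlen() counts bytes, which is conservative for UTF-8:
// a piece within the byte limit is also within the same limit in characters.
function packSentences($text, $delimiter = '.', $limit = 2000)
{
    $sentences = explode($delimiter, $text);
    $last      = count($sentences) - 1;
    $pieces    = array();
    $current   = '';

    foreach ($sentences as $i => $sentence) {
        // Give every sentence but the last its delimiter back,
        // so that gluing the pieces restores the original text exactly.
        $part = $sentence . ($i < $last ? $delimiter : '');

        if ($current !== '' && strlen($current . $part) > $limit) {
            $pieces[] = $current; // the current piece is full, start a new one
            $current  = '';
        }
        // A single sentence longer than $limit still becomes one oversized piece.
        $current .= $part;
    }

    if ($current !== '') {
        $pieces[] = $current;
    }

    return $pieces;
}

// Gluing back is a plain concatenation.
$pieces = packSentences('One. Two. Three.', '.', 10);
$glued  = implode('', $pieces);
```

Keeping the delimiter at the end of each sentence, rather than dropping it, is what lets the pieces be joined back without losing punctuation.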
Example:

$bigText = file_get_contents('text_big.txt');
$textArray = Big_Text_Translate::toBigPieces($bigText);

$numberOfTextItems = count($textArray);
$translatedArray = array();

foreach ($textArray as $key => $textItem) {
    // Show translation progress
    echo 'Translated fragment ' . $key . ' of ' . $numberOfTextItems;
    flush();

    $translatedItem = $translator->yandexTranslate('ru', 'uk', $textItem);
    $translatedArray[$key] = $translatedItem;
}

$translatedBigText = Big_Text_Translate::fromBigPieces($translatedArray);

echo $translatedBigText;

Run the example yourself: everything is in the repository.

Dear Habr readers! If this material is of interest, a continuation will follow, covering:
- caching translation results at several levels
- working with the Bing service
- a full demo: building a multilingual site.






