    Some time ago we wrote on our blog about the ABBYY department at the Faculty of Innovation and High Technology at MIPT. It is, of course, one of our main points of contact with the next generation of IT professionals, but by no means the only one. Today we want to tell you about another student project, codenamed ABBYY Labs. Its first site is also MIPT (PhysTech).

    The idea behind student labs is simple: we assemble a team of students who solve real problems under the guidance of our specialists. At MIPT this happens as part of the annual “Innovation Workshop” course. The topic our students are working on has come up repeatedly in the comments on posts about new versions of FineReader. It is a sore point for every student, so it is no surprise that the project proved so popular: out of a wide range of proposals from companies, 20% of the students chose it. So, our team is developing a module for recognizing printed formulas!

    Our lab has 9 students from different faculties, and everything is organized the way it is in a real development team. The project is split into two subprojects: finding regions that look like formulas, and the recognition itself with export to TeX. Each subproject has an analyst and developers: three developers in “analysis” and four in “recognition”, including a lead developer. The project manager is a graduate student of our department. He not only runs the process but also helps the students understand how teamwork on complex technology projects is done. An HR specialist will help him with organizational matters. There is no separate tester role: the developers will do the testing themselves and write tests for their own classes. In addition, the product will be tested on a set of reference images with known recognition results.

    The tasks themselves are just as serious. Although the future product will rely on a number of ready-made libraries for working with images in various formats, for text recognition, and for image binarization, the students will still need to:
    • create a system of features for generating hypotheses about the presence of a formula in the image, as well as a system for combining and filtering these hypotheses;
    • develop a conceptual framework for checking formulas (a kind of “semantic dictionary”);
    • introduce a system of features and develop reference patterns for characters that are not supported by the SDK in use (because formulas contain more than just Greek and Latin letters);
    • come up with an algorithm for building a formula from the recognized characters;
    • develop export to TeX.
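
    To make the last two items more concrete, here is a minimal sketch of how recognized characters might be assembled into a TeX string. It is not the students' algorithm: the Glyph structure, the to_tex function, and the shift threshold are all hypothetical, and the sketch only handles a single line with simple superscripts and subscripts, judged by how far a character's vertical centre lies from that of the preceding baseline character.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """A recognized character with its bounding box (hypothetical structure)."""
    text: str      # TeX for the symbol, e.g. "x" or r"\alpha"
    left: float    # x coordinate of the left edge
    top: float     # y grows downward, as in image coordinates
    bottom: float


def to_tex(glyphs, shift=0.4):
    """Assemble a TeX string from the glyphs of a single formula line.

    A glyph is treated as a superscript (or subscript) when its vertical
    centre lies above (or below) the centre of the latest baseline glyph
    by more than `shift` times the height of the tallest glyph.
    The threshold value is an assumption, not a tuned parameter.
    """
    if not glyphs:
        return ""
    line_height = max(max(g.bottom - g.top for g in glyphs), 1e-6)
    out = []
    base = None   # latest glyph placed on the baseline
    level = 0     # 0 = baseline, +1 = superscript, -1 = subscript
    for g in sorted(glyphs, key=lambda g: g.left):
        centre = (g.top + g.bottom) / 2
        if base is None:
            new_level = 0
        else:
            base_centre = (base.top + base.bottom) / 2
            offset = (base_centre - centre) / line_height
            new_level = 1 if offset > shift else -1 if offset < -shift else 0
        if new_level != level:
            if level != 0:
                out.append("}")          # close the previous script group
            if new_level == 1:
                out.append("^{")
            elif new_level == -1:
                out.append("_{")
            level = new_level
        if new_level == 0:
            base = g
        out.append(g.text)
        if g.text.startswith("\\"):
            out.append(" ")              # keep TeX commands apart from what follows
    if level != 0:
        out.append("}")
    return "".join(out)


# Made-up bounding boxes for the characters of E = mc^2.
example = [
    Glyph("E", 0, 10, 30),
    Glyph("=", 12, 16, 24),
    Glyph("m", 24, 16, 30),
    Glyph("c", 36, 16, 30),
    Glyph("2", 46, 6, 16),
]
```

    With these made-up boxes, to_tex(example) yields E=mc^{2}. A real module would also have to deal with fractions, roots, nested indices, and multi-line layouts, which this sketch leaves out entirely.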

    It is too early to talk about any results from the young development team. So far they have only just started working through a “live” software development cycle. We wish the students success in going through every stage, from analyzing the task to “delivering” the finished result, while staying within the planned schedule and deadlines. We hope their experience will inspire other ABBYY Labs teams that will appear at universities across our country in the future.

    Dmitry Gritsan
    with the support of the HR service and the mobile department

