
Prediction of chemical reactions using machine translation algorithms

According to a study published by IBM Research (1), the prediction of chemical reaction outcomes can be markedly improved by treating a chemical reaction as a translation problem.
The idea of using computers to assist chemists is far from new. As early as 1969, Corey and Wipke (2) demonstrated that synthesis planning and retrosynthesis (the inverse problem, in which the product is known but a simple and cheap way to make it is not) can be performed by a machine.
With the advent of new machine learning techniques, the outcomes of chemical transformations can be predicted more accurately. In recent years, prediction methods based on reaction templates have been studied extensively. For example, Segler and Waller recently introduced a neural-symbolic approach (3). They extracted reaction rules from the commercial Reaxys database, trained a neural network on reaction fingerprints to prioritize these rules, and combined the network with Monte Carlo tree search (4) to overcome the scalability problems of other template-based methods.
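The prioritize-then-search loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Segler and Waller's code: the `Node` class, the rule names, the rewards and the exploration constant `c` are all hypothetical, and the selection rule is the PUCT variant of Monte Carlo tree search popularized by AlphaGo, in which the network's probability for a rule acts as a prior that steers exploration. The exact formula used in (3) may differ.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One state in a hypothetical retrosynthesis search tree."""
    prior: float                # network probability of the rule that produced this node (assumed)
    visits: int = 0             # number of simulations that passed through this node
    value_sum: float = 0.0      # accumulated rewards from those simulations
    children: dict = field(default_factory=dict)  # rule name -> child Node

    def mean_value(self) -> float:
        # Average reward; unvisited nodes start at 0.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c: float = 1.5) -> float:
    """Exploitation (mean reward) plus a prior-weighted exploration bonus."""
    exploration = c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.mean_value() + exploration


def select_rule(parent: Node, c: float = 1.5) -> str:
    """Pick the rule (edge) whose child has the highest PUCT score."""
    return max(parent.children, key=lambda rule: puct_score(parent, parent.children[rule], c))


def backpropagate(path: list, reward: float) -> None:
    """Propagate a simulation's reward back along the visited path."""
    for node in path:
        node.visits += 1
        node.value_sum += reward


# Toy example: three candidate rules ranked by a (hypothetical) policy network.
root = Node(prior=1.0, visits=10)
root.children = {
    "amide_coupling": Node(prior=0.7, visits=6, value_sum=3.0),
    "suzuki_coupling": Node(prior=0.2, visits=1, value_sum=0.9),
    "esterification": Node(prior=0.1),
}
chosen = select_rule(root)
backpropagate([root, root.children[chosen]], reward=1.0)
print(chosen)  # suzuki_coupling
```

Because the prior-weighted bonus is divided by the visit count, the heavily explored `amide_coupling` branch loses the advantage its high prior gave it, and the higher average reward of `suzuki_coupling` decides the choice. This balance between trusting the network and trying less-explored rules is what lets the search scale beyond exhaustive template application.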
To circumvent the limitations of template-based approaches and to advance machine reaction prediction further, the first template-free prediction approach appeared in 2012 (5). IBM researchers used a template-free method based on sequence-to-sequence (seq2seq) models to predict organic reactions and to perform retrosynthesis. A similar approach was recently published by Nam and Kim (6), who also used template-free seq2seq models. Their version was based on the TensorFlow translation model (v0.10.10.0) (7), from which they took the default values for most hyperparameters.

Figure: interface of the Found in Translation system by researchers at IBM Research (7).
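To make the translation framing concrete, the following Python sketch turns a reaction SMILES string of the form `reactants>reagents>products` into a (source, target) pair, the way a parallel corpus pairs a sentence with its translation. It assumes that the source side holds the reactants plus any reagents (keeping the `>` separator so the two can be told apart) and the target side holds the product, as in forward reaction prediction. The function name and the example reactions are illustrative and are not taken from (1).

```python
def reaction_to_translation_pair(rxn_smiles: str) -> tuple[str, str]:
    """Split 'reactants>reagents>products' into a (source, target) pair.

    Source = what goes into the flask (reactants, plus reagents if present);
    target = the product side, i.e. the "translation" the model must produce.
    """
    parts = rxn_smiles.strip().split(">")
    if len(parts) != 3:
        raise ValueError(f"expected 'reactants>reagents>products', got {rxn_smiles!r}")
    reactants, reagents, products = parts
    source = f"{reactants}>{reagents}" if reagents else reactants
    return source, products


# A tiny "parallel corpus": each reaction becomes one source/target sentence pair.
reactions = [
    "CC(=O)O.OCC>OS(=O)(=O)O>CC(=O)OCC",  # esterification with a sulfuric-acid catalyst
    "c1ccccc1.BrBr>>Brc1ccccc1",           # bromination written without a reagent field
]
corpus = [reaction_to_translation_pair(r) for r in reactions]
for src, tgt in corpus:
    print(src, "->", tgt)
```

Running it prints `CC(=O)O.OCC>OS(=O)(=O)O -> CC(=O)OCC` and `c1ccccc1.BrBr -> Brc1ccccc1`. A seq2seq model trained on many such pairs learns to "translate" the left-hand strings into the right-hand ones without any explicit reaction templates.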
The language of chemical formulas is the language in which people describe the chemical transformations and processes occurring in the world around them. Since it is a language invented by humans, it can be processed with algorithms similar to those used for machine translation. Building on this hypothesis, the IBM researchers represented chemical compounds as SMILES strings and proposed a new tokenization method that can be extended arbitrarily with new reaction information. They then trained a neural network architecture commonly used in machine translation on a dataset of 395,000 chemical reactions taken from a patent database. The article (1) reports 80% prediction accuracy without auxiliary data such as reaction templates, which is 6 percentage points higher than other prediction models. Moreover, the authors hope that the method will accelerate research such as drug development, and they expect to open online access to the system in 2018 (8).
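Before training, each SMILES string has to be split into tokens, and accuracy figures such as the 80% above are typically top-1 accuracies: the share of test reactions whose highest-ranked predicted product matches the recorded one. The Python sketch below illustrates both steps under stated assumptions. It uses a commonly used atom-level regular expression (bracket atoms such as `[NH4+]` and two-letter elements such as `Cl` and `Br` stay whole), which is not necessarily the tokenization scheme proposed in (1), and it compares predictions as plain strings, which presumes that both sides were canonicalized beforehand (in practice, for example, with RDKit). The names `tokenize_smiles` and `top_k_accuracy` are hypothetical.

```python
import re

# Atom-level SMILES tokenizer: bracket atoms, two-letter halogens, aromatic atoms,
# bonds, branches, ring-closure digits (including %NN), and the '>' separator.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; fail loudly on characters the regex misses."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def top_k_accuracy(predictions: list[list[str]], targets: list[str], k: int = 1) -> float:
    """Fraction of reactions whose target appears among the first k ranked predictions.

    Assumes every SMILES string is already canonical, so string equality
    means "same molecule".
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    hits = sum(target in ranked[:k] for ranked, target in zip(predictions, targets))
    return hits / len(targets)


# Tokens are joined with spaces, as a translation model expects word-like units.
print(" ".join(tokenize_smiles("CC(=O)OCC")))  # C C ( = O ) O C C

predicted = [["CC(=O)OCC", "CCOC(C)=O"], ["Oc1ccccc1", "Brc1ccccc1"], ["CCO"]]
actual = ["CC(=O)OCC", "Brc1ccccc1", "CC=O"]
print(top_k_accuracy(predicted, actual, k=1))  # 0.3333333333333333
print(top_k_accuracy(predicted, actual, k=2))  # 0.6666666666666666
```

The second candidate in the first list, `CCOC(C)=O`, is the same molecule (ethyl acetate) written differently, which is why canonicalizing every SMILES string before string comparison matters: without it, a chemically correct prediction could be scored as a miss.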
References:
1. Schwaller P, Gaudin T, Lanyi D, Bekas C, Laino T. “Found in Translation”: Predicting Outcomes of Complex Organic Chemistry Reactions using Neural Sequence-to-Sequence Models. arXiv:1711.04810 [cs, stat] [Internet]. 2017 Nov 13 [cited 2017 Dec 14]. Available from: arxiv.org/abs/1711.04810
2. Corey EJ, Wipke WT. Computer-Assisted Design of Complex Organic Syntheses. Science. 1969;166(3902):178–92.
3. Segler MHS, Waller MP. Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction. Chem Eur J. 2017 May 2;23(25):5966–71.
4. Monte Carlo tree search [Internet]. [cited 2017 Dec 14]. Available from: habrahabr.ru/post/282522
5. Kayala MA, Baldi P. Reaction Predictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J Chem Inf Model. 2012 Oct 22;52(10):2526–40.
6. Nam J, Kim J. Linking the Neural Machine Translation and the Prediction of Organic Chemistry Reactions. arXiv:1612.09529 [cs] [Internet]. 2016 Dec 29 [cited 2017 Dec 14]. Available from: arxiv.org/abs/1612.09529
7. Found in Translation: Neural Networks Predict Outcomes in Chemistry [Internet]. IBM Research Blog. 2017 [cited 2017 Dec 14]. Available from: www.ibm.com/blogs/research/2017/12/neural-networks-organic-chemistry/
8. IBM Research - Zurich. Found in Translation chemistry app [Internet]. 2017 [cited 2017 Dec 14]. Available from: www.zurich.ibm.com/foundintranslation