Transcompilers are generally used for interoperability and to port codebases written in an obsolete or deprecated language, such as COBOL or Python 2, to a modern one such as Java or C++. They typically rely on handcrafted rewrite rules that are applied to the abstract syntax tree (AST) of the source code.
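To make the rule-based approach concrete, here is a minimal sketch that uses Python's standard `ast` module to translate a tiny subset of Python expressions into Java-like text. The rules themselves (the operator table, the `len(x)` to `x.length()` mapping, and the names `ExprToJava` and `translate`) are assumptions made up for illustration; they are not taken from any real transcompiler.

```python
import ast

# Hypothetical handcrafted rules: Python AST operator types -> Java operator tokens.
BIN_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


class ExprToJava(ast.NodeVisitor):
    """Translate a tiny subset of Python expressions into Java-like text."""

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_BinOp(self, node):
        op_type = type(node.op)
        if op_type not in BIN_OPS:
            raise NotImplementedError(f"no rewrite rule for {op_type.__name__}")
        return f"({self.visit(node.left)} {BIN_OPS[op_type]} {self.visit(node.right)})"

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        # Rule: Python booleans become Java's lowercase literals.
        if isinstance(node.value, bool):
            return "true" if node.value else "false"
        return repr(node.value)

    def visit_Call(self, node):
        # Rule (assumes the argument is a Java String): len(x) -> x.length()
        if isinstance(node.func, ast.Name) and node.func.id == "len" and len(node.args) == 1:
            return f"{self.visit(node.args[0])}.length()"
        args = ", ".join(self.visit(a) for a in node.args)
        return f"{self.visit(node.func)}({args})"

    def generic_visit(self, node):
        # Any node type without an explicit rule is rejected.
        raise NotImplementedError(f"no rewrite rule for {type(node).__name__}")


def translate(expr: str) -> str:
    """Parse a Python expression and apply the rewrite rules to its AST."""
    tree = ast.parse(expr, mode="eval")
    return ExprToJava().visit(tree)


print(translate("len(s) + 2 * n"))
```

Every construct needs its own rule, and anything not covered raises an error. This hints at why rule-based translators are costly to build and maintain.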
Unfortunately, the resulting translations often fall short: they lack readability, fail to respect the conventions of the target language, and usually require manual changes before they work properly. Such translation projects are therefore expensive, as they are time-consuming and require specialists in both the source and target languages.
Recently, the research team at Facebook announced a transcompiler named TransCoder that can convert code from one high-level language to another. This neural transcompiler is based on an unsupervised machine learning approach: it learns patterns from available source code alone, without parallel examples of translated code, and is reported to outperform rule-based baselines. The ability to convert code from one language to another is a real need today, as many companies still have systems written in a legacy language such as COBOL.
Many companies have spent millions of dollars upgrading their systems and rewriting them in newer high-level languages. The Commonwealth Bank of Australia, for example, spent around 750 million dollars to have its systems rewritten from COBOL to Java.
This shows both the impact of and the need for transcompilers that can convert code from one language to another. In theory, TransCoder could eliminate the need to rewrite code from scratch, but such a system is hard to build: every language has its own syntax, which makes it difficult to map the constructs of one language onto those of another.
TransCoder can translate between Java, C++, and Python using unsupervised machine learning. The model is first initialized with cross-lingual language model pretraining, which maps pieces of code that express the same instructions to the same representation, regardless of the programming language.
Traditional transcompilers convert source code into an abstract syntax tree (AST) and then apply handcrafted rules to generate code in the desired language. TransCoder instead uses a sequence-to-sequence (seq2seq) model, consisting of an encoder and a decoder with a transformer architecture. The model is trained using three techniques: cross programming language model pretraining, denoising auto-encoding, and back-translation.
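As an illustration of denoising auto-encoding, the sketch below corrupts a token sequence by randomly dropping, masking, and locally shuffling tokens. The model would then be trained to reconstruct the original sequence from the corrupted one. The function name `add_noise`, the `<MASK>` symbol, and all probabilities are assumptions chosen for the example, not the values used by TransCoder.

```python
import random

MASK = "<MASK>"  # placeholder symbol; the actual vocabulary entry is an assumption


def add_noise(tokens, rng, p_mask=0.15, p_drop=0.1, max_shift=3):
    """Return a corrupted copy of `tokens` (drop, mask, then local shuffle)."""
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_drop:
            continue                 # drop the token entirely
        elif r < p_drop + p_mask:
            noisy.append(MASK)       # hide the token behind a mask symbol
        else:
            noisy.append(tok)        # keep the token unchanged
    # Local shuffle: sort by (index + small random offset) so that tokens
    # stay close to their original position.
    keys = [i + rng.uniform(0, max_shift) for i in range(len(noisy))]
    return [tok for _, tok in sorted(zip(keys, noisy), key=lambda kt: kt[0])]


original = "def add ( a , b ) : return a + b".split()
corrupted = add_noise(original, random.Random(0))
# A training pair for the denoising objective: (input, target).
training_pair = (corrupted, original)
print(" ".join(corrupted))
```

Because the target is always the uncorrupted code, this objective needs only monolingual source code, which fits the unsupervised setting described above.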
Facebook AI Research (FAIR) used 852 parallel functions to test TransCoder and achieved an accuracy of 91.6% when translating from Java to C++, the highest among all six translation directions.
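Accuracy of this kind can be measured by behavior rather than by text match: a translation counts as correct if it returns the same outputs as the reference function on a set of test inputs. The sketch below simulates that idea with plain Python callables standing in for the reference and the translated function. The helper names and the three toy cases are assumptions for illustration; the real evaluation compiles and runs code in the target language.

```python
def same_behavior(reference, candidate, inputs):
    """True if `candidate` matches `reference` on every argument tuple."""
    for args in inputs:
        expected = reference(*args)
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False  # a crashing translation counts as a failure
    return True


def computational_accuracy(cases):
    """Fraction of (reference, candidate, inputs) cases that behave identically."""
    passed = sum(same_behavior(ref, cand, inputs) for ref, cand, inputs in cases)
    return passed / len(cases)


cases = [
    # Different text, same behavior: counted as correct.
    (lambda a, b: a + b, lambda a, b: b + a, [(1, 2), (0, 0)]),
    (lambda xs: max(xs), lambda xs: sorted(xs)[-1], [([3, 1, 2],)]),
    # Subtle bug: int(n / 2) truncates toward zero, n // 2 floors.
    (lambda n: n // 2, lambda n: int(n / 2), [(4,), (-3,)]),
]
print(f"{computational_accuracy(cases):.3f}")
```

The third case passes on the input 4 but fails on -3, which shows why the choice of test inputs matters for this metric.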
The translations generated by the best-performing version of TransCoder were often not identical to the reference solutions, yet they had high computational accuracy, meaning they produced the same outputs as the references when given the same inputs. This was attributed to the use of beam search, which keeps a set of partially decoded sequences at each step, extends and scores them, and lets the best-rated sequence rise to the top.
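Beam search can be sketched in a few lines. The version below keeps the `beam_size` highest-scoring partial sequences at each step and extends them token by token. The probability table `TABLE` and the function `toy_model` are a made-up stand-in for a real decoder, chosen so that the greedy choice (beam size 1) misses the best overall sequence.

```python
import math


def beam_search(next_log_probs, start, end, beam_size, max_len):
    """Return hypotheses as (log_prob, tokens), best first."""
    beams = [(0.0, [start])]
    finished = []
    for _ in range(max_len):
        # Extend every partial sequence with every possible next token.
        candidates = []
        for score, seq in beams:
            for tok, lp in next_log_probs(seq).items():
                candidates.append((score + lp, seq + [tok]))
        # Keep only the top `beam_size` extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            (finished if seq[-1] == end else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    return sorted(finished, key=lambda c: c[0], reverse=True)


# Toy next-token distribution (an assumption, not a trained model).
TABLE = {
    ("<s>",): {"a": math.log(0.6), "b": math.log(0.4)},
    ("<s>", "a"): {"x": math.log(0.5), "y": math.log(0.5)},
    ("<s>", "b"): {"z": math.log(0.9), "x": math.log(0.1)},
}


def toy_model(seq):
    # Any prefix not in the table ends the sequence with probability 1.
    return TABLE.get(tuple(seq), {"</s>": 0.0})


greedy = beam_search(toy_model, "<s>", "</s>", beam_size=1, max_len=5)[0]
wider = beam_search(toy_model, "<s>", "</s>", beam_size=2, max_len=5)[0]
print(greedy[1], wider[1])
```

With a beam size of 1 the search commits to `a` (probability 0.6) and ends with a sequence of probability 0.3, while a beam size of 2 also keeps `b` alive and finds the sequence `b z` with probability 0.36.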
Other technology companies are developing similar AI systems. One example is a model from OpenAI, demonstrated together with Microsoft, which converted comments written in English into working functions. The model was trained and tested on GitHub repositories.
Conclusion
Every organization wants an artificially intelligent system that can take care of the nitty-gritty details of programming language dependencies during code migration or codebase porting, because doing this work by hand is complicated and expensive. Facebook's TransCoder can translate one high-level language into another by recognizing the patterns specific to each language, using unsupervised machine learning. According to the test results, TransCoder can outperform the traditional commercial systems used for code translation.