Microsoft has given its translation software and services a major boost by adopting a new AI technology that significantly improves the quality of production translation models.
The software giant eventually aims to combine AI models for text, vision, audio and language through its larger XYZ-code initiative. As a component of this initiative, Z-code supports the creation of AI systems that are capable of speaking, seeing, hearing and understanding.
Microsoft has updated its Microsoft Translator software as well as its other Azure AI services with its new Z-code models. In order to get these models into production, the software giant is using Nvidia GPUs and Triton Inference Server to efficiently scale and deploy them.
It’s also worth noting that Microsoft Translator is the first machine translation provider to introduce Z-code Mixture of Experts models live for customers.
Z-code Mixture of Experts
Unlike previous AI models, Z-code models utilize a new architecture called Mixture of Experts (MoE) where different parts of the models can learn different tasks. As such, the models learn to translate between multiple languages simultaneously.
At the same time, newly introduced Z-code MoE models take advantage of transfer learning which enables efficient knowledge sharing across similar languages such as English and French. The models also use both parallel and monolingual data during the training process which allows for high quality machine translation beyond high-resource languages.
In October of last year, Microsoft announced in a blog post that Microsoft Translator is now capable of translating over 100 languages. To do this, the company used 200bn parameters supporting 100 language pairs. However, as training large models with billions of parameters is challenging, the Translator team worked together with Microsoft DeepSpeed to develop a high-performance system that it used to help train its massive scale Z-code MoE models.
Microsoft then partnered with Nvidia to optimize faster engines that can be used at runtime to deploy its new Z-code/MoE models on GPUs. For its part, Nvidia developed custom CUDA kernels that leveraged the CUTLASS and FasterTransformer libraries to implement MoE layers on a single V100 GPU.
Microsoft’s new Z-code models are now available by invitation to customers using its Document Translation feature that translates entire documents or even volumes of documents in a variety of different file formats while keeping their original formatting intact.