Nvidia and Microsoft have joined forces to create the Megatron-Turing Natural Language Generation (MT-NLG) model, which both companies claim is the “most powerful monolithic transformer language model trained to date”.
The AI model has 530 billion parameters and 105 layers, and was trained on heavyweight supercomputer hardware such as Nvidia's Selene. By comparison, the vaunted GPT-3 has 175 billion parameters.
“Each model replica spans 280 NVIDIA A100 GPUs, with 8-way tensor-slicing within a node, and 35-way pipeline parallelism across nodes,” the pair said in a blog post.
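The quoted figures fit together arithmetically: 8-way tensor slicing multiplied by 35-way pipeline parallelism yields the 280 GPUs per replica. A minimal sketch of that arithmetic (illustrative only, not Nvidia's code):

```python
# Parallelism arithmetic from the blog post's figures (illustrative sketch).
TENSOR_PARALLEL = 8      # GPUs sharing each layer's weights within a node
PIPELINE_PARALLEL = 35   # groups of layers staged across nodes

gpus_per_replica = TENSOR_PARALLEL * PIPELINE_PARALLEL
print(gpus_per_replica)  # 280, matching the quoted figure

# With 105 layers split evenly over 35 pipeline stages,
# each stage would hold 3 layers.
layers_per_stage = 105 // PIPELINE_PARALLEL
print(layers_per_stage)  # 3
```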
The model was trained on 15 datasets containing a total of 339 billion tokens, and the work demonstrated that larger models need fewer training examples to perform well on a task.
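For a rough sense of scale, the article's own figures can be compared directly (an illustrative calculation, not from the source):

```python
# Scale comparison using the numbers quoted in the article.
params_mtnlg = 530e9   # MT-NLG parameters
params_gpt3 = 175e9    # GPT-3 parameters
tokens = 339e9         # training tokens across the 15 datasets

# MT-NLG is roughly 3x the size of GPT-3.
print(round(params_mtnlg / params_gpt3, 2))

# Fewer than one training token per parameter.
print(round(tokens / params_mtnlg, 2))
```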
Read more: ZDNet