By ET Bureau - July 22, 2022
Run:ai, the leader in compute orchestration for AI workloads, today announced new features of its Atlas Platform, including two-step model deployment, which makes it easier and faster to get machine learning models into production. The company also announced a new integration with NVIDIA Triton Inference Server. These capabilities are particularly focused on helping organizations deploy and use AI models for inference workloads on NVIDIA-accelerated computing, so they can provide accurate, real-time responses. The features cement Run:ai Atlas as a single unified platform where AI teams, from data scientists to MLOps engineers, can build, train and manage models in production from one simple interface.
AI models can be challenging to deploy into production; despite the time and effort spent to build and train models, most never leave the lab. Configuring a model, connecting it to data and containers, and dedicating only the required amount of compute are major barriers to making AI work in production. Deploying a model usually requires manually editing and loading tedious YAML configuration files. Run:ai’s new two-step deployment makes the process easy, enabling organizations to quickly switch between models, optimize for economical use of GPUs, and ensure that models run efficiently in production.
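To illustrate the manual workflow that two-step deployment is meant to replace, the sketch below shows the kind of Kubernetes manifest teams typically hand-edit to serve a model. It is purely illustrative; the names, image, and resource settings are hypothetical and not taken from Run:ai's product.

```yaml
# Illustrative only: a hand-written manifest of the sort that model
# deployment traditionally requires. All names and the image are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sentiment-model
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sentiment-model
  template:
    metadata:
      labels:
        app: sentiment-model
    spec:
      containers:
      - name: model-server
        image: registry.example.com/sentiment-model:v3  # hypothetical image
        resources:
          limits:
            nvidia.com/gpu: 1  # pins an entire GPU to this single model
```

Every model change means re-editing and re-applying files like this, which is the friction Run:ai's two-step flow aims to remove.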
Run:ai also announced full integration with NVIDIA Triton Inference Server, which allows organizations to deploy multiple models, or multiple instances of the same model, and run them in parallel within a single container. NVIDIA Triton Inference Server is included in the NVIDIA AI Enterprise software suite, which is fully supported and optimized for AI development and deployment. Run:ai's orchestration works on top of NVIDIA Triton and provides auto-scaling, allocation and prioritization on a per-model basis, which right-sizes Triton automatically. Using Run:ai Atlas with NVIDIA Triton leads to increased compute resource utilization while simplifying AI infrastructure. The Run:ai Atlas Platform is an NVIDIA AI Accelerated application, indicating it is developed on the NVIDIA AI platform for performance and reliability.
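Triton's parallel serving is configured through its model repository: each model gets a directory with a `config.pbtxt` and versioned subfolders, and an `instance_group` setting controls how many copies run concurrently. A minimal sketch, assuming a hypothetical ONNX model (the name, batch size, and instance count are illustrative, not from the announcement):

```
# model_repository/
# └── resnet_onnx/
#     ├── config.pbtxt      <- this file
#     └── 1/
#         └── model.onnx
name: "resnet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8
instance_group [
  {
    count: 2        # two instances of this model run in parallel
    kind: KIND_GPU  # on the same GPU, inside one Triton container
  }
]
```

Run:ai's per-model auto-scaling and prioritization then sit above this layer, deciding how much GPU each Triton-hosted model actually receives.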
Running inference workloads in production requires fewer resources than training, which consumes large amounts of GPU compute and memory. Organizations sometimes run inference workloads on CPUs instead of GPUs, but this might mean higher latency. In many use cases for AI, the end user requires a real-time response: identification of a stop sign, facial recognition on a phone, or voice dictation, for example. CPU-based inference can be too slow for these applications.
Using GPUs for inference workloads gives lower latency and higher throughput, but this can be costly and wasteful when GPUs are not fully utilized. Run:ai's model-centric approach automatically adjusts to diverse workload requirements. With Run:ai, dedicating a full GPU to a single lightweight workload is no longer required, saving considerable cost while maintaining low latency.
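In Kubernetes terms, fractional allocation looks roughly like the pod sketch below, based on the fraction annotation described in Run:ai's public documentation. Treat the annotation name, scheduler name, and image as assumptions to verify against the installed Run:ai version.

```yaml
# Sketch of fractional GPU allocation via Run:ai's Kubernetes integration.
# Annotation and scheduler names should be checked against Run:ai's docs;
# the pod name and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: light-inference
  annotations:
    gpu-fraction: "0.5"  # request half a GPU rather than a whole device
spec:
  schedulerName: runai-scheduler  # Run:ai's scheduler enforces the fraction
  containers:
  - name: model-server
    image: registry.example.com/light-model:v1  # hypothetical image
```

Two such lightweight inference workloads could then share one physical GPU instead of each holding a full device idle.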
The release also brings additional new features to Run:ai Atlas for inference workloads.
“With new advanced inference capabilities, Run:ai’s Atlas Platform now offers a solution for the entire AI lifecycle, from build to train to inference, all delivered in a single platform,” said Ronen Dar, CTO and co-founder of Run:ai. “Instead of using multiple different MLOps and orchestration tools, data scientists can benefit from one unified, powerful platform to manage all their AI infrastructure needs.”
“The flexibility and portability of NVIDIA Triton Inference Server, available with NVIDIA AI Enterprise support, enable fast, simple scaling and deployment of trained AI models from any framework on any GPU- or CPU-based infrastructure,” said Shankar Chandrasekaran, senior product manager at NVIDIA. “Triton Inference Server’s advanced performance and ease of use together with orchestration from Run:ai’s Atlas Platform make it the ideal foundation for AI model deployment.”