IBM helps developers deploy AI and ML models on Kubernetes
Responding to user requests with predictions from an AI model -- 'model serving' -- is a key part of putting the technology to work. But as the number of models grows, serving them all becomes difficult, and many can end up rarely used or abandoned.
That's why IBM is introducing ModelMesh, a model-serving management layer for Watson products designed to cope with high-scale, high-density, and frequently changing model use cases. It intelligently loads and unloads AI models to and from memory, striking an optimized trade-off between responsiveness to users and computational footprint.
ModelMesh already underpins many of Watson's cloud services, including Watson Natural Language Understanding. It is open source and includes ModelMesh Serving, a controller for managing ModelMesh clusters via Kubernetes custom resources.
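To make the custom-resource workflow concrete, here is a rough sketch of what deploying a model through ModelMesh Serving can look like, following the KServe InferenceService conventions. The model name, storage key, and path are illustrative assumptions, not values from the source:

```yaml
# Hypothetical example: an InferenceService deployed in ModelMesh mode.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-model        # assumed name for illustration
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn                # format determines which runtime serves it
      storage:
        key: localMinIO              # assumed storage credential key
        path: sklearn/example.joblib # assumed path in object storage
```

Applying a manifest like this registers the model with the cluster; ModelMesh then decides on which pods, and when, the model is actually resident in memory.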
ModelMesh decides when and where to load and unload copies of models based on how recently they've been used and on current request volumes: if a particular model comes under heavy load, it is scaled out across more server pods. The system is designed to minimize the impact on runtime traffic while still giving urgent requests priority.
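The policy described above can be approximated by a least-recently-used cache with a usage counter. The following is a minimal illustrative sketch of that idea, not ModelMesh's actual implementation; the class name, capacity, and threshold are assumptions for the example:

```python
from collections import OrderedDict


class ModelCache:
    """Illustrative sketch of recency-based model loading: keep at most
    `capacity` models in memory, evict the least recently used one when
    full, and flag a model as a scale-out candidate once its request
    count reaches `scale_out_threshold`."""

    def __init__(self, capacity: int, scale_out_threshold: int):
        self.capacity = capacity
        self.scale_out_threshold = scale_out_threshold
        self.loaded = OrderedDict()  # model_id -> request count, in LRU order

    def request(self, model_id: str) -> bool:
        """Serve a request; return True if the model is hot enough that
        copies should be loaded onto additional server pods."""
        if model_id in self.loaded:
            self.loaded.move_to_end(model_id)      # mark most recently used
            self.loaded[model_id] += 1
        else:
            if len(self.loaded) >= self.capacity:
                self.loaded.popitem(last=False)    # unload least recently used
            self.loaded[model_id] = 1              # load the model
        return self.loaded[model_id] >= self.scale_out_threshold


cache = ModelCache(capacity=2, scale_out_threshold=3)
cache.request("sentiment")         # loads "sentiment"
cache.request("translation")       # loads "translation"
cache.request("sentiment")         # cache hit
hot = cache.request("sentiment")   # third hit -> scale-out candidate
cache.request("summarizer")        # evicts "translation", the LRU model
print(hot, list(cache.loaded))     # True ['sentiment', 'summarizer']
```

A real serving mesh would add request-rate windows, load/unload latency costs, and coordination across pods, but the core trade-off, memory footprint versus responsiveness, is the same.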
Model management decisions rely on decentralized logic, so no central controller is involved. ModelMesh also works with KServe, an industry-leading standardized model inference platform for trusted AI that has its origins in Kubeflow.