Serving in machine learning refers to the process of deploying and using a trained machine learning model to make predictions or decisions on new input data. This process is an integral part of the machine learning pipeline, as it allows trained models to be applied to real-world problems and provide value to users. Serving typically follows the training and evaluation stages of a machine learning project.
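The training/serving split described above can be sketched with a toy "model". This is purely illustrative: the threshold classifier and the `train`/`serve` function names are assumptions for the example, not part of any real serving library.

```python
# A minimal sketch of the train-then-serve split, using a toy
# threshold "model" (all names here are illustrative).

def train(examples):
    """Training stage: learn a decision threshold from labeled data.

    `examples` is a list of (value, label) pairs with label 0 or 1.
    The "model" is just the midpoint between the two class means.
    """
    zeros = [v for v, y in examples if y == 0]
    ones = [v for v, y in examples if y == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return {"threshold": threshold}

def serve(model, value):
    """Serving stage: apply the trained model to a new, unseen input."""
    return 1 if value >= model["threshold"] else 0

model = train([(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)])
print(serve(model, 7.5))  # -> 1, classified with the learned threshold
```

The point is the separation of stages: `train` runs once on historical data, while `serve` runs repeatedly on inputs the model has never seen.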
Model deployment is the act of integrating a trained machine learning model into an application, system, or service so that it can make predictions or decisions. There are various deployment methods available, depending on the requirements and constraints of the application. Some common deployment strategies include:

- Batch (offline) prediction, where the model scores large sets of accumulated data on a schedule
- Real-time (online) prediction, where the model is exposed behind an API and answers individual requests with low latency
- Embedded or edge deployment, where the model runs directly on a device or inside the application itself
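The real-time API style of deployment can be sketched with nothing but the standard library. This is a minimal illustration, not a production pattern: the `/predict` path, the JSON payload shape, and the stand-in threshold model are all assumptions for the example.

```python
# Sketch of real-time (online) prediction: the model sits behind an
# HTTP endpoint and answers one request at a time. Stdlib only;
# endpoint path and payload shape are illustrative.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL = {"threshold": 5.0}  # stand-in for a trained model artifact

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Inference: compare the input feature to the learned threshold.
        prediction = 1 if payload["value"] >= MODEL["threshold"] else 0
        body = json.dumps({"prediction": prediction}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# Port 0 asks the OS for any free port; the server runs in a thread.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# A client sends new input data and receives a prediction back.
req = urllib.request.Request(
    f"http://127.0.0.1:{port}/predict",
    data=json.dumps({"value": 7.5}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
print(result)  # {'prediction': 1}
server.shutdown()
```

A batch deployment would instead loop the same inference logic over a stored dataset on a schedule; the model code is identical, only the way inputs arrive changes.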
Model serving systems are specialized software or infrastructure designed to manage and serve machine learning models in a production environment. These systems typically handle tasks such as:

- Model versioning, so new models can be rolled out and rolled back safely
- Scaling and concurrency, so many clients can request predictions at the same time
- Request batching, to use hardware efficiently under load
- Monitoring and logging, to verify that the model is healthy and performing as expected
Some popular model serving systems include TensorFlow Serving, NVIDIA Triton Inference Server, and MLflow.
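Two of the responsibilities listed above, versioning and request batching, can be sketched in a few lines. The `ModelRegistry` class and its method names are illustrative assumptions for this example; they are not the APIs of TensorFlow Serving, Triton, or MLflow.

```python
# Sketch of two serving-system responsibilities: model versioning
# (with rollback) and batched prediction. All names are illustrative.

class ModelRegistry:
    """Keeps multiple versions of a model and serves the active one."""

    def __init__(self):
        self._versions = {}
        self._active = None

    def register(self, version, predict_fn):
        self._versions[version] = predict_fn
        self._active = version  # newest registration becomes active

    def rollback(self, version):
        if version not in self._versions:
            raise KeyError(f"unknown version {version}")
        self._active = version

    def predict_batch(self, inputs):
        """Serve a whole batch of requests with the active version."""
        predict = self._versions[self._active]
        return [predict(x) for x in inputs]

registry = ModelRegistry()
registry.register("v1", lambda x: x >= 5.0)
registry.register("v2", lambda x: x >= 4.0)   # new model goes live
print(registry.predict_batch([3.0, 4.5]))     # -> [False, True], served by v2
registry.rollback("v1")                       # v2 misbehaves? roll back
print(registry.predict_batch([3.0, 4.5]))     # -> [False, False], served by v1
```

Real serving systems layer far more on top (health checks, metrics, GPU scheduling), but the version-and-rollback pattern shown here is the core idea behind safe model rollouts.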
Imagine you have a smart toy that can recognize shapes. Before it can do that, you have to teach it by showing it many examples of different shapes. This teaching process is called "training." Once the toy has learned what different shapes look like, you can use it to recognize shapes that it has never seen before. This part where the toy uses its knowledge to recognize new shapes is called "serving" in machine learning.
To make the smart toy work well for everyone, you need to put it in a special box (called a "serving system") that takes care of many important things. This box helps the toy remember different versions of its learning, makes sure it can handle many people asking it to recognize shapes at the same time, and checks that the toy is working well and not making mistakes.