TensorFlow Serving

See also: Machine learning terms

Introduction

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. Developed by Google, it is part of the larger TensorFlow ecosystem, an open-source machine learning library used to develop, train, and deploy ML models. TensorFlow Serving provides a standardized interface for deploying and serving machine learning models, enabling easy integration with other applications and systems.

Architecture

TensorFlow Serving's architecture consists of several key components, designed to streamline the process of deploying and serving machine learning models.

Core Components

Servables: A servable is the central abstraction in TensorFlow Serving, representing a version of a model that can be used for inference. It is a broad concept that allows various types of models to be supported, such as TensorFlow graphs or custom algorithms.
Loaders: Loaders manage the life cycle of a servable, including loading, unloading, and updating. They ensure the proper allocation and deallocation of resources when a servable is in use.
Sources: Sources are responsible for finding and providing servables to the system. They can be configured to pull servables from different locations, such as local storage or a remote server.
Managers: Managers are responsible for handling the lifecycle of servables, including loading, unloading, and versioning. They also provide APIs for querying the status of servables and requesting inference.

Aspired Versions Mechanism

TensorFlow Serving utilizes an aspired versions mechanism, which allows the serving system to decide which versions of a model to load or unload based on a set of predefined policies. This mechanism helps in efficient resource management and ensures that the latest version of a model is always available for serving.

Benefits

TensorFlow Serving offers several benefits for deploying and serving machine learning models in production environments:

Scalability: Designed to serve multiple models concurrently, TensorFlow Serving can handle thousands of queries per second, making it suitable for large-scale deployments.
Flexibility: TensorFlow Serving supports various types of models and can be easily extended to accommodate custom algorithms and data types.
Model Versioning: TensorFlow Serving automatically manages model versioning, ensuring that the latest version of a model is always available for serving, without any downtime.
Ease of Integration: TensorFlow Serving provides a standardized API, allowing it to be easily integrated with other systems and applications, such as RESTful services or gRPC.

Explain Like I'm 5 (ELI5)

Imagine you have a bunch of toy robots that can do different tasks. You want to share them with your friends, but you need a way to easily pass the robots between you and your friends. TensorFlow Serving is like a toy box that helps you share your robots (machine learning models) with your friends (other applications and systems) easily and quickly. It also makes sure that the toy box always has the newest robot toys available for you to use.