# SavedModel

> Source: https://aiwiki.ai/wiki/savedmodel
> Updated: 2026-06-25
> Categories: Developer Tools, MLOps
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

*See also: [Machine learning terms](/wiki/machine_learning_terms)*

A **SavedModel** is the standardized, language-neutral serialization format that [TensorFlow](/wiki/tensorflow) uses to package a complete, trained model (its computation graph, its weights, and any supporting assets) as a self-contained directory for sharing, serving, and long-term storage [1]. The official TensorFlow guide states it plainly: "A SavedModel contains a complete TensorFlow program, including trained parameters (i.e, `tf.Variable`s) and computation. It does not require the original model building code to run, which makes it useful for sharing or deploying with TFLite, TensorFlow.js, TensorFlow Serving, or TensorFlow Hub." [1] Because the format encodes the program as [Protocol Buffers](/wiki/protocol_buffers) rather than Python source, a SavedModel produced in Python can be loaded and executed by C++, Java, Go, or any client that speaks the protobuf wire format [1][4].

Introduced with TensorFlow 1.0 on 11 February 2017, SavedModel is the format that [TensorFlow Serving](/wiki/tensorflow_serving) consumes natively, that [Vertex AI](/wiki/vertex_ai) prediction endpoints accept for TensorFlow models, and that the [TensorFlow Lite](/wiki/tensorflow_lite) and [TensorFlow.js](/wiki/tensorflow_js) converters take as input [6][11]. Every module published on [TensorFlow Hub](/wiki/tensorflow_hub) is a SavedModel, loaded with a single `hub.load(url)` call [1].

## What is a SavedModel?

SavedModel is the standardized, language-agnostic serialization format used by TensorFlow to package a trained model for sharing, deployment, and long-term storage. A SavedModel directory contains a complete TensorFlow program: the computational graph encoded as a Protocol Buffers message, the values of every `tf.Variable` (weights, biases, optimizer state if requested), and one or more named signatures that declare the input and output tensor specs of callable entry points [1][4]. Because the format does not depend on the original Python source code, a SavedModel can be loaded and executed by Python, C++, Java, Go, or any other client that speaks the protobuf wire format [4].

The TensorFlow Serving documentation describes the core artifact this way: "`saved_model.pb` is the serialized `tensorflow::SavedModel`. It includes one or more graph definitions of the model, as well as metadata of the model such as signatures." [6] The same docs frame the format's job as saving a "snapshot" of the trained model to reliable storage so that it can be loaded later for inference [6].

SavedModel is the format that TensorFlow Serving consumes natively, that Vertex AI prediction endpoints accept for TensorFlow models, and that downstream tools such as the TensorFlow Lite converter and TensorFlow.js converter take as input [6][7]. It is also the format every TensorFlow Hub module is published in [1].

## When was SavedModel introduced?

SavedModel was introduced in TensorFlow 1.0, which Google released on 11 February 2017 [11]. The original design goal was to replace the older TensorFlow inference model format and unify several earlier serialization paths under one hermetic, language-neutral format that wrapped the existing `tf.train.Saver` and `MetaGraphDef` primitives [4]. By TensorFlow 1.1.0 (April 2017), Java had gained support for loading models exported through the SavedModel API, an early demonstration of the format's cross-language promise [11].

In the TensorFlow 1.x era, SavedModels were typically built with `tf.saved_model.Builder`, which let authors attach multiple `MetaGraphDef`s under different tag sets (commonly `serve` and `train`). When TensorFlow 2.0 shipped in 2019, the high-level `tf.saved_model.save(obj, path)` function became the standard entry point, and the `@tf.function` tracing system replaced the manual session and graph construction of TF 1.x [1].

The format has continued to evolve. TensorFlow 2.13 added `fingerprint.pb`, a small file containing 64-bit hashes that uniquely identify SavedModel contents and let downstream systems detect when a model has changed without parsing the full graph; the same release added the `tf.saved_model.experimental.read_fingerprint(export_dir)` API [5]. TensorFlow 2.15 added the experimental proto-splitting option (`SaveOptions(experimental_image_format=True)`) so that models whose graph definitions exceed protobuf's 2 GB limit can still be written to disk.

Keras, which ships with TensorFlow, has shifted its recommended whole-model format to the newer `.keras` zip archive (Keras v3), but the SavedModel directory format remains the canonical export target whenever the model needs to leave the Python process for production serving [3][12].

## What is inside a SavedModel directory?

A SavedModel is not a single file. It is a directory with a fixed layout that all TensorFlow loaders expect [4]:

| path | purpose |
| --- | --- |
| `saved_model.pb` | Serialized `SavedModel` protobuf wrapping one or more `MetaGraphDef`s, each holding a graph and its named signatures |
| `variables/variables.index` | Index file mapping variable names to byte offsets in the data shards |
| `variables/variables.data-00000-of-00001` | Binary checkpoint containing variable values; large models may be split into multiple shards |
| `assets/` | Auxiliary files referenced by graph ops, for example vocabulary lookup tables |
| `assets.extra/` | Optional files for higher-level tooling; never read by the TensorFlow loader |
| `fingerprint.pb` | Optional, TF 2.13+; 64-bit content hashes for integrity and lineage tracking |
| `keras_metadata.pb` | Optional; written when the source object is a Keras model, used to reconstruct the Python class |

The TensorFlow source README describes the two main subfolders directly: `assets` "contains auxiliary files such as vocabularies, etc.", while `assets.extra` is "a subfolder where higher-level libraries and users can add their own assets that co-exist with the model, but are not loaded by the graph" [4]. The `saved_model.pb` (or `saved_model.pbtxt`) file holds "the graph definitions as `MetaGraphDef` protocol buffers" [4].

The `variables/` subdirectory is a standard TensorFlow training checkpoint. The split between `saved_model.pb` (graph) and `variables/` (weights) lets large weight tensors stay outside the 2 GB protobuf limit, which is one reason the format has aged well [4].

The `fingerprint.pb` file, defined in the 2022 community RFC on SavedModel fingerprinting, is composed of several 64-bit hashes covering different slices of the model (its structure, graph, signatures, and weights) [5]. It is readable through `tf.saved_model.experimental.read_fingerprint()` and is used by Vertex AI and other serving stacks to deduplicate uploads and track lineage [5].
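
The layout described in this section can be checked with a few lines of standard-library Python. The sketch below is illustrative only: `describe_saved_model_dir` is a hypothetical helper, and treating `saved_model.pb` and `variables/` as the required entries is an assumption of the sketch rather than a rule taken from the TensorFlow loader (which also accepts `saved_model.pbtxt`).

```python
from pathlib import Path
import tempfile

# Entry names come from the layout table; which ones count as "required"
# is an assumption of this sketch.
REQUIRED = ("saved_model.pb", "variables")
OPTIONAL = ("assets", "assets.extra", "fingerprint.pb", "keras_metadata.pb")


def describe_saved_model_dir(export_dir):
    """Return (missing_required, present_optional) for a directory."""
    root = Path(export_dir)
    missing = [name for name in REQUIRED if not (root / name).exists()]
    present = [name for name in OPTIONAL if (root / name).exists()]
    return missing, present


# Build a stand-in directory so the sketch runs without TensorFlow installed.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "saved_model.pb").write_bytes(b"")
    (root / "variables").mkdir()
    (root / "assets").mkdir()
    missing, present = describe_saved_model_dir(root)
    print("missing required entries:", missing)
    print("optional entries present:", present)
```

A check like this only confirms that the expected names exist; it says nothing about whether the protobuf or the checkpoint shards are valid.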

## What are MetaGraphDef, SignatureDef, variables, and assets?

Four concepts do most of the work inside a SavedModel.

**MetaGraphDef** is the top-level protobuf message that wraps the GraphDef (the actual node and edge structure of the computation), the collection definitions, the asset file list, and any saver definitions [4]. A SavedModel may contain more than one MetaGraphDef, distinguished by tag sets such as `serve`, `train`, or `gpu`; the loader returns the meta graph whose tag set exactly matches the tags passed to the loader API [4]. In practice TF 2.x users almost always have exactly one, tagged `serve`.

**SignatureDef** is the named API of a SavedModel. Each signature points at a specific concrete function inside the graph and declares the dtype and shape of every input and output tensor, along with a method name such as `tensorflow/serving/predict` [6]. The default signature key is `serving_default`, accessible through the constant `tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY` [1]. TensorFlow Serving and Vertex AI both look up signatures by key when routing requests [1][6].

**Variables** are the model's parameters, trainable or not. They are not embedded in the graph protobuf; they are stored in the checkpoint files under `variables/`, which keeps the graph small and readable [1]. After loading a Keras-saved model, the restored object exposes them through standard attributes:

```python
loaded = tf.saved_model.load(path)
loaded.variables             # all variables
loaded.trainable_variables   # subset that was trainable at save time
loaded.signatures            # dict of named entry points
```

**Assets** are the fourth piece. If a graph op references an external file, for example a `tf.lookup.StaticVocabularyTable` initialized from a text file, that file is copied into `assets/` at save time and rewired so the loaded model finds it again [4].
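
To make the relationship between tag sets, signatures, and variables concrete, the following sketch saves a tiny module with one explicit signature and then inspects what the loader returns. The `Scaler` class, the input name `x`, and the output key `scaled` are hypothetical names chosen for this illustration.

```python
import tempfile
import tensorflow as tf


class Scaler(tf.Module):
    """Hypothetical module used only for this illustration."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(2.0)

    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32, name="x")])
    def __call__(self, x):
        # Returning a dict gives the signature a named output.
        return {"scaled": x * self.w}


export_dir = tempfile.mkdtemp()
module = Scaler()
tf.saved_model.save(
    module,
    export_dir,
    signatures={"serving_default": module.__call__.get_concrete_function()},
)

# Load the MetaGraphDef tagged "serve" and look up the signature by key.
loaded = tf.saved_model.load(export_dir, tags=["serve"])
signature_keys = list(loaded.signatures.keys())
infer = loaded.signatures["serving_default"]

print(signature_keys)
print(infer.structured_input_signature)  # keyword inputs as TensorSpecs
print(infer.structured_outputs)          # output names mapped to TensorSpecs

# Signatures take keyword arguments and return a dict of output tensors.
result = infer(x=tf.constant([1.0, 2.0]))
print(result["scaled"].numpy())
```

Passing `signatures=` explicitly, rather than relying on automatic discovery, keeps the exported key predictable.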

## How do you save a model with tf.saved_model.save?

The low-level API is a single call, `tf.saved_model.save(model, path_to_dir)`, and it works on any `tf.Module` subclass that exposes `tf.Variable` attributes and `@tf.function` decorated methods [1]:

```python
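import tensorflow as tf


class Counter(tf.Module):
    def __init__(self):
        super().__init__()
        # tf.Variable attributes are tracked and written to variables/.
        self.v = tf.Variable(1.0)

    # A fixed input_signature lets the method be traced at save time.
    @tf.function(input_signature=[tf.TensorSpec([], tf.float32)])
    def __call__(self, x):
        return x * self.v


tf.saved_model.save(Counter(), '/tmp/counter')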
```

For a Keras model the recommended path is:

```python
model = tf.keras.Sequential([...])
model.compile(...)
model.fit(...)
model.export('/tmp/my_model')   # writes a SavedModel directory
```

`model.export()` is the Keras API for producing an inference-only SavedModel suitable for serving; under Keras 3, which TF 2.16+ ships by default, it is the supported Keras route to a SavedModel [1]. The older `model.save('/tmp/my_model', save_format='tf')` call is a Keras 2 API and applies only to legacy code on TF 2.15 and earlier. If multiple entry points are needed (for example separate `serving_default` and `array_input` signatures), the `signatures=` argument of `tf.saved_model.save` accepts a dict of concrete functions:

```python
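# Assumes `model` is a tf.Module whose __call__ is a @tf.function without a
# fixed input_signature, so it can be traced once per TensorSpec, and that
# `path` is the export directory.
signatures = {
    'serving_default': model.__call__.get_concrete_function(
        tf.TensorSpec([None, 224, 224, 3], tf.float32)),
    'array_input': model.__call__.get_concrete_function(
        tf.TensorSpec([None], tf.float32)),
}
tf.saved_model.save(model, path, signatures=signatures)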
```

What gets saved: every `tf.Variable` attribute, every `@tf.function` decorated method (as one or more concrete functions), and any nested `tf.Module` instances [1]. What does not get saved: arbitrary Python attributes, plain Python functions, or any code outside `tf.function`. Only the traced graph survives the round trip [1].
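
The save rules above can be observed directly with a small round trip. In this sketch the `Doubler` class and its attribute names are hypothetical; it only checks which attributes are present on the object that `tf.saved_model.load` returns.

```python
import tempfile
import tensorflow as tf


class Doubler(tf.Module):
    """Hypothetical module; names are illustrative only."""

    def __init__(self):
        super().__init__()
        self.factor = tf.Variable(2.0)      # saved: tf.Variable attribute
        self.note = "plain Python string"   # not saved: ordinary attribute

    @tf.function(input_signature=[tf.TensorSpec([], tf.float32)])
    def __call__(self, x):                  # saved: traced tf.function
        return x * self.factor

    def describe(self):                     # not saved: plain Python method
        return "multiplies its input"


export_dir = tempfile.mkdtemp()
tf.saved_model.save(Doubler(), export_dir)
restored = tf.saved_model.load(export_dir)

# Record which attributes exist on the restored object.
survived = {
    "factor": hasattr(restored, "factor"),
    "note": hasattr(restored, "note"),
    "describe": hasattr(restored, "describe"),
}
value = float(restored(tf.constant(3.0)).numpy())
print(survived, value)
```

Logic that must survive export therefore belongs inside a `@tf.function`, not in a helper method or a plain attribute.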

## How do you load a SavedModel with tf.saved_model.load?

The low-level loader, `tf.saved_model.load(path_to_dir)`, returns a generic Trackable object whose signatures are callable [1]:

```python
loaded = tf.saved_model.load('/tmp/my_model')
infer = loaded.signatures['serving_default']
output = infer(input_1=tf.constant([[...]]))
```

For models saved through Keras 2, `tf.keras.models.load_model('/tmp/my_model')` reconstructs the original Keras class so that `model.fit`, `model.predict`, and the layer hierarchy continue to work [2][3]. Keras 3 no longer reads SavedModel directories through `load_model`; it can wrap one as an inference-only layer with `keras.layers.TFSMLayer`. The low-level loader gives back the raw signatures only and is the right choice when serving from C++, Go, or any non-Python runtime [1].

In C++ the corresponding entry point is `LoadSavedModel(session_options, run_options, export_dir, {kSavedModelTagServe}, &bundle)` from `tensorflow/cc/saved_model/loader.h`, which returns a `SavedModelBundle` ready for `Session::Run` calls [4].

## How do you inspect a SavedModel with saved_model_cli?

Every TensorFlow install ships a command-line tool, `saved_model_cli`, that reads a SavedModel directory and prints its tag sets, signatures, and tensor specs without requiring any Python code to be written [1]:

```
saved_model_cli show --dir /tmp/my_model --all
```

A typical fragment of the output looks like this:

```
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['serving_default']:
  inputs['input_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 224, 224, 3)
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1000)
  Method name is: tensorflow/serving/predict
```

The tool exposes two commands: `show`, which inspects tag sets and SignatureDefs, and `run`, which executes a signature against numpy files (`--inputs`), Python or numpy expressions (`--input_exprs`), or `tf.train.Example` protos (`--input_examples`), writing results to stdout or, with `--outdir`, to `.npy` files [1]. The `run` subcommand is useful for sanity-checking a model before pushing it to production.
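
When the CLI is driven from a script, it can help to assemble the argument lists in one place. The sketch below only builds and prints the commands; the helper names, the input key `input_1`, and the `/tmp/...` paths are hypothetical, and nothing is executed.

```python
import shlex


def cli_show_command(export_dir):
    """Arguments for `saved_model_cli show --all` on one directory."""
    return ["saved_model_cli", "show", "--dir", export_dir, "--all"]


def cli_run_command(export_dir, input_key, npy_path, outdir,
                    tag_set="serve", signature="serving_default"):
    """Arguments for `saved_model_cli run` with a single .npy input."""
    return [
        "saved_model_cli", "run",
        "--dir", export_dir,
        "--tag_set", tag_set,
        "--signature_def", signature,
        "--inputs", f"{input_key}={npy_path}",
        "--outdir", outdir,
    ]


# Hypothetical paths and input key, for illustration only.
show_cmd = cli_show_command("/tmp/my_model")
run_cmd = cli_run_command("/tmp/my_model", "input_1", "/tmp/batch.npy", "/tmp/out")

print(shlex.join(show_cmd))
print(shlex.join(run_cmd))
# To execute: subprocess.run(show_cmd, check=True)
```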

## How is a SavedModel different from a checkpoint?

A SavedModel and a checkpoint are related but solve different problems, and the difference confuses many TensorFlow users. A checkpoint stores only the values of a model's variables (the weights), keyed by the Python object graph that created them; it is meaningless without the original model-building code to reconstruct the architecture [1]. A SavedModel stores both the weights and the computation graph plus its signatures and assets, so it can run with no source code at all [1].

| dimension | checkpoint | SavedModel |
| --- | --- | --- |
| What it stores | Variable values only (the `variables/` shards) | Graph + variables + signatures + assets |
| Needs original Python code to run | Yes | No |
| Primary use | Resuming or fine-tuning training | Serving, deployment, sharing |
| File or directory | A set of checkpoint shards | A directory with `saved_model.pb` plus `variables/` |
| Cross-language | No (Python object graph) | Yes (protobuf graph) |

In fact, the `variables/` subfolder of a SavedModel is a standard TensorFlow [checkpoint](/wiki/checkpoint): a SavedModel is, in effect, a checkpoint wrapped together with the serialized graph and metadata that make it self-describing and portable [1][4]. Teams typically checkpoint frequently during training and export a SavedModel once, at the end, for [model deployment](/wiki/model_deployment).
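
The contrast can be sketched in a few lines, under the assumption of a hypothetical one-variable `Net` module. Restoring the checkpoint requires constructing `Net` again, loading the SavedModel does not, and the SavedModel's `variables/` prefix can be listed with the ordinary checkpoint utilities.

```python
import os
import tempfile
import tensorflow as tf


class Net(tf.Module):
    """Hypothetical one-variable model for illustration."""

    def __init__(self):
        super().__init__()
        self.v = tf.Variable(1.0)

    @tf.function(input_signature=[tf.TensorSpec([], tf.float32)])
    def __call__(self, x):
        return x * self.v


trained = Net()
trained.v.assign(5.0)  # stand-in for training
workdir = tempfile.mkdtemp()

# Checkpoint: weights only. Reading it back needs the Net class.
ckpt_path = tf.train.Checkpoint(model=trained).write(os.path.join(workdir, "ckpt"))
fresh = Net()
tf.train.Checkpoint(model=fresh).read(ckpt_path)
restored_from_ckpt = float(fresh.v.numpy())

# SavedModel: graph plus weights. Loading does not reference the Net class.
export_dir = os.path.join(workdir, "export")
tf.saved_model.save(trained, export_dir)
loaded = tf.saved_model.load(export_dir)
restored_from_saved_model = float(loaded(tf.constant(2.0)).numpy())

# The variables/ subfolder can be listed like any other checkpoint.
names = [name for name, _ in tf.train.list_variables(
    os.path.join(export_dir, "variables", "variables"))]
print(restored_from_ckpt, restored_from_saved_model, names)
```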

## How is a SavedModel served and converted?

SavedModel is the lingua franca of TensorFlow deployment. The table below summarizes the main consumers:

| target | how it consumes a SavedModel |
| --- | --- |
| [TensorFlow Serving](/wiki/tensorflow_serving) | Reads the directory directly; exposes gRPC on port 8500 and REST on port 8501; routes requests by signature name |
| [Vertex AI](/wiki/vertex_ai) prediction | Upload the directory as a Model resource; deploy to an Endpoint for online or batch prediction |
| [Triton Inference Server](/wiki/nvidia_triton_inference_server) | Loads via the TensorFlow backend; expects a `model.savedmodel/` subdirectory inside the model repository |
| [TensorFlow Hub](/wiki/tensorflow_hub) | Every published module is a SavedModel; `hub.load(url)` downloads and loads it |
| [TensorFlow Lite](/wiki/tensorflow_lite) converter | `tf.lite.TFLiteConverter.from_saved_model(path)` produces a `.tflite` flatbuffer for mobile and embedded |
| [TensorFlow.js](/wiki/tensorflow_js) converter | `tensorflowjs_converter --input_format=tf_saved_model` produces a sharded JSON + binary bundle for the browser |
| ONNX | `python -m tf2onnx.convert --saved-model path --output model.onnx` converts to [ONNX](/wiki/onnx) for cross-framework deployment |
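
As one worked instance of the table, the TensorFlow Lite row can be sketched end to end. The `Scale` module is a hypothetical stand-in for a trained model; the only converter call used is the `from_saved_model` constructor named in the table, followed by `convert()`.

```python
import tempfile
import tensorflow as tf


class Scale(tf.Module):
    """Hypothetical stand-in for a trained model."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(3.0)

    @tf.function(input_signature=[tf.TensorSpec([1], tf.float32)])
    def __call__(self, x):
        return x * self.w


export_dir = tempfile.mkdtemp()
module = Scale()
tf.saved_model.save(
    module,
    export_dir,
    signatures=module.__call__.get_concrete_function(),
)

# convert() returns the .tflite flatbuffer as a bytes object.
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_bytes = converter.convert()
print(len(tflite_bytes), "bytes")
```

Writing `tflite_bytes` to a file with a `.tflite` extension produces the artifact that mobile and embedded runtimes load.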

TensorFlow Serving expects each model version in a numbered subdirectory (`my_model/1/`, `my_model/2/`, and so on), and the ModelServer runs with `--port=8500` for gRPC and `--rest_api_port=8501` for REST [6]. A TensorFlow Serving REST request against a deployed SavedModel looks like this:

```
POST http://host:8501/v1/models/my_model/versions/1:predict
{
  "signature_name": "serving_default",
  "instances": [{"input_1": [[...]]}]
}
```

The `signature_name` field maps directly to the SignatureDef key embedded in `saved_model.pb`. If it is omitted, the server uses `serving_default` [6].
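
The same request can be assembled from Python with the standard library. This is a sketch under stated assumptions: `build_predict_request` is a hypothetical helper, the host, model name, and input values are placeholders, and the request is built but not sent.

```python
import json
import urllib.request


def build_predict_request(host, model_name, version, instances,
                          signature_name="serving_default", port=8501):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}/versions/{version}:predict"
    body = {"signature_name": signature_name, "instances": instances}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, model name, and input values.
request = build_predict_request(
    host="localhost",
    model_name="my_model",
    version=1,
    instances=[{"input_1": [0.1, 0.2, 0.3]}],
)
print(request.full_url)
print(request.data.decode("utf-8"))
# To send it against a running server: urllib.request.urlopen(request)
```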

## How does SavedModel compare to ONNX, SafeTensors, and other formats?

SavedModel sits in a crowded field of model serialization formats. The right comparison depends on whether the goal is portability, serving compatibility, security, or pure tensor storage.

| format | scope | code in file | typical ecosystem |
| --- | --- | --- | --- |
| SavedModel | Graph + variables + signatures + assets | Yes (graph ops, custom ops via shared libraries) | TensorFlow, Keras, TF Serving, Vertex AI, TF Hub |
| Keras `.keras` (v3) | Whole Keras model in a zip archive (config JSON + H5 weights) | No serialized code; relies on registered Keras classes | [Keras](/wiki/keras) 3.x, recommended for in-Python workflows |
| Keras `.h5` (legacy) | Keras model architecture and weights in a single HDF5 file | No | Older Keras code; deprecated for new projects |
| [ONNX](/wiki/onnx) | Cross-framework graph + weights | Yes (operator graph) | ONNX Runtime, TensorRT, OpenVINO, Triton |
| [SafeTensors](/wiki/safetensors) | Tensor weights only, with a JSON header | No code, no eval | Hugging Face Transformers, Diffusers, Llama, Gemma, Phi |
| PyTorch `.pt` / `state_dict` | Tensors and optionally full model object | Yes (Python pickle) | PyTorch, TorchServe |
| GGUF | Quantized LLM weights and metadata in one file | No | llama.cpp, Ollama, local inference |

A few observations follow from this table. SavedModel and ONNX both store the computation graph, which is what makes them portable across runtimes; SafeTensors and `state_dict` files store only the numbers and assume the loading process already knows the architecture. SafeTensors was specifically designed by Hugging Face to avoid the arbitrary code execution risk of pickle-based formats; it stores no code, only typed tensors with a JSON header, and was audited in 2023 before becoming the default for new Hugging Face uploads [9]. SavedModel is not vulnerable to pickle attacks the way `.pt` files are, but it can still execute custom ops if they are registered as shared libraries, so the official guidance is the same: do not load models from untrusted sources [10].
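
To illustrate what "typed tensors with a JSON header" means in practice, the sketch below builds a tiny SafeTensors-style byte string in memory and parses it with the standard library. The layout it assumes (an 8-byte little-endian length, then a JSON header with `dtype`, `shape`, and `data_offsets`, then raw tensor bytes) follows the published SafeTensors format description rather than anything stated in this article, and `read_header` is a hypothetical helper, not the `safetensors` library API.

```python
import json
import struct


def read_header(blob):
    """Parse the JSON header of a SafeTensors-style byte string."""
    (header_len,) = struct.unpack("<Q", blob[:8])  # little-endian uint64
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    return header, 8 + header_len  # header dict, offset where tensor bytes start


# Build a tiny in-memory file holding one float32 tensor of shape [2].
payload = struct.pack("<2f", 1.0, 2.0)
header_json = json.dumps(
    {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(payload)]}}
).encode("utf-8")
blob = struct.pack("<Q", len(header_json)) + header_json + payload

header, data_start = read_header(blob)
begin, end = header["w"]["data_offsets"]
values = struct.unpack("<2f", blob[data_start + begin:data_start + end])
print(header, values)
```

Nothing in the file is executed on load: the reader only decodes JSON and slices bytes, which is the property the comparison above relies on.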

## What are the trade-offs and modern uses of SavedModel?

SavedModel remains the default for production TensorFlow workloads. Most large-scale computer vision pipelines that started in TensorFlow still ship as SavedModels into TF Serving or Vertex AI, and TensorFlow Hub continues to publish modules in this format. Inside Google, SavedModel is also the format that TPU-based serving stacks expect.

The picture in the LLM world is different. Most modern large language model work happens in PyTorch, and the de facto distribution format is SafeTensors weights paired with a config JSON, with GGUF taking the niche of quantized local inference [10]. SavedModel sees little use for new LLM releases, partly because Google's JAX-based LLM stack (Flax, T5X, MaxText) tends to use Orbax checkpoints rather than SavedModel for training, and partly because PyTorch dominates the open-source release pipeline.

For teams that need to bridge the two worlds, the common path is `SavedModel -> ONNX -> target runtime`, using `tf2onnx` [7][8]. The reverse direction (`ONNX -> SavedModel`) is handled by `onnx-tf` or the newer `onnx2tf`, with the usual caveats about layout differences (TensorFlow defaults to NHWC, ONNX expects NCHW) and the occasional need to reimplement custom ops by hand.
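
The layout caveat comes down to an axis permutation. A minimal NumPy sketch, using a hypothetical batch of two 4x5 RGB images, shows the NHWC to NCHW conversion and its inverse:

```python
import numpy as np

# A hypothetical batch of 2 RGB images, 4x5 pixels, in NHWC order.
nhwc = np.arange(2 * 4 * 5 * 3, dtype=np.float32).reshape(2, 4, 5, 3)

# NHWC -> NCHW: move the channel axis (index 3) to position 1.
nchw = np.transpose(nhwc, (0, 3, 1, 2))

# NCHW -> NHWC: the inverse permutation.
round_trip = np.transpose(nchw, (0, 2, 3, 1))

print(nhwc.shape, "->", nchw.shape)
```

Converters typically insert transposes like these around layout-sensitive ops, which is one reason a converted graph can contain more nodes than the original.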

Limitations worth naming:

- The format is TensorFlow-centric. Reading it from non-TF runtimes requires either a TensorFlow runtime, a converter, or both.
- SavedModel directories are heavier than tensor-only formats like SafeTensors because they carry the full graph and any assets.
- The 2 GB protobuf limit on `saved_model.pb` historically required workarounds for very large graphs; the experimental proto-splitting option in TF 2.15+ addresses this.
- Custom Python logic outside `@tf.function` does not survive the save and load round trip, which sometimes surprises authors who expected behavior similar to pickling a Keras subclass [1].

None of these are fatal. For TF-native production serving, SavedModel is still the right answer, and the directory layout has stayed remarkably stable across nearly a decade of TensorFlow releases.

## Explain like I'm five

Imagine you built something cool out of LEGO and you want a friend to be able to rebuild it later, even if you forget how. So you put the finished model in a box, drop in all the spare bricks you used, and tape the building instructions to the lid. The box is a SavedModel. The instructions are the graph (`saved_model.pb`), the bricks are the weights (`variables/`), and any extra stickers or labels the model needs are the assets. Anyone with the box can rebuild and use the model, even if they speak a different programming language.

## References

1. TensorFlow. "Using the SavedModel format." https://www.tensorflow.org/guide/saved_model
2. TensorFlow. "Save and load models." https://www.tensorflow.org/tutorials/keras/save_and_load
3. TensorFlow. "Save, serialize, and export models." https://www.tensorflow.org/guide/keras/save_and_serialize
4. TensorFlow source. "python/saved_model README." https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md
5. TensorFlow community. "RFC: SavedModel fingerprinting." https://github.com/tensorflow/community/blob/master/rfcs/20220610-saved-model-fingerprinting.md
6. TensorFlow. "Serving a TensorFlow Model." https://www.tensorflow.org/tfx/serving/serving_basic
7. ONNX. "Getting Started Converting TensorFlow to ONNX." https://onnxruntime.ai/docs/tutorials/tf-get-started.html
8. tensorflow-onnx. "tf2onnx repository." https://github.com/onnx/tensorflow-onnx
9. Hugging Face. "Safetensors audited as really safe and becoming the default." https://huggingface.co/blog/safetensors-security-audit
10. Hugging Face. "Common AI Model Formats." https://huggingface.co/blog/ngxson/common-ai-model-formats
11. Wikipedia. "TensorFlow." https://en.wikipedia.org/wiki/TensorFlow
12. Keras. "Whole model saving and loading." https://keras.io/guides/serialization_and_saving/

