SavedModel is the standardized, language-agnostic serialization format TensorFlow uses to package a trained model for sharing, deployment, and long-term storage. A SavedModel directory contains a complete TensorFlow program: the computational graph encoded as a Protocol Buffers message, the values of every tf.Variable (weights, biases, and optimizer state if requested), and one or more named signatures that declare the input and output tensor specs of callable entry points. Because the format does not depend on the original Python source code, a SavedModel can be loaded and executed from Python, C++, Java, Go, or any other language with TensorFlow runtime bindings.
The official guide states it directly: "A SavedModel contains a complete TensorFlow program, including trained parameters and computation. It does not require the original model building code to run, which makes it useful for sharing or deploying with TFLite, TensorFlow.js, TensorFlow Serving, or TensorFlow Hub."
SavedModel is the format that TensorFlow Serving consumes natively, that Vertex AI prediction endpoints accept for TensorFlow models, and that downstream tools such as the TensorFlow Lite converter and TensorFlow.js converter take as input. It is also the format every TensorFlow Hub module is published in.
SavedModel was introduced in TensorFlow 1.0, which Google released on 11 February 2017. The original design goal was to replace the older SessionBundle export format used by TensorFlow Serving and to unify several earlier serialization paths under one hermetic, language-neutral format that wrapped the existing tf.train.Saver and MetaGraphDef primitives.
In the TensorFlow 1.x era, SavedModels were typically built with tf.saved_model.Builder, which let authors attach multiple MetaGraphDefs under different tag sets (commonly serve and train). When TensorFlow 2.0 shipped in 2019, the high-level tf.saved_model.save(obj, path) function became the standard entry point, and the @tf.function tracing system replaced the manual session and graph construction of TF 1.x.
The format has continued to evolve. TensorFlow 2.13 added fingerprint.pb, a small file containing 64-bit hashes that uniquely identify SavedModel contents and let downstream systems detect when a model has changed without parsing the full graph. TensorFlow 2.15 added the experimental proto-splitting option (SaveOptions(experimental_image_format=True)) so that models whose graph definitions exceed protobuf's 2 GB limit can still be written to disk.
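Opting in is a single save option. A minimal sketch, assuming TF 2.15+ and a model object already in scope (the output path is a placeholder):

```python
import tensorflow as tf

# experimental_image_format is the proto-splitting option described above;
# it lets graph definitions larger than protobuf's 2 GB limit be written to disk.
options = tf.saved_model.SaveOptions(experimental_image_format=True)
tf.saved_model.save(model, '/tmp/big_model', options=options)
```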
Keras, which ships with TensorFlow, has shifted its recommended whole-model format to the newer .keras zip archive (Keras v3), but the SavedModel directory format remains the canonical export target whenever the model needs to leave the Python process for production serving.
A SavedModel is not a single file. It is a directory with a fixed layout that all TensorFlow loaders expect:
| path | purpose |
|---|---|
| saved_model.pb | Serialized MetaGraphDef protobuf containing the graph and named signatures |
| variables/variables.index | Index file mapping variable names to byte offsets in the data shards |
| variables/variables.data-00000-of-00001 | Binary checkpoint containing variable values; large models may be split into multiple shards |
| assets/ | Auxiliary files referenced by graph ops, for example vocabulary lookup tables |
| assets.extra/ | Optional files for higher-level tooling; never read by the TensorFlow loader |
| fingerprint.pb | Optional, TF 2.13+; 64-bit content hashes for integrity and lineage tracking |
| keras_metadata.pb | Optional; written when the source object is a Keras model, used to reconstruct the Python class |
The variables/ subdirectory is a standard TensorFlow training checkpoint. The split between saved_model.pb (graph) and variables/ (weights) lets large weight tensors stay outside the 2 GB protobuf limit, which is one reason the format has aged well.
The fingerprint.pb file, defined in the 2022 community RFC on SavedModel fingerprinting, is composed of several 64-bit hashes covering different slices of the model. It is readable through tf.saved_model.experimental.read_fingerprint() and is used by Vertex AI and other serving stacks to deduplicate uploads.
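A short sketch of reading it back, assuming a SavedModel already exists at the path (the attribute names follow the TF 2.13+ experimental API and should be treated as illustrative):

```python
import tensorflow as tf

fp = tf.saved_model.experimental.read_fingerprint('/tmp/my_model')
print(fp.saved_model_checksum)  # top-level content hash of the whole SavedModel
print(fp.singleprint())         # concatenated hash string, usable as a model identifier
```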
Three concepts do most of the work inside a SavedModel.
MetaGraphDef is the top-level protobuf message that wraps the GraphDef (the actual node and edge structure of the computation), the collection definitions, the asset file list, and any saver definitions. A SavedModel may contain more than one MetaGraphDef, distinguished by tag sets such as serve, train, or gpu. In practice TF 2.x users almost always have exactly one, tagged serve.
SignatureDef is the named API of a SavedModel. Each signature points at a specific concrete function inside the graph and declares the dtype and shape of every input and output tensor, along with a method name such as tensorflow/serving/predict. The default signature key is serving_default, accessible through the constant tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY. TensorFlow Serving and Vertex AI both look up signatures by key when routing requests.
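A minimal sketch of looking up the default signature by key and inspecting its declared tensor specs (assumes path points at an existing SavedModel directory):

```python
loaded = tf.saved_model.load(path)
# tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY == 'serving_default'
infer = loaded.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
print(infer.structured_input_signature)  # declared input TensorSpecs
print(infer.structured_outputs)          # declared output names and specs
```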
Variables are the trainable parameters. They are not embedded in the graph protobuf; they are stored in the checkpoint files under variables/, which keeps the graph small and readable. After loading, the restored object exposes them through standard attributes:
```python
loaded = tf.saved_model.load(path)
loaded.variables            # all variables
loaded.trainable_variables  # subset that was trainable at save time
loaded.signatures           # dict of named entry points
```
Assets are the fourth piece. If a graph op references an external file, for example a tf.lookup.StaticVocabularyTable initialized from a text file, that file is copied into assets/ at save time and rewired so the loaded model finds it again.
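For instance, here is a minimal sketch of a module whose graph depends on an external file (the vocab.txt path and module name are hypothetical); saving it copies the vocabulary into assets/:

```python
import tensorflow as tf

class VocabLookup(tf.Module):
    def __init__(self, vocab_path):
        # Map each line of the vocabulary file to its line number.
        initializer = tf.lookup.TextFileInitializer(
            vocab_path,
            key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
            value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
        self.table = tf.lookup.StaticVocabularyTable(initializer, num_oov_buckets=1)

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def __call__(self, tokens):
        return self.table.lookup(tokens)

tf.saved_model.save(VocabLookup('vocab.txt'), '/tmp/vocab_lookup')
# /tmp/vocab_lookup/assets/ now contains a copy of vocab.txt, and the loaded
# graph is rewired to read it from there.
```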
The low-level API works on any tf.Module subclass that exposes tf.Variable attributes and @tf.function decorated methods:
```python
class Counter(tf.Module):
    def __init__(self):
        self.v = tf.Variable(1.0)

    @tf.function(input_signature=[tf.TensorSpec([], tf.float32)])
    def __call__(self, x):
        return x * self.v

tf.saved_model.save(Counter(), '/tmp/counter')
```
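Loading the module back restores both the variable and the traced __call__ (the path matches the save call above):

```python
reloaded = tf.saved_model.load('/tmp/counter')
print(reloaded.v.numpy())                  # 1.0, the saved variable value
print(reloaded(tf.constant(3.0)).numpy())  # 3.0, i.e. x * v
```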
For a Keras model the recommended path is:
```python
model = tf.keras.Sequential([...])
model.compile(...)
model.fit(...)
model.export('/tmp/my_model')  # writes a SavedModel directory
```
model.export() is the TF 2.16+ Keras API for producing a SavedModel suitable for serving. The older model.save('/tmp/my_model', save_format='tf') call still works for legacy code. If multiple entry points are needed (for example separate serving_default and array_input signatures), the signatures= argument accepts a dict of concrete functions:
```python
# Wrap the model call in a tf.function so concrete functions can be traced
# with explicit input specs.
serving_fn = tf.function(model.__call__)

signatures = {
    'serving_default': serving_fn.get_concrete_function(
        tf.TensorSpec([None, 224, 224, 3], tf.float32)),
    'array_input': serving_fn.get_concrete_function(
        tf.TensorSpec([None], tf.float32)),
}
tf.saved_model.save(model, path, signatures=signatures)
```
What gets saved: every tf.Variable attribute, every @tf.function decorated method (as one or more concrete functions), and any nested tf.Module instances. What does not get saved: arbitrary Python attributes, plain Python functions, or any code outside tf.function. Only the traced graph survives the round trip.
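A small sketch of that boundary (a hypothetical class; the comments mark what survives):

```python
class Demo(tf.Module):
    def __init__(self):
        self.w = tf.Variable(2.0)    # tf.Variable attribute: saved
        self.note = 'scratch'        # plain Python attribute: not saved

    @tf.function(input_signature=[tf.TensorSpec([], tf.float32)])
    def scale(self, x):
        return x * self.w            # traced into the graph: saved

    def helper(self, x):             # plain Python method: not saved
        return x + 1

tf.saved_model.save(Demo(), '/tmp/demo')
loaded = tf.saved_model.load('/tmp/demo')
loaded.scale(tf.constant(3.0))       # works: the traced function was saved
# loaded.helper and loaded.note are absent from the restored object
```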
The low-level loader returns a generic Trackable object whose signatures are callable:
```python
loaded = tf.saved_model.load('/tmp/my_model')
infer = loaded.signatures['serving_default']
output = infer(input_1=tf.constant([[...]]))
```
For models saved through Keras, tf.keras.models.load_model('/tmp/my_model') reconstructs the original Keras class so that model.fit, model.predict, and the layer hierarchy continue to work. The low-level loader gives back the raw signatures only and is the right choice when serving from C++, Go, or any non-Python runtime.
In C++ the corresponding entry point is LoadSavedModel(session_options, run_options, export_dir, {kSavedModelTagServe}, &bundle) from tensorflow/cc/saved_model/loader.h, which returns a SavedModelBundle ready for Session::Run calls.
Every TensorFlow install ships a command-line tool, saved_model_cli, that reads a SavedModel directory and prints its tag sets, signatures, and tensor specs without touching Python:
```
saved_model_cli show --dir /tmp/my_model --all
```
A typical fragment of the output looks like this:
```
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  inputs['input_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 224, 224, 3)
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1000)
  Method name is: tensorflow/serving/predict
```
The tool also has a run subcommand that executes a signature against numpy files, Python literals, or tf.train.Example protos, which is useful for sanity-checking a model before pushing it to production.
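For example, a hedged invocation against the model shown above (flag names from saved_model_cli; the zero-filled input expression is purely illustrative):

```
saved_model_cli run --dir /tmp/my_model \
    --tag_set serve \
    --signature_def serving_default \
    --input_exprs 'input_1=np.zeros((1, 224, 224, 3))'
```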
SavedModel is the lingua franca of TensorFlow deployment. The table below summarizes the main consumers:
| target | how it consumes a SavedModel |
|---|---|
| TensorFlow Serving | Reads the directory directly; exposes gRPC on port 8500 and REST on port 8501; routes requests by signature name |
| Vertex AI prediction | Upload the directory as a Model resource; deploy to an Endpoint for online or batch prediction |
| Triton Inference Server | Loads via the TensorFlow backend; expects a model.savedmodel/ subdirectory inside the model repository |
| TensorFlow Hub | Every published module is a SavedModel; hub.load(url) downloads and loads it |
| TensorFlow Lite converter | tf.lite.TFLiteConverter.from_saved_model(path) produces a .tflite flatbuffer for mobile and embedded |
| TensorFlow.js converter | tensorflowjs_converter --input_format=tf_saved_model produces a sharded JSON + binary bundle for the browser |
| ONNX | python -m tf2onnx.convert --saved-model path --output model.onnx converts to ONNX for cross-framework deployment |
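As a concrete example of one of these paths, converting a SavedModel to TensorFlow Lite takes a few lines (a minimal sketch; the paths are placeholders):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('/tmp/my_model')
tflite_model = converter.convert()        # returns the .tflite flatbuffer as bytes
with open('/tmp/my_model.tflite', 'wb') as f:
    f.write(tflite_model)
```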
A TensorFlow Serving REST request against a deployed SavedModel looks like this:
```
POST http://host:8501/v1/models/my_model/versions/1:predict
{
  "signature_name": "serving_default",
  "instances": [{"input_1": [[...]]}]
}
```
The signature_name field maps directly to the SignatureDef key embedded in saved_model.pb. If it is omitted, the server uses serving_default.
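The same request issued from Python, as a sketch that assumes TF Serving is reachable on localhost:8501 and the model is named my_model (the all-zero image is a placeholder):

```python
import json
import requests

payload = {
    'signature_name': 'serving_default',
    # One instance shaped (224, 224, 3), matching the signature shown earlier.
    'instances': [{'input_1': [[[0.0] * 3] * 224] * 224}],
}
resp = requests.post(
    'http://localhost:8501/v1/models/my_model:predict',
    data=json.dumps(payload))
print(resp.json()['predictions'])  # TF Serving wraps outputs under "predictions"
```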
SavedModel sits in a crowded field of model serialization formats. The right comparison depends on whether the goal is portability, serving compatibility, security, or pure tensor storage.
| format | scope | code in file | typical ecosystem |
|---|---|---|---|
| SavedModel | Graph + variables + signatures + assets | Yes (graph ops, custom ops via shared libraries) | TensorFlow, Keras, TF Serving, Vertex AI, TF Hub |
| Keras .keras (v3) | Whole Keras model in a zip archive (config JSON + H5 weights) | No serialized code; relies on registered Keras classes | Keras 3.x, recommended for in-Python workflows |
| Keras .h5 (legacy) | Keras model architecture and weights in a single HDF5 file | No | Older Keras code; deprecated for new projects |
| ONNX | Cross-framework graph + weights | Yes (operator graph) | ONNX Runtime, TensorRT, OpenVINO, Triton |
| SafeTensors | Tensor weights only, with a JSON header | No code, no eval | Hugging Face Transformers, Diffusers, Llama, Gemma, Phi |
| PyTorch .pt / state_dict | Tensors and optionally full model object | Yes (Python pickle) | PyTorch, TorchServe |
| GGUF | Quantized LLM weights and metadata in one file | No | llama.cpp, Ollama, local inference |
A few honest observations from this layout. SavedModel and ONNX both store the computation graph, which is what makes them portable across runtimes; SafeTensors and state_dict files store only the numbers and assume the loading process already knows the architecture. SafeTensors was specifically designed by Hugging Face to avoid the arbitrary code execution risk of pickle-based formats; it stores no code, only typed tensors with a JSON header, and was audited in 2023 before becoming the default for new Hugging Face uploads. SavedModel is not vulnerable to pickle attacks the way .pt files are, but it can still execute custom ops if they are registered as shared libraries, so the official guidance is the same: do not load models from untrusted sources.
SavedModel remains the default for production TensorFlow workloads. Most large-scale computer vision pipelines that started in TensorFlow still ship as SavedModels into TF Serving or Vertex AI, and TensorFlow Hub continues to publish modules in this format. Inside Google, SavedModel is also the format that TPU-based serving stacks expect.
The picture in the LLM world is different. Most modern large language model work happens in PyTorch, and the de facto distribution format is SafeTensors weights paired with a config JSON, with GGUF taking the niche of quantized local inference. SavedModel sees little use for new LLM releases, partly because the TensorFlow LLM ecosystem (Flax + JAX, T5X, MaxText) tends to use Orbax checkpoints rather than SavedModel for training, and PyTorch dominates the open-source release pipeline.
For teams that need to bridge the two worlds, the common path is SavedModel -> ONNX -> target runtime, using tf2onnx. The reverse direction (ONNX -> SavedModel) is handled by onnx-tf or the newer onnx2tf, with the usual caveats about layout differences (TensorFlow defaults to NHWC, ONNX expects NCHW) and the occasional need to reimplement custom ops by hand.
Limitations worth naming:

- saved_model.pb historically required workarounds for very large graphs; the experimental proto-splitting option in TF 2.15+ addresses this.
- Python code outside @tf.function does not survive the save-and-load round trip, which sometimes surprises authors who expected behavior similar to pickling a Keras subclass.

None of these are fatal. For TF-native production serving, SavedModel is still the right answer, and the directory layout has stayed remarkably stable across nearly a decade of TensorFlow releases.
Imagine you built something cool out of LEGO and you want a friend to be able to rebuild it later, even if you forget how. So you put the finished model in a box, drop in all the spare bricks you used, and tape the building instructions to the lid. The box is a SavedModel. The instructions are the graph (saved_model.pb), the bricks are the weights (variables/), and any extra stickers or labels the model needs are the assets. Anyone with the box can rebuild and use the model, even if they speak a different programming language.