SavedModel is the standardized, language-agnostic serialization format TensorFlow uses to package a trained model for sharing, deployment, and long-term storage. A SavedModel directory contains a complete TensorFlow program: the computational graph encoded as a Protocol Buffers message, the values of every tf.Variable (weights, biases, and optimizer state if requested), and one or more named signatures that declare the input and output tensor specs of callable entry points. Because the format does not depend on the original Python source code, a SavedModel can be loaded and executed from Python, C++, Java, Go, or any other language with TensorFlow runtime bindings.
The official guide states it directly: "A SavedModel contains a complete TensorFlow program, including trained parameters and computation. It does not require the original model building code to run, which makes it useful for sharing or deploying with TFLite, TensorFlow.js, TensorFlow Serving, or TensorFlow Hub."
SavedModel is the format that TensorFlow Serving consumes natively, that Vertex AI prediction endpoints accept for TensorFlow models, and that downstream tools such as the TensorFlow Lite converter and TensorFlow.js converter take as input. It is also the format every TensorFlow Hub module is published in.
SavedModel was introduced in TensorFlow 1.0, which Google released on 11 February 2017. The original design goal was to replace the older SessionBundle export format used by TensorFlow Serving and to unify several earlier serialization paths under one hermetic, language-neutral format that wrapped the existing tf.train.Saver and MetaGraphDef primitives.
In the TensorFlow 1.x era, SavedModels were typically built with tf.saved_model.Builder, which let authors attach multiple MetaGraphDefs under different tag sets (commonly serve and train). When TensorFlow 2.0 shipped in 2019, the high-level tf.saved_model.save(obj, path) function became the standard entry point, and the @tf.function tracing system replaced the manual session and graph construction of TF 1.x.
The format has continued to evolve. TensorFlow 2.13 added fingerprint.pb, a small file containing 64-bit hashes that uniquely identify SavedModel contents and let downstream systems detect when a model has changed without parsing the full graph. TensorFlow 2.15 added the experimental proto-splitting option (SaveOptions(experimental_image_format=True)) so that models whose graph definitions exceed protobuf's 2 GB limit can still be written to disk.
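Opting in is a single save option. A minimal sketch, assuming TF 2.15+ and a model object already in scope (the output path is a placeholder):

```python
import tensorflow as tf

# experimental_image_format is the proto-splitting option described above;
# it lets graph definitions larger than protobuf's 2 GB limit be written to disk.
options = tf.saved_model.SaveOptions(experimental_image_format=True)
tf.saved_model.save(model, '/tmp/big_model', options=options)
```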
Keras, which ships with TensorFlow, has shifted its recommended whole-model format to the newer .keras zip archive (Keras v3), but the SavedModel directory format remains the canonical export target whenever the model needs to leave the Python process for production serving.
A SavedModel is not a single file. It is a directory with a fixed layout that all TensorFlow loaders expect:
| path | purpose |
|---|---|
| saved_model.pb | Serialized MetaGraphDef protobuf containing the graph and named signatures |
| variables/variables.index | Index file mapping variable names to byte offsets in the data shards |
| variables/variables.data-00000-of-00001 | Binary checkpoint containing variable values; large models may be split into multiple shards |
| assets/ | Auxiliary files referenced by graph ops, for example vocabulary lookup tables |
| assets.extra/ | Optional files for higher-level tooling; never read by the TensorFlow loader |
| fingerprint.pb | Optional, TF 2.13+; 64-bit content hashes for integrity and lineage tracking |
| keras_metadata.pb | Optional; written when the source object is a Keras model, used to reconstruct the Python class |
The variables/ subdirectory is a standard TensorFlow training checkpoint. The split between saved_model.pb (graph) and variables/ (weights) lets large weight tensors stay outside the 2 GB protobuf limit, which is one reason the format has aged well.
The fingerprint.pb file, defined in the 2022 community RFC on SavedModel fingerprinting, is composed of several 64-bit hashes covering different slices of the model. It is readable through tf.saved_model.experimental.read_fingerprint() and is used by Vertex AI and other serving stacks to deduplicate uploads.
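A short sketch of reading it back, assuming a SavedModel already exists at the path (the attribute names follow the TF 2.13+ experimental API and should be treated as illustrative):

```python
import tensorflow as tf

fp = tf.saved_model.experimental.read_fingerprint('/tmp/my_model')
print(fp.saved_model_checksum)  # top-level content hash of the whole SavedModel
print(fp.singleprint())         # concatenated hash string, usable as a model identifier
```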
Three concepts do most of the work inside a SavedModel.
MetaGraphDef is the top-level protobuf message that wraps the GraphDef (the actual node and edge structure of the computation), the collection definitions, the asset file list, and any saver definitions. A SavedModel may contain more than one MetaGraphDef, distinguished by tag sets such as serve, train, or gpu. In practice TF 2.x users almost always have exactly one, tagged serve.
SignatureDef is the named API of a SavedModel. Each signature points at a specific concrete function inside the graph and declares the dtype and shape of every input and output tensor, along with a method name such as tensorflow/serving/predict. The default signature key is serving_default, accessible through the constant tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY. TensorFlow Serving and Vertex AI both look up signatures by key when routing requests.
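A minimal sketch of looking up the default signature by key and inspecting its declared tensor specs (assumes path points at an existing SavedModel directory):

```python
loaded = tf.saved_model.load(path)
# tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY == 'serving_default'
infer = loaded.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
print(infer.structured_input_signature)  # declared input TensorSpecs
print(infer.structured_outputs)          # declared output names and specs
```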
Variables are the trainable parameters. They are not embedded in the graph protobuf; they are stored in the checkpoint files under variables/, which keeps the graph small and readable. After loading, the restored object exposes them through standard attributes:
```python
loaded = tf.saved_model.load(path)
loaded.variables            # all variables
loaded.trainable_variables  # subset that was trainable at save time
loaded.signatures           # dict of named entry points
```
Assets are the fourth piece. If a graph op references an external file, for example a tf.lookup.StaticVocabularyTable initialized from a text file, that file is copied into assets/ at save time and rewired so the loaded model finds it again.
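For instance, here is a minimal sketch of a module whose graph depends on an external file (the vocab.txt path and module name are hypothetical); saving it copies the vocabulary into assets/:

```python
import tensorflow as tf

class VocabLookup(tf.Module):
    def __init__(self, vocab_path):
        # Map each line of the vocabulary file to its line number.
        initializer = tf.lookup.TextFileInitializer(
            vocab_path,
            key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
            value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
        self.table = tf.lookup.StaticVocabularyTable(initializer, num_oov_buckets=1)

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def __call__(self, tokens):
        return self.table.lookup(tokens)

tf.saved_model.save(VocabLookup('vocab.txt'), '/tmp/vocab_lookup')
# /tmp/vocab_lookup/assets/ now contains a copy of vocab.txt, and the loaded
# graph is rewired to read it from there.
```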
The low-level API works on any tf.Module subclass that exposes tf.Variable attributes and @tf.function decorated methods:
```python
class Counter(tf.Module):
    def __init__(self):
        self.v = tf.Variable(1.0)

    @tf.function(input_signature=[tf.TensorSpec([], tf.float32)])
    def __call__(self, x):
        return x * self.v

tf.saved_model.save(Counter(), '/tmp/counter')
```
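Loading the module back restores both the variable and the traced __call__ (the path matches the save call above):

```python
reloaded = tf.saved_model.load('/tmp/counter')
print(reloaded.v.numpy())                  # 1.0, the saved variable value
print(reloaded(tf.constant(3.0)).numpy())  # 3.0, i.e. x * v
```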
For a Keras model the recommended path is:
```python
model = tf.keras.Sequential([...])
model.compile(...)
model.fit(...)
model.export('/tmp/my_model')  # writes a SavedModel directory
```
model.export() is the TF 2.16+ Keras API for producing a SavedModel suitable for serving. The older model.save('/tmp/my_model', save_format='tf') call still works for legacy code. If multiple entry points are needed (for example separate serving_default and array_input signatures), the signatures= argument accepts a dict of concrete functions:
```python
# Wrap the model call in a tf.function so concrete functions can be traced
# with explicit input specs.
serving_fn = tf.function(model.__call__)

signatures = {
    'serving_default': serving_fn.get_concrete_function(
        tf.TensorSpec([None, 224, 224, 3], tf.float32)),
    'array_input': serving_fn.get_concrete_function(
        tf.TensorSpec([None], tf.float32)),
}
tf.saved_model.save(model, path, signatures=signatures)
```
What gets saved: every tf.Variable attribute, every @tf.function decorated method (as one or more concrete functions), and any nested tf.Module instances. What does not get saved: arbitrary Python attributes, plain Python functions, or any code outside tf.function. Only the traced graph survives the round trip.
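A small sketch of that boundary (a hypothetical class; the comments mark what survives):

```python
class Demo(tf.Module):
    def __init__(self):
        self.w = tf.Variable(2.0)    # tf.Variable attribute: saved
        self.note = 'scratch'        # plain Python attribute: not saved

    @tf.function(input_signature=[tf.TensorSpec([], tf.float32)])
    def scale(self, x):
        return x * self.w            # traced into the graph: saved

    def helper(self, x):             # plain Python method: not saved
        return x + 1

tf.saved_model.save(Demo(), '/tmp/demo')
loaded = tf.saved_model.load('/tmp/demo')
loaded.scale(tf.constant(3.0))       # works: the traced function was saved
# loaded.helper and loaded.note are absent from the restored object
```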
The low-level loader returns a generic Trackable object whose signatures are callable:
```python
loaded = tf.saved_model.load('/tmp/my_model')
infer = loaded.signatures['serving_default']
output = infer(input_1=tf.constant([[...]]))
```
For models saved through Keras, tf.keras.models.load_model('/tmp/my_model') reconstructs the original Keras class so that model.fit, model.predict, and the layer hierarchy continue to work. The low-level loader gives back the raw signatures only and is the right choice when serving from C++, Go, or any non-Python runtime.
In C++ the corresponding entry point is LoadSavedModel(session_options, run_options, export_dir, {kSavedModelTagServe}, &bundle) from tensorflow/cc/saved_model/loader.h, which returns a SavedModelBundle ready for Session::Run calls.
Every TensorFlow install ships a command-line tool, saved_model_cli, that reads a SavedModel directory and prints its tag sets, signatures, and tensor specs without touching Python:
```
saved_model_cli show --dir /tmp/my_model --all
```
A typical fragment of the output looks like this:
```
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  inputs['input_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 224, 224, 3)
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1000)
  Method name is: tensorflow/serving/predict
```
The tool also has a run subcommand that executes a signature against numpy files, Python literals, or tf.train.Example protos, which is useful for sanity-checking a model before pushing it to production.
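For example, a hedged invocation against the model shown above (flag names from saved_model_cli; the zero-filled input expression is purely illustrative):

```
saved_model_cli run --dir /tmp/my_model \
    --tag_set serve \
    --signature_def serving_default \
    --input_exprs 'input_1=np.zeros((1, 224, 224, 3))'
```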
SavedModel is the lingua franca of TensorFlow deployment. The table below summarizes the main consumers:
| target | how it consumes a SavedModel |
|---|---|
| TensorFlow Serving | Reads the directory directly; exposes gRPC on port 8500 and REST on port 8501; routes requests by signature name |
| Vertex AI prediction | Upload the directory as a Model resource; deploy to an Endpoint for online or batch prediction |
| Triton Inference Server | Loads via the TensorFlow backend; expects a model.savedmodel/ subdirectory inside the model repository |
| TensorFlow Hub | Every published module is a SavedModel; hub.load(url) downloads and loads it |
| TensorFlow Lite converter | tf.lite.TFLiteConverter.from_saved_model(path) produces a .tflite flatbuffer for mobile and embedded |
| TensorFlow.js converter | tensorflowjs_converter --input_format=tf_saved_model produces a sharded JSON + binary bundle for the browser |
| ONNX | python -m tf2onnx.convert --saved-model path --output model.onnx converts to ONNX for cross-framework deployment |
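As a concrete example of one of these paths, converting a SavedModel to TensorFlow Lite takes a few lines (a minimal sketch; the paths are placeholders):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('/tmp/my_model')
tflite_model = converter.convert()        # returns the .tflite flatbuffer as bytes
with open('/tmp/my_model.tflite', 'wb') as f:
    f.write(tflite_model)
```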
A TensorFlow Serving REST request against a deployed SavedModel looks like this:
```
POST http://host:8501/v1/models/my_model/versions/1:predict
{
  "signature_name": "serving_default",
  "instances": [{"input_1": [[...]]}]
}
```
The signature_name field maps directly to the SignatureDef key embedded in saved_model.pb. If it is omitted, the server uses serving_default.
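The same request issued from Python, as a sketch that assumes TF Serving is reachable on localhost:8501 and the model is named my_model (the all-zero image is a placeholder):

```python
import json
import requests

payload = {
    'signature_name': 'serving_default',
    # One instance shaped (224, 224, 3), matching the signature shown earlier.
    'instances': [{'input_1': [[[0.0] * 3] * 224] * 224}],
}
resp = requests.post(
    'http://localhost:8501/v1/models/my_model:predict',
    data=json.dumps(payload))
print(resp.json()['predictions'])  # TF Serving wraps outputs under "predictions"
```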
SavedModel sits in a crowded field of model serialization formats. The right comparison depends on whether the goal is portability, serving compatibility, security, or pure tensor storage.
| format | scope | code in file | typical ecosystem |
|---|---|---|---|
| SavedModel | Graph + variables + signatures + assets | Yes (graph ops, custom ops via shared libraries) | TensorFlow, Keras, TF Serving, Vertex AI, TF Hub |
| Keras .keras (v3) | Whole Keras model in a zip archive (config JSON + H5 weights) | No serialized code; relies on registered Keras classes | Keras 3.x, recommended for in-Python workflows |
| Keras .h5 (legacy) | Keras model architecture and weights in a single HDF5 file | No | Older Keras code; deprecated for new projects |
| ONNX | Cross-framework graph + weights | Yes (operator graph) | ONNX Runtime, TensorRT, OpenVINO, Triton |
| SafeTensors | Tensor weights only, with a JSON header | No code, no eval | Hugging Face Transformers, Diffusers, Llama, Gemma, Phi |
| PyTorch .pt / state_dict | Tensors and optionally full model object | Yes (Python pickle) | PyTorch, TorchServe |
| GGUF | Quantized LLM weights and metadata in one file | No | llama.cpp, Ollama, local inference |
A few honest observations from this layout. SavedModel and ONNX both store the computation graph, which is what makes them portable across runtimes; SafeTensors and state_dict files store only the numbers and assume the loading process already knows the architecture. SafeTensors was specifically designed by Hugging Face to avoid the arbitrary code execution risk of pickle-based formats; it stores no code, only typed tensors with a JSON header, and was audited in 2023 before becoming the default for new Hugging Face uploads. SavedModel is not vulnerable to pickle attacks the way .pt files are, but it can still execute custom ops if they are registered as shared libraries, so the official guidance is the same: do not load models from untrusted sources.
SavedModel remains the default for production TensorFlow workloads. Most large-scale computer vision pipelines that started in TensorFlow still ship as SavedModels into TF Serving or Vertex AI, and TensorFlow Hub continues to publish modules in this format. Inside Google, SavedModel is also the format that TPU-based serving stacks expect.
The picture in the LLM world is different. Most modern large language model work happens in PyTorch, and the de facto distribution format is SafeTensors weights paired with a config JSON, with GGUF taking the niche of quantized local inference. SavedModel sees little use for new LLM releases, partly because the TensorFlow LLM ecosystem (Flax + JAX, T5X, MaxText) tends to use Orbax checkpoints rather than SavedModel for training, and PyTorch dominates the open-source release pipeline.
For teams that need to bridge the two worlds, the common path is SavedModel -> ONNX -> target runtime, using tf2onnx. The reverse direction (ONNX -> SavedModel) is handled by onnx-tf or the newer onnx2tf, with the usual caveats about layout differences (TensorFlow defaults to NHWC, ONNX expects NCHW) and the occasional need to reimplement custom ops by hand.
Limitations worth naming:

- saved_model.pb historically required workarounds for very large graphs; the experimental proto-splitting option in TF 2.15+ addresses this.
- Python code outside @tf.function does not survive the save-and-load round trip, which sometimes surprises authors who expected behavior similar to pickling a Keras subclass.

None of these are fatal. For TF-native production serving, SavedModel is still the right answer, and the directory layout has stayed remarkably stable across nearly a decade of TensorFlow releases.
Imagine you built something cool out of LEGO and you want a friend to be able to rebuild it later, even if you forget how. So you put the finished model in a box, drop in all the spare bricks you used, and tape the building instructions to the lid. The box is a SavedModel. The instructions are the graph (saved_model.pb), the bricks are the weights (variables/), and any extra stickers or labels the model needs are the assets. Anyone with the box can rebuild and use the model, even if they speak a different programming language.