Safetensors is an open-source tensor serialization format developed by Hugging Face as a safe, fast, and framework-agnostic alternative to Python's pickle module. The format stores only raw tensor data and a small JSON header, which makes it impossible to embed executable code in a model file. This design directly addresses the security risks of pickle-based formats like PyTorch .bin files, where loading an untrusted model can trigger arbitrary code execution on the host system.
The safetensors library was first released on September 22, 2022, as version 0.0.1 on PyPI. The primary author is Nicolas Patry, a machine learning engineer at Hugging Face known for his work on Rust-based ML infrastructure. The project is licensed under the Apache 2.0 license and hosted at github.com/huggingface/safetensors. As of late 2025, the repository has over 3,700 stars on GitHub, 76 contributors, and is used by more than 129,000 downstream projects.
Python's pickle module is the default serialization mechanism behind PyTorch model checkpoints (.pt, .pth, .bin files). When a user calls torch.load(), pickle deserializes the byte stream back into Python objects. The problem is that pickle was designed to reconstruct arbitrary Python objects, including their behavior, not just their data. An attacker can craft a pickle file that executes arbitrary Python code the moment it is loaded. This code could exfiltrate credentials, install a backdoor, open a reverse shell, or run any other system-level command.
This is not a theoretical risk. Researchers at JFrog, Rapid7, and Snyk have documented real-world malicious models uploaded to public model hubs. Weaponized .pth files have been found on the Hugging Face Hub containing payloads that perform system fingerprinting, credential theft, and remote access trojan installation. Pickle's REDUCE, GLOBAL, and STACK_GLOBAL opcodes allow a pickle program to import and invoke arbitrary Python callables during unpickling. An attacker can use these opcodes to call os.system or similar functions, achieving arbitrary shell command execution.
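The opcode mechanism is easy to observe with the standard library alone. The sketch below (using the harmless builtin print as a stand-in payload) serializes an object whose __reduce__ method requests a callable invocation, then disassembles the resulting pickle without ever loading it:

```python
import io
import pickle
import pickletools

# An object whose __reduce__ tells pickle to call an arbitrary callable
# at unpickling time. Here the callable is the harmless builtin print;
# a real attack would substitute os.system or similar.
class Payload:
    def __reduce__(self):
        return (print, ("this line would run the moment the file is loaded",))

data = pickle.dumps(Payload(), protocol=4)

# Disassemble WITHOUT loading: the STACK_GLOBAL opcode imports the callable
# and REDUCE invokes it during unpickling.
buf = io.StringIO()
pickletools.dis(data, out=buf)
opcodes = buf.getvalue()
```

Inspecting `opcodes` shows the STACK_GLOBAL and REDUCE opcodes that scanners such as picklescan look for, which is exactly the code path safetensors removes by design.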
Pickle scanning tools like picklescan exist to detect known malicious patterns, but researchers have discovered multiple bypass vulnerabilities in these scanners. JFrog disclosed three zero-day vulnerabilities in picklescan in 2025, demonstrating that static analysis cannot fully secure an inherently unsafe format. Repositories with pickle models are downloaded over 2.1 billion times per month from the Hugging Face model hub, making the attack surface enormous.
Hugging Face's own scanning infrastructure catches some malicious uploads, but the fundamental issue is that pickle allows code execution by design. No amount of scanning can guarantee safety for an inherently unsafe format. Safetensors was built from the ground up to solve this problem: it stores only numerical data and metadata, with no mechanism for embedding or executing code.
The safetensors format is intentionally simple. A .safetensors file consists of three contiguous sections:
| Section | Size | Description |
|---|---|---|
| Header size | 8 bytes | Unsigned 64-bit little-endian integer specifying the length of the JSON header in bytes |
| JSON header | N bytes (value from header size) | UTF-8 encoded JSON object containing tensor metadata |
| Data buffer | Remaining bytes | Raw tensor data stored as a contiguous binary block, without compression |
The JSON header is a dictionary where each key is a tensor name and each value is an object with three fields:
- dtype: The data type of the tensor (e.g., F16, BF16, F32, I32)
- shape: An array of integers representing the tensor dimensions
- data_offsets: A two-element array [BEGIN, END] specifying the byte range of the tensor's data within the data buffer (not absolute file offsets)

A special key called __metadata__ can store arbitrary string-to-string key-value pairs for user-defined metadata such as model architecture, training configuration, or authorship information. Because metadata values must be strings, complex data structures need to be serialized as JSON strings within the value. This metadata is commonly used by tools like CivitAI and LoRA training scripts to record learning rates, training epochs, and dataset information.
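The three-section layout can be exercised with nothing but the standard library. This stdlib-only sketch builds a minimal one-tensor file in memory and parses it back; it skips the validation (size limits, overlap checks) that the real parser performs, and the tensor name `bias` is made up:

```python
import json
import struct

# Build a minimal .safetensors byte stream: one F32 tensor of shape [2].
header = {
    "bias": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]},
    "__metadata__": {"format": "pt"},
}
header_bytes = json.dumps(header).encode("utf-8")
data = struct.pack("<2f", 1.0, 2.0)  # raw little-endian tensor data
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + data

# Parse it back per the spec: 8-byte LE header size, JSON header, data buffer.
(n,) = struct.unpack_from("<Q", blob, 0)
parsed = json.loads(blob[8 : 8 + n])
buf = blob[8 + n :]
begin, end = parsed["bias"]["data_offsets"]  # relative to the data buffer
values = struct.unpack("<2f", buf[begin:end])
```

Note how data_offsets index into the data buffer, not the file: the parser never needs to touch anything but plain JSON and raw bytes.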
Here is a simplified example of a header:
```json
{
    "__metadata__": {"format": "pt"},
    "model.embed.weight": {
        "dtype": "F16",
        "shape": [32000, 4096],
        "data_offsets": [0, 262144000]
    },
    "model.norm.weight": {
        "dtype": "F16",
        "shape": [4096],
        "data_offsets": [262144000, 262152192]
    }
}
```
Several constraints are enforced to prevent abuse and ensure correctness:
- The JSON header must begin with the { character (byte 0x7B).
- The header size is capped (100 MB in the reference implementation) to prevent denial of service through oversized headers.
- Duplicate tensor names are disallowed.
- Tensor data regions must not overlap, and the data buffer must be fully addressed with no gaps, which prevents hiding extra payloads or constructing polyglot files.

Safetensors supports a range of data types relevant to modern machine learning workloads:
| Dtype string | Description | Bytes per element |
|---|---|---|
| BOOL | Boolean | 1 |
| U8 | Unsigned 8-bit integer | 1 |
| I8 | Signed 8-bit integer | 1 |
| U16 | Unsigned 16-bit integer | 2 |
| I16 | Signed 16-bit integer | 2 |
| F16 | IEEE 754 half-precision float | 2 |
| BF16 | Brain floating point (bfloat16) | 2 |
| U32 | Unsigned 32-bit integer | 4 |
| I32 | Signed 32-bit integer | 4 |
| F32 | IEEE 754 single-precision float | 4 |
| F64 | IEEE 754 double-precision float | 8 |
| I64 | Signed 64-bit integer | 8 |
| U64 | Unsigned 64-bit integer | 8 |
| F8_E4M3 | FP8 (4-bit exponent, 3-bit mantissa) | 1 |
| F8_E5M2 | FP8 (5-bit exponent, 2-bit mantissa) | 1 |
The inclusion of bfloat16 and FP8 types is notable because some older formats (such as HDF5 and NumPy's .npz) lack native support for these types, which are now standard in large-scale model training and inference.
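Because every dtype has a fixed element size, a tensor's byte footprint in the data buffer is fully determined by its shape and dtype. A small sketch of that arithmetic, using the table above (the DTYPE_SIZES map and nbytes helper are our own names, not library API):

```python
import math

# Bytes per element for each safetensors dtype, from the table above.
DTYPE_SIZES = {
    "BOOL": 1, "U8": 1, "I8": 1, "F8_E4M3": 1, "F8_E5M2": 1,
    "U16": 2, "I16": 2, "F16": 2, "BF16": 2,
    "U32": 4, "I32": 4, "F32": 4,
    "U64": 8, "I64": 8, "F64": 8,
}

def nbytes(dtype: str, shape: list) -> int:
    """Size in bytes of a tensor's contiguous data block."""
    return math.prod(shape) * DTYPE_SIZES[dtype]

# The embedding tensor from the earlier header example:
# 32000 * 4096 elements * 2 bytes (F16) = 262,144,000 bytes,
# matching its data_offsets range [0, 262144000].
size = nbytes("F16", [32000, 4096])
```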
The central promise of safetensors is that loading a .safetensors file cannot execute arbitrary code, regardless of how the file was crafted. This holds because:
- The JSON header is parsed as pure data; nothing in it is ever interpreted as code.
- The data buffer contains only raw numbers, reconstructed into tensors using the dtype and shape fields.
- The loading path contains no calls to eval, exec, or any form of dynamic code loading.

In May 2023, Hugging Face, EleutherAI, and Stability AI jointly commissioned an external security audit of the safetensors library. The audit was performed by Trail of Bits, a well-known cybersecurity firm that has audited projects across the blockchain, infrastructure, and AI security domains.
The audit found no critical security flaws in the library; the lower-severity issues it did identify were addressed by the maintainers, and the report supported the format's core claim that loading a file cannot execute code.
All three organizations agreed to make the full audit report public, in keeping with open-source transparency principles. The report is available through Trail of Bits' publications repository on GitHub. EleutherAI published a companion blog post confirming their endorsement of safetensors as a safe default format.
Safetensors achieves significant speed improvements over pickle-based loading, primarily through memory mapping (mmap). When a safetensors file is loaded, the operating system maps the file directly into the process's virtual address space. Tensor data can then be accessed without copying it into a separate memory allocation.
With pickle-based PyTorch loading (torch.load()), the process typically works as follows: PyTorch creates empty tensors, loads data from the pickle file, and then copies the data into the empty tensors. This requires approximately twice the model's size in available RAM, since both the loaded data and the destination tensors exist in memory simultaneously.
Safetensors avoids this double-allocation problem. On CPU, tensors reference the memory-mapped file directly. On GPU, the library uses cudaMemcpy to transfer data straight from the mapped file to GPU memory, bypassing intermediate CPU tensor allocations.
The term "zero-copy" deserves some clarification. On CPU, when the file contents are already in the OS page cache (for example, after a recent read), loading a safetensors file can be truly zero-copy: the tensor objects point directly at the mapped memory with no data duplication at all. On GPU, a copy from host memory to device memory is always required (there is no way around the PCIe or NVLink transfer), but safetensors still eliminates the intermediate step of allocating CPU tensors. In practice, this means safetensors skips one full copy of the model's weights compared to PyTorch's standard loading path.
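The mechanism behind this can be illustrated with the standard library alone. The sketch below uses Python's mmap module directly (it demonstrates OS-level memory mapping, not the safetensors library itself; the file and its contents are throwaway):

```python
import mmap
import os
import struct
import tempfile

# Write four float32 values to a temporary file, standing in for tensor data.
fd, path = tempfile.mkstemp()
os.write(fd, struct.pack("<4f", 0.5, 1.5, 2.5, 3.5))
os.close(fd)

with open(path, "rb") as f:
    # Map the whole file into the process's address space.
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # A memoryview over the mapping reads through the OS page cache;
    # no bytes are copied into a separate Python-owned buffer.
    view = memoryview(mapped)
    values = struct.unpack_from("<4f", view, 0)
    view.release()
    mapped.close()
os.remove(path)
```

Safetensors applies the same idea at scale: tensor objects are views over the mapped file, so a warm page cache makes CPU loading nearly instantaneous.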
The official benchmarks from the Hugging Face documentation use GPT-2 weights and were run on Ubuntu 18.04.6 LTS with an Intel Xeon CPU @ 2.00 GHz and a Tesla T4 GPU (CUDA 11.2, driver version 460.32.03):
| Device | Safetensors load time | PyTorch load time | Speedup |
|---|---|---|---|
| CPU | 0.004 seconds | 0.307 seconds | 76.6x faster |
| GPU (CUDA, Tesla T4) | 0.165 seconds | 0.354 seconds | 2.1x faster |
The CPU speedup is dramatic because memory mapping avoids all data copying. The GPU speedup is smaller because the bottleneck shifts to the PCIe bus transfer between host memory and GPU memory, which both formats must perform.
For large-scale models, the gains are even more pronounced. The BLOOM 176-billion-parameter model loaded on 8 GPUs in approximately 45 seconds using safetensors, compared to roughly 10 minutes with standard PyTorch pickle weights. This roughly 13x improvement comes from lazy loading, where safetensors reads specific tensor shards directly to each GPU without loading the entire model into CPU memory first.
Safetensors supports lazy loading, meaning individual tensors can be read from the file without loading the entire model into memory. This is particularly useful for distributed inference, where each GPU only needs a subset of the model's weights. The library also supports tensor slicing, allowing users to read a sub-range of a tensor:
```python
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt", device="cuda:0") as f:
    tensor_slice = f.get_slice("embedding")
    vocab_size, hidden_dim = tensor_slice.get_shape()
    partial = tensor_slice[:1000, :]  # Load only the first 1000 rows
```
This capability is useful for vocabulary-parallel inference, where different GPUs handle different portions of the embedding table.
The core safetensors library is implemented in Rust, which provides memory safety guarantees without a garbage collector. The Rust implementation handles all file parsing, header validation, and data access logic. Language-specific bindings (for Python, JavaScript, and others) call into the Rust code through foreign function interfaces built with PyO3 and setuptools-rust.
Using Rust adds a meaningful layer of security: common vulnerability classes such as buffer overflows, use-after-free errors, and data races are prevented by the Rust compiler's ownership and borrowing rules. This is particularly relevant for a library that parses untrusted binary data downloaded from the internet.
The Rust crate is published on crates.io as safetensors. It has accumulated over 980,000 monthly downloads and is used by more than 760 other Rust crates, including Hugging Face's Candle deep learning framework, which uses safetensors as its primary model loading mechanism.
Safetensors provides native bindings for multiple deep learning frameworks:
| Framework | Module | Notes |
|---|---|---|
| PyTorch | safetensors.torch | Full support; default in Transformers since v4.35.0 |
| TensorFlow | safetensors.tensorflow | Full support |
| JAX / Flax | safetensors.flax | Full support |
| NumPy | safetensors.numpy | Full support |
| PaddlePaddle | safetensors.paddle | Full support |
Because the file format stores raw tensor data without framework-specific metadata, the same .safetensors file can be loaded in different frameworks. A model saved from PyTorch can be loaded in TensorFlow or JAX, provided the consuming code knows the expected tensor names and shapes. This is a meaningful advantage over pickle, which is tightly coupled to Python and often to the specific framework version that created the file.
Models with billions of parameters often exceed the memory capacity of a single device or the practical limits of a single file. Safetensors supports sharding, where model weights are split across multiple files with a JSON index file that maps tensor names to their respective shard files.
A sharded model typically looks like this on disk:
```
model-00001-of-00008.safetensors
model-00002-of-00008.safetensors
...
model-00008-of-00008.safetensors
model.safetensors.index.json
```
The model.safetensors.index.json file contains two keys:
- metadata: An object with information such as the total model size in bytes
- weight_map: A dictionary mapping each parameter name to the shard file that contains it

The Hugging Face Transformers library and the Accelerate library handle sharding automatically. When calling save_pretrained(), the model is split into shards of a configurable maximum size (typically 5 GB per shard). During loading, from_pretrained() reads the index file and loads only the required shards, enabling efficient distribution of weights across multiple GPUs with minimal memory overhead.
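Resolving which shard files to open for a given set of parameters reduces to a lookup in weight_map. A stdlib-only sketch with a hypothetical, heavily abbreviated index (the tensor and file names are illustrative):

```python
# A hypothetical, already-parsed model.safetensors.index.json.
index = {
    "metadata": {"total_size": 140_000_000_000},
    "weight_map": {
        "model.embed.weight": "model-00001-of-00008.safetensors",
        "model.layers.0.mlp.weight": "model-00001-of-00008.safetensors",
        "model.layers.40.mlp.weight": "model-00005-of-00008.safetensors",
        "lm_head.weight": "model-00008-of-00008.safetensors",
    },
}

def shards_for(names):
    """Shard files that must be opened to load the named parameters."""
    return {index["weight_map"][name] for name in names}

# A rank that only needs the embedding and output head opens two shards,
# never touching the other six files.
needed = shards_for(["model.embed.weight", "lm_head.weight"])
```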
Sharding is particularly important for models with tens or hundreds of billions of parameters. A 70-billion-parameter model stored in FP16 requires approximately 140 GB of storage, which would be impractical as a single file for downloading, checkpointing, and distributed loading across multiple GPUs.
Following the successful Trail of Bits security audit, Hugging Face made safetensors the default serialization format in the Transformers library. Since that transition, save_pretrained() saves models as .safetensors files unless explicitly overridden. When loading, the library prefers .safetensors files over .bin files if both are available.
Hugging Face also provides a web-based conversion tool at huggingface.co/spaces/safetensors/convert that allows users to convert existing pickle-based models to safetensors format directly on the Hub.
Major model families released on the Hugging Face Hub, including Llama 3, Gemma, Phi, Mistral, Qwen, Stable Diffusion XL, and Flux, all use safetensors as their primary distribution format.
The format has been adopted by a wide range of open-source projects:
| Project | Use case |
|---|---|
| Transformers | Default model serialization for Hugging Face models |
| Diffusers | Image generation model storage |
| Candle | Rust-based ML inference |
| MLX | Apple Silicon ML framework |
| llama.cpp | CPU/GPU LLM inference (reads safetensors for conversion) |
| AUTOMATIC1111 Stable Diffusion WebUI | Image generation interface |
| ComfyUI | Node-based image generation workflow |
| CivitAI | Community model sharing platform |
| ColossalAI | Distributed training framework |
| InvokeAI | Creative AI toolkit |
| oobabooga text-generation-webui | Local LLM chat interface |
| vLLM | High-throughput LLM serving |
| pytorch-image-models (timm) | Computer vision model hub |
| BERTopic | Topic modeling library |
Several model serialization formats compete in the machine learning ecosystem. Each serves a different primary use case.
| Feature | Safetensors | Pickle / PyTorch .bin | GGUF | ONNX |
|---|---|---|---|---|
| Primary use case | Model weight storage and sharing | PyTorch checkpointing | Local LLM inference (llama.cpp, Ollama) | Cross-framework model deployment |
| Safety (no code execution) | Yes | No (arbitrary code execution possible) | Yes | Yes |
| Memory mapping (mmap) | Yes | No | Yes | No |
| Lazy loading | Yes | No | Yes | No |
| Stores computation graph | No (weights only) | No (weights only) | No (weights + tokenizer + config) | Yes (full graph + weights) |
| Built-in quantization | No (stores pre-quantized weights) | No | Yes (Q2 through Q8, k-quants) | Yes (via ONNX quantization tools) |
| Single-file format | Yes (or sharded with index) | Yes (or sharded) | Yes | Yes |
| bfloat16 support | Yes | Yes | Yes | Limited |
| FP8 support | Yes (F8_E4M3, F8_E5M2) | Yes (via PyTorch) | Yes | No |
| Framework-agnostic | Yes | No (Python/PyTorch specific) | Yes (binary format) | Yes (uses protobuf) |
| Typical file extension | .safetensors | .bin, .pt, .pth | .gguf | .onnx |
| Typical size (7B model, FP16) | ~14 GB | ~14 GB | ~4 GB (Q4 quantized) | ~14 GB |
| Developer | Hugging Face | PyTorch / Meta | ggml.ai (Georgi Gerganov) | Linux Foundation (ONNX Project) |
| License | Apache 2.0 | Python/BSD | MIT | Apache 2.0 |
The most direct comparison is between safetensors and pickle-based formats (.pt, .bin, .pth). Safetensors is strictly safer because it cannot execute code. It is also faster on CPU because of memory mapping. Pickle files, however, can serialize arbitrary Python objects, not just tensors; safetensors is limited to tensor data and string metadata. For pure model weight storage and distribution, safetensors is the preferred format. PyTorch 2.0 introduced a weights_only=True parameter for torch.load() that restricts deserialization to tensor data, but this mitigation is opt-in and does not change the fundamental pickle format.
GGUF, the successor to the GGML file format, was designed for the llama.cpp project and is optimized for running quantized models on consumer hardware, especially on CPUs. GGUF files bundle model weights, tokenizer configuration, and architecture metadata into a single file, and support a wide range of quantization formats (from 2-bit to 8-bit integers, including k-quant variants like Q4_K_M and Q5_K_S). A 7-billion-parameter model quantized to Q4 in GGUF format is roughly 4 GB, compared to approximately 14 GB for the same model in FP16 safetensors. Safetensors and GGUF serve different stages of the model lifecycle: safetensors for training, fine-tuning, and distribution; GGUF for efficient local inference. Models are typically converted from safetensors to GGUF using conversion scripts provided by the llama.cpp project.
ONNX (Open Neural Network Exchange) stores not just model weights but the full computation graph, including operator definitions and layer connections. This makes ONNX suitable for deploying models across different inference runtimes (ONNX Runtime, TensorRT, CoreML) without needing the original training framework. ONNX launched in 2017 as a collaboration between Microsoft and Facebook (now Meta) and uses Protocol Buffers for serialization. Safetensors does not store computation graphs; it is purely a weight storage format. Models stored in safetensors still require the original model architecture code (or a framework like transformers) to reconstruct the model for inference.
The safetensors documentation explains why several existing formats were considered and rejected:
| Format | Reason rejected |
|---|---|
| Pickle | Allows arbitrary code execution |
| HDF5 | Complex codebase (~210,000 lines of C) with historical CVEs |
| Protobuf | 2 GB file size limit |
| Cap'n Proto | No native float16 support |
| NumPy .npz | Vulnerable to zip bomb attacks; uses pickle internally for object arrays |
| Flatbuffers | No native bfloat16/FP8 support at the time of development |
This analysis led the Hugging Face team to create a new format from scratch rather than building on an existing one.
Safetensors is intentionally limited in scope. It stores only tensor data and simple string metadata. It does not store:
- Model architecture or computation graphs
- Optimizer state, training loops, or any executable code
- Arbitrary Python objects
- Structured metadata beyond flat string-to-string pairs (in __metadata__)

This narrow scope is a deliberate trade-off. By refusing to serialize arbitrary objects, the format remains safe by construction. Model architecture information is typically stored in a separate config.json file alongside the safetensors weights, as is standard practice in the Hugging Face ecosystem.
Another limitation is that safetensors files for full-precision models are large. A 7-billion-parameter model in FP16 occupies approximately 14 GB. Unlike GGUF, safetensors does not perform quantization; it stores whatever precision the tensors are already in. Users who need smaller files for local inference typically convert safetensors models to GGUF with quantization applied.
Safetensors can be installed from PyPI or conda:
```
pip install safetensors
conda install -c huggingface safetensors
```
Saving tensors from PyTorch:

```python
import torch
from safetensors.torch import save_file

tensors = {
    "embedding": torch.zeros((2, 2)),
    "attention": torch.zeros((2, 3))
}
save_file(tensors, "model.safetensors")
```

Loading tensors one at a time with safe_open:

```python
from safetensors import safe_open

tensors = {}
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    for key in f.keys():
        tensors[key] = f.get_tensor(key)
```

Loading all tensors at once, directly to a chosen device:

```python
from safetensors.torch import load_file

tensors = load_file("model.safetensors", device="cuda:0")
```
The safetensors library has been under active development since its initial release:
| Version | Date | Notes |
|---|---|---|
| 0.0.1 | September 22, 2022 | Initial release on PyPI |
| 0.2.0 | September 29, 2022 | First major iteration with Rust backend |
| 0.2.4 | November 7, 2022 | Performance improvements (Nicolas Patry announced 150x CPU speedup) |
| 0.2.5 | November 23, 2022 | Stability improvements |
| 0.3.x | Early 2023 | Added framework bindings for TensorFlow, JAX, Flax |
| 0.4.0 | October 2023 | Improved GPU loading and metadata support |
| 0.4.5 | September 2024 | Bug fixes and performance improvements |
| 0.5.0 | January 2025 | Further refinements |
| 0.7.0 | November 19, 2025 | Latest stable release |
The project has not yet reached version 1.0, but it carries a "Production/Stable" development status designation on PyPI. The library is maintained by Nicolas Patry, Daniel de Kok, and other Hugging Face engineers. As of version 0.7.0, there have been 45 total releases and 372 commits to the repository.
The safetensors project and the broader ML security community recommend the following practices:
- Prefer models distributed as .safetensors files, and treat pickle-based checkpoints (.bin, .pt, .pth) from untrusted sources as untrusted code.
- When a pickle checkpoint must be loaded, use torch.load(..., weights_only=True) (available in PyTorch 2.0+), which restricts deserialization to tensor data only.
- Scan pickle files with tools such as picklescan before loading, while remembering that scanners have known bypasses and cannot guarantee safety.
- Convert legacy pickle checkpoints to safetensors, for example with the conversion tool hosted on the Hugging Face Hub.