# Safetensors

> Source: https://aiwiki.ai/wiki/safetensors
> Updated: 2026-06-23
> Categories: Developer Tools, Machine Learning, Open Source AI
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Safetensors** is an open-source tensor serialization format developed by [Hugging Face](/wiki/hugging_face) that stores machine learning model weights as raw tensor data plus a small JSON header, leaving no mechanism for embedding executable code in a model file. It was created as a safe, fast, and framework-agnostic replacement for Python's [pickle](/wiki/pickle) serialization, which underlies [PyTorch](/wiki/pytorch) `.bin` files and can trigger arbitrary code execution on the host system when an untrusted model is loaded.[1][2] After a May 2023 third-party security audit by Trail of Bits found no critical flaws, safetensors became the default serialization format in Hugging Face's Transformers library.[2][3] Hugging Face summarized the result plainly: "Audit shows that safetensors is safe and ready to become the default."[2]

The safetensors library was first released on September 22, 2022, as version 0.0.1 on [PyPI](/wiki/pypi).[10] The primary author is Nicolas Patry, a machine learning engineer at Hugging Face known for his work on Rust-based ML infrastructure. The project is licensed under the Apache 2.0 license and hosted at `github.com/huggingface/safetensors`.[1] As of late 2025, the repository has over 3,700 stars on GitHub, 76 contributors, and is used by more than 129,000 downstream projects.[1]

## Why was safetensors created?

Python's pickle module is the default serialization mechanism behind PyTorch model checkpoints (`.pt`, `.pth`, `.bin` files). When a user calls `torch.load()`, pickle deserializes the byte stream back into Python objects. The problem is that pickle was designed to reconstruct arbitrary Python objects, including their behavior, not just their data. An attacker can craft a pickle file that executes arbitrary Python code the moment it is loaded.[2] This code could exfiltrate credentials, install a backdoor, open a reverse shell, or run any other system-level command.[8]

This is not a theoretical risk. Researchers at JFrog, Rapid7, and Snyk have documented real-world malicious models uploaded to public model hubs.[8] Weaponized `.pth` files have been found on the [Hugging Face Hub](/wiki/hugging_face) containing payloads that perform system fingerprinting, credential theft, and remote access trojan installation.[8] Pickle's `REDUCE`, `GLOBAL`, and `STACK_GLOBAL` opcodes allow a pickle program to import and invoke arbitrary Python callables during unpickling. An attacker can use these opcodes to call `os.system` or similar functions, achieving arbitrary shell command execution.[8]
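The mechanism can be shown with a deliberately harmless payload. The sketch below is illustrative only: the class name `Payload` is hypothetical, and `eval("6 * 7")` stands in for the `os.system` call a real attacker would use. It shows that unpickling calls whatever callable the file names, and that the `STACK_GLOBAL` and `REDUCE` opcodes are what carry the call.

```python
import pickle
import pickletools


class Payload:
    """Harmless stand-in for a malicious object (hypothetical name)."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        # A real attack would name os.system or a similar function instead.
        return (eval, ("6 * 7",))


# Protocol 4 is chosen explicitly so the opcode listing is predictable.
blob = pickle.dumps(Payload(), protocol=4)

# Loading invokes the callable; no Payload instance is ever reconstructed.
result = pickle.loads(blob)

# Names of the opcodes the pickle virtual machine executes for this blob.
opcodes = [op.name for op, _arg, _pos in pickletools.genops(blob)]

print(result)
print(opcodes)
```

The receiving side never needs the `Payload` class to exist, because the byte stream only references `builtins.eval` and its argument. This is why inspecting the Python source of a model repository says nothing about what its pickle files will do.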

Pickle scanning tools like picklescan exist to detect known malicious patterns, but researchers have discovered multiple bypass vulnerabilities in these scanners.[7] JFrog disclosed three zero-day vulnerabilities in picklescan in 2025, demonstrating that static analysis cannot fully secure an inherently unsafe format.[7] Repositories with pickle models are downloaded over 2.1 billion times per month from the Hugging Face model hub, making the attack surface enormous.[7] Even in 2025, roughly 44.9% of popular models on the Hub still ship in the insecure pickle format, which is why migration to safetensors remains an active security goal.[7]

Hugging Face's own scanning infrastructure catches some malicious uploads, but the fundamental issue is that pickle allows code execution by design.[2] No amount of scanning can guarantee safety for an inherently unsafe format. Safetensors was built from the ground up to solve this problem: it stores only numerical data and metadata, with no mechanism for embedding or executing code.[1]

## File format specification

The safetensors format is intentionally simple. A `.safetensors` file consists of three contiguous sections:

| Section | Size | Description |
|---|---|---|
| Header size | 8 bytes | Unsigned 64-bit little-endian integer specifying the length of the JSON header in bytes |
| JSON header | N bytes (value from header size) | UTF-8 encoded JSON object containing tensor metadata |
| Data buffer | Remaining bytes | Raw tensor data stored as a contiguous binary block, without compression |

### Header structure

The JSON header is a dictionary where each key is a tensor name and each value is an object with three fields:

- `dtype`: The data type of the tensor (e.g., `F16`, `BF16`, `F32`, `I32`)
- `shape`: An array of integers representing the tensor dimensions
- `data_offsets`: A two-element array `[BEGIN, END]` specifying the byte range of the tensor's data within the data buffer (not absolute file offsets)

A special key called `__metadata__` can store arbitrary string-to-string key-value pairs for user-defined metadata such as model architecture, training configuration, or authorship information.[12] Because metadata values must be strings, complex data structures need to be serialized as JSON strings within the value.[12] This metadata is commonly used by tools like CivitAI and [LoRA](/wiki/lora) training scripts to record learning rates, training epochs, and dataset information.
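As a rough illustration of the string-only rule, the following sketch flattens arbitrary values into a string-to-string dictionary. The helper names `encode_metadata` and `decode_metadata_value` are hypothetical and are not part of the safetensors API; the example keys are made up.

```python
import json


def encode_metadata(values: dict) -> dict:
    """Return a str-to-str dict suitable for `__metadata__` (hypothetical helper)."""
    encoded = {}
    for key, value in values.items():
        # Strings pass through unchanged; anything else becomes a JSON string.
        encoded[str(key)] = value if isinstance(value, str) else json.dumps(value)
    return encoded


def decode_metadata_value(raw: str):
    """Best-effort inverse: parse as JSON if possible, else return the raw string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


metadata = encode_metadata({
    "format": "pt",
    "learning_rate": 1e-4,
    "datasets": ["set_a", "set_b"],
})
```

One caveat of this approach: a plain string that happens to be valid JSON (for example `"123"`) decodes to a non-string value, so readers that care about the distinction need their own convention.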

Here is a simplified example of a header:

```json
{
  "__metadata__": {"format": "pt"},
  "model.embed.weight": {
    "dtype": "F16",
    "shape": [32000, 4096],
    "data_offsets": [0, 262144000]
  },
  "model.norm.weight": {
    "dtype": "F16",
    "shape": [4096],
    "data_offsets": [262144000, 262152192]
  }
}
```
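A header like the one above can be read with the Python standard library alone, which is a useful way to inspect a file without loading any tensor data. This is a minimal sketch that follows the layout described in this section; `read_header` is a hypothetical helper name, and it performs none of the validation the real library does.

```python
import json
import struct


def read_header(path: str) -> dict:
    """Return the parsed JSON header of a .safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON describing every tensor.
        return json.loads(f.read(header_len).decode("utf-8"))
```

For example, `read_header("model.safetensors")["model.norm.weight"]["shape"]` would return `[4096]` for the header shown above.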

### Design constraints

Several constraints are enforced to prevent abuse and ensure correctness:

- The header must begin with the `{` character (byte `0x7B`).
- The header size is capped at 100 MB. This prevents denial-of-service attacks that could force parsing of an extremely large JSON blob.
- Duplicate tensor names are not allowed.
- The entire data buffer must be indexed with no gaps or overlapping byte ranges, which prevents buffer overrun vulnerabilities.
- Tensor data uses little-endian byte order and C-order (row-major) memory layout.
- NaN and infinity values are permitted in floating-point tensors.
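Most of these constraints can be checked in a few lines. The sketch below is an illustration under stated assumptions, not the library's validator: `check_layout` is a hypothetical name, the cap is taken to be 100,000,000 bytes, and dtype and shape consistency are not checked.

```python
import json
import struct

MAX_HEADER_SIZE = 100_000_000  # assumed byte value of the 100 MB cap


def _reject_duplicates(pairs):
    """json hook: fail if any JSON object repeats a key."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate key in header")
    return dict(pairs)


def check_layout(blob: bytes) -> dict:
    """Validate a whole file held in memory and return its header."""
    if len(blob) < 8:
        raise ValueError("file too small to hold the header size")
    (header_len,) = struct.unpack("<Q", blob[:8])
    if header_len > MAX_HEADER_SIZE:
        raise ValueError("header too large")
    if 8 + header_len > len(blob):
        raise ValueError("header length exceeds file size")
    raw = blob[8 : 8 + header_len]
    if not raw.startswith(b"{"):
        raise ValueError("header must start with '{'")
    header = json.loads(raw, object_pairs_hook=_reject_duplicates)

    # Sort the byte ranges and walk them: each must start where the last ended.
    spans = sorted(
        tuple(info["data_offsets"])
        for name, info in header.items()
        if name != "__metadata__"
    )
    cursor = 0
    for begin, end in spans:
        if begin != cursor or end < begin:
            raise ValueError("gap or overlap in data buffer")
        cursor = end
    if cursor != len(blob) - 8 - header_len:
        raise ValueError("data buffer not fully indexed")
    return header
```

Walking the sorted ranges with a single cursor catches gaps, overlaps, and trailing unindexed bytes in one pass.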

### Supported data types

Safetensors supports a range of data types relevant to modern [machine learning](/wiki/machine_learning) workloads:

| Dtype string | Description | Bytes per element |
|---|---|---|
| `BOOL` | Boolean | 1 |
| `U8` | Unsigned 8-bit integer | 1 |
| `I8` | Signed 8-bit integer | 1 |
| `U16` | Unsigned 16-bit integer | 2 |
| `I16` | Signed 16-bit integer | 2 |
| `F16` | IEEE 754 half-precision float | 2 |
| `BF16` | Brain floating point ([bfloat16](/wiki/bfloat16)) | 2 |
| `U32` | Unsigned 32-bit integer | 4 |
| `I32` | Signed 32-bit integer | 4 |
| `F32` | IEEE 754 single-precision float | 4 |
| `F64` | IEEE 754 double-precision float | 8 |
| `I64` | Signed 64-bit integer | 8 |
| `U64` | Unsigned 64-bit integer | 8 |
| `F8_E4M3` | FP8 (4-bit exponent, 3-bit mantissa) | 1 |
| `F8_E5M2` | FP8 (5-bit exponent, 2-bit mantissa) | 1 |

The inclusion of bfloat16 and FP8 types is notable because some older formats (such as HDF5 and [NumPy](/wiki/numpy)'s `.npz`) lack native support for these types, which are now standard in large-scale model training and inference.
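The element sizes in the table determine how long each tensor's byte range must be. The following sketch encodes the table as a dictionary and checks a header entry against it; the names `DTYPE_BYTES`, `tensor_nbytes`, and `offsets_match` are hypothetical, chosen for illustration.

```python
from math import prod

# Bytes per element, copied from the table above.
DTYPE_BYTES = {
    "BOOL": 1, "U8": 1, "I8": 1, "F8_E4M3": 1, "F8_E5M2": 1,
    "U16": 2, "I16": 2, "F16": 2, "BF16": 2,
    "U32": 4, "I32": 4, "F32": 4,
    "U64": 8, "I64": 8, "F64": 8,
}


def tensor_nbytes(dtype: str, shape: list) -> int:
    """Size in bytes of a dense tensor; an empty shape means a scalar."""
    return DTYPE_BYTES[dtype] * prod(shape)


def offsets_match(info: dict) -> bool:
    """True if the entry's byte range has exactly the size its dtype and shape imply."""
    begin, end = info["data_offsets"]
    return end - begin == tensor_nbytes(info["dtype"], info["shape"])
```

Applied to the header example earlier in this article, an `F16` tensor of shape `[32000, 4096]` needs 32000 × 4096 × 2 = 262,144,000 bytes, which matches its `data_offsets` of `[0, 262144000]`.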

## Is safetensors safe? Security properties

The central promise of safetensors is that loading a `.safetensors` file cannot execute arbitrary code, regardless of how the file was crafted.[2] This holds because:

1. The header is parsed as JSON, a data-only format with no code execution semantics.
2. The data buffer contains only raw numerical bytes. There are no object graphs, no class references, and no callback hooks.
3. The Rust-based parser does not use `eval`, `exec`, or any form of dynamic code loading.

### What did the Trail of Bits security audit find?

In May 2023, Hugging Face, [EleutherAI](/wiki/eleutherai), and [Stability AI](/wiki/stability_ai) jointly commissioned an external security audit of the safetensors library.[2] The audit was performed by Trail of Bits, a well-known cybersecurity firm that has audited projects across the blockchain, infrastructure, and AI security domains.[3] The results were published on May 23, 2023.[2]

The audit's findings were:

- No critical security flaw leading to arbitrary code execution was found.[3]
- Three issues of medium severity were identified, all of which were addressed.[3]
- Imprecisions in the format specification were detected and corrected.[3]
- Missing validation that allowed "polyglot" files (files that are simultaneously valid in multiple formats) was fixed.[3]
- Improvements to the test suite were proposed and implemented.[3]

Hugging Face's published conclusion was that "the audit shows that safetensors is safe and ready to become the default," and that the library would "now be installed in `transformers` by default."[2] All three organizations agreed to make the full audit report public, in keeping with open-source transparency principles.[2] The report is available through Trail of Bits' publications repository on GitHub.[3] EleutherAI published a companion blog post confirming their endorsement of safetensors as a safe default format.[6]

## How does safetensors load models faster?

Safetensors achieves significant speed improvements over pickle-based loading, primarily through memory mapping (mmap).[4] When a safetensors file is loaded, the operating system maps the file directly into the process's virtual address space. Tensor data can then be accessed without copying it into a separate memory allocation.[4]

With pickle-based PyTorch loading (`torch.load()`), the process typically works as follows: PyTorch creates empty tensors, loads data from the pickle file, and then copies the data into the empty tensors. This requires approximately twice the model's size in available RAM, since both the loaded data and the destination tensors exist in memory simultaneously.[4]

Safetensors avoids this double-allocation problem. On CPU, tensors reference the memory-mapped file directly. On [GPU](/wiki/gpu_computing), the library uses `cudaMemcpy` to transfer data straight from the mapped file to GPU memory, bypassing intermediate CPU tensor allocations.[4]

### What does "zero-copy" mean?

The term "zero-copy" deserves some clarification. On CPU, when the file contents are already in the OS page cache (for example, after a recent read), loading a safetensors file can be truly zero-copy: the tensor objects point directly at the mapped memory with no data duplication at all. On GPU, a copy from host memory to device memory is always required (there is no way around the PCIe or NVLink transfer), but safetensors still eliminates the intermediate step of allocating CPU tensors.[4] In practice, this means safetensors skips one full copy of the model's weights compared to PyTorch's standard loading path.
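The CPU case can be illustrated with Python's own `mmap` module. This is a simplified sketch, not how the library is implemented: it maps a scratch file of four float32 values and reinterprets the mapped bytes in place. The file name is arbitrary, and the `"f"` format code uses native byte order, so the example assumes a little-endian host.

```python
import mmap
import os
import struct
import tempfile

# Write four little-endian float32 values to a scratch file
# (a stand-in for the data buffer of a real model file).
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
    # cast() reinterprets the mapped bytes as float32 without copying them.
    with memoryview(mapped) as raw, raw.cast("f") as floats:
        first = floats[0]         # read straight from the mapping
        values = floats.tolist()  # an explicit copy happens only here
```

The views are released before the mapping closes because Python refuses to close a mapping that still has exported buffers.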

### Benchmark results

The official benchmarks from the Hugging Face documentation use [GPT-2](/wiki/gpt-2) weights and were run on Ubuntu 18.04.6 LTS with an Intel Xeon CPU @ 2.00 GHz and a Tesla T4 GPU ([CUDA](/wiki/cuda) 11.2, driver version 460.32.03):[4]

| Device | Safetensors load time | PyTorch load time | Speedup |
|---|---|---|---|
| CPU | 0.004 seconds | 0.307 seconds | 76.6x faster |
| GPU (CUDA, Tesla T4) | 0.165 seconds | 0.354 seconds | 2.1x faster |

The CPU speedup is dramatic because memory mapping avoids all data copying. The GPU speedup is smaller because the bottleneck shifts to the PCIe bus transfer between host memory and GPU memory, which both formats must perform.[4]

For large-scale models, the gains are even more pronounced. The [BLOOM](/wiki/bloom) 176-billion-parameter model loaded on 8 GPUs in approximately 45 seconds using safetensors, compared to roughly 10 minutes with standard PyTorch pickle weights.[2] This roughly 13x improvement comes from lazy loading, where safetensors reads specific tensor shards directly to each GPU without loading the entire model into CPU memory first.[2]

### Lazy loading and tensor slicing

Safetensors supports lazy loading, meaning individual tensors can be read from the file without loading the entire model into memory.[5] This is particularly useful for distributed inference, where each GPU only needs a subset of the model's weights. The library also supports tensor slicing, allowing users to read a sub-range of a tensor:[5]

```python
from safetensors import safe_open

# Opening the file parses the header only; tensor data is read on demand.
with safe_open("model.safetensors", framework="pt", device="cuda:0") as f:
    tensor_slice = f.get_slice("embedding")  # lazy handle, not yet a tensor
    vocab_size, hidden_dim = tensor_slice.get_shape()
    partial = tensor_slice[:1000, :]  # Load only first 1000 rows
```

This capability is useful for vocabulary-parallel inference, where different GPUs handle different portions of the embedding table.
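Row slicing is cheap because tensors are stored in row-major order, so a run of consecutive rows is one contiguous byte range. The sketch below shows the offset arithmetic using only the standard library. `read_rows` is a hypothetical helper, the dtype table is a small subset, and the function returns raw bytes rather than a framework tensor.

```python
import json
import struct

DTYPE_BYTES = {"F16": 2, "BF16": 2, "F32": 4, "I32": 4}  # subset, for illustration


def read_rows(path: str, name: str, start: int, stop: int) -> bytes:
    """Return the raw bytes of rows [start, stop) of a 2-D tensor."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        info = json.loads(f.read(header_len))[name]
        rows, cols = info["shape"]
        # Clamp the requested range to the rows that actually exist.
        start, stop, _ = slice(start, stop).indices(rows)
        row_bytes = cols * DTYPE_BYTES[info["dtype"]]
        # data_offsets are relative to the data buffer, which begins right
        # after the 8-byte length prefix and the JSON header.
        begin = 8 + header_len + info["data_offsets"][0] + start * row_bytes
        f.seek(begin)
        return f.read(max(stop - start, 0) * row_bytes)
```

Only the header and the requested rows are read from disk; the rest of the file is never touched.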

## Rust implementation

The core safetensors library is implemented in Rust, which provides memory safety guarantees without a garbage collector.[1] The Rust implementation handles all file parsing, header validation, and data access logic. Language-specific bindings (for Python, JavaScript, and others) call into the Rust code through foreign function interfaces; the Python bindings are built with [PyO3](/wiki/pyo3) and `setuptools-rust`.[1]

Using Rust adds a meaningful layer of security: common vulnerability classes such as buffer overflows, use-after-free errors, and data races are prevented by the Rust compiler's ownership and borrowing rules. This is particularly relevant for a library that parses untrusted binary data downloaded from the internet.

The Rust crate is published on [crates.io](https://crates.io/crates/safetensors) as `safetensors`.[9] It has accumulated over 980,000 monthly downloads and is used by more than 760 other Rust crates, including Hugging Face's [Candle](/wiki/candle) deep learning framework, which uses safetensors as its primary model loading mechanism.[9]

## Framework support

Safetensors provides native bindings for multiple [deep learning](/wiki/deep_learning) frameworks:[5]

| Framework | Module | Notes |
|---|---|---|
| [PyTorch](/wiki/pytorch) | `safetensors.torch` | Full support; default in Transformers since v4.35.0 |
| [TensorFlow](/wiki/tensorflow) | `safetensors.tensorflow` | Full support |
| [JAX](/wiki/jax) / [Flax](/wiki/flax) | `safetensors.flax` | Full support |
| [NumPy](/wiki/numpy) | `safetensors.numpy` | Full support |
| [PaddlePaddle](/wiki/paddlepaddle) | `safetensors.paddle` | Full support |

Because the file format stores raw tensor data without framework-specific metadata, the same `.safetensors` file can be loaded in different frameworks. A model saved from PyTorch can be loaded in TensorFlow or JAX, provided the consuming code knows the expected tensor names and shapes. This is a meaningful advantage over pickle, which is tightly coupled to Python and often to the specific framework version that created the file.

## Sharded safetensors for large models

Models with billions of parameters often exceed the memory capacity of a single device or the practical limits of a single file. Safetensors supports sharding, where model weights are split across multiple files with a JSON index file that maps tensor names to their respective shard files.[5]

A sharded model typically looks like this on disk:

```
model-00001-of-00008.safetensors
model-00002-of-00008.safetensors
...
model-00008-of-00008.safetensors
model.safetensors.index.json
```

The `model.safetensors.index.json` file contains two keys:

- `metadata`: An object with information such as the total model size in bytes
- `weight_map`: A dictionary mapping each parameter name to the shard file that contains it

The Hugging Face [Transformers](/wiki/transformers_library) library and the [Accelerate](/wiki/accelerate) library handle sharding automatically. When calling `save_pretrained()`, the model is split into shards of a configurable maximum size (typically 5 GB per shard). During loading, `from_pretrained()` reads the index file and loads only the required shards, enabling efficient distribution of weights across multiple GPUs with minimal memory overhead.
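To make the index structure concrete, the sketch below groups parameter names by shard, which is roughly the lookup a loader needs before opening any file. The function name `tensors_by_shard` is hypothetical, and the index contents (tensor names and total size) are invented for the example.

```python
import json
from collections import defaultdict


def tensors_by_shard(index_json: str) -> dict:
    """Group parameter names by the shard file that stores them."""
    index = json.loads(index_json)
    groups = defaultdict(list)
    for name, shard in index["weight_map"].items():
        groups[shard].append(name)
    return dict(groups)


# Hypothetical index contents; tensor names and sizes are made up.
example_index = json.dumps({
    "metadata": {"total_size": 1024},
    "weight_map": {
        "model.embed.weight": "model-00001-of-00002.safetensors",
        "model.layers.0.weight": "model-00001-of-00002.safetensors",
        "model.norm.weight": "model-00002-of-00002.safetensors",
    },
})
shards = tensors_by_shard(example_index)
```

A process that needs only `model.norm.weight` can see from this mapping that it has to open just the second shard.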

Sharding is particularly important for models with tens or hundreds of billions of parameters. A 70-billion-parameter model stored in FP16 requires approximately 140 GB of storage, which would be impractical as a single file for downloading, checkpointing, and distributed loading across multiple GPUs.
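The 140 GB figure follows directly from parameter count times bytes per parameter. A small sketch of the arithmetic, using decimal gigabytes and hypothetical helper names, also gives a lower bound on the number of 5 GB shards:

```python
GB = 10**9  # decimal gigabyte


def weights_size_bytes(num_params: int, bytes_per_param: int) -> int:
    """Raw weight size, ignoring the (comparatively tiny) JSON headers."""
    return num_params * bytes_per_param


def min_shard_count(total_bytes: int, max_shard_bytes: int) -> int:
    """Ceiling division: the fewest shards that could hold the weights."""
    return -(-total_bytes // max_shard_bytes)


size_70b_fp16 = weights_size_bytes(70 * 10**9, 2)   # 70B parameters, 2 bytes each
shards = min_shard_count(size_70b_fp16, 5 * GB)
print(size_70b_fp16 // GB, "GB in at least", shards, "shards")
```

The shard count is a lower bound because individual tensors are kept whole, so real checkpoints may use more files than this estimate.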

## Is safetensors the default on Hugging Face?

### Default format on Hugging Face Hub

Following the successful Trail of Bits security audit, Hugging Face made safetensors the default serialization format in the Transformers library.[2] Since that transition, `save_pretrained()` saves models as `.safetensors` files unless explicitly overridden. When loading, the library prefers `.safetensors` files over `.bin` files if both are available. Adoption is not yet universal: as of 2025 roughly 44.9% of popular models on the Hub still publish a pickle-format file alongside or instead of safetensors, though most newly released frontier models (such as Llama 4, Qwen 3, and DeepSeek-R1) ship in safetensors.[7]

Hugging Face also provides a web-based conversion tool at [huggingface.co/spaces/safetensors/convert](https://huggingface.co/spaces/safetensors/convert) that allows users to convert existing pickle-based models to safetensors format directly on the Hub.[2]

Major model families released on the Hugging Face Hub, including [Llama 3](/wiki/llama), [Gemma](/wiki/gemma), [Phi](/wiki/phi), [Mistral](/wiki/mistral), [Qwen](/wiki/qwen), [Stable Diffusion XL](/wiki/stable_diffusion), and [Flux](/wiki/flux), all use safetensors as their primary distribution format.

### Projects using safetensors

The format has been adopted by a wide range of open-source projects:

| Project | Use case |
|---|---|
| [Transformers](/wiki/transformers_library) | Default model serialization for Hugging Face models |
| [Diffusers](/wiki/diffusers) | Image generation model storage |
| [Candle](/wiki/candle) | Rust-based ML inference |
| [MLX](https://github.com/ml-explore/mlx) | Apple Silicon ML framework |
| [llama.cpp](/wiki/llama_cpp) | CPU/GPU LLM inference (reads safetensors for conversion) |
| [AUTOMATIC1111 Stable Diffusion WebUI](/wiki/stable_diffusion) | Image generation interface |
| [ComfyUI](/wiki/comfyui) | Node-based image generation workflow |
| [CivitAI](https://civitai.com/) | Community model sharing platform |
| [ColossalAI](https://github.com/hpcaitech/ColossalAI) | Distributed training framework |
| [InvokeAI](https://github.com/invoke-ai/InvokeAI) | Creative AI toolkit |
| [oobabooga text-generation-webui](https://github.com/oobabooga/text-generation-webui) | Local LLM chat interface |
| [vLLM](/wiki/vllm) | High-throughput LLM serving |
| [pytorch-image-models (timm)](https://github.com/huggingface/pytorch-image-models) | Computer vision model hub |
| [BERTopic](https://github.com/MaartenGr/BERTopic) | Topic modeling library |

## How does safetensors compare to other model formats?

Several model serialization formats compete in the machine learning ecosystem. Each serves a different primary use case.

| Feature | Safetensors | Pickle / PyTorch .bin | [GGUF](/wiki/gguf) | [ONNX](/wiki/onnx) |
|---|---|---|---|---|
| Primary use case | Model weight storage and sharing | PyTorch checkpointing | Local LLM inference ([llama.cpp](/wiki/llama_cpp), [Ollama](/wiki/ollama)) | Cross-framework model deployment |
| Safety (no code execution) | Yes | No (arbitrary code execution possible) | Yes | Yes |
| Memory mapping (mmap) | Yes | No | Yes | No |
| Lazy loading | Yes | No | Yes | No |
| Stores computation graph | No (weights only) | No (tensors or arbitrary pickled Python objects) | No (weights + tokenizer + config) | Yes (full graph + weights) |
| Built-in quantization | No (stores pre-quantized weights) | No | Yes (Q2 through Q8, k-quants) | Yes (via ONNX quantization tools) |
| Single-file format | Yes (or sharded with index) | Yes (or sharded) | Yes | Yes |
| bfloat16 support | Yes | Yes | Yes | Limited |
| FP8 support | Yes (F8_E4M3, F8_E5M2) | Yes (via PyTorch) | Yes | No |
| Framework-agnostic | Yes | No (Python/PyTorch specific) | Yes (binary format) | Yes (uses protobuf) |
| Typical file extension | `.safetensors` | `.bin`, `.pt`, `.pth` | `.gguf` | `.onnx` |
| Typical size (7B model, FP16) | ~14 GB | ~14 GB | ~4 GB (Q4 quantized) | ~14 GB |
| Developer | Hugging Face | PyTorch / Meta | ggml.ai (Georgi Gerganov) | Linux Foundation (ONNX Project) |
| License | Apache 2.0 | Python/BSD | MIT | Apache 2.0 |

### Safetensors vs. pickle

The most direct comparison is between safetensors and pickle-based formats (`.pt`, `.bin`, `.pth`). Safetensors is strictly safer because it cannot execute code.[2] It is also faster on CPU because of memory mapping.[4] Pickle files, however, can serialize arbitrary Python objects, not just tensors; safetensors is limited to tensor data and string metadata. For pure model weight storage and distribution, safetensors is the preferred format. PyTorch's `torch.load()` accepts a `weights_only=True` argument (added in PyTorch 1.13 and the default since PyTorch 2.6) that restricts unpickling to tensors and a small allowlist of types, but this mitigation depends on the installed PyTorch version and on callers not disabling it, and it does not change the fundamental pickle format.[8]

### Safetensors vs. GGUF

GGUF (Georgi Gerganov Universal Format) was designed for the [llama.cpp](/wiki/llama_cpp) project and is optimized for running quantized models on consumer hardware, especially on CPUs.[11] GGUF files bundle model weights, tokenizer configuration, and architecture metadata into a single file, and support a wide range of quantization formats (from 2-bit to 8-bit integers, including k-quant variants like Q4_K_M and Q5_K_S).[11] A 7-billion-parameter model quantized to Q4 in GGUF format is roughly 4 GB, compared to approximately 14 GB for the same model in FP16 safetensors.[11] Safetensors and GGUF serve different stages of the model lifecycle: safetensors for training, [fine-tuning](/wiki/fine_tuning), and distribution; GGUF for efficient local inference.[11] Models are typically converted from safetensors to GGUF using conversion scripts provided by the llama.cpp project.

### Safetensors vs. ONNX

ONNX (Open Neural Network Exchange) stores not just model weights but the full computation graph, including operator definitions and layer connections.[11] This makes ONNX suitable for deploying models across different inference runtimes ([ONNX Runtime](/wiki/onnx), [TensorRT](/wiki/tensorrt), CoreML) without needing the original training framework. ONNX launched in 2017 as a collaboration between Microsoft and Facebook (now Meta) and uses Protocol Buffers for serialization.[11] Safetensors does not store computation graphs; it is purely a weight storage format.[11] Models stored in safetensors still require the original model architecture code (or a framework like `transformers`) to reconstruct the model for inference.

## Why not other formats?

The safetensors documentation explains why several existing formats were considered and rejected:[5]

| Format | Reason rejected |
|---|---|
| Pickle | Allows arbitrary code execution |
| HDF5 | Complex codebase (~210,000 lines of C) with historical CVEs |
| Protobuf | 2 GB file size limit |
| Cap'n Proto | No native float16 support |
| NumPy .npz | Vulnerable to zip bomb attacks; uses pickle internally for object arrays |
| Flatbuffers | No native bfloat16/FP8 support at the time of development |

This analysis led the Hugging Face team to create a new format from scratch rather than building on an existing one.

## Limitations

Safetensors is intentionally limited in scope. It stores only tensor data and simple string metadata. It does not store:

- Model architecture or computation graphs (unlike ONNX)
- Optimizer state (though optimizer tensors can be saved separately)
- Custom Python objects or classes
- Quantization metadata beyond what is encoded in the dtype (unlike GGUF's rich quantization metadata)
- Tokenizer configuration (unlike GGUF, which bundles tokenizer data)
- Training hyperparameters (unless manually added to `__metadata__`)

This narrow scope is a deliberate trade-off. By refusing to serialize arbitrary objects, the format remains safe by construction. Model architecture information is typically stored in a separate `config.json` file alongside the safetensors weights, as is standard practice in the Hugging Face ecosystem.

Another limitation is that safetensors files for full-precision models are large. A 7-billion-parameter model in FP16 occupies approximately 14 GB. Unlike GGUF, safetensors does not perform quantization; it stores whatever precision the tensors are already in. Users who need smaller files for local inference typically convert safetensors models to GGUF with quantization applied.

## Installation and basic usage

Safetensors can be installed from PyPI or conda:

```bash
pip install safetensors
```

```bash
conda install -c huggingface safetensors
```

### Saving tensors (PyTorch)

```python
import torch
from safetensors.torch import save_file

tensors = {
    "embedding": torch.zeros((2, 2)),
    "attention": torch.zeros((2, 3))
}
save_file(tensors, "model.safetensors")
```

### Loading tensors

```python
from safetensors import safe_open

tensors = {}
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    for key in f.keys():
        tensors[key] = f.get_tensor(key)
```

### Loading directly to GPU

```python
from safetensors.torch import load_file

tensors = load_file("model.safetensors", device="cuda:0")
```

## When was safetensors released? Version history

The safetensors library has been under active development since its initial release:

| Version | Date | Notes |
|---|---|---|
| 0.0.1 | September 22, 2022 | Initial release on PyPI |
| 0.2.0 | September 29, 2022 | First major iteration with Rust backend |
| 0.2.4 | November 7, 2022 | Performance improvements (Nicolas Patry announced 150x CPU speedup) |
| 0.2.5 | November 23, 2022 | Stability improvements |
| 0.3.x | Early 2023 | Added framework bindings for TensorFlow, JAX, Flax |
| 0.4.0 | October 2023 | Improved GPU loading and metadata support |
| 0.4.5 | September 2024 | Bug fixes and performance improvements |
| 0.5.0 | January 2025 | Further refinements |
| 0.7.0 | November 19, 2025 | Latest stable release |

The project has not yet reached version 1.0, but it carries a "Production/Stable" development status designation on PyPI.[10] The library is maintained by Nicolas Patry, Daniel de Kok, and other Hugging Face engineers. As of version 0.7.0, there have been 45 total releases and 372 commits to the repository.[10]

## Security recommendations

The safetensors project and the broader ML security community recommend the following practices:

1. Always prefer safetensors over pickle-based formats when downloading models from public repositories.
2. When pickle files must be used, load them with `torch.load(..., weights_only=True)` (available since PyTorch 1.13 and the default since PyTorch 2.6), which restricts deserialization to tensors and a small allowlist of types.
3. Verify model file checksums and signatures when available on the Hugging Face Hub.
4. Use Hugging Face's built-in malware scanning, which automatically checks uploaded models for known malicious patterns.
5. Convert legacy pickle models to safetensors using the conversion tools provided by Hugging Face.
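Checksum verification (item 3) needs nothing beyond the standard library. The sketch below assumes the publisher provides a SHA-256 digest as a hex string; `sha256_of_file` and `matches_checksum` are hypothetical helper names.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large model files never sit fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding spaces."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

A matching checksum shows only that the file is the one the publisher uploaded; it says nothing about whether that upload was benign, so it complements the choice of a safe format rather than replacing it.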

## See also

- [Hugging Face](/wiki/hugging_face)
- [PyTorch](/wiki/pytorch)
- [GGUF](/wiki/gguf)
- [ONNX](/wiki/onnx)
- [Pickle](/wiki/pickle)
- [Quantization](/wiki/quantization)
- [Large language model](/wiki/large_language_model)

## References

1. Hugging Face. "Safetensors: Simple, safe way to store and distribute tensors." GitHub repository. [https://github.com/huggingface/safetensors](https://github.com/huggingface/safetensors)
2. Hugging Face. "Safetensors audited as really safe and becoming the default." Hugging Face Blog, May 23, 2023. [https://huggingface.co/blog/safetensors-security-audit](https://huggingface.co/blog/safetensors-security-audit)
3. Trail of Bits. "EleutherAI / Hugging Face Safetensors Library Security Assessment." May 3, 2023. [https://github.com/trailofbits/publications/blob/master/reviews/2023-03-eleutherai-huggingface-safetensors-securityreview.pdf](https://github.com/trailofbits/publications/blob/master/reviews/2023-03-eleutherai-huggingface-safetensors-securityreview.pdf)
4. Hugging Face. "Safetensors documentation: Speed Comparison." [https://huggingface.co/docs/safetensors/en/speed](https://huggingface.co/docs/safetensors/en/speed)
5. Hugging Face. "Safetensors documentation." [https://huggingface.co/docs/safetensors/en/index](https://huggingface.co/docs/safetensors/en/index)
6. EleutherAI. "Safetensors audited as really safe and becoming the default." EleutherAI Blog, May 2023. [https://blog.eleuther.ai/safetensors-security-audit/](https://blog.eleuther.ai/safetensors-security-audit/)
7. JFrog. "PyTorch Users at Risk: Unveiling 3 Zero-Day PickleScan Vulnerabilities." JFrog Blog, 2025. [https://jfrog.com/blog/unveiling-3-zero-day-vulnerabilities-in-picklescan/](https://jfrog.com/blog/unveiling-3-zero-day-vulnerabilities-in-picklescan/)
8. Rapid7. "From .pth to p0wned: Abuse of Pickle Files in AI Model Supply Chains." 2025. [https://www.rapid7.com/blog/post/from-pth-to-p0wned-abuse-of-pickle-files-in-ai-model-supply-chains/](https://www.rapid7.com/blog/post/from-pth-to-p0wned-abuse-of-pickle-files-in-ai-model-supply-chains/)
9. safetensors Rust crate. crates.io. [https://crates.io/crates/safetensors](https://crates.io/crates/safetensors)
10. safetensors Python package. PyPI. [https://pypi.org/project/safetensors/](https://pypi.org/project/safetensors/)
11. Ngxson. "Common AI Model Formats." Hugging Face Blog. [https://huggingface.co/blog/ngxson/common-ai-model-formats](https://huggingface.co/blog/ngxson/common-ai-model-formats)
12. Hugging Face. "Metadata Parsing." Safetensors documentation. [https://huggingface.co/docs/safetensors/en/metadata_parsing](https://huggingface.co/docs/safetensors/en/metadata_parsing)

