tinygrad

Updated Jul 16, 2026

tinygrad is an open-source deep learning framework written primarily in Python that aims to occupy the space between Andrej Karpathy's pedagogical micrograd and full-scale production stacks like PyTorch. It was started by George Hotz (geohot) on 17 October 2020 as a deliberately minimal alternative whose readable source code makes the entire compiler and intermediate representation visible to the user.^[1]^[2] The framework is maintained by the tiny corp, a company Hotz founded that announced a $5.1 million seed round on 24 May 2023 and now sells the tinybox, a multi-GPU workstation built around consumer Radeon and GeForce cards.^[3]^[4] tinygrad's stated mission is to "commoditize the petaflop," and the project deliberately targets accelerators beyond NVIDIA, with a particular focus on AMD GPUs through a near-complete software stack of its own.^[3]^[5]

Infobox

| Field | Value |
| --- | --- |
| First commit | 17 October 2020^[2] |
| Original author | George Hotz |
| Maintainer | the tiny corp^[1] |
| License | MIT^[1] |
| Language | Python (with C, CUDA, Metal kernels)^[1] |
| Latest documented version | 0.12.0 (12 January 2026)^[1] |
| Approximate size (excl. tests) | ~18,935 lines^[2] |
| GitHub stars (Jan 2026) | 32.7k^[1] |
| Company funding | $5.1M seed, May 2023^[3] |
| Primary product | tinybox (red v2, green v2)^[6] |

History

Origins as a toy project (2020 to 2022)

The first commit to the tinygrad repository was made on 17 October 2020 by Hotz, the security researcher widely known for jailbreaking the iPhone and the PlayStation 3 and for founding the autonomous-driving startup comma.ai.^[2]^[7] The project began as an experiment to see how small a working deep learning framework could be while still supporting the operations needed to train neural networks. Hotz framed it as something "between PyTorch and micrograd," with the readability and hackability of Karpathy's 200-line educational engine but the practical surface area of a real tensor library.^[1]

Through 2021 and 2022 tinygrad accumulated backends and operators while staying nominally under a self-imposed ceiling of a few thousand lines. The project's README has long emphasized that adding a new accelerator requires implementing only roughly 25 low-level operations, which kept the surface area small enough to be ported by small teams.^[1] Hotz's company comma.ai adopted tinygrad as a model-runtime alternative to its earlier thneed and Qualcomm SNPE pipelines, eventually moving the openpilot driving model entirely onto a tinygrad QCOM backend on the comma 3X devkit.^[8]

During this early period, much of tinygrad's design was driven by Hotz's livestream-based development style, where he would often refactor large portions of the codebase on camera. The repository's commit history reflects an emphasis on deletions rather than additions: an unusually high ratio of red-line to green-line changes, motivated by the goal of keeping the entire framework small enough that a new contributor could read every file in a single weekend.^[1]^[9]

Founding the tiny corp (2022 to 2023)

Hotz incorporated "the tiny corp" in late 2022 and announced a $5.1 million seed round on 24 May 2023 in a blog post titled "the tiny corp raised $5.1M." The post described the company's plan to "commoditize the petaflop" by building a consumer-priced AI workstation and porting machine-learning kernels to non-NVIDIA hardware.^[3] The same post outlined a hiring model entirely unlike a traditional startup: the only way to be hired was to submit "high quality pull requests" to the tinygrad repository, and contributors could earn cash bounties posted on GitHub for tasks aligned with the project's roadmap.^[3]

Hotz repeatedly stated that the company's initial commercial focus would be a small-form-factor box capable of training large models and running inference on them, and that the long arc would be to challenge NVIDIA's CUDA software moat by making AMD's RDNA3 hardware practical for ML.^[3]^[5] In the same announcement post he described tinygrad's design as deliberately constrained to twelve operations supporting only addition and multiplication, and as explicitly avoiding Turing-complete kernels so that the framework could perform static analysis over memory access patterns. He argued that this constraint was the structural reason a small team could optimise modern ML workloads without relying on opaque vendor compilers.^[3]

Public benchmarks and AMD work (2023 to 2025)

Through 2024 the tiny corp publicly worked on getting AMD's Radeon RX 7900 XTX into the MLPerf benchmark suite. The company posted detailed engineering notes on stability problems with AMD's firmware and driver, eventually documenting the 7900 XTX in its own tinygrad/7900xtx repository.^[5] By late 2025, Hotz announced a "completely sovereign" compute stack for AMD GPUs, in which tinygrad implements its own user-space driver in Python, bypasses AMD's Micro Engine Scheduler (MES) firmware, and submits PM4 packets directly rather than going through the higher-level ROCm/HSA stack.^[5]^[9]

In a fifth-anniversary blog post dated 29 December 2025, Hotz wrote that tinygrad's code (excluding tests) had grown to about 18,935 lines, that the team had grown to six people, that the computer-sales business generated roughly $2 million a year, and that the project earned additional revenue from AMD contracts related to MLPerf benchmarking.^[2]^[9] He also noted that during the year the project had eliminated its remaining LLVM dependency, so a fresh checkout of tinygrad now requires nothing more than a Python installation for AMD code generation, with the LLVM path retained only as one optional fallback.^[9]^[5]

Design philosophy

tinygrad is organized around three load-bearing ideas.

Laziness and a lazy tensor abstraction

Every operation on a Tensor in tinygrad returns a new lazy node rather than executing immediately. Computation is only triggered when the user calls .realize() or when an operation otherwise needs a concrete numeric result, such as conversion to NumPy or printing.^[10] This delayed execution lets the scheduler see a larger graph of pending operations before committing to a code-generation strategy. By contrast, PyTorch's eager mode executes most operators immediately and relies on torch.compile for similar fusion opportunities.^[10]^[1]
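The deferred-execution idea can be illustrated in a few lines of plain Python. This is a toy sketch and not tinygrad's implementation: the `LazyNode` class, its fields, and the `executed` counter are hypothetical names chosen for illustration, and only scalar add and multiply are modelled.

```python
class LazyNode:
    """A toy lazy value: it records an operation instead of running it."""
    executed = 0  # counts how many operations have actually been computed

    def __init__(self, op, srcs=(), value=None):
        self.op, self.srcs, self.value = op, srcs, value

    @staticmethod
    def const(x):
        return LazyNode("const", value=x)

    def __add__(self, other):
        return LazyNode("add", (self, other))

    def __mul__(self, other):
        return LazyNode("mul", (self, other))

    def realize(self):
        """Walk the recorded graph and compute a concrete number."""
        if self.value is None:
            a, b = (s.realize() for s in self.srcs)
            self.value = a + b if self.op == "add" else a * b
            LazyNode.executed += 1
        return self.value


x, y = LazyNode.const(3.0), LazyNode.const(4.0)
z = x * y + x                 # builds a two-node graph; nothing has run yet
pending = LazyNode.executed   # still 0 at this point
result = z.realize()          # execution happens only here
```

Because `z` is only a description of work until `realize()` is called, a scheduler sitting between the two steps would be free to inspect or rearrange the whole graph first.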

Kernel fusion via the ShapeTracker

Because tensors are lazy, tinygrad can examine many pending operations and decide which can be fused into a single GPU kernel. Movement operations (reshape, permute, expand, pad, shrink, stride) are represented symbolically through a data structure called the ShapeTracker, which composes views of an underlying buffer without copying data. The scheduling phase determines which operations can be merged into one fused kernel and which need to be "realized" first, while the lowering phase emits target-specific code for that kernel.^[11]^[10]
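How a movement operation can be expressed without copying data is easiest to see with a strided view. The sketch below is a simplified stand-in, assuming a flat Python list as the buffer; the `View` class and its methods are hypothetical and cover only permute and expand, not the full set of operations the ShapeTracker handles.

```python
from itertools import product


class View:
    """Toy strided view over a flat buffer (hypothetical, for illustration)."""

    def __init__(self, shape, strides, offset=0):
        self.shape, self.strides, self.offset = tuple(shape), tuple(strides), offset

    @staticmethod
    def contiguous(shape):
        # Row-major strides: the last axis moves one element at a time.
        strides, acc = [], 1
        for dim in reversed(shape):
            strides.append(acc)
            acc *= dim
        return View(shape, reversed(strides))

    def permute(self, order):
        # Reordering axes only reorders the shape and stride entries.
        return View([self.shape[i] for i in order],
                    [self.strides[i] for i in order], self.offset)

    def expand(self, shape):
        # A size-1 axis is broadcast by giving it stride 0.
        strides = [0 if old == 1 and new != 1 else st
                   for old, new, st in zip(self.shape, shape, self.strides)]
        return View(shape, strides, self.offset)

    def index(self, idx):
        return self.offset + sum(i * s for i, s in zip(idx, self.strides))

    def read(self, buf):
        return [buf[self.index(idx)]
                for idx in product(*(range(d) for d in self.shape))]


buf = [0, 1, 2, 3, 4, 5]
transposed = View.contiguous((2, 3)).permute((1, 0)).read(buf)
row = [7, 8, 9]
broadcast = View.contiguous((1, 3)).expand((2, 3)).read(row)
```

Neither `permute` nor `expand` touches `buf` or `row`; only the index arithmetic changes, which is what lets such operations be folded into whichever kernel eventually reads the data.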

The framework distills neural-network workloads down to a small set of fundamental UOps (unary, binary, reduce, movement) rather than implementing specialized convolutions or matrix multiplies as monolithic operators. This means autodifferentiation and accelerator support fall out automatically for any operation expressible in those primitives.^[3]^[1]
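As a rough illustration of building a larger operator from these primitive categories, a matrix multiply can be written as a broadcast (movement), an element-wise multiply (binary), and a sum over the shared axis (reduce). The function below is a sketch on nested Python lists; its name is hypothetical, and unlike a real framework it materialises the broadcast copies for clarity.

```python
def matmul_from_primitives(a, b):
    """Compute a @ b using only broadcast, element-wise multiply, and sum."""
    m, k, n = len(a), len(b), len(b[0])
    # Movement: broadcast both operands to a common (m, k, n) shape.
    a_exp = [[[a[i][p]] * n for p in range(k)] for i in range(m)]
    b_exp = [[list(b[p]) for p in range(k)] for _ in range(m)]
    # Binary: element-wise multiply of the two broadcast operands.
    prod = [[[a_exp[i][p][j] * b_exp[i][p][j] for j in range(n)]
             for p in range(k)] for i in range(m)]
    # Reduce: sum over the shared k axis to get an (m, n) result.
    return [[sum(prod[i][p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


result = matmul_from_primitives([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Once an operator is expressed this way, any backend that implements the three primitive categories can run it without a dedicated matrix-multiply kernel.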

Visible compiler, no autograd graph

A core ergonomic claim of tinygrad is that the entire compiler and IR are user-visible. The DEBUG=2 environment variable, for instance, prints every kernel that is compiled and dispatched, including timing, FLOPS, and bandwidth estimates. Hotz frames this as the project's "show me the kernel" thesis: a deep-learning framework should let the user trace the path from a high-level nn.Linear call all the way down to the bytes that run on the GPU.^[1]^[11]

tinygrad does not maintain a PyTorch-style dynamic Python autograd tape. Backward passes are computed symbolically against the lazy computation graph, which makes whole-program optimization, JIT replay via the @TinyJit decorator, and ahead-of-time compilation possible without an additional graph capture step.^[10]
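Deriving gradients from a recorded graph, rather than from a log of executed operations, can be sketched as a reverse walk over the graph's nodes. The example below is a toy under stated assumptions: the `Node` class, `toposort`, and `backward` are hypothetical helpers, values are computed eagerly for brevity, and only scalar add and multiply are supported.

```python
class Node:
    """Toy graph node: a value plus a record of how it was produced."""

    def __init__(self, value, op="leaf", srcs=()):
        self.value, self.op, self.srcs, self.grad = value, op, srcs, 0.0

    def __add__(self, other):
        return Node(self.value + other.value, "add", (self, other))

    def __mul__(self, other):
        return Node(self.value * other.value, "mul", (self, other))


def toposort(root):
    """Return nodes so that every node appears after its sources."""
    order, seen = [], set()

    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for s in n.srcs:
                visit(s)
            order.append(n)

    visit(root)
    return order


def backward(root):
    """Accumulate gradients by walking the graph in reverse topological order."""
    root.grad = 1.0
    for n in reversed(toposort(root)):
        if n.op == "add":
            for s in n.srcs:
                s.grad += n.grad
        elif n.op == "mul":
            a, b = n.srcs
            a.grad += n.grad * b.value
            b.grad += n.grad * a.value


x, y = Node(3.0), Node(4.0)
z = x * y + x
backward(z)   # dz/dx = y + 1, dz/dy = x
```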

Minimalism as a design constraint

The project's working slogan, repeated in Hotz's blog and on tinygrad.org, is "the best part is no part."^[9] The team explicitly resists adding compatibility layers, vendored kernel libraries, or auto-tuning systems if a smaller implementation suffices. As of late 2025 tinygrad operates without any required external dependency beyond Python itself, having removed its prior reliance on LLVM for AMD code generation as part of the sovereign-stack push.^[9]^[5]

Technical architecture

The execution pipeline can be summarised in three layers:

1. Frontend. The Tensor class exposes a PyTorch-like API: element-wise math, reductions, conv2d, matmul, softmax, and so on. Calls construct a lazy graph of high-level operations.^[10]
2. Scheduler and lowering. When realization is requested, the scheduler walks the lazy graph, applies the ShapeTracker to decide fusion boundaries, and emits a sequence of UOp programs to be compiled. The lowering stage rewrites these UOps into device-specific instructions.^[11]
3. Runtime and driver. Each backend provides a tiny runtime that compiles the emitted source, allocates buffers, dispatches kernels, and synchronises results back. For AMD, tinygrad ships its own user-space queue submission code that talks to the GPU through PM4 packets, bypassing the higher-level user-mode drivers.^[5]

The frontend deliberately omits the nn.Module base class familiar from PyTorch. Neural-network modules in tinygrad are plain Python classes whose forward pass is written as __call__ rather than forward, and stateless operations are exposed as plain methods on Tensor rather than as wrapped classes. Tensor sharding across multiple GPUs is built in via Tensor.shard, which annotates a tensor with a list of devices and a shard axis, so the same model code can be moved to multiple GPUs by changing a single argument rather than by introducing distributed-training wrappers.^[10]
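The two conventions described here, modules as plain classes with `__call__` and sharding as an annotation of devices plus an axis, can be mimicked in plain Python. This is a conceptual sketch only: `shard` and `Scale` are hypothetical stand-ins operating on nested lists, the device labels are made up, and the even-split requirement is an assumption of the toy rather than a statement about `Tensor.shard`.

```python
def shard(rows, devices, axis=0):
    """Split a 2-D list evenly across devices along `axis` (toy stand-in)."""
    n = len(rows) if axis == 0 else len(rows[0])
    if n % len(devices) != 0:
        raise ValueError("axis length must divide evenly across devices")
    step = n // len(devices)
    out = {}
    for i, dev in enumerate(devices):
        lo, hi = i * step, (i + 1) * step
        out[dev] = rows[lo:hi] if axis == 0 else [r[lo:hi] for r in rows]
    return out


class Scale:
    """A 'module' as a plain class: state in attributes, forward pass in __call__."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, rows):
        return [[v * self.factor for v in r] for r in rows]


batch = [[1, 2], [3, 4], [5, 6], [7, 8]]
model = Scale(10)
parts = shard(batch, ["GPU:0", "GPU:1"], axis=0)   # hypothetical device labels
outputs = {dev: model(chunk) for dev, chunk in parts.items()}
by_column = shard(batch, ["GPU:0", "GPU:1"], axis=1)
```

The model code is identical for every shard; only the `devices` and `axis` arguments decide how the data is laid out.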

The @TinyJit decorator captures the kernels launched during an early call to the decorated function (in current releases the second call, after an ordinary first run) and replays them on subsequent calls, giving JIT-style performance without requiring a separate compilation phase or a static graph language. Because the captured kernels are exactly the ones the user can already inspect via DEBUG=2, the JIT does not introduce a new black box.^[10]^[1]
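The capture-and-replay pattern can be sketched with a small decorator. This is a toy model, not TinyJit: `launch`, `toy_jit`, and the module-level `_trace` list are hypothetical, the "kernels" are ordinary Python callables, and replay assumes each kernel simply consumes the previous kernel's output.

```python
import functools

_trace = None  # collects launches while a capture is in progress


def launch(name, kernel, value):
    """Run one toy 'kernel' and record it if a capture is active."""
    if _trace is not None:
        _trace.append((name, kernel))
    return kernel(value)


def toy_jit(func):
    """Capture the kernels launched by one call, then replay them afterwards."""

    @functools.wraps(func)
    def wrapper(value):
        global _trace
        if not wrapper.captured:
            _trace = []
            try:
                result = func(value)              # runs the Python body once
                wrapper.captured = list(_trace)
            finally:
                _trace = None
            wrapper.python_runs += 1
            return result
        for _, kernel in wrapper.captured:        # replay without re-running func
            value = kernel(value)
        return value

    wrapper.captured, wrapper.python_runs = [], 0
    return wrapper


@toy_jit
def step(x):
    x = launch("scale", lambda v: v * 2, x)
    return launch("shift", lambda v: v + 1, x)


first = step(3)    # capture call: the Python body executes
second = step(5)   # replay call: only the recorded kernels execute
```

The recorded list in `step.captured` is the same sequence of launches that a trace of the first call would show, which mirrors the point that replay adds no hidden work.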

Backends and runtimes

tinygrad's documentation lists the following first-party runtimes, each selectable through the DEV (or legacy CUDA=1, METAL=1, etc.) environment variables.

| Runtime | Hardware | Compiler / interface | Notes |
| --- | --- | --- | --- |
| NV | NVIDIA Ampere, Ada, Blackwell | nvrtc or PTX | Native NVIDIA path^[12] |
| CUDA | NVIDIA | nvrtc or PTX | Uses NVIDIA's CUDA driver^[12] |
| AMD | AMD RDNA3, RDNA4, CDNA3, CDNA4 | LLVM or HIP/COMGR | Includes "AM" sovereign driver bypassing ROCm^[12]^[5] |
| METAL | Apple M1 and later | Metal Shading Language | Production-ready on macOS/iOS^[12] |
| QCOM | Qualcomm Adreno 6xx | OpenCL kernels | Used by comma.ai openpilot^[12]^[8] |
| CL | OpenCL-capable GPUs | OpenCL | Generic fallback^[12] |
| WEBGPU | Browsers, Dawn | WGSL via Google Dawn | Runs inference inside Chrome^[12] |
| CPU | x86, ARM, RISC-V | Clang or LLVM | Reference path^[12] |

The framework also provides zero-copy interoperation with PyTorch CUDA/Metal tensors and with OpenCL on Qualcomm through the Tensor.from_blob API.^[12]

The tinybox product line

The tinybox is a self-contained workstation aimed at researchers and small teams who want local training and inference of large models. It is built by the tiny corp and ships in several SKUs colour-coded by GPU vendor.^[6]^[4]

Common chassis

The tinybox is a 12U rack-mountable case, measuring 19" wide, 21" tall, and 16.25" deep, and weighs roughly 70 lb in the v2 form. Original boxes used dual 1600 W power supplies and required either a 120 V 30 A circuit or a 220 V 20 A circuit, with an option to power-limit GPUs to about 150 W each for single-outlet operation. Every variant ships with Ubuntu 22.04, tinygrad pre-installed, and PyTorch available as a fallback runtime.^[6]^[13]

tinybox red (AMD, original)

The original red tinybox uses six AMD Radeon RX 7900 XTX cards, paired with a 32-core AMD EPYC Genoa CPU, 128 GB of system RAM, four Western Digital SN850X 1 TB NVMe SSDs in RAID plus a separate 1 TB boot drive, and an empty 16x OCP 3.0 slot for networking. Tom's Hardware reported the headline performance figure as 738 FP16 TFLOPS with 144 GB of aggregate GDDR6 memory (six cards at 24 GB each) and 5.76 TB/s of aggregate memory bandwidth, at a retail price of $15,000.^[4]^[14]

A central design choice was that the consumer-grade Radeon RX 7900 XTX exposes the peer-to-peer interconnect that the GeForce RTX 4090 does not, allowing six cards to share data efficiently over PCIe for distributed AI workloads.^[4]

tinybox green (NVIDIA, original)

The green variant substitutes six NVIDIA GeForce RTX 4090 GPUs into the same chassis, reaching roughly 991 FP16 TFLOPS and 144 GB of GDDR6X, at a retail price of $25,000.^[13]^[14] Both the red and green originals went on sale in 2024, after a long period of pre-orders during which the company refined the cooling and PCIe topology.^[14]

tinybox pro (NVIDIA, dense)

In 2024 the tiny corp also opened pre-orders for a "tinybox pro" configured with eight RTX 4090 GPUs and two AMD EPYC Genoa processors, listed at $40,000 and aimed at users who wanted denser NVIDIA compute in one box.^[15]

tinybox red v2 (RDNA4)

The current red v2, sold through the tiny corp Shopify store, drops to four AMD Radeon RX 9070 XT (RDNA4) GPUs and is rated at 778 FP16 TFLOPS with 64 GB of VRAM, in an enclosure that runs from a single 15 A outlet. It pairs the four GPUs with a 32-core AMD EPYC CPU, 128 GB of system RAM, a 2 TB NVMe drive, and a 1600 W PSU, at a list price of $12,000.^[16]

tinybox green v2 (Blackwell) and exabox

The green v2 uses four NVIDIA RTX 5090 (Blackwell) GPUs and is sold made-to-order at $65,000.^[17] Separately, the tiny corp has publicly described an "exabox" research target priced at roughly $10 million and intended for delivery in 2027, which aims to deliver on the order of one exaflop in a single rack.^[17]^[6]

Summary table

| SKU | GPUs | FP16 TFLOPS | GPU memory | Price (USD) |
| --- | --- | --- | --- | --- |
| tinybox red (original) | 6x Radeon RX 7900 XTX | 738^[4] | 144 GB GDDR6^[4] | $15,000^[4] |
| tinybox green (original) | 6x GeForce RTX 4090 | 991^[13] | 144 GB GDDR6X^[13] | $25,000^[14] |
| tinybox pro | 8x GeForce RTX 4090 | not published | not published | $40,000^[15] |
| tinybox red v2 | 4x Radeon RX 9070 XT | 778^[16] | 64 GB^[16] | $12,000^[16] |
| tinybox green v2 | 4x GeForce RTX 5090 | not published | not published | $65,000^[17] |
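One way to read the table is as a price-per-throughput comparison. The short calculation below uses only the three rows that list both an FP16 figure and a price; the `usd_per_tflops` helper is a hypothetical name, and peak TFLOPS is of course only a rough proxy for delivered training performance.

```python
# (name, FP16 TFLOPS, price in USD) copied from the summary table above
SKUS = [
    ("tinybox red (original)", 738, 15_000),
    ("tinybox green (original)", 991, 25_000),
    ("tinybox red v2", 778, 12_000),
]


def usd_per_tflops(tflops, price):
    """Dollars paid per peak FP16 TFLOPS."""
    return price / tflops


# Cheapest compute first.
ranking = sorted(
    ((name, round(usd_per_tflops(t, p), 2)) for name, t, p in SKUS),
    key=lambda row: row[1],
)
for name, cost in ranking:
    print(f"{name}: ${cost:.2f} per FP16 TFLOPS")
```

On these listed figures the red v2 comes out at about $15.42 per TFLOPS, the original red at about $20.33, and the original green at about $25.23.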

The "blue" name has been informally associated by community coverage with a planned AMD Instinct MI300X data-center configuration; the tiny corp has not published a formal blue tinybox product page, and that SKU should not be treated as confirmed for retail sale.^[4]

The bounty system

Rather than running a traditional engineering interview process, the tiny corp posts cash bounties on its GitHub repository for tasks aligned with the project's published roadmap. The amounts are tiered roughly as: $100 for trivial fixes, $200 for a few hours of standalone work, $500 for several days of work with some prerequisites, $1,000 for changes that require refactoring core tinygrad, and larger amounts for multi-week efforts.^[18] Bounties are paid out at Hotz's discretion when a pull request is merged, with payment via USDC on Ethereum or PayPal. The same blog post that announced the seed round stated bluntly that job interviews are "obsolete" and that the only way to be hired at the tiny corp is to submit high-quality pull requests.^[3]^[18]

Many of the public bounties target AMD GPU support specifically, including matrix-instruction "MMAPEAK" benchmarks on the 7900 XTX and 9070 XT, an ACO shader compiler backend, and improvements to the matching engine that schedules fused kernels.^[18] Other bounties touch concerns familiar from any compiler project: faster pattern-matching in the rewrite engine, missing fusion opportunities, and edge cases in shape arithmetic. Hotz has been explicit in public discussions that bounties are not paid for pull requests that introduce serious hacks or unmaintainable code, even if they superficially solve the requested problem, and that the bar for what counts as "clean" is set by his own review.^[18]^[22]

AMD focus and the "show me the kernel" thesis

Since 2023 the tiny corp has framed itself, in Hotz's own posts and livestreams, as the most credible third party trying to make AMD GPUs competitive with NVIDIA for ML.^[3]^[5] The work has two parts.

First, tinygrad's lowering pipeline targets AMD shader ISAs directly and, in the most recent code, no longer requires LLVM as a dependency to do so.^[9] Second, tinygrad implements its own user-space driver, sometimes called "AM," that submits work to RDNA3 GPUs via PM4 packets while largely bypassing the Micro Engine Scheduler firmware. According to Hotz's posts and Phoronix coverage, this gives tinygrad reproducible kernel launches and avoids stability problems that the team encountered with AMD's stock ROCm/HSA stack.^[5]^[9]

The "show me the kernel" rhetorical move is the user-facing counterpart of this work: because the entire compiler is in Python and visible, a developer can set DEBUG=4 and see the code that tinygrad generated for each kernel, then file targeted bounties or patches against the lowering stage. Hotz argues that this transparency is the structural advantage that lets a small team make an AMD stack work where larger projects have struggled.^[9]^[11]

A typical ML training loop on a 7900 XTX, according to Hotz's published notes on the tinygrad/7900xtx work, resubmits the same roughly 100-millisecond run-queue containing the fused kernels for forward, backward, and optimizer steps, pointing at different input buffers across iterations. Hotz has stated that this submission pattern, executed by tinygrad's own Python-resident driver, outperforms the conventional ROCm-mediated path on the same hardware while needing no privileged firmware updates.^[5] The work is framed as a deliberate "sovereign" stance: the entire stack from Tensor API down to the GPU command processor is owned by tinygrad, so a developer with a hardware question can read every line involved without consulting an opaque vendor binary.^[9]^[5]

Performance and benchmarks

The tiny corp has emphasized two public benchmarking efforts:

- MLPerf Training. The tinybox running tinygrad has been entered against MLPerf Training reference workloads, with the company claiming competitive results against systems many times more expensive.^[9]^[19]
- Vendor comparisons. In a September 2024 post on X, the tiny corp claimed the tinybox green outperformed an NVIDIA H100 on certain LLM training workloads at comparable price points, citing the 6x4090 configuration's 144 GB of aggregate memory.^[20]

Independent reviews by Tom's Hardware noted that the original tinybox red delivered roughly 37% of an H100's compute performance but with more aggregate memory (144 GB versus 80 GB) and higher aggregate memory bandwidth (5.76 TB/s versus 3.35 TB/s) for the price.^[4] The framework's claimed "1,000x smaller" code footprint compared with PyTorch plus CUDA plus LLVM is repeatedly cited by Hotz and discussed at length in Hacker News commentary, although critics note that the comparison excludes the vendor-supplied compilers that tinygrad itself emits source code for.^[9]

The MLPerf reference implementations maintained inside tinygrad cover image classification, object detection, language-model training, and text-to-image generation; their presence in the repository is one way the tiny corp demonstrates that the framework can execute industry-standard workloads end-to-end on its hardware. Documentation summaries describe these reference implementations as live tests for the scheduler and lowering pipeline rather than only as benchmark submissions, so changes to the IR or to a backend are exercised against the same workloads used to assess external claims.^[19] Tom's Hardware coverage of the production launch reported that the tiny corp had taken 583 pre-orders ahead of the first 100-unit production run, and that the marginal cost of building a tinybox red was around $10,000 against its $15,000 retail price.^[4]

Applications and adoption

tinygrad's most prominent production user is comma.ai's openpilot, an open-source advanced driver-assistance system for more than 300 supported car models. As of the openpilot 0.9.8 and 0.9.9 release cycles, the driving model and the driver-monitoring model run end-to-end on tinygrad using its QCOM Adreno backend on the comma 3X device.^[8] comma.ai's release notes describe the migration from the older thneed runtime and Qualcomm SNPE pipeline to tinygrad as both shrinking lines of code and reducing dependencies, while leaving headroom to ship larger driving models in the future. The same notes mention that tinygrad will support plugging an external GPU into the comma 3X's auxiliary USB-C port so that the device can run models that exceed the on-board Adreno's capacity.^[8]

Beyond comma.ai, the project is widely used as an educational and prototyping framework because of its small size. Developer write-ups commonly recommend tinygrad for learning the internals of automatic differentiation, for running smaller Llama and Stable Diffusion models on consumer hardware, and for experimenting with new accelerators since each backend can be written in well under a thousand lines of code.^[21]^[1] The project's own examples directory ships reference implementations of LLaMA, Stable Diffusion, Whisper, and YOLO families of models, which contributors and reviewers use as starting points when bringing up new hardware.^[1]

Limitations and criticisms

tinygrad's minimalism comes with trade-offs that contributors and outside reviewers regularly discuss:

- Limited operator coverage compared with PyTorch / JAX. Although tinygrad implements the operations needed for common transformer, convolutional, and diffusion workloads, exotic operators in research code often need to be reimplemented in terms of tinygrad's primitives.^[1]
- Code density versus readability. Hotz has publicly defended an aggressive style that prioritises few lines over conventional Python style. Hacker News discussion around the five-year retrospective surfaced disagreement about whether the resulting code is genuinely easier to understand or merely shorter.^[22]
- Bounty-based development risk. The bounty/hiring model attracts strong contributors but also leaves the project sensitive to Hotz's review throughput and his discretionary payout decisions.^[22]^[18]
- Hardware bring-up has been slow. The AMD sovereign stack was described publicly as nearing completion for years before the late-2025 announcement, and Hotz has repeatedly aired his frustration with AMD firmware in public livestreams.^[5]^[9]
- Not yet at version 1.0. The official documentation still says "we are not 1.0 yet" as of the 0.12 release, indicating ongoing API churn.^[10]

Comparison with adjacent frameworks

| Framework | Primary language | Eager vs. lazy | Distinguishing trait |
| --- | --- | --- | --- |
| PyTorch | Python + C++ | Eager (with `torch.compile` for graph mode) | De facto industry standard, huge operator surface |
| JAX | Python + XLA | Traced/lazy via `jit` | Functional API, XLA backend, TPUs |
| GGML / llama.cpp | C / C++ | Eager | Minimal CPU/Metal inference for LLMs |
| tinygrad | Python | Lazy by default | Visible compiler, sovereign AMD stack, tiny code base |

References

1. tinygrad, "tinygrad README", GitHub, 2026-01-12. https://github.com/tinygrad/tinygrad. Accessed 2026-05-21.
2. George Hotz, "Five years of tinygrad", the singularity is nearer (geohot.github.io), 2025-12-29. https://geohot.github.io/blog/jekyll/update/2025/12/29/five-years-of-tinygrad.html. Accessed 2026-05-21.
3. George Hotz, "the tiny corp raised $5.1M", the singularity is nearer (geohot.github.io), 2023-05-24. https://geohot.github.io/blog/jekyll/update/2023/05/24/the-tiny-corp-raised-5M.html. Accessed 2026-05-21.
4. Anton Shilov, "TinyBox packs a punch with six of AMD's fastest gaming GPUs repurposed for AI", Tom's Hardware, 2024-04-04. https://www.tomshardware.com/tech-industry/artificial-intelligence/tinybox-packs-a-punch-with-six-of-amds-fastest-gaming-gpus-repurposed-for-ai-george-hotzs-new-box-uses-radeon-7900-xtx-and-retails-for-dollar15k-now-in-production. Accessed 2026-05-21.
5. Michael Larabel, "Tiny Corp Nearing 'Completely Sovereign' Compute Stack For AMD GPUs With Tinygrad", Phoronix forums thread on tinygrad AMD work, 2025. https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1519271-tiny-corp-nearing-completely-sovereign-compute-stack-for-amd-gpus-with-tinygrad. Accessed 2026-05-21.
6. the tiny corp, "tinybox", tinygrad documentation, 2026. https://docs.tinygrad.org/tinybox/. Accessed 2026-05-21.
7. George Hotz, "geohot profile", GitHub, undated. https://github.com/geohot. Accessed 2026-05-21.
8. comma.ai, "openpilot 0.9.9 release notes", comma.ai blog, 2024. https://blog.comma.ai/099release/. Accessed 2026-05-21.
9. Hacker News discussion, "Five Years of Tinygrad", Hacker News (Y Combinator), 2025-12-29. https://news.ycombinator.com/item?id=46422757. Accessed 2026-05-21.
10. the tiny corp, "tinygrad documentation", tinygrad docs, 2026. https://docs.tinygrad.org/. Accessed 2026-05-21.
11. mesozoic-egg, "How kernel fusion starts (tinygrad-notes)", mesozoic-egg.github.io, 2025. https://mesozoic-egg.github.io/tinygrad-notes/scheduleitem.html. Accessed 2026-05-21.
12. the tiny corp, "Runtimes", tinygrad docs, 2026. https://docs.tinygrad.org/runtime/. Accessed 2026-05-21.
13. Michael Larabel, "Tiny Corp Details More Of Their Planned Tinybox System Specs", Phoronix, 2024-03-30. https://www.phoronix.com/news/Tinybox-Green-Red-Specs. Accessed 2026-05-21.
14. Anton Shilov, "TinyBox AI accelerator now available starting at $15k, available in AMD 7900XTX and Nvidia RTX 4090 variants", Tom's Hardware, 2024. https://www.tomshardware.com/pc-components/gpus/tinybox-ai-accelerator-now-available-starting-at-dollar15k-available-in-7900xtx-and-rtx-4090-variants. Accessed 2026-05-21.
15. Anton Shilov, "AI accelerator tinybox pro goes up for preorder for $40,000", Tom's Hardware, 2024. https://www.tomshardware.com/tech-industry/artificial-intelligence/ai-accelerator-tinybox-pro-goes-up-for-preorder-for-usd40-000-the-device-features-eight-rtx-4090s-and-two-amd-genoa-epyc-processors. Accessed 2026-05-21.
16. the tiny corp, "tinybox red v2 product page", tiny shop (Shopify), 2025. https://tinycorp.myshopify.com/products/tinybox-red-v2. Accessed 2026-05-21.
17. the tiny corp, "tinybox green v2 product page", tiny shop (Shopify), 2025. https://tinycorp.myshopify.com/products/tinybox-green-v2. Accessed 2026-05-21.
18. tinygrad, "Tinygrad Bounty Opportunities", tinygrad GitHub bounties listing, 2024. https://www.scribd.com/document/929371877/Bounties-Tinygrad-Bounties. Accessed 2026-05-21.
19. tinygrad, "MLPerf Benchmarks (tinygrad/tinygrad)", DeepWiki summary of tinygrad MLPerf code, 2025. https://deepwiki.com/tinygrad/tinygrad/10.3-mlperf-benchmarks. Accessed 2026-05-21.
20. the tiny corp, "benchmarked: tinybox green outperforms an H100 at LLM training", X (Twitter) post by @__tinygrad__, 2024-09-19. https://x.com/__tinygrad__/status/1836640456983319003. Accessed 2026-05-21.
21. Omar Olivares Urrutia, "Introduction to Tinygrad", olivares.cl blog, 2025-06-10. https://olivares.cl/blog/2025/06/10/introduction-to-tinygrad/. Accessed 2026-05-21.
22. Hacker News commenters, "Five Years of Tinygrad (comments thread)", Hacker News, 2025-12-29. https://news.ycombinator.com/item?id=46422757. Accessed 2026-05-21.
