Tinker (Thinking Machines Lab)
Last reviewed
May 31, 2026
Sources
11 citations
Review status
Source-backed
Revision
v1 · 1,862 words
Tinker is a fine-tuning API for open-weight language models, built by Thinking Machines Lab, the artificial intelligence startup founded by former OpenAI chief technology officer Mira Murati. It was the lab's first product, and it launched in a private beta on October 1, 2025. [1][2][3] Tinker lets a user write the training loop in Python while the service runs the work on its own pool of distributed GPUs. The idea is to hand researchers low-level control over data, loss functions, and algorithms, and to take the cluster management off their hands. [1][4]
The product sits at a particular spot in the post-training landscape. Most fine-tuning services either lock you into a fixed recipe, where you upload a dataset and get a tuned model back, or hand you raw machines and leave the hard distributed-systems work to you. Tinker tries to split the difference. You keep the parts that matter for research (the algorithm and the data) and hand off the parts that are mostly toil (multi-node scheduling, fault tolerance, and memory management). [3][4][5]
Tinker is a managed training service that you reach through a Python client. You point it at a base model and write a loop that sends batches to the service and asks it to compute gradients and apply weight updates; the service executes each of those calls on its internal clusters. [4][6] Thinking Machines describes the design goal as letting people "tinker with the weights" of open models without standing up their own training infrastructure. [4]
The service is aimed at researchers and developers rather than at consumers. There is no chatbot front end and no point-and-click tuning wizard. The audience is people who want to run their own experiments, including academic groups and small teams that have ideas about training methods but not the budget or the time to operate a GPU fleet. [2][3] Early users came from research groups at Princeton, Stanford, the University of California, Berkeley, and Redwood Research, covering work on theorem proving, chemistry reasoning, multi-agent reinforcement learning, and AI control. [4][7]
The core of Tinker is a small set of low-level primitives. Rather than a single train() call that hides everything, the API exposes the individual steps of a training loop so you can assemble whatever procedure you want. [4][6] The main operations are these.
| Primitive | What it does |
|---|---|
| forward_backward | Runs a forward pass and a backward pass on a batch and accumulates the resulting gradients |
| optim_step | Updates the model weights from the accumulated gradients |
| sample | Generates tokens from the current model, for evaluation, inference, or reinforcement learning rollouts |
| save_state | Writes a checkpoint so a run can be paused and resumed |
With those four pieces you can express supervised fine-tuning, preference tuning, and on-policy reinforcement learning, because you control how the loop is wired together. [4][6][8] The sample primitive is what makes the reinforcement learning case work. On-policy methods need the model to generate its own training data as it learns, so having sampling sit inside the same loop as the gradient steps is what makes policy-gradient training practical on the same service. [6][8]
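The way the primitives compose is easiest to see in the reinforcement learning case. The following is a self-contained toy, not Tinker code: `ToyPolicyClient` is a hypothetical local class whose methods borrow the four primitive names, the "model" is a softmax over five tokens, and the task (reward 1.0 for emitting token 3) is invented for illustration. It runs REINFORCE with a mean-reward baseline, a standard policy-gradient method chosen here for simplicity rather than taken from Tinker or its recipes.

```python
import math
import random


class ToyPolicyClient:
    """Hypothetical local stand-in whose methods borrow the four primitive names.
    The "model" is a softmax over a tiny vocabulary; this is not the Tinker API."""

    def __init__(self, vocab_size: int, seed: int = 0):
        self.logits = [0.0] * vocab_size   # the model weights
        self.grads = [0.0] * vocab_size    # gradients accumulated by forward_backward
        self.rng = random.Random(seed)

    def probs(self):
        top = max(self.logits)
        exps = [math.exp(z - top) for z in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, n: int):
        # Rollouts: draw n tokens from the current policy.
        return self.rng.choices(range(len(self.logits)), weights=self.probs(), k=n)

    def forward_backward(self, tokens, advantages):
        # Loss = -mean(advantage * log p(token)); its gradient with respect to
        # logit i is -mean(advantage * (1[i == token] - p_i)).
        p = self.probs()
        for tok, adv in zip(tokens, advantages):
            for i, p_i in enumerate(p):
                self.grads[i] -= adv * ((1.0 if i == tok else 0.0) - p_i) / len(tokens)

    def optim_step(self, lr: float):
        # Plain gradient descent on the accumulated gradients, which are then cleared.
        self.logits = [w - lr * g for w, g in zip(self.logits, self.grads)]
        self.grads = [0.0] * len(self.logits)

    def save_state(self):
        # A "checkpoint" is just a copy of the weights here.
        return list(self.logits)


TARGET = 3  # hypothetical task: reward 1.0 for emitting token 3, else 0.0

client = ToyPolicyClient(vocab_size=5, seed=0)
for _ in range(200):
    tokens = client.sample(32)                          # generate data with the current model
    rewards = [1.0 if t == TARGET else 0.0 for t in tokens]
    baseline = sum(rewards) / len(rewards)              # mean-reward baseline
    advantages = [r - baseline for r in rewards]
    client.forward_backward(tokens, advantages)         # policy-gradient gradients
    client.optim_step(lr=0.5)                           # weight update
checkpoint = client.save_state()
print(f"p(target) after training: {client.probs()[TARGET]:.3f}")
```

Each iteration samples from the model as it currently stands, so the data distribution shifts as the weights change. That dependency is why sampling has to live inside the training loop rather than in a separate offline step.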
A short setup program shows the shape of the real client library. You create a service client, ask it for a training client tied to a base model, and from there you call the primitives yourself. [6]
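```python
import tinker
# Entry point to the service; requires Tinker access credentials to be configured.
service_client = tinker.ServiceClient()
# A training client for a LoRA fine-tune of the named open-weight base model.
training_client = service_client.create_lora_training_client(
    base_model="meta-llama/Llama-3.2-1B",
)
```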
The client objects map onto the workflow. A service client is the entry point. A training client holds a model you are tuning and exposes forward_backward, optim_step, and save_state. A sampling client serves tokens from a trained model. [6] Because the abstraction is the loop and not the dataset, switching to a different base model is mostly a matter of changing the model string. The same training code can run against a small dense model or a large mixture-of-experts model without a rewrite. [3][5]
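The division of labor between the three client roles can be sketched with local stand-ins. Everything here is hypothetical: `MockServiceClient`, `MockTrainingClient`, and `MockSamplingClient` only imitate the roles described above, the "model" is a four-token unigram distribution trained with cross-entropy, and the `sampling_client()` method name is invented. Only the `create_lora_training_client(base_model=...)` call shape mirrors the Tinker setup program shown earlier.

```python
import math


class MockSamplingClient:
    """Hypothetical: serves tokens from a frozen snapshot of trained weights."""

    def __init__(self, logits):
        self.logits = list(logits)

    def greedy_token(self) -> int:
        return max(range(len(self.logits)), key=self.logits.__getitem__)


class MockTrainingClient:
    """Hypothetical: holds one model being tuned and exposes the training primitives."""

    def __init__(self, base_model: str, vocab_size: int = 4):
        self.base_model = base_model       # in this mock the string is only a label
        self.logits = [0.0] * vocab_size
        self.grads = [0.0] * vocab_size

    def forward_backward(self, targets):
        # Mean cross-entropy of a unigram model: d(loss)/d(logit_i) = p_i - 1[i == target].
        top = max(self.logits)
        exps = [math.exp(z - top) for z in self.logits]
        probs = [e / sum(exps) for e in exps]
        for t in targets:
            for i, p in enumerate(probs):
                self.grads[i] += (p - (1.0 if i == t else 0.0)) / len(targets)

    def optim_step(self, lr: float):
        self.logits = [w - lr * g for w, g in zip(self.logits, self.grads)]
        self.grads = [0.0] * len(self.grads)

    def save_state(self):
        return {"base_model": self.base_model, "logits": list(self.logits)}

    def sampling_client(self):
        return MockSamplingClient(self.logits)


class MockServiceClient:
    """Hypothetical entry point that hands out training clients by model name."""

    def create_lora_training_client(self, base_model: str):
        return MockTrainingClient(base_model)


def finetune(service, base_model: str, data, steps: int = 50):
    # The same loop runs for any base model; only the string changes.
    trainer = service.create_lora_training_client(base_model=base_model)
    for _ in range(steps):
        trainer.forward_backward(data)
        trainer.optim_step(lr=1.0)
    return trainer.save_state(), trainer.sampling_client()


service = MockServiceClient()
data = [2, 2, 2, 1]  # toy supervised targets: token 2 is the majority label
for name in ["meta-llama/Llama-3.2-1B", "example-org/another-base-model"]:
    state, sampler = finetune(service, name, data)
    print(name, sampler.greedy_token())
```

The `finetune` function is written once and runs unchanged for both model strings. In the real service the string selects very different models and hardware; in this mock it is only a label, and the second model name is a placeholder.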
What Tinker handles underneath is the distributed execution. The user does not see node counts, sharding strategies, or the scheduler. The service places the work, recovers from hardware faults, and shares a common compute pool across many concurrent runs. [3][4] That last point is tied to how the service uses low-rank adaptation, described below.
Tinker works with open-weight models, not closed frontier models. At launch it supported families from Meta's Llama line and Alibaba's Qwen line, spanning small dense models up through large mixture-of-experts systems. [1][3] The largest model the company named at launch was Qwen3-235B-A22B, a mixture-of-experts model with about 235 billion total parameters, of which about 22 billion are active for any given token. [3][5] Supporting models of that size is part of the pitch, since running a 235B-parameter model across many GPUs is exactly the kind of work most researchers cannot easily do alone. [3]
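The scale argument can be made concrete with back-of-envelope arithmetic. This sketch takes the parameter counts encoded in the name Qwen3-235B-A22B and assumes 16-bit weights and 80 GB accelerators; both assumptions are illustrative rather than a description of Tinker's hardware, and the figure counts weights only, not activations, gradients, or optimizer state.

```python
import math

# Parameter counts encoded in the name Qwen3-235B-A22B.
TOTAL_PARAMS = 235e9     # all experts together
ACTIVE_PARAMS = 22e9     # parameters used for any single token
# Illustrative assumptions, not Tinker's actual configuration.
BYTES_PER_PARAM = 2      # 16-bit (bf16) weights
GPU_MEMORY_BYTES = 80e9  # one 80 GB accelerator

# Without offloading, every expert has to stay resident, so memory scales with the
# total parameter count even though compute per token scales with the active count.
weight_bytes = TOTAL_PARAMS * BYTES_PER_PARAM
min_gpus_for_weights = math.ceil(weight_bytes / GPU_MEMORY_BYTES)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"weights alone: {weight_bytes / 1e9:.0f} GB")
print(f"GPUs needed just to hold the weights: {min_gpus_for_weights}")
print(f"share of parameters active per token: {active_fraction:.1%}")
```

Under these assumptions the weights alone take 470 GB, or at least six 80 GB devices, while only about 9.4 percent of the parameters do work for any given token. Training needs memory for gradients and optimizer state on top of that, which is one reason adapter-based tuning, where only the small adapters carry that overhead, suits models of this size.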
Fine-tuning on Tinker is built on LoRA, short for low-rank adaptation. Instead of updating every weight in the base model, LoRA freezes the original weights and, for each adapted weight matrix, trains a pair of small low-rank matrices whose product is added to the frozen weights. [1][6] This is far cheaper in memory and compute than full fine-tuning, and it has a useful side effect for a shared service. Because the large base weights stay fixed, many different LoRA runs can sit on top of the same loaded copy of a model and share one pool of compute, which is how Tinker can serve many users economically. [1][3]
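The sharing argument is easiest to see in code. This is a minimal pure-Python sketch of the standard LoRA formulation, not Tinker's implementation: a frozen base matrix W of shape d × k, and per-adapter trainable matrices B (d × r, zero-initialized) and A (r × k), so an adapted layer computes W x + (alpha / r) B A x. `SharedBaseLayer`, the adapter names, the sizes, and the alpha value are all illustrative.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


class SharedBaseLayer:
    """Hypothetical layer: one frozen base matrix W (d x k) shared by many LoRA adapters."""

    def __init__(self, d: int, k: int, seed: int = 0):
        rng = random.Random(seed)
        self.d, self.k = d, k
        self.W = [[rng.gauss(0.0, 0.02) for _ in range(k)] for _ in range(d)]  # frozen
        self.adapters = {}  # adapter name -> (A, B, scale)

    def add_adapter(self, name: str, rank: int, alpha: float = 16.0, seed: int = 1):
        rng = random.Random(seed)
        A = [[rng.gauss(0.0, 0.02) for _ in range(self.k)] for _ in range(rank)]  # r x k, trainable
        B = [[0.0] * rank for _ in range(self.d)]  # d x r, trainable; zero init leaves W unchanged
        self.adapters[name] = (A, B, alpha / rank)

    def forward(self, x, adapter=None):
        # y = W x + (alpha / r) * B (A x); the W x term is the same for every adapter.
        y = matvec(self.W, x)
        if adapter is not None:
            A, B, scale = self.adapters[adapter]
            delta = matvec(B, matvec(A, x))
            y = [yi + scale * di for yi, di in zip(y, delta)]
        return y


def trainable_params(d: int, k: int, rank: int) -> int:
    # A has r * k entries and B has d * r entries.
    return rank * (d + k)


layer = SharedBaseLayer(d=8, k=8)
layer.add_adapter("user_a", rank=2)
layer.add_adapter("user_b", rank=2, seed=2)
x = [1.0] * 8
# With B zero-initialized, every adapter starts out reproducing the base model exactly.
assert layer.forward(x, "user_a") == layer.forward(x)

full = 4096 * 4096
lora = trainable_params(4096, 4096, rank=32)
print(f"LoRA trains {lora:,} of {full:,} weights in this matrix")
```

For a 4096 × 4096 matrix at rank 32, the adapter holds r(d + k) = 262,144 trainable weights against 16,777,216 in the full matrix, about 1.6 percent. Every adapter reads the same W, which is the property that lets one loaded base model back many concurrent runs.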
The choice of LoRA was not only a cost decision. Around the same period, Thinking Machines published research arguing that LoRA, set up correctly, can match full fine-tuning across a wide range of supervised and reinforcement learning workloads, with the gap appearing mainly when a task tries to push a very large amount of new information into the model. [9] That work, titled "LoRA Without Regret," gave the company a public rationale for making adapter-based tuning the default path in Tinker rather than a compromise. [9]
Thinking Machines announced Tinker on October 1, 2025, as a private beta with a waitlist. [1][2][3] The company said the service would be free to start and that it would add usage-based pricing in the following weeks. [3][7] Tinker was the lab's first public product, arriving after a February 2025 founding and a seed round of about two billion dollars that valued the company near twelve billion dollars. [2][10] The founding team was notable in its own right: it included John Schulman as chief scientist, Barret Zoph as chief technology officer, and other senior researchers drawn from OpenAI. [10]
Alongside the API, the company released an open-source companion called the Tinker Cookbook under the Apache 2.0 license. [6][11] The Cookbook is a library of worked training recipes that sit on top of the four primitives, so users do not have to write every method from scratch. [11] It includes examples for supervised learning, reinforcement learning, preference tuning and reinforcement learning from human feedback, math reasoning, tool use, multi-agent training, and prompt distillation, along with helper code for tokenizers, evaluation, and LoRA hyperparameters. [11] The split is deliberate. The API stays small and low-level, and the Cookbook carries the higher-level patterns that most people actually want to reuse. [6][11]
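The split between a small API and a recipe library amounts to writing reusable functions in terms of the primitives. The sketch is hypothetical throughout: `SupervisedConfig`, `run_supervised`, and `RecordingClient` are illustrative names, not Cookbook code, and the client does no training at all; it only records which primitives a recipe calls and in what order.

```python
from dataclasses import dataclass


@dataclass
class SupervisedConfig:
    """Hypothetical recipe settings; field names are illustrative, not the Cookbook's."""
    learning_rate: float = 1e-4
    batch_size: int = 4
    epochs: int = 2
    save_every: int = 2  # checkpoint every N optimizer steps


class RecordingClient:
    """Hypothetical stand-in that only logs which primitives the recipe calls."""

    def __init__(self):
        self.calls = []

    def forward_backward(self, batch):
        self.calls.append(("forward_backward", len(batch)))

    def optim_step(self, learning_rate):
        self.calls.append(("optim_step", learning_rate))

    def save_state(self, name):
        self.calls.append(("save_state", name))


def batches(examples, size):
    for start in range(0, len(examples), size):
        yield examples[start:start + size]


def run_supervised(client, examples, config: SupervisedConfig) -> int:
    """A reusable supervised recipe written only in terms of low-level primitives."""
    step = 0
    for _ in range(config.epochs):
        for batch in batches(examples, config.batch_size):
            client.forward_backward(batch)
            client.optim_step(learning_rate=config.learning_rate)
            step += 1
            if step % config.save_every == 0:
                client.save_state(name=f"step-{step}")
    return step


client = RecordingClient()
examples = [f"example-{i}" for i in range(10)]
steps = run_supervised(client, examples, SupervisedConfig())
print(steps, client.calls[:5])
```

A recipe like this encodes the batching, scheduling, and checkpointing decisions once, while anyone who needs a different loop can still drop down to the primitives directly.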
Fine-tuning a large language model used to mean one of a few uncomfortable options. You could rent a cluster and shoulder all of the distributed-training engineering, you could use a hosted service that only offered a fixed supervised recipe, or you could stay small and tune little models on a single machine. Tinker aims at the gap in the middle, where you want real methodological control, including reinforcement learning and custom objectives, but you do not want to run the hardware. [3][4][5]
The bet also lines up with the rise of strong open-weight models. As open-weight releases from the Llama and Qwen families have closed much of the gap with closed models, the value of being able to adapt them has grown. [3] Thinking Machines leaned into that, leading with a tool for customizing open models rather than with a proprietary assistant, a posture commentators read as a contrast with the more closed approach of some rival labs. [3][5] The choice also fits the company's stated emphasis on open science, since the Cookbook and the LoRA research were both published openly. [4][9][11]
Coverage at launch was largely positive, and much of it focused on the access angle. Reporters framed Tinker as a way to put frontier-scale fine-tuning in reach of academic labs and individual researchers who could not otherwise touch models the size of Qwen3-235B. [3][5][7] The choice to ship a developer tool first, from a company best known for its founders and its funding, was read as a signal that Thinking Machines meant to build for researchers and builders before chasing a consumer product. [2][3]
The research community's early reaction tracked the same theme. Practitioners noted that the primitive-level API was unusually low-level for a hosted service, which appealed to people doing reinforcement learning and novel training methods, since most managed offerings hide exactly the parts those users need. [5][8] The endorsement of named research groups at the launch added credibility, because it showed the service running real workloads in theorem proving, chemistry, and reinforcement learning rather than only demos. [4][7]
Tinker carries the limits of its design. It only supports open-weight models, so it cannot fine-tune closed frontier models, and the model list at launch was confined to specific Llama and Qwen releases. [1][3] The service is a hosted, closed platform, which means a user's training runs depend on Thinking Machines' infrastructure and availability rather than on hardware they control. [4][5] At launch it was a private beta behind a waitlist, so access was gated, and the pricing was not yet settled beyond a free initial period and a promise of usage-based billing to come. [3][7] LoRA itself, while efficient, is not identical to full fine-tuning in every case, and the company's own research located a limit when a task needs to absorb a very large body of new knowledge. [9] For most post-training work, though, the design trades a small amount of flexibility for a large reduction in operational burden, which is the point of the product. [4][9]