Task arithmetic
Last reviewed
Jun 8, 2026
Sources
7 citations
Review status
Source-backed
Revision
v1 · 2,047 words
Task arithmetic is a model-editing technique that steers the behavior of a neural network by adding or subtracting vectors in its weight space. The central object is the task vector, defined as the elementwise difference between the weights of a fine-tuned model and the pre-trained weights it started from. Because these vectors live in the same coordinate system, they can be negated, summed, and combined by simple linear algebra, and the resulting model inherits the behavior implied by that arithmetic. Negating a task vector makes a model forget a task or suppress a behavior; adding several task vectors builds a single multi-task model; and combining vectors that stand in an analogy relationship can synthesize a new capability with little or no task-specific data [1].
The technique was introduced in "Editing Models with Task Arithmetic" by Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi, with authors affiliated with the University of Washington and the Allen Institute for AI. The paper first appeared on arXiv in December 2022 and was published at the International Conference on Learning Representations (ICLR) in 2023 [1]. Task arithmetic is significant less as a standalone editing tool than as the algebraic foundation of model merging: later methods such as TIES-Merging and DARE operate directly on task vectors, and the operation called "task arithmetic" is a named merge method in popular open-source toolkits [1][4][5].
A task vector isolates the change that fine-tuning produced. Let theta_pre denote the parameters of a pre-trained model written as a single flat vector, and let theta_ft be the parameters of the same model after fine-tuning on some task t. The task vector for that task is
tau_t = theta_ft - theta_pre.
It has exactly the same dimension as the model's weights, and every coordinate records how much fine-tuning moved that parameter. Applying a task vector means moving the base model along this direction with a scaling coefficient lambda:
theta_new = theta_pre + lambda * tau_t.
Setting lambda = 1 reconstructs the fine-tuned model exactly, while smaller positive values apply a fraction of the fine-tuning update and negative values move in the opposite direction [1]. In practice lambda is a single scalar chosen on a held-out validation set and shared across all task vectors being combined, which keeps the method free of any further gradient training [1].
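The two formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the `Weights` type (a dictionary from parameter name to a flattened list of values), the function names, and the toy numbers are all assumptions made here for the example.

```python
from typing import Dict, List

# Assumed representation: parameter name -> flattened list of values.
Weights = Dict[str, List[float]]


def task_vector(theta_ft: Weights, theta_pre: Weights) -> Weights:
    """tau_t = theta_ft - theta_pre, computed elementwise per parameter."""
    return {
        name: [f - p for f, p in zip(theta_ft[name], theta_pre[name])]
        for name in theta_pre
    }


def apply_task_vector(theta_pre: Weights, tau: Weights, lam: float = 1.0) -> Weights:
    """theta_new = theta_pre + lam * tau."""
    return {
        name: [p + lam * t for p, t in zip(theta_pre[name], tau[name])]
        for name in theta_pre
    }


# Toy checkpoints (hypothetical values chosen to be exact in floating point).
theta_pre = {"layer.weight": [0.5, -1.0, 2.0], "layer.bias": [0.0]}
theta_ft = {"layer.weight": [1.0, -1.5, 2.0], "layer.bias": [0.25]}

tau = task_vector(theta_ft, theta_pre)
restored = apply_task_vector(theta_pre, tau, lam=1.0)  # lambda = 1 recovers theta_ft
halfway = apply_task_vector(theta_pre, tau, lam=0.5)   # half of the fine-tuning update
```

With lambda = 1 the result equals the fine-tuned weights, and with lambda = 0.5 each parameter sits halfway between the two checkpoints.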
Two preconditions make task vectors meaningful. First, all models must be homologous: they must share an identical architecture and the same pre-trained initialization, because only then do their parameters align coordinate by coordinate and become comparable. Second, the differences must be small enough that linear combinations stay inside a low-loss region of weight space, which holds for ordinary fine-tuning but not necessarily for extended continued pre-training [1][5]. The same object appears under different names across the literature; the DARE work calls these differences "delta parameters," and they underlie essentially all training-free merging of fine-tuned checkpoints [5].
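Both preconditions can be partially checked before any arithmetic is attempted. The sketch below is an assumption-laden illustration: `check_same_layout` and `relative_delta_norm` are hypothetical helper names, and the relative norm is only a rough proxy for "small" differences, not a criterion taken from the cited papers. A matching layout also says nothing about whether two checkpoints share an initialization, which cannot be verified from parameter names and sizes alone.

```python
import math
from typing import Dict, List

# Assumed representation: parameter name -> flattened list of values.
Weights = Dict[str, List[float]]


def check_same_layout(theta_a: Weights, theta_b: Weights) -> None:
    """Raise ValueError unless both checkpoints have the same names and sizes."""
    if theta_a.keys() != theta_b.keys():
        raise ValueError("parameter names differ")
    for name in theta_a:
        if len(theta_a[name]) != len(theta_b[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")


def relative_delta_norm(theta_ft: Weights, theta_pre: Weights) -> float:
    """Return ||theta_ft - theta_pre|| / ||theta_pre|| over all parameters.

    Assumes theta_pre is not all zeros.
    """
    check_same_layout(theta_ft, theta_pre)
    delta_sq = sum(
        (f - p) ** 2
        for name in theta_pre
        for f, p in zip(theta_ft[name], theta_pre[name])
    )
    base_sq = sum(p ** 2 for name in theta_pre for p in theta_pre[name])
    return math.sqrt(delta_sq / base_sq)


# Toy values: the base has norm 5 and the delta has norm 0.5.
ratio = relative_delta_norm({"w": [3.0, 4.5]}, {"w": [3.0, 4.0]})
```

A large ratio would suggest the fine-tuned weights have moved far from the base, which is the situation the second precondition warns about.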
Task arithmetic defines three families of operations, summarized below and detailed in the subsections that follow.
| Operation | Weight-space update | Effect | Headline example from Ilharco et al. |
|---|---|---|---|
| Negation (forgetting) | theta_pre - lambda * tau_t | Suppresses or removes a task or behavior | Cutting GPT-2 toxic generations roughly 6x |
| Addition (learning) | theta_pre + lambda * sum_t tau_t | Builds one multi-task model from many | Merging 8 CLIP image classifiers |
| Analogy | tau_D = tau_C + (tau_B - tau_A) | Synthesizes a task with little or no data | Building Yelp sentiment from Amazon vectors |
Subtracting a task vector pushes the model in the direction opposite to fine-tuning, which degrades performance on the targeted task while leaving unrelated behavior largely intact. On image classification, Ilharco et al. negated task vectors for CLIP vision transformer image encoders across eight datasets and reduced accuracy on the targeted dataset by tens of percentage points, while accuracy on a held-out control task (ImageNet) changed by less than one point on average [1]. In language, they negated a "toxicity" task vector for GPT-2 and reduced the proportion of toxic generations by a factor of about six, from roughly 4.8 percent to 0.8 percent, while perplexity on the WikiText-103 control rose only slightly, from about 16.4 to 16.9 [1]. This targeted suppression connects task arithmetic to machine unlearning and to detoxification, although negation reduces rather than provably erases a capability, so it is not a guaranteed unlearning procedure.
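A toy example can make the selectivity of negation concrete. The sketch below assumes a two-parameter linear scorer in which fine-tuning happened to change only the coordinate used by the task input. Real networks are not this cleanly separable, so it illustrates the arithmetic only and does not model the CLIP or GPT-2 experiments. All names and values are hypothetical.

```python
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    """Inner product, used here as a toy model output."""
    return sum(a * b for a, b in zip(u, v))


def negate_task(theta_pre: List[float], tau: List[float], lam: float = 1.0) -> List[float]:
    """theta_new = theta_pre - lam * tau."""
    return [p - lam * t for p, t in zip(theta_pre, tau)]


theta_pre = [1.0, 1.0]
theta_ft = [2.0, 1.0]  # fine-tuning moved only the first coordinate
tau = [f - p for f, p in zip(theta_ft, theta_pre)]
theta_neg = negate_task(theta_pre, tau, lam=1.0)

x_task = [1.0, 0.0]     # input that depends on the edited coordinate
x_control = [0.0, 1.0]  # input that depends on the untouched coordinate

# Scores for the pre-trained, fine-tuned, and negated weights, in that order.
task_scores = [dot(theta, x_task) for theta in (theta_pre, theta_ft, theta_neg)]
control_scores = [dot(theta, x_control) for theta in (theta_pre, theta_ft, theta_neg)]
```

In this construction the task score falls below its pre-trained value after negation while the control score does not move, mirroring in miniature the pattern reported for the target and control tasks.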
Summing several task vectors and adding the result to the base model produces a single set of weights that performs many tasks at once:
theta_multitask = theta_pre + lambda * sum_t tau_t.
This is the prototypical merge operation. Adding the eight CLIP image-classification task vectors yielded one model whose accuracy averaged roughly 91 percent of the accuracy of the eight separately fine-tuned specialists, recovering most of their combined ability without any joint training and without storing eight models [1]. Addition also works with a single vector: adding one task vector to a model injects the corresponding skill, and the same machinery supports building more robust models by moving a fine-tuned checkpoint partway back toward its pre-trained weights.
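The addition rule can be sketched as follows. The function name `merge_by_addition`, the dictionary-of-lists representation, and the toy task vectors are assumptions for illustration. In practice the same loop would run over the tensors of real checkpoints.

```python
from typing import Dict, List, Sequence

# Assumed representation: parameter name -> flattened list of values.
Weights = Dict[str, List[float]]


def merge_by_addition(
    theta_pre: Weights, task_vectors: Sequence[Weights], lam: float
) -> Weights:
    """theta_multitask = theta_pre + lam * sum_t tau_t."""
    merged: Weights = {}
    for name, base in theta_pre.items():
        # Sum the task vectors coordinate by coordinate for this parameter.
        summed = [
            sum(tau[name][i] for tau in task_vectors) for i in range(len(base))
        ]
        merged[name] = [p + lam * s for p, s in zip(base, summed)]
    return merged


theta_pre = {"w": [1.0, 1.0, 1.0]}
tau_a = {"w": [0.5, 0.0, 0.0]}   # hypothetical task A update
tau_b = {"w": [0.0, -0.5, 0.0]}  # hypothetical task B update

full = merge_by_addition(theta_pre, [tau_a, tau_b], lam=1.0)
scaled = merge_by_addition(theta_pre, [tau_a, tau_b], lam=0.5)
```

Because the two toy vectors touch different coordinates, the merged weights carry both updates unchanged at lambda = 1. Overlapping updates would instead add together or partially cancel.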
The most striking demonstration is that task vectors compose in analogy relationships of the form "A is to B as C is to D." When four tasks form such an analogy, the vector for the fourth can be assembled from the other three:
tau_D = tau_C + (tau_B - tau_A).
Ilharco et al. used this to build a sentiment classifier for a target domain without any labeled sentiment data from that domain. Writing the four tasks as Amazon language modeling, Yelp language modeling, Amazon sentiment, and Yelp sentiment, they synthesized a Yelp sentiment vector as tau(Yelp sentiment) = tau(Amazon sentiment) + (tau(Yelp language modeling) - tau(Amazon language modeling)), and the resulting T5 model classified Yelp reviews with accuracy close to a model that had access to in-domain sentiment labels [1]. A related experiment improved accuracy on data subpopulations for which almost no training examples existed, by transporting a capability learned on a well-resourced subpopulation along an analogy direction [1]. These results echo the classic word-embedding analogies (the "king minus man plus woman" pattern), now lifted from token embeddings to the full weight space of a network.
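The analogy rule is a one-line computation once the three known task vectors are available. The sketch below uses hypothetical three-coordinate vectors in which one coordinate stands for each domain and one for sentiment. This clean separation is an assumption of the toy, not a property claimed for the T5 experiment.

```python
from typing import List


def analogy(tau_a: List[float], tau_b: List[float], tau_c: List[float]) -> List[float]:
    """tau_D = tau_C + (tau_B - tau_A), elementwise."""
    return [c + (b - a) for a, b, c in zip(tau_a, tau_b, tau_c)]


# Hypothetical coordinates: [Amazon domain, Yelp domain, sentiment].
tau_amazon_lm = [1.0, 0.0, 0.0]
tau_yelp_lm = [0.0, 1.0, 0.0]
tau_amazon_sentiment = [1.0, 0.0, 2.0]

# A = Amazon language modeling, B = Yelp language modeling, C = Amazon sentiment.
tau_yelp_sentiment = analogy(tau_amazon_lm, tau_yelp_lm, tau_amazon_sentiment)
```

In this toy the domain component is swapped from Amazon to Yelp while the sentiment component is carried over unchanged, which is the intuition behind the analogy.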
Task arithmetic depends on fine-tuned models staying close to their shared initialization. Work on linear mode connectivity showed that networks trained from the same pre-trained weights tend to remain in a single connected, low-loss basin, so their weights can be linearly interpolated without crossing a high-loss barrier [7]. The same property motivated model soups, which average the weights of several models fine-tuned on one task with different hyperparameters to improve accuracy at no extra inference cost [3]. Task arithmetic generalizes that idea from one task to many: model soups average task vectors for a single task, whereas task arithmetic sums task vectors for different tasks [1][3].
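The relationship to model soups can be checked numerically: a uniform average of n fine-tuned checkpoints equals the base weights plus the sum of their task vectors scaled by lambda = 1/n. The sketch below verifies this identity on toy values. The function names and numbers are assumptions made for the example.

```python
from typing import List, Sequence


def uniform_soup(checkpoints: Sequence[List[float]]) -> List[float]:
    """Average the weights of several checkpoints coordinate by coordinate."""
    n = len(checkpoints)
    return [sum(column) / n for column in zip(*checkpoints)]


def sum_task_vectors(
    theta_pre: List[float], checkpoints: Sequence[List[float]], lam: float
) -> List[float]:
    """theta_pre + lam * sum_i (theta_i - theta_pre)."""
    return [
        p + lam * sum(theta_i - p for theta_i in column)
        for p, column in zip(theta_pre, zip(*checkpoints))
    ]


theta_pre = [1.0, 2.0]
fine_tuned = [[1.5, 2.0], [1.0, 3.0]]  # two hypothetical fine-tuned checkpoints

soup = uniform_soup(fine_tuned)
via_task_vectors = sum_task_vectors(theta_pre, fine_tuned, lam=1.0 / len(fine_tuned))
```

Both computations return the same weights here, so the distinction drawn above lies in what is being combined (runs on one task versus different tasks) and in how lambda is chosen, not in the underlying algebra.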
A more precise account came from the follow-up paper "Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models" by Guillermo Ortiz-Jimenez, Alessandro Favero, and Pascal Frossard of EPFL, presented as an oral at the 37th Conference on Neural Information Processing Systems (NeurIPS) in 2023 [2]. Their central finding is that the decisive property is not linearity of the network itself but weight disentanglement: distinct directions in weight space govern separate, spatially localized regions of the input space, so that adding two task vectors composes their effects on their respective inputs without interfering [2]. They argue this disentanglement emerges during pre-training and explains why arithmetic edits stay local to the intended task. They further show that fine-tuning the model in its tangent space, meaning a linearized version of the network operating in the neural tangent kernel (NTK) regime, amplifies weight disentanglement and yields consistent accuracy gains on the same task-arithmetic benchmarks, and they relate the effect to the spatial localization of the NTK eigenfunctions [2]. This reframing matters because it identifies weight disentanglement, rather than mere weight-space proximity, as the mechanism that makes the algebra reliable.
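The two ideas in this account can be written compactly. The notation below is an assumption adopted for this sketch and may differ from the paper's: \theta_0 stands for the pre-trained weights (theta_pre above), \alpha_t is a per-task scaling coefficient playing the role of lambda, and \mathcal{D}_t is the region of input space associated with task t, with the regions assumed not to overlap.

```latex
% Tangent-space (linearized) model: first-order Taylor expansion of the
% network f around the pre-trained weights \theta_0.
\[
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + (\theta - \theta_0)^{\top}\,\nabla_{\theta} f(x;\theta_0)
\]

% Behavior that task arithmetic relies on: each task vector changes the
% output only on its own task's inputs, and leaves all other inputs alone.
\[
f\Big(x;\ \theta_0 + \sum_{t} \alpha_t \tau_t\Big) =
\begin{cases}
f(x;\ \theta_0 + \alpha_t \tau_t) & \text{if } x \in \mathcal{D}_t, \\[4pt]
f(x;\ \theta_0) & \text{if } x \notin \bigcup_{t} \mathcal{D}_t.
\end{cases}
\]
```

The first expression is linear in \theta, which is what "fine-tuning in the tangent space" refers to. The second states the locality that weight disentanglement is meant to explain.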
Task arithmetic is the algebraic substrate of modern model merging. Its addition rule, theta_pre + lambda * sum_t tau_t, is exactly the "linear" or "task_arithmetic" merge that combines specialist checkpoints into one model, and it is the baseline against which newer methods are measured [1][4]. Two of those newer methods, TIES-Merging and DARE, refine the same sum to control interference between task vectors [4][5].
Both operate on the task-vector abstraction that task arithmetic introduced, and both fall back to plain task arithmetic when their extra steps are disabled. Open-source merging toolkits such as mergekit expose task_arithmetic, ties, dare_linear, and dare_ties as selectable methods over the same delta parameters, which is how the open-weights community routinely fuses LoRA adapters and full fine-tunes into combined models [6]. Task arithmetic is also a natural primitive for continual learning and for distributing model updates as compact diffs rather than full checkpoints.
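The following sketch shows, in simplified form, the kind of extra steps such methods apply to task vectors before the merged vector is scaled and added to the base model. It is not the reference implementation of either method and does not use mergekit's API. The function names are hypothetical, vectors are flat Python lists, and the versions here are deliberately simplified: a DARE-style random drop with rescaling, and a TIES-style trim, sign election, and agreement-only mean.

```python
import random
from typing import List, Sequence


def dare_drop_and_rescale(
    tau: List[float], drop_prob: float, rng: random.Random
) -> List[float]:
    """Zero each delta with probability drop_prob and rescale the survivors."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - drop_prob)
    return [0.0 if rng.random() < drop_prob else t * keep_scale for t in tau]


def ties_merge(task_vectors: Sequence[List[float]], keep_fraction: float) -> List[float]:
    """Trim small deltas, elect a sign per coordinate, average agreeing values."""
    trimmed = []
    for tau in task_vectors:
        # Keep the largest-magnitude entries; magnitude ties may keep a few more.
        k = max(1, round(keep_fraction * len(tau)))
        threshold = sorted((abs(t) for t in tau), reverse=True)[k - 1]
        trimmed.append([t if abs(t) >= threshold else 0.0 for t in tau])
    merged = []
    for column in zip(*trimmed):
        # Elect the sign carrying the larger total mass at this coordinate.
        sign = 1.0 if sum(column) >= 0 else -1.0
        agreeing = [v for v in column if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


tau_a = [1.0, 0.1, -2.0, 0.0]  # hypothetical task vectors
tau_b = [3.0, -0.1, 1.0, 0.0]

plain_mean = [(a + b) / 2 for a, b in zip(tau_a, tau_b)]
merged_ties = ties_merge([tau_a, tau_b], keep_fraction=0.5)
sparse_a = dare_drop_and_rescale(tau_a, drop_prob=0.5, rng=random.Random(0))
```

At the third coordinate the two toy vectors disagree in sign, so the plain mean partially cancels them while the TIES-style merge keeps only the value that agrees with the elected sign.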
Task arithmetic has well-documented failure modes. It applies only to homologous models that share an architecture and a pre-trained initialization; independently pre-trained models cannot be combined this way because their parameters are not aligned, and permutation symmetries would have to be resolved first [1]. When many task vectors are summed, interference accumulates: redundant small updates act as noise and parameters that receive opposite signs from different tasks partially cancel, so merged accuracy falls further below the single-task specialists as the number of tasks grows, which is the precise problem that TIES-Merging and DARE were built to mitigate [4][5]. Performance is sensitive to the scaling coefficient lambda and generally requires a held-out validation set to tune it, undercutting the "data-free" appeal in settings where no validation data exists [1]. Even at its best, a merged multi-task model still trails both per-task fine-tuning and joint multi-task training, so task arithmetic trades some accuracy for the convenience of avoiding retraining [1][4]. The redundancy that makes aggressive sparsification safe is specific to lightweight supervised fine-tuning; for models produced by continued pre-training the deltas are larger and combining them is more destructive [5]. Finally, negation suppresses but does not provably remove knowledge, so it should be treated as a behavioral edit rather than a security guarantee. These caveats explain why task arithmetic is most often used not in isolation but as the linear core inside a more careful merging pipeline.
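The dependence on a validation set can be made concrete with a small grid search over lambda. The sketch below is an illustration under stated assumptions: `select_lambda` is a hypothetical helper, `evaluate` stands in for whatever held-out metric is available, and the candidate grid and toy scoring function are invented for the example, not taken from the cited papers.

```python
from typing import Callable, List, Sequence, Tuple


def select_lambda(
    theta_pre: List[float],
    task_vectors: Sequence[List[float]],
    evaluate: Callable[[List[float]], float],
    candidates: Sequence[float],
) -> Tuple[float, float]:
    """Return the (lambda, score) pair with the highest validation score."""
    best_lam, best_score = candidates[0], float("-inf")
    for lam in candidates:
        # theta = theta_pre + lam * sum_t tau_t, with one lambda shared by all tasks.
        merged = [
            p + lam * sum(column)
            for p, column in zip(theta_pre, zip(*task_vectors))
        ]
        score = evaluate(merged)
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score


def toy_validation_score(theta: List[float]) -> float:
    """Stand-in metric: higher is better, maximal at the weights [1.0, 1.0]."""
    return -sum((w - 1.0) ** 2 for w in theta)


theta_pre = [0.0, 0.0]
task_vectors = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical task vectors
best_lam, best_score = select_lambda(
    theta_pre, task_vectors, toy_validation_score, [0.0, 0.25, 0.5, 0.75, 1.0]
)
```

In this toy the best coefficient is well below 1, so applying the summed task vectors at full strength overshoots. Without a validation metric to call, there is no principled way to discover that.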