Making Models Unmergeable via Scaling-Sensitive Loss Landscape
Abstract
The rise of model hubs has made it easier to access reusable model components, making model merging a practical tool for combining capabilities. Yet, this modularity also creates a governance gap: downstream users can recompose released weights into unauthorized mixtures that bypass safety alignment or licensing terms. Because existing defenses are largely post-hoc and architecture-specific, they provide inconsistent protection across diverse architectures and release formats in practice. To close this gap, we propose Trap² (Training-time Protection via Task-Robust Adversarial Perturbation), an architecture-agnostic protection framework that encodes protection into updates during fine-tuning, regardless of whether they are released as adapters or full models. Instead of relying on architecture-dependent approaches, Trap² uses weight re-scaling as a simple proxy for the merging process. It keeps released weights effective in standalone use, but degrades them under re-scaling that often arises in merging, undermining unauthorized recomposition.
Key Idea: Let's Shape the Loss Landscape Over Scaling
Common merging operators (such as Task Arithmetic, TIES-Merging, and DARE) aggregate released updates as \(W_0 + \sum_i s_i \cdot \Delta W_i\), where \(W_0\) is the frozen base model, each \(\Delta W_i\) is a released update (a LoRA adapter or a full-parameter difference), and \(s_i\) is its merging coefficient. Trap² exploits this by treating a single scaling factor \(s\) on the update as a lightweight proxy for merging: it solves an adversarial objective over update re-scaling, preserving utility at the authorized scale (\(s = 1\)) while inducing degradation under off-nominal scaling (\(s \neq 1\)), precisely the regime that merging pipelines introduce. The result is an update that is sharp and effective standalone, yet brittle the moment it is re-weighted and blended with third-party updates.
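The aggregation rule above can be sketched in a few lines. This is a minimal illustration, assuming every set of weights has been flattened to a plain list of floats; the helper name `merge_updates` and the toy values are hypothetical, and the code is not the authors' implementation or any specific merging operator. It shows how uniform averaging moves each constituent update to scale \(1/N\), away from the authorized scale \(s = 1\).

```python
from typing import List, Sequence


def merge_updates(base: Sequence[float],
                  updates: Sequence[Sequence[float]],
                  coeffs: Sequence[float]) -> List[float]:
    """Return W_0 + sum_i s_i * dW_i for flattened weight vectors."""
    if len(updates) != len(coeffs):
        raise ValueError("need exactly one coefficient per update")
    merged = list(base)
    for delta, s in zip(updates, coeffs):
        if len(delta) != len(merged):
            raise ValueError("update shape does not match the base model")
        for j, d in enumerate(delta):
            merged[j] += s * d
    return merged


# Toy values, assumed purely for illustration.
base = [1.0, -2.0, 0.5]
updates = [[0.4, 0.0, -0.2], [0.0, 0.8, 0.2]]

# Standalone deployment of the first update: the authorized scale s = 1.
standalone = merge_updates(base, updates[:1], [1.0])

# Uniform averaging of N updates: every s_i = 1/N, an off-nominal scale.
n = len(updates)
averaged = merge_updates(base, updates, [1.0 / n] * n)
```

In this toy setting the first update is applied at \(s = 1\) when used alone but at \(s = 0.5\) once averaged with a second update, which is the regime the single-scale proxy is meant to capture.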
[Figure 2] Trap² retains high utility at the authorized scale (\(s = 1\)) but collapses under off-nominal scaling (\(s \neq 1\)), unlike an unprotected adapter of comparable standalone accuracy.
The Trap² Objective
We parameterize the released update as a LoRA adapter \(\Delta W = BA\) on a frozen base \(W_0\), with low-rank factors \(B \in \mathbb{R}^{d_{\text{out}} \times r}\), \(A \in \mathbb{R}^{r \times d_{\text{in}}}\), and \(r \ll \min\{d_{\text{out}}, d_{\text{in}}\}\). Let \(\mathcal{L}_{\text{scaled}}(\Delta W; s)\) be the expected task loss when the update is deployed at scale \(s\), i.e., at weights \(W_0 + s \cdot \Delta W\). The nominal loss is the loss at the authorized scale, while the off-nominal loss averages the loss over off-nominal scales \(s\) sampled from a distribution \(\mathcal{S}\) whose support excludes a margin \(\delta\) around the nominal scale \(s = 1\):
\[ \operatorname{supp}(\mathcal{S}) \subseteq [s_{\min},\,1-\delta]\,\cup\,[1+\delta,\,s_{\max}]. \]
Concretely, the two losses are
\[ \mathcal{L}_{\text{nominal}}(\Delta W) := \mathcal{L}_{\text{scaled}}(\Delta W; 1), \]
\[ \mathcal{L}_{\text{off}}(\Delta W) := \mathbb{E}_{s\sim\mathcal{S}}\,\big[\,w(s) \cdot \mathcal{L}_{\text{scaled}}(\Delta W; s)\,\big]. \]
Then, Trap² minimizes
\[ J(\Delta W) = \mathcal{L}_{\text{nominal}}(\Delta W) \;-\; \lambda \cdot \mathcal{L}_{\text{off}}(\Delta W), \]
where \(\lambda \geq 0\) trades off standalone utility against sensitivity to re-scaling, with \(\lambda = 0\) recovering standard fine-tuning. The first term preserves intended deployment performance at \(s = 1\), and the second increases loss under off-nominal re-scaling, which is precisely what makes the update unmergeable.
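As a rough sketch of how this objective might be estimated in practice, the code below draws scales from the two-interval support and forms a Monte Carlo estimate of \(J\). Several things here are assumptions rather than details from the paper: the uniform distribution over the support (the text only constrains the support of \(\mathcal{S}\)), the function names, the toy loss, and the values of \(s_{\min}\), \(s_{\max}\), \(\delta\), and \(\lambda\). The default weighting is the \(w(s) = 1/s\) that the authors report as their default.

```python
import random
from typing import Callable, Sequence


def sample_off_nominal_scale(s_min: float, s_max: float, delta: float,
                             rng: random.Random) -> float:
    """Draw s from [s_min, 1 - delta] U [1 + delta, s_max].

    Uniform sampling over the support is an assumption of this sketch.
    """
    low_len = (1.0 - delta) - s_min
    high_len = s_max - (1.0 + delta)
    if low_len < 0 or high_len < 0 or low_len + high_len <= 0:
        raise ValueError("empty support: check s_min, s_max and delta")
    u = rng.uniform(0.0, low_len + high_len)
    if u < low_len:
        return s_min + u
    return (1.0 + delta) + (u - low_len)


def trap_objective(loss_at_scale: Callable[[float], float],
                   lam: float,
                   scales: Sequence[float],
                   weight: Callable[[float], float] = lambda s: 1.0 / s) -> float:
    """Monte Carlo estimate of J = L(1) - lam * E_s[w(s) * L(s)]."""
    nominal = loss_at_scale(1.0)
    off = sum(weight(s) * loss_at_scale(s) for s in scales) / len(scales)
    return nominal - lam * off


# Toy stand-in for L_scaled(dW; s): smallest at the nominal scale s = 1.
def toy_loss(s: float) -> float:
    return (s - 1.0) ** 2 + 0.1


rng = random.Random(0)
scales = [sample_off_nominal_scale(0.1, 2.0, 0.2, rng) for _ in range(256)]
j_plain = trap_objective(toy_loss, lam=0.0, scales=scales)  # standard fine-tuning
j_trap = trap_objective(toy_loss, lam=0.5, scales=scales)   # rewards off-nominal loss
```

With \(\lambda = 0\) the estimate reduces to the nominal loss alone. Any \(\lambda > 0\) lowers \(J\) in proportion to the weighted off-nominal loss, so minimizing \(J\) pushes that loss up.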
[Why this breaks merging] Many merging operators effectively re-scale each constituent update (e.g., uniform averaging of \(N\) adapters scales each by \(1/N\)), so a merged model is pushed away from the nominal scale \(s = 1\). Trap² is trained to be brittle exactly there, so that merging fails. The weighting \(w(s)\) normalizes the training signal across scales. Since gradients shrink for down-scaled updates (\(s < 1\)), we set \(w(s) = 1/s\) by default.
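The default \(w(s) = 1/s\) is consistent with a short chain-rule calculation. As a sketch, write \(\ell(W)\) for the expected task loss at deployed weights \(W\), so that \(\mathcal{L}_{\text{scaled}}(\Delta W; s) = \ell(W_0 + s \cdot \Delta W)\); the symbol \(\ell\) is introduced here only for illustration, and \(\ell\) is assumed differentiable. Then
\[ \nabla_{\Delta W}\,\mathcal{L}_{\text{scaled}}(\Delta W; s) \;=\; s \cdot \nabla_{W}\,\ell(W)\,\Big|_{W = W_0 + s \cdot \Delta W}, \]
\[ \nabla_{\Delta W}\,\big[\,w(s) \cdot \mathcal{L}_{\text{scaled}}(\Delta W; s)\,\big] \;=\; \nabla_{W}\,\ell(W)\,\Big|_{W = W_0 + s \cdot \Delta W} \quad \text{for } w(s) = 1/s. \]
The explicit factor \(s\) cancels. For example, under uniform averaging of \(N = 8\) adapters each update sits at \(s = 1/8\), so the unweighted gradient carries a factor \(1/8\) that \(w(s) = 8\) offsets. For a LoRA update \(\Delta W = BA\), the same factor \(s\) appears in the gradients with respect to \(B\) and \(A\). This is a heuristic reading of the default rather than a derivation quoted from the paper, and the gradient still depends on \(s\) through the point at which \(\ell\) is evaluated.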
[Extension to full fine-tuning] So far \(\Delta W\) is a LoRA adapter, but the same objective applies to full-model fine-tuning. We define a step-dependent update \(\Delta W_t := W_t - W_0\). At each step \(t\) we evaluate the nominal loss at \(W_t\) and the off-nominal loss at the scaled model \(W_0 + s \cdot \Delta W_t\), then update \(W_t\). This gives one architecture-agnostic formulation that covers both adapter-only and full-checkpoint releases.
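The full fine-tuning variant can be illustrated on a toy problem. The sketch below assumes flattened weight lists, the default weighting \(w(s) = 1/s\), a fixed list of off-nominal scales, and central finite differences standing in for automatic differentiation. All function names and the quadratic toy loss are hypothetical, so this is an illustration of the update rule described above rather than the authors' training code.

```python
from typing import Callable, List, Sequence

Loss = Callable[[Sequence[float]], float]


def scaled_weights(w0: Sequence[float], wt: Sequence[float], s: float) -> List[float]:
    """W_0 + s * dW_t, where dW_t = W_t - W_0 is recomputed from the current W_t."""
    return [b + s * (w - b) for b, w in zip(w0, wt)]


def trap_objective_full(loss: Loss, w0: Sequence[float], wt: Sequence[float],
                        scales: Sequence[float], lam: float) -> float:
    """J(W_t) = L(W_t) - lam * mean_s[(1/s) * L(W_0 + s * dW_t)]."""
    nominal = loss(wt)
    off = sum(loss(scaled_weights(w0, wt, s)) / s for s in scales) / len(scales)
    return nominal - lam * off


def numerical_grad(f: Loss, w: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Central finite differences; a stand-in for autograd in this toy."""
    grad = []
    for j in range(len(w)):
        hi = list(w)
        lo = list(w)
        hi[j] += eps
        lo[j] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


def train_step(loss: Loss, w0: Sequence[float], wt: Sequence[float],
               scales: Sequence[float], lam: float, lr: float) -> List[float]:
    """One gradient-descent step on J with respect to the full weights W_t."""
    grad = numerical_grad(
        lambda w: trap_objective_full(loss, w0, w, scales, lam), wt)
    return [w - lr * g for w, g in zip(wt, grad)]


# Toy task (assumed): squared distance to a target weight vector.
target = [1.0, 2.0]


def toy_task_loss(w: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(w, target))


w0 = [0.0, 0.0]
wt = train_step(toy_task_loss, w0, list(w0), scales=[0.5, 1.5], lam=0.1, lr=0.1)
```

Because \(\Delta W_t\) is recomputed from the current \(W_t\) on every call, no separate adapter parameters are needed. The frozen \(W_0\) only serves as the anchor for re-scaling.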
Experimental Results
Across architectures and release formats, Trap² preserves strong standalone utility while consistently degrading performance under widely used merging operators. It also holds up under stronger threat models in which the adversary has access to task-relevant data, covering both data-dependent mergers and data-driven recovery attacks. We refer readers to the paper for further experiments, including other LoRA variants, LLMs, and additional analyses.
LoRA Adapters
We first consider adapter-only releases. Trap² preserves standalone accuracy on par with naïve fine-tuning, yet accuracy degrades sharply once the protected adapter is merged, across diverse merging operators and merging spaces.
[Table 1] Per-task accuracy (%; ↑) on 8 vision benchmarks when each adapter is used standalone, across three CLIP backbones. Trap² matches naïve fine-tuning.
| Backbone | Method | Cars | DTD | EuroSAT | GTSRB | MNIST | RESISC | Aircraft | SVHN | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| ViT-B/32 | Zero-Shot | 59.4 | 44.1 | 45.5 | 32.3 | 48.0 | 60.3 | 19.0 | 31.4 | 42.5 |
| ViT-B/32 | Fine-Tuned | 99.5 | 68.7 | 98.3 | 98.4 | 99.1 | 93.0 | 51.7 | 96.2 | 88.1 |
| ViT-B/32 | Trap² (Ours) | 99.8 | 66.1 | 98.5 | 98.9 | 99.5 | 93.3 | 52.4 | 96.6 | 88.1 |
| ViT-L/14 | Zero-Shot | 77.9 | 55.5 | 62.3 | 50.7 | 76.2 | 71.4 | 32.6 | 58.6 | 60.6 |
| ViT-L/14 | Fine-Tuned | 99.8 | 76.7 | 98.8 | 98.5 | 99.6 | 95.6 | 72.6 | 97.8 | 92.4 |
| ViT-L/14 | Trap² (Ours) | 99.9 | 78.8 | 98.5 | 99.6 | 99.4 | 96.2 | 80.1 | 97.2 | 93.7 |
| ConvNeXt | Zero-Shot | 89.6 | 59.6 | 54.5 | 48.9 | 54.9 | 67.1 | 27.5 | 34.3 | 54.6 |
| ConvNeXt | Fine-Tuned | 98.5 | 76.4 | 98.9 | 99.3 | 99.3 | 96.1 | 59.2 | 97.1 | 90.6 |
| ConvNeXt | Trap² (Ours) | 97.6 | 76.1 | 98.6 | 98.9 | 99.5 | 94.1 | 64.3 | 96.2 | 90.7 |
[Table 2] Averaged per-task accuracy (%; ↓) under 8-way LoRA merging: a protected target adapter is merged with seven unprotected ones, across various methods (TA, TIES, TIES+DARE, TSV, CART) and spaces (Full, KnOTS, Core). The merging coefficient is tuned on validation data, a setting that favors the adversarial merger. Trap² drives merged accuracy far below the unprotected baseline.
| Backbone (zero-shot) | Protection | TA (Full) | TIES (Full) | TIES (KnOTS) | TIES (Core) | TIES+DARE (Full) | TIES+DARE (KnOTS) | TIES+DARE (Core) | TSV (Full) | TSV (KnOTS) | TSV (Core) | CART (Full) | CART (KnOTS) | CART (Core) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ViT-B/32 (42.5) | Unprotected | 48.3 | 48.0 | 49.9 | 53.3 | 48.2 | 49.9 | 54.7 | 51.4 | 49.2 | 55.0 | 49.6 | 49.9 | 51.2 |
| ViT-B/32 (42.5) | Trap² (Ours) | 23.1 | 36.5 | 37.5 | 37.2 | 33.9 | 28.4 | 36.8 | 24.8 | 24.2 | 24.4 | 41.3 | 41.0 | 41.6 |
| ViT-L/14 (60.6) | Unprotected | 62.9 | 67.9 | 68.9 | 68.8 | 68.0 | 69.9 | 68.8 | 72.4 | 66.7 | 74.6 | 64.4 | 65.0 | 66.2 |
| ViT-L/14 (60.6) | Trap² (Ours) | 33.6 | 53.2 | 47.7 | 54.5 | 50.6 | 43.3 | 54.8 | 42.0 | 37.4 | 39.4 | 58.6 | 58.4 | 59.5 |
| ConvNeXt (54.6) | Unprotected | 49.2 | 59.6 | 60.2 | 63.6 | 59.6 | 60.3 | 63.6 | 65.1 | 60.4 | 65.8 | 60.1 | 60.0 | 61.2 |
| ConvNeXt (54.6) | Trap² (Ours) | 14.7 | 32.9 | 15.2 | 17.2 | 15.6 | 13.1 | 25.3 | 14.1 | 15.9 | 15.3 | 47.0 | 45.9 | 47.2 |
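
The coefficient tuning mentioned in the Table 2 caption can be pictured as a simple grid search. The sketch below is an assumption about the general shape of such a protocol, not the authors' evaluation code: the function names, the candidate grid, and the toy validation curve (a stand-in for "merge at this coefficient, then measure validation accuracy") are all hypothetical.

```python
from typing import Callable, Iterable, Tuple


def tune_merge_coefficient(val_accuracy: Callable[[float], float],
                           candidates: Iterable[float]) -> Tuple[float, float]:
    """Return the candidate coefficient with the highest validation accuracy."""
    best_coeff, best_acc = None, float("-inf")
    for coeff in candidates:
        acc = val_accuracy(coeff)
        if acc > best_acc:
            best_coeff, best_acc = coeff, acc
    if best_coeff is None:
        raise ValueError("no candidate coefficients given")
    return best_coeff, best_acc


# Hypothetical validation curve; a real one would merge the adapters at the
# given coefficient and evaluate the merged model on held-out data.
def toy_val_accuracy(coeff: float) -> float:
    return 50.0 - 100.0 * (coeff - 0.3) ** 2


grid = [0.1, 0.2, 0.3, 0.4, 0.5]
best_coeff, best_acc = tune_merge_coefficient(toy_val_accuracy, grid)
```

Because the reported number is the best value over the grid, it is the most favorable outcome available to the merger under that grid, which is why the caption describes the setting as optimistic for the adversary.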
Full Fine-Tuning
Beyond adapters, Trap² extends to full fine-tuning. On CLIP ViT-B/32 it preserves standalone utility comparable to naïve fine-tuning, while still inducing strong post-merge degradation under full-model merging.
[Table 3] Standalone per-task accuracy (%; ↑) on 8 vision benchmarks for CLIP ViT-B/32 under full fine-tuning. Trap² preserves standalone performance comparable to naïve fine-tuning.
| Backbone | Method | Cars | DTD | EuroSAT | GTSRB | MNIST | RESISC | Aircraft | SVHN | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| ViT-B/32 | Zero-Shot | 59.4 | 44.1 | 45.5 | 32.3 | 48.0 | 60.3 | 19.0 | 31.4 | 42.5 |
| ViT-B/32 | Fine-Tuned | 99.8 | 65.0 | 98.6 | 97.6 | 99.2 | 93.2 | 51.8 | 95.6 | 87.6 |
| ViT-B/32 | Trap² (Ours) | 99.8 | 62.8 | 98.4 | 99.1 | 99.2 | 93.3 | 49.6 | 96.7 | 87.4 |
[Table 4] Averaged per-task accuracy (%; ↓) on 8 vision benchmarks for CLIP ViT-B/32 under full-model merging. Trap² remains effective under full fine-tuning, strongly degrading the merged model relative to the unprotected baseline.
| Backbone (zero-shot) | Protection | TA | TIES | TIES+DARE | TSV | CART |
|---|---|---|---|---|---|---|
| ViT-B/32 (42.5) | Unprotected | 50.0 | 51.0 | 51.7 | 62.4 | 61.0 |
| ViT-B/32 (42.5) | Trap² (Ours) | 28.5 | 38.0 | 37.9 | 38.2 | 42.1 |
Robustness against Data-aware Methods
[Data-dependent Merging] We test Trap² against two data-dependent mergers, RegMean and Chain-of-Merges (CoM), which use extra task-specific samples to compute aggregation weights. Trap² induces substantial post-merge degradation under both, so protection holds even when the merger leverages additional data signals. Notably, CoM often assigns comparable or even larger weight to the Trap²-protected adapter than to its unprotected counterpart, rather than down-weighting it to recover utility.
[Data-driven Recovery] As a stronger, white-box threat, the adversary uses a small task-relevant dataset to recover utility from the merged model. We evaluate ProDistill, which learns merging coefficients via teacher-student distillation, and post-merge supervised fine-tuning (SFT). On CLIP ViT-B/32, recovery from a Trap²-protected merge ends only 3.0 (SFT) to 6.7 (ProDistill) points above the 42.5% zero-shot reference, whereas the unprotected baseline ends 14.4 to 31.3 points above it (Table 5), indicating that Trap² substantially impedes recovery even with in-domain samples.
[Table 5] Averaged accuracy (%; ↓) under stronger threat models on CLIP ViT-B/32. Trap² substantially degrades performance under both data-dependent merging (RegMean, CoM) and data-driven recovery (ProDistill, SFT), relative to the unprotected baseline.
| Backbone (zero-shot) | Protection | RegMean (data-dependent merging) | CoM (data-dependent merging) | ProDistill (data-driven recovery) | SFT (data-driven recovery) |
|---|---|---|---|---|---|
| ViT-B/32 (42.5) | Unprotected | 49.1 | 64.8 | 73.8 | 56.9 |
| ViT-B/32 (42.5) | Trap² (Ours) | 9.5 | 32.3 | 49.2 | 45.5 |
BibTeX
@inproceedings{jang2026making,
title={Making Models Unmergeable via Scaling-Sensitive Loss Landscape},
author={Minwoo Jang and Hoyoung Kim and Jabin Koo and Jungseul Ok},
booktitle={Proceedings of the 43rd International Conference on Machine Learning},
year={2026}
}