Making Models Unmergeable via Scaling-Sensitive Loss Landscape

{1Graduate School of AI, 3Department of CSE}, POSTECH · 2National AI Research Lab
Corresponding Author
ICML 2026  ·  Seoul, Republic of Korea
Unmergeability protection in model sharing and limitations of prior work

[Figure 1] Most prior defenses are post-hoc: they apply function-preserving transformations to the full model weights after fine-tuning, which ties them to Transformer symmetries and full-weight access. Trap² is a training-time defense: it injects unmergeability into the update during fine-tuning by making it scaling-sensitive, preserving utility at the authorized scale (\(s = 1\)) while degrading under the off-nominal re-scaling (\(s \neq 1\)) that merging induces.

Abstract

The rise of model hubs has made it easier to access reusable model components, making model merging a practical tool for combining capabilities. Yet, this modularity also creates a governance gap: downstream users can recompose released weights into unauthorized mixtures that bypass safety alignment or licensing terms. Because existing defenses are largely post-hoc and architecture-specific, they provide inconsistent protection across diverse architectures and release formats in practice. To close this gap, we propose Trap² (Training-time Protection via Task-Robust Adversarial Perturbation), an architecture-agnostic protection framework that encodes protection into updates during fine-tuning, regardless of whether they are released as adapters or full models. Instead of relying on architecture-dependent approaches, Trap² uses weight re-scaling as a simple proxy for the merging process. It keeps released weights effective in standalone use, but degrades them under re-scaling that often arises in merging, undermining unauthorized recomposition.

Key Idea: Let's Shape the Loss Landscape Over Scaling

Common merging operators (such as Task Arithmetic, TIES-Merging, and DARE) aggregate released updates as \(W_0 + \sum_i s_i \cdot \Delta W_i\), where \(W_0\) is the frozen base model, each \(\Delta W_i\) is a released update (a LoRA adapter or a full-parameter difference), and \(s_i\) is its merging coefficient. Trap² exploits this by treating a single scaling factor \(s\) on the update as a lightweight proxy for merging: it solves an adversarial objective over update re-scaling, preserving utility at the authorized scale (\(s = 1\)) while inducing degradation under off-nominal scaling (\(s \neq 1\)), precisely the regime that merging pipelines introduce. The result is an update that is sharp and effective standalone, yet brittle the moment it is re-weighted and blended with third-party updates.
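The aggregation rule above can be sketched in a few lines. This is a minimal illustration on flattened weight vectors, not the implementation of any particular merging library: the helper names `merge_updates` and `uniform_coeffs` and the toy numbers are assumptions made here, and the sketch leaves out the pre-processing that methods such as TIES-Merging and DARE apply to the updates before summation.

```python
from typing import List, Sequence


def merge_updates(
    base: Sequence[float],
    updates: Sequence[Sequence[float]],
    coeffs: Sequence[float],
) -> List[float]:
    """Return W_0 + sum_i s_i * dW_i for flattened weight vectors."""
    if len(updates) != len(coeffs):
        raise ValueError("need exactly one merging coefficient per update")
    merged = list(base)
    for delta, s in zip(updates, coeffs):
        if len(delta) != len(base):
            raise ValueError("every update must match the base shape")
        for j, d in enumerate(delta):
            merged[j] += s * d
    return merged


def uniform_coeffs(n: int) -> List[float]:
    """Uniform averaging: each of the n updates is scaled by 1/n."""
    return [1.0 / n] * n


# Toy example with made-up numbers: two updates on a 2-parameter "model".
base = [0.0, 1.0]
updates = [[8.0, 0.0], [0.0, -8.0]]
merged = merge_updates(base, updates, uniform_coeffs(2))
print(merged)  # [4.0, -3.0]
```

From the point of view of any single update, the merge deploys it at scale \(s_i\) rather than at \(s = 1\) and then adds the other updates on top, which is why a single scaling factor can serve as a cheap stand-in for merging.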

Loss shaping over the scaling factor \(s\): high utility at \(s = 1\), degradation at \(s \neq 1\)

[Figure 2] Trap² retains high utility at the authorized scale (\(s = 1\)) but collapses under off-nominal scaling (\(s \neq 1\)), unlike an unprotected adapter of comparable standalone accuracy.

The Trap² Objective

We parameterize the released update as a LoRA adapter \(\Delta W = BA\) on a frozen base \(W_0\), with low-rank factors \(B \in \mathbb{R}^{d_{\text{out}} \times r}\), \(A \in \mathbb{R}^{r \times d_{\text{in}}}\), and \(r \ll \min\{d_{\text{out}}, d_{\text{in}}\}\). Let \(\mathcal{L}_{\text{scaled}}(\Delta W; s)\) be the expected task loss when the update is deployed at scale \(s\), i.e., at weights \(W_0 + s \cdot \Delta W\). The nominal loss is the loss at the authorized scale, while the off-nominal loss averages the loss over off-nominal scales \(s\) sampled from a distribution \(\mathcal{S}\) whose support excludes a margin \(\delta\) around the nominal scale \(s = 1\):

\[ \operatorname{supp}(\mathcal{S}) \subseteq [s_{\min},\,1-\delta]\,\cup\,[1+\delta,\,s_{\max}]. \]

Concretely, the two losses are

\[ \mathcal{L}_{\text{nominal}}(\Delta W) := \mathcal{L}_{\text{scaled}}(\Delta W; 1), \] \[ \mathcal{L}_{\text{off}}(\Delta W) := \mathbb{E}_{s\sim\mathcal{S}}\,\big[\,w(s) \cdot \mathcal{L}_{\text{scaled}}(\Delta W; s)\,\big]. \]
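The text fixes only the support of \(\mathcal{S}\), not its shape. As a minimal sketch, the sampler below assumes a uniform distribution over the two allowed intervals. The function name `sample_off_nominal_scale` and the example values for \(s_{\min}\), \(s_{\max}\), and \(\delta\) are hypothetical choices made for illustration.

```python
import random


def sample_off_nominal_scale(
    s_min: float, s_max: float, delta: float, rng: random.Random
) -> float:
    """Draw s uniformly from [s_min, 1 - delta] U [1 + delta, s_max].

    The uniform shape is an assumption; only the support comes from the text.
    """
    lo_len = (1.0 - delta) - s_min   # length of the down-scaling interval
    hi_len = s_max - (1.0 + delta)   # length of the up-scaling interval
    if delta < 0 or lo_len < 0 or hi_len < 0 or lo_len + hi_len == 0:
        raise ValueError("support is empty or ill-defined")
    # Pick a point on the concatenation of both intervals, then map it back.
    u = rng.uniform(0.0, lo_len + hi_len)
    if u < lo_len:
        return s_min + u
    return (1.0 + delta) + (u - lo_len)


# Hypothetical settings: s_min = 0.1, s_max = 2.0, delta = 0.1.
rng = random.Random(0)
samples = [sample_off_nominal_scale(0.1, 2.0, 0.1, rng) for _ in range(1000)]
```

Averaging \(w(s) \cdot \mathcal{L}_{\text{scaled}}(\Delta W; s)\) over such draws gives a Monte Carlo estimate of \(\mathcal{L}_{\text{off}}\).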

Then, Trap² minimizes

\[ J(\Delta W) = \mathcal{L}_{\text{nominal}}(\Delta W) \;-\; \lambda \cdot \mathcal{L}_{\text{off}}(\Delta W), \]

where \(\lambda \geq 0\) trades off standalone utility against sensitivity to re-scaling, with \(\lambda = 0\) recovering standard fine-tuning. The first term preserves intended deployment performance at \(s = 1\), and the second increases loss under off-nominal re-scaling, which is precisely what makes the update unmergeable.
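To make the objective concrete, the sketch below evaluates \(J\) from a callable that returns \(\mathcal{L}_{\text{scaled}}(\Delta W; s)\) for a fixed update, replacing the expectation with an average over a given list of off-nominal scales. The name `trap_objective`, the toy loss curve, and the numeric values are assumptions for illustration; the default weighting \(w(s) = 1/s\) is the one stated in the text. It only computes the value of \(J\), not the training procedure.

```python
from typing import Callable, Sequence


def trap_objective(
    loss_scaled: Callable[[float], float],
    scales: Sequence[float],
    lam: float,
    weight: Callable[[float], float] = lambda s: 1.0 / s,
) -> float:
    """Estimate J = L_nominal - lam * L_off for a fixed update.

    loss_scaled(s) is the task loss at weights W_0 + s * dW.
    scales are off-nominal samples of s (assumed drawn from S).
    """
    if len(scales) == 0:
        raise ValueError("need at least one off-nominal scale")
    nominal = loss_scaled(1.0)
    off = sum(weight(s) * loss_scaled(s) for s in scales) / len(scales)
    return nominal - lam * off


def toy_loss(s: float) -> float:
    """Made-up loss curve with its minimum at the nominal scale s = 1."""
    return (s - 1.0) ** 2


j_protected = trap_objective(toy_loss, [0.5, 2.0], lam=0.1)
j_plain = trap_objective(toy_loss, [0.5, 2.0], lam=0.0)
print(j_plain)  # 0.0, i.e. the nominal loss alone
```

With `lam=0.0` the off-nominal term vanishes and the value is just the nominal loss, matching the remark that \(\lambda = 0\) recovers standard fine-tuning.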

[Why this breaks merging] Many merging operators effectively re-scale each constituent update (e.g., uniform averaging of \(N\) adapters scales each by \(1/N\)), so a merged model is pushed away from the nominal scale \(s = 1\). Trap² is trained to be brittle exactly there, so that merging fails. The weighting \(w(s)\) normalizes the training signal across scales. Since gradients shrink for down-scaled updates (\(s < 1\)), we set \(w(s) = 1/s\) by default.
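One way to see why gradients shrink for down-scaled updates is the chain rule. Write \(\ell(W)\) for the expected task loss at weights \(W\) (notation introduced here for illustration), so that \(\mathcal{L}_{\text{scaled}}(\Delta W; s) = \ell(W_0 + s \cdot \Delta W)\). Then

\[ \nabla_{\Delta W}\, \mathcal{L}_{\text{scaled}}(\Delta W; s) = s \cdot \nabla_{W}\, \ell(W)\,\Big|_{W = W_0 + s \cdot \Delta W}, \]

so the gradient with respect to the update carries an explicit factor \(s\). With \(w(s) = 1/s\) this factor cancels:

\[ \nabla_{\Delta W}\, \big[\, w(s) \cdot \mathcal{L}_{\text{scaled}}(\Delta W; s) \,\big] = \nabla_{W}\, \ell(W)\,\Big|_{W = W_0 + s \cdot \Delta W}. \]

This is a sketch of the intuition rather than a statement of the authors' derivation. As a concrete instance of the scales involved, uniform averaging of \(N = 8\) adapters deploys each one at \(s = 1/8 = 0.125\), where the unweighted gradient would be eight times smaller than at the same point without the factor \(s\).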

[Extension to full fine-tuning] So far \(\Delta W\) is a LoRA adapter, but the same objective applies to full-model fine-tuning. We define a step-dependent update \(\Delta W_t := W_t - W_0\). At each step \(t\) we evaluate the nominal loss at \(W_t\) and the off-nominal loss at the scaled model \(W_0 + s \cdot \Delta W_t\), then update \(W_t\). This gives one architecture-agnostic formulation that covers both adapter-only and full-checkpoint releases.
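The two evaluation points used in full fine-tuning can be sketched as follows. This is a minimal illustration on flattened weight vectors under assumptions made here: the helper names `scaled_checkpoint` and `trap_step_losses`, the toy quadratic loss, and the example numbers are hypothetical, and the gradient step that updates \(W_t\) is omitted.

```python
from typing import Callable, List, Sequence, Tuple


def scaled_checkpoint(
    w0: Sequence[float], wt: Sequence[float], s: float
) -> List[float]:
    """Return W_0 + s * (W_t - W_0), the model with its update re-scaled by s."""
    return [a + s * (b - a) for a, b in zip(w0, wt)]


def trap_step_losses(
    loss_fn: Callable[[Sequence[float]], float],
    w0: Sequence[float],
    wt: Sequence[float],
    scales: Sequence[float],
    weight: Callable[[float], float] = lambda s: 1.0 / s,
) -> Tuple[float, float]:
    """Return (nominal loss at W_t, weighted off-nominal loss averaged over scales)."""
    nominal = loss_fn(wt)
    off = sum(
        weight(s) * loss_fn(scaled_checkpoint(w0, wt, s)) for s in scales
    ) / len(scales)
    return nominal, off


# Toy quadratic loss whose minimum sits exactly at the current weights W_t.
w0 = [1.0, 2.0]
wt = [3.0, 6.0]


def toy_loss(w: Sequence[float]) -> float:
    return sum((wi - ti) ** 2 for wi, ti in zip(w, wt))


nominal, off = trap_step_losses(toy_loss, w0, wt, scales=[0.5, 2.0])
print(nominal, off)  # 0.0 10.0
```

At \(s = 1\) the scaled checkpoint coincides with \(W_t\), so the nominal term needs no separate code path, and the same two functions would apply unchanged if \(\Delta W_t\) were a materialized LoRA product \(BA\).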

Experimental Results

Across architectures and release formats, Trap² preserves strong standalone utility while consistently degrading performance under widely used merging operators. It even holds up under stronger threat models that have access to task-relevant data: both data-dependent mergers and data-driven recovery attacks. We refer readers to the paper for further experiments, including other LoRA variants, LLMs, and additional analyses.

LoRA Adapters

We first consider adapter-only releases. Trap² preserves standalone accuracy on par with naïve fine-tuning, yet accuracy degrades sharply once the protected adapter is merged, across diverse merging operators and spaces.

[Table 1] Per-task accuracy (%; ↑) on 8 vision benchmarks when each adapter is used standalone, across three CLIP backbones. Trap² matches naïve fine-tuning.

Backbone  Method        Cars  DTD   EuroSAT  GTSRB  MNIST  RESISC  Aircraft  SVHN  Average
ViT-B/32  Zero-Shot     59.4  44.1  45.5     32.3   48.0   60.3    19.0      31.4  42.5
ViT-B/32  Fine-Tuned    99.5  68.7  98.3     98.4   99.1   93.0    51.7      96.2  88.1
ViT-B/32  Trap² (Ours)  99.8  66.1  98.5     98.9   99.5   93.3    52.4      96.6  88.1
ViT-L/14  Zero-Shot     77.9  55.5  62.3     50.7   76.2   71.4    32.6      58.6  60.6
ViT-L/14  Fine-Tuned    99.8  76.7  98.8     98.5   99.6   95.6    72.6      97.8  92.4
ViT-L/14  Trap² (Ours)  99.9  78.8  98.5     99.6   99.4   96.2    80.1      97.2  93.7
ConvNeXt  Zero-Shot     89.6  59.6  54.5     48.9   54.9   67.1    27.5      34.3  54.6
ConvNeXt  Fine-Tuned    98.5  76.4  98.9     99.3   99.3   96.1    59.2      97.1  90.6
ConvNeXt  Trap² (Ours)  97.6  76.1  98.6     98.9   99.5   94.1    64.3      96.2  90.7

[Table 2] Averaged per-task accuracy (%; ↓) under 8-way LoRA merging: a protected target adapter is merged with seven unprotected ones, across various methods (TA, TIES, TIES+DARE, TSV, CART) and spaces (Full, KnOTS, Core). The merging coefficient is tuned on validation data, which is an optimistic setting for the adversarial merger. Trap² drives merged accuracy far below the unprotected baseline.

Protection    TA     TIES                TIES+DARE           TSV                 CART
Space         Full   Full  KnOTS  Core   Full  KnOTS  Core   Full  KnOTS  Core   Full  KnOTS  Core
Backbone: ViT-B/32 (Zero-shot: 42.5)
Unprotected   48.3   48.0  49.9   53.3   48.2  49.9   54.7   51.4  49.2   55.0   49.6  49.9   51.2
Trap² (Ours)  23.1   36.5  37.5   37.2   33.9  28.4   36.8   24.8  24.2   24.4   41.3  41.0   41.6
Backbone: ViT-L/14 (Zero-shot: 60.6)
Unprotected   62.9   67.9  68.9   68.8   68.0  69.9   68.8   72.4  66.7   74.6   64.4  65.0   66.2
Trap² (Ours)  33.6   53.2  47.7   54.5   50.6  43.3   54.8   42.0  37.4   39.4   58.6  58.4   59.5
Backbone: ConvNeXt (Zero-shot: 54.6)
Unprotected   49.2   59.6  60.2   63.6   59.6  60.3   63.6   65.1  60.4   65.8   60.1  60.0   61.2
Trap² (Ours)  14.7   32.9  15.2   17.2   15.6  13.1   25.3   14.1  15.9   15.3   47.0  45.9   47.2

Full Fine-Tuning

Beyond adapters, Trap² extends to full fine-tuning. On CLIP ViT-B/32 it preserves standalone utility comparable to naïve fine-tuning, while still inducing strong post-merge degradation under full-model merging.

[Table 3] Standalone per-task accuracy (%; ↑) on 8 vision benchmarks for CLIP ViT-B/32 under full fine-tuning. Trap² preserves standalone performance comparable to naïve fine-tuning.

Backbone  Method        Cars  DTD   EuroSAT  GTSRB  MNIST  RESISC  Aircraft  SVHN  Average
ViT-B/32  Zero-Shot     59.4  44.1  45.5     32.3   48.0   60.3    19.0      31.4  42.5
ViT-B/32  Fine-Tuned    99.8  65.0  98.6     97.6   99.2   93.2    51.8      95.6  87.6
ViT-B/32  Trap² (Ours)  99.8  62.8  98.4     99.1   99.2   93.3    49.6      96.7  87.4

[Table 4] Averaged per-task accuracy (%; ↓) on 8 vision benchmarks for CLIP ViT-B/32 under full-model merging. Trap² remains effective under full fine-tuning, strongly degrading the merged model relative to the unprotected baseline.

Protection    TA    TIES  TIES+DARE  TSV   CART
Backbone: ViT-B/32 (Zero-shot: 42.5)
Unprotected   50.0  51.0  51.7       62.4  61.0
Trap² (Ours)  28.5  38.0  37.9       38.2  42.1

Robustness against Data-aware Methods

[Data-dependent Merging] We test Trap² against two data-dependent mergers, RegMean and Chain-of-Merges (CoM), which use extra task-specific samples to compute aggregation weights. Trap² induces substantial post-merge degradation under both, so protection holds even when the merger leverages additional data signals. Notably, CoM often assigns comparable or even larger weight to the Trap²-protected adapter than to its unprotected counterpart, rather than down-weighting it to recover utility.

[Data-driven Recovery] As a stronger, white-box threat, the adversary uses a small task-relevant dataset to recover utility from the merged model. We evaluate ProDistill, which learns merging coefficients via teacher-student distillation, and post-merge supervised fine-tuning (SFT). Recovery from a Trap²-protected merge gains only a few points over the zero-shot reference, whereas the unprotected baseline gains far more, indicating that Trap² substantially impedes recovery even with in-domain samples.

[Table 5] Averaged accuracy (%; ↓) under stronger threat models on CLIP ViT-B/32. Trap² substantially degrades performance under both data-dependent merging (RegMean, CoM) and data-driven recovery (ProDistill, SFT), relative to the unprotected baseline.

Protection    Data-dependent Merging  Data-driven Recovery
Method        RegMean  CoM            ProDistill  SFT
Backbone: ViT-B/32 (Zero-shot: 42.5)
Unprotected   49.1     64.8           73.8        56.9
Trap² (Ours)  9.5      32.3           49.2        45.5

BibTeX

@inproceedings{jang2026making,
  title={Making Models Unmergeable via Scaling-Sensitive Loss Landscape},
  author={Minwoo Jang and Hoyoung Kim and Jabin Koo and Jungseul Ok},
  booktitle={Proceedings of the 43rd International Conference on Machine Learning},
  year={2026}
}