Making Models Unmergeable via Scaling-Sensitive Loss Landscape

Jang, Minwoo; Kim, Hoyoung; Koo, Jabin; Ok, Jungseul

Minwoo Jang¹, Hoyoung Kim², Jabin Koo³, Jungseul Ok^{1,3 †}

{¹Graduate School of AI, ³Department of CSE}, POSTECH · ²National AI Research Lab

^†Corresponding Author

ICML 2026 · Seoul, Republic of Korea

Unmergeability protection in model sharing and limitations of prior work

[Figure 1] Most prior defenses are post-hoc: they apply function-preserving transformations to the full model weights after fine-tuning, which ties them to Transformer symmetries and full-weight access. Trap² is a training-time defense: it injects unmergeability into the update during fine-tuning by making it scaling-sensitive, preserving utility at the authorized scale (\(s = 1\)) while degrading it under the off-nominal re-scaling (\(s \neq 1\)) that merging induces.

Abstract

The rise of model hubs has made it easier to access reusable model components, making model merging a practical tool for combining capabilities. Yet, this modularity also creates a governance gap: downstream users can recompose released weights into unauthorized mixtures that bypass safety alignment or licensing terms. Because existing defenses are largely post-hoc and architecture-specific, they provide inconsistent protection across diverse architectures and release formats in practice. To close this gap, we propose Trap² (Training-time Protection via Task-Robust Adversarial Perturbation), an architecture-agnostic protection framework that encodes protection into updates during fine-tuning, regardless of whether they are released as adapters or full models. Instead of relying on architecture-dependent approaches, Trap² uses weight re-scaling as a simple proxy for the merging process. It keeps released weights effective in standalone use, but degrades them under re-scaling that often arises in merging, undermining unauthorized recomposition.

Key Idea: Let's Shape the Loss Landscape Over Scaling

Common merging operators (such as Task Arithmetic, TIES-Merging, and DARE) aggregate released updates as \(W_0 + \sum_i s_i \cdot \Delta W_i\), where \(W_0\) is the frozen base model, each \(\Delta W_i\) is a released update (a LoRA adapter or a full-parameter difference), and \(s_i\) is its merging coefficient. Trap² exploits this by treating a single scaling factor \(s\) on the update as a lightweight proxy for merging: it solves an adversarial objective over update re-scaling, preserving utility at the authorized scale (\(s = 1\)) while inducing degradation under off-nominal scaling (\(s \neq 1\)), precisely the regime that merging pipelines introduce. The result is an update that is sharp and effective standalone, yet brittle the moment it is re-weighted and blended with third-party updates.
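The aggregation rule above can be sketched in a few lines. This is a minimal illustration assuming dense NumPy arrays of identical shape; the helper name `merge_updates` and the toy matrices are hypothetical, and real operators such as TIES-Merging or DARE add trimming, sign election, or random dropping on top of this linear form.

```python
import numpy as np

def merge_updates(base, updates, coeffs):
    """Return W_0 + sum_i s_i * dW_i for same-shaped arrays."""
    merged = base.copy()
    for s_i, delta in zip(coeffs, updates):
        merged = merged + s_i * delta
    return merged

rng = np.random.default_rng(0)
W0 = rng.normal(size=(4, 4))                           # frozen base (toy)
deltas = [rng.normal(size=(4, 4)) for _ in range(3)]   # released updates (toy)

# Uniform averaging of N updates assigns each one the coefficient 1/N,
# so every constituent is deployed away from its authorized scale s = 1.
n_updates = len(deltas)
uniform = merge_updates(W0, deltas, [1.0 / n_updates] * n_updates)

# Standalone use of the first update corresponds to a single term with s = 1.
standalone = merge_updates(W0, deltas[:1], [1.0])
```

In this sketch each update enters the merged model at scale 1/3 rather than 1, which is the off-nominal regime that the single scaling factor \(s\) is meant to stand in for.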

Loss shaping over the scaling factor \(s\): high utility at \(s = 1\), degradation at \(s \neq 1\)

[Figure 2] Trap² retains high utility at the authorized scale (\(s = 1\)) but collapses under off-nominal scaling (\(s \neq 1\)), unlike an unprotected adapter of comparable standalone accuracy.

The Trap² Objective

We parameterize the released update as a LoRA adapter \(\Delta W = BA\) on a frozen base \(W_0\), with low-rank factors \(B \in \mathbb{R}^{d_{\text{out}} \times r}\), \(A \in \mathbb{R}^{r \times d_{\text{in}}}\), and \(r \ll \min\{d_{\text{out}}, d_{\text{in}}\}\). Let \(\mathcal{L}_{\text{scaled}}(\Delta W; s)\) be the expected task loss when the update is deployed at scale \(s\), i.e., at weights \(W_0 + s \cdot \Delta W\). The nominal loss is the loss at the authorized scale, while the off-nominal loss averages the loss over off-nominal scales \(s\) sampled from a distribution \(\mathcal{S}\) whose support excludes a margin \(\delta\) around the nominal scale \(s = 1\):

\[ \operatorname{supp}(\mathcal{S}) \subseteq [s_{\min},\,1-\delta]\,\cup\,[1+\delta,\,s_{\max}]. \]

Concretely, the two losses are

\[ \mathcal{L}_{\text{nominal}}(\Delta W) := \mathcal{L}_{\text{scaled}}(\Delta W; 1), \]

\[ \mathcal{L}_{\text{off}}(\Delta W) := \mathbb{E}_{s\sim\mathcal{S}}\,\big[\,w(s) \cdot \mathcal{L}_{\text{scaled}}(\Delta W; s)\,\big]. \]
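To make the support constraint on \(\mathcal{S}\) concrete, here is a small sampler sketch. The source only constrains the support, so the uniform density over the two intervals is an assumption, and the default values of `s_min`, `s_max`, and `delta` are hypothetical placeholders rather than the paper's settings.

```python
import numpy as np

def sample_off_nominal(rng, n, s_min=0.1, s_max=2.0, delta=0.2):
    """Draw n scales uniformly from [s_min, 1 - delta] U [1 + delta, s_max].

    The uniform density and the default bounds are illustrative assumptions.
    """
    len_lo = (1.0 - delta) - s_min      # length of the down-scaling interval
    len_hi = s_max - (1.0 + delta)      # length of the up-scaling interval
    if len_lo <= 0 or len_hi <= 0:
        raise ValueError("both intervals must have positive length")
    # Sample a position on the concatenated intervals, then map it back,
    # which picks each interval with probability proportional to its length.
    u = rng.uniform(0.0, len_lo + len_hi, size=n)
    return np.where(u < len_lo, s_min + u, (1.0 + delta) + (u - len_lo))

scales = sample_off_nominal(np.random.default_rng(0), 1000)
```

Every draw lands outside the margin around \(s = 1\), so a Monte Carlo average over such samples only ever evaluates the loss at off-nominal scales.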

Then, Trap² minimizes

\[ J(\Delta W) = \mathcal{L}_{\text{nominal}}(\Delta W) \;-\; \lambda \cdot \mathcal{L}_{\text{off}}(\Delta W), \]

where \(\lambda \geq 0\) trades off standalone utility against sensitivity to re-scaling, with \(\lambda = 0\) recovering standard fine-tuning. The first term preserves intended deployment performance at \(s = 1\), and the second increases loss under off-nominal re-scaling, which is precisely what makes the update unmergeable.
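The objective can be evaluated on a toy problem as follows. This is a sketch under stated assumptions, not the authors' implementation: a linear model with squared error stands in for the task loss, the expectation over \(\mathcal{S}\) is replaced by a mean over a fixed list of hypothetical scales, and the weighting uses the default \(w(s) = 1/s\) reported for Trap². The names `scaled_loss` and `trap_objective` are illustrative.

```python
import numpy as np

def scaled_loss(W0, B, A, s, X, Y):
    """Squared-error stand-in for the task loss at weights W0 + s * (B @ A)."""
    W = W0 + s * (B @ A)
    return float(np.mean((X @ W.T - Y) ** 2))

def trap_objective(W0, B, A, X, Y, scales, lam, weight=lambda s: 1.0 / s):
    """Return (J, nominal, off) with J = nominal - lam * off."""
    nominal = scaled_loss(W0, B, A, 1.0, X, Y)
    off = float(np.mean([weight(s) * scaled_loss(W0, B, A, s, X, Y)
                         for s in scales]))
    return nominal - lam * off, nominal, off

rng = np.random.default_rng(0)
d_in, d_out, r, n = 6, 4, 2, 32
W0 = rng.normal(size=(d_out, d_in))   # frozen base
B = rng.normal(size=(d_out, r))       # LoRA factor B
A = rng.normal(size=(r, d_in))        # LoRA factor A
X = rng.normal(size=(n, d_in))
Y = X @ (W0 + B @ A).T                # targets fit exactly at s = 1
scales = [0.25, 0.5, 1.5, 2.0]        # hypothetical off-nominal scales

J0, nominal, off = trap_objective(W0, B, A, X, Y, scales, lam=0.0)
J1, _, _ = trap_objective(W0, B, A, X, Y, scales, lam=0.5)
```

With `lam=0.0` the objective reduces to the nominal loss, and any positive `lam` rewards a larger off-nominal loss. A training loop would minimize this quantity with respect to `B` and `A` while keeping `W0` fixed.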

[Why this breaks merging] Many merging operators effectively re-scale each constituent update (e.g., uniform averaging of \(N\) adapters scales each by \(1/N\)), so a merged model is pushed away from the nominal scale \(s = 1\). Trap² is trained to be brittle exactly there, so that merging fails. The weighting \(w(s)\) normalizes the training signal across scales. Since gradients shrink for down-scaled updates (\(s < 1\)), we set \(w(s) = 1/s\) by default.
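One way to read the default weighting is through the chain rule. The following is a sketch that assumes \(\mathcal{L}(W)\) denotes the task loss as a function of the deployed weights \(W\), a symbol introduced here only for illustration:

\[ \nabla_{\Delta W}\, \mathcal{L}_{\text{scaled}}(\Delta W; s) \;=\; s \cdot \nabla_{W}\, \mathcal{L}(W)\,\Big|_{W = W_0 + s \cdot \Delta W}. \]

The gradient that reaches the update therefore carries an explicit factor \(s\), which shrinks the signal whenever \(s < 1\). Weighting the loss by \(w(s) = 1/s\) cancels that factor:

\[ \nabla_{\Delta W}\,\big[\,w(s) \cdot \mathcal{L}_{\text{scaled}}(\Delta W; s)\,\big] \;=\; \nabla_{W}\, \mathcal{L}(W)\,\Big|_{W = W_0 + s \cdot \Delta W}. \]

For example, under uniform averaging of \(N = 8\) adapters each update sits at \(s = 1/8\), so without the weighting its gradient contribution would be scaled down by a factor of eight relative to the gradient of \(\mathcal{L}\) at the deployed weights.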

[Extension to full fine-tuning] So far \(\Delta W\) is a LoRA adapter, but the same objective applies to full-model fine-tuning. We define a step-dependent update \(\Delta W_t := W_t - W_0\). At each step \(t\) we evaluate the nominal loss at \(W_t\) and the off-nominal loss at the scaled model \(W_0 + s \cdot \Delta W_t\), then update \(W_t\). This gives one architecture-agnostic formulation that covers both adapter-only and full-checkpoint releases.
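The full fine-tuning variant can be sketched as a single gradient step on a toy linear model. Everything here is an illustrative assumption rather than the released training code: squared error replaces the task loss, gradients are written analytically instead of using autodiff, the scales are a fixed hypothetical list, and the names `mse_and_grad` and `full_ft_step` are made up for this example.

```python
import numpy as np

def mse_and_grad(W, X, Y):
    """Mean squared error of X @ W.T against Y, and its gradient w.r.t. W."""
    resid = X @ W.T - Y                       # shape (n, d_out)
    loss = float(np.mean(resid ** 2))
    grad = 2.0 * resid.T @ X / resid.size     # shape (d_out, d_in)
    return loss, grad

def full_ft_step(W_t, W0, X, Y, scales, lam=0.1, lr=0.01):
    """One descent step on J = L(W_t) - lam * mean_s[(1/s) * L(W0 + s * dW_t)]."""
    delta_t = W_t - W0                        # step-dependent update dW_t
    _, grad = mse_and_grad(W_t, X, Y)         # nominal term, evaluated at s = 1
    for s in scales:
        _, g_s = mse_and_grad(W0 + s * delta_t, X, Y)
        # The chain rule contributes a factor s; the weight w(s) = 1/s cancels it.
        grad = grad - lam * (1.0 / s) * s * g_s / len(scales)
    return W_t - lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
W0 = rng.normal(size=(3, 5))                  # frozen reference weights
Y = X @ rng.normal(size=(3, 5)).T             # toy regression targets
W_t = W0 + 0.1 * rng.normal(size=(3, 5))      # current fine-tuned weights
scales = [0.3, 0.6, 1.5]                      # hypothetical off-nominal scales

W_next = full_ft_step(W_t, W0, X, Y, scales)
```

Only `W_t` is updated, while `W0` serves as the fixed reference from which the scaled model is rebuilt at every step. Setting `lam=0.0` reduces the step to ordinary gradient descent on the nominal loss.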

Experimental Results

Across architectures and release formats, Trap² preserves strong standalone utility while consistently degrading performance under widely used merging operators. It even holds up under stronger threat models that have access to task-relevant data: both data-dependent mergers and data-driven recovery attacks. We refer readers to the paper for further experiments, including other LoRA variants, LLMs, and additional analyses.

LoRA Adapters

We first consider adapter-only releases. A Trap²-protected adapter preserves standalone accuracy on par with naïve fine-tuning, yet accuracy degrades sharply once that adapter is merged, across diverse merging operators and spaces.

[Table 1] Per-task accuracy (%; ↑) on 8 vision benchmarks when each adapter is used standalone, across three CLIP backbones. Trap² matches or exceeds naïve fine-tuning in average accuracy on every backbone (88.1 vs. 88.1, 93.7 vs. 92.4, and 90.7 vs. 90.6).

Backbone	Method	Cars	DTD	EuroSAT	GTSRB	MNIST	RESISC	Aircraft	SVHN	Average
ViT-B/32	Zero-Shot	59.4	44.1	45.5	32.3	48.0	60.3	19.0	31.4	42.5
ViT-B/32	Fine-Tuned	99.5	68.7	98.3	98.4	99.1	93.0	51.7	96.2	88.1
ViT-B/32	Trap² (Ours)	99.8	66.1	98.5	98.9	99.5	93.3	52.4	96.6	88.1
ViT-L/14	Zero-Shot	77.9	55.5	62.3	50.7	76.2	71.4	32.6	58.6	60.6
ViT-L/14	Fine-Tuned	99.8	76.7	98.8	98.5	99.6	95.6	72.6	97.8	92.4
ViT-L/14	Trap² (Ours)	99.9	78.8	98.5	99.6	99.4	96.2	80.1	97.2	93.7
ConvNeXt	Zero-Shot	89.6	59.6	54.5	48.9	54.9	67.1	27.5	34.3	54.6
ConvNeXt	Fine-Tuned	98.5	76.4	98.9	99.3	99.3	96.1	59.2	97.1	90.6
ConvNeXt	Trap² (Ours)	97.6	76.1	98.6	98.9	99.5	94.1	64.3	96.2	90.7

[Table 2] Averaged per-task accuracy (%; ↓) under 8-way LoRA merging: a protected target adapter is merged with seven unprotected ones, across various methods (TA, TIES, TIES+DARE, TSV, CART) and spaces (Full, KnOTS, Core). The merging coefficient is tuned on validation data, a setting that is optimistic for the adversarial merger. Trap² drives merged accuracy below both the unprotected baseline and the zero-shot reference in every setting.

Protection	TA (Full)	TIES (Full)	TIES (KnOTS)	TIES (Core)	TIES+DARE (Full)	TIES+DARE (KnOTS)	TIES+DARE (Core)	TSV (Full)	TSV (KnOTS)	TSV (Core)	CART (Full)	CART (KnOTS)	CART (Core)
Backbone: ViT-B/32 (Zero-shot: 42.5)
Unprotected	48.3	48.0	49.9	53.3	48.2	49.9	54.7	51.4	49.2	55.0	49.6	49.9	51.2
Trap² (Ours)	23.1	36.5	37.5	37.2	33.9	28.4	36.8	24.8	24.2	24.4	41.3	41.0	41.6
Backbone: ViT-L/14 (Zero-shot: 60.6)
Unprotected	62.9	67.9	68.9	68.8	68.0	69.9	68.8	72.4	66.7	74.6	64.4	65.0	66.2
Trap² (Ours)	33.6	53.2	47.7	54.5	50.6	43.3	54.8	42.0	37.4	39.4	58.6	58.4	59.5
Backbone: ConvNeXt (Zero-shot: 54.6)
Unprotected	49.2	59.6	60.2	63.6	59.6	60.3	63.6	65.1	60.4	65.8	60.1	60.0	61.2
Trap² (Ours)	14.7	32.9	15.2	17.2	15.6	13.1	25.3	14.1	15.9	15.3	47.0	45.9	47.2

Full Fine-Tuning

Beyond adapters, Trap² extends to full fine-tuning. On CLIP ViT-B/32 it preserves standalone utility comparable to naïve fine-tuning (87.4 vs. 87.6 average accuracy), while still inducing strong post-merge degradation under full-model merging.

[Table 3] Standalone per-task accuracy (%; ↑) on 8 vision benchmarks for CLIP ViT-B/32 under full fine-tuning. Trap² preserves standalone performance comparable to naïve fine-tuning.

Backbone	Method	Cars	DTD	EuroSAT	GTSRB	MNIST	RESISC	Aircraft	SVHN	Average
ViT-B/32	Zero-Shot	59.4	44.1	45.5	32.3	48.0	60.3	19.0	31.4	42.5
ViT-B/32	Fine-Tuned	99.8	65.0	98.6	97.6	99.2	93.2	51.8	95.6	87.6
ViT-B/32	Trap² (Ours)	99.8	62.8	98.4	99.1	99.2	93.3	49.6	96.7	87.4

[Table 4] Averaged per-task accuracy (%; ↓) on 8 vision benchmarks for CLIP ViT-B/32 under full-model merging. Trap² remains effective under full fine-tuning, strongly degrading the merged model relative to the unprotected baseline.

Protection	TA	TIES	TIES+DARE	TSV	CART
Backbone: ViT-B/32 (Zero-shot: 42.5)
Unprotected	50.0	51.0	51.7	62.4	61.0
Trap² (Ours)	28.5	38.0	37.9	38.2	42.1

Robustness against Data-aware Methods

[Data-dependent Merging] We test Trap² against two data-dependent mergers, RegMean and Chain-of-Merges (CoM), which use extra task-specific samples to compute aggregation weights. Trap² induces substantial post-merge degradation under both (average accuracy falls from 49.1 to 9.5 under RegMean and from 64.8 to 32.3 under CoM), so protection holds even when the merger leverages additional data signals. Notably, CoM often assigns comparable or even larger weight to the Trap²-protected adapter than to its unprotected counterpart, rather than down-weighting it to recover utility.

[Data-driven Recovery] As a stronger, white-box threat, the adversary uses a small task-relevant dataset to recover utility from the merged model. We evaluate ProDistill, which learns merging coefficients via teacher-student distillation, and post-merge supervised fine-tuning (SFT). Recovery from a Trap²-protected merge gains only a few points over the zero-shot reference of 42.5 (49.2 with ProDistill and 45.5 with SFT), whereas the unprotected baseline gains far more (73.8 and 56.9), indicating that Trap² substantially impedes recovery even with in-domain samples.

[Table 5] Averaged accuracy (%; ↓) under stronger threat models on CLIP ViT-B/32. Trap² substantially degrades performance under both data-dependent merging (RegMean, CoM) and data-driven recovery (ProDistill, SFT), relative to the unprotected baseline.

Protection	RegMean (data-dependent merging)	CoM (data-dependent merging)	ProDistill (data-driven recovery)	SFT (data-driven recovery)
Backbone: ViT-B/32 (Zero-shot: 42.5)
Unprotected	49.1	64.8	73.8	56.9
Trap² (Ours)	9.5	32.3	49.2	45.5

BibTeX

@inproceedings{jang2026making,
  title={Making Models Unmergeable via Scaling-Sensitive Loss Landscape},
  author={Minwoo Jang and Hoyoung Kim and Jabin Koo and Jungseul Ok},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026}
}
