MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression


Ofir Gordon (Sony Semiconductor Israel), Ariel Lapid (Sony Semiconductor Israel), Elad Cohen (Sony Semiconductor Israel), Yarden Yagil (Sony Semiconductor Israel), Arnon Netzer (Sony Semiconductor Israel), Hai Victor Habi (Sony Semiconductor Israel)
The 36th British Machine Vision Conference

Abstract

Deploying transformer-based neural networks on resource-constrained edge devices presents a significant challenge, commonly addressed with techniques such as low-rank approximation and mixed-precision quantization. In this work, we introduce Mixed Low-Rank and Quantization (MLoRQ), a novel method that integrates both techniques. MLoRQ employs a two-stage optimization process to determine optimal bit-width and rank assignments for each layer while adhering to a predefined memory constraint. This process includes: (i) an intra-layer optimization that identifies potentially optimal compression solutions among all combinations of low-rank approximation and quantization; and (ii) an inter-layer optimization that assigns a bit-width and rank to each layer while ensuring the memory constraint is met. An optional final step applies a sequential optimization process, using a modified adaptive rounding technique, to mitigate the errors induced by joint low-rank approximation and quantization. The method is compatible with, and can be seamlessly integrated into, most existing quantization algorithms. Evaluated on Vision Transformers for image classification, object detection, and instance segmentation, MLoRQ achieves state-of-the-art results with up to 15% performance improvement.
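The two-stage search described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and every name in it (`Candidate`, `quantize`, `pareto_front`, `layer_candidates`, `assign`) is hypothetical. It assumes a symmetric per-tensor uniform quantizer, a truncated-SVD factorization W ≈ A·B whose memory is r·(m+n)·b bits versus m·n·b bits for the full matrix, the relative Frobenius reconstruction error as a per-layer error proxy, and errors that simply add across layers. The intra-layer step keeps only the Pareto-optimal (memory, error) candidates per layer; the inter-layer step is stood in for by an exact dynamic program over Pareto-pruned partial assignments, which may differ from the solver used in the paper.

```python
# Hypothetical illustration of a two-stage rank/bit-width search; NOT the MLoRQ reference code.
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Sequence, Tuple

import numpy as np


@dataclass(frozen=True)
class Candidate:
    """One (rank, bit-width) option for a single layer."""
    rank: Optional[int]  # None = keep the full matrix (no low-rank factorization)
    bits: int
    memory_bits: int
    error: float  # relative Frobenius reconstruction error (assumed proxy)


def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-tensor uniform quantization (an assumed, simple quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    if scale == 0:
        return x.copy()
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale


def pareto_front(cands: Sequence[Candidate]) -> List[Candidate]:
    """Keep only candidates that no cheaper (or equally cheap) candidate beats on error."""
    front, best_err = [], float("inf")
    for c in sorted(cands, key=lambda x: (x.memory_bits, x.error)):
        if c.error < best_err:
            front.append(c)
            best_err = c.error
    return front


def layer_candidates(w: np.ndarray, ranks, bit_widths) -> List[Candidate]:
    """Intra-layer step: score every rank/bit-width pair, return the Pareto front."""
    m, n = w.shape
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    cands = []
    for r, b in product(ranks, bit_widths):
        if r is None:
            w_hat, mem = quantize(w, b), m * n * b
        else:
            # W ~ A @ B with A = U_r diag(s_r) (m x r) and B = V_r^T (r x n)
            a, bm = u[:, :r] * s[:r], vt[:r, :]
            w_hat, mem = quantize(a, b) @ quantize(bm, b), r * (m + n) * b
        err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
        cands.append(Candidate(r, b, mem, float(err)))
    return pareto_front(cands)


def assign(fronts, budget_bits: int) -> Tuple[int, float, Tuple[Candidate, ...]]:
    """Inter-layer step: pick one candidate per layer, minimizing summed error
    under a total memory budget (exact DP over Pareto-pruned partial solutions)."""
    states = [(0, 0.0, ())]
    for front in fronts:
        grown = sorted(
            ((mem + c.memory_bits, e + c.error, ch + (c,))
             for mem, e, ch in states for c in front
             if mem + c.memory_bits <= budget_bits),
            key=lambda st: (st[0], st[1]),
        )
        states, best = [], float("inf")
        for st in grown:
            if st[1] < best:  # drop partial solutions dominated on (memory, error)
                states.append(st)
                best = st[1]
    if not states:
        raise ValueError("memory budget is too small for any assignment")
    return min(states, key=lambda st: st[1])


# Usage on two random "layers" with an average budget of 4 bits per weight.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 64)), rng.standard_normal((64, 128))]
fronts = [layer_candidates(w, [None, 8, 16, 32], [2, 4, 8]) for w in weights]
budget = sum(w.size for w in weights) * 4
mem, err, choice = assign(fronts, budget)
for c in choice:
    print(f"rank={c.rank}, bits={c.bits}, memory={c.memory_bits} bits, error={c.error:.3f}")
```

Pruning partial assignments that are dominated on both memory and error keeps this dynamic program exact, because under the stated assumption both quantities add up across layers. The actual error measure and inter-layer solver in MLoRQ may differ from these stand-ins.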

Citation

@inproceedings{Gordon_2025_BMVC,
author    = {Ofir Gordon and Ariel Lapid and Elad Cohen and Yarden Yagil and Arnon Netzer and Hai Victor Habi},
title     = {MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_668/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
