AQ2PNN: Enabling Two-party Privacy-Preserving Deep Neural Network Inference with Adaptive Quantization

Yukui Luo, Nuo Xu, Hongwu Peng, Chenghong Wang, Shijin Duan, Kaleel Mahmood, Wujie Wen, Caiwen Ding, Xiaolin Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The growing prevalence of Machine Learning as a Service (MLaaS) enables a wide range of applications but simultaneously raises numerous security and privacy concerns. A key issue is the potential privacy exposure of the involved parties, such as the customer's input data and the vendor's model. Consequently, two-party computation (2PC) has emerged as a promising solution to safeguard the privacy of different parties during deep neural network (DNN) inference. However, state-of-the-art (SOTA) 2PC-DNN techniques are tailored explicitly to traditional instruction set architecture (ISA) systems such as CPUs and CPU+GPU platforms. This reliance on ISA systems significantly constrains their energy efficiency, as these architectures typically employ 32- or 64-bit instruction sets. In contrast, the possibility of harnessing dynamic and adaptive quantization to build high-performance 2PC-DNNs remains largely unexplored, owing to the lack of compatible algorithms and hardware accelerators. To mitigate the bottlenecks of SOTA solutions and fill these research gaps, this work investigates the construction of 2PC-DNNs on field-programmable gate arrays (FPGAs). We introduce AQ2PNN, an end-to-end framework that employs adaptive quantization schemes to build high-performance 2PC-DNNs on FPGAs. On the algorithm side, AQ2PNN introduces a novel 2PC-ReLU method to replace Yao's Garbled Circuits (GC). On the hardware side, AQ2PNN provides an extensive set of building blocks for linear and non-linear operators, together with a specialized Oblivious Transfer (OT) module for secure data exchange. These algorithm-hardware co-designed modules fully exploit the fine-grained reconfigurability of FPGAs to adapt the data bit-width of different DNN layers in the ciphertext domain, thereby reducing communication overhead between parties without compromising DNN performance, such as accuracy. We thoroughly evaluate AQ2PNN on widely adopted DNN architectures, including ResNet18, ResNet50, and VGG16, all trained on ImageNet and quantized. Experimental results demonstrate that AQ2PNN outperforms SOTA solutions, achieving significantly reduced communication overhead, 26.3× higher energy efficiency, and comparable or even superior throughput and accuracy.
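The 2PC setting described above typically builds on additive secret sharing over a power-of-two ring, where the per-layer bit-width determines how many bits each shared value costs to exchange. The following minimal Python sketch is an illustration of that general principle, not AQ2PNN's implementation: it assumes the quantized weights are known to both parties for simplicity, whereas a real 2PC-DNN protocol keeps them private (e.g., via OT or homomorphic encryption). It shows why linear layers commute with additive sharing and how a smaller per-layer ring shrinks the shares.

```python
import numpy as np

def share(x, bits, rng):
    """Split an integer tensor x into two additive shares in the ring Z_{2^bits}."""
    mod = 1 << bits
    s0 = rng.integers(0, mod, size=x.shape, dtype=np.uint64)
    s1 = (x.astype(np.uint64) - s0) % mod  # uint64 wraparound is a multiple of 2^bits
    return s0, s1

def reconstruct(s0, s1, bits):
    """Recombine two additive shares modulo 2^bits."""
    return (s0 + s1) % (1 << bits)

# Hypothetical per-layer ring sizes: a smaller ring means fewer bits per
# shared value, i.e., less traffic between the two parties for that layer.
layer_bits = {"conv1": 8, "fc": 12}

rng = np.random.default_rng(0)
x = rng.integers(0, 16, size=(4,), dtype=np.uint64)    # quantized activation
W = rng.integers(0, 16, size=(3, 4), dtype=np.uint64)  # quantized weights (public here)

b = layer_bits["fc"]
x0, x1 = share(x, b, rng)

# Linear layers commute with additive sharing: each party multiplies its own
# share locally, and the local results recombine to W @ x (mod 2^b).
y0 = (W @ x0) % (1 << b)
y1 = (W @ x1) % (1 << b)

assert np.array_equal(reconstruct(y0, y1, b), (W @ x) % (1 << b))
```

Non-linear operators such as ReLU do not commute with additive sharing, which is why 2PC protocols resort to GC or, as in this paper, a dedicated 2PC-ReLU; that inter-party exchange is exactly where an adaptive, smaller bit-width cuts communication.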

Original language: English (US)
Title of host publication: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023
Publisher: Association for Computing Machinery, Inc
Pages: 628-640
Number of pages: 13
ISBN (Electronic): 9798400703294
State: Published - Oct 28 2023
Externally published: Yes
Event: 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023 - Toronto, Canada
Duration: Oct 28 2023 - Nov 1 2023

Publication series

Name: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023

Conference

Conference: 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023
Country/Territory: Canada
City: Toronto
Period: 10/28/23 - 11/1/23

Bibliographical note

Publisher Copyright:
© 2023 ACM.

Keywords

  • Deep learning
  • FPGA
  • Privacy-preserving machine learning
  • Quantization
  • Two-party computing
