AdaEmb-Encoder: Adaptive Embedding Spatial Encoder-Based Deduplication for Backing up Classifier Training Data

Yaobin Qin, David J. Lilja

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

The advent of the AI era has made it increasingly important to have an efficient backup system that protects training data from loss. A backup of the training data also makes it possible to update or retrain the learned model as more data are collected. However, copying all daily collected training data in full to backup storage incurs a huge backup overhead, especially because the data typically contain highly redundant information that contributes nothing to model learning. Deduplication is a common technique for reducing data redundancy in modern backup systems, but existing deduplication methods are not suited to training data. Hence, this paper proposes a novel deduplication strategy for the training data used to train a deep neural network classifier. Experimental results showed that the proposed deduplication strategy reduced backup storage space by 93% with only a 1.3% loss of classification accuracy.

Original language: English (US)
Title of host publication: 2020 IEEE 39th International Performance Computing and Communications Conference, IPCCC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728198293
State: Published - Nov 6 2020
Event: 39th IEEE International Performance Computing and Communications Conference, IPCCC 2020 - Austin, United States
Duration: Nov 6 2020 to Nov 8 2020

Publication series

Name: 2020 IEEE 39th International Performance Computing and Communications Conference, IPCCC 2020

Conference

Conference: 39th IEEE International Performance Computing and Communications Conference, IPCCC 2020
Country: United States
City: Austin
Period: 11/6/20 to 11/8/20

Bibliographical note

Funding Information:
This work was supported in part by the Center for Research in Intelligent Storage (CRIS), which is supported by National Science Foundation grant no. IIP-1439622 and member companies. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

Publisher Copyright:
© 2020 IEEE.

Keywords

  • Backup systems
  • Deduplication
  • Deep learning
  • Training data
