Summarization - Compressing data into an informative representation

Varun Chandola, Vipin Kumar

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

19 Scopus citations

Abstract

In this paper, we formulate the problem of summarization of a dataset of transactions with categorical attributes as an optimization problem involving two objective functions - compaction gain and information loss. We propose metrics to characterize the output of any summarization algorithm. We investigate two approaches to address this problem. The first approach is an adaptation of clustering and the second approach makes use of frequent itemsets from the association analysis domain. We illustrate one application of summarization in the field of network data where we show how our technique can be effectively used to summarize network traffic into a compact but meaningful representation. Specifically, we evaluate our proposed algorithms on the 1998 DARPA Off-line Intrusion Detection Evaluation data and network data generated by SKAION Corp for the ARDA information assurance program.
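The trade-off between the two objectives named in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so the definitions below are assumptions made for illustration only: compaction gain is taken as the ratio of original transactions to summary entries, and information loss as the fraction of attribute values replaced by a wildcard. The grouping is hand-picked rather than produced by the clustering or frequent-itemset approaches, and all names (`summarize_group`, `compaction_gain`, `information_loss`) are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# A transaction is a tuple of categorical attribute values,
# e.g. (source IP, destination port, protocol).
Transaction = Tuple[str, ...]
# A summary entry keeps a value where the group agrees, else None (wildcard).
Summary = Tuple[Optional[str], ...]


def summarize_group(group: Sequence[Transaction]) -> Summary:
    """Collapse a non-empty group into one entry, keeping only shared values."""
    first = group[0]
    return tuple(
        value if all(t[i] == value for t in group) else None
        for i, value in enumerate(first)
    )


def compaction_gain(n_transactions: int, n_summaries: int) -> float:
    """ASSUMED definition: original transactions per summary entry."""
    return n_transactions / n_summaries


def information_loss(
    groups: Sequence[Sequence[Transaction]], summaries: Sequence[Summary]
) -> float:
    """ASSUMED definition: fraction of attribute values hidden by wildcards."""
    lost = 0
    total = 0
    for group, summary in zip(groups, summaries):
        wildcards = sum(1 for value in summary if value is None)
        lost += wildcards * len(group)
        total += len(summary) * len(group)
    return lost / total


# Hand-picked grouping of three toy network flows (illustrative data only).
groups = [
    [("10.0.0.1", "80", "tcp"), ("10.0.0.2", "80", "tcp")],
    [("10.0.0.3", "53", "udp")],
]
summaries = [summarize_group(g) for g in groups]
n_transactions = sum(len(g) for g in groups)

gain = compaction_gain(n_transactions, len(summaries))  # 3 / 2 = 1.5
loss = information_loss(groups, summaries)              # 2 of 9 values hidden
```

Merging all three flows into a single entry would raise the gain to 3.0 but hide every attribute value, which is the tension the optimization formulation is meant to capture.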

Original language: English (US)
Title of host publication: Proceedings - Fifth IEEE International Conference on Data Mining, ICDM 2005
Pages: 98-105
Number of pages: 8
DOI: https://doi.org/10.1109/ICDM.2005.137
State: Published - 2005
Event: 5th IEEE International Conference on Data Mining, ICDM 2005 - Houston, TX, United States
Duration: Nov 27 2005 to Nov 30 2005

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 5th IEEE International Conference on Data Mining, ICDM 2005
Country: United States
City: Houston, TX
Period: 11/27/05 to 11/30/05

  • Cite this

    Chandola, V., & Kumar, V. (2005). Summarization - Compressing data into an informative representation. In Proceedings - Fifth IEEE International Conference on Data Mining, ICDM 2005 (pp. 98-105). [1565667] (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDM.2005.137