TY - GEN
T1 - Recovering information from summary data
AU - Faloutsos, Christos
AU - Jagadish, H. V.
AU - Sidiropoulos, N. D.
PY - 1997
Y1 - 1997
N2 - Data is often stored in summarized form, as a histogram of aggregates (COUNTs, SUMs, or AVeraGes) over specified ranges. We study how to estimate the original detail data from the stored summary. We formulate this task as an inverse problem, specifying a well-defined cost function that has to be optimized under constraints. We show that our formulation includes the uniformity and independence assumptions as a special case, and that it can achieve better reconstruction results if we maximize the smoothness as opposed to the uniformity. In our experiments on real and synthetic datasets, the proposed method almost consistently outperforms its competitor, improving the root-mean-square error by up to 20 per cent for stock price data, and up to 90 per cent for smoother data sets. Finally, we show how to apply this theory to a variety of database problems that involve partial information, such as OLAP, data warehousing and histograms in query optimization.
AB - Data is often stored in summarized form, as a histogram of aggregates (COUNTs, SUMs, or AVeraGes) over specified ranges. We study how to estimate the original detail data from the stored summary. We formulate this task as an inverse problem, specifying a well-defined cost function that has to be optimized under constraints. We show that our formulation includes the uniformity and independence assumptions as a special case, and that it can achieve better reconstruction results if we maximize the smoothness as opposed to the uniformity. In our experiments on real and synthetic datasets, the proposed method almost consistently outperforms its competitor, improving the root-mean-square error by up to 20 per cent for stock price data, and up to 90 per cent for smoother data sets. Finally, we show how to apply this theory to a variety of database problems that involve partial information, such as OLAP, data warehousing and histograms in query optimization.
UR - https://www.scopus.com/pages/publications/84994073311
UR - https://www.scopus.com/pages/publications/84994073311#tab=citedBy
M3 - Conference contribution
AN - SCOPUS:84994073311
T3 - Proceedings of the 23rd International Conference on Very Large Databases, VLDB 1997
SP - 36
EP - 45
BT - Proceedings of the 23rd International Conference on Very Large Databases, VLDB 1997
A2 - Lochovsky, Fred
A2 - Carey, Michael J.
A2 - Jarke, Matthias
A2 - Dittrich, Klaus R.
A2 - Loucopoulos, Pericles
A2 - Jeusfeld, Manfred A.
PB - Morgan Kaufmann
T2 - 23rd International Conference on Very Large Databases, VLDB 1997
Y2 - 26 August 1997 through 29 August 1997
ER -