In recent years, matrix approximation for missing value prediction has emerged as an important problem in a variety of domains, such as recommendation systems, e-commerce, and online advertisement. While matrix factorization based algorithms typically have good approximation accuracy, they can be slow, especially for large matrices. Further, such algorithms cannot naturally make predictions for new rows or columns. In this paper, we propose residual Bayesian co-clustering (RBC), which learns a generative model of the matrix from the non-missing entries and uses the model to predict the missing entries. RBC extends Bayesian co-clustering by taking row and column biases into consideration. The model allows mixed memberships of rows and columns to multiple clusters, and can naturally handle prediction for new rows and columns that were not used in training, given only a few non-missing entries in them. We propose two variational inference based algorithms for learning the model and predicting missing entries. One of the proposed algorithms leads to a parallel RBC that can achieve significant speed-ups. The efficacy of RBC is demonstrated by extensive experimental comparisons with state-of-the-art algorithms on real-world datasets.
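To make the prediction rule concrete, the following is a minimal sketch of how a residual co-clustering model of this general form could predict a missing entry: a global mean plus row and column biases, plus a co-cluster residual averaged under the row's and column's mixed memberships. All names, shapes, and the toy parameters below are illustrative assumptions, not the paper's actual learned model or inference procedure.

```python
import numpy as np

def predict_entry(mu, row_bias, col_bias, theta_row, theta_col,
                  cluster_mean, i, j):
    """Predict matrix entry (i, j) under a hypothetical residual
    co-clustering model.

    mu           : global mean (scalar)
    row_bias     : per-row offsets, shape (n_rows,)
    col_bias     : per-column offsets, shape (n_cols,)
    theta_row    : soft row-cluster memberships, shape (n_rows, K1)
    theta_col    : soft column-cluster memberships, shape (n_cols, K2)
    cluster_mean : residual means per (row-cluster, column-cluster)
                   pair, shape (K1, K2)
    """
    # Mixed membership: average the co-cluster residual means,
    # weighted by this row's and this column's cluster probabilities.
    cocluster_effect = theta_row[i] @ cluster_mean @ theta_col[j]
    return mu + row_bias[i] + col_bias[j] + cocluster_effect

# Toy example with 2 rows, 2 columns, and 2x2 co-clusters.
mu = 3.0
row_bias = np.array([0.5, -0.2])
col_bias = np.array([-0.1, 0.3])
theta_row = np.array([[0.8, 0.2], [0.3, 0.7]])   # each row sums to 1
theta_col = np.array([[0.6, 0.4], [0.1, 0.9]])
cluster_mean = np.array([[0.2, -0.1], [0.0, 0.4]])

print(predict_entry(mu, row_bias, col_bias, theta_row, theta_col,
                    cluster_mean, 0, 1))  # prediction for entry (0, 1)
```

Because the biases and memberships are per-row and per-column, a new row or column needs only its own bias and membership estimates (inferable from a few observed entries) to plug into this rule, which is what makes out-of-sample prediction natural in such models.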
|Original language||English (US)|
|Number of pages||12|
|State||Published - Dec 1 2010|
|Event||10th SIAM International Conference on Data Mining, SDM 2010 - Columbus, OH, United States|
|Duration||Apr 29 2010 → May 1 2010|