Towards a scalable kNN CF algorithm: Exploring effective applications of clustering

Al Mamunur Rashid, Shyong K. Lam, Adam LaPitz, George Karypis, John Riedl

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Collaborative Filtering (CF)-based recommender systems bring mutual benefits to both users and the operators of the sites with too much information. Users benefit as they are able to find items of interest from an unmanageable number of available items. On the other hand, e-commerce sites that employ recommender systems can increase sales revenue in at least two ways: a) by drawing customers' attention to items that they are likely to buy, and b) by cross-selling items. However, the sheer number of customers and items typical in e-commerce systems demands specially designed CF algorithms that can gracefully cope with the vast size of the data. Many algorithms proposed thus far, where the principal concern is recommendation quality, may be too expensive to operate in a large-scale system. We propose CLUSTKNN, a simple and intuitive algorithm that is well suited for large data sets. The method first compresses data tremendously by building a straightforward but efficient clustering model. Recommendations are then generated quickly by using a simple NEAREST NEIGHBOR-based approach. We demonstrate the feasibility of CLUSTKNN both analytically and empirically. We also show, by comparing with a number of other popular CF algorithms, that apart from being highly scalable and intuitive, CLUSTKNN provides very good recommendation accuracy as well.
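
The two stages described in the abstract (compress users into a clustering model, then run nearest-neighbor prediction against that model) can be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation: the abstract does not say which clustering method or similarity measure CLUSTKNN uses, so plain k-means and a mean-squared-difference distance are assumed here as stand-ins, and all function names (`msd`, `centroid`, `build_model`, `predict`) and the toy ratings are hypothetical.

```python
from typing import Dict, List, Optional

Ratings = Dict[str, float]  # item -> rating


def msd(a: Ratings, b: Ratings) -> float:
    """Mean squared difference over co-rated items (inf if none are shared)."""
    common = a.keys() & b.keys()
    if not common:
        return float("inf")
    return sum((a[i] - b[i]) ** 2 for i in common) / len(common)


def centroid(members: List[Ratings]) -> Ratings:
    """Per-item mean rating across the members who rated that item."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for profile in members:
        for item, value in profile.items():
            sums[item] = sums.get(item, 0.0) + value
            counts[item] = counts.get(item, 0) + 1
    return {item: sums[item] / counts[item] for item in sums}


def build_model(users: Dict[str, Ratings], k: int, iters: int = 10) -> List[Ratings]:
    """Offline step (assumed: plain k-means): compress users into k centroids."""
    profiles = list(users.values())
    centroids = [dict(p) for p in profiles[:k]]  # deterministic seed: first k users
    for _ in range(iters):
        groups: List[List[Ratings]] = [[] for _ in centroids]
        for p in profiles:
            nearest = min(range(len(centroids)), key=lambda c: msd(p, centroids[c]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [centroid(g) if g else centroids[c] for c, g in enumerate(groups)]
    return centroids


def predict(user: Ratings, item: str, model: List[Ratings],
            n_neighbors: int = 2) -> Optional[float]:
    """Online step: similarity-weighted average over the nearest centroids."""
    candidates = [c for c in model if item in c and msd(user, c) != float("inf")]
    candidates.sort(key=lambda c: msd(user, c))
    top = candidates[:n_neighbors]
    if not top:
        return None  # no comparable centroid has a rating for this item
    weights = [1.0 / (1.0 + msd(user, c)) for c in top]
    return sum(w * c[item] for w, c in zip(weights, top)) / sum(weights)


# Toy data (hypothetical): two users who like A and B, two who like C.
users = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 1, "D": 5},
    "u3": {"A": 1, "B": 2, "C": 5, "D": 1},
    "u4": {"A": 2, "B": 1, "C": 4},
}
model = build_model(users, k=2)
print(predict({"A": 5, "B": 5, "C": 1}, "D", model))
```

The point of the sketch is the cost structure: at recommendation time the active user is compared against k centroids rather than against every user, so the online work depends on the model size and not on the number of customers.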

Original language: English (US)
Title of host publication: Advances in Web Mining and Web Usage Analysis - 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Revised Papers
Pages: 147-166
Number of pages: 20
State: Published - Dec 1 2007
Event: 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006 - Philadelphia, PA, United States
Duration: Aug 20 2006 → Aug 20 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4811 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006
Country: United States
City: Philadelphia, PA
Period: 8/20/06 → 8/20/06
