Huge numbers of web items (e.g., images, keywords, and web pages) are being made available on the Web. The popularity of these items changes continuously over time, and mining temporal patterns in their popularity is an important problem with several Web applications. For example, temporal patterns in the popularity of web search keywords help web search enterprises predict future popular keywords, enabling them to make pricing decisions when marketing search keywords to advertisers. However, the presence of millions of web items makes it difficult to scale up previous techniques for this problem. This paper proposes an efficient method for mining temporal patterns in the popularity of web items. We treat the popularity of web items as time series and propose a novel measure, the gap measure, to quantify the dissimilarity between the popularity of two web items. To reduce the computational overhead of this measure, we present an efficient method based on the Discrete Fourier Transform (DFT). We do not assume that the popularity of web items is periodic. To find clusters of web items with similar popularity trends, we show the limitations of traditional clustering approaches and propose a scalable, efficient, density-based clustering algorithm that uses the gap measure. Our experiments on the popularity trends of web search keywords obtained from the Google Trends web site illustrate the scalability and usefulness of the proposed approach in real-world applications.
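The abstract does not define the gap measure or give the paper's clustering algorithm. The following minimal Python sketch therefore only illustrates the two general ideas the abstract names, under stated assumptions:

- **Stand-in distance.** It substitutes plain Euclidean distance for the gap measure.
- **DFT pruning.** It uses the classic DFT lower-bounding technique to prune candidate pairs cheaply. Only the first `m` coefficients of the unitary DFT are kept. By Parseval's theorem, their distance never exceeds the true Euclidean distance.
- **Stand-in clustering.** It plugs that pruned range query into standard DBSCAN, a representative density-based clustering algorithm.

All function names (`unitary_dft`, `euclidean`, `dft_lower_bound`, `region_query`, `dbscan`), the default `m = 4`, and the toy series are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: Euclidean distance stands in for the paper's gap measure,
# and plain DBSCAN stands in for its density-based clustering algorithm.
import cmath
import math
from typing import Dict, List, Optional, Sequence, Tuple

Series = Sequence[float]


def unitary_dft(x: Series, m: int) -> List[complex]:
    """Return the first m coefficients of the unitary DFT of x (scaled by 1/sqrt(n))."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(m)
    ]


def euclidean(x: Series, y: Series) -> float:
    """Stand-in dissimilarity; NOT the paper's gap measure."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dft_lower_bound(fx: List[complex], fy: List[complex]) -> float:
    """Distance over the kept coefficients; by Parseval it never exceeds euclidean()."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fx, fy)))


def region_query(i: int, series: List[Series], feats: List[List[complex]],
                 eps: float, stats: Dict[str, int]) -> List[int]:
    """Indices within eps of series[i], trying the cheap DFT bound first."""
    neighbors = []
    for j in range(len(series)):
        if dft_lower_bound(feats[i], feats[j]) > eps:
            stats["pruned"] += 1  # safe rejection: the true distance is at least this large
            continue
        if euclidean(series[i], series[j]) <= eps:
            neighbors.append(j)
    return neighbors


def dbscan(series: List[Series], eps: float, min_pts: int,
           m: int = 4) -> Tuple[List[Optional[int]], Dict[str, int]]:
    """Plain DBSCAN; returns cluster ids (-1 = noise) and a pruning counter."""
    feats = [unitary_dft(s, m) for s in series]  # computed once per series
    stats = {"pruned": 0}
    labels: List[Optional[int]] = [None] * len(series)  # None = unvisited
    cluster = 0
    for i in range(len(series)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i, series, feats, eps, stats)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(j, series, feats, eps, stats)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels, stats


if __name__ == "__main__":
    # Toy "popularity" series: two rising trends, two falling trends' worth, one outlier.
    data = [
        [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 8], [0, 1, 2, 3, 4, 5, 7, 7],
        [7, 6, 5, 4, 3, 2, 1, 0], [7, 6, 5, 4, 3, 2, 1, 1], [7, 6, 5, 4, 3, 2, 2, 0],
        [100.0] * 8,
    ]
    labels, stats = dbscan(data, eps=1.5, min_pts=3)
    print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Because the truncated-DFT distance never overestimates the Euclidean distance, pruning with it causes no false dismissals. Any pair it rejects would also fail the exact `eps` test, and the check costs O(m) per pair instead of O(n). The abstract does not say whether the paper's gap measure admits a comparable bound.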
| Field | Value |
|---|---|
| Original language | English (US) |
| Number of pages | 19 |
| State | Published - Nov 15 2011 |
Bibliographical note
Funding Information:
This work was supported by the Korea Research Foundation (KRF) Grant funded by the Korean Government (Ministry of Education & Human Resources Development, MOEHRD) (KRF-2006-214-D00130). This work was also partially supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD, Basic Research Promotion Fund) (KRF-2008-331-D00487). The authors would like to thank Aditya Grandhi for his help with the initial implementation of the approach used in the experimental work.
Keywords:
- Density-based clustering
- Gap measure
- Popularity trends
- Temporal patterns
- Web items