We present WebACE, an agent for exploring and categorizing documents on the World Wide Web based on a user profile. The heart of the agent is an unsupervised categorization of a set of documents, combined with a process for generating new queries that is used to search for new related documents and for filtering the resulting documents to extract the ones most closely related to the starting set. The document categories are not given a priori. We present the overall architecture and describe two novel algorithms which provide significant improvement over Hierarchical Agglomeration Clustering and AutoClass algorithms and form the basis for the query generation and search component of the agent. We report on the results of our experiments comparing these new algorithms with more traditional clustering algorithms and we show that our algorithms are fast and scalable.
|Original language||English (US)|
|Number of pages||27|
|Journal||Artificial Intelligence Review|
|State||Published - Dec 1 1999|