Document categorization and query generation on the World Wide Web using WebACE

Daniel Boley, Maria Gini, Robert Gross, Eui Hong Han, Kyle Hastings, George Karypis, Vipin Kumar, Bamshad Mobasher, Jerome Moore

Research output: Contribution to journal › Article › peer-review

111 Scopus citations

Abstract

We present WebACE, an agent for exploring and categorizing documents on the World Wide Web based on a user profile. The heart of the agent is an unsupervised categorization of a set of documents, combined with a process for generating new queries that is used to search for new related documents and for filtering the resulting documents to extract the ones most closely related to the starting set. The document categories are not given a priori. We present the overall architecture and describe two novel algorithms that provide significant improvement over Hierarchical Agglomeration Clustering and AutoClass algorithms and form the basis for the query generation and search component of the agent. We report the results of our experiments comparing these new algorithms with more traditional clustering algorithms, and we show that our algorithms are fast and scalable.

Original language: English (US)
Pages (from-to): 365-391
Number of pages: 27
Journal: Artificial Intelligence Review
Volume: 13
Issue number: 5
State: Published - 1999
