TY - JOUR
T1 - Principal direction divisive partitioning
AU - Boley, Daniel
PY - 1998/1/1
Y1 - 1998/1/1
N2 - We propose a new algorithm capable of partitioning a set of documents or other samples based on an embedding in a high dimensional Euclidean space (i.e., in which every document is a vector of real numbers). The method is unusual in that it is divisive, as opposed to agglomerative, and operates by repeatedly splitting clusters into smaller clusters. The documents are assembled into a matrix which is very sparse. It is this sparsity that permits the algorithm to be very efficient. The performance of the method is illustrated with a set of text documents obtained from the World Wide Web. Some possible extensions are proposed for further investigation.
UR - http://www.scopus.com/inward/record.url?scp=22644451496&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=22644451496&partnerID=8YFLogxK
U2 - 10.1023/A:1009740529316
DO - 10.1023/A:1009740529316
M3 - Article
AN - SCOPUS:22644451496
VL - 2
SP - 325
EP - 344
JO - Data Mining and Knowledge Discovery
JF - Data Mining and Knowledge Discovery
SN - 1384-5810
IS - 4
ER -