TY - GEN
T1 - Topic modeling for segment-based documents
AU - Ponti, Giovanni
AU - Tagarelli, Andrea
AU - Karypis, George
PY - 2012
Y1 - 2012
N2 - Statistical topic models have traditionally assumed that a document is an indivisible unit for the generative process, an assumption that may not be appropriate for documents that are relatively long and show an explicit multi-topic structure. In this paper we describe a generative model that exploits a given decomposition of documents into smaller, topically cohesive text units, or segments. The key idea is to introduce a new variable in the generative process to model the document segments, so that word generation is related not only to the topics but also to the segments. Moreover, the topic latent variable is directly associated with the segments, rather than with the document as a whole. Experimental results have shown the significance of the proposed model and its better support for the document clustering task compared to other existing generative models. Copyright (c) 2012 - Edizioni Libreria Progetto and the authors.
UR - http://www.scopus.com/inward/record.url?scp=84873583755&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84873583755&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84873583755
SN - 9788896477236
T3 - Proceedings of the 20th Italian Symposium on Advanced Database Systems, SEBD 2012
SP - 205
EP - 212
BT - Proceedings of the 20th Italian Symposium on Advanced Database Systems, SEBD 2012
T2 - 20th Italian Symposium on Advanced Database Systems, SEBD 2012
Y2 - 24 June 2012 through 27 June 2012
ER -